3
15
Learning from context is harder than we thought (tencent.com)
1
Residual Context Diffusion (yuezhouhu.github.io)
2
Step 3.5 Flash (stepfun.com)
1
LongCat-Flash-Lite 100B A3B Technical Report [pdf] (huggingface.co)
1
Jinja.cpp: A single-header C++11 Jinja2 template engine for LLM chat templates (github.com/wangzhaode)
1
Mini-SGLang: A lightweight yet high-performance inference framework for LLM (github.com/sgl-project)
1
Google drops Gemini 3 Pro image preview (reddit.com)
1
DGX Spark may have only half the performance claimed (reddit.com)
2
You Can Cool Chips with Lasers (ieee.org)
2
Fully Integrated 2D Flash Chip Unveiled (bioengineer.org)
7
Linux 6.18 UDP receive performance improved by 47%, under DDoS (kernel.org)
1
OpenAI 2025 ICPC Submissions (github.com/openai)
2
Unveiling Silicon Art: Dieshots of Microchip Masterpieces (dieshot.com)
1
AetherCode: Evaluating LLMs' Ability to Win in Premier Programming Competitions (arxiv.org)
15
Operation Costs in CPU Clock Cycles (2016) (ithare.com)
1
Principles and Methodologies for Serial Performance Optimization (usenix.org)
1
The Koala Benchmarks for the Shell (kben.sh)
1
SmallThinker: A Family of Efficient LLMs Natively Trained for Local Deployment (arxiv.org)
1
Step3 Technical Report [pdf] (github.com/stepfun-ai)
103
FP8 is ~100 tflops faster when the kernel name has "cutlass" in it (twitter.com/cis_female)
5
Polaris: A Post-training recipe for scaling RL on Advanced Reasoning models (hkunlp.github.io)
14
Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)
1
Neutrino: Probing-Based eBPF-Like GPU Kernel Profiling (github.com/open-neutrino)
3
Machine Learning Conferences Should Establish "Refutations and Critiques" Track (arxiv.org)
1
SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines (supergpqa.github.io)
4
SepLLM: Accelerate LLMs by Compressing One Segment into One Separator (sepllm.github.io)
10
Step-Video-T2V: The Practice, Challenges, and Future of Video Foundation Model (arxiv.org)
1
Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset (github.com/unakar)
1
Libnginx: Nginx as a Shared Library (github.com/wkgcass)
1
DeepSeek-VL2: MoE Vision-Language Models for Advanced Multimodal Understanding [pdf] (github.com/deepseek-ai)
1
Fast vectorizable algorithms of binary searching for floating point numbers (github.com/fabiocannizzo)
26
New OpenAI Feature: Predicted Outputs (simonwillison.net)
2
Collaborative Filtering Is Wrong and Here Is Why (springer.com)
1
REST: A Plug-and-Play Method for Accelerating LLM Without Additional Training (sites.google.com)
1
Smoke 'em if you got 'em: Hacker gains root access using cigarette lighter (tomshardware.com)
1
O1 Replication Journey: A Strategic Progress Report (github.com/gair-nlp)
1
Failures of Gradient-Based Deep Learning (2017) [pdf] (mlr.press)
1
Qwen2-VL (huggingface.co)
64
Qwen2-Math (qwenlm.github.io)
56
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention (pytorch.org)
1
MiniCPM-v2.6: GPT-4V Level MLLM for Single/Multi Image and Video on Your Phone (github.com/openbmb)
1
MindSearch: LLM-Based Web Search Engine Similar to Perplexity.ai and SearchGPT (github.com/internlm)
1
Header-only and powerful C++20 static reflection library in 99 lines (github.com/ubpa)
2
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (arxiv.org)
1
PowerInfer-2: Fast Large Language Model Inference on a Smartphone (arxiv.org)
2
Large-scale photonic chiplet Taichi empowers 160TOPS/W AI (science.org)
1
Asterinas: OS kernel written in Rust and providing Linux-compatible ABI (github.com/asterinas)
1
Mq-deadline scalability improvements (with more than 100% improvement) (kernel.org)
2
Researchers Create First Functional Semiconductor Made from Graphene (technologynetworks.com)
1
Wayland Enjoyed Many Successes in 2023 (phoronix.com)
15
Improving our safety with a physical quantities and units library (open-std.org)
1
Nesting chinstrap penguins sleep by seconds-long microsleeps (science.org)
2
PowerInfer: High-Speed Large Language Model Serving on Consumer-Grade GPUs (github.com/sjtu-ipads)
1
Zpoline: System Call Hook for Linux (github.com/yasukata)
1
Randomized Single-Source Shortest Path Algo. On Undirected Real-Weighted Graphs (arxiv.org)
1
ChatGPT powered Rust proc macro that generates code at compile-time (github.com/retrage)
3
Memcpy is faster than memset on Intel i7 12700 with glibc 2.36 (gist.github.com)
1
Year-in-search-trends: Visualization of search interest over time (github.com/joweich)
1
Make call_rcu() lazy to save power (kernel.org)
2
IOMMUFD (kernel.org)
4
Chinese Chipmaker Loongson Readies 3A6000 to Tackle Zen 3 and Tiger Lake (wccftech.com)
3
Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays (arxiv.org)
1
Slirp Is Dead, Long Live Slirp: a New Approach to User-Mode Networking
1
eRPC: A fast remote procedure call library for datacenters
79
Poly-time algorithm for deciding Hilbert Nullstellensatz. A proof of P=NP
2
How hard is it to open a file (2016)
3
Kernel TLS 1.3 Rx improvements in Linux 5.20
2
Resin: Holistic Service for Dealing with Memory Leaks in Production Cloud Infra
2