3
1
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution (benchevolver.github.io)
2
JITDomain: Instruction-level JIT code isolation (sciencedirect.com)
2
Serving Transformers: Lessons from the Trenches – Stanford CS25 Transformers [video] (youtube.com)
1
Constrained Adaptive Rejection Sampling (arxiv.org)
2
Training an Agentic Router for Optimal Cost-Performance on SWE Tasks (appliedcompute.com)
1
Diagramming Program Values by Spatial Refinement (brownplt.org)
1
Agent Arena: Causal Evaluation of Agents in the Real World (arena.ai)
1
Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures (arxiv.org)
1
Recent improvements to the type checker – Swift Compiler (swift.org)
1
Semantic Reification: A New Paradigm for Random Program Generation (sigplan.org)
1
GPU Forecasters: Language Models as Selective Surrogates for Kernel Optimization (arxiv.org)
1
Type-Error Ablation and AI Coding Agents (arxiv.org)
1
Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon (vllm.ai)
1
Directionality in Low Precision (constantinides.net)
1
O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration (arxiv.org)
2
VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Agents (nvidia.com)
51
Strace-ui, Bonsai_term, and the TUI renaissance (janestreet.com)
2
Why Larger Models Learn More: Capacity, Interference, Rare-Task Retention (arxiv.org)
2
PyTorch's playbook for AI coding, as of May 2026 (pytorch.org)
2
When does fragmentation occur in the CUDA caching allocator? (pytorch.org)
2
Evaluating Apple Silicon for Data Processing [pdf] (tum.de)
2
PassNet: Scaling Large Language Models for Graph Compiler Pass Generation (arxiv.org)
3
Do GPUs Need New Tabular File Formats? (arxiv.org)
1
Polyhedral Compilation in MLIR (sajidzubair.substack.com)
1
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs (arxiv.org)
2
Three Trends from MLSys 2026 (modular.com)
3
What are the real problems of continual learning? (infinitefaculty.substack.com)
1
What "Memory Compiler" Actually Means: From Bitcells to GDS Tiling (thecloudlet.github.io)
5
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-Offs, and Perf (arxiv.org)
1
MIT EECS/CSAIL Agentic Coding in Practice Seminar Series (csail.mit.edu)
1
The Load-Balance Problem Behind Hybrid Parallelism (hecate0821.github.io)
1
Delayed Tensor Parallelism for Faster Transformer Inference (kog.ai)
1
Continuous Diffusion Models Can Obey Formal Syntax (arxiv.org)
1
HartBreaker: Deterministic Fuzzing of Multi-Hart RISC-V CPUs (ethz.ch)
4
Tuning LLVM's SLP Vectorizer Cost Model (kaving.me)
1
A Friendly Tour of Substructural, Uniqueness, Ownership, Capabilities and more! (federicobruzzone.github.io)
1
FlashLib: Bringing Flash Magic to Classical Machine Learning Operators (flashml-org.github.io)
1
FML-Bench: A Controlled Study of AI Research Agent Strategies (arxiv.org)
2
Finding deadlocks in CuTe kernels with SPIN (metaworld.me)
2
A Case for Tracing Based DSL Kernel Languages (metaworld.me)
1
You don't need all the LLM benchmarks (smola.org)
1
Elusive order of async GPU kernels: scheduling, abstractions, DSL implications (ianbarber.blog)
1
MileStone: A Multi-Objective Compiler Phase Ordering Framework (arxiv.org)
3
SSV: Sparse Speculative Verification for Efficient LLM Inference (arxiv.org)
2
Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection (arxiv.org)
3
Characterization of machine learning compilers for LLM inference on NVIDIA GPUs (springer.com)
1
Chip design from the bottom up – Reiner Pope [video] (youtube.com)
2
LT2: Linear-Time Looped Transformers (charlesdddd.github.io)
2
Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel (arxiv.org)
1
PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Apps (arxiv.org)
33
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs (arxiv.org)
1
[RFC] Open Access to Standards Documents – LLVM Project (llvm.org)
3
Curly braces: An evolution of UNIX and C (thalia.dev)
1
NanoTag: Systems Support for Efficient Byte-Granular Overflow Detection on Arm (github.com/ice-rlab)
1
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents (inferencebench.ai)
1
Tracking Capabilities for Safer Agents (arxiv.org)
2
Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation (arxiv.org)
1
Verifying EDA and compiler optimizations once and for all (samuelcoward.co.uk)
1
StepStone: LLM-Based GPU Kernel Driver Fuzzing via User-Space Libraries [pdf] (ucr.edu)
1
Graded Modal Types for Memory and Communication Safety (kent.ac.uk)
1
Systems Are Changing: The Architect's Role in the Era of Agentic Co-Design (sigarch.org)
1
Code-Specify-Test-Debug-Prove: Flexibly Integrating Separation Logic [pdf] (cam.ac.uk)
3
Detecting Relaxed Memory Concurrency Bugs in C and C++ Compilers (lukegeeson.com)
2
The downgrading semantics of memory safety (Extended version) (arxiv.org)
1
Direction-Preserving Number Representations (arxiv.org)
1
On the Unreasonable Effectiveness of PBT for Validating Formal Specifications (proofsandintuitions.net)
2
Understanding, Analyzing, and Optimizing Agentic AI: A CPU-Centric Perspective (arxiv.org)
2
Getting Confidence in (Agentic) Code (ucsd-cse-115-215.github.io)
1
Compute Optimal Tokenization: Scaling Laws for Data Compression in LLMs (co-tok.github.io)
1
Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism (wuklab.io)
1
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference (supercomputing-system-ai-lab.github.io)
7
[flagged] KV cache is becoming the memory hierarchy of inference (touchdown-labs.com)
1
Ada-MK: Adaptive MegaKernel Optimization via DAG-Based Search for LLM Inference (arxiv.org)
42
How to Write to SSDs [pdf] (vldb.org)
1
Scalable GPU Acceleration of Scalar Functions in Analytical Databases [pdf] (vldb.org)
1
The agent principal-agent problem (crawshaw.io)
1
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale (frontier-cs.org)
1
Demystifying the Silence of Correctness Bugs in PyTorch Compiler (arxiv.org)
1
Fork, Explore, Commit: OS Primitives for Agentic Exploration (arxiv.org)
1
Systematically Auditing AI Agent Benchmarks with BenchJack (arxiv.org)
8
mimalloc: A new, high-performance, scalable memory allocator for the modern era (microsoft.com)
1
Let AI Agents Write Your Serving Stack with VibeServe (washington.edu)
2
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-Scale Production (arxiv.org)
2
TorchLean: Verified Neural Networks in Lean (robertj1.com)
98
Deterministic Fully-Static Whole-Binary Translation Without Heuristics (arxiv.org)
1
Dynamic Persistent Tile Scheduling w/ Cluster Launch Control (CLC) on Blackwell (colfax-intl.com)
1
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? (github.com/uw-syfi)
2
CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure (arxiv.org)
1
Microbenchmark-Driven Analytical Performance Modeling Across Modern GPUs (arxiv.org)
2
PyTorch DevLog (pytorch.org)
2
VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU (arxiv.org)
1
Aurora: A Leverage-Aware Optimizer for Rectangular Matrices (tilderesearch.com)
1
The Two Abstractions of System Design: Hide or Reduce (muratbuffalo.blogspot.com)
1
Practical Formal Verification for MLIR Programs (arxiv.org)
1
Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs (arxiv.org)
2
Capsules: Compile-time lock discipline in OxCaml (kcsrk.info)
1
Data Race Freedom in OxCaml (kcsrk.info)
2
cuda-oxide: a custom rustc backend for compiling GPU kernels in pure Rust (github.com/nvlabs)
3