2
94
Deterministic Fully-Static Whole-Binary Translation Without Heuristics (arxiv.org)
1
Dynamic Persistent Tile Scheduling w/ Cluster Launch Control (CLC) on Blackwell (colfax-intl.com)
1
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? (github.com/uw-syfi)
2
CCL-Bench 1.0: A Trace-Based Benchmark for LLM Infrastructure (arxiv.org)
1
Microbenchmark-Driven Analytical Performance Modeling Across Modern GPUs (arxiv.org)
2
PyTorch DevLog (pytorch.org)
2
VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU (arxiv.org)
1
Aurora: A Leverage-Aware Optimizer for Rectangular Matrices (tilderesearch.com)
1
The Two Abstractions of System Design: Hide or Reduce (muratbuffalo.blogspot.com)
1
Practical Formal Verification for MLIR Programs (arxiv.org)
1
Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs (arxiv.org)
2
Capsules: Compile-time lock discipline in OxCaml (kcsrk.info)
1
Data Race Freedom in OxCaml (kcsrk.info)
2
cuda-oxide: a custom rustc backend for compiling GPU kernels in pure Rust (github.com/nvlabs)
3
A case study with Aeneas and jxl-rs (protzenko.fr)
1
Finite Functional Programming (arxiv.org)
3
CommFuse: Hiding Tail Latency via Communication Decomposition and Fusion (arxiv.org)
1
SPEC CPU: The Next Generation (arxiv.org)
1
Persistent Iterators with Value Semantics (arxiv.org)
3
Continual Learning Bench 1.0 (continual-learning-bench.com)
3
The Valley of Calm (joemag.dev)
3
The Static Dynamic JVM – A Many Layered Dive [video] (youtube.com)
1
Learning Randomized Reductions (arxiv.org)
2
Metastability in Recovery: Cascading Recovery with a Loop (charap.co)
3
How the JVM Optimizes Generic Code – A Deep Dive (inside.java)
1
Tessera: Unlocking Heterogeneous GPUs Through Kernel-Granularity Disaggregation (arxiv.org)
2
MathDuels: Evaluating LLMs as Problem Posers and Solvers (arxiv.org)
1
Kernel Contracts: A Spec. Language for Correctness Across Heterogeneous Silicon (arxiv.org)
2
Revealing NVIDIA Driver Command Streams for CPU-GPU Runtime Behavior Insight (arxiv.org)
2
Guardians: Static verification for AI agent workflows (github.com/metareflection)
2
Fast GPU Linear Algebra via Compile Time Expression Fusion (arxiv.org)
3
The AI Compute Extensions (ACE) for x86 [pdf] (x86ecosystem.org)
1
Finding and Understanding Bugs in FPGA Place-and-Route Engines [video] (youtube.com)
1
AutoSP: Long-Context LLM Training via Compiler-Based Sequence Parallelism (pytorch.org)
2
Partial UDF Inlining (doi.org)
10
From Convergence to Confidence: Push-Button Verification for RDTs (kcsrk.info)
19
Low-Compilation-Cost Register Allocation in LLVM-Based Binary Translation (acm.org)
1
AdaExplore: Search for Efficient Kernel Generation (stiglidu.github.io)
1
vLLM-Compile: Bringing Compiler Optimizations to LLM Inference (docs.google.com)
2
Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs (arxiv.org)
2
Compiler Testing – Part 1: Coverage-Guided Fuzzing with Grammars and LLMs (nowarp.io)
1
Disaggregated Serving for Hybrid SSM Models in vLLM (vllm-website-lx4pji0mz-inferact-inc.ver...
1
Great Paper: The Calculated Typer – Iowa Type Theory Commute Podcast S7 E6 (pocketcasts.com)
3
Barbara Liskov, Turing Award'08: Data Abstraction, Dijkstra, Distributed Systems (developing.dev)
4
Guess-Verify-Refine: Data-Aware Top-K for Sparse-Attention Decoding on Blackwell (arxiv.org)
1
Reimagining Kernel Generation at the PTX Layer (standardkernel.com)
1
A Deductive System for (Hardware-Software) Contract Satisfaction Proofs (arxiv.org)
1
Tile Kernels: An optimized GPU kernels library written in TileLang (github.com/deepseek-ai)
2
AMD's Zen: Coming Back from the Dead (clamtech.org)
1
Learning to Repair Lean Proofs from Compiler Feedback (arxiv.org)
1
RLix: A scheduling layer for concurrent LLM RL (github.com/rlops)
2
Primus Projection: Estimate Memory and Performance Before You Train (amd.com)
1
PRowhammer: Propagating Bit-flips from CPU to GPU [pdf] (iitb.ac.in)
2
The Quantization Robustness of Diffusion Language Models in Coding Benchmarks (arxiv.org)
2
Different Perspectives of Memory System Simulation (arxiv.org)
2
Adding Compilation Metadata to Binaries to Make Disassembly Decidable (arxiv.org)
1
ICLR 2026 Outstanding Papers (iclr.cc)
1
Decoupled DiLoCo for Resilient Distributed Pre-Training (arxiv.org)
2
spmd_types: A type system for distributed (SPMD) tensor computations in PyTorch (github.com/meta-pytorch)
1
How Do LLM Agents Think Through SQL Join Orders? (ucbskyadrs.github.io)
1
Gluon & Linear Layouts Deep-Dive: Tile-Based GPU Programming with Low-Level Control [video] (youtube.com)
1
SonicMoE: A HW-Efficient and SW-Extensible Blueprint for Fine-Grained MoEs (dao-lab.ai)
1
SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving (arxiv.org)
1
DIRT: Database-Integrated Random Testing (arxiv.org)
1
Scaling Test-Time Compute for Agentic Coding (arxiv.org)
1
An Algorithmic Reconstruction of Normalisation by Evaluation (yangzhixuan.github.io)
2
Faster LLM Inference via Sequential Monte Carlo (arxiv.org)
2
Pure Borrow: Linear Haskell Meets Rust-Style Borrowing (arxiv.org)
1
SSA without Dominance for Higher-Order Programs (arxiv.org)
3
Spotting Specification Gaps with Small Proof-Oriented Tests (risemsr.github.io)
1
Theseus, a Static Windows Emulator (neugierig.org)
1
Advent of Computing: Episode 179 – Programming Block by Block (libsyn.com)
2
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models (arxiv.org)
4
Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter (arxiv.org)
1
Fundamentals of CuTe Layout Algebra and Category-Theoretic Interpretation [video] (youtube.com)
2
OSS code review, in the era of LLMs (ezyang.com)
1
Proteus: Heterogeneous FPGA Virtualization [pdf] (tum.de)
1
Trevex: A Black-Box Detection Framework for Data-Flow Transient Execution Vulns (roots.ec)
1
From SIMT to Systolic Part 2: A Kernel Author's Field Report (twitter.com/mainzonx)
1
Machine Generated and Checked Proofs for a Verified Compiler (Experience Report) (arxiv.org)
2
Machine-Generated Code Deserves Machine-Checked Proofs (zoep.github.io)
2
What Happens to Software When Proof Is Cheap? Allen School Distinguished Lecture [video] (youtube.com)
1
TileTensor Part 1 – Safer, More Efficient GPU Kernels (modular.com)
1
EuroLLVM 2026 Round Table Summary: MLIR Canonicalization (llvm.org)
1
nanomem: A Simple, Inference-Time Memory Module (openanonymity.ai)
1
Building an Unverified Compiler with Agents (basis.ai)
1
WybeCoder: Verified Imperative Code Generation (facebookresearch.github.io)
2
Parcae: Doing More with Fewer Parameters Using Stable Looped Models (sandyresearch.github.io)
1
Characterizing the Impact of Congestion in Modern HPC Interconnects (arxiv.org)
1
Tessera: Unlocking Heterogeneous GPUs Through Kernel-Granularity Disaggregation (arxiv.org)
2
From SIMT to Systolic: A Foundation for GPU and TPU Architecture (twitter.com/mainzonx)
1
Packrat Parsing at the Speed of Wasm [video] (youtube.com)
1
Sparser, Faster, Lighter Transformer Language Models (arxiv.org)
1
When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Telemetry (arxiv.org)
2
Stupid RCU Tricks: Corner-Case RCU Implementations (kernel.org)
1
How Many Compilers Is Too Many? V8's History, Tradeoffs, and Architecture [video] (youtube.com)
1
Fully-Automatic Type Inference for Borrows with Lifetimes (radbox.org)
14
The GNU libc atanh is correctly rounded (hal.science)
1