1
The need for better compiler frontend benchmarks: Carbon's benchmarking approach (llvm.org)
1
DAXFS: A Lock-Free Shared Filesystem for CXL Disaggregated Memory (arxiv.org)
2
AST Edits: The Code Editing Format Nobody Uses (geometricagi.github.io)
1
KernelEvolve: Meta's Ranking Engineer Agent Optimizes AI Infrastructure (fb.com)
3
Software Engineering Is Becoming Civil Engineering (christophermeiklejohn.com)
1
Adaptive Block-Scaled Data Types (arxiv.org)
1
AC4A: Access Control for Agents (arxiv.org)
1
Rethinking Language Model Scaling Under Transferable Hypersphere Optimization (arxiv.org)
1
Distributed builds of LLVM with CMake, recc, and NativeLink (reidkleckner.dev)
1
A Pattern Generation Language for MLIR Compiler Matching and Rewriting (radbox.org)
2
Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon (dao-lab.ai)
3
Compiler as a Service: C++ Goes Live – Interactive C++, interop, and beyond [video] (youtube.com)
3
Measuring AI Ability to Complete Long Software Tasks (muratbuffalo.blogspot.com)
1
6o6 v1.1: Faster 6502-on-6502 virtualization for a C64/Apple II Apple-1 emulator (oldvcr.blogspot.com)
2
uops.info Update: Emerald Rapids, Meteor Lake, Arrow Lake, and Zen 5 (uops.info)
1
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX (danielvegamyhre.github.io)
2
PyTorch Autograd and Mutation (ezyang.com)
2
The Future of Python: Evolution or Succession – Brett Slatkin – PyCascades 2026 [video] (youtube.com)
2
SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (scbench.ai)
2
AutoRocq: Agentic Theorem Prover for Verification (github.com/nus-program-verification)
1
Wax: Optimizing Data Center Applications with Stale Profile (github.com/ice-rlab)
2
Dijkstra's Shortest-Path Algorithm: A visual exploration, following Sedgewick (joshmpollock.com)
3
Speculative Decoding: Performance or Illusion? (specdecode-bench.github.io)
2
Goedel-Code-Prover: Hierarchical Proof Search for Open SotA Code Verification (goedelcodeprover.github.io)
1
MLSys 2026 Papers (mlsys.org)
2
An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU (arxiv.org)
2
Specula: A framework for finding deep bugs in system code using TLA+ (github.com/specula-org)
3
Equality Saturation for Optimizing High-Level Julia IR (acm.org)
1
UniTe: A Universal Tensor Abstraction for Capturing Spatial Relationships (acm.org)
2
Co-Design of B+-Tree Index with Emerging Zone Interfaces for Small KV Pairs (acm.org)
1
CounterPoint: Using Hardware Counters to Refute and Refine µarch Assumptions (arxiv.org)
1
PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost (arxiv.org)
4
SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems (muratbuffalo.blogspot.com)
1
What Is Coordination, Really? (jhellerstein.github.io)
1
Idempotent Slices with Applications to Code-Size Reduction (arxiv.org)
1
Microsoft Rust Training Books: Beginner, advanced, expert level material (github.com/microsoft)
2
LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis (arxiv.org)
1
Challenges and Design Issues in Finding CUDA Bugs via GPU-Native Fuzzing (arxiv.org)
1
SEVI: Silent Data Corruption of Vector Instructions in Hyper-Scale Datacenters (acm.org)
2
CrypTorch: PyTorch-based Auto-tuning Compiler for ML w/ Multi-party Computation (github.com/psu-paws)
2
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels (arxiv.org)
6
Tony Hoare and His Imprint on Computer Science (acm.org)
1
The End of Dijkstra's Algorithm? Breaking the Sorting Barrier for Shortest Paths [video] (youtube.com)
1
AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms (arxiv.org)
1
Specy: Learning Specifications for Distributed Systems from Event Traces [pdf] (princeton.edu)
1
Generalized Dot-Product Attention: Tackling Real-World Challenges in GPU Kernels (pytorch.org)
1
M^2RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling (arxiv.org)
1
Tools of the Trade: C2C Activation Offloading on Grace Blackwell (poolside.ai)
43
EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages (esolang-bench.vercel.app)
1
Speed-Of-Light ExecBench: A benchmark of real-world DL kernel problems (github.com/nvidia)
2
Equality Saturation and Symbolic Regression (egraphs.org)
2
NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL (arxiv.org)
4
Vectorization of Verilog Designs and its Effects on Verification and Synthesis (arxiv.org)
1
LATTE ’26: Workshop on Languages, Tools, and Techniques for Accelerator Design (cornell.edu)
1
Read Less, Steer More (ezyang.com)
1
The Data Structures of Roads (sandboxspirit.com)
1
Verifying Move Borrow Checker in Lean: An Experiment in AI-Assisted PL Metatheory (proofsandintuitions.net)
4
Real or Slop? – Programming Languages Papers Edition (zackg.me)
33
Mamba-3 (together.ai)
1
EvoX: Letting AI Evolve Its Own Evolution Process (skydiscover-ai.github.io)
1
Native DSLs Ops in PyTorch (ianbarber.blog)
19
Flash-KMeans: Fast and Memory-Efficient Exact K-Means (arxiv.org)
2
Gluon: Explicit Performance (lei.chat)
2
Block Number Formats are (Still!) Direction Preservers (constantinides.net)
3
cuTile Rust: a safe, tile-based kernel programming DSL for Rust (github.com/nvlabs)
1
KernelBlaster: A framework for in context learning for code optimization (github.com/nvlabs)
1
Demystifying and Improving Lazy Promotion in Cache Eviction [pdf] (vldb.org)
1
Journeying through Optimization with Heuristics [video] (youtube.com)
3
To Sparsify or to Quantize: A Hardware Architecture View (sigarch.org)
6
Efficient sparse computations using linear algebra aware compilers (2025) (osti.gov)
1
A Field Guide to Reward Hacking in AI Kernel Generation (wafer.ai)
1
AI and the Mixed-Consistency Future (jhellerstein.github.io)
1
FIDES: End-to-end Compartments for Mixed-language Systems [pdf] (kcsrk.info)
1
Practical Type Inference: High-Throughput Recovery of Real-World Types (arxiv.org)
1
Idempotent Slices with Applications to Code-Size Reduction (arxiv.org)
1
Designing AI Chip Hardware and Software (docs.google.com)
2
Refinement Modeling and Verification of RISC-V Assembly Using Knuckledragger (philipzucker.com)
2
Breaking Control Flow Integrity by Abusing Modern C++ (Coroutines) – BH USA 2025 [video] (youtube.com)
1
Programming the Loop (ianbarber.blog)
2
Scalable Training of Mixture-of-Experts Models with Megatron Core (arxiv.org)
3
PolyBlocks: A Compiler Infrastructure for AI Chips and Programming Frameworks (arxiv.org)
2
Formalizing Data Structures and Algorithms with Agents (risemsr.github.io)
2
Thinnings: Sublist Witnesses and de Bruijn Index Shift Clumping (philipzucker.com)
2
Advent of Computing: Dan Temkin – Forty-Four Esolangs (libsyn.com)
1
Checking Write Bandwidth on GPUs (clamtech.org)
1
Challenges in Decompilation and Reverse Engineering of CUDA-Based Kernels [pdf] (nicolo.dev)
2
Block Number Formats Are Direction Preservers (constantinides.net)
2
Cutie Fly: CuTe Layout Representation and Algebra, CuTeDSL, FlyDSL (ianbarber.blog)
2
Converting Binary Floating-Point Numbers to Shortest Decimal Strings (wiley.com)
2
Controlling Floating-Point Determinism in NVIDIA CCCL (nvidia.com)
2
Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using LLMs (arxiv.org)
2
Custom Data Structures in E-Graphs (uwplse.org)
2
Formal Verification in the Age of AI (verse.systems)
3
CuTe Layout Representation and Algebra (arxiv.org)
1
Bespoke OLAP: Synthesizing Workload-Specific One-Size-Fits-One Database Engines (arxiv.org)
3
SkyDiscover: A Flexible Framework for AI-Driven Sci. and Algorithmic Discovery (skydiscover-ai.github.io)
4
Silent Backwards Compatibility Breaking Changes in PyTorch (ezyang.com)
1
Building an Open-Source Verilog Simulator with AI: 580K Lines in 43 Days (normalcomputing.com)
1