Articles by matt_d
2

Formal Verification in the Age of AI (verse.systems)

3

CuTe Layout Representation and Algebra (arxiv.org)

1

Bespoke OLAP: Synthesizing Workload-Specific One-Size-Fits-One Database Engines (arxiv.org)

3

SkyDiscover: A Flexible Framework for AI-Driven Sci. and Algorithmic Discovery (skydiscover-ai.github.io)

4

Silent Backwards Compatibility Breaking Changes in PyTorch (ezyang.com)

1

Building an Open-Source Verilog Simulator with AI: 580K Lines in 43 Days (normalcomputing.com)

1

AgentCgroup: Understanding and Controlling OS Resources of AI Agents (github.com/eunomia-bpf)

1

Equality Saturation for Circuit Synthesis and Verification (imperial.ac.uk)

1

An Introduction to Folios (oracle.com)

2

Perplexity Cannot Always Tell Right from Wrong (ianbarber.blog)

1

Ganak: The Making of a Versatile, High Performance Model Counter (msoos.org)

12

TorchLean: Formalizing Neural Networks in Lean (leandojo.org)

1

Fast Autoscheduling for Sparse ML Frameworks (fredrikbk.com)

1

TENSURE: Fuzzing Sparse Tensor Compilers (Registered Report) (ndss-symposium.org)

1

A Reinforcement Learning Environment for Automatic Code Optimization in MLIR (arxiv.org)

2

Metamorphic Testing for Infrastructure-as-Code Engines [pdf] (programming-group.com)

2

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model (arxiv.org)

1

Midtraining Bridges Pretraining and Posttraining Distributions (arxiv.org)

2

Testing "Raw" GPU Cache Latency (clamtech.org)

4

Hexagon-MLIR: An AI Compilation Stack for Qualcomm's NPUs (arxiv.org)

1

Analyzing Latency Hiding and Parallelism in an MLIR-Based AI Kernel Compiler (arxiv.org)

1

Argus: Automated Discovery of Test Oracles for DBMSs Using LLMs (joyemang33.github.io)

2

A Decade of Docker Containers (acm.org)

1

In Pursuit of High-Fidelity GPU Kernel Benchmarking (standardkernel.com)

2

From ASPLOS to Orbit: Unikernels Twelve Years Later (gazagnaire.org)

1

VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean (utopia-group.github.io)

3

CSLib: The Lean Computer Science Library (arxiv.org)

3

Heliostat: Harnessing Ray Tracing Accelerators for Page Table Walks – ISCA 2025 [video] (youtube.com)

2

LDOS: Toward a Learning-Directed Operating System (sigops.org)

1

GenAI for Systems: Recurring Challenges&Design Principles from SW to Silicon (arxiv.org)

3

Precise exceptions in relaxed architectures [video] (youtube.com)

1

BitFields API: Type-Safe Bit Packing for Lock-Free Data Structures (rocksdb.org)

2

ThunderKittens 2.0: Even Faster Kernels for Your GPUs (stanford.edu)

1

Proof Assistants in the Age of AI (leodemoura.github.io)

1

Open Source Software Projects Are Brands (reidkleckner.dev)

1

Evaluating the Hardest CS Problems in the Age of LLMs (frontier-cs.org)

2

SE Radio 708: Jens Gustedt on C in 2026 (se-radio.net)

1

Spaghetti Bench: Evaluating AI Agents on Concurrency Bug Fixes (pastalab.org)

2

Computer Science as Infrastructure: The Spine of the Lean CSLib (arxiv.org)

2

Problems with a weak tryLock operation in C and C++ standards (swift.org)

1

Two mechanisms for dynamic type checks (wingolog.org)

1

Semantics, Operations, and Properties of P3109 Floating-Point Formats in Lean (github.com/rutgers-apl)

2

Oral History of Michael J. Flynn [video] (youtube.com)

2

Productively Programming Accelerated Computing Systems – Rohan Yadav (Stanford) [video] (youtube.com)

8

How to train your program verifier (risemsr.github.io)

3

Minimalist Design for Space Camera Flight Software (acm.org)

1

AMO-Lean: Towards Formally Verified Optimization via Equality Saturation in Lean (lambdaclass.com)

4

Fine-Tuning GPT-5 for GPU Kernel Generation (arxiv.org)

3

"Am I the only one still wondering what is the deal with linear types?" – Jon S (jonmsterling.com)

1

Running the "Reflections on Trusting Trust" Compiler: Revisiting the Backdoor (acm.org)

2

TileIR (ianbarber.blog)

1

Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language (arxiv.org)

2

TLX: Triton-Like Simplicity, a Clear Path to Peak Performance [video] (youtube.com)

1

Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy (nvidia.com)

1

SMTLIB as a Compiler IR I (philipzucker.com)

1

Equality Saturation Meets ML: The Next Step for Smarter Optimizing Compilers [video] (youtube.com)

2

Silicon Photonics in the Data Center: What a CMOS Exec Needs to Know (semiengineering.com)

1

Verifying Distributed Protocols in Veil (proofsandintuitions.net)

1

Project Pterodactyl: Incremental Architecture (jonmsterling.com)

2

European Lisp Symposium 2025: Talks (youtube.com)

2

AutoOverlap: Enabling Fine-Grained Overlap of Computation and Communication (arxiv.org)

2

Axe: A Simple Unified Layout Abstraction for Machine Learning Compilers (arxiv.org)

2

35th ACM SIGPLAN International Conference on Compiler Construction (CC 2026) (acm.org)

1

VFlatten: Selective Value-Object Flattening Using Hybrid Static&Dynamic Analysis [pdf] (iitb.ac.in)

1

Agentic Proof-Oriented Programming (risemsr.github.io)

1

MLIR-Tutor: Exercises for Learning MLIR (Originally Written for PPoPP 2026) (github.com/groverkss)

1

Fast Autoscheduling for Sparse ML Frameworks (ajroot.pl)

1

Replicate Forwards, Partial Backwards (ezyang.com)

1

Frontier-CS 1.0 Release (frontier-cs.org)

2

uops-again.info: corner-case behaviours of port assignment on Intel processors (uops-again.info)

1

When magic meets multicore: OCaml and its elegant era of parallelism [video] (youtube.com)

51

FlashAttention-T: Towards Tensorized Attention (acm.org)

1

Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters (acm.org)

1

MetaAttention: A Unified&Performant Attention Framework across Hardware Backends (acm.org)

6

Ga68, a GNU Algol 68 Compiler (fosdem.org)

1

LLMs versus the Halting Problem: Revisiting Program Termination Prediction (orensultan.com)

1

DTensor Erasure (ezyang.com)

1

Let the Barbarians In: How AI Can Accelerate Systems Performance Research (sigops.org)

1

Triton Bespoke Layouts (lei.chat)

6

AMD64 Bit Matrix Multiply and Bit Reversal Instructions (amd.com)

30

Demystifying ARM SME to Optimize General Matrix Multiplications (arxiv.org)

6

Evolving the OCaml programming language – CSE Bytes: K C Sivaramakrishnan [video] (youtube.com)

2

Magellan: Autonomous Discovery of Compiler Optimization Heuristics w/AlphaEvolve (arxiv.org)

1

Automatic Data Enumeration for Fast Collections (mcmichen.cc)

1

An MLIR Lowering Pipeline for Stencils at Wafer-Scale (arxiv.org)

1

The JAX sharding type system (ezyang.com)

1

AutoSP: Unlocking Long-Context LLM Training via Compiler-Based SP (ICLR 2026) (openreview.net)

1

Disentangling unification and implicit coercion (subtyping interaction problem) (jonmsterling.com)

1

Global vs. Local SPMD (ezyang.com)

3

ACM SIGPLAN Symposium on Principles of Programming Languages (POPL) 2026 talks (youtube.com)

1

Long branches in compilers, assemblers, and linkers (maskray.me)

1

Megatron via shard_map (ezyang.com)

1

Cloud-Hardware Co-Design for Memory Bandwidth-Bound HPC Workloads: Azure HBv5 (acm.org)

2

CuTile on Blackwell: NVIDIA's Compiler Moat Is Already Built (patricktoulme.substack.com)

1

Compiling Classical Sequent Calculus to Stock Hardware: Duality of Compilation [video] (youtube.com)

2

Computing Sharding with Einsum (ezyang.com)

2

What Is Control Flow Analysis for Lambda Calculus? [audio] (podcasts.apple.com)

1

Introduction to Coinduction in Agda Part 1: Coinductive Programming (jesper.cx)

1

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in CLIs (arxiv.org)

1

Multi-Modal Program Verification in Velvet (proofsandintuitions.net)