1
1
Oral History of Jensen Huang – Computer History Museum [video] (youtube.com)
1
The Equational Theories Project: Collaborative Mathematical Research at Scale (terrytao.wordpress.com)
1
The Quest Toward That Perfect Compiler – ACM SPLASH / OOPSLA 2025 Keynote [video] (youtube.com)
1
Learning to love mesh-oriented sharding (ezyang.com)
1
Microbenchmarking NVIDIA's Blackwell: An In-Depth Architectural Analysis (arxiv.org)
1
tritonBLAS: Triton-based Analytical Approach for GEMM Kernel Parameter Selection (arxiv.org)
1
RFC: Forming a Working Group on Formal Specification for LLVM (llvm.org)
3
hls4ml: A Flexible, OSS Platform for ML Acceleration on Reconfigurable Hardware (arxiv.org)
1
Nice to Meet You: Synthesizing Practical MLIR Abstract Transformers [pdf] (utah.edu)
1
SAT Etudes 2: Toy DPLL (philipzucker.com)
3
The Hitchhiker's Guide to Coherent Fabrics: 5 Programming Rules (sigarch.org)
1
Optimizing libdwarf .eh_frame enumeration (rovarma.com)
1
GSoC 2025: ClangIR Upstreaming (llvm.org)
2
Normal Forms for MLIR – 2025 US LLVM Developers' Meeting – Alex Zinenko [video] (youtube.com)
1
Place Capability Graphs: A General-Purpose Model of Rust's Ownership & Borrowing [video] (youtube.com)
1
LLM Inference Beyond a Single Node: From Bottlenecks to Mitigations (arxiv.org)
2
What Scala can learn from Rust, Swift, and C++ [video] (youtube.com)
1
Lifetime Safety in Clang – 2025 US LLVM Developers' Meeting [video] (youtube.com)
3
Constant-time support coming to LLVM: Protecting cryptographic code (trailofbits.com)
1
Seymour Cray at 100 – Clive England – TNMoC Talk [video] (youtube.com)
5
Mitigating Application Resource Overload with Targeted Task Cancellation (muratbuffalo.blogspot.com)
1
MetaOCaml: Ten Years Later System Description (sciencedirect.com)
1
Where "Simulation" Came From (decomposition.al)
1
Inside VOLT: Designing an Open-Source GPU Compiler (arxiv.org)
1
An MLIR Pipeline for Offloading Fortran to FPGAs via OpenMP (acm.org)
3
Inside Nvidia GPU: Blackwell's Limitations & Future Rubin's Microarchitecture (github.com/zartbot)
1
Kitsune: Enabling Dataflow Execution on GPUs with Spatial Pipelines (acm.org)
1
DMA Collectives for Efficient ML Communication Offloads (arxiv.org)
4
10 Myths of Scalable Parallel Languages Part 8: Striving Toward Adoptability (chapel-lang.org)
8
Slicing Is All You Need: Towards a Universal One-Sided Distributed MatMul (arxiv.org)
2
Machine Scheduler in LLVM – Part II (myhsu.xyz)
1
The content-addressed storage (CAS) model of incremental build systems (jonmsterling.com)
2
Defeating the Training-Inference Mismatch via FP16 (arxiv.org)
3
Opportunistically Parallel Lambda Calculus (acm.org)
1
Place Capability Graphs: A General-Purpose Model of Rust's Ownership & Borrowing (acm.org)
2
Linear effects, exceptions, resources: Curry-Howard destructors correspondence (arxiv.org)
3
Making the Clang AST Leaner and Faster (cppalliance.org)
3
Draw high dimensional tensors as a matrix of matrices (ezyang.com)
1
Wafer-Scale AI Compute: A System Software Perspective (sigops.org)
2
Towards Automated GPU Kernel Generation (simonguo.tech)
1
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs (arxiv.org)
2
Triton Developer Conference 2025 Talks [video] (youtube.com)
1
OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data (arxiv.org)
1
torchcomms: A modern PyTorch communications API (github.com/meta-pytorch)
2
Building an Open ABI and FFI for ML Systems (apache.org)
1
Instruction Set Migration at Warehouse Scale (arxiv.org)
2
Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, COSE [pdf] (microsoft.com)
2
The Calculated Typer – Haskell Symposium (ICFP/SPLASH'25) [video] (youtube.com)
1
PickleBall: Secure Deserialization of Pickle-Based Machine Learning Models (github.com/columbia)
1
Clang Bytecode Interpreter Update (redhat.com)
1
Scaling Instruction-Selection Verification Against Authoritative ISA Semantics (doi.org)
2
10 Myths of Scalable Parallel Languages Part 7: Minimalist Language Designs (chapel-lang.org)
1
CPU Autoscaling with a Kernel of Truth (acm.org)
1
SafeRace: WebGPU Memory Safety in the Presence of Data Races (acm.org)
1
A guided tour through Oxidized OCaml (gavinleroy.com)
1
Functional Networking for Millions of Docker Desktops (Experience Report) (acm.org)
1
Does Linux Provide Performance Isolation for NVMe SSDs? Configuring cgroups [pdf] (atlarge-research.com)
1
International Conference on Managed Programming Languages & Runtimes (MPLR) 2025 (acm.org)
1
Collective Matrix Multiplication – JAX Pallas:Mosaic GPU (jax.dev)
1
StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs (arxiv.org)
2
Can AI Co-Design Distributed Systems? Scaling from 1 GPU to 1k (harvard-edge.github.io)
1
Hybrid Architectures for Language Models: Systematic Analysis & Design Insights (arxiv.org)
2
LLMc: Beating All Compression with LLMs (washington.edu)
1
All in on MatMul? Don’t Put All Your Tensors in One Basket! (sigarch.org)
1
Muon Outperforms Adam in Tail-End Associative Memory Learning [video] (youtube.com)
2
Barbara Liskov Oral History [video] (youtube.com)
2
CMOS 2.0 – Redefining the Future of Scaling (arxiv.org)
1
Fuss-Free Universe Hierarchies (jonmsterling.com)
1
Pretraining Large Language Models with NVFP4 (arxiv.org)
3
The Next Computing Revolution: Bringing Processing Inside Memory (computer.org)
1
llvm-mos: Modern C/C++ on the Venerable 6502 | VCFMW 20 (2025) [video] (youtube.com)
1
Mercury: Unlocking Multi-GPU Optimization for LLMs via Remote Memory Scheduling [pdf] (storage.googleapis.com)
10
Optimizing a 6502 image decoder – part II: assembly (colino.net)
1
Arm A-Profile Architecture developments 2025: Armv9.7-A (arm.com)
1
TypeDis: A Type System for Disentanglement [pdf] (nyu.edu)
1
From CPU Transparency to GPU Complexity – The Performance Engineering Frontier (harvard-edge.github.io)
1
Quotient Polymorphism [pdf] (nott.ac.uk)
1
Labelled preorders and coercions: different approaches to multiple inheritance (jonmsterling.com)
3
GPU Mode Lecture 80: How FlashAttention 4 Works [video] (youtube.com)
1
When You Have a Fuzzer, Everything Looks Like a Reachability Problem [pdf] (ic.ac.uk)
4
F3: The Open-Source Data File Format for the Future (doi.org)
2
Efficient LLM: Bandwidth, Compute, Synchronization, and Capacity are all you need (arxiv.org)
3
HieraSynth: A Parallel Framework for Complete Super-Optimization [pdf] (lsrcz.github.io)
2
Fusion: An Analytics Object Store Optimized for Query Pushdown (doi.org)
2
Type Theory Forall – Philip Wadler – Type Classes, Monads, Logic, Future of PL [video] (youtube.com)
4
3rd Largest Element: SIMD Edition (parallelprogrammer.substack.com)
2
FP64 Floating-Point Emulation in INT8 (arxiv.org)
1
A Very Early History of Algebraic Data Types (hillelwayne.com)
1
RISC-V Conditional Moves (corsix.org)
3
Global Economic History: Cradle of Modernity [pdf] (upenn.edu)
2
"Is it time for a new proof assistant?" – Jon Sterling [video] (youtube.com)
2
OpenSTA: Open-source static timing analysis for FPGAs (zeroasic.com)
1
Arm SIMD Loops – C, ACLE intrinsics, inline assembly – Neon, SVE, SME (arm.com)
1
GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2 (arxiv.org)
2
Weak Memory Model Formalisms: Introduction and Survey (arxiv.org)
2
Program Optimisations via Hylomorphisms for Extraction of Executable Code (dagstuhl.de)
8
Identity Types (bartoszmilewski.com)
1
Categorical Foundations for CuTe Layouts (colfax-intl.com)
6