2
Machine Scheduler in LLVM – Part II (myhsu.xyz)
2 days ago |
matt_d
| myhsu.xyz
|
newest
1
The content-addressed storage (CAS) model of incremental build systems (jonmsterling.com)
4 days ago |
matt_d
| jonmsterling.com
|
newest
2
Defeating the Training-Inference Mismatch via FP16 (arxiv.org)
5 days ago |
matt_d
| arxiv.org
|
newest
3
Opportunistically Parallel Lambda Calculus (acm.org)
6 days ago |
matt_d
| acm.org
|
best
1
Place Capability Graphs: A General-Purpose Model of Rust's Ownership & Borrowing (acm.org)
6 days ago |
matt_d
| acm.org
|
newest
2
Linear effects, exceptions, resources: Curry-Howard destructors correspondence (arxiv.org)
a week ago |
matt_d
| arxiv.org
|
newest
3
Making the Clang AST Leaner and Faster (cppalliance.org)
a week ago |
matt_d
| cppalliance.org
|
newest
3
Draw high dimensional tensors as a matrix of matrices (ezyang.com)
a week ago |
matt_d
| ezyang.com
|
best
1
Wafer-Scale AI Compute: A System Software Perspective (sigops.org)
a week ago |
matt_d
| sigops.org
|
frontpage
2
Towards Automated GPU Kernel Generation (simonguo.tech)
a week ago |
matt_d
| simonguo.tech
|
newest
1
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs (arxiv.org)
a week ago |
matt_d
| arxiv.org
|
newest
2
Triton Developer Conference 2025 Talks [video] (youtube.com)
a week ago |
matt_d
| youtube.com
|
newest
1
OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data (arxiv.org)
a week ago |
matt_d
| arxiv.org
|
newest
1
torchcomms: A modern PyTorch communications API (github.com/meta-pytorch)
2 weeks ago |
matt_d
| github.com
|
newest
2
Building an Open ABI and FFI for ML Systems (apache.org)
2 weeks ago |
matt_d
| apache.org
|
newest
1
Instruction Set Migration at Warehouse Scale (arxiv.org)
2 weeks ago |
matt_d
| arxiv.org
|
newest
2
Secure Parsing and Serializing with Separation Logic Applied to CBOR, CDDL, COSE [pdf] (microsoft.com)
2 weeks ago |
matt_d
| microsoft.com
|
newest
2
The Calculated Typer – Haskell Symposium (ICFP/SPLASH'25) [video] (youtube.com)
2 weeks ago |
matt_d
| youtube.com
|
newest
1
PickleBall: Secure Deserialization of Pickle-Based Machine Learning Models (github.com/columbia)
2 weeks ago |
matt_d
| github.com
|
newest
1
Clang Bytecode Interpreter Update (redhat.com)
2 weeks ago |
matt_d
| redhat.com
|
newest
1
Scaling Instruction-Selection Verification Against Authoritative ISA Semantics (doi.org)
2 weeks ago |
matt_d
| doi.org
|
newest
2
10 Myths of Scalable Parallel Languages Part 7: Minimalist Language Designs (chapel-lang.org)
2 weeks ago |
matt_d
| chapel-lang.org
|
newest
1
CPU Autoscaling with a Kernel of Truth (acm.org)
3 weeks ago |
matt_d
| acm.org
|
newest
1
SafeRace: WebGPU Memory Safety in the Presence of Data Races (acm.org)
3 weeks ago |
matt_d
| acm.org
|
newest
1
A guided tour through Oxidized OCaml (gavinleroy.com)
3 weeks ago |
matt_d
| gavinleroy.com
|
newest
1
Functional Networking for Millions of Docker Desktops (Experience Report) (acm.org)
3 weeks ago |
matt_d
| acm.org
|
newest
1
Does Linux Provide Performance Isolation for NVMe SSDs? Configuring cgroups [pdf] (atlarge-research.com)
3 weeks ago |
matt_d
| atlarge-research.com
|
newest
1
International Conference on Managed Programming Languages & Runtimes (MPLR) 2025 (acm.org)
3 weeks ago |
matt_d
| acm.org
|
newest
1
Collective Matrix Multiplication – JAX Pallas:Mosaic GPU (jax.dev)
3 weeks ago |
matt_d
| jax.dev
|
newest
1
StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs (arxiv.org)
3 weeks ago |
matt_d
| arxiv.org
|
newest
2
Can AI Co-Design Distributed Systems? Scaling from 1 GPU to 1k (harvard-edge.github.io)
3 weeks ago |
matt_d
| github.io
|
newest
1
Hybrid Architectures for Language Models: Systematic Analysis & Design Insights (arxiv.org)
3 weeks ago |
matt_d
| arxiv.org
|
newest
2
LLMc: Beating All Compression with LLMs (washington.edu)
3 weeks ago |
matt_d
| washington.edu
|
newest
1
All in on MatMul? Don’t Put All Your Tensors in One Basket! (sigarch.org)
4 weeks ago |
matt_d
| sigarch.org
|
frontpage
1
Muon Outperforms Adam in Tail-End Associative Memory Learning [video] (youtube.com)
4 weeks ago |
matt_d
| youtube.com
|
newest
2
Barbara Liskov Oral History [video] (youtube.com)
4 weeks ago |
matt_d
| youtube.com
|
newest
2
CMOS 2.0 – Redefining the Future of Scaling (arxiv.org)
4 weeks ago |
matt_d
| arxiv.org
|
newest
1
Fuss-Free Universe Hierarchies (jonmsterling.com)
4 weeks ago |
matt_d
| jonmsterling.com
|
newest
1
Pretraining Large Language Models with NVFP4 (arxiv.org)
4 weeks ago |
matt_d
| arxiv.org
|
newest
3
The Next Computing Revolution: Bringing Processing Inside Memory (computer.org)
4 weeks ago |
matt_d
| computer.org
|
newest
1
llvm-mos: Modern C/C++ on the Venerable 6502 | VCFMW 20 (2025) [video] (youtube.com)
4 weeks ago |
matt_d
| youtube.com
|
frontpage
1
Mercury: Unlocking Multi-GPU Optimization for LLMs via Remote Memory Scheduling [pdf] (storage.googleapis.com)
a month ago |
matt_d
| googleapis.com
|
newest
10
Optimizing a 6502 image decoder – part II: assembly (colino.net)
a month ago |
matt_d
| colino.net
|
best
1
Arm A-Profile Architecture developments 2025: Armv9.7-A (arm.com)
a month ago |
matt_d
| arm.com
|
newest
1
TypeDis: A Type System for Disentanglement [pdf] (nyu.edu)
a month ago |
matt_d
| nyu.edu
|
newest
1
From CPU Transparency to GPU Complexity – The Performance Engineering Frontier (harvard-edge.github.io)
a month ago |
matt_d
| github.io
|
newest
1
Quotient Polymorphism [pdf] (nott.ac.uk)
a month ago |
matt_d
| nott.ac.uk
|
newest
1
Labelled preorders and coercions: different approaches to multiple inheritance (jonmsterling.com)
a month ago |
matt_d
| jonmsterling.com
|
newest
3
GPU Mode Lecture 80: How FlashAttention 4 Works [video] (youtube.com)
a month ago |
matt_d
| youtube.com
|
frontpage
1
When You Have a Fuzzer, Everything Looks Like a Reachability Problem [pdf] (ic.ac.uk)
a month ago |
matt_d
| ic.ac.uk
|
newest
4
F3: The Open-Source Data File Format for the Future (doi.org)
a month ago |
matt_d
| doi.org
|
frontpage
2
Efficient LLM: Bandwidth, Compute, Synchronization, and Capacity are all you need (arxiv.org)
a month ago |
matt_d
| arxiv.org
|
frontpage
3
HieraSynth: A Parallel Framework for Complete Super-Optimization [pdf] (lsrcz.github.io)
a month ago |
matt_d
| github.io
|
newest
2
Fusion: An Analytics Object Store Optimized for Query Pushdown (doi.org)
a month ago |
matt_d
| doi.org
|
frontpage
2
Type Theory Forall – Philip Wadler – Type Classes, Monads, Logic, Future of PL [video] (youtube.com)
a month ago |
matt_d
| youtube.com
|
newest
4
3rd Largest Element: SIMD Edition (parallelprogrammer.substack.com)
a month ago |
matt_d
| substack.com
|
frontpage
2
FP64 Floating-Point Emulation in INT8 (arxiv.org)
a month ago |
matt_d
| arxiv.org
|
newest
1
A Very Early History of Algebraic Data Types (hillelwayne.com)
a month ago |
matt_d
| hillelwayne.com
|
newest
1
RISC-V Conditional Moves (corsix.org)
a month ago |
matt_d
| corsix.org
|
newest
3
Global Economic History: Cradle of Modernity [pdf] (upenn.edu)
a month ago |
matt_d
| upenn.edu
|
newest
2
"Is it time for a new proof assistant?" – Jon Sterling [video] (youtube.com)
a month ago |
matt_d
| youtube.com
|
frontpage
2
OpenSTA: Open-source static timing analysis for FPGAs (zeroasic.com)
a month ago |
matt_d
| zeroasic.com
|
frontpage
1
Arm SIMD Loops – C, ACLE intrinsics, inline assembly – Neon, SVE, SME (arm.com)
a month ago |
matt_d
| arm.com
|
newest
1
GraphMend: Code Transformations for Fixing Graph Breaks in PyTorch 2 (arxiv.org)
a month ago |
matt_d
| arxiv.org
|
newest
2
Weak Memory Model Formalisms: Introduction and Survey (arxiv.org)
a month ago |
matt_d
| arxiv.org
|
newest
2
Program Optimisations via Hylomorphisms for Extraction of Executable Code (dagstuhl.de)
a month ago |
matt_d
| dagstuhl.de
|
frontpage
8
Identity Types (bartoszmilewski.com)
a month ago |
matt_d
| bartoszmilewski.com
|
frontpage
1
Categorical Foundations for CuTe Layouts (colfax-intl.com)
a month ago |
matt_d
| colfax-intl.com
|
newest
6
Transforming recursion into iteration for LLVM loop optimizations (dspace.mit.edu)
a month ago |
matt_d
| mit.edu
|
frontpage
1
Unweaving Warp Specialization (rohany.github.io)
a month ago |
matt_d
| github.io
|
newest
1
The grind tactic in Lean 4 [video] (youtube.com)
a month ago |
matt_d
| youtube.com
|
newest
3
10 Myths of Scalable Parallel Languages Part 6: Performance of High-Level Langs (chapel-lang.org)
a month ago |
matt_d
| chapel-lang.org
|
newest
4
Reflection – C++'s decade-defining rocket engine (herbsutter.com)
a month ago |
matt_d
| herbsutter.com
|
newest
8
A Generalized Algebraic Theory of Directed Equality (jacobneu.phd)
a month ago |
matt_d
| jacobneu.phd
|
frontpage
2
Cppless: Single-Source and High-Performance Serverless Programming in C++ (acm.org)
a month ago |
matt_d
| acm.org
|
newest
32
Gluon: a GPU programming language based on the same compiler stack as Triton (github.com/triton-lang)
a month ago |
matt_d
| github.com
|
frontpage
1
Machine Scheduler in LLVM – Part I (myhsu.xyz)
a month ago |
matt_d
| myhsu.xyz
|
frontpage
12
Safepoints and Fil-C (fil-c.org)
a month ago |
matt_d
| fil-c.org
|
frontpage
2
Simon Peyton Jones: Pursuing a Trick a Long Way, Just to See Where It Goes [video] (youtube.com)
a month ago |
matt_d
| youtube.com
|
frontpage
3
Faux Type Theory: three minimalist OCaml simple proof checker implementations (github.com/andrejbauer)
a month ago |
matt_d
| github.com
|
newest
1
Inside vLLM: Anatomy of a High-Throughput LLM Inference System (vllm.ai)
a month ago |
matt_d
| vllm.ai
|
newest
1
simdjson Version 4.0.0 Released (github.com/simdjson)
a month ago |
matt_d
| github.com
|
newest
2
Introducing BackendBench: how well LLMs and humans can write PyTorch backends (github.com/meta-pytorch)
a month ago |
matt_d
| github.com
|
newest
2
Rethinking Analytical Processing in the GPU Era (arxiv.org)
a month ago |
matt_d
| arxiv.org
|
newest
2
EUV Lithography: History, Latest Results, Technology Roadmap [video] (youtube.com)
a month ago |
matt_d
| youtube.com
|
newest
3
Disaggregation: A New Architecture for Cloud Databases (muratbuffalo.blogspot.com)
a month ago |
matt_d
| blogspot.com
|
frontpage
1
GSoC 2025 – Byte Type: Supporting Raw Data Copies in the LLVM IR (llvm.org)
a month ago |
matt_d
| llvm.org
|
newest
1
The Future of Memory: Limits and Opportunities (arxiv.org)
a month ago |
matt_d
| arxiv.org
|
newest
2
Interposing on clone() system calls in-process, from Linux userspace (humprog.org)
a month ago |
matt_d
| humprog.org
|
newest
1
So you want to control flow in PyTorch 2 (ezyang.com)
2 months ago |
matt_d
| ezyang.com
|
newest
7
IRHash: Efficient Multi-Language Compiler Caching by IR-Level Hashing (usenix.org)
2 months ago |
matt_d
| usenix.org
|
frontpage
46
Evolving the OCaml Programming Language (2025) [pdf] (kcsrk.info)
2 months ago |
matt_d
| kcsrk.info
|
best
2
Why ML Needs a New Programming Language – Chris Lattner – Signals and Threads (signalsandthreads.com)
2 months ago |
matt_d
| signalsandthreads.com
|
newest
1
vLLM with torch.compile: Efficient LLM inference on PyTorch (vllm.ai)
2 months ago |
matt_d
| vllm.ai
|
newest
2
DaCe AD: Unifying High-Performance Automatic Differentiation for ML and SciComp (arxiv.org)
2 months ago |
matt_d
| arxiv.org
|
newest
7
Still Asking: How Good Are Query Optimizers, Really? [pdf] (vldb.org)
3 months ago |
matt_d
| vldb.org
|
frontpage
1
Sphinx: A Succinct Perfect Hash Index for x86 [pdf] (vldb.org)
3 months ago |
matt_d
| vldb.org
|
newest
3
DialEgg: Dialect-Agnostic MLIR Optimizer Using Equality Saturation with Egglog [video] (youtube.com)
3 months ago |
matt_d
| youtube.com
|
newest
5
Dependent Types: Universes, or types of types (jonmsterling.com)
3 months ago |
matt_d
| jonmsterling.com
|
frontpage