1
petite-vllm Part 2: KV Cache & Paged Attention (kristenmcintosh.dev)
1
Analyzing Bytes: Pre-Disassembly Static Binary Analysis (research.google)
1
Terminal-Bench Challenges: long-horizon, token-intensive, single-task benchmarks (tbench.ai)
1
Dana Scott: Lambda Calculus, Forcing and the Foundations of Math: #14 aboutlogic [video] (youtube.com)
2
SE Radio 725: Danny Yang and Sam Goldman on the Pyrefly Type Checker (se-radio.net)
9
Integer Quantization: Deep Dive (hello-fri-end.github.io)
3
M* (M-Star): A Modular, Extensible, Serving System for Multimodal Models (stanford.edu)
3
From Minutes to Seconds: LLM-Guided Autotuning for Helion Kernels (pytorch.org)
1
Zigzag Decoding with AVX-512 (zeux.io)
1
GenDB – LLM-Powered Generative Query Engine (solidlao.github.io)
20
AI Compute Extensions (ACE) Specification (x86ecosystem.org)
1
Loop Unrolling in the ML Era (hiraditya.github.io)
1
System call instrumentation on Linux/x86‑64 using memory‑indirect calls, part I (humprog.org)
1
Fearless Concurrency on the GPU (arxiv.org)
1
Using Task Graph Caching to Accelerate TVM Code Generation (acm.org)
1
Google's Training Supercomputers from TPU v2 to Ironwood: Five Generations (arxiv.org)
6
The Return of Rigorous Full-System Timing Simulation (sigarch.org)
4
Language integrated LLMs as an OCaml function (recoil.org)
2
Using OxCaml to implement type-safe reference counting between OCaml and Python (janestreet.com)
2
Scalable GPU Acceleration of Scalar Functions in Analytical Databases (microsoft.com)
1
Compiling Strassen-Like Matrix Multiplication Algorithms to Fast CUDA Kernels (acm.org)
2
Programming Language Design and Implementation (PLDI) 2026 Live Streams (sigplan.org)
1
Puzzling Success of Overparameterization: Lottery Tickets or Escape Dimensions? (epfl.ch)
2
One More Type in the Tiny Type Theory (jcreedcmu.github.io)
3
A Galois Field Arithmetic Primer (tomverbeure.github.io)
1
An O(x)Caml book that runs (kcsrk.info)
4
Type Theory Forall #62 – Dependent Haskell – Vladislav Zavialov [video] (youtube.com)
4
Trip report: June 2026 ISO C++ standards meeting (Brno, Czechia) (herbsutter.com)
1
UnpredictaBench: A Benchmark for Evaluating Distributional Randomness in LLMs (arxiv.org)
1
Linear Algebra Kernels for the Age of Research (gpumode.com)
2
Latent learning: episodic memory complements parametric learning (openreview.net)
1
NEURA: A Unified and Retargetable Compilation Framework for CGRAs (acm.org)
2
System Call Stack Alignment (humprog.org)
1
Making FlashAttention-4 faster for inference (modal.com)
1
Precision Matters in Block Scales (constantinides.net)
2
Agents' Last Exam (arxiv.org)
2
Does the Harness Matter? Lessons from Ale-Claw on Agents' Last Exam (agents-last-exam.org)
1
Demystifying NVSHMEM: System-Level: Symmetric Memory, Device-Initiated Ops (arxiv.org)
1
Enumerating Ill-Typed Programs for Testing Type Analyzers (acm.org)
1
Agentic Memory Management for GPU Code Generation (ucbskyadrs.github.io)
1
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code? (uccl-project.github.io)
1
Frontier: A Discrete-Event Simulator for Modern LLM Serving (github.com/netx-lab)
2
Piper: A Programmable Distributed Training System (washington.edu)
1
Piper: A Programmable Distributed Training System (arxiv.org)
1
Radix Top-K: finding the top-k elements in an array without sorting (veitner.bearblog.dev)
1
A Case for a Simulation-Driven Exploration of Distributed GenAI Platforms (acm.org)
1
Defeat the Heap: Zero-Copy Data Movement in AXI4MLIR (arxiv.org)
2
Breaking the Ice: Analyzing Cold Start Latency in vLLM (arxiv.org)
2
An Empirical Comparison of General Context-Free Parsers (arxiv.org)
1
RFC: Programming Languages Course Reboot, 2026 – Shriram Krishnamurthi (docs.google.com)
1
CodegenBench: Can LLMs Write Efficient Code Across Architectures? (arxiv.org)
1
ACM SIGPLAN Programming Language Design and Implementation (PLDI) 2026 (acm.org)
2
Human Judgment as a Specification (brownplt.org)
3
OOBdump: Relocation Oriented Programming: Arbitrary code execution in objdump -g (calif.io)
3
Inference: Turning Electricity into Intelligence – Stanford CS336 – Dan Fu [video] (youtube.com)
5
FP8 Is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail (arxiv.org)
4
Modular Arithmetic Challenge (terrytao.wordpress.com)
2
Co-Creator of Haskell: Functional Prog., Thinking in Types, Useless Languages [video] (youtube.com)
1
Types for more than memory safety in OxCaml – Stephen Dolan – VeTSS 2026 [video] (youtube.com)
112
The 29th International Obfuscated C Code Contest (IOCCC) 2025 Winners (ioccc.org)
3
Tensor Shapes in Pyrefly – Avik Chaudhuri – PyCon US 2026 Typing Summit [video] (youtube.com)
1
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution (benchevolver.github.io)
2
JITDomain: Instruction-level JIT code isolation (sciencedirect.com)
2
Serving Transformers: Lessons from the Trenches – Stanford CS25 Transformers [video] (youtube.com)
1
Constrained Adaptive Rejection Sampling (arxiv.org)
2
Training an Agentic Router for Optimal Cost-Performance on SWE Tasks (appliedcompute.com)
1
Diagramming Program Values by Spatial Refinement (brownplt.org)
1
Agent Arena: Causal Evaluation of Agents in the Real World (arena.ai)
1
Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures (arxiv.org)
1
Recent improvements to the type checker – Swift Compiler (swift.org)
1
Semantic Reification: A New Paradigm for Random Program Generation (sigplan.org)
1
GPU Forecasters: Language Models as Selective Surrogates for Kernel Optimization (arxiv.org)
1
Type-Error Ablation and AI Coding Agents (arxiv.org)
1
Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon (vllm.ai)
1
Directionality in Low Precision (constantinides.net)
1
O-POPE: High-Frequency Pipelined Outer Product based GEMM acceleration (arxiv.org)
2
VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Agents (nvidia.com)
51
Strace-ui, Bonsai_term, and the TUI renaissance (janestreet.com)
2
Why Larger Models Learn More: Capacity, Interference, Rare-Task Retention (arxiv.org)
2
PyTorch's playbook for AI coding, as of May 2026 (pytorch.org)
2
When does fragmentation occur in the CUDA caching allocator? (pytorch.org)
2
Evaluating Apple Silicon for Data Processing [pdf] (tum.de)
2
PassNet: Scaling Large Language Models for Graph Compiler Pass Generation (arxiv.org)
3
Do GPUs Need New Tabular File Formats? (arxiv.org)
1
Polyhedral Compilation in MLIR (sajidzubair.substack.com)
1
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs (arxiv.org)
2
Three Trends from MLSys 2026 (modular.com)
3
What are the real problems of continual learning? (infinitefaculty.substack.com)
1
What "Memory Compiler" Actually Means: From Bitcells to GDS Tiling (thecloudlet.github.io)
5
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-Offs, and Perf (arxiv.org)
1
MIT EECS/CSAIL Agentic Coding in Practice Seminar Series (csail.mit.edu)
1
The Load-Balance Problem Behind Hybrid Parallelism (hecate0821.github.io)
1
Delayed Tensor Parallelism for Faster Transformer Inference (kog.ai)
1
Continuous Diffusion Models Can Obey Formal Syntax (arxiv.org)
1
HartBreaker: Deterministic Fuzzing of Multi-Hart RISC-V CPUs (ethz.ch)
4
Tuning LLVM's SLP Vectorizer Cost Model (kaving.me)
1
A Friendly Tour of Substructural, Uniqueness, Ownership, Capabilities and more! (federicobruzzone.github.io)
1