1
GEMM with Thread Block Clusters on Nvidia Blackwell GPUs (colfax-intl.com)
3 days ago | ashvardanian | colfax-intl.com | newest
2
Visualizing Foursquare Places with ClickHouse (clickhouse.com)
4 days ago | ashvardanian | clickhouse.com | frontpage
1
Meta Perception Encoder (meta.com)
a week ago | ashvardanian | meta.com | newest
32
Load-Store Conflicts (zeux.io)
a week ago | ashvardanian | zeux.io | best
3
Vibe Coding Is Rapidly Reshaping the Software Developer Profession (thenewstack.io)
2 weeks ago | ashvardanian | thenewstack.io | newest
52
CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU (github.com/tracel-ai)
2 weeks ago | ashvardanian | github.com | best
2
Burn v0.17: Deep Learning in Rust gets new backends and improved kernel fusion (github.com/tracel-ai)
2 weeks ago | ashvardanian | github.com | newest
1
Cutlass Tutorial: Writing GEMM Kernels Using Tensor Memory for Blackwell GPUs (colfax-intl.com)
3 weeks ago | ashvardanian | colfax-intl.com | newest
84
Less Slow C++ (github.com/ashvardanian)
3 weeks ago | ashvardanian | github.com | best
1
Calling CUDA in 3000 Words (ashvardanian.com)
a month ago | ashvardanian | ashvardanian.com | newest
2
Redpanda debuts new Agentic AI platform with MCP (redpanda.com)
a month ago | ashvardanian | redpanda.com | newest
3
The Tragic Case of Intel AI (geohot.github.io)
a month ago | ashvardanian | github.io | frontpage
1
Arm Introduces New Developer Initiative to Expedite Migration on Cloud Platforms (arm.com)
a month ago | ashvardanian | arm.com | newest
4
Show HN: Less Slow C++: Revisiting Performance Tricks for C/C++/CUDA/Asm/PTX (github.com/ashvardanian)
a month ago | ashvardanian | github.com | newest
3
Isomorphic Labs announces $600M external investment round (isomorphiclabs.com)
a month ago | ashvardanian | isomorphiclabs.com | newest
2
Mixing ARM NEON with SVE code for fun and profit (lemire.me)
a month ago | ashvardanian | lemire.me | newest
4
CMake 4.0.0 Released (kitware.com)
a month ago | ashvardanian | kitware.com | frontpage
1
Unlock High-Precision Keyword Search with pinecone-sparse-English-v0 (pinecone.io)
a month ago | ashvardanian | pinecone.io | newest
1
Democratizing AI Compute, Part 7: What about Triton and Python EDSLs? (modular.com)
a month ago | ashvardanian | modular.com | newest
42
Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework (github.com/ai-dynamo)
a month ago | ashvardanian | github.com | best
1
Git 2.49 Released with Faster Packing, Rust Foreign Language Interface (phoronix.com)
a month ago | ashvardanian | phoronix.com | newest
2
Cerebras Announces Six New AI Datacenters Across North America and Europe (cerebras.ai)
2 months ago | ashvardanian | cerebras.ai | newest
1
`std::generator`: Standard Library Coroutine Support (microsoft.com)
3 months ago | ashvardanian | microsoft.com | newest
2
Deep Dive into Matrix Optimization on AMD GPUs (seb-v.github.io)
3 months ago | ashvardanian | github.io | newest
3
The Longest Nvidia PTX Instruction (ashvardanian.com)
3 months ago | ashvardanian | ashvardanian.com | frontpage
2
CPU Ports and Latency Hiding on x86 (ashvardanian.com)
4 months ago | ashvardanian | ashvardanian.com | newest
2
Tenstorrent Wormhole Series Part 7: Bits of the MatMul (corsix.org)
4 months ago | ashvardanian | corsix.org | newest
2
GitHub Models (github.com/marketplace)
4 months ago | ashvardanian | github.com | newest
1
Camera Calibration: What to perfect before touching the code (nikolasent.github.io)
4 months ago | ashvardanian | github.io | newest
10
Nvidia Statement on the Biden Administration's Misguided 'AI Diffusion' Rule (nvidia.com)
4 months ago | ashvardanian | nvidia.com | frontpage
3
Biden unveils last round of AI chip curbs aimed at China, Russia (cnn.com)
4 months ago | ashvardanian | cnn.com | newest
1
Parsing JSON in C and C++: Singleton Tax (ashvardanian.com)
4 months ago | ashvardanian | ashvardanian.com | newest
14
Volodymyr Zelenskyy on Lex Fridman Podcast #456 (youtube.com)
4 months ago | ashvardanian | youtube.com | frontpage
2
Unifying Generative and Dense Retrieval for Sequential Recommendation (arxiv.org)
5 months ago | ashvardanian | arxiv.org | frontpage
2
Efficient In-Place UTF-16 Unicode Correction with ARM Neon (lemire.me)
5 months ago | ashvardanian | lemire.me | newest
4
Scaling data collection for training software engineering agents (nebius.com)
5 months ago | ashvardanian | nebius.com | newest
5
Linux on GPU (dmitryduka.github.io)
5 months ago | ashvardanian | github.io | newest
1
Amazon's AI Self Sufficiency – Trainium2 Architecture and Networking (semianalysis.com)
5 months ago | ashvardanian | semianalysis.com | newest
2
Outperforming cuBLAS on H100: A Worklog (cudaforfun.substack.com)
6 months ago | ashvardanian | substack.com | newest
2
Sign operation using VFIXUPIMM in AVX-512 (wunkolo.github.io)
6 months ago | ashvardanian | github.io | newest
2
Cerebras 748x faster than Frontier supercomputer in molecular simulation (networkworld.com)
6 months ago | ashvardanian | networkworld.com | newest
1
The Next 31 Years of Developing Unum (ashvardanian.com)
6 months ago | ashvardanian | ashvardanian.com | newest
2
Understanding SIMD: Infinite Complexity of Trivial Problems (modular.com)
6 months ago | ashvardanian | modular.com | newest
1
SIMD Library for Evaluating Elementary Functions, Vectorized Libm and DFT (github.com/shibatch)
6 months ago | ashvardanian | github.com | newest
1
The Microsoft Azure HBv5 and AMD MI300C (servethehome.com)
6 months ago | ashvardanian | servethehome.com | newest
1
Azure HBv5: 352 AMD Genoa Cores with over 400 GB of HBM3 (microsoft.com)
6 months ago | ashvardanian | microsoft.com | newest
11
Waiting for many things at once with io_uring (mazzo.li)
6 months ago | ashvardanian | mazzo.li | best
2
Is Prefix of String in Table? A Journey into SIMD String Processing (trent.me)
6 months ago | ashvardanian | trent.me | newest
1
iMac Announcement – October 28 [video] (youtube.com)
7 months ago | ashvardanian | youtube.com | newest
2
Create Index Externally: Offloading Pgvector Indexing from Postgres (lantern.dev)
7 months ago | ashvardanian | lantern.dev | newest
2
Tenstorrent Wormhole Series Part 6: Vector instruction set (corsix.org)
7 months ago | ashvardanian | corsix.org | newest
4
Over-Engineering 5x Faster Set Intersections in SVE2, AVX-512, & Neon (ashvardanian.com)
8 months ago | ashvardanian | ashvardanian.com | newest
1
Grounding AI in reality with a little help from Data Commons (research.google)
8 months ago | ashvardanian | research.google | newest
2
DataGemma: Using real-world data to address AI hallucinations (blog.google)
8 months ago | ashvardanian | blog.google | newest
1
Holy Macroni! A recipe for progressive language enhancement (2023) (trailofbits.com)
8 months ago | ashvardanian | trailofbits.com | newest
4
Chai-1: Decoding the molecular interactions of life (chaidiscovery.com)
8 months ago | ashvardanian | chaidiscovery.com | newest
1
35% Discount on Keyword Arguments in Python (ashvardanian.com)
8 months ago | ashvardanian | ashvardanian.com | newest
4
Cerebras reaches 1800 tokens/s for 8B Llama3.1 (forbes.com/sites/craigsmith)
9 months ago | ashvardanian | forbes.com | frontpage
1
The Painful Pitfalls of C++ STL Strings (ashvardanian.com)
9 months ago | ashvardanian | ashvardanian.com | newest
1
Stack-PR: an open source tool for managing stacked PRs on GitHub (modular.com)
10 months ago | ashvardanian | modular.com | newest
6
Benchmarking ARM Processors: Graviton 4, Graviton 3 and Apple M2 (lemire.me)
10 months ago | ashvardanian | lemire.me | frontpage
2
FAIR's Chameleon Early Fusion Code (github.com/facebookresearch)
11 months ago | ashvardanian | github.com | newest
1
We Found Something Testing the 5th Gen Intel Xeon Scalable 1P CPUs (servethehome.com)
12 months ago | ashvardanian | servethehome.com | newest
2
Gemma-10M Technical Overview (medium.com/akshgarg_36829)
a year ago | ashvardanian | medium.com | newest
1
How fast can you construct a small list of strings in C for Python? (lemire.me)
a year ago | ashvardanian | lemire.me | newest
2
Meilisearch 1.8 (meilisearch.com)
a year ago | ashvardanian | meilisearch.com | newest
1
Visual Language Models on Nvidia Hardware with VILA (nvidia.com)
a year ago | ashvardanian | nvidia.com | newest
5
Show HN: Swift-powered AI apps on iOS: Real-time multimodal semantic search (github.com/ashvardanian)
a year ago | ashvardanian | github.com | newest
2
Liburing 2.6 Released (github.com/axboe)
a year ago | ashvardanian | github.com | frontpage
1
MinSH: Near-linear time global string alignment in a few lines of Python (github.com/pesho-ivanov)
a year ago | ashvardanian | github.com | newest
1
FM-Indexes and Backwards Search (2011) (alexbowe.com)
a year ago | ashvardanian | alexbowe.com | newest
56
Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B (github.com/ollama)
a year ago | ashvardanian | github.com | best
1
GCC 14.1 Compiler Aiming for Release Around 7 May with C23 and AVX10.1 (phoronix.com)
a year ago | ashvardanian | phoronix.com | frontpage
2
Writing a WASM Runtime in Rust (skanehira.github.io)
a year ago | ashvardanian | github.io | newest
1
Multimodal Embeddings for JavaScript, Swift, and Python (github.com/unum-cloud)
a year ago | ashvardanian | github.com | newest
1
PyTorch internals: ezyang's blog (2019) (ezyang.com)
a year ago | ashvardanian | ezyang.com | newest
4
KraftCloud (unikraft.io)
a year ago | ashvardanian | unikraft.io | newest
40
KraftCloud (github.com/unikraft)
a year ago | ashvardanian | github.com | best
2
Show HN: UForm v2 Featuring Multimodal Matryoshka, Multimodal DPO, and ONNX (github.com/unum-cloud)
a year ago | ashvardanian | github.com | newest
1
VideoPrism: A foundational visual encoder for video understanding (research.google)
a year ago | ashvardanian | research.google | newest
7
Technical Deep Dive: AI Accent Localization for Call Centers (krisp.ai)
a year ago | ashvardanian | krisp.ai | newest
2
Tenstorrent Kernel by George Hotz (github.com/geohot)
a year ago | ashvardanian | github.com | frontpage
3
LLM Inference Speed of Light (zeux.io)
a year ago | ashvardanian | zeux.io | frontpage
4
C++ meets TypeScript: bidirectional type interoperability in the Cheerp compiler (leaningtech.com)
a year ago | ashvardanian | leaningtech.com | newest
6
NumPy vs. BLAS: Losing 90% of Throughput (ashvardanian.com)
a year ago | ashvardanian | ashvardanian.com | newest
11
Max Developer Edition Preview (modular.com)
a year ago | ashvardanian | modular.com | frontpage
1
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models (arxiv.org)
a year ago | ashvardanian | arxiv.org | newest
1
Intel Announces New Edge Platform for Scaling AI Applications (intc.com)
a year ago | ashvardanian | intc.com | newest
1
Lightbug: Simple and fast HTTP framework for Mojo (github.com/saviorand)
a year ago | ashvardanian | github.com | newest
2
USearch SQLite Extensions for Vector and Text Search (github.com/unum-cloud)
a year ago | ashvardanian | github.com | newest
0
VideoPrism: A Foundational Visual Encoder for Video Understanding (arxiv.org)
a year ago | ashvardanian | arxiv.org | newest
35
Measuring energy usage: regular code vs. SIMD code (lemire.me)
a year ago | ashvardanian | lemire.me | best
2
Borrow Checker, Lifetimes and Destructor Arguments in C++ (a10nw01f.github.io)
a year ago | ashvardanian | github.io | newest
1
Python Is Portable (2021) (ahgamut.github.io)
a year ago | ashvardanian | github.io | newest
2
Galileo v0.1 cross-platform map rendering engine released (maximkaaa.github.io)
a year ago | ashvardanian | github.io | frontpage
0
C++ Package Managers: The Ultimate Roundup (moderncppdevops.com)
a year ago | ashvardanian | moderncppdevops.com | newest
2
C++ Safety, Revisited (accu.org)
a year ago | ashvardanian | accu.org | newest
1
Development-Cycle in Cargo: 1.77 (rust-lang.org)
a year ago | ashvardanian | rust-lang.org | newest
6
What's Wrong with C++ Strings? (ashvardanian.com)
a year ago | ashvardanian | ashvardanian.com | newest
7
Show HN: StringZilla v3 with C++, Rust, and Swift bindings, and AVX-512 and NEON (github.com/ashvardanian)
a year ago | ashvardanian | github.com | newest