Articles by limoce
3

Reduce Complexity of Matmul to O(N^2logN) by Asymptotically Optimal Quantum Algo (arxiv.org)

15

Learning from context is harder than we thought (tencent.com)

1

Residual Context Diffusion (yuezhouhu.github.io)

2

Step 3.5 Flash (stepfun.com)

1

LongCat-Flash-Lite 100B A3B Technical Report [pdf] (huggingface.co)

1

Jinja.cpp: A single-header C++11 Jinja2 template engine for LLM chat templates (github.com/wangzhaode)

1

Mini-SGLang: A lightweight yet high-performance inference framework for LLM (github.com/sgl-project)

1

Google drops Gemini 3 Pro image preview (reddit.com)

1

DGX Spark may have only half the performance claimed (reddit.com)

2

You Can Cool Chips with Lasers (ieee.org)

2

Fully Integrated 2D Flash Chip Unveiled (bioengineer.org)

7

Linux 6.18 UDP receive performance improved by 47%, under DDoS (kernel.org)

1

OpenAI 2025 ICPC Submissions (github.com/openai)

2

Unveiling Silicon Art: Dieshots of Microchip Masterpieces (dieshot.com)

1

AetherCode: Evaluating LLMs' Ability to Win in Premier Programming Competitions (arxiv.org)

15

Operation Costs in CPU Clock Cycles (2016) (ithare.com)

1

Principles and Methodologies for Serial Performance Optimization (usenix.org)

1

The Koala Benchmarks for the Shell (kben.sh)

1

SmallThinker: A Family of Efficient LLMs Natively Trained for Local Deployment (arxiv.org)

1

Step3 Technical Report [pdf] (github.com/stepfun-ai)

103

FP8 is ~100 tflops faster when the kernel name has "cutlass" in it (twitter.com/cis_female)

5

Polaris: A Post-training recipe for scaling RL on Advanced Reasoning models (hkunlp.github.io)

14

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)

1

Neutrino: Probing-Based eBPF-Like GPU Kernel Profiling (github.com/open-neutrino)

3

Machine Learning Conferences Should Establish "Refutations and Critiques" Track (arxiv.org)

1

SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines (supergpqa.github.io)

4

SepLLM: Accelerate LLMs by Compressing One Segment into One Separator (sepllm.github.io)

10

Step-Video-T2V: The Practice, Challenges, and Future of Video Foundation Model (arxiv.org)

1

Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset (github.com/unakar)

1

Libnginx: Nginx as a Shared Library (github.com/wkgcass)

1

DeepSeek-VL2: MoE Vision-Language Models for Advanced Multimodal Understanding [pdf] (github.com/deepseek-ai)

1

Fast vectorizable algorithms of binary searching for floating point numbers (github.com/fabiocannizzo)

26

New OpenAI Feature: Predicted Outputs (simonwillison.net)

2

Collaborative Filtering Is Wrong and Here Is Why (springer.com)

1

REST: A Plug-and-Play Method for Accelerating LLM Without Additional Training (sites.google.com)

1

Smoke 'em if you got 'em: Hacker gains root access using cigarette lighter (tomshardware.com)

1

O1 Replication Journey: A Strategic Progress Report (github.com/gair-nlp)

1

Failures of Gradient-Based Deep Learning (2017) [pdf] (mlr.press)

1

Qwen2-VL (huggingface.co)

64

Qwen2-Math (qwenlm.github.io)

56

FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention (pytorch.org)

1

MiniCPM-v2.6: GPT-4V Level MLLM for Single/Multi Image and Video on Your Phone (github.com/openbmb)

1

MindSearch: LLM-Based Web Search Engine Similar to Perplexity.ai and SearchGPT (github.com/internlm)

1

Header-only and powerful C++20 static reflection library in 99 lines (github.com/ubpa)

2

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (arxiv.org)

1

PowerInfer-2: Fast Large Language Model Inference on a Smartphone (arxiv.org)

2

Large-scale photonic chiplet Taichi empowers 160 TOPS/W AI (science.org)

1

Asterinas: OS kernel written in Rust and providing Linux-compatible ABI (github.com/asterinas)

1

mq-deadline scalability improvements (with more than 100% improvement) (kernel.org)

2

Researchers Create First Functional Semiconductor Made from Graphene (technologynetworks.com)

1

Wayland Enjoyed Many Successes in 2023 (phoronix.com)

15

Improving our safety with a physical quantities and units library (open-std.org)

1

Nesting chinstrap penguins sleep by seconds-long microsleeps (science.org)

2

PowerInfer: High-Speed Large Language Model Serving on Consumer-Grade GPUs (github.com/sjtu-ipads)

1

Zpoline: System Call Hook for Linux (github.com/yasukata)

1

Randomized Single-Source Shortest Path Algo. On Undirected Real-Weighted Graphs (arxiv.org)

1

ChatGPT powered Rust proc macro that generates code at compile-time (github.com/retrage)

3

memcpy is faster than memset on Intel i7-12700 with glibc 2.36 (gist.github.com)

1

Year-in-search-trends: Visualization of search interest over time (github.com/joweich)

1

Make call_rcu() lazy to save power (kernel.org)

2

IOMMUFD (kernel.org)

4

Chinese Chipmaker Loongson Readies 3A6000 to Tackle Zen 3 and Tiger Lake (wccftech.com)

3

Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays (arxiv.org)

1

Slirp Is Dead, Long Live Slirp: a New Approach to User-Mode Networking

1

eRPC: A fast remote procedure call library for datacenters

79

Poly-time algorithm for deciding Hilbert Nullstellensatz. A proof of P=NP

2

How hard is it to open a file (2016)

3

Kernel TLS 1.3 Rx improvements in Linux 5.20

2

Resin: Holistic Service for Dealing with Memory Leaks in Production Cloud Infra

2

YMTC launched 232-layer flash memory chip: performance improved by 50%