103
FP8 is ~100 TFLOPS faster when the kernel name has "cutlass" in it (twitter.com/cis_female)
18 hours ago | limoce | twitter.com | best
5
Polaris: A Post-training recipe for scaling RL on Advanced Reasoning models (hkunlp.github.io)
2 days ago | limoce | github.io | newest
14
Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (royeisen.github.io)
5 days ago | limoce | github.io | frontpage
1
Neutrino: Probing-Based eBPF-Like GPU Kernel Profiling (github.com/open-neutrino)
a week ago | limoce | github.com | newest
3
Machine Learning Conferences Should Establish "Refutations and Critiques" Track (arxiv.org)
2 weeks ago | limoce | arxiv.org | newest
1
SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines (supergpqa.github.io)
4 months ago | limoce | github.io | newest
4
SepLLM: Accelerate LLMs by Compressing One Segment into One Separator (sepllm.github.io)
4 months ago | limoce | github.io | frontpage
10
Step-Video-T2V: The Practice, Challenges, and Future of Video Foundation Model (arxiv.org)
5 months ago | limoce | arxiv.org | frontpage
1
Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset (github.com/unakar)
5 months ago | limoce | github.com | newest
1
Libnginx: Nginx as a Shared Library (github.com/wkgcass)
5 months ago | limoce | github.com | newest
1
DeepSeek-VL2: MoE Vision-Language Models for Advanced Multimodal Understanding [pdf] (github.com/deepseek-ai)
7 months ago | limoce | github.com | newest
1
Fast vectorizable algorithms of binary searching for floating point numbers (github.com/fabiocannizzo)
8 months ago | limoce | github.com | newest
26
New OpenAI Feature: Predicted Outputs (simonwillison.net)
8 months ago | limoce | simonwillison.net | frontpage
2
Collaborative Filtering Is Wrong and Here Is Why (springer.com)
9 months ago | limoce | springer.com | newest
1
REST: A Plug-and-Play Method for Accelerating LLM Without Additional Training (sites.google.com)
9 months ago | limoce | google.com | newest
1
Smoke 'em if you got 'em: Hacker gains root access using cigarette lighter (tomshardware.com)
9 months ago | limoce | tomshardware.com | newest
1
O1 Replication Journey: A Strategic Progress Report (github.com/gair-nlp)
9 months ago | limoce | github.com | newest
1
Failures of Gradient-Based Deep Learning (2017) [pdf] (mlr.press)
11 months ago | limoce | mlr.press | newest
1
Qwen2-VL (huggingface.co)
11 months ago | limoce | huggingface.co | newest
64
Qwen2-Math (qwenlm.github.io)
11 months ago | limoce | github.io | best
56
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention (pytorch.org)
11 months ago | limoce | pytorch.org | best
1
MiniCPM-v2.6: GPT-4V Level MLLM for Single/Multi Image and Video on Your Phone (github.com/openbmb)
11 months ago | limoce | github.com | newest
1
MindSearch: LLM-Based Web Search Engine Similar to Perplexity.ai and SearchGPT (github.com/internlm)
11 months ago | limoce | github.com | newest
1
Header-only and powerful C++20 static reflection library in 99 lines (github.com/ubpa)
a year ago | limoce | github.com | newest
2
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (arxiv.org)
a year ago | limoce | arxiv.org | frontpage
1
PowerInfer-2: Fast Large Language Model Inference on a Smartphone (arxiv.org)
a year ago | limoce | arxiv.org | newest
2
Large-scale photonic chiplet Taichi empowers 160TOPS/W AI (science.org)
a year ago | limoce | science.org | newest
1
Asterinas: OS kernel written in Rust and providing Linux-compatible ABI (github.com/asterinas)
a year ago | limoce | github.com | newest
1
Mq-deadline scalability improvements (with more than 100% improvement) (kernel.org)
a year ago | limoce | kernel.org | newest
2
Researchers Create First Functional Semiconductor Made from Graphene (technologynetworks.com)
a year ago | limoce | technologynetworks.com | newest
1
Wayland Enjoyed Many Successes in 2023 (phoronix.com)
a year ago | limoce | phoronix.com | newest
15
Improving our safety with a physical quantities and units library (open-std.org)
a year ago | limoce | open-std.org | best
1
Nesting chinstrap penguins sleep by seconds-long microsleeps (science.org)
a year ago | limoce | science.org | newest
2
PowerInfer: High-Speed Large Language Model Serving on Consumer-Grade GPUs (github.com/sjtu-ipads)
a year ago | limoce | github.com | newest
1
Zpoline: System Call Hook for Linux (github.com/yasukata)
a year ago | limoce | github.com | newest
1
Randomized Single-Source Shortest Path Algorithm on Undirected Real-Weighted Graphs (arxiv.org)
2 years ago | limoce | arxiv.org | newest
1
ChatGPT powered Rust proc macro that generates code at compile-time (github.com/retrage)
2 years ago | limoce | github.com | newest
3
Memcpy is faster than memset on Intel i7 12700 with glibc 2.36 (gist.github.com)
2 years ago | limoce | github.com | frontpage
1
Year-in-search-trends: Visualization of search interest over time (github.com/joweich)
2 years ago | limoce | github.com | newest
1
Make call_rcu() lazy to save power (kernel.org)
2 years ago | limoce | kernel.org | newest
2
IOMMUFD (kernel.org)
2 years ago | limoce | kernel.org | newest
4
Chinese Chipmaker Loongson Readies 3A6000 to Tackle Zen 3 and Tiger Lake (wccftech.com)
2 years ago | limoce | wccftech.com | frontpage
3
Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays (arxiv.org)
2 years ago | limoce | arxiv.org | frontpage
1
Slirp Is Dead, Long Live Slirp: a New Approach to User-Mode Networking
2 years ago | limoce | sched.com | newest
1
eRPC: A fast remote procedure call library for datacenters
2 years ago | limoce | erpc.io | newest
79
Poly-time algorithm for deciding Hilbert Nullstellensatz. A proof of P=NP
2 years ago | limoce | arxiv.org | frontpage
2
How hard is it to open a file (2016)
2 years ago | limoce | criu.org | newest
3
Kernel TLS 1.3 Rx improvements in Linux 5.20
2 years ago | limoce | kernel.org | newest
2
Resin: Holistic Service for Dealing with Memory Leaks in Production Cloud Infra
2 years ago | limoce | microsoft.com | newest
2
YMTC launched 232-layer flash memory chip: performance improved by 50%
2 years ago | limoce | inf.news | frontpage