11
3
Can LLMs play real-time games like Super Mario (other than Pokemon Red)? (twitter.com/haoailab)
3
Sliding Tile Attention: A New Method That Speeds Up HunyuanVideo's Outputs by 3x (reddit.com)
13
Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)
8
More Efficient Chain-of-Thought Reasoning Through Certainty Probing (huggingface.co)
8
AI Space Escape: Playing Games While Evaluating LLM Reasoning (lmgame.org)
3
Efficient LLM Scheduling by Learning to Rank (hao-ai-lab.github.io)
36
FastVideo: a lightweight framework for accelerating large video diffusion models (github.com/hao-ai-lab)
1
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving (hao-ai-lab.github.io)
128
Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)
6
Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)
5
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (lmsys.org)
1
Important and *MUST-KNOW* techniques for a 2023 LLM serving system (twitter.com/haozhangml)
6