zhisbug - Hacker News

11

Create a 5s 1080p Video in 4.5s with FastVideo on a Single GPU (fastvideo.org)

2 hours ago zhisbug fastvideo.org

3

Can LLMs play real-time games like Super Mario (other than Pokémon Red)? (twitter.com/haoailab)

a year ago zhisbug twitter.com

3

Sliding Tile Attention: A New Method That Speeds Up HunyuanVideo's Outputs by 3x (reddit.com)

a year ago zhisbug reddit.com

13

Fast Video Generation with Sliding Tile Attention (hao-ai-lab.github.io)

a year ago zhisbug github.io

8

More Efficient Chain-of-Thought Reasoning Through Certainty Probing (huggingface.co)

a year ago zhisbug huggingface.co

8

AI Space Escape: Playing Games While Evaluating LLM Reasoning (lmgame.org)

a year ago zhisbug lmgame.org

3

Efficient LLM Scheduling by Learning to Rank (hao-ai-lab.github.io)

a year ago zhisbug github.io

36

FastVideo: a lightweight framework for accelerating large video diffusion models (github.com/hao-ai-lab)

a year ago zhisbug github.com

1

MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving (hao-ai-lab.github.io)

a year ago zhisbug github.io

128

Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)

a year ago zhisbug github.io

6

Throughput Is Not All You Need: Maxing Goodput in LLM Serving via Disaggregation (hao-ai-lab.github.io)

a year ago zhisbug github.io

5

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (lmsys.org)

2 years ago zhisbug lmsys.org

1

Important and MUST-KNOW techniques for a 2023 LLM serving system (twitter.com/haozhangml)

2 years ago zhisbug twitter.com

6

Fastchat-T5: 4x smaller but more powerful than Dolly-v2, commercial use ready (twitter.com/lmsysorg)

2 years ago zhisbug twitter.com
