Benchmark: A100 vs. H100 NVMe Random Read throughput during multi-GPU loading
Show HN: 50+ LLMs on 2 GPUs with 2-second swapping? We built an AI-native runtime (github.com/inferx-net)
Show HN: InferX – a Lambda-like inference function as a service
Show HN: We run 50+ LLMs on 2 GPUs using snapshot-based inference (inferx.net)
We're running 50 LLMs on 2 GPUs – no cold starts, no overprovisioning