Articles by robertnishihara
27

vLLM large scale serving: DeepSeek 2.2k tok/s/h200 with wide-ep (vllm.ai)

2

Massively Parallel Agentic Simulations with Ray (anyscale.com)

1

Deploy DeepSeek‑R1 with vLLM and Ray Serve on Kubernetes (anyscale.com)

1

An Open Source Stack for AI Compute: Kubernetes and Ray and PyTorch and vLLM (anyscale.com)

1

Native LLM APIs in Ray Data and Ray Serve (anyscale.com)

2

Joins and Hash-Shuffle in Ray Data (anyscale.com)

2

AsyncFlow: An Asynchronous Streaming RL Framework for LLM Post-Training (arxiv.org)

1

Open Source RL Libraries for LLMs (anyscale.com)

1

Large-Scale Deployment of Ray in Tencent's Weixin AI Infrastructure (anyscale.com)

17

Uv and Ray: Pain-Free Python Dependencies in Clusters (anyscale.com)

1

Roll: Reinforcement Learning Optimization for Large-Scale Learning (github.com/alibaba)

1

An Open Source Stack for AI Compute: Kubernetes and Ray and PyTorch and vLLM (anyscale.com)

1

Uv and Ray: Pain-Free Python Dependencies in Clusters (anyscale.com)

1

Ray Batch Inference at Pinterest (Part 3) (medium.com/pinterest-engineering)

1

Direct Preference Optimization with Synthetic Data on Anyscale (anyscale.com)

1

Building an LLM Router for High-Quality and Cost-Effective Responses (anyscale.com)

1

Ray Infrastructure at Pinterest (medium.com/pinterest-engineering)

2

Lessons from training a Stable Diffusion model on 2B images (anyscale.com)

1

Canva Built a Modern AI Platform Using Anyscale (anyscale.com)

2

Building RAG-Based LLM Applications for Production (anyscale.com)

1

Fine-tuning LLMs for longer context and better RAG systems (anyscale.com)

2

Two-day hands-on RAG Bootcamp for developers (twitter.com/martin_casado)

1

RAG at Scale: 10x Cheaper Embedding Computations with Anyscale and Pinecone (anyscale.com)

1

Comparing LLM Performance: Introducing the Open Source Leaderboard for LLM APIs (anyscale.com)

2

LLMPerf Leaderboard (github.com/ray-project)

2

Anyscale Endpoints: JSON Mode and Function Calling Features (anyscale.com)

1

LLM summarization: A case study of human, Llama-2, & GPT-4 summarization quality (anyscale.com)

2

Reproducible Performance Metrics for LLM Inference (anyscale.com)

1

Building RAG-Based LLM Applications for Production (anyscale.com)

1

Anyscale Endpoints: LLM inference and fine-tuning (anyscale.com)

1

Anyscale Private Endpoints and Anyscale Endpoints Fine-Tuning (anyscale.com)

3

Loading Llama-2 70B 20x faster with Anyscale Endpoints (anyscale.com)

50

A Comprehensive Guide for Building RAG-Based LLM Applications (github.com/ray-project)

6

Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper (anyscale.com)

2

Nearly all LLMs will be multi-modal (twitter.com/robertnishihara)

4

ByteDance Scales Offline Inference with Multi-Modal LLMs to 200 TB Data (anyscale.com)

95

Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Custom Models (anyscale.com)

1

Distributed Machine Learning at Instacart (instacart.com)

2

ChatGPT got all the buzz, but beneath it is a $1B developer framework (businessinsider.com)

1

Training One Million Machine Learning Models in Record Time with Ray (anyscale.com)