High-Performance Serving Framework for LLMs and Multimodal Models
SGLang powers fast, scalable inference for large language and multimodal models.

Trusted by industry leaders:
Production-Grade Inference
Built for large-scale deployments, delivering reliable, low-latency, high-throughput serving from a single GPU to distributed clusters.
Model & Hardware Flexibility
Supports a wide range of open models — from LLMs to diffusion models — and runs across diverse hardware platforms.
Advanced Optimizations
Incorporates disaggregated prefill/decode, speculative decoding, parallelisms, a zero-overhead scheduler, and optimized GPU kernels.
Get Started in Seconds
Select your preferences and run the deployment command. SGLang is designed to be easy to install and deploy.
Install via pip or docker
The easiest ways to get started.
Launch the server
Start the server with a single command pointing to your model.
Query the API
Use standard OpenAI-compatible endpoints to interact with your model.
uv pip install sglang sglang-kernel \
--extra-index-url https://sgl-project.github.io/whl/cu129/ \
--extra-index-url https://download.pytorch.org/whl/cu129 \
--index-strategy unsafe-best-matchBroad Model & Hardware Support
A single engine that runs across various models and hardware.
Join the Community
From first-time users to teams debugging complex deployments, the community is open to everyone.






