Day-0 Model Support
When new open models drop, SGLang is ready on Day 0
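As a sketch of what day-0 support looks like in practice, the command below starts SGLang's OpenAI-compatible server for one of the listed checkpoints; the model ID, tensor-parallel degree, and port are illustrative, not prescriptive:

```shell
# Serve a supported checkpoint with SGLang's OpenAI-compatible server.
# Model ID and flags are illustrative; substitute any model listed below.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --port 30000
```

Once the server is up, any OpenAI-compatible client can point at http://localhost:30000/v1.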
DeepSeek V4
New · DeepSeek
1.6T/284B MoE model with 1M context, FP4+FP8 mixed precision, three reasoning modes

Qwen3.6-27B
New · Qwen
Dense 27B multimodal model that delivers flagship-level agentic coding performance

Kimi K2.6
New · Moonshot AI
Multimodal model with top-tier coding, long-horizon execution, and agent swarm capabilities

Qwen3.6-35B-A3B
New · Qwen
Features Gated Delta Networks combined with a sparse Mixture-of-Experts architecture

MiniMax M2.7
New · MiniMax
Agentic LLM with the unique ability to participate in its own training and evolution

GLM-5.1
Z-AI
Flagship model designed for agentic engineering and complex, long-horizon tasks

Gemma 4
Google
Multimodal family with built-in thinking, 256K context, and native function calling

Mistral Small 4
Mistral AI
Multimodal MoE model that combines reasoning, coding, and vision in a single model

NVIDIA Nemotron3-Super
NVIDIA
Designed to deliver strong agentic, reasoning, and conversational capabilities

FishAudio S2 Pro
Fish Audio
TTS model featuring fine-grained prosody and emotion control

Qwen3.5 Medium Series
Qwen
Includes Qwen3.5-Flash / Qwen3.5-35B-A3B / Qwen3.5-122B-A10B / Qwen3.5-27B

Qwen3.5-397B-A17B
Qwen
A native vision-language model combining Gated DeltaNet (linear attention) with a sparse MoE

Ling-2.5-1T
Ant Ling
New flagship model with 1T params (63B active), 29T pre-training corpus & 1M context

Ring-1T-2.5
Ant Ling
First hybrid linear-architecture 1T thinking model designed for long-horizon and agentic tasks

LLaDA 2.1
Ant Open Source
100B discrete diffusion LLM with Token-to-Token editing

GLM-OCR
Z-AI
Multimodal OCR model for complex documents, built on the GLM-V encoder–decoder architecture

Step 3.5 Flash
StepFun
Sparse MoE model purpose-built to power autonomous agents at scale

DeepSeek-OCR-2
DeepSeek
State-of-the-art OCR model with multimodal document understanding

Nemotron Nano 3 NVFP4
NVIDIA
Highly efficient hybrid MoE model with 1M context window and thinking budget

MOVA
OpenMoss
32B MoE model designed for simultaneous, high-fidelity video and audio generation

Kimi K2.5
Moonshot AI
Powerful MoE model with advanced reasoning and tool-use capabilities

Z-Image
Tongyi Lab
Lightweight model designed for high-speed, high-quality image generation and editing

STEP3-VL-10B
StepFun
Efficient vision-language model with strong visual understanding

FlashLab Chroma
FlashLabs
The first open-source, end-to-end, real-time speech-to-speech model

GLM-4.7-Flash
Z-AI
High-performance 30B-A3B MoE model optimized for fast, local, and agentic coding tasks

DeepSeek-V3.2
DeepSeek
Long-context LLM that combines sparse attention, scaled RL, and large-scale agentic training

DeepSeek-V3.2-Speciale
DeepSeek
Long-context model that integrates sparse attention, scaled RL, and agentic training

DeepSeek-Math-V2
DeepSeek
Math reasoning model that achieves gold-level performance on benchmarks

Qwen3-VL-30B-A3B-Thinking
Qwen
Powerful multimodal model that integrates advanced vision, language, and reasoning

Qwen3-VL-30B-A3B-Instruct
Qwen
Powerful multimodal model that integrates advanced vision, language, and reasoning

Wan2.2-T2V-A14B
Wan-AI
Text-to-Video MoE model, supports 480P & 720P with cinematic-level aesthetics

Wan2.2-I2V-A14B
Wan-AI
Image-to-Video MoE model, supports 480P & 720P with complex motion generation

Wan2.2-TI2V-5B
Wan-AI
High-compression VAE, T2V+I2V, supports 720P with efficient high-definition hybrid TI2V

DeepSeek-V3
DeepSeek
671B MoE (37B active) language model that combines MLA and DeepSeekMoE architectures

DeepSeek-R1
DeepSeek
First-generation RL-trained reasoning model setting new open-source benchmarks
