Day-0 Model Support
When new open models drop, SGLang is ready on Day 0
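As a sketch of what day-0 support looks like in practice, the command below starts SGLang's OpenAI-compatible server for one of the listed checkpoints; the model ID, tensor-parallel degree, and port are illustrative, not prescriptive:

```shell
# Serve a supported checkpoint with SGLang's OpenAI-compatible server.
# Model ID and flags are illustrative; substitute any model listed below.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --port 30000
```

Once the server is up, any OpenAI-compatible client can point at http://localhost:30000/v1.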
DeepSeek V4
New · DeepSeek
1.6T/284B MoE model with 1M context, FP4+FP8 mixed precision, three reasoning modes

Qwen3.6-27B
New · Qwen
Dense 27B multimodal model that delivers flagship-level agentic coding performance

Kimi K2.6
New · Moonshot AI
Multimodal model with top-tier coding, long-horizon execution, and agent swarm capabilities

Qwen3.6-35B-A3B
New · Qwen
Features Gated Delta Networks combined with a sparse Mixture-of-Experts architecture

MiniMax M2.7
New · MiniMax
Agentic LLM with the unique ability to participate in its own training and evolution

GLM-5.1
Z-AI
Flagship model designed for agentic engineering and complex, long-horizon tasks

Gemma 4
Google
Multimodal family with built-in thinking, 256K context, and native function calling

Mistral Small 4
Mistral AI
Multimodal MoE model that combines reasoning, coding, and vision in a single model

NVIDIA Nemotron3-Super
NVIDIA
Designed to deliver strong agentic, reasoning, and conversational capabilities

FishAudio S2 Pro
Fish Audio
TTS model featuring fine-grained prosody and emotion control

Qwen3.5 Medium Series
Qwen
Includes Qwen3.5-Flash / Qwen3.5-35B-A3B / Qwen3.5-122B-A10B / Qwen3.5-27B

Qwen3.5-397B-A17B
Qwen
A native vision-language model combining Gated DeltaNet (linear attention) with a sparse MoE

Ling-2.5-1T
Ant Ling
New flagship model with 1T params (63B active), 29T pre-training corpus & 1M context

Ring-1T-2.5
Ant Ling
First hybrid linear-architecture 1T thinking model designed for long-horizon and agentic tasks

LLaDA 2.1
Ant Open Source
100B discrete diffusion LLM with Token-to-Token editing

GLM-OCR
Z-AI
Multimodal OCR model for complex documents, built on the GLM-V encoder–decoder architecture

Step 3.5 Flash
StepFun
Sparse MoE model purpose-built to power autonomous agents at scale

DeepSeek-OCR-2
DeepSeek
State-of-the-art OCR model with multimodal document understanding

Nemotron Nano 3 NVFP4
NVIDIA
Highly efficient hybrid MoE model with 1M context window and thinking budget

MOVA
OpenMoss
32B MoE model designed for simultaneous, high-fidelity video and audio generation

Kimi K2.5
Moonshot AI
Powerful MoE model with advanced reasoning and tool-use capabilities

Z-Image
Tongyi Lab
Lightweight model designed for high-speed, high-quality image generation and editing

STEP3-VL-10B
StepFun
Efficient vision-language model with strong visual understanding

FlashLab Chroma
FlashLabs
The first open-source, end-to-end, real-time speech-to-speech model

GLM-4.7-Flash
Z-AI
High-performance 30B-A3B MoE model optimized for fast, local, and agentic coding tasks

DeepSeek-V3.2
DeepSeek
Long-context LLM that combines sparse attention, scaled RL, and large-scale agentic training

DeepSeek-V3.2-Speciale
DeepSeek
Long-context model that integrates sparse attention, scaled RL, and agentic training

DeepSeek-Math-V2
DeepSeek
Math reasoning model that achieves gold-level performance on benchmarks

Qwen3-VL-30B-A3B-Thinking
Qwen
Powerful multimodal model that integrates advanced vision, language, and reasoning

Qwen3-VL-30B-A3B-Instruct
Qwen
Powerful multimodal model that integrates advanced vision, language, and reasoning

Wan2.2-T2V-A14B
Wan-AI
Text-to-Video MoE model, supports 480P & 720P with cinematic-level aesthetics

Wan2.2-I2V-A14B
Wan-AI
Image-to-Video MoE model, supports 480P & 720P with complex motion generation

Wan2.2-TI2V-5B
Wan-AI
High-compression VAE, T2V+I2V, supports 720P with efficient high-definition hybrid TI2V

DeepSeek-V3
DeepSeek
671B MoE (37B active) language model that combines MLA and DeepSeekMoE architectures

DeepSeek-R1
DeepSeek
First-generation RL-trained reasoning model setting new open-source benchmarks
