THUDM

slime

#AI #LLM #Reinforcement Learning #Megatron-LM #SGLang #Post-training

5,543

// summary

slime is a post-training framework for scaling reinforcement learning (RL) on large language models. It pairs Megatron-LM for high-performance training with SGLang for flexible, efficient data generation. Training and rollout are decoupled, so researchers can build and deploy complex agentic RL systems.

// technical analysis

slime is an SGLang-native post-training framework that scales reinforcement learning for large language models by decoupling training from rollout. Megatron-LM handles model training and SGLang handles data generation; the two are connected through a centralized data buffer. Because the rollout side writes samples to the buffer and the training side reads from it, the two can run asynchronously and data generation can be customized independently of the training loop. This improves throughput and modularity when training large models such as GLM-5 and DeepSeek V3.
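The decoupled design described above can be pictured as a producer/consumer pair sharing a buffer. The following is a minimal sketch using only the Python standard library. It is not slime's implementation: `rollout_worker`, `trainer`, and `run` are hypothetical names, and the "generation" and "reward" steps are placeholders.

```python
import queue
import threading


def rollout_worker(buffer, prompts):
    """Stand-in for the rollout side: generate samples and push them to the buffer."""
    for prompt in prompts:
        # Placeholder generation and reward; a real system would call an inference engine.
        sample = {"prompt": prompt, "response": prompt.upper(), "reward": float(len(prompt))}
        buffer.put(sample)
    buffer.put(None)  # sentinel: no more samples will arrive


def trainer(buffer, batch_size):
    """Stand-in for the training side: pull samples from the buffer and group them into batches."""
    batches, batch = [], []
    while True:
        sample = buffer.get()
        if sample is None:
            break
        batch.append(sample)
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would run an optimizer step here
            batch = []
    if batch:
        batches.append(batch)  # keep the final partial batch
    return batches


def run(prompts, batch_size=2):
    """Run rollout in a background thread while the trainer consumes from the shared buffer."""
    buffer = queue.Queue(maxsize=4)  # bounded, so rollout cannot run arbitrarily far ahead
    producer = threading.Thread(target=rollout_worker, args=(buffer, prompts))
    producer.start()
    batches = trainer(buffer, batch_size)
    producer.join()
    return batches
```

Neither side calls the other directly; each only knows about the buffer, which is what lets the generation logic change without touching the training loop.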

// key highlights

Provides high-performance training by integrating Megatron-LM with SGLang for optimized GPU utilization.

Features a flexible data generation engine that supports custom interfaces for diverse RL workflows.

Supports a wide range of state-of-the-art models including the GLM series, Qwen, DeepSeek V3, and Llama 3.

Uses a decoupled architecture in which a dedicated data buffer bridges the training and rollout modules.

Enables advanced RL techniques such as asynchronous training, multi-turn rollouts, and verifiable environment integration.

Offers a comprehensive argument system that allows granular control over Megatron, SGLang, and framework-specific configurations.
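To illustrate the last highlight, the sketch below shows one way a single command line could carry settings for three destinations (Megatron, SGLang, and the framework itself) and be split into groups afterwards. It assumes a prefix convention for engine-specific flags. The flag names, the `sglang_` prefix handling, and the helpers `build_parser` and `group_args` are illustrative placeholders, not slime's actual parser; the real flags are listed in the usage documentation.

```python
import argparse

ENGINE_PREFIX = "sglang_"  # assumption: engine flags are recognized by a shared prefix


def build_parser():
    """Build a parser with one illustrative flag per destination."""
    parser = argparse.ArgumentParser()
    # Illustrative training-backend flag
    parser.add_argument("--tensor-model-parallel-size", type=int, default=1)
    # Illustrative inference-engine flag, marked by the prefix
    parser.add_argument("--sglang-mem-fraction-static", type=float, default=0.8)
    # Illustrative framework-level flag
    parser.add_argument("--rollout-batch-size", type=int, default=32)
    return parser


def group_args(argv, framework_keys=frozenset({"rollout_batch_size"})):
    """Parse argv and route each option to the component that should receive it."""
    namespace = build_parser().parse_args(argv)
    groups = {"megatron": {}, "sglang": {}, "framework": {}}
    for key, value in vars(namespace).items():
        if key.startswith(ENGINE_PREFIX):
            # Strip the prefix so the engine sees its own option name.
            groups["sglang"][key[len(ENGINE_PREFIX):]] = value
        elif key in framework_keys:
            groups["framework"][key] = value
        else:
            groups["megatron"][key] = value
    return groups
```

For example, `group_args(["--sglang-mem-fraction-static", "0.7"])` places `mem_fraction_static` in the `sglang` group and leaves the other two flags at their defaults in their own groups.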

// use cases

High-performance RL training for large language models

Flexible and asynchronous data generation workflows

Development of agentic RL systems and verifiable environments

// getting started

To begin using slime, consult the official Quick Start Guide in the documentation folder, which covers environment setup, data preparation, and training initialization. The examples directory illustrates specific use cases, and the usage documentation describes the command-line arguments in detail.