// summary
slime is a post-training framework for scaling reinforcement learning (RL) on large language models. It pairs Megatron-LM for high-performance training with SGLang for flexible, efficient data generation. Because training and rollout are decoupled, researchers can build and deploy complex agentic RL systems on top of it.
// technical analysis
slime is an SGLang-native post-training framework for scaling reinforcement learning on large language models. Its architecture has two sides connected by a centralized data buffer: Megatron-LM handles high-performance model training, and SGLang handles rollout, that is, data generation. Decoupling the two addresses a key bottleneck in scaling RL: training and rollout can run asynchronously, and the data generation workflow can be changed without touching the training side. The result is improved throughput and modularity when training large models such as GLM-5 and DeepSeek V3.
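The decoupling described above can be illustrated with a toy producer/consumer sketch. This is not slime's API: `Sample`, `rollout_worker`, `trainer`, and `run_demo` are hypothetical names, a standard-library `queue.Queue` stands in for the centralized data buffer, and string uppercasing stands in for model generation. It only shows the pattern of a rollout side filling a buffer while a training side drains it in batches.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float


def rollout_worker(prompts, buffer, done):
    """Hypothetical rollout side: generate samples and push them into the buffer."""
    for prompt in prompts:
        response = prompt.upper()      # placeholder for model generation
        reward = float(len(response))  # placeholder reward function
        buffer.put(Sample(prompt, response, reward))  # blocks while the buffer is full
    done.set()  # signal that no more samples will arrive


def trainer(buffer, done, batch_size):
    """Hypothetical training side: pull samples as they arrive and group them into batches."""
    batches, batch = [], []
    while not (done.is_set() and buffer.empty()):
        try:
            batch.append(buffer.get(timeout=0.05))
        except queue.Empty:
            continue  # nothing ready yet; check again
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would run an update step here
            batch = []
    if batch:
        batches.append(batch)  # keep the final partial batch
    return batches


def run_demo(prompts, batch_size=2):
    buffer = queue.Queue(maxsize=4)  # bounded buffer between the two sides
    done = threading.Event()
    producer = threading.Thread(target=rollout_worker, args=(prompts, buffer, done))
    producer.start()
    batches = trainer(buffer, done, batch_size)
    producer.join()
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(run_demo(["a", "b", "c", "d", "e"])):
        print(i, [(s.prompt, s.response, s.reward) for s in batch])
```

Neither side calls the other directly in this sketch; they share only the buffer, so either one could be swapped out without changing the other.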
// key highlights
// use cases
// getting started
To get started with slime, follow the official Quick Start Guide in the documentation folder, which covers environment setup, data preparation, and starting a training run. The examples directory shows specific use cases, and the usage documentation describes the command-line arguments in detail.