PaddlePaddle

PaddleFormers

AI #PaddlePaddle #LLM #Deep Learning #Transformers #Distributed Training

12,991

// summary

PaddleFormers is a Transformers library built on Baidu's PaddlePaddle framework. It aims to give Large Language Models and Vision-Language Models a training interface and feature set comparable to Hugging Face Transformers. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project reports training performance that surpasses Megatron-LM on some mainstream models. It also supports domestic (Chinese) computing chips and the Safetensors format, covering the entire workflow from pre-training to post-training.

// technical analysis

PaddleFormers is a Transformers library built on Baidu's PaddlePaddle deep learning framework. It aims to give the PaddlePaddle ecosystem model interfaces and a feature set comparable to Hugging Face Transformers. It integrates high-performance distributed training strategies, including tensor parallelism, pipeline parallelism, expert parallelism, and automatic mixed precision, to improve training efficiency for Large Language Models (LLMs) and Vision-Language Models (VLMs); on some key models, its reported performance surpasses Megatron-LM. The core design philosophy is to hide complex low-level optimization details so that developers get high-performance, low-resource training without managing those details themselves. Full Safetensors support keeps model weights interoperable across frameworks.
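As a rough illustration of the tensor parallelism mentioned above, the sketch below splits a weight matrix by columns across several hypothetical devices, computes each shard's partial output independently, and concatenates the results. It is a pure-Python toy that assumes a column-parallel linear layer; the function names are illustrative and say nothing about how PaddleFormers implements this internally.

```python
def matmul(x, w):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_shards):
    """Split the columns of w into num_shards equal, contiguous shards."""
    n_cols = len(w[0])
    if n_cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = n_cols // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]


def column_parallel_matmul(x, w, num_shards):
    """Compute x @ w by giving each simulated device one column shard of w."""
    shards = split_columns(w, num_shards)
    # Each simulated device multiplies the full input by its own shard.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: concatenate partial outputs along the column axis.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = column_parallel_matmul(x, w, num_shards=2)
```

Because every output column depends on only one column of the weight matrix, the sharded result matches the unsharded one exactly, and each device stores only a fraction of the weights.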

// key highlights

Supports 100+ mainstream Large Language Models and Vision-Language Models, including the DeepSeek, Qwen, Llama, and Ernie series.

Built-in high-performance distributed training strategies, with FP8 low-precision training, communication-computation overlap, and optimizations that balance memory use against computation.

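Provides full-process support from pre-training to post-training, including continued pre-training (CPT), supervised fine-tuning (SFT), and Direct Preference Optimization (DPO), and is compatible with parameter-efficient fine-tuning techniques such as LoRA.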

Comprehensive support for the Safetensors format, ensuring that trained model weights can be used directly in mainstream inference frameworks such as vLLM and FastDeploy.

Adapted in depth to domestic (Chinese) computing chips, including Kunlunxin P800, Iluvatar CoreX Tiangai 150, and MetaX C550.

Supports training for newer model capabilities such as function calling and thinking, and uses Data Packing and Padding Free techniques to improve data processing efficiency.
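To make the Data Packing idea concrete, here is a minimal sketch of one common approach: greedily packing variable-length sequences into fixed-size slots so that fewer tokens are wasted on padding. The first-fit-decreasing heuristic and the function names are assumptions chosen for illustration; the source does not describe the packing algorithm PaddleFormers actually uses.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing of sequence lengths into slots of max_len.

    Returns a list of slots, each a dict holding the packed sequence indices
    and the number of tokens used.
    """
    slots = []
    # Place the longest sequences first; they are the hardest to fit.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has length {n}, which exceeds max_len={max_len}")
        for slot in slots:
            if slot["used"] + n <= max_len:
                slot["items"].append(i)
                slot["used"] += n
                break
        else:
            # No existing slot has room, so open a new one.
            slots.append({"items": [i], "used": n})
    return slots


def padding_without_packing(lengths, max_len):
    """Padding tokens needed when every sequence is padded to max_len on its own."""
    return sum(max_len - n for n in lengths)


def padding_with_packing(slots, max_len):
    """Padding tokens needed when each packed slot is padded to max_len."""
    return sum(max_len - slot["used"] for slot in slots)


lengths = [5, 3, 7, 2, 4, 1]
slots = pack_sequences(lengths, max_len=8)
```

In this toy input, six sequences fit into three slots, and padding drops from 26 tokens to 2. A real trainer would also need attention masks or position resets so that packed sequences do not attend to one another.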

// use cases

Full-process training of 100+ mainstream Large Language Models and Vision-Language Models

Continued pre-training (CPT), supervised fine-tuning (SFT), and DPO alignment of existing models

Training on domestic (Chinese) computing platforms such as Kunlunxin, Iluvatar CoreX, and MetaX

// getting started

PaddleFormers can be installed via Docker or pip; using a virtual environment (conda, venv, or uv) is recommended to avoid dependency conflicts. After installation, follow the project's API examples to load models with AutoTokenizer and AutoModelForCausalLM, or use the paddleformers-cli tool to run model training tasks.
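The loading step might look roughly like the sketch below. It assumes that the Auto classes are importable from `paddleformers.transformers` and that they follow the Hugging Face-style `from_pretrained` convention; neither detail is confirmed by the text above, so check the project's own API examples before relying on it. The helper name `load_causal_lm` and the model ID in the usage comment are illustrative placeholders.

```python
def load_causal_lm(model_id):
    """Load a tokenizer and a causal language model; returns (tokenizer, model)."""
    # Assumption: the Auto classes live in `paddleformers.transformers`.
    # The import is deferred so this helper can be defined before
    # PaddleFormers is installed.
    from paddleformers.transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: Hugging Face-style `from_pretrained` class methods.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


# Example call (placeholder model ID; requires PaddleFormers and network access):
# tokenizer, model = load_causal_lm("Qwen/Qwen3-0.6B-Base")
```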