HubLens › Topics › Deep Learning
// topic

Deep Learning

25 trending in last 90 days · 25 all-time

// this week's top 10

01
PaddlePaddle / PaddleOCR
PaddleOCR is a comprehensive toolkit designed to convert images and PDF documents into structured, LLM-ready data formats like Markdown and JSON. It features state-of-the-art vision-language models and high-performance text recognition engines that support over 100 languages. The platform is widely integrated into major AI agent and RAG frameworks, offering efficient deployment options across various hardware backends.
91 · 75,510
02
deepseek-ai / DeepGEMM
DeepGEMM is a lightweight CUDA library designed for efficient General Matrix Multiplications (GEMMs), supporting FP8 and BF16 data formats. Its just-in-time compilation module builds kernels at runtime, eliminating the need to pre-compile kernels at installation time while maintaining performance comparable to expert-tuned libraries. The library provides specialized APIs for dense and MoE-grouped GEMMs, and its clean codebase makes it a useful resource for learning GPU kernel optimization.
88 · 6,348
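The core idea behind DeepGEMM's fine-grained FP8 scaling can be sketched without a GPU: quantize each block of the inputs with its own scale, then run an ordinary GEMM. The NumPy snippet below is a toy stand-in only — it uses coarse uniform quantization rather than the real e4m3 FP8 format, and `quantize_block` and the 128-element block size are illustrative, not DeepGEMM's API.

```python
import numpy as np

def quantize_block(x, block=128, levels=127):
    """Per-block uniform quantization along the last axis -- a rough
    stand-in for fine-grained FP8 scaling (real FP8 is an e4m3 float)."""
    out = np.empty_like(x)
    for i in range(0, x.shape[-1], block):
        blk = x[..., i:i + block]
        scale = np.abs(blk).max() / levels + 1e-12
        out[..., i:i + block] = np.round(blk / scale) * scale
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256)).astype(np.float32)
B = rng.standard_normal((256, 64)).astype(np.float32)

C_ref = A @ B                                       # full-precision reference
# Quantize both operands along the shared K dimension, then multiply.
C_q = quantize_block(A) @ quantize_block(B.T).T

rel_err = np.abs(C_q - C_ref).max() / np.abs(C_ref).max()
print(f"max relative error with quantized inputs: {rel_err:.4f}")
```

Per-block scales keep the quantization error small even when different regions of a matrix have very different magnitudes, which is why low-precision GEMMs can stay close to the full-precision result.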
03
deepseek-ai / DeepEP
DeepEP is a specialized communication library designed to optimize Mixture-of-Experts and expert parallelism through high-throughput, low-latency GPU kernels. It supports both training and inference workloads by providing advanced features like asymmetric-domain bandwidth forwarding and hook-based communication-computation overlapping. The library is highly optimized for NVLink and RDMA environments, offering significant performance gains for large-scale model deployments.
88 · 9,125
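The dispatch/combine pattern DeepEP accelerates can be shown in miniature: route each token to its top-k experts, gather every expert's tokens, run the expert, and scatter the gate-weighted results back. The sketch below runs in a single process with NumPy — DeepEP performs this same exchange across GPUs with all-to-all NVLink/RDMA kernels; all names and sizes here are illustrative, not its API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 8, 4, 3, 2

tokens = rng.standard_normal((n_tokens, d_model))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

# Router: pick top-k experts per token with normalized gate weights.
logits = rng.standard_normal((n_tokens, n_experts))
topk_idx = np.argsort(logits, axis=1)[:, -top_k:]           # (n_tokens, top_k)
gates = np.exp(np.take_along_axis(logits, topk_idx, axis=1))
gates /= gates.sum(axis=1, keepdims=True)

# Dispatch / combine: gather each expert's tokens, run the expert forward
# pass, and scatter the gate-weighted outputs back to token order.
output = np.zeros_like(tokens)
for e in range(n_experts):
    hits = topk_idx == e                    # at most one hit per token row
    rows = np.nonzero(hits.any(axis=1))[0]
    if rows.size:
        output[rows] += (tokens[rows] @ experts[e]) * gates[hits][:, None]

# Cross-check against the straightforward per-token computation.
ref = np.zeros_like(tokens)
for t in range(n_tokens):
    for j in range(top_k):
        ref[t] += gates[t, j] * tokens[t] @ experts[topk_idx[t, j]]
assert np.allclose(output, ref)
print("dispatch/combine round-trip matches dense routing")
```

Grouping tokens by expert before the matmul is what turns sparse routing into a few dense GEMMs — the communication kernels exist to make that gather/scatter cheap across devices.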
04
bilibili / Index-anisora
Index-AniSora is a powerful open-source framework designed specifically for high-quality anime video generation and animation production. The system integrates a comprehensive data processing pipeline, a controllable generation model with spatiotemporal masking, and a specialized evaluation benchmark. It supports diverse creative tasks including image-to-video generation, style transfer, character 3D modeling, and multimodal guidance.
88 · 2,410
05
google / magika
Magika is an AI-powered tool that utilizes deep learning to provide highly accurate file type identification for both binary and textual formats. It features a highly optimized model capable of performing inference in milliseconds on a single CPU with approximately 99% accuracy. The project offers a versatile command-line interface and language bindings, making it suitable for large-scale security and content policy scanning.
78 · 95
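For contrast, the classical baseline Magika's deep-learning model improves on is magic-byte sniffing. The sketch below is that baseline, not Magika's API: a signature table fails exactly where Magika shines — textual formats like source code or JSON have no magic bytes at all.

```python
# Classical magic-byte sniffing (illustrative; not Magika's model or API).
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x7fELF": "elf",
}

def sniff(data: bytes) -> str:
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    # Content such as JSON or source code carries no signature, so
    # byte-matching falls back to "unknown" here, while a learned model
    # can still classify it from byte patterns.
    return "unknown"

print(sniff(b"%PDF-1.7 ..."))   # -> pdf
print(sniff(b'{"a": 1}'))       # -> unknown (no signature to match)
```

This is the gap a learned classifier closes: it identifies the long tail of signature-less textual formats that rule tables cannot.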
06
OpenBMB / VoxCPM
VoxCPM2 is a tokenizer-free, 2B parameter text-to-speech system that utilizes a diffusion autoregressive architecture for highly natural audio synthesis. The model supports 30 languages and offers advanced capabilities including voice design from text descriptions and controllable voice cloning. It delivers studio-quality 48kHz audio output and is fully open-source under the Apache-2.0 license for commercial use.
78 · 54
07
baidu / ERNIE-Image
ERNIE-Image is an open-source text-to-image model developed by Baidu based on a single-stream Diffusion Transformer architecture. The model is equipped with a lightweight prompt enhancer that can transform short inputs into structurally rich descriptions. With an 8B parameter scale, it excels at handling complex instructions, text rendering, and structured visual tasks, while supporting efficient deployment on consumer-grade GPUs.
78 · 201
08
alibaba / rtp-llm
RTP-LLM is a high-performance large language model inference acceleration engine developed by the Alibaba Foundation Model Inference Team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, and it supports multiple mainstream model formats and hardware backends. By integrating advanced operator optimization, quantization techniques, and distributed inference capabilities, it provides developers with efficient production-grade inference solutions.
78 · 1,088
09
PaddlePaddle / PaddleFormers
PaddleFormers is a Transformers library built on the PaddlePaddle framework, designed to provide training interfaces for Large Language Models and Vision-Language Models comparable to those of Hugging Face Transformers. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project achieves training performance that surpasses Megatron-LM on key models. It also fully supports the Safetensors format and is deeply adapted to a range of domestically produced Chinese AI chips, helping developers efficiently complete the full model training workflow.
78 · 12,987
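Tensor parallelism, one of the techniques listed above, has a simple core: shard a layer's weight matrix across devices, let each device compute its slice of the output, and stitch the slices together. The single-process NumPy sketch below shows the column-parallel scheme; the three "devices" and all shapes are illustrative, not PaddleFormers' API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))        # a batch of activations
W = rng.standard_normal((8, 12))       # full weight of one linear layer

# Column-parallel linear layer: each simulated device holds a slice of
# W's columns and computes the corresponding slice of the output;
# concatenation recovers the full, unsharded result.
shards = np.split(W, 3, axis=1)        # 3 simulated devices
partial = [x @ w for w in shards]      # each device's local matmul
y_tp = np.concatenate(partial, axis=1)

assert np.allclose(y_tp, x @ W)        # identical to the unsharded layer
print("tensor-parallel output matches:", y_tp.shape)
```

In a real deployment the concatenation is an all-gather collective, and the row-parallel variant (splitting W's rows) pairs with an all-reduce instead; either way the math is exactly the unsharded matmul, distributed.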
10
PaddlePaddle / Paddle
PaddlePaddle is a comprehensive industrial deep learning platform that provides core frameworks, model libraries, and end-to-end development tools. It supports advanced features such as unified dynamic and static graphs, automatic parallelism, and high-order differentiation for scientific computing. The platform is designed to facilitate industrial AI commercialization across diverse sectors through its mature, heterogeneous hardware-compatible architecture.
78 · 23,827

// all-time featured (25)

PaddlePaddle / PaddleOCR
PaddleOCR is a comprehensive toolkit designed to convert images and PDF documents into structured, LLM-ready data formats like Markdown and JSON. It features state-of-the-art vision-language models and high-performance text recognition engines that support over 100 languages. The platform is widely integrated into major AI agent and RAG frameworks, offering efficient deployment options across various hardware backends.
91
deepseek-ai / DeepGEMM
DeepGEMM is a lightweight CUDA library designed for efficient General Matrix Multiplications (GEMMs), supporting FP8 and BF16 data formats. Its just-in-time compilation module builds kernels at runtime, eliminating the need to pre-compile kernels at installation time while maintaining performance comparable to expert-tuned libraries. The library provides specialized APIs for dense and MoE-grouped GEMMs, and its clean codebase makes it a useful resource for learning GPU kernel optimization.
88
deepseek-ai / DeepEP
DeepEP is a specialized communication library designed to optimize Mixture-of-Experts and expert parallelism through high-throughput, low-latency GPU kernels. It supports both training and inference workloads by providing advanced features like asymmetric-domain bandwidth forwarding and hook-based communication-computation overlapping. The library is highly optimized for NVLink and RDMA environments, offering significant performance gains for large-scale model deployments.
88
bilibili / Index-anisora
Index-AniSora is a powerful open-source framework designed specifically for high-quality anime video generation and animation production. The system integrates a comprehensive data processing pipeline, a controllable generation model with spatiotemporal masking, and a specialized evaluation benchmark. It supports diverse creative tasks including image-to-video generation, style transfer, character 3D modeling, and multimodal guidance.
88
Tencent / ncnn
ncnn is a high-performance neural network inference (forward computation) framework optimized specifically for mobile platforms, designed to simplify the deployment of deep learning algorithms on mobile devices. The framework has no third-party dependencies, is cross-platform, and on mobile CPUs runs faster than all other currently known open-source frameworks. ncnn is widely used in many of Tencent's mainstream applications, helping developers easily build intelligent apps.
86
deepseek-ai / DeepEP
DeepEP is a specialized communication library designed to optimize Mixture-of-Experts and expert parallelism through high-throughput, low-latency GPU kernels. It provides advanced features such as asymmetric-domain bandwidth forwarding and low-precision support to enhance both training and inference performance. The library also includes hook-based mechanisms for communication-computation overlapping to maximize hardware efficiency without occupying additional streaming multiprocessor resources.
82
google / magika
Magika is an AI-powered tool that utilizes deep learning to provide highly accurate file type identification for both binary and textual formats. It features a highly optimized model capable of performing inference in milliseconds on a single CPU with approximately 99% accuracy. The project offers a versatile command-line interface and language bindings, making it suitable for large-scale security and content policy scanning.
78
OpenBMB / VoxCPM
VoxCPM2 is a tokenizer-free, 2B parameter text-to-speech system that utilizes a diffusion autoregressive architecture for highly natural audio synthesis. The model supports 30 languages and offers advanced capabilities including voice design from text descriptions and controllable voice cloning. It delivers studio-quality 48kHz audio output and is fully open-source under the Apache-2.0 license for commercial use.
78
baidu / ERNIE-Image
ERNIE-Image is an open-source text-to-image model developed by Baidu based on a single-stream Diffusion Transformer architecture. The model is equipped with a lightweight prompt enhancer that can transform short inputs into structurally rich descriptions. With an 8B parameter scale, it excels at handling complex instructions, text rendering, and structured visual tasks, while supporting efficient deployment on consumer-grade GPUs.
78
alibaba / rtp-llm
RTP-LLM is a high-performance large language model inference acceleration engine developed by the Alibaba Foundation Model Inference Team. This engine has been widely applied in various Alibaba business scenarios such as Taobao and Tmall, and it supports multiple mainstream model formats and hardware backends. By integrating advanced operator optimization, quantization techniques, and distributed inference capabilities, it provides developers with efficient production-grade inference solutions.
78
PaddlePaddle / PaddleFormers
PaddleFormers is a Transformers library built on the PaddlePaddle framework, designed to provide training interfaces for Large Language Models and Vision-Language Models comparable to those of Hugging Face Transformers. By integrating tensor parallelism, pipeline parallelism, and automatic mixed precision, the project achieves training performance that surpasses Megatron-LM on key models. It also fully supports the Safetensors format and is deeply adapted to a range of domestically produced Chinese AI chips, helping developers efficiently complete the full model training workflow.
78
PaddlePaddle / Paddle
PaddlePaddle is a comprehensive industrial deep learning platform that provides core frameworks, model libraries, and end-to-end development tools. It supports advanced features such as unified dynamic and static graphs, automatic parallelism, and high-order differentiation for scientific computing. The platform is designed to facilitate industrial AI commercialization across diverse sectors through its mature, heterogeneous hardware-compatible architecture.
78
PaddlePaddle / Paddle
PaddlePaddle is a comprehensive industrial deep learning platform that provides a complete ecosystem of frameworks, model libraries, and development tools. It supports advanced capabilities such as automatic parallelism, unified training and inference, and high-order differentiation for scientific computing. The platform is designed to facilitate AI commercialization across various sectors by offering a flexible, high-performance architecture for diverse model development.
78
alibaba / ROLL
ROLL is an efficient, user-friendly reinforcement learning library specifically designed for training and scaling Large Language Models on large-scale GPU clusters. It utilizes a multi-role distributed architecture powered by Ray to support complex tasks like human preference alignment, reasoning, and agentic interactions. The framework integrates advanced technologies such as Megatron-Core, vLLM, and SGLang to accelerate model training and inference across diverse hardware environments.
72
bytedance / Protenix
Protenix is an open-source framework designed for high-accuracy biomolecular structure prediction, offering models that perform competitively with state-of-the-art methods. The project provides multiple versions, including the enhanced Protenix-v2, which demonstrates significant improvements in antibody-antigen structure prediction and ligand-related plausibility. It is released under the Apache 2.0 license, making it freely accessible for both academic and commercial research applications.
68
bilibili / Index-anisora
Index-AniSora is a comprehensive open-source system developed by Bilibili for high-quality anime video generation. The project provides a controllable generation model, a specialized data processing pipeline, and an evaluation benchmark tailored for animation aesthetics. It supports advanced features such as character 3D video generation, video style transfer, and multimodal guidance to facilitate diverse animation production tasks.
68
alibaba / rtp-llm
RTP-LLM is a high-performance large model inference acceleration engine developed by the Alibaba Foundation Model Inference Team, widely used in various business scenarios such as Taobao and Tmall. By integrating various advanced CUDA kernels and quantization techniques, the engine significantly improves model inference performance and efficiency. Furthermore, it possesses high flexibility, supporting multiple model formats, multimodal inputs, and LoRA service deployment.
68
PaddlePaddle / community
The PaddlePaddle community serves as a central hub for developers to contribute to the framework through code improvements, documentation, and presentations. It provides structured governance, specialized working groups, and various mentorship programs to support active participation. Contributors are recognized through official certifications, release notes, and inclusion in the project's authorship records.
58
baidu / vLLM-Kunlun
vLLM Kunlun is a community-maintained hardware plugin that enables the seamless execution of vLLM on Kunlun XPU devices. It functions as a hardware-pluggable interface, allowing users to run various large language and multimodal models without modifying the original vLLM source code. The project supports advanced features like quantization, LoRA fine-tuning, and hardware-accelerated graph optimization to ensure high-performance inference.
52
alibaba / TorchEasyRec
TorchEasyRec is a PyTorch-based framework designed for building production-ready deep learning models for recommendation tasks. It supports a wide range of algorithms including candidate generation, ranking, multi-task learning, and generative recommendation. The framework enables efficient development through simple configuration, high scalability, and seamless integration with various data sources and deployment environments.
48
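The candidate-generation task mentioned above is commonly solved with a two-tower model: embed users and items separately, then retrieve the items whose embeddings score highest against a user's. The NumPy sketch below shows only that retrieval step with random embeddings — TorchEasyRec's real models are configured PyTorch modules, so everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 8

# Toy two-tower retrieval: in practice each tower is a trained network;
# here the embeddings are random placeholders.
user_emb = rng.standard_normal((n_users, d))
item_emb = rng.standard_normal((n_items, d))

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Retrieval scores: cosine similarity between the two towers.
scores = l2norm(user_emb) @ l2norm(item_emb).T   # (n_users, n_items)
top2 = np.argsort(-scores, axis=1)[:, :2]        # top-2 candidates per user
print("top-2 items per user:\n", top2)
```

Because scoring reduces to a dot product, retrieval over millions of items can be served with an approximate-nearest-neighbor index rather than a full forward pass per candidate — the design rationale for the two-tower split.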
PaddlePaddle / PaConvert
This tool is officially maintained by the PaddlePaddle team and aims to automate the migration of PyTorch code to PaddlePaddle code. It supports one-click conversion of over 1,600 PyTorch APIs and 200 torchvision APIs, maintaining an average conversion rate of over 95% in tests. Conversion is run from the command line; it preserves the style and structure of the original code and produces detailed conversion logs and summaries.
48
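At its simplest, such a migration is a mapping between API names. The sketch below shows a naive name-level mapper for three real correspondences — the table and regex approach are purely illustrative, while PaConvert itself rewrites code via AST transforms and covers 1,600+ APIs.

```python
import re

# A few genuine PyTorch -> Paddle API correspondences (table is a toy;
# the real converter's mapping is far larger and AST-driven).
API_MAP = {
    "torch.cat": "paddle.concat",
    "torch.clamp": "paddle.clip",
    "torch.nn.Linear": "paddle.nn.Linear",
}

def convert(src: str) -> str:
    for old, new in API_MAP.items():
        src = re.sub(re.escape(old) + r"\b", new, src)
    return src

print(convert("y = torch.clamp(torch.cat([a, b], dim=0), min=0)"))
# -> y = paddle.clip(paddle.concat([a, b], dim=0), min=0)
```

Note the limitation the toy exposes: a faithful converter must also remap argument names (e.g. PyTorch's `dim` to Paddle's `axis`), which is precisely why PaConvert operates on the syntax tree rather than on raw text.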
PaddlePaddle / PaddleCustomDevice
PaddleCustomDevice is a custom hardware integration solution provided by the PaddlePaddle framework. This project aims to help developers integrate various third-party hardware backends into the PaddlePaddle ecosystem. It currently supports multiple mainstream hardware backends, including Ascend, Cambricon, Intel GPU, and Apple MPS.
42
PaddlePaddle / PaddleCustomDevice
PaddleCustomDevice is a custom hardware integration solution provided by the PaddlePaddle deep learning framework. This project aims to help developers efficiently integrate various third-party hardware backends into the PaddlePaddle ecosystem. Currently, it supports a variety of mainstream hardware platforms, including Ascend, Cambricon, Intel GPU, and Apple MPS.
42
microsoft / VibeVoice
VibeVoice is a collection of open-source voice AI models that utilize continuous speech tokenizers and a next-token diffusion framework to achieve high-fidelity audio processing. The project provides specialized models for long-form automatic speech recognition, real-time streaming text-to-speech, and multi-speaker synthesis. These models are designed for research purposes, offering capabilities like single-pass processing for hour-long audio and support for over 50 languages.
38
k2-fsa / OmniVoice
OmniVoice is an advanced large-scale multilingual zero-shot speech synthesis model based on a diffusion language model architecture, supporting over 600 languages. The model features exceptional inference speed and enables high-quality voice cloning and voice design capabilities. Users can easily perform speech generation via Python API or command-line tools, with support for fine-grained non-linguistic symbols and pronunciation control.
28

// related topics