// archived 2026-04-18
google

magika

AI#Deep Learning#File Identification#Security#Rust#Python

// summary

Magika is an AI-powered tool that uses deep learning to identify file types across more than 200 content types. Its compact, optimized model returns inference results in milliseconds while maintaining approximately 99% accuracy. The project ships a command-line interface and language bindings for Python, JavaScript, and Rust.

// technical analysis

Magika classifies files with a custom, optimized deep learning model. Trained on roughly 100 million samples across 200+ content types, it targets accurate file detection for security and content policy routing at scale. The project prioritizes performance: inference takes about 5 ms per file on a single CPU because the model analyzes only a limited subset of each file's content, which makes it suitable for high-throughput environments like Gmail and Google Drive.
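To illustrate why analyzing a limited subset keeps the cost independent of file size, here is a minimal sketch that reads a fixed-size window from the start and end of a file. The function name `read_head_and_tail`, the head-plus-tail layout, and the 1024-byte window are assumptions chosen for illustration, not Magika's actual feature extraction or parameters.

```python
import os
import tempfile


def read_head_and_tail(path, window=1024):
    """Return up to `window` bytes from the start and from the end of a file.

    The amount read is bounded by 2 * window, however large the file is.
    The window size is an illustrative assumption, not Magika's real value.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(window)
        # On small files, start the tail where the head stopped so the
        # two windows never overlap.
        tail_start = max(size - window, len(head))
        f.seek(tail_start)
        tail = f.read(window)
    return head, tail


def demo():
    """Write a 10,000-byte temporary file and sample it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * 5000 + b"Z" * 5000)
        path = tmp.name
    try:
        return read_head_and_tail(path)
    finally:
        os.remove(path)
```

Calling `demo()` touches at most 2,048 bytes of the 10,000-byte file; the same bound would hold for a file of several gigabytes, which is the property the paragraph above relies on.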

// key highlights

01
Achieves approximately 99% average precision and recall across 200+ file types, significantly outperforming traditional detection methods.
02
Delivers a near-constant inference time of about 5 ms per file once the model is loaded, regardless of file size, by reading only a limited subset of the content.
03
Provides flexible prediction modes including high-confidence, medium-confidence, and best-guess to allow users to manage error tolerance.
04
Applies a per-content-type threshold to decide whether to trust the model's prediction or fall back to a generic label.
05
Offers multi-language support through a Rust-based CLI, Python API, and experimental JavaScript/TypeScript bindings for diverse integration needs.
06
Supports recursive directory scanning and batch processing, so thousands of files can be analyzed in a single invocation.
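The prediction modes and per-content-type thresholds in highlights 03 and 04 can be sketched as a small decision function. This is a hedged illustration only: the `Mode` enum, the threshold values, the fixed medium threshold, and the `"unknown"` fallback label are all hypothetical stand-ins, not values or names taken from Magika.

```python
from enum import Enum


class Mode(Enum):
    HIGH_CONFIDENCE = "high-confidence"
    MEDIUM_CONFIDENCE = "medium-confidence"
    BEST_GUESS = "best-guess"


# Hypothetical numbers, chosen only to make the example concrete.
PER_TYPE_THRESHOLDS = {"javascript": 0.90, "markdown": 0.75}
DEFAULT_THRESHOLD = 0.95
MEDIUM_THRESHOLD = 0.50
GENERIC_LABEL = "unknown"  # placeholder for "a generic label"


def resolve_label(model_label, score, mode):
    """Return the model's label if it clears the bar for `mode`,
    otherwise fall back to the generic label."""
    if mode is Mode.BEST_GUESS:
        # Always trust the model, whatever the score.
        return model_label
    if mode is Mode.MEDIUM_CONFIDENCE:
        required = MEDIUM_THRESHOLD
    else:
        # High confidence: each content type has its own bar.
        required = PER_TYPE_THRESHOLDS.get(model_label, DEFAULT_THRESHOLD)
    return model_label if score >= required else GENERIC_LABEL
```

With these assumed numbers, a `javascript` prediction scored 0.80 is rejected in high-confidence mode but accepted in medium-confidence and best-guess modes, which is how the modes let a caller trade coverage against error tolerance.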

// use cases

01
High-speed, accurate file type identification for security and content policy scanning
02
Recursive directory scanning and batch file analysis via command-line interface
03
Integration into applications via Python, JavaScript, or Rust language bindings

// getting started

Install the command-line tool via pipx, Homebrew, or the provided installer scripts. To use Magika as a library instead, run 'pip install magika' for Python or 'npm install magika' for JavaScript. Once installed, identify file types by passing file paths to the 'magika' command, or import the Magika class in your code to process bytes, streams, or paths.
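As a rough sketch of the library route in Python, the snippet below labels a few in-memory payloads. It assumes a recent release of the `magika` package in which `Magika().identify_bytes(...)` returns a result whose label is exposed as `result.output.label`; attribute names have differed between releases, so check the documentation for the installed version. The helper `label_samples` and the sample payloads are hypothetical names introduced here.

```python
def label_samples(identifier, samples):
    """Return {name: label} by calling identifier.identify_bytes on each payload.

    `identifier` is expected to behave like a Magika instance; the
    `.output.label` attribute is an assumption about recent releases.
    """
    return {
        name: identifier.identify_bytes(data).output.label
        for name, data in samples.items()
    }


def main():
    # Requires `pip install magika`. Imported here so the helper above
    # stays importable on machines without the package.
    from magika import Magika

    samples = {
        "snippet.js": b"function log(msg) { console.log(msg); }",
        "settings.ini": b"[section]\nkey = value\n",
    }
    for name, label in label_samples(Magika(), samples).items():
        print(f"{name}: {label}")
```

Calling `main()` loads the model once and reuses the same instance for every payload, which matters because model loading is a one-off cost. For files on disk, the class also offers path and stream entry points, as noted above.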