jamiepine

voicebox

#AI #TTS #Voice Cloning #Rust #Tauri #Python

// summary

Voicebox is a local-first voice synthesis studio for cloning voices and generating speech with seven different TTS engines. It includes a multi-track timeline editor for building multi-voice narratives and post-processing effects for refining audio output. Built for privacy and performance, it runs natively on major operating systems and exposes a REST API for developer integrations.

// technical analysis

Voicebox is a local-first voice synthesis studio designed as an open-source, privacy-focused alternative to cloud-based services like ElevenLabs. Its modular architecture integrates seven distinct TTS engines and Spotify's pedalboard library for post-processing, so users can run the full pipeline, from cloning to effects, on their own hardware. A Tauri desktop shell paired with a FastAPI backend gives the project native performance alongside a multi-track editing experience, so users do not have to give up data sovereignty to get high-quality voice cloning.
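One way to picture a modular multi-engine design is a registry that maps engine names to synthesis callables. The sketch below is illustrative only and is not taken from the Voicebox codebase: `EngineRegistry`, `SynthFn`, `fake_engine`, and the `(text, voice_id) -> bytes` signature are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical signature: an engine maps (text, voice_id) to audio bytes.
SynthFn = Callable[[str, str], bytes]


class EngineRegistry:
    """Maps engine names to synthesis callables (illustrative, not Voicebox's API)."""

    def __init__(self) -> None:
        self._engines: Dict[str, SynthFn] = {}

    def register(self, name: str, fn: SynthFn) -> None:
        # Refuse duplicates so one engine cannot silently replace another.
        if name in self._engines:
            raise ValueError(f"engine already registered: {name}")
        self._engines[name] = fn

    def names(self) -> List[str]:
        return sorted(self._engines)

    def synthesize(self, engine: str, text: str, voice_id: str) -> bytes:
        if engine not in self._engines:
            raise KeyError(f"unknown engine: {engine}")
        return self._engines[engine](text, voice_id)


def fake_engine(text: str, voice_id: str) -> bytes:
    # Stand-in for a real model call; returns placeholder bytes, not audio.
    return f"{voice_id}:{text}".encode("utf-8")


registry = EngineRegistry()
registry.register("demo", fake_engine)
```

With this shape, adding another engine means registering one more callable, and callers only ever deal with an engine name.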

// key highlights

Provides complete privacy by keeping all voice models and user data on the local machine.

Supports seven diverse TTS engines, including Qwen3-TTS and Kokoro, to offer a wide range of voice cloning and synthesis capabilities.

Features a multi-track stories editor that enables users to compose complex audio narratives, podcasts, and conversations.

Includes eight post-processing audio effects, such as pitch shifting, reverb, and compression, to refine generated speech.

Implements an API-first design, exposing a REST interface that allows developers to integrate voice synthesis into external applications.

Supports cross-platform hardware acceleration, including Apple Silicon MLX, NVIDIA CUDA, and AMD ROCm, for optimized inference performance.
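As a rough illustration of the hardware-acceleration highlight, the sketch below picks an inference backend from facts about the host. It is a minimal example under stated assumptions, not Voicebox's actual detection logic: the function name `pick_backend`, the priority order, and the string labels are hypothetical, and the CUDA/ROCm availability flags are passed in by the caller rather than probed.

```python
import platform


def pick_backend(system: str, machine: str, has_cuda: bool, has_rocm: bool) -> str:
    """Return "mlx", "cuda", "rocm", or "cpu" (labels are illustrative)."""
    # Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    if has_cuda:
        return "cuda"
    if has_rocm:
        return "rocm"
    # No supported accelerator found: fall back to CPU inference.
    return "cpu"


# Example call for the current host, assuming no GPU runtime was detected.
current = pick_backend(platform.system(), platform.machine(), has_cuda=False, has_rocm=False)
```

Keeping the decision in a pure function with the probes passed as arguments makes the fallback order easy to test without any GPU present.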

// use cases

High-quality voice cloning and speech generation in 23 languages

Multi-track audio composition for podcasts and narratives

Integration of voice synthesis into external applications via REST API
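For the REST integration use case, a client might build a JSON POST request like the one sketched below. The summary does not document the API, so everything specific here is an assumption for illustration: the base URL and port, the `/generate` path, the field names `text`, `voice_id`, and `engine`, and the engine identifier `"kokoro"`. Consult the project's API documentation for the real routes and schema.

```python
import json
import urllib.request


def build_generate_request(base_url: str, text: str, voice_id: str, engine: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a hypothetical endpoint."""
    # Field names are placeholders, not the documented Voicebox schema.
    payload = {"text": text, "voice_id": voice_id, "engine": engine}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local address; the request is only constructed here, never sent.
req = build_generate_request("http://localhost:8000/", "Hello there", "voice-1", "kokoro")
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` once the real endpoint and payload format are known.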

// getting started

To begin using Voicebox, download the appropriate installer for your operating system from the releases page or use Docker. Once installed, you can launch the application to start cloning voices, generating speech, or using the stories editor. Developers interested in contributing or running from source should clone the repository and use the 'just' command runner to set up the environment and launch the development build.