Chatterbox TTS

Chatterbox TTS is a zero-shot voice cloning model that generates expressive speech for AI agents, video voiceovers, and similar uses.

Platform: Replicate

Zero-shot TTS, Voice Cloning, Emotion Control, Speech Synthesis

672 runs

License Check Required

🚀Function Overview

A state-of-the-art zero-shot text-to-speech model that generates expressive speech by cloning voices from reference audio while allowing emotional intensity and speech pattern adjustments.

Key Features

• Voice cloning from audio references
• Emotional exaggeration control
• Text fidelity adjustment (cfg_weight)
• Speech randomness control (temperature)
• Alignment-informed inference for stability
• Output watermarking
• Pre-trained on 500,000 hours of cleaned data

Use Cases

• Voiceovers for videos/memes
• Game character dialogue
• AI agent speech generation
• Dramatic speech production
• Voice conversion applications

⚙️Input Parameters

text (string): Text to synthesize.
audio_prompt_path (string): Reference audio file (path or URL) containing the voice to clone.
exaggeration (number): Controls how expressive or exaggerated the speech sounds; higher values increase emotional intensity.
cfg_weight (number): Balances text fidelity and creativity; higher values keep the speech closer to the input text.
temperature (number): Adjusts randomness in speech generation; higher values produce more varied output.
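
The parameters above can be collected into a small typed container before being sent to the model. The following is a minimal sketch, not part of the model's own code: the class name `ChatterboxInput`, the `to_payload` helper, and the choice to treat every field except `text` as optional are assumptions made for illustration, since the listing does not state defaults or valid ranges.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ChatterboxInput:
    """Hypothetical container mirroring the listed input parameters."""

    text: str
    # Assumption: everything except `text` may be omitted; the listing
    # does not say which parameters are required or what the defaults are.
    audio_prompt_path: Optional[str] = None
    exaggeration: Optional[float] = None
    cfg_weight: Optional[float] = None
    temperature: Optional[float] = None

    def to_payload(self) -> dict:
        """Return a JSON-ready dict, leaving out parameters that were not set."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        return {k: v for k, v in asdict(self).items() if v is not None}


payload = ChatterboxInput(text="Hello there.", cfg_weight=0.3).to_payload()
print(payload)
```

Unset parameters are left out of the payload so that the model's own defaults apply, whatever they are.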

💡Usage Examples

Example 1

Input Parameters

{
  "text": "Then I would never talk to that person about boa constrictors, or primeval forests, or stars. I would bring myself down to his level.",
  "cfg_weight": 0.3,
  "temperature": 0.8,
  "exaggeration": 0.5,
  "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav"
}

Output Results

https://replicate.delivery/czjl/uIFQnxe7zE34e0JwyGe7CVvvZY5OrmDo6x9f0mwCvF7DcaHTB/output.wav
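
A request like Example 1 could be submitted from Python roughly as follows. This is a sketch under stated assumptions: it relies on the `replicate` Python client (`pip install replicate`) with a `REPLICATE_API_TOKEN` set in the environment, the helper name `synthesize` is hypothetical, and the model reference is a placeholder to be copied from the model's Replicate page, since this listing does not give it.

```python
from typing import Any, Callable, Optional

# Input values copied from Example 1.
EXAMPLE_INPUT = {
    "text": (
        "Then I would never talk to that person about boa constrictors, "
        "or primeval forests, or stars. I would bring myself down to his level."
    ),
    "cfg_weight": 0.3,
    "temperature": 0.8,
    "exaggeration": 0.5,
    "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav",
}


def synthesize(
    inputs: dict,
    model_ref: str,
    runner: Optional[Callable[..., Any]] = None,
) -> Any:
    """Run the model and return whatever the runner returns.

    `runner` defaults to `replicate.run`; passing a stand-in makes the
    function usable without network access (for example in tests).
    """
    if runner is None:
        import replicate  # imported lazily so the module loads without it

        runner = replicate.run
    return runner(model_ref, input=inputs)


# Usage (placeholder reference; replace with the value from the model page):
#   output = synthesize(EXAMPLE_INPUT, "owner/model-name")
#   print(output)
```

Depending on the client version, the returned value may be a URL string or a file-like object for the generated `output.wav`.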

Technical Specifications

Hardware Type: T4
Run Count: 672
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Zero-shot voice cloning, Emotional speech synthesis, AI agent speech generation, Video voiceovers, Game character dialogue, Dramatic speech production

Related Models

NVIDIA PDF to Podcast

Transform PDFs into AI podcasts for engaging on-the-go audio content.

Piper Persian Text-to-Speech

A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4. Piper is used in a variety of projects.

Minimax Voice Cloning

Clone voices for use with Minimax's speech-02-hd and speech-02-turbo models.