GetLLMs

Chatterbox TTS

Chatterbox TTS is a zero-shot voice cloning model that generates expressive speech for AI agents, video voiceovers, and similar applications.

Platform: Replicate
Tags: Zero-shot TTS, Voice Cloning, Emotion Control, Speech Synthesis
672 runs
T4
License Check Required

🚀 Function Overview

Chatterbox TTS is a zero-shot text-to-speech model that clones a voice from a reference audio clip and generates expressive speech in that voice, with adjustable emotional intensity and speech patterns.

Key Features

  • Voice cloning from audio references
  • Emotional exaggeration control
  • Text fidelity adjustment (cfg_weight)
  • Speech randomness control (temperature)
  • Alignment-informed inference for stability
  • Output watermarking
  • Pre-trained on 500,000 hours of cleaned data

Use Cases

  • Voiceovers for videos/memes
  • Game character dialogue
  • AI agent speech generation
  • Dramatic speech production
  • Voice conversion applications

⚙️ Input Parameters

  • text (string): Text to synthesize
  • audio_prompt_path (string): Reference audio file whose voice is cloned
  • exaggeration (number): Controls how expressive or exaggerated the speech sounds; higher values increase emotional intensity
  • cfg_weight (number): Balances text fidelity and creativity; higher values make speech closer to the input text
  • temperature (number): Adjusts randomness in speech generation; higher values produce more varied output, lower values more consistent output
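The listing does not document how Chatterbox applies cfg_weight and temperature internally. As a rough illustration only, the sketch below shows the generic mechanisms these names usually refer to in models that sample discrete tokens: a classifier-free guidance mix of conditional and unconditional scores, followed by a temperature-scaled softmax. The function names and the toy logits are assumptions made for this example and are not taken from the model's code.

```python
import math


def apply_cfg(cond_logits, uncond_logits, cfg_weight):
    """Generic classifier-free guidance: move the conditional scores
    further away from the unconditional ones as cfg_weight grows."""
    return [c + cfg_weight * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax_with_temperature(logits, temperature):
    """Turn scores into probabilities. Temperatures above 1 flatten the
    distribution (more variety); temperatures below 1 sharpen it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for three candidate tokens (made-up numbers).
cond = [2.0, 1.0, 0.1]
uncond = [1.0, 1.0, 1.0]

guided = apply_cfg(cond, uncond, cfg_weight=0.3)
sharp = softmax_with_temperature(guided, temperature=0.5)
flat = softmax_with_temperature(guided, temperature=1.5)

print("top probability at temperature 0.5:", round(max(sharp), 3))
print("top probability at temperature 1.5:", round(max(flat), 3))
```

With a lower temperature the most likely token dominates, so repeated runs sound more alike; with a higher temperature the probabilities are closer together, so runs differ more.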

💡 Usage Examples

Example 1

Input Parameters

{
  "text": "Then I would never talk to that person about boa constrictors, or primeval forests, or stars. I would bring myself down to his level.",
  "cfg_weight": 0.3,
  "temperature": 0.8,
  "exaggeration": 0.5,
  "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav"
}
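A minimal sketch of how these inputs could be packaged for Replicate's HTTP predictions endpoint, using only the Python standard library. The listing does not give the model's version ID, so MODEL_VERSION is a hypothetical placeholder, and the token value is a dummy. The request is built but not sent.

```python
import json
import os
import urllib.request

REPLICATE_PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"

# Hypothetical placeholder: look up the real version ID on the model's Replicate page.
MODEL_VERSION = "<chatterbox-version-id>"


def build_prediction_request(inputs, version=MODEL_VERSION, token=None):
    """Build (but do not send) a POST request that would create a prediction."""
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    body = json.dumps({"version": version, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        REPLICATE_PREDICTIONS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same values as Example 1 above.
example_inputs = {
    "text": (
        "Then I would never talk to that person about boa constrictors, "
        "or primeval forests, or stars. I would bring myself down to his level."
    ),
    "cfg_weight": 0.3,
    "temperature": 0.8,
    "exaggeration": 0.5,
    "audio_prompt_path": "https://maskgct.github.io/audios/celeb_samples/rick_0.wav",
}

# "YOUR_TOKEN" is a dummy value for illustration.
request = build_prediction_request(example_inputs, token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) would require a valid API token and a real version ID; the response format is not covered by this listing.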

Output Results

https://replicate.delivery/czjl/uIFQnxe7zE34e0JwyGe7CVvvZY5OrmDo6x9f0mwCvF7DcaHTB/output.wav
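The result is a WAV file. As a sketch, the helper below reads basic properties (channels, sample rate, duration) from WAV bytes with Python's standard wave module. It assumes the file is PCM-encoded; the wave module raises wave.Error for encodings it does not support, and the listing does not state which encoding the model produces. The silent clip and its 24000 Hz sample rate are made-up test data, not properties of the model's output.

```python
import io
import wave


def describe_wav(wav_bytes):
    """Return channel count, sample rate, and duration of PCM WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds=1.0, sample_rate=24000):
    """Create a mono, 16-bit silent WAV in memory, used here as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


info = describe_wav(make_silent_wav(seconds=0.5))
print(info)
```

For a real run, the bytes downloaded from the output URL would be passed to describe_wav in place of the silent clip.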

Technical Specifications

Hardware Type: T4
Run Count: 672
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Zero-shot voice cloning, Emotional speech synthesis, AI agent speech generation, Video voiceovers, Game character dialogue, Dramatic speech production