Spark TTS

Spark TTS is a text-to-speech model that supports voice cloning from an audio prompt and custom voice creation from gender, pitch, and speed settings.

Platform: Replicate

Text-to-Speech Generation, Voice Cloning, Speech Parameter Control

209 runs

L40S

License Check Required

🚀 Function Overview

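Converts text into speech in one of two modes: voice cloning, which mimics the voice in a prompt audio file, or voice creation, which builds a voice from gender, pitch, and speed settings. In both modes, the temperature, top-k, and top-p parameters control how random the generation is.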

Key Features

Voice cloning using prompt audio files
Custom voice creation with gender/pitch/speed settings
Randomness control via temperature sampling
Top-k/top-p token selection options
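
The sampling controls listed above can be illustrated on a toy distribution. The sketch below is a generic illustration of temperature scaling followed by top-k and top-p (nucleus) filtering. It is not Spark TTS's actual decoding code, and the names `filter_probs` and `sample_token` are hypothetical. The default values are copied from the usage example on this page, not from documented model defaults.

```python
import math
import random


def filter_probs(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Lower `temperature`, smaller `top_k`, and smaller `top_p` all narrow the set of candidate tokens, which tends to make output more repeatable at the cost of variety.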

Use Cases

• Personalized voice assistants
• Audiobook narration
• Accessibility tools for text-to-speech
• Content creation with cloned voices

⚙️ Input Parameters

text

string

Text to convert to speech. Required in both modes.

mode

string

TTS mode. Voice cloning requires a prompt audio file whose voice is mimicked; voice creation generates speech from the specified gender, pitch, and speed parameters.

prompt_speech_path

string

[Voice Cloning] Path to the prompt audio file. Required in voice cloning mode.

prompt_text

string

[Voice Cloning] Transcript of the prompt audio. Optional, but providing it improves quality.

gender

string

[Voice Creation] Voice gender. Required in voice creation mode.

pitch

string

[Voice Creation] Voice pitch level. Required in voice creation mode.

speed

string

[Voice Creation] Speaking speed. Required in voice creation mode.

temperature

number

Sampling temperature (0.0-1.0). Controls the randomness of generation.

top_k

integer

Top-k sampling parameter. Limits token selection to the k most probable options.

top_p

number

Top-p sampling parameter. Probability threshold for nucleus sampling.
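
Because the required parameters differ by mode, it can help to validate an input payload before submitting it. The following is a minimal sketch under stated assumptions: the helper `build_input` is hypothetical, the mode string `"voice_cloning"` is assumed by analogy with the `"voice_creation"` value that appears in the usage example, and the sampling defaults are copied from that example, not from documented model defaults.

```python
VOICE_CREATION = "voice_creation"  # value taken from the usage example
VOICE_CLONING = "voice_cloning"    # assumed identifier, not confirmed by the source


def build_input(text, mode, *, prompt_speech_path=None, prompt_text=None,
                gender=None, pitch=None, speed=None,
                temperature=0.8, top_k=50, top_p=0.95):
    """Assemble an input payload, enforcing the per-mode required fields."""
    if not text:
        raise ValueError("text is required in both modes")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0-1.0")

    payload = {"text": text, "mode": mode}

    if mode == VOICE_CLONING:
        if not prompt_speech_path:
            raise ValueError("prompt_speech_path is required in voice cloning mode")
        payload["prompt_speech_path"] = prompt_speech_path
        if prompt_text:  # optional, but improves quality when supplied
            payload["prompt_text"] = prompt_text
    elif mode == VOICE_CREATION:
        required = {"gender": gender, "pitch": pitch, "speed": speed}
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError("missing for voice creation mode: " + ", ".join(missing))
        payload.update(required)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    payload.update(temperature=temperature, top_k=top_k, top_p=top_p)
    return payload
```

Keeping the check on the client side surfaces a missing field immediately instead of after a remote run has been started.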

💡 Usage Examples

Example 1

Input Parameters

{
  "mode": "voice_creation",
  "text": "白日依山尽，黄河入海流。",
  "pitch": "high",
  "speed": "low",
  "top_k": 50,
  "top_p": 0.95,
  "gender": "female",
  "prompt_text": "",
  "temperature": 0.8
}

Output Results

https://replicate.delivery/xezq/ZmtRBVfNDXWYPCLLVsA8XfQfS5IfSBnVhWnKwq8CfhFjdyfGF/generated_speech.wav
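
A run like the one above could be submitted from Python with the `replicate` client package and the resulting audio saved locally. This is a sketch, not the documented integration: `MODEL_REF` is a hypothetical placeholder (this page does not give the model's owner, name, or version), the helpers `output_filename` and `run_and_save` are hypothetical, and the code assumes the model returns a single file whose string form is its URL.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse

# Hypothetical placeholder: substitute the real "owner/name:version" reference.
MODEL_REF = "owner/spark-tts:version-id"


def output_filename(url, default="generated_speech.wav"):
    """Derive a local filename from the last path segment of the output URL."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def run_and_save(payload, model_ref=MODEL_REF):
    """Run the model and download its output (assumes a single-file result)."""
    import replicate  # third-party client; reads REPLICATE_API_TOKEN from the environment

    output = replicate.run(model_ref, input=payload)
    url = str(output)  # assumption: the output converts to its delivery URL
    filename = output_filename(url)
    urllib.request.urlretrieve(url, filename)
    return filename
```

The `replicate` import sits inside `run_and_save` so that the filename helper can be used and tested without the package installed or an API token configured.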

Technical Specifications

Hardware Type: L40S
Run Count: 209
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Voice Cloning, Custom Voice Creation, Text-to-Speech Generation, Personalized Voice Assistants, Audiobook Narration, Accessibility Tools, Content Creation

Related Models

Minimax Speech-02-HD

A text-to-audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications such as voiceovers and audiobooks.

Cog Orpheus 3B

A Spanish and English text-to-speech model from Canopy Labs (3b-es_it-ft-research_release).

Dia 1.6B

Dia 1.6B by Nari Labs generates realistic dialogue audio from text, including non-verbal cues, and supports voice cloning.