Spark TTS
Spark TTS is a text-to-speech model for voice cloning and custom voice creation.
🚀 Function Overview
Converts text into speech in one of two modes: voice cloning, which mimics the voice in an audio prompt, or voice creation, which builds a voice from gender, pitch, and speed parameters. Sampling parameters control the randomness of generation.
Key Features
- Voice cloning using prompt audio files
- Custom voice creation with gender/pitch/speed settings
- Randomness control via temperature sampling
- Top-k/top-p token selection options
Use Cases
- Personalized voice assistants
- Audiobook narration
- Accessibility tools for text-to-speech
- Content creation with cloned voices
⚙️ Input Parameters
text
string - Text to convert to speech. Required in both modes.
mode
string - TTS mode. Voice cloning mimics the voice in a prompt audio file; voice creation generates speech from the specified gender, pitch, and speed parameters.
prompt_speech_path
string - [Voice Cloning] Path to the prompt audio file whose voice is mimicked. Required in voice cloning mode.
prompt_text
string - [Voice Cloning] Transcript of the prompt audio. Optional, but providing it improves quality.
gender
string - [Voice Creation] Voice gender. Required in voice creation mode.
pitch
string - [Voice Creation] Voice pitch level. Required in voice creation mode.
speed
string - [Voice Creation] Speaking speed. Required in voice creation mode.
temperature
number - Sampling temperature (0.0-1.0). Controls the randomness of generation; higher values produce more varied output.
top_k
integer - Top-k sampling parameter. Restricts token selection to the k most probable tokens.
top_p
number - Top-p (nucleus) sampling parameter. Restricts token selection to the smallest set of most probable tokens whose cumulative probability reaches p.
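The three sampling parameters (temperature, top_k, top_p) usually act together on every generated token. The sketch below shows the common order of operations on a plain list of logits. It is a generic illustration in plain Python, assuming the usual temperature, then top-k, then top-p pipeline; it is not Spark TTS's own decoding code, and the function name `sample_token` is hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_k=50, top_p=0.95, rng=random):
    """Draw one token index from raw logits: temperature, then top-k, then top-p.

    Hypothetical helper for illustration; not part of Spark TTS.
    """
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding (pick the argmax).
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the max logit for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens (top_k <= 0 disables the filter).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Top-p: renormalize over the top-k survivors, then keep the smallest
    # most-probable prefix whose cumulative probability reaches top_p.
    k_mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # random.choices accepts unnormalized weights, so no final rescaling is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(0)  # seeded for reproducibility
print(sample_token([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_k=50, top_p=0.95, rng=rng))
```

Lowering `temperature` or `top_p`, or setting `top_k` to 1, pushes each draw toward the single most probable token; raising them lets less likely tokens through, which makes the output more varied.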
💡 Usage Examples
Example 1
Input Parameters
{
  "mode": "voice_creation",
  "text": "白日依山尽,黄河入海流。",
  "pitch": "high",
  "speed": "low",
  "top_k": 50,
  "top_p": 0.95,
  "gender": "female",
  "prompt_text": "",
  "temperature": 0.8
}
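Because the required fields depend on the mode, it can help to check an input dictionary before submitting it. The sketch below encodes the per-mode requirements from the parameter list; `validate_input` and `REQUIRED_BY_MODE` are hypothetical names, not part of Spark TTS or any Replicate client. It also assumes the cloning-mode value is spelled `voice_cloning`, since only `voice_creation` appears in the example.

```python
import json

# Hypothetical table of per-mode required fields, taken from the parameter list.
# ASSUMPTION: the cloning-mode value is "voice_cloning"; only "voice_creation"
# is confirmed by the usage example.
REQUIRED_BY_MODE = {
    "voice_creation": ("gender", "pitch", "speed"),
    "voice_cloning": ("prompt_speech_path",),
}


def validate_input(params):
    """Return a list of problems with a Spark TTS input dict (empty if none)."""
    problems = []
    if not params.get("text"):
        problems.append("text is required in both modes")
    mode = params.get("mode")
    if mode not in REQUIRED_BY_MODE:
        problems.append(f"unknown mode: {mode!r}")
        return problems
    # Check the fields that only the selected mode requires.
    for name in REQUIRED_BY_MODE[mode]:
        if not params.get(name):
            problems.append(f"{name} is required in {mode} mode")
    # The listing documents temperature as ranging from 0.0 to 1.0.
    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("temperature should be within 0.0-1.0")
    return problems


example = json.loads(
    '{ "mode": "voice_creation", "text": "白日依山尽,黄河入海流。", "pitch": "high",'
    ' "speed": "low", "top_k": 50, "top_p": 0.95, "gender": "female",'
    ' "prompt_text": "", "temperature": 0.8 }'
)
print(validate_input(example))  # → []
```

An empty `prompt_text` is harmless here because the transcript only matters in voice cloning mode.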
Technical Specifications
- Hardware Type: L40S
- Run Count: 209
- Commercial Use: Unknown/Restricted
- Platform: Replicate
Related Models
Minimax Speech-02-HD
A text-to-audio (T2A) model offering voice synthesis, emotional expression, and multilingual support. Optimized for high-fidelity applications such as voiceovers and audiobooks.
Cog Orpheus 3B
Spanish and Italian text-to-speech model from Canopy Labs (3b-es_it-ft-research_release)
Dia 1.6B
Dia 1.6B by Nari Labs generates realistic dialogue audio from text, including non-verbal cues, and supports voice cloning.