GetLLMs

Spark TTS

Spark TTS is a text-to-speech model that supports voice cloning from a reference recording and custom voice creation from gender, pitch, and speed settings.

Platform: Replicate
Text-to-Speech Generation, Voice Cloning, Speech Parameter Control
209 runs
L40S
License Check Required

🚀Function Overview

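Spark TTS converts text into speech in one of two modes: voice cloning, which mimics the voice in a prompt audio file, or voice creation, which generates a voice from gender, pitch, and speed parameters. Sampling parameters (temperature, top-k, top-p) control the randomness of generation.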

Key Features

  • Voice cloning using prompt audio files
  • Custom voice creation with gender/pitch/speed settings
  • Randomness control via temperature sampling
  • Top-k/top-p token selection options
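
The sampling controls listed above follow a widely used pattern for token-based generation. The sketch below shows how temperature, top-k, and top-p are commonly combined to narrow a probability distribution before a token is drawn. It is a generic illustration, not Spark TTS's actual implementation; the function names `filter_distribution` and `sample_token` are hypothetical, and the defaults simply mirror the values in the usage example on this page.

```python
import math
import random


def filter_distribution(logits, temperature=0.8, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    Generic illustration only; not taken from the Spark TTS source code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices by probability, highest first, and keep the top k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches the threshold.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Lower temperature, smaller top_k, or smaller top_p all shrink the set of candidate tokens, which generally makes output more repeatable at the cost of variety.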

Use Cases

  • Personalized voice assistants
  • Audiobook narration
  • Accessibility tools for text-to-speech
  • Content creation with cloned voices

⚙️Input Parameters

text (string)
Text to convert to speech. Required in both modes.

mode (string)
TTS mode. Voice cloning requires a prompt audio file whose voice is mimicked; voice creation generates speech from the specified gender, pitch, and speed parameters.

prompt_speech_path (string)
[Voice Cloning] Path to the prompt audio file. Required in voice cloning mode.

prompt_text (string)
[Voice Cloning] Transcript of the prompt audio. Optional, but providing it improves quality.

gender (string)
[Voice Creation] Voice gender. Required in voice creation mode.

pitch (string)
[Voice Creation] Voice pitch level. Required in voice creation mode.

speed (string)
[Voice Creation] Speaking speed. Required in voice creation mode.

temperature (number)
Sampling temperature (0.0-1.0). Controls randomness in generation.

top_k (integer)
Top-k sampling parameter. Limits token selection to the k most likely options.

top_p (number)
Top-p sampling parameter. Probability threshold for nucleus sampling.
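
The per-mode requirements above can be checked before a request is sent. The following is a minimal sketch of such a check, assuming the mode strings are `voice_cloning` and `voice_creation`; only `voice_creation` appears on this page, so `voice_cloning` is a guess. The helper `build_input` is hypothetical and not part of any Spark TTS or Replicate API.

```python
def build_input(text, mode, **params):
    """Assemble an input dict and check the per-mode required parameters.

    Hypothetical helper; "voice_cloning" is an assumed mode string.
    """
    if not text:
        raise ValueError("text is required in both modes")

    if mode == "voice_cloning":
        required = ("prompt_speech_path",)
    elif mode == "voice_creation":
        required = ("gender", "pitch", "speed")
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Treat missing keys and empty strings as absent.
    missing = [name for name in required if not params.get(name)]
    if missing:
        raise ValueError(f"{mode} mode requires: {', '.join(missing)}")

    # The parameter description gives 0.0-1.0 as the temperature range.
    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    payload = {"text": text, "mode": mode}
    payload.update({k: v for k, v in params.items() if v is not None})
    return payload
```

Failing early on a missing parameter avoids spending a paid run on a request that the model would reject or handle unpredictably.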

💡Usage Examples

Example 1

Input Parameters

{
  "mode": "voice_creation",
  "text": "白日依山尽,黄河入海流。",
  "pitch": "high",
  "speed": "low",
  "top_k": 50,
  "top_p": 0.95,
  "gender": "female",
  "prompt_text": "",
  "temperature": 0.8
}

Output Results

https://replicate.delivery/xezq/ZmtRBVfNDXWYPCLLVsA8XfQfS5IfSBnVhWnKwq8CfhFjdyfGF/generated_speech.wav
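
For reference, a request like Example 1 could be sent from Python roughly as follows. This is a sketch under several assumptions: the `replicate` client package is installed, the `REPLICATE_API_TOKEN` environment variable is set, and `MODEL_REF` is replaced with the model's real identifier, which this page does not give. The value shown is a placeholder, and `synthesize` is a hypothetical wrapper.

```python
# Placeholder: the actual "owner/name" identifier is not stated on this page.
MODEL_REF = "owner/spark-tts"

# Same values as Example 1 above.
EXAMPLE_INPUT = {
    "mode": "voice_creation",
    "text": "白日依山尽,黄河入海流。",
    "pitch": "high",
    "speed": "low",
    "top_k": 50,
    "top_p": 0.95,
    "gender": "female",
    "prompt_text": "",
    "temperature": 0.8,
}


def synthesize(model_ref=MODEL_REF, model_input=None):
    """Run the model on Replicate and return whatever the client returns.

    Needs network access and a valid REPLICATE_API_TOKEN.
    """
    # Imported lazily so this module loads even without the package installed.
    import replicate

    return replicate.run(model_ref, input=model_input or EXAMPLE_INPUT)
```

Depending on the client version, the result may be a URL string or a file-like object pointing at the generated WAV file, so it is worth inspecting the return value before writing it to disk.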

Technical Specifications

Hardware Type: L40S
Run Count: 209
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Voice Cloning, Custom Voice Creation, Text-to-Speech Generation, Personalized Voice Assistants, Audiobook Narration, Accessibility Tools, Content Creation