GetLLMs

Minimax Speech-02 Turbo

Minimax Speech-02 Turbo offers real-time voice synthesis and voice cloning.

Platform: Replicate
Text-to-Speech, Voice Cloning, Multilingual TTS, Real-Time Audio
57.6k runs
0.06 per thousand input tokens
Commercial

🚀Function Overview

A real-time text-to-audio model with voice cloning, emotional expression controls, and multilingual support.

Key Features

  • Voice cloning with 99% similarity using custom or pre-built voices
  • Emotion control via auto-detection or manual settings
  • Support for 30+ languages including regional accents
  • Granular audio controls (pitch, speed, volume)
  • Low-latency processing for real-time applications
  • English text normalization for improved number reading

Use Cases

  • Real-time voice synthesis for interactive applications
  • Audiobook and podcast narration
  • Multilingual customer service chatbots
  • Personalized voice cloning solutions
  • Emotion-aware voice interfaces
  • Accessibility tools for text-to-speech conversion

⚙️Input Parameters

text

string

Text to convert to speech. Each character counts as 1 token, up to a maximum of 5000 characters. Insert <#x#> between words to add a pause, where x is the pause duration in seconds (0.01-99.99).
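The pause syntax and length limit described above can be sketched as a small helper. This is a minimal illustration, not part of the model's API: the names `pause` and `join_with_pauses` are hypothetical, the two-decimal formatting and the spaces around the marker are assumptions, and counting the marker characters toward the 5000-character limit is a conservative guess.

```python
MAX_CHARS = 5000                    # character limit stated for the text parameter
MIN_PAUSE, MAX_PAUSE = 0.01, 99.99  # allowed pause range in seconds


def pause(seconds: float) -> str:
    """Return a pause marker such as <#0.50#> for the given duration."""
    if not MIN_PAUSE <= seconds <= MAX_PAUSE:
        raise ValueError(
            f"pause must be between {MIN_PAUSE} and {MAX_PAUSE} seconds"
        )
    # Two decimals is an assumption based on the 0.01 step in the stated range.
    return f"<#{seconds:.2f}#>"


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with the same pause marker between each pair."""
    text = f" {pause(seconds)} ".join(segments)
    # Assumption: marker characters count toward the limit like any other text.
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    return text


if __name__ == "__main__":
    print(join_with_pauses(["Welcome back.", "Here is today's summary."], 0.8))
```

The resulting string can be passed as the `text` input unchanged.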

voice_id

string

Desired voice ID. Use a voice ID you have trained (https://replicate.com/minimax/voice-cloning), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
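Because `voice_id` accepts either a system voice or a voice you trained yourself, a client may want to tell the two apart before sending a request. A minimal sketch, assuming the system voice IDs are case-sensitive and that any other value refers to a custom trained voice; `is_system_voice` is a hypothetical helper name.

```python
# System voice IDs copied from the parameter description above.
SYSTEM_VOICE_IDS = frozenset({
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
})


def is_system_voice(voice_id: str) -> bool:
    """Return True if voice_id is one of the listed system voices.

    Assumption: matching is exact and case-sensitive. Any other value is
    treated as a custom voice ID from the voice-cloning model.
    """
    return voice_id in SYSTEM_VOICE_IDS


if __name__ == "__main__":
    for candidate in ("Deep_Voice_Man", "my_custom_voice"):
        kind = "system" if is_system_voice(candidate) else "custom (assumed)"
        print(f"{candidate}: {kind}")
```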

speed

number

Speech speed

volume

number

Speech volume

pitch

integer

Speech pitch

emotion

string

Speech emotion

english_normalization

boolean

Enable English text normalization for better number reading (slightly increases latency)

sample_rate

integer

Sample rate for the generated speech

bitrate

integer

Bitrate for the generated speech

channel

string

Number of audio channels, given as a string (for example, "mono")

language_boost

string

Enhance recognition of specific languages and dialects

💡Usage Examples

Example 1

Input Parameters

{
  "text": "Speech-02-series is a Text-to-Audio and voice cloning technology that offers voice synthesis, emotional expression, and multilingual capabilities.\n\nThe HD version is optimized for high-fidelity applications like voiceovers and audiobooks. While the turbo one is designed for real-time applications with low latency.\n\nWhen using this model on Replicate, each character represents 1 token.",
  "pitch": 0,
  "speed": 1,
  "volume": 1,
  "bitrate": 128000,
  "channel": "mono",
  "emotion": "angry",
  "voice_id": "Deep_Voice_Man",
  "sample_rate": 32000,
  "language_boost": "English",
  "english_normalization": true
}

Output Results

https://replicate.delivery/xezq/SnPxXgl26yaAApm29BJpcHRl5PyxHAxpDt97TP59rPiFeWUKA/tmp517d49p_.mp3
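A request like Example 1 could be sent with the Replicate Python client roughly as follows. This is a sketch under stated assumptions: the model slug `minimax/speech-02-turbo` is inferred from the platform and the voice-cloning URL and should be confirmed on Replicate, the values in `build_input` are copied from the example rather than documented defaults, and `build_input` and `synthesize` are hypothetical helper names. The client reads the API token from the `REPLICATE_API_TOKEN` environment variable.

```python
import urllib.request

MODEL = "minimax/speech-02-turbo"  # assumed slug; confirm on Replicate


def build_input(text: str, voice_id: str = "Deep_Voice_Man", **overrides) -> dict:
    """Build an input payload using the values shown in Example 1."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "pitch": 0,
        "speed": 1,
        "volume": 1,
        "bitrate": 128000,
        "channel": "mono",
        "sample_rate": 32000,
        "language_boost": "English",
        "english_normalization": True,
    }
    payload.update(overrides)  # e.g. emotion="angry"
    return payload


def synthesize(text: str, out_path: str = "speech.mp3", **overrides) -> str:
    """Run the model and save the returned audio to out_path."""
    import replicate  # third-party client: pip install replicate

    output = replicate.run(MODEL, input=build_input(text, **overrides))
    # Depending on the client version, the output is a file-like object
    # or a URL string; handle both.
    if hasattr(output, "read"):
        audio = output.read()
    else:
        with urllib.request.urlopen(str(output)) as response:
            audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path


if __name__ == "__main__":
    print(synthesize("Hello from Speech-02 Turbo.", emotion="angry"))
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the exact input before a billable run.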

Technical Specifications

Hardware Type:
Run Count: 57.6k
Commercial Use: Supported
Pricing: 0.06 per thousand input tokens
Platform: Replicate
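Since each character of the input text counts as one token, the listed price translates directly into a per-request estimate. The sketch below assumes that Python's `len()` matches how the platform counts characters and that no other charges apply. The listing does not state a currency, so the result is in whatever unit the price uses. `estimate_cost` is a hypothetical helper name.

```python
PRICE_PER_THOUSAND_TOKENS = 0.06  # from the listing; currency not stated
MAX_CHARS = 5000                  # maximum length of the text parameter


def estimate_cost(text: str) -> float:
    """Estimate the cost of one request, treating each character as 1 token."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    return len(text) / 1000 * PRICE_PER_THOUSAND_TOKENS


if __name__ == "__main__":
    sample = "Speech-02 Turbo is designed for real-time applications."
    print(f"{len(sample)} characters, estimated cost {estimate_cost(sample):.5f}")
```

Under these assumptions, a request at the 5000-character maximum comes to about 0.30 in the listing's price unit.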

Related Keywords

Real-time voice synthesis, Voice cloning, Emotional expression, Multilingual support, Low-latency processing, Audiobook and podcast narration, Customer service chatbots, Personalized voice solutions