Cog Orpheus 3B

Cog Orpheus 3B is a multilingual text-to-speech model hosted on Replicate.

Platform: Replicate

Tags: Multilingual TTS, Voice Cloning, Emotion Control

94 runs

L40S

License Check Required

🚀Function Overview

A multilingual text-to-speech model that generates expressive speech, with emotion control through inline tags and zero-shot voice cloning.

Key Features

• Human-like speech with natural intonation
• Zero-shot voice cloning, with no speaker-specific fine-tuning required
• Emotion and intonation control via inline tags such as <giggle>
• Low-latency streaming for real-time applications
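
Tags are written inline in the input text, as <giggle> in Example 1 shows. The Python sketch below finds and strips such tags, for instance to check a script before sending it or to display a clean transcript. The tag syntax is inferred from that single example, the helper names find_tags and strip_tags are hypothetical, and the full set of supported tags is not listed on this page.

```python
import re

# Matches inline tags such as "<giggle>". The syntax is an assumption inferred
# from the usage example; the supported tag list is not documented here.
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_tags(text: str) -> list[str]:
    """Return the names of all inline tags in the order they appear."""
    return TAG_PATTERN.findall(text)


def strip_tags(text: str) -> str:
    """Remove inline tags and collapse the whitespace they leave behind."""
    without_tags = TAG_PATTERN.sub("", text)
    return " ".join(without_tags.split())


sample = "Hola, me llamo Javi, encantado de conocerte <giggle>"
print(find_tags(sample))   # ['giggle']
print(strip_tags(sample))  # Hola, me llamo Javi, encantado de conocerte
```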

Use Cases

• Real-time voice streaming systems
• Localized virtual assistants
• Audiobook narration with emotional expression
• Accessibility tools for speech generation

⚙️Input Parameters

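text (string): Text to convert to speech. May include inline emotion tags such as <giggle>.

voice (string): Name of the voice to use, for example "javi".

temperature (number): Sampling temperature. Lower values make generation more predictable; higher values add variation.

top_p (number): Cumulative-probability cutoff for nucleus (top-p) sampling. Only the most likely tokens whose combined probability reaches this value are considered at each step.

repetition_penalty (number): Penalty applied to tokens that have already been generated. Values above 1.0 discourage repetition.

max_new_tokens (integer): Maximum number of tokens to generate, which caps the length of the output audio.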
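
The temperature, top_p, and repetition_penalty parameters are standard decoding controls for autoregressive models. The Python sketch below shows one common way they are applied to a single step's logits: a CTRL-style repetition penalty, then temperature scaling, then nucleus filtering. It is a generic illustration, not the Orpheus or Replicate implementation; the function name next_token_distribution is hypothetical, and its default values are copied from Example 1 rather than from any documented defaults.

```python
import math
import random


def next_token_distribution(
    logits: list[float],
    previous_tokens: set[int],
    temperature: float = 0.6,
    top_p: float = 0.95,
    repetition_penalty: float = 1.1,
) -> dict[int, float]:
    """Turn one step's raw logits into a filtered probability distribution.

    Generic decoding sketch (assumption); not taken from the model's source.
    """
    adjusted = list(logits)

    # Repetition penalty (CTRL-style): push already-generated tokens toward
    # lower scores by dividing positive logits and multiplying negative ones.
    for token in previous_tokens:
        if adjusted[token] > 0:
            adjusted[token] /= repetition_penalty
        else:
            adjusted[token] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in adjusted]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus (top-p) filtering: keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p, then renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


# Toy example: four candidate tokens, token 0 was generated before.
dist = next_token_distribution([2.0, 1.0, 0.5, -1.0], previous_tokens={0})
token = random.choices(list(dist), weights=list(dist.values()))[0]
```

Applying the penalty before temperature scaling is a common choice, but implementations differ in ordering and in exactly how each step is defined.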

💡Usage Examples

Example 1

Input Parameters

{
  "text": "Hola, me llamo Javi, encantado de conocerte <giggle>",
  "voice": "javi",
  "temperature": 0.6,
  "top_p": 0.95,
  "repetition_penalty": 1.1,
  "max_new_tokens": 1200
}
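
The Example 1 input can also be assembled and sanity-checked in code before a request is sent. In this Python sketch, the Example 1 values serve as fallbacks (they are not documented defaults), and the range checks are generic assumptions about sampling parameters rather than limits stated for this model; build_input is a hypothetical helper.

```python
import json

# Values mirror Example 1 on this page; they are NOT documented defaults.
EXAMPLE_SETTINGS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.1,
    "max_new_tokens": 1200,
}


def build_input(text: str, voice: str, **overrides) -> dict:
    """Assemble an input payload and apply generic (assumed) sanity checks."""
    params = {"text": text, "voice": voice, **EXAMPLE_SETTINGS, **overrides}
    unknown = set(params) - {"text", "voice", *EXAMPLE_SETTINGS}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not params["text"].strip():
        raise ValueError("text must not be empty")
    if params["temperature"] <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < params["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if params["repetition_penalty"] <= 0:
        raise ValueError("repetition_penalty must be positive")
    if not isinstance(params["max_new_tokens"], int) or params["max_new_tokens"] <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    return params


payload = build_input("Hola, me llamo Javi, encantado de conocerte <giggle>", "javi")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The resulting dictionary can be passed as the input argument of the Replicate Python client, e.g. replicate.run("OWNER/MODEL:VERSION", input=payload), with the REPLICATE_API_TOKEN environment variable set. The model reference there is a placeholder, since this page does not give the exact identifier.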

Output Results

https://replicate.delivery/xezq/P9lQt0WoXUa1NlrgAVKFAhTxbEBMRv1NSSAs9oqr1VmKFlIF/output.wav
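
The output is a URL pointing to a WAV file. A minimal Python sketch, using only the standard library, for downloading the result and reading its duration from the WAV header; download_audio and wav_duration_seconds are hypothetical helper names.

```python
import io
import urllib.request
import wave


def download_audio(url: str) -> bytes:
    """Fetch the generated file over HTTP(S) and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def wav_duration_seconds(data: bytes) -> float:
    """Parse the WAV header and return the clip length in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For example, data = download_audio(url) followed by wav_duration_seconds(data) gives the clip length, and writing data to a file opened in binary mode saves it locally.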

Technical Specifications

Hardware Type: L40S
Run Count: 94
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Models

Minimax Speech-02-HD

A text-to-audio (T2A) model offering voice synthesis, emotional expression, and multilingual capabilities, optimized for high-fidelity applications such as voiceovers and audiobooks.

Spark TTS

A model for text-to-speech generation with voice cloning and adjustable vocal parameters.

Dia 1.6B

Dia 1.6B by Nari Labs generates realistic dialogue audio from text, including non-verbal cues, and supports voice cloning.