GetLLMs

Voice Synthesis Models

Find 13 advanced text-to-speech and voice generation models.

Chatterbox TTS

thomcle/chatterbox-tts

Experience Chatterbox TTS for state-of-the-art zero-shot voice cloning and expressive speech synthesis. Perfect for AI agents and video voiceovers.

Zero-shot TTS, Voice Cloning, Emotion Control, +1
NVIDIA PDF to Podcast

nvidia/pdf-to-podcast

Transform your PDFs into engaging AI podcasts with NVIDIA PDF to Podcast. Convert documents to audio, customize voices, and set durations for rich, on-the-go content.

PDF to Podcast, Document-to-Audio, AI Content Conversion, +1
Piper Persian Text-to-Speech

mosnfar/piper_persian

Get Piper Persian Text-to-Speech for efficient, local speech synthesis. Optimized for Raspberry Pi, it converts Persian text to audio, ideal for voice assistants and accessibility.

Persian TTS, Text-to-Speech, Raspberry Pi Optimization, +1
Minimax Voice Cloning

minimax/voice-cloning

Commercial

Minimax Voice Cloning allows you to create custom AI voices from short audio samples, perfect for personalized TTS applications and generating unique audio content.

Voice Cloning, Audio Training, TTS Integration
4.1k
3 per output
Minimax Speech-02 Turbo

minimax/speech-02-turbo

Commercial

Experience Minimax Speech-02 Turbo for real-time voice synthesis and voice cloning. Unlock emotional expression and multilingual support for dynamic audio applications.

Text-to-Speech, Voice Cloning, Multilingual TTS, +1
57.6k
0.06 per thousand input tokens
Minimax Speech-02-HD

minimax/speech-02-hd

Commercial

Discover Minimax Speech-02-HD: advanced text-to-audio with emotional expression and multilingual support for high-fidelity voiceovers and audiobooks.

Text-to-Speech, Emotional Voice Synthesis, Multilingual TTS, +1
95.6k
0.10 per thousand input tokens
Kimi Audio 7B Instruct

zsxkib/kimi-audio-7b-instruct

Commercial

Explore Kimi Audio 7B Instruct for universal audio processing, speech transcription, and emotion recognition. Unlock advanced audio AI capabilities now.

Universal Audio Processing, Speech Transcription, Emotion Recognition, +1
216
NVIDIA L40S
PrunaAI Dia 1.6B

prunaai/dia-1.6b

PrunaAI Dia 1.6B revolutionizes expressive voice generation with multi-speaker dialogue and non-verbal cues. Create dynamic, natural-sounding audio for diverse applications.

Text-to-Speech, Dialogue Generation, Audio Synthesis
1.7k
A100 (80GB)
Dia 1.6B

zsxkib/dia

Commercial

Unlock realistic dialogue audio with Dia 1.6B. Generate multi-speaker conversations, non-verbal cues, and even clone voices for your projects.

Dialogue Synthesis, Voice Cloning, Text-to-Speech, +1
5.9k
L40S
DeepAudio-V1 Model

acappemin/deepaudio-v1

DeepAudio-V1 Model excels in Video-to-Speech and Video-to-Audio generation, transforming video inputs into synchronized speech and audio. Experience seamless multimedia content creation.

Video-to-Speech, Video-to-Audio, End-to-End Generation
Cog Orpheus 3B

gianpaj/cog-orpheus-3b-0.1-ft

Experience Cog Orpheus 3B, a multilingual text-to-speech model offering voice cloning and emotional speech. Perfect for real-time applications and expressive audio.

Multilingual TTS, Voice Cloning, Emotion Control
F5-TTS Vietnamese Text-to-Speech

tuannha/f5-tts-vi

F5-TTS Vietnamese Text-to-Speech model by EraX-AI offers zero-shot voice cloning and adjustable speech for personalized voiceovers. Get natural Vietnamese audio.

Vietnamese TTS, Zero-Shot Voice Cloning, Speech Synthesis
Spark TTS

jichengdu/spark-tts

Spark TTS offers text-to-speech generation with voice cloning for personalized audio, plus custom voice creation with adjustable speech parameters.

Text-to-Speech Generation, Voice Cloning, Speech Parameter Control