F5-TTS Vietnamese Text-to-Speech

F5-TTS Vietnamese Text-to-Speech is a model from EraX-AI that performs zero-shot voice cloning and produces natural-sounding Vietnamese speech.

Platform: Replicate

Vietnamese TTS, Zero-Shot Voice Cloning, Speech Synthesis

79 runs

License Check Required

🚀 Function Overview

A Vietnamese text-to-speech model that converts input text into speech, cloning the voice from a reference audio clip and allowing the speaking speed to be adjusted.

Key Features

•Converts Vietnamese text to natural speech
•Supports zero-shot voice cloning via reference audio
•Adjustable speech speed
•Automatic transcription of the reference audio when no reference text is provided

Use Cases

•Creating personalized voiceovers for Vietnamese content
•Developing voice assistants with custom voices
•Accessibility tools for text-to-audio conversion
•Generating speech samples for language learning

⚙️ Input Parameters

input_text (string): The text to convert to speech.

reference_audio (string): Reference audio file for the voice to clone. Only wav files are accepted.

reference_text (string): Transcript of the reference audio. It must match what is spoken in the audio; if left empty, it is extracted from the audio automatically.

speed (number): Speaking speed.
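
As a rough illustration of how the four parameters fit together, the Python sketch below assembles them into an input dictionary. The helper name `build_input`, the default speed of 1.0 (taken from the usage example, not a documented default), and the rejection of non-positive speeds are assumptions made for this sketch. Only the wav-only restriction and the empty `reference_text` behavior come from the parameter descriptions.

```python
from urllib.parse import urlparse


def build_input(input_text, reference_audio, reference_text="", speed=1.0):
    """Assemble an input dictionary for the model (hypothetical helper)."""
    if not input_text.strip():
        raise ValueError("input_text must not be empty")

    # The parameter description says only wav files are accepted.
    # Check the path part so URLs with query strings still pass.
    path = urlparse(reference_audio).path
    if not path.lower().endswith(".wav"):
        raise ValueError("reference_audio must point to a wav file")

    # Assumption: a zero or negative speed is meaningless, so reject it here.
    if speed <= 0:
        raise ValueError("speed must be positive")

    return {
        "input_text": input_text,
        "reference_audio": reference_audio,
        # Left empty, the model is described as extracting the
        # transcript from the reference audio automatically.
        "reference_text": reference_text,
        "speed": speed,
    }
```

Calling `build_input("Xin chào", "https://example.com/voice.wav")` (a made-up URL) would leave `reference_text` empty and rely on the automatic extraction described above.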

💡 Usage Examples

Example 1

Input Parameters

{
  "speed": 1,
  "input_text": "Đây là đài tiếng nói Việt Nam. Phát thanh từ kênh trung ương Hà Nội.",
  "reference_text": "Càng trưởng thành bạn càng nhận ra tranh luận đúng sai, hơn thua cũng không còn quan trọng nữa.",
  "reference_audio": "https://replicate.delivery/pbxt/Mr9f6kUPs0oOP841x4pV4n0vSBafLSf4yEgSjYkOAqN57CUl/ash_vn.wav"
}

Output Results

https://replicate.delivery/czjl/OB8PX8kbeSXhHi0228AvrQjIKFooSTaViG8Xff5e9DzYihPSB/c4cded21-cf7f-458b-b357-48492f3671dd.wav
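
Example 1 could be reproduced from Python roughly as follows. This is a sketch that assumes the `replicate` client package is installed and an API token is set in the environment. `MODEL_REF` is a placeholder because this page does not give the model's identifier, and the `run` parameter is a hypothetical hook added here so the function can be exercised without network access.

```python
# Placeholder: this page does not state the model's identifier on Replicate.
MODEL_REF = "owner/model-name:version"

# Input values copied from Example 1.
EXAMPLE_INPUT = {
    "speed": 1,
    "input_text": "Đây là đài tiếng nói Việt Nam. Phát thanh từ kênh trung ương Hà Nội.",
    "reference_text": "Càng trưởng thành bạn càng nhận ra tranh luận đúng sai, hơn thua cũng không còn quan trọng nữa.",
    "reference_audio": "https://replicate.delivery/pbxt/Mr9f6kUPs0oOP841x4pV4n0vSBafLSf4yEgSjYkOAqN57CUl/ash_vn.wav",
}


def synthesize(model_ref, payload, run=None):
    """Run the model and return whatever the client returns for the output.

    `run` defaults to `replicate.run`; passing another callable is a
    convenience of this sketch for offline testing.
    """
    if run is None:
        # Third-party package, imported lazily so the module loads without it.
        import replicate

        run = replicate.run
    # Depending on the client version, the result may be a URL string
    # or a file-like object pointing at the generated wav file.
    return run(model_ref, input=payload)


# Usage, once MODEL_REF is replaced with the real identifier:
#   output = synthesize(MODEL_REF, EXAMPLE_INPUT)
```

Keeping the client call behind a single function makes it easy to swap in a stub while the real model identifier and licensing terms are being confirmed.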

Technical Specifications

Hardware Type: T4
Run Count: 79
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Zero-Shot Voice Cloning, Vietnamese Voiceovers, Adjustable Speech Speed, Custom Voice Assistants, Text-to-Audio Conversion, Language Learning Speech

Related Models

Chatterbox TTS

Chatterbox is a state-of-the-art zero-shot TTS model.

NVIDIA PDF to Podcast

Transform PDFs into AI podcasts for engaging on-the-go audio content.

Piper Persian Text-to-Speech

A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4. Piper is used in a variety of projects.