MOSS-TTS v1.5
🚀 Function Overview
A speech generation model for direct TTS and reference-audio voice cloning, with controls for multilingual synthesis, pronunciation, duration, and explicit pauses.
Key Features
- Zero-shot voice cloning from reference audio
- Long-form speech generation for narration and spoken content
- Multilingual synthesis and code-switching
- Pinyin and IPA pronunciation control
- Token-level duration control and explicit pause markup such as [pause X.Ys]
- Open weights under the Apache-2.0 repository license
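The pause markup listed above can be handled in client code before text is sent to the model, for example to check how much silence a script requests. The following is a minimal sketch that assumes the tag is written exactly as `[pause <seconds>s]` (for example `[pause 1.0s]`); the helper names `split_pauses` and `total_pause_seconds` are hypothetical and are not part of any OpenMOSS API.

```python
import re
from typing import List, Tuple, Union

# Assumption: pause tags follow the "[pause X.Ys]" pattern shown on this page.
# Whole-number values such as "[pause 2s]" are also accepted here.
PAUSE_RE = re.compile(r"\[pause\s+(\d+(?:\.\d+)?)s\]")

Segment = Tuple[str, Union[str, float]]


def split_pauses(text: str) -> List[Segment]:
    """Split text into ("text", str) and ("pause", seconds) segments."""
    segments: List[Segment] = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        # Text before the tag, with surrounding whitespace trimmed.
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(("text", chunk))
        segments.append(("pause", float(match.group(1))))
        pos = match.end()
    # Any text after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


def total_pause_seconds(text: str) -> float:
    """Sum all explicit pause durations requested in the text."""
    return sum((v for kind, v in split_pauses(text) if kind == "pause"), 0.0)
```

For instance, `split_pauses("Hello [pause 0.5s] world")` yields a text segment, a 0.5-second pause, and another text segment. Whether the model itself tolerates other spacings or units inside the tag is not stated here, so the regex deliberately stays strict.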
Use Cases
- Generating voiceovers and narration from text
- Testing open-source voice cloning workflows
- Creating multilingual or code-switched speech samples
- Controlling pronunciation for names, technical terms, and non-English text
- Evaluating local or self-hosted TTS instead of a closed hosted voice API
⚙️ Input Parameters
model
Type: string. Use OpenMOSS-Team/MOSS-TTS-v1.5 when loading the model from Hugging Face or in compatible Transformers-based examples.
text
Type: string. Text to synthesize. The official examples cover direct generation, multilingual text with language tags, Pinyin, IPA, duration control, and explicit pause control.
language
Type: string. Optional language tag. For multilingual inputs, OpenMOSS recommends setting it whenever the language is known.
reference_audio
Type: string. Optional reference audio for zero-shot voice cloning.
reference_text
Type: string. Optional transcript or prompt text for the reference audio, used when the runtime supports voice-cloning prompts.
duration_control
Type: number. Optional duration or timing control for matching a target speech length or pause behavior.
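The parameters above can be collected into a request payload on the client side. This is a minimal sketch, not an official client: the function name `build_request` is hypothetical, and the validation rules (a reference transcript only alongside reference audio, a positive `duration_control`) are assumptions, since this page does not state units or dependencies between fields. Whether a particular runtime accepts these field names unchanged is also an assumption.

```python
from typing import Any, Dict, Optional

MODEL_ID = "OpenMOSS-Team/MOSS-TTS-v1.5"


def build_request(
    text: str,
    language: Optional[str] = None,
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    duration_control: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a payload from the documented parameter names (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: a reference transcript only makes sense with reference audio.
    if reference_text is not None and reference_audio is None:
        raise ValueError("reference_text requires reference_audio")
    # Assumption: duration_control is a positive number; units are not documented here.
    if duration_control is not None and duration_control <= 0:
        raise ValueError("duration_control must be positive")

    payload: Dict[str, Any] = {"model": MODEL_ID, "text": text}
    optional = {
        "language": language,
        "reference_audio": reference_audio,
        "reference_text": reference_text,
        "duration_control": duration_control,
    }
    # Omit unset optional fields rather than sending explicit nulls.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Dropping unset fields keeps direct-TTS requests minimal, while a voice-cloning request simply adds `reference_audio` (and optionally `reference_text`).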
💡 Usage Examples
Example 1
Input Parameters
{
  "model": "OpenMOSS-Team/MOSS-TTS-v1.5",
  "language": "English",
  "text": "Welcome to MOSS-TTS v1.5. This sample demonstrates open text-to-speech with controllable pauses [pause 1.0s] and multilingual-ready synthesis."
}
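Text like the example's `text` field, with explicit pauses between phrases, can be assembled programmatically. This is a minimal sketch assuming the `[pause X.Ys]` markup shown in the example; the helper name `join_with_pauses` is hypothetical, and formatting the duration with Python's default float representation (so `1.0` becomes `1.0s`) is an illustrative choice, not a documented requirement.

```python
from typing import Sequence


def join_with_pauses(sentences: Sequence[str], pause_seconds: float) -> str:
    """Join non-empty sentences with an explicit pause tag between each pair."""
    if pause_seconds < 0:
        raise ValueError("pause_seconds must be non-negative")
    # Assumption: the model expects tags of the form "[pause 1.0s]".
    tag = f" [pause {float(pause_seconds)}s] "
    # Blank entries are skipped so no pause is emitted for them.
    return tag.join(s.strip() for s in sentences if s.strip())
```

For example, joining `"Welcome to MOSS-TTS v1.5."` and a second sentence with `1.0` produces the two sentences separated by ` [pause 1.0s] `, which can then be placed in the `text` field of a request like the one above.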
Technical Specifications
- Hardware Type: Local GPU via Hugging Face Transformers, or compatible OpenMOSS inference backends
- Commercial Use: Supported
- Pricing: Open weights; no first-party hosted per-token price has been verified. Runtime cost depends on the local or self-hosted backend.
- Platform: Replicate
Related Models
Chatterbox TTS
Chatterbox is a state-of-the-art zero-shot TTS model.
NVIDIA PDF to Podcast
Transform PDFs into AI podcasts for engaging on-the-go audio content.
Piper Persian Text-to-Speech
A fast, local neural text-to-speech system that sounds great and is optimized for the Raspberry Pi 4. Piper is used in a variety of projects.