MOSS-TTS v1.5


Platform: Replicate

Tags: Text-to-Speech, Voice Cloning, Long-Form Speech, Multilingual Speech

Run Count: 0

Hardware Type: Hugging Face / Transformers / local GPU or compatible OpenMOSS inference backends

Pricing: Open weights; no first-party hosted token price was verified in the collected sources. Runtime cost depends on the local or self-hosted backend.

Commercial Use: Supported

🚀 Function Overview

A speech generation model for direct TTS and reference-audio voice cloning, with controls for multilingual synthesis, pronunciation, duration, and explicit pauses.

Key Features

• Zero-shot voice cloning from reference audio
• Long-form speech generation for narration and spoken content
• Multilingual synthesis and code-switching
• Pinyin and IPA pronunciation control
• Token-level duration control and explicit pause markup such as [pause X.Ys]
• Open weights under the Apache-2.0 repository license
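
The pause markup listed above can be generated programmatically rather than typed by hand. The following is a minimal sketch, not part of any OpenMOSS API: the helper names pause_tag and join_with_pauses are hypothetical, and the one-decimal formatting is an assumption read off the [pause X.Ys] pattern.

```python
def pause_tag(seconds: float) -> str:
    """Format a pause marker in the [pause X.Ys] style, e.g. [pause 1.0s].

    Assumption: one decimal place, inferred from the "X.Y" pattern in the
    feature list. The model's accepted precision was not verified.
    """
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f"[pause {seconds:.1f}s]"


def join_with_pauses(sentences, seconds):
    """Join sentences into one text, placing a pause marker between each pair."""
    separator = f" {pause_tag(seconds)} "
    return separator.join(s.strip() for s in sentences)


if __name__ == "__main__":
    print(join_with_pauses(["First point.", "Second point."], 0.5))
```

The resulting string would be passed as the text input; how the pause is rendered is up to the model runtime.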

Use Cases

• Generating voiceovers and narration from text
• Testing open-source voice cloning workflows
• Creating multilingual or code-switched speech samples
• Controlling pronunciation for names, technical terms, and non-English text
• Evaluating local or self-hosted TTS instead of a closed hosted voice API

⚙️ Input Parameters

model (string): Use OpenMOSS-Team/MOSS-TTS-v1.5 when loading from Hugging Face or compatible Transformers-based examples.

text (string): Text to synthesize. The official examples include direct generation, multilingual text with language tags, Pinyin, IPA, duration control, and explicit pause control.

language (string): Optional language tag. OpenMOSS recommends setting the language whenever the language is known for multilingual inputs.

reference_audio (string): Optional reference audio for zero-shot voice cloning.

reference_text (string): Optional transcript or prompt text for the reference audio when the runtime supports voice-cloning prompts.

duration_control (number): Optional duration or timing control for matching target speech length or pause behavior.
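
The parameters above can be assembled into an input dictionary before it is handed to whichever runtime is in use. This is a minimal sketch under stated assumptions: build_request is a hypothetical helper, not an OpenMOSS or Replicate function, and the rule that reference_text requires reference_audio is an assumption based on the parameter descriptions.

```python
from typing import Any, Dict, Optional

MODEL_ID = "OpenMOSS-Team/MOSS-TTS-v1.5"  # model ID from the parameter list


def build_request(
    text: str,
    language: Optional[str] = None,
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    duration_control: Optional[float] = None,
    model: str = MODEL_ID,
) -> Dict[str, Any]:
    """Build an input dict using only the parameter names documented above."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    # Assumption: a reference transcript is only meaningful with reference audio.
    if reference_text is not None and reference_audio is None:
        raise ValueError("reference_text requires reference_audio")
    # duration_control is documented as a number; bool is rejected explicitly
    # because bool is a subclass of int in Python.
    if duration_control is not None and (
        isinstance(duration_control, bool)
        or not isinstance(duration_control, (int, float))
    ):
        raise TypeError("duration_control must be a number")

    payload: Dict[str, Any] = {"model": model, "text": text}
    optional = {
        "language": language,
        "reference_audio": reference_audio,
        "reference_text": reference_text,
        "duration_control": duration_control,
    }
    # Omit optional fields that were not set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Leaving unset optional fields out of the dictionary keeps the payload identical in shape to the usage example, which supplies only model, language, and text.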

💡 Usage Examples

Example 1

Input Parameters

{
  "model": "OpenMOSS-Team/MOSS-TTS-v1.5",
  "language": "English",
  "text": "Welcome to MOSS-TTS v1.5. This sample demonstrates open text-to-speech with controllable pauses [pause 1.0s] and multilingual-ready synthesis."
}

Output Results

A generated speech audio file from the selected local or hosted MOSS-TTS runtime.
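
To see what the pause markup in Example 1 covers, the input can be parsed and its text split at each marker. This is an illustrative sketch only: split_on_pauses is a hypothetical helper and the regular expression is an assumed reading of the [pause X.Ys] pattern. The model interprets the markup itself, so this code only inspects the text.

```python
import json
import re

EXAMPLE_INPUT = """
{
  "model": "OpenMOSS-Team/MOSS-TTS-v1.5",
  "language": "English",
  "text": "Welcome to MOSS-TTS v1.5. This sample demonstrates open text-to-speech with controllable pauses [pause 1.0s] and multilingual-ready synthesis."
}
"""

# Assumed pattern for the pause markup; the capture group holds the seconds.
PAUSE_RE = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


def split_on_pauses(text):
    """Return (segment, pause_seconds_after) pairs; the last pause is 0.0."""
    # With one capture group, re.split alternates: text, seconds, text, ...
    parts = PAUSE_RE.split(text)
    segments = []
    for i in range(0, len(parts), 2):
        pause = float(parts[i + 1]) if i + 1 < len(parts) else 0.0
        segments.append((parts[i].strip(), pause))
    return segments


payload = json.loads(EXAMPLE_INPUT)
segments = split_on_pauses(payload["text"])

if __name__ == "__main__":
    for segment, pause in segments:
        print(f"{pause:.1f}s after: {segment}")
```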



Related Keywords

MOSS-TTS, MOSS-TTS v1.5, OpenMOSS-Team/MOSS-TTS-v1.5, open-source text-to-speech, zero-shot voice cloning, multilingual TTS, Pinyin IPA pronunciation control, long-form speech generation, local TTS model
