MOSS-TTS v1.5

Platform: Replicate
Tags: Text-to-Speech, Voice Cloning, Long-Form Speech, Multilingual Speech
Runs: 0
Hardware: Hugging Face / Transformers / local GPU or compatible OpenMOSS inference backends
Pricing: Open weights; no first-party hosted token price was verified in the collected sources. Runtime cost depends on the local or self-hosted backend.
Commercial use: Supported

🚀Function Overview

A speech generation model for direct TTS and reference-audio voice cloning, with controls for multilingual synthesis, pronunciation, duration, and explicit pauses.

Key Features

  • Zero-shot voice cloning from reference audio
  • Long-form speech generation for narration and spoken content
  • Multilingual synthesis and code-switching
  • Pinyin and IPA pronunciation control
  • Token-level duration control and explicit pause markup such as [pause X.Ys]
  • Open weights under the Apache-2.0 repository license
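
The explicit pause markup listed above follows the pattern [pause X.Ys]. As a minimal sketch, the helper below formats such a marker and places it between text segments. The function names `pause_tag` and `join_with_pauses` are hypothetical, and the one-decimal formatting is an assumption drawn from the "X.Y" pattern on this page, not a documented requirement of the model.

```python
def pause_tag(seconds: float) -> str:
    """Format a pause marker in the [pause X.Ys] style shown on this page."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    # One decimal place is an assumption based on the "X.Y" pattern.
    return f"[pause {seconds:.1f}s]"


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join non-empty text segments with the same pause marker between each pair."""
    tag = pause_tag(seconds)
    return f" {tag} ".join(s.strip() for s in segments if s.strip())


print(join_with_pauses(["Welcome to the demo.", "Let's begin."], 1.0))
```

Building the marker in one place keeps the formatting consistent if the text is assembled from several sources before it is passed to the runtime.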

Use Cases

  • Generating voiceovers and narration from text
  • Testing open-source voice cloning workflows
  • Creating multilingual or code-switched speech samples
  • Controlling pronunciation for names, technical terms, and non-English text
  • Evaluating local or self-hosted TTS instead of a closed hosted voice API

⚙️Input Parameters

model (string): Use OpenMOSS-Team/MOSS-TTS-v1.5 when loading from Hugging Face or compatible Transformers-based examples.

text (string): Text to synthesize. The official examples include direct generation, multilingual text with language tags, Pinyin, IPA, duration control, and explicit pause control.

language (string): Optional language tag. OpenMOSS recommends setting the language whenever the language is known for multilingual inputs.

reference_audio (string): Optional reference audio for zero-shot voice cloning.

reference_text (string): Optional transcript or prompt text for the reference audio when the runtime supports voice-cloning prompts.

duration_control (number): Optional duration or timing control for matching target speech length or pause behavior.
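
The parameters above can be collected into a small typed container before they are handed to whichever runtime is in use. The sketch below is an illustration only: the `TTSRequest` class and its `to_json` method are hypothetical and are not part of any OpenMOSS or Replicate API. Only the field names and types come from this page, and the expected format of `reference_audio` (path, URL, or something else) is left to the runtime.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TTSRequest:
    """Hypothetical container mirroring the parameter names listed on this page."""
    text: str
    model: str = "OpenMOSS-Team/MOSS-TTS-v1.5"
    language: Optional[str] = None
    reference_audio: Optional[str] = None  # format is runtime-specific (assumption)
    reference_text: Optional[str] = None
    duration_control: Optional[float] = None

    def to_json(self) -> str:
        """Serialize the request, dropping optional fields that were not set."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload, ensure_ascii=False, indent=2)


request = TTSRequest(text="Hello from MOSS-TTS.", language="English")
print(request.to_json())
```

Omitting unset optional fields keeps the serialized input close to the shape used in the usage example on this page, where only `model`, `language`, and `text` appear.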

💡Usage Examples

Example 1

Input Parameters

{
  "model": "OpenMOSS-Team/MOSS-TTS-v1.5",
  "language": "English",
  "text": "Welcome to MOSS-TTS v1.5. This sample demonstrates open text-to-speech with controllable pauses [pause 1.0s] and multilingual-ready synthesis."
}

Output Results

A generated speech audio file from the selected local or hosted MOSS-TTS runtime.
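
Before sending the Example 1 input to a runtime, it can be useful to check how much silence the text requests explicitly. The sketch below parses the Example 1 JSON, sums the durations of its pause markers, and produces a marker-free copy of the text. The regular expression assumes the marker grammar is exactly "[pause <seconds>s]" with a plain decimal number; that grammar is inferred from this page, and the helper names are hypothetical.

```python
import json
import re

# Example 1 input, copied from this page.
EXAMPLE_INPUT = """
{
  "model": "OpenMOSS-Team/MOSS-TTS-v1.5",
  "language": "English",
  "text": "Welcome to MOSS-TTS v1.5. This sample demonstrates open text-to-speech with controllable pauses [pause 1.0s] and multilingual-ready synthesis."
}
"""

# Assumed marker grammar: "[pause <seconds>s]" with a decimal number.
PAUSE_RE = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


def explicit_pause_seconds(text: str) -> float:
    """Sum the durations of all pause markers found in the text."""
    return sum((float(m) for m in PAUSE_RE.findall(text)), 0.0)


def strip_pauses(text: str) -> str:
    """Return the text with pause markers removed and whitespace collapsed."""
    return re.sub(r"\s+", " ", PAUSE_RE.sub(" ", text)).strip()


params = json.loads(EXAMPLE_INPUT)
print(explicit_pause_seconds(params["text"]))
print(strip_pauses(params["text"]))
```

The marker-free text is convenient for display or logging, while the summed pause time gives a lower bound on how much silence the generated audio should contain.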

Technical Specifications

Hardware Type: Hugging Face / Transformers / local GPU or compatible OpenMOSS inference backends
Run Count: 0
Commercial Use: Supported
Pricing: Open weights; no first-party hosted token price was verified in the collected sources. Runtime cost depends on the local or self-hosted backend.
Platform: Replicate

Related Keywords

MOSS-TTS, MOSS-TTS v1.5, OpenMOSS-Team/MOSS-TTS-v1.5, open-source text-to-speech, zero-shot voice cloning, multilingual TTS, Pinyin IPA pronunciation control, long-form speech generation, local TTS model