PrunaAI Dia 1.6B

Discover PrunaAI Dia 1.6B, the cutting-edge model for expressive voice generation. Craft dynamic multi-speaker dialogues with non-verbal cues.

Platform: Replicate

Text-to-Speech, Dialogue Generation, Audio Synthesis

1.7k runs

A100 (80GB)

License Check Required

🚀 Function Overview

Generates voice audio from formatted dialogue text with customizable parameters for expression, duration, and speech characteristics.

Key Features

Multi-speaker dialogue support (using [S1], [S2] markers)
Non-verbal cue integration (e.g., laughs, whispers)
Adjustable audio length via token control
Speech faithfulness and randomness parameters
Playback speed modification

Use Cases

• Generating voiceovers for animations
• Creating podcast dialogues with multiple speakers
• Producing expressive dialogue for virtual assistants
• Adding vocal effects in audio storytelling

⚙️ Input Parameters

text

string

Input text for dialogue generation. Use [S1] and [S2] to mark which speaker says each line, and put non-verbal cues in parentheses, e.g., (laughs), (whispers).
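The tagging convention above can be sketched as a small helper. This is an illustration only: format_dialogue is a hypothetical name, the blank line between turns mirrors the usage example on this page rather than a documented requirement, and limiting speakers to 1 and 2 is an assumption based on the two markers mentioned.

```python
def format_dialogue(turns):
    """Join (speaker_number, line) pairs into [S1]/[S2]-tagged text.

    Assumption: only speakers 1 and 2 are accepted, since those are
    the only markers the parameter description mentions.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker number: {speaker}")
        parts.append(f"[S{speaker}] {line.strip()}")
    # Turns are separated by a blank line, as in the page's example input.
    return "\n\n".join(parts)


script = format_dialogue([
    (1, "Did you hear the news? (laughs)"),
    (2, "(whispers) Not yet, tell me."),
])
print(script)
```

The resulting string can be passed as the text parameter.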

max_new_tokens

integer

Controls the length of the generated audio. Higher values create longer audio (86 tokens ≈ 1 second of audio).
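The 86-tokens-per-second figure makes it easy to estimate a token budget for a target duration. A minimal sketch, assuming the ratio holds roughly linearly; the function names are hypothetical:

```python
import math

TOKENS_PER_SECOND = 86  # approximate ratio stated in the parameter description


def tokens_for_seconds(seconds):
    """Estimate max_new_tokens needed for a target audio length."""
    return math.ceil(seconds * TOKENS_PER_SECOND)


def seconds_for_tokens(tokens):
    """Estimate the audio length (in seconds) a token budget allows."""
    return tokens / TOKENS_PER_SECOND


print(tokens_for_seconds(10))               # 860
print(round(seconds_for_tokens(3072), 1))   # 35.7
```

By this estimate, the max_new_tokens value of 3072 used in the example on this page allows up to roughly 36 seconds of audio.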

cfg_scale

number

Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more.

temperature

number

Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values (0.1-0.9) make output more consistent and predictable.

top_p

number

Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.

cfg_filter_top_k

integer

Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.

speed_factor

number

Adjusts playback speed of the generated audio. Values below 1.0 slow down the audio; 1.0 is original speed.
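As a rough guide, slowing playback lengthens the clip. The sketch below assumes speed_factor acts as a simple linear playback-rate multiplier, which this page does not confirm; played_duration is a hypothetical helper.

```python
def played_duration(generated_seconds, speed_factor):
    """Estimate clip length after the speed adjustment.

    Assumption: speed_factor scales playback rate linearly, so a value
    below 1.0 stretches the audio by a factor of 1 / speed_factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return generated_seconds / speed_factor


# A 30-second clip played at 0.94x would run a little under 32 seconds.
print(round(played_duration(30.0, 0.94), 1))
```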

seed

integer

Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.

💡 Usage Examples

Example 1

Input Parameters

{
  "seed": -1,
  "text": "[S1] It's on Replicate!!! Oh fire! Oh my goodness! What's the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!\n\n[S2] Oh my god! Okay… it's happening. Everybody stay calm!\n\n[S1] What's the procedure…\n\n[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!\n\n[S1] Yes! Yes! Let's try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!\n\n[S2] (whispers) try it now… (whispers) turbocharged by Pruna…",
  "top_p": 0.95,
  "cfg_scale": 3,
  "temperature": 1.3,
  "speed_factor": 0.94,
  "max_new_tokens": 3072,
  "cfg_filter_top_k": 35
}

Output Results

https://replicate.delivery/yhqm/pDh7bUfS46zUdCeYrR6orcbohXAx2DlAGsv0pR50edezHnXSB/output.wav
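A request like Example 1 could be assembled and sent with the Replicate Python client roughly as follows. This is a sketch under several assumptions: build_input and generate are hypothetical helpers, the numeric values are copied from Example 1 rather than being documented defaults, the model reference "prunaai/dia-1.6b" is taken from the example text and may need a ":<version>" suffix, and the client expects a REPLICATE_API_TOKEN environment variable.

```python
def build_input(text, seed=None, **overrides):
    """Assemble the input payload, starting from the values in Example 1."""
    payload = {
        "text": text,
        "top_p": 0.95,
        "cfg_scale": 3,
        "temperature": 1.3,
        "speed_factor": 0.94,
        "max_new_tokens": 3072,
        "cfg_filter_top_k": 35,
    }
    payload.update(overrides)
    # Leave the seed out entirely when none is given, so each run is random.
    if seed is not None:
        payload["seed"] = seed
    return payload


def generate(text, model_ref="prunaai/dia-1.6b", **kwargs):
    """Run the model on Replicate and return the client's output."""
    # Imported lazily so build_input can be used without the package.
    import replicate  # third-party: pip install replicate

    return replicate.run(model_ref, input=build_input(text, **kwargs))


payload = build_input("[S1] Hello there. [S2] (laughs) Hi!", seed=42, temperature=0.8)
print(payload["seed"], payload["temperature"])
```

Calling generate(...) with the same text and seed should, per the seed description above, reproduce the same audio.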

Technical Specifications

Hardware Type: A100 (80GB)
Run Count: 1.7k
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Expressive Voice Generation, Multi-speaker Dialogue, Non-verbal Cues, Customizable Audio Length, Speech Characteristics, Voiceovers, Podcast Dialogue, Virtual Assistants

Related Models

Dia 1.6B

Dia 1.6B by Nari Labs generates realistic dialogue audio from text, including non-verbal cues and voice cloning.

DeepAudio-V1 Model

DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Chatterbox TTS

Chatterbox is a state-of-the-art zero-shot TTS model.