GetLLMs

PrunaAI Dia 1.6B

PrunaAI Dia 1.6B is a Pruna-optimized version of the Dia text-to-speech model for expressive voice generation. It produces multi-speaker dialogue with non-verbal cues.

Platform: Replicate
Text-to-Speech, Dialogue Generation, Audio Synthesis
1.7k runs
A100 (80GB)
License Check Required

🚀 Function Overview

Generates voice audio from formatted dialogue text with customizable parameters for expression, duration, and speech characteristics.

Key Features

  • Multi-speaker dialogue support (using [S1], [S2] markers)
  • Non-verbal cue integration (e.g., laughs, whispers)
  • Adjustable audio length via token control
  • Speech faithfulness and randomness parameters
  • Playback speed modification

Use Cases

  • Generating voiceovers for animations
  • Creating podcast dialogues with multiple speakers
  • Producing expressive dialogue for virtual assistants
  • Adding vocal effects in audio storytelling

⚙️ Input Parameters

text

string

Input text for dialogue generation. Use [S1], [S2] to indicate different speakers and (description) in parentheses for non-verbal cues e.g., (laughs), (whispers).
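
The text format described above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `format_dialogue` (it is not part of any Dia or Replicate API) that takes (speaker number, line) pairs and separates turns with a blank line, as the usage example on this page does.

```python
def format_dialogue(turns):
    """Join (speaker_number, line) pairs into dialogue text with [S1]/[S2] markers.

    Hypothetical helper for illustration only. Non-verbal cues such as
    "(laughs)" are written directly inside each line.
    """
    parts = []
    for speaker, line in turns:
        # Only the [S1] and [S2] markers are documented on this page.
        if speaker not in (1, 2):
            raise ValueError("speaker must be 1 or 2")
        parts.append(f"[S{speaker}] {line.strip()}")
    # Blank line between turns, matching the example input on this page.
    return "\n\n".join(parts)


text = format_dialogue([
    (1, "Did you hear the news? (laughs)"),
    (2, "(whispers) Not yet, tell me."),
])
print(text)
```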

max_new_tokens

integer

Controls the length of the generated audio. Higher values produce longer audio (roughly 86 tokens per second of audio).
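
The token-to-duration ratio can be turned into a quick budget calculation. This sketch uses only the approximate figure quoted above (86 tokens per second); the function names are illustrative, and actual durations may differ.

```python
import math

TOKENS_PER_SECOND = 86  # approximate ratio stated in the parameter description


def tokens_for_seconds(seconds):
    """Token budget that covers at least the requested duration (rounded up)."""
    return math.ceil(seconds * TOKENS_PER_SECOND)


def seconds_for_tokens(tokens):
    """Approximate audio duration, in seconds, for a given token budget."""
    return tokens / TOKENS_PER_SECOND


print(tokens_for_seconds(10))              # 860
print(round(seconds_for_tokens(3072), 1))  # 35.7
```

By this estimate, the `max_new_tokens` value of 3072 used in Example 1 allows for roughly 36 seconds of audio.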

cfg_scale

number

Controls how closely the audio follows your text. Higher values (3-5) follow text more strictly; lower values may sound more natural but deviate more.

temperature

number

Controls randomness in generation. Higher values (1.3-2.0) increase variety; lower values (0.1-0.9) make output more consistent and predictable.

top_p

number

Controls diversity of word choice. Higher values include more unusual options. Most users shouldn't need to adjust this parameter.

cfg_filter_top_k

integer

Technical parameter for filtering audio generation tokens. Higher values allow more diverse sounds; lower values create more consistent audio.

speed_factor

number

Adjusts playback speed of the generated audio. Values below 1.0 slow down the audio; 1.0 is original speed.
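
To estimate how the speed factor changes the final clip length, one can assume simple time scaling, where the played duration is the generated duration divided by `speed_factor`. That relationship is an assumption for illustration; this page does not state how the model applies the factor internally.

```python
def played_duration(generated_seconds, speed_factor):
    """Estimated duration after the playback-speed adjustment.

    Assumes plain time scaling: duration = generated / speed_factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return generated_seconds / speed_factor


# A factor of 0.94 stretches a 10-second clip slightly.
print(round(played_duration(10.0, 0.94), 2))  # 10.64
```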

seed

integer

Random seed for reproducible results. Use the same seed value to get the same output for identical inputs. Leave blank for random results each time.
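
The parameters above can be collected in one typed container before a request is sent. The sketch below assumes a hypothetical `DiaInput` dataclass (not an official client type); it mirrors the names and types listed on this page, declares no defaults because none are documented here, and omits `seed` when it is left unset.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DiaInput:
    """Hypothetical container mirroring the parameters listed above."""
    text: str
    max_new_tokens: int
    cfg_scale: float
    temperature: float
    top_p: float
    cfg_filter_top_k: int
    speed_factor: float
    seed: Optional[int] = None  # None stands for "leave blank"

    def to_payload(self):
        """Return a plain dict, checking the string and integer fields."""
        payload = asdict(self)
        if not isinstance(payload["text"], str) or not payload["text"].strip():
            raise ValueError("text must be a non-empty string")
        for name in ("max_new_tokens", "cfg_filter_top_k"):
            value = payload[name]
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
        if payload["seed"] is None:
            del payload["seed"]  # left blank: results vary from run to run
        return payload


params = DiaInput(
    text="[S1] Hello there. [S2] Hi! (laughs)",
    max_new_tokens=860,
    cfg_scale=3,
    temperature=1.3,
    top_p=0.95,
    cfg_filter_top_k=35,
    speed_factor=0.94,
)
payload = params.to_payload()
print("seed" in payload)  # False
```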

💡 Usage Examples

Example 1

Input Parameters

{
  "seed": -1,
  "text": "[S1] It's on Replicate!!! Oh fire! Oh my goodness! What's the procedure? What do we do people? The Dia text-to-speech model — now Pruna-optimized — just dropped on Replicate!!\n\n[S2] Oh my god! Okay… it's happening. Everybody stay calm!\n\n[S1] What's the procedure…\n\n[S2] Everybody stay fricking calm!!!... Everybody fudging calm down!!!!!\n\n[S1] Yes! Yes! Let's try it out at prunaai/dia-1.6b (laughs) — powered up and made leaner with Pruna!\n\n[S2] (whispers) try it now… (whispers) turbocharged by Pruna…",
  "top_p": 0.95,
  "cfg_scale": 3,
  "temperature": 1.3,
  "speed_factor": 0.94,
  "max_new_tokens": 3072,
  "cfg_filter_top_k": 35
}

Output Results

https://replicate.delivery/yhqm/pDh7bUfS46zUdCeYrR6orcbohXAx2DlAGsv0pR50edezHnXSB/output.wav
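
A request like Example 1 could be issued from Python with the `replicate` client package. This is a sketch under stated assumptions: the model reference `prunaai/dia-1.6b` is taken from the example text and may need a `:<version>` suffix, the helper names `build_input` and `run_dia` are hypothetical, and the client expects a `REPLICATE_API_TOKEN` environment variable.

```python
def build_input(text, seed=None):
    """Return an input dict using the numeric settings from Example 1."""
    payload = {
        "text": text,
        "top_p": 0.95,
        "cfg_scale": 3,
        "temperature": 1.3,
        "speed_factor": 0.94,
        "max_new_tokens": 3072,
        "cfg_filter_top_k": 35,
    }
    if seed is not None:
        payload["seed"] = seed  # fixed seed for reproducible output
    return payload


def run_dia(payload, model_ref="prunaai/dia-1.6b"):
    """Send the payload to Replicate and return the client's result.

    Requires `pip install replicate` and REPLICATE_API_TOKEN in the
    environment. The model reference is an assumption taken from the
    example text; a ":<version>" suffix may be required.
    """
    import replicate  # imported lazily so build_input works without the package
    return replicate.run(model_ref, input=payload)


payload = build_input("[S1] Is it working? [S2] (laughs) It is!")
print(sorted(payload))
# output = run_dia(payload)  # uncomment to make a real API call
```

The network call is left commented out so the snippet runs without credentials; the payload can be inspected first and sent once the model reference has been confirmed on Replicate.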

Technical Specifications

Hardware Type: A100 (80GB)
Run Count: 1.7k
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Expressive Voice Generation, Multi-speaker Dialogue, Non-verbal Cues, Customizable Audio Length, Speech Characteristics, Voiceovers, Podcast Dialogue, Virtual Assistants