DeepAudio-V1 Model

DeepAudio-V1 is a model for Video-to-Speech and Video-to-Audio generation. This page summarizes its functions, input parameters, and an example run.

Platform: Replicate
Tags: Video-to-Speech, Video-to-Audio, End-to-End Generation
Runs: 47
Hardware: L40S
License: License Check Required

🚀 Function Overview

Generates audio and speech synchronized to an input video. Generation runs in separate Video-to-Audio and Video-to-Speech stages, each with its own prompt inputs and a configurable number of generation steps.

Key Features

  • Processes video inputs to generate audio tracks
  • Supports audio generation via text prompts
  • Enables speech synthesis through transcriptions and reference audio
  • Configurable number of generation steps for each stage (v2a_num_steps, v2s_num_steps)

Use Cases

  • Adding voiceovers or narration to silent videos
  • Generating soundtracks/sound effects for video content
  • Creating lip-synced audio for dubbed video content
  • Producing educational or explanatory narrations from visual media

⚙️ Input Parameters

  • video (string): Input Video
  • prompt (string): Video-to-Audio Text Prompt
  • v2a_num_steps (integer): Video-to-Audio Num Steps
  • text (string): Video-to-Speech Transcription
  • audio_prompt (string): Video-to-Speech Speech Prompt
  • text_prompt (string): Video-to-Speech Speech Prompt Transcription
  • v2s_num_steps (integer): Video-to-Speech Num Steps
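
The parameters above can be collected into a typed input object before a request is sent. The following is a minimal Python sketch, not part of the model's own tooling: the class name DeepAudioInput and the method to_payload are hypothetical, and the rules that only video is required and that step counts must be positive integers are assumptions, since the page does not state which parameters are optional or what ranges are valid.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DeepAudioInput:
    """Hypothetical container mirroring the listed input parameters."""

    video: str                            # Input Video, passed as a string (e.g. a URL)
    prompt: Optional[str] = None          # Video-to-Audio text prompt
    v2a_num_steps: Optional[int] = None   # Video-to-Audio number of steps
    text: Optional[str] = None            # Video-to-Speech transcription
    audio_prompt: Optional[str] = None    # Video-to-Speech speech prompt (reference audio)
    text_prompt: Optional[str] = None     # Transcription of the speech prompt
    v2s_num_steps: Optional[int] = None   # Video-to-Speech number of steps

    def to_payload(self) -> dict:
        """Return a dict of the fields that were set, after basic checks."""
        if not self.video:
            raise ValueError("video must be a non-empty string")
        # Assumption: step counts, when given, are positive integers.
        for name in ("v2a_num_steps", "v2s_num_steps"):
            value = getattr(self, name)
            if value is None:
                continue
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer")
        # Unset fields are dropped so the model's own defaults apply.
        return {key: val for key, val in asdict(self).items() if val is not None}
```

Leaving unset fields out of the payload avoids guessing at defaults that the page does not document.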

💡 Usage Examples

Example 1

Input Parameters

{
  "text": "I've still got a few knocking around in here",
  "video": "https://replicate.delivery/pbxt/MuPEtAEGUF26jSP0uZJhdOC2wvbKKmW5g1roFl5RHrAImfGd/0778.mp4",
  "prompt": "",
  "text_prompt": "Who finally decided to show up for work Yay",
  "audio_prompt": "https://replicate.delivery/pbxt/MuPEsSDVjpUAc1o8kwYBzISXpeTaUqsMqISkRr8tcmXphbN3/Gobber-00-0235.wav",
  "v2a_num_steps": 25,
  "v2s_num_steps": 32
}

Output Results

https://replicate.delivery/xezq/GaCOnlYIxbrlCFeRheWmtqCpyogvrEp72BLznhwKAqVeJzNpA/__tmp__tmplzbtv4zt.mp4.mp4.gen.mp4
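
Since the model is hosted on Replicate, the request in Example 1 could be submitted from Python roughly as follows. This is a sketch under stated assumptions: it uses the third-party replicate client (installed separately), the helper name run_deepaudio is hypothetical, and the model reference is left as a placeholder argument because this page does not give the model's Replicate identifier. The shape of the returned value depends on the client version and is not asserted here.

```python
import os

# Input copied from Example 1.
EXAMPLE_INPUT = {
    "text": "I've still got a few knocking around in here",
    "video": "https://replicate.delivery/pbxt/MuPEtAEGUF26jSP0uZJhdOC2wvbKKmW5g1roFl5RHrAImfGd/0778.mp4",
    "prompt": "",
    "text_prompt": "Who finally decided to show up for work Yay",
    "audio_prompt": "https://replicate.delivery/pbxt/MuPEsSDVjpUAc1o8kwYBzISXpeTaUqsMqISkRr8tcmXphbN3/Gobber-00-0235.wav",
    "v2a_num_steps": 25,
    "v2s_num_steps": 32,
}


def run_deepaudio(model_ref: str, inputs: dict):
    """Submit inputs to a Replicate-hosted model and return its output.

    model_ref is a placeholder: the caller must supply the model's actual
    Replicate identifier, which is not stated on this page.
    """
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling Replicate")
    import replicate  # third-party client: pip install replicate

    return replicate.run(model_ref, input=inputs)


# Usage (not executed here; needs network access and an API token):
# output = run_deepaudio("<owner>/<model-name>", EXAMPLE_INPUT)
# print(output)
```

The import is placed inside the function so the input dictionary can be inspected or reused without the client installed.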

Technical Specifications

Hardware Type: L40S
Run Count: 47
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Video-to-Speech, Video-to-Audio, Multi-Modal, End-to-End Generation, Voiceovers, Soundtracks, Lip-Synced Audio