DeepAudio-V1 Model

DeepAudio-V1 is a model for Video-to-Speech and Video-to-Audio generation. This page summarizes its features, input parameters, and an example run.

Platform: Replicate

Video-to-Speech, Video-to-Audio, End-to-End Generation

47 runs

L40S

License Check Required

🚀Function Overview

Generates audio and speech synchronized to a video input. Processing is multi-stage: the Video-to-Audio and Video-to-Speech stages each take their own prompts and a configurable number of generation steps.

Key Features

•Processes video inputs to generate audio tracks
•Supports audio generation guided by text prompts
•Enables speech synthesis from a transcription and reference audio
•Configurable number of generation steps for fine-tuning the output

Use Cases

•Adding voiceovers or narration to silent videos
•Generating soundtracks/sound effects for video content
•Creating lip-synced audio for dubbed video content
•Producing educational or explanatory narrations from visual media

⚙️Input Parameters

•video (string): Input video
•prompt (string): Video-to-Audio text prompt
•v2a_num_steps (integer): Video-to-Audio number of steps
•text (string): Video-to-Speech transcription
•audio_prompt (string): Video-to-Speech speech prompt
•text_prompt (string): Video-to-Speech speech prompt transcription
•v2s_num_steps (integer): Video-to-Speech number of steps
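
The parameters above can be collected into a small typed container before sending a request. The following is a minimal sketch, not the model's official schema: the class name `DeepAudioInput` and its `to_payload` method are hypothetical, the step-count defaults (25 and 32) are borrowed from the usage example on this page rather than from documented defaults, and treating only `video` as required is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class DeepAudioInput:
    """Hypothetical container mirroring the input parameters listed above."""

    video: str                          # URL of the input video
    prompt: str = ""                    # Video-to-Audio text prompt
    v2a_num_steps: int = 25             # assumed default, taken from Example 1
    text: Optional[str] = None          # Video-to-Speech transcription
    audio_prompt: Optional[str] = None  # URL of the reference speech audio
    text_prompt: Optional[str] = None   # transcription of the reference audio
    v2s_num_steps: int = 32             # assumed default, taken from Example 1

    def __post_init__(self) -> None:
        # Assumption: only `video` is mandatory and step counts are positive.
        if not self.video:
            raise ValueError("video is required")
        for name in ("v2a_num_steps", "v2s_num_steps"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer")

    def to_payload(self) -> Dict[str, Any]:
        """Return a JSON-ready dict, omitting parameters that were not set."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Calling `DeepAudioInput(video="https://example.com/clip.mp4", text="Hello").to_payload()` yields a dict with the same keys as the JSON shown in the usage example, minus the unset speech-prompt fields.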

💡Usage Examples

Example 1

Input Parameters

{
  "text": "I've still got a few knocking around in here",
  "video": "https://replicate.delivery/pbxt/MuPEtAEGUF26jSP0uZJhdOC2wvbKKmW5g1roFl5RHrAImfGd/0778.mp4",
  "prompt": "",
  "text_prompt": "Who finally decided to show up for work Yay",
  "audio_prompt": "https://replicate.delivery/pbxt/MuPEsSDVjpUAc1o8kwYBzISXpeTaUqsMqISkRr8tcmXphbN3/Gobber-00-0235.wav",
  "v2a_num_steps": 25,
  "v2s_num_steps": 32
}

Output Results

https://replicate.delivery/xezq/GaCOnlYIxbrlCFeRheWmtqCpyogvrEp72BLznhwKAqVeJzNpA/__tmp__tmplzbtv4zt.mp4.mp4.gen.mp4
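
The example above could be submitted from Python roughly as follows. This is a minimal sketch, assuming the `replicate` client package is installed and the `REPLICATE_API_TOKEN` environment variable is set. `MODEL_REF` is a placeholder, because this page does not list the model's owner or version, and `run_deepaudio` with its injectable `runner` argument is a hypothetical helper added so the call can be exercised without network access.

```python
from typing import Any, Callable, Dict, Optional

# Placeholder: copy the real "owner/name:version" string from the model page.
MODEL_REF = "<owner>/<model>:<version>"

# Inputs copied from Example 1 on this page.
EXAMPLE_INPUT: Dict[str, Any] = {
    "text": "I've still got a few knocking around in here",
    "video": "https://replicate.delivery/pbxt/MuPEtAEGUF26jSP0uZJhdOC2wvbKKmW5g1roFl5RHrAImfGd/0778.mp4",
    "prompt": "",
    "text_prompt": "Who finally decided to show up for work Yay",
    "audio_prompt": "https://replicate.delivery/pbxt/MuPEsSDVjpUAc1o8kwYBzISXpeTaUqsMqISkRr8tcmXphbN3/Gobber-00-0235.wav",
    "v2a_num_steps": 25,
    "v2s_num_steps": 32,
}


def run_deepaudio(
    inputs: Dict[str, Any],
    model_ref: str = MODEL_REF,
    runner: Optional[Callable[..., Any]] = None,
) -> Any:
    """Send `inputs` to the model and return whatever the runner returns."""
    if not inputs.get("video"):
        raise ValueError("'video' is required")
    if runner is None:
        # Imported lazily so the module loads even without the client installed.
        import replicate

        runner = replicate.run
    # Pass a copy so the caller's dict is never mutated by the runner.
    return runner(model_ref, input=dict(inputs))
```

With a real model reference in place, `run_deepaudio(EXAMPLE_INPUT)` would start a prediction. The type of the returned value depends on the client version, so inspect it before saving the generated video.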

Technical Specifications

Hardware Type: L40S
Run Count: 47
Commercial Use: Unknown/Restricted
Platform: Replicate

Related Keywords

Video-to-Speech, Video-to-Audio, Multi-Modal, End-to-End Generation, Voiceovers, Soundtracks, Lip-Synced Audio

Related Models

PrunaAI Dia 1.6B

A model for generating expressive voice audio from dialogue scripts.

Chatterbox TTS

Chatterbox is a state-of-the-art zero-shot TTS model.

NVIDIA PDF to Podcast

Transform PDFs into AI podcasts for engaging on-the-go audio content.