GPT-4o Mini Transcribe

GPT-4o Mini Transcribe is a speech-to-text model from OpenAI.

Platform: Replicate

Speech-to-Text, Audio Transcription, Multilingual Support

197 runs

Priced by multiple properties

Commercial

🚀Function Overview

Transcribes audio files to text using GPT-4o mini, with improved accuracy and language recognition compared to the earlier Whisper models.

Key Features

•Supports multiple audio formats (mp3, mp4, mpeg, etc.)
•Language specification via ISO-639-1 codes for enhanced accuracy
•Optional text prompts to guide transcription style
•Adjustable sampling temperature for output control
•16,000 token context window for long audio handling
•2,000 max output tokens
•Improved word error rate over Whisper models
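
Because output is capped at 2,000 tokens, a long recording may need to be transcribed in pieces. The sketch below shows one possible approach, assuming the audio has already been split into segments that are reachable at separate URLs. The names transcribe_segments, transcribe_fn, and tail_chars are hypothetical and are not part of any Replicate or OpenAI API. Each segment after the first receives the tail of the previous transcript as its prompt, which is the continuation use that the prompt parameter is described as supporting.

```python
from typing import Callable, Dict, List, Optional


def transcribe_segments(
    segment_urls: List[str],
    transcribe_fn: Callable[[Dict[str, object]], str],
    language: Optional[str] = None,
    tail_chars: int = 200,
) -> str:
    """Transcribe pre-split segments in order, chaining context via `prompt`.

    `transcribe_fn` is a hypothetical callable that takes an input dict
    (audio_file, language, prompt) and returns the transcript text.
    """
    parts: List[str] = []
    for url in segment_urls:
        payload: Dict[str, object] = {"audio_file": url}
        if language is not None:
            payload["language"] = language
        if parts:
            # Pass the end of the previous transcript as context for this one.
            payload["prompt"] = parts[-1][-tail_chars:]
        parts.append(transcribe_fn(payload).strip())
    return " ".join(parts)
```

Taking transcribe_fn as an argument keeps the chaining logic separate from the network call, so it can be tried with a stub before any real request is made. The 200-character tail is an arbitrary starting point, not a documented limit.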

Use Cases

•Converting spoken content into written transcripts
•Generating subtitles for video/audio content
•Accessibility tools for hearing-impaired users
•Meeting/lecture documentation

⚙️Input Parameters

audio_file (string): The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

language (string): The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. en) will improve accuracy and latency.

prompt (string): Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

temperature (number): Sampling temperature between 0 and 1.
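
The parameter constraints above can be checked locally before a request is sent. The following is a minimal sketch, assuming the audio file is given as a URL or path whose extension reflects its format. The helper build_input is hypothetical and not part of the Replicate client, and the language check only verifies the two-letter shape of an ISO-639-1 code, not that the code is actually assigned.

```python
import os
import re
from typing import Dict, Optional
from urllib.parse import urlparse

# Formats listed in the audio_file parameter description.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def build_input(
    audio_file: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, object]:
    """Hypothetical helper: validate the parameters and return an input dict."""
    # Use only the path part so query strings do not hide the extension.
    path = urlparse(audio_file).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")

    payload: Dict[str, object] = {"audio_file": audio_file}
    if language is not None:
        # Shape check only: two lowercase letters, as in "en".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
        payload["language"] = language
    if prompt is not None:
        payload["prompt"] = prompt
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        payload["temperature"] = temperature
    return payload
```

Optional parameters are left out of the returned dict when they are not supplied, so the model's own defaults apply.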

💡Usage Examples

Example 1

Input Parameters

{
  "language": "en",
  "audio_file": "https://replicate.delivery/xezq/ejt5KPWzFp25fUGtjPhwFmeeG5nFpCvu5zSMIySXnemTWn0lC/tmptuxz6n1z.mp3",
  "temperature": 0
}

Output Results

So, we just added GPT-4o Mini Transcribe to Replicate, and thought you'd want to know. It's basically a speech-to-text model that uses GPT-4o Mini to turn your audio into text. The cool thing is that it's noticeably better than the Whisper models we've been using. Fewer errors, better at recognizing different languages, and just more accurate overall. If you've ever been frustrated with transcripts that mess up technical terms or struggle with different accents, you'll probably appreciate this upgrade. It just works better. Some quick tech specs if you're curious: It has a 16,000 token context window, which means it can handle longer audio clips in one go. And it can output up to 2,000 tokens, so you'll get nice complete transcripts. The model's knowledge is current up to June 2024.
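
A request like Example 1 could be sent from Python with the Replicate client. This is a sketch under several assumptions: the replicate package is installed, the REPLICATE_API_TOKEN environment variable is set, and the model reference is "openai/gpt-4o-mini-transcribe", which should be confirmed on the model page. The helper names normalize_output and transcribe are hypothetical. The shape of the returned value is also an assumption, so the helper accepts either a single string or an iterable of text chunks.

```python
from typing import Iterable, Union

# Assumed model reference; confirm the exact value on the model page.
MODEL_REF = "openai/gpt-4o-mini-transcribe"


def normalize_output(output: Union[str, Iterable[object]]) -> str:
    """Return the transcript as one string, whether given a str or chunks."""
    if isinstance(output, str):
        return output.strip()
    return "".join(str(chunk) for chunk in output).strip()


def transcribe(payload: dict) -> str:
    """Run the model through the Replicate client (needs network access)."""
    # Imported lazily so the helpers above work without the package installed.
    import replicate

    return normalize_output(replicate.run(MODEL_REF, input=payload))


# Usage with the Example 1 input (performs a billable network request):
# text = transcribe({
#     "language": "en",
#     "audio_file": "https://replicate.delivery/xezq/ejt5KPWzFp25fUGtjPhwFmeeG5nFpCvu5zSMIySXnemTWn0lC/tmptuxz6n1z.mp3",
#     "temperature": 0,
# })
# print(text)
```

Setting temperature to 0, as in the example, is a common choice when repeatable transcripts are preferred over varied ones.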

Technical Specifications

Run Count: 197
Commercial Use: Supported
Pricing: Priced by multiple properties
Platform: Replicate

Related Keywords

Speech-to-Text, Audio Transcription, Multilingual Support, Subtitle Generation, Accessibility Tools, Meeting Documentation, Improved Accuracy

Related Models

DrumTest2 Rhythmic Audio Transformer

Transforms any rhythmic sound—a drum kit, beatboxing, a toy drum, even drumming on your belly—into a pro-quality performance on Zohar's studio drum kit.

Speaker Diarization

Speaker Diarization with "pyannote/speaker-diarization-3.1"

Resemble Enhance AI

Optimizes audio files with speech