GPT-4o Transcribe

GPT-4o Transcribe converts audio into accurate text.

Platform: Replicate

Speech-to-Text, Audio Transcription, Multilingual Support, GPT-4o Model

975 runs

Priced by multiple properties

Commercial

🚀 Function Overview

Transcribes audio files into text using GPT-4o technology with improved accuracy over previous models.

Key Features

High-accuracy speech recognition
Supports multiple audio formats (mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
Optional language specification for better accuracy
Prompt-guided transcription style control
Temperature parameter for output variability
16,000-token context window
2,000-token output limit
Knowledge cutoff: June 2024

Use Cases

• Transcribing meetings or interviews
• Generating captions for videos
• Accessibility applications for audio content
• Multilingual content transcription
• Academic research transcription

⚙️ Input Parameters

audio_file (string)
The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

language (string)
The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. en) will improve accuracy and latency.

prompt (string)
Optional text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.

temperature (number)
Sampling temperature between 0 and 1.
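
The constraints listed for these parameters can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `build_input` and `SUPPORTED_FORMATS` are hypothetical names introduced here for illustration, and the format check simply looks at the file extension of the path or URL, which is an assumption rather than how the service itself validates files.

```python
import os
from urllib.parse import urlparse

# Formats listed in the parameter description above.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def build_input(audio_file, language=None, prompt=None, temperature=None):
    """Validate the documented parameters and return an input dict.

    Hypothetical helper: not part of any official client library.
    """
    # Take the extension from the URL path (also works for plain file names).
    path = urlparse(audio_file).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")

    payload = {"audio_file": audio_file}

    if language is not None:
        # ISO-639-1 codes are two ASCII letters, e.g. "en".
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
        payload["language"] = language.lower()

    if prompt is not None:
        payload["prompt"] = prompt

    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        payload["temperature"] = temperature

    return payload
```

Optional parameters that are left as `None` are omitted from the returned dict, so the model's own defaults apply.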

💡 Usage Examples

Example 1

Input Parameters

{
  "language": "en",
  "audio_file": "https://replicate.delivery/xezq/XoxHeakty0z3KKc46cMLPKC2ct54ekT3EtvcwDQuRIuxfJdpA/tmpsglqtqn5.mp3",
  "temperature": 0
}
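
As a rough sketch, an input like this could be submitted with the Replicate Python client. The model identifier `openai/gpt-4o-transcribe` is an assumption (check the model page for the exact name), and the `transcribe` helper with its injectable `run` argument is hypothetical, added so the logic can be tried without network access. The sketch also assumes the output may arrive as a sequence of text chunks rather than a single string, so it joins them.

```python
def transcribe(input_params, model="openai/gpt-4o-transcribe", run=None):
    """Run the model and return the transcript as a single string.

    `model` is an assumed identifier; `run` defaults to replicate.run and
    can be replaced with a stub for offline testing.
    """
    if run is None:
        # Requires `pip install replicate` and the REPLICATE_API_TOKEN
        # environment variable to be set.
        import replicate
        run = replicate.run

    output = run(model, input=input_params)

    # The output may be one string or an iterable of text chunks.
    if isinstance(output, str):
        return output.strip()
    return "".join(str(chunk) for chunk in output).strip()


# Example call (not executed here because it needs network access):
# text = transcribe({
#     "language": "en",
#     "audio_file": "https://replicate.delivery/xezq/XoxHeakty0z3KKc46cMLPKC2ct54ekT3EtvcwDQuRIuxfJdpA/tmpsglqtqn5.mp3",
#     "temperature": 0,
# })
```

Joining chunks with an empty string assumes each chunk already carries its own leading whitespace, which is how streamed text output is commonly delivered.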

Output Results

I just added GPT-4o Transcribe to Replicate and thought you'd want to know. It's basically a speech-to-text model that uses GPT-4o to turn your audio into text. The cool thing is that it's noticeably better than the Whisper models we've been using: fewer errors, better at recognizing different languages, and just more accurate overall. If you've ever been frustrated with transcripts that mess up technical terms or struggle with different accents, you'll probably appreciate this upgrade. It just works better. Some quick tech specs, if you're curious: it has a 16,000 token context window, which means it can handle longer audio clips in one go. And it can output 2,000 tokens, so you'll get nice, complete transcripts. The model's knowledge is current to June 2024, so it's pretty up-to-date with language and terminology.


Technical Specifications

Run Count: 975
Commercial Use: Supported
Pricing: Priced by multiple properties
Platform: Replicate

Related Keywords

High-accuracy speech recognition, Multilingual support, Transcribing meetings or interviews, Generating captions for videos, Accessibility applications, Academic research transcription

Related Models

DrumTest2 Rhythmic Audio Transformer

Transforms any rhythmic sound—a drum kit, beatboxing, a toy drum, even drumming on your belly—into a pro-quality performance on Zohar's studio drum kit.

Speaker Diarization

Speaker Diarization with "pyannote/speaker-diarization-3.1"

Resemble Enhance AI

Optimizes audio files with speech