GetLLMs

GPT-4o Transcribe

Transform your audio into accurate text with GPT-4o Transcribe.

Platform: Replicate
Speech-to-Text, Audio Transcription, Multilingual Support, GPT-4o Model
975 runs
Priced by multiple properties
Commercial use supported

🚀 Function Overview

Transcribes audio files into text using OpenAI's GPT-4o, with improved accuracy over earlier models such as Whisper.

Key Features

  • High-accuracy speech recognition
  • Supports multiple audio formats (mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
  • Optional language specification for better accuracy
  • Prompt-guided transcription style control
  • Temperature parameter for output variability
  • 16,000-token context window
  • 2,000-token output limit
  • Knowledge cutoff: June 2024
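To get a feel for the 2,000-token output limit, it can help to translate tokens into minutes of speech. The sketch below is back-of-the-envelope arithmetic, not a property of the model: it assumes roughly 0.75 English words per token (a common rule of thumb for OpenAI tokenizers) and a speaking rate of about 150 words per minute. The function name max_audio_minutes is hypothetical.

```python
# Rough estimate of how much speech fits in the output token limit.
# Both ratios are ASSUMPTIONS (rules of thumb), not GPT-4o Transcribe specs.
WORDS_PER_TOKEN = 0.75    # common rule of thumb for English text
WORDS_PER_MINUTE = 150    # typical conversational speaking rate


def max_audio_minutes(output_token_limit: int,
                      words_per_token: float = WORDS_PER_TOKEN,
                      words_per_minute: float = WORDS_PER_MINUTE) -> float:
    """Estimate the longest continuous speech (minutes) whose transcript fits the limit."""
    max_words = output_token_limit * words_per_token
    return max_words / words_per_minute


if __name__ == "__main__":
    print(f"~{max_audio_minutes(2000):.0f} minutes")  # prints ~10 minutes
```

Under those assumptions, 2,000 tokens cover about 1,500 words, or roughly ten minutes of continuous speech; faster speech, other languages, and punctuation will all shift the figure.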

Use Cases

  • Transcribing meetings or interviews
  • Generating captions for videos
  • Accessibility applications for audio content
  • Multilingual content transcription
  • Academic research transcription

⚙️ Input Parameters

audio_file

string

The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
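Because only the listed formats are accepted, a client can reject other files before uploading. This minimal sketch checks the extension of a local path or URL against those formats; has_supported_extension is a hypothetical helper, and an extension check does not verify the file's actual encoding.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed for the audio_file parameter.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def has_supported_extension(path_or_url: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    Only the extension is inspected (case-insensitively); the contents are not.
    """
    # For URLs, use only the path component so query strings are ignored.
    path = urlparse(path_or_url).path or path_or_url
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS
```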

language

string

The language of the input audio, given as an ISO-639-1 code (e.g. en). Supplying the input language improves accuracy and latency.

prompt

string

Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.

temperature

number

Sampling temperature, between 0 and 1. Lower values make the output more deterministic.
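The four parameters above can be assembled into an input payload with a few client-side checks. This is a minimal sketch: build_input is a hypothetical helper, the language check only confirms the two-letter ISO-639-1 shape rather than looking the code up in a real list, and leaving unset parameters out assumes the service then applies its own defaults.

```python
import re
from typing import Optional

# ISO-639-1 codes are two lowercase letters (e.g. "en"). This checks the
# shape only, not whether the code is actually assigned to a language.
_ISO_639_1 = re.compile(r"[a-z]{2}")


def build_input(audio_file: str,
                language: Optional[str] = None,
                prompt: Optional[str] = None,
                temperature: Optional[float] = None) -> dict:
    """Assemble an input dict from the documented parameters, omitting unset ones."""
    if language is not None and not _ISO_639_1.fullmatch(language):
        raise ValueError(f"language should be an ISO-639-1 code such as 'en', got {language!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError(f"temperature must be between 0 and 1, got {temperature}")
    params = {"audio_file": audio_file, "language": language,
              "prompt": prompt, "temperature": temperature}
    # Assumption: omitted parameters fall back to server-side defaults.
    return {k: v for k, v in params.items() if v is not None}
```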

💡 Usage Examples

Example 1

Input Parameters

{
  "language": "en",
  "audio_file": "https://replicate.delivery/xezq/XoxHeakty0z3KKc46cMLPKC2ct54ekT3EtvcwDQuRIuxfJdpA/tmpsglqtqn5.mp3",
  "temperature": 0
}
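To reproduce Example 1 programmatically, the sketch below passes the same input to replicate.run from the Replicate Python client. Several details are assumptions: the model reference "openai/gpt-4o-transcribe" (confirm the exact owner/name on the model's Replicate page), that the client is installed (pip install replicate) with REPLICATE_API_TOKEN set, and that the output arrives as a string or a sequence of text chunks to be joined.

```python
# Example 1's input parameters, copied from the JSON above.
EXAMPLE_INPUT = {
    "language": "en",
    "audio_file": "https://replicate.delivery/xezq/XoxHeakty0z3KKc46cMLPKC2ct54ekT3EtvcwDQuRIuxfJdpA/tmpsglqtqn5.mp3",
    "temperature": 0,
}

# ASSUMED model reference; confirm the exact owner/name on Replicate.
MODEL_REF = "openai/gpt-4o-transcribe"


def run_example() -> str:
    """Send Example 1 to the model via the Replicate Python client."""
    # Imported lazily so this module loads even without the client installed.
    # The client reads the REPLICATE_API_TOKEN environment variable.
    import replicate

    output = replicate.run(MODEL_REF, input=EXAMPLE_INPUT)
    # Assumption: output is a string or an iterable of text chunks.
    return output if isinstance(output, str) else "".join(output)
```

Calling run_example() makes a priced request, so the call is wrapped in a function rather than executed at import time.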

Output Results

So we just added GPT-4o transcribe to Replicate and thought you'd want to know. It's basically a speech-to-text model that uses GPT-4o to turn your audio into text. The cool thing is that it's noticeably better than the Whisper models we've been using, fewer errors, better at recognizing different languages, and just more accurate overall. If you've ever been frustrated with transcripts that mess up technical terms or struggle with different accents, you'll probably appreciate this upgrade. It just works better. Some quick tech specs if you're curious. It has a 16,000 token context window, which means it can handle longer audio clips in one go. And it can output up to 2,000 tokens, so you'll get nice complete transcripts. The model's knowledge is current up to June 2024, so it's pretty up-to-date with language and terminology.
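The prompt parameter's support for continuing a previous audio segment suggests one way to handle recordings longer than a single request comfortably covers: split the audio, then pass the tail of the text transcribed so far as the prompt for each next segment. This sketch assumes the audio has already been split into segment files, uses a hypothetical transcribe callback in place of a real model call, and picks a 200-character tail arbitrarily; per the parameter description, the prompt should be in the same language as the audio.

```python
from typing import Callable, List, Optional

# HYPOTHETICAL callback: (segment_file, prompt) -> transcript text.
# In practice it would call the model with the "prompt" input set.
TranscribeFn = Callable[[str, Optional[str]], str]


def transcribe_segments(segments: List[str], transcribe: TranscribeFn,
                        context_chars: int = 200) -> str:
    """Transcribe segments in order, prompting each with the preceding text."""
    pieces: List[str] = []
    for segment in segments:
        so_far = " ".join(pieces)
        # Tail of the transcript so far; None for the first segment.
        prompt = so_far[-context_chars:] or None
        text = transcribe(segment, prompt).strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)
```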

Technical Specifications

Run Count: 975
Commercial Use: Supported
Pricing: Priced by multiple properties
Platform: Replicate
