
OpenAI GPT-4o released

Emad Dehnavi
1 min read · May 19, 2024

OpenAI just released GPT-4o, its new multimodal LLM that can understand and generate across text, audio, and vision in real time.

Model evaluations

  • Input: Text, Text + Image, Text + Audio, Text + Video, Audio
  • Output: Image, Image + Text, Text, Audio
  • 88.7% on MMLU; 90.2% on HumanEval
  • < 5% word error rate (WER) for Western European languages in speech transcription
  • 69.1% on MMMU; 92.8% on DocVQA
  • Up to 50% cheaper (probably due to tokenization improvements) and 2x faster than GPT-4 Turbo
  • Near real-time audio, with an average response time of 320ms, similar to human conversation
  • New tokenizer with a 200k token vocabulary (up from 100k), needing 1.1x to 4.4x fewer tokens across 20 languages (see the sketch after this list)
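
You can check the token savings yourself: both encodings are exposed in OpenAI's tiktoken library ("o200k_base" for GPT-4o, "cl100k_base" for GPT-4 Turbo). A minimal sketch; the sample sentences below are my own illustrations, not from the release post:

```python
# Compare GPT-4o's new tokenizer ("o200k_base") with the one used by
# GPT-4 Turbo ("cl100k_base"). Non-Latin scripts show the biggest gains.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

samples = {
    "English": "Hello, how are you today?",
    "Hindi": "नमस्ते, आप आज कैसे हैं?",
    "Japanese": "こんにちは、今日はお元気ですか？",
}

for lang, text in samples.items():
    n_old = len(old_enc.encode(text))
    n_new = len(new_enc.encode(text))
    print(f"{lang}: {n_old} -> {n_new} tokens ({n_old / n_new:.1f}x fewer)")
```

Fewer tokens per request means lower cost and lower latency for the same text, which is likely part of why GPT-4o is cheaper than GPT-4 Turbo.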

Model capabilities

With GPT-4o, OpenAI trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.
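
Text and image inputs are already rolling out in the API, with audio to follow. A minimal sketch using the OpenAI Python SDK; the image URL is a placeholder, and OPENAI_API_KEY is assumed to be set in your environment:

```python
# Send mixed text + image input to GPT-4o via the chat completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL; replace with a real, publicly
                    # accessible image.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```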

Read the full release post and model capabilities: https://openai.com/index/hello-gpt-4o/

Written by Emad Dehnavi

With 8 years of experience as a software engineer, I write about AI and technology in a simple way. My goal is to make these topics easy and interesting for everyone.
