OpenAI GPT-4o released
May 19, 2024
OpenAI just released GPT-4o, its new multimodal LLM that can understand and generate across text, audio, and vision in real time.
Model evaluations
- Input: Text, Text + Image, Text + Audio, Text + Video, Audio
- Output: Image, Image + Text, Text, Audio
- 88.7% on MMLU; 90.2% on HumanEval
- < 5% word error rate (WER) for Western European languages in speech transcription
- 69.1% on MMMU; 92.8% on DocVQA
- Up to 50% cheaper (probably due to tokenization improvements) and 2x faster than GPT-4 Turbo
- Near real-time audio responses, 320 ms on average, similar to human conversational response times
- New tokenizer with a 200k-token vocabulary (up from 100k), requiring 1.1x to 4.4x fewer tokens across 20 languages (see the tokenizer sketch below)
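
The new vocabulary is exposed in OpenAI's open-source tiktoken library as the o200k_base encoding. Below is a minimal sketch comparing token counts against the older cl100k_base encoding used by GPT-4 Turbo; the sample sentence is purely illustrative:

```python
# Minimal sketch: compare GPT-4o's new o200k_base encoding with the
# older cl100k_base encoding using OpenAI's tiktoken library
# (pip install tiktoken). The sample text below is illustrative only.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

# Hindi is among the languages OpenAI reported large token savings for
text = "नमस्ते, आप कैसे हैं?"
print(len(old_enc.encode(text)), "tokens with cl100k_base")
print(len(new_enc.encode(text)), "tokens with o200k_base")
```

Fewer tokens per request means lower cost and faster responses for non-English text, which lines up with the pricing and latency improvements above.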
Model capabilities
GPT-4o is a single new model trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.
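
As a quick sketch of what this looks like in practice, here is a text + image request to GPT-4o via the OpenAI Python SDK. The image URL is a placeholder, and the snippet assumes OPENAI_API_KEY is set in your environment:

```python
# Minimal sketch of a multimodal (text + image) request to GPT-4o
# using the OpenAI Python SDK (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL; replace with a real image
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```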
Read the full release post and model capabilities: https://openai.com/index/hello-gpt-4o/