OpenAI GPT-4o released
May 19, 2024
OpenAI just released GPT-4o, its new multimodal LLM that can understand and generate across text, audio, and vision in real time.
Model evaluations
- Input: Text, Text + Image, Text + Audio, Text + Video, Audio
- Output: Image, Image + Text, Text, Audio
- 88.7% on MMLU; 90.2% on HumanEval
- < 5% word error rate (WER) for Western European languages in speech transcription
- 69.1% on MMMU; 92.8% on DocVQA
- Up to 50% cheaper (probably due to tokenization improvements) and 2x faster than GPT-4 Turbo
- Near real-time audio responses, 320 ms on average, similar to human conversational response times
- New tokenizer with a 200k-token vocabulary (up from 100k), requiring 1.1x to 4.4x fewer tokens across 20 languages (see the tokenizer sketch below)
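
The new vocabulary is exposed in OpenAI's open-source tiktoken library as the o200k_base encoding. Below is a minimal sketch comparing token counts against the older cl100k_base encoding used by GPT-4 Turbo; the sample sentence is purely illustrative:

```python
# Minimal sketch: compare GPT-4o's new o200k_base encoding with the
# older cl100k_base encoding using OpenAI's tiktoken library
# (pip install tiktoken). The sample text below is illustrative only.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

# Hindi is among the languages OpenAI reported large token savings for
text = "नमस्ते, आप कैसे हैं?"
print(len(old_enc.encode(text)), "tokens with cl100k_base")
print(len(new_enc.encode(text)), "tokens with o200k_base")
```

Fewer tokens per request means lower cost and faster responses for non-English text, which lines up with the pricing and latency improvements above.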
Model capabilities
GPT-4o is a single new model trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.
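
As a quick sketch of what this looks like in practice, here is a text + image request to GPT-4o via the OpenAI Python SDK. The image URL is a placeholder, and the snippet assumes OPENAI_API_KEY is set in your environment:

```python
# Minimal sketch of a multimodal (text + image) request to GPT-4o
# using the OpenAI Python SDK (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL; replace with a real image
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```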
Read the full release post and model capabilities: https://openai.com/index/hello-gpt-4o/