Summary

OpenAI introduced three realtime audio models in its API: GPT-Realtime-2 for reasoning-heavy voice agents, GPT-Realtime-Translate for live multilingual conversations, and GPT-Realtime-Whisper for low-latency streaming transcription. The update also brings longer context, expanded tool use, and controllable reasoning levels to voice workflows.

What changed

OpenAI launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API, adding stronger reasoning, longer context, and new translation and transcription capabilities.
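As a rough illustration, a minimal sketch of how a developer might configure one of the new models over the existing Realtime API session protocol. This is an assumption-heavy example: the model name comes from the announcement, the event shape mirrors the Realtime API's established `session.update` format, and the `reasoning_effort` field is a hypothetical stand-in for the "controllable reasoning levels" mentioned above, whose real parameter name is not specified here.

```python
import json

# Hypothetical endpoint: model name taken from the announcement; the
# query-parameter style follows the existing Realtime API convention.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"

# Session configuration event, modeled on the Realtime API's
# session.update shape. The actual schema for the new models may differ.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["audio", "text"],
        "instructions": "You are a customer-service voice agent.",
        # Hypothetical knob for the announced "controllable reasoning
        # levels"; the real parameter name is not given in the source.
        "reasoning_effort": "low",
    },
}

# Serialize the event for sending over the WebSocket connection.
payload = json.dumps(session_update)
print(payload)
```

In the established Realtime API flow, a client would open the WebSocket with an `Authorization: Bearer <API key>` header and send this payload as its first event; whether the new models change that handshake is not covered by the announcement.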

Why it matters

OpenAI is treating voice as an agent interface, not just a speech front end. The release tightens the link between realtime conversation, tool calling, translation, and production workflow automation, which raises the bar for platform vendors competing in voice assistants and customer-service agents.

Evidence excerpt

OpenAI says the new API release introduces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper so developers can build voice apps that reason, translate, and transcribe in real time.
