Summary
OpenAI introduced three realtime audio models in its Realtime API: GPT-Realtime-2 for reasoning-heavy voice agents, GPT-Realtime-Translate for live multilingual conversations, and GPT-Realtime-Whisper for low-latency streaming transcription. The update also expands context length, tool use, and controllable reasoning levels for voice workflows.
What changed
OpenAI launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper in the Realtime API with stronger reasoning, longer context, translation, and transcription features.
Why it matters
OpenAI is treating voice as an agent interface, not just a speech front end. The release tightens the link between realtime conversation, tool calling, translation, and production workflow automation, which raises the bar for platform vendors competing in voice assistants and customer-service agents.
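To make the tool-calling link concrete, here is a minimal sketch of a session configuration for a tool-using voice agent. The `session.update` event shape mirrors OpenAI's existing Realtime API conventions; the model name is taken from the announcement, and the `lookup_order` tool is a hypothetical example for illustration, not part of the release.

```python
import json

# Hypothetical session configuration for a customer-service voice agent.
# Event shape follows the existing Realtime API's "session.update" event;
# "gpt-realtime-2" is the model named in the announcement, and the
# lookup_order tool is an invented example.
session_update = {
    "type": "session.update",
    "session": {
        "model": "gpt-realtime-2",
        "modalities": ["audio", "text"],
        "instructions": "You are a customer-service voice agent.",
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",
                "description": "Fetch an order's status by its ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            }
        ],
    },
}

# Serialize for sending over the realtime WebSocket connection.
payload = json.dumps(session_update)
```

A client would send this payload once after opening the realtime connection; the model could then emit function-call events for `lookup_order` mid-conversation, which the application fulfills and streams back as tool results.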
Evidence excerpt
OpenAI says the new API release introduces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper so developers can build voice apps that reason, translate, and transcribe in real time.