Summary

OpenAI published details on a new WebSocket mode for the Responses API that keeps a persistent connection alive for multi-step agent loops. The company says the change made agentic workflows up to 40% faster end to end, helped GPT-5.3-Codex-Spark reach roughly 1,000 tokens per second with bursts to 4,000, and sped up downstream products including Cursor, the Vercel AI SDK, and Cline.

What changed

OpenAI introduced a persistent WebSocket transport for the Responses API so agent loops can reuse connection state instead of sending a fresh synchronous request for each tool step.
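
The difference between the two transports can be sketched with a toy latency model. This is an illustration only: the function name, the step count, and the millisecond figures are assumptions chosen for round arithmetic, not OpenAI measurements. `setup_ms` stands in for whatever per-request cost a persistent connection avoids, which the source does not break down.

```python
# Toy latency model. All numbers are illustrative assumptions, not measurements.

def loop_latency_ms(steps: int, setup_ms: float, step_ms: float, persistent: bool) -> float:
    """Total wall time for an agent loop made of `steps` tool steps."""
    # A persistent connection pays the setup cost once;
    # a request-per-step transport pays it on every step.
    setups = 1 if persistent else steps
    return setups * setup_ms + steps * step_ms

per_request = loop_latency_ms(steps=20, setup_ms=80.0, step_ms=300.0, persistent=False)
persistent = loop_latency_ms(steps=20, setup_ms=80.0, step_ms=300.0, persistent=True)
saving = 1 - persistent / per_request

print(f"per-request: {per_request:.0f} ms, persistent: {persistent:.0f} ms, saving: {saving:.1%}")
```

Under these assumed numbers the loop drops from 7,600 ms to 6,080 ms. The saving grows with the number of steps and with the ratio of setup cost to step cost, which is why long, fast agent loops benefit most.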

Why it matters

As coding and workflow agents generate tokens faster, fixed per-request API overhead takes up a larger share of each step and becomes a real bottleneck. The upgrade turns transport design into a product differentiator for agent platforms and shows how much performance headroom remains outside the model itself.
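
A back-of-the-envelope calculation shows why overhead weighs more as generation speeds up. The 1,000 and 4,000 tokens-per-second rates are the figures OpenAI cites. The 50 tokens-per-second baseline, the 200-token step, and the 100 ms of fixed overhead are hypothetical values picked for illustration.

```python
# Share of one agent step spent on fixed overhead, at different generation speeds.
# The step size and the overhead figure are assumptions, not reported values.

def overhead_share(tokens: int, tokens_per_second: float, overhead_ms: float) -> float:
    """Fraction of a step's wall time spent on fixed per-request overhead."""
    generation_ms = tokens / tokens_per_second * 1000
    return overhead_ms / (overhead_ms + generation_ms)

for tps in (50, 1000, 4000):
    share = overhead_share(tokens=200, tokens_per_second=tps, overhead_ms=100.0)
    print(f"{tps:>5} tok/s: overhead is {share:.0%} of the step")
```

With these inputs the same 100 ms moves from a rounding error at the slow rate to about a third of the step at 1,000 tokens per second and about two thirds at 4,000.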

Evidence excerpt

OpenAI says WebSocket mode made agent loops up to 40% faster end to end, hit about 1,000 TPS for GPT-5.3-Codex-Spark with bursts to 4,000 TPS, and made OpenAI models in Cursor up to 30% faster.

Sources