Summary
OpenAI published an engineering case study showing how Codex was used with Thrive Holdings and Crete accountants to build tax agents that improve from practitioner corrections. The post frames Codex less as a one-shot coding helper and more as part of a production feedback loop for vertical agents.
What changed
OpenAI described a three-part loop that turns practitioner corrections and production traces into evals that Codex can use to improve tax-agent behavior over time.
Why it matters
This is a concrete signal that Codex is being positioned for domain-specific agent systems, not only general developer assistance. It also shows where competition is heading: product teams want agents that learn from operational failures, not just better prompts.
Evidence excerpt
OpenAI says Codex was used to help build Tax AI for Crete accountants and describes a loop where practitioner corrections become evals and new improvement targets.