I'm a bit confused by this branding (never even noticed that there was a 5.2-Instant). It's not a super-fast 1000 tok/s Cerebras-based model like the one they have for codex-spark; it's just 5.2 without the router / "non-thinking" mode?
I feel like OpenAI is going to get right back to where they were pre-GPT-5, with a ton of different options and no one knowing which model to use for what.
Yeah, for a while ChatGPT Plus has been powered by two series of models under the hood.
One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
The second series is the Thinking series, which is more accurate and more tuned to professional knowledge work, but slower (because it uses more reasoning tokens).
We'd also prefer a simple experience with just one option, but picking just one would pull back the Pareto frontier for some group of people/preferences. So for now we continue to serve two models, with manual control for people who want to choose and an imperfect auto-switcher for people who don't want to be bothered. Could change down the road - we'll see.
(I work at OpenAI.)
By the way, I imagine you know this, but the product split is not obvious, even to my 20-something kids who are Plus subscribers - I saw one of them chatting with the instant model recently and I was like "No!! Never do that!!" and they did not understand they were getting the (I'm sorry to say) much less capable model.
I think it's confusing enough it's a brand harm. I offer no solutions, unfortunately. I guess you could do a little posthoc analysis for plus subscribers on up and determine if they'd benefit from default Thinking mode; that could be done relatively cheaply at low utilization times. But maybe you need this to keep utilization where it's at -- either way, I think it ends up meaning my kids prefer Claude. Which is fine; they wouldn't prefer Haiku if it was the default, but they don't get Haiku, they get Sonnet or Opus.
I agree -- we're on the ChatGPT Enterprise plan at work and every time someone complains about it screwing up a task it turns out they were using the instant model. There needs to be a way to disable it at the bare minimum.
Before GPT-5 was launched, and after sama had said they would unify the ordinary and reasoning models, I think we all expected more than an (auto-)switcher. We expected some small innovation (smaller than the ordinary-to-reasoning one, but still significant) that would make both kinds of replies be generated, in some sense, by a single model. I don't know exactly how; I expected OpenAI to surprise us with something that would feel obvious in retrospect.
You could perhaps show the "instant" reply right away and provide a button labeled "Think longer and give me a better answer" that starts the thinking model and eventually replaces the answer.
For this to work well, the instant reply must be truly instant, and the button must always be visible in the same position on the screen (i.e. either at the top or bottom of the answer, scrolled so that it is also at the top or bottom of the screen). Once the thinking answer is displayed, there should be a small icon button to show the previous instant answer.
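A minimal sketch of that flow (all names hypothetical - `call_model` is a stand-in for a real API call, not any actual OpenAI endpoint):

```python
def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real client would stream tokens from an API.
    return f"[{model}] answer to: {prompt}"

class Reply:
    """One chat turn that starts instant and can be upgraded."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        # The instant answer is shown immediately.
        self.current = call_model("instant", prompt)
        self.previous = None  # filled in after an upgrade

    def think_longer(self) -> str:
        # "Think longer" replaces the answer in place, but keeps
        # the instant one behind a small toggle button.
        self.previous = self.current
        self.current = call_model("thinking", self.prompt)
        return self.current

    def show_previous(self) -> str:
        return self.previous or self.current
```

The key design point is that the upgrade is a replacement with history, not a second visible answer, so the "always in the same position" button stays anchored to one reply.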
Wouldn't this be 1.5x as expensive?
Not if the Instant answer is sufficient.
That's assuming that the instant answer is even directionally correct. A misleading instant answer could pollute the context and lead the thinking model astray.
Can the context of the pre-revision Instant response simply be discarded -- or forked, or branched, or [insert appropriate nomenclature here] -- instead of being included as potential poison?
(It seems absurd to consider that there may be no undo button the machine can push.)
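Conceptually, yes: if the client keeps the conversation as a plain list of messages, the "think longer" request can be built from a branch that omits the instant reply. A hedged sketch (the message shapes here are assumptions for illustration, not any specific API):

```python
def branch_without_instant(messages: list[dict]) -> list[dict]:
    # Fork the conversation, dropping the trailing assistant turn
    # (the instant answer) so it can't steer the thinking model.
    forked = list(messages)
    if forked and forked[-1]["role"] == "assistant":
        forked.pop()
    return forked

history = [
    {"role": "user", "content": "Summarize this contract."},
    {"role": "assistant", "content": "quick instant answer"},
]
clean = branch_without_instant(history)
# `clean` ends with the user turn, ready to send to the thinking
# model; `history` is untouched, so the instant answer survives
# for the "show previous answer" toggle.
```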
Auto will never work, because for the exact same prompt sometimes you want a quick answer because it's not something very important to you, and sometimes you want the answer to be as accurate as possible, even if you have to wait 10 minutes.
In my case it would be more useful to have a slider of how much I'm willing to wait. For example instant, or think up to 1 minute, or think up to 15 minutes.
They have an "answer now" button that stops the reasoning and starts the reply. Same with Gemini.
Yeah, I use that, but it's not really a solution that would let Auto be the only mode. It doesn't help when Auto chooses Instant instead of Thinking, and it's also much slower than using Instant outright, because the Skip button doesn't show immediately and the reply is generally slow to restart.
Is there a way to get sticky model selection back, or is the reason that it's just too expensive to serve alternative models?
For coding I love codex-5.3-xhigh, but for non-coding prompts I still far prefer o3 even if it's considered a legacy model.
I can imagine that its higher tool use is too expensive to serve, but as a pro user I would love it to come back.
Thanks for clarifying! I guess the default for most users is going to be to use the router / auto switcher which is fine since most people won't change the default.
Just noting that I'm not against differentiation in products, but it gets very confusing for users when there are too many options (in the case of consumer ChatGPT, at least, this is still more limited than in pre-GPT-5 days). The issue is that there's differentiation at what I pay monthly (free vs Plus vs Pro) and also at the model layer - which essentially becomes a matrix of different options/limits per model (and we're not even getting into capabilities).
For someone who uses codex as well, there are 5 models there when I use /model (on Plus plan, spark is only available for Pro plan users), limits also tied to my same consumer ChatGPT plan.
I imagine the model differentiation is only going to get worse, since with more fine-tuned use cases there will be many different models (e.g. health care answers) - is it really on the user to figure out what to use? The only saving grace is that it's not as bad as Intel or AMD CPU naming schemes or cloud-provider instance naming, but that's a very low bar.
Thank you for confirming!
I've long suspected as much, but I always found the API model name <-> ChatGPT UI selector <-> actual model used correspondence very confusing, and whether I was actually switching models or just some parameters of the harness/model invocation.
> One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
That's putting it mildly. In my experience, the "instant/chat" model is absolute slop tier, while the "thinking" one is genuinely useful and also has a much more palatable tone (even for things not really requiring a lot of thought).
Fortunately, the former clearly identifies itself with an absurd amount of emoji reminiscent of other early chatbots that shall not be named, so I know how to detect and avoid it.
but why not have "sane defaults but configurable"?
hide away the extra complexity for everyone. give power users a way to get it back.
The model doesn't even need to be exposed in the UI. Let the user specify "use model foobar-4" or "use a coding model" or "use a middle-tier attorney model".
VIM does this well: no UI, magic incantations to use features.
Do your fully autonomous offensive weapons and domestic surveillance systems use Instant?
Not today, but response time would be a lot better if they did.
Forgiveness, but while you're here: can you look into why the Notion connector in chat doesn't have the capability to write pages, while the MCP (which I use via Codex) can? It looks like it's entirely possible, just mostly a missing action in the connector.
None granted.
It's because people like choice and control, and "5.2" vs "5.2 thinking" is confusing. Making them "5.2 instant" and "5.2 thinking" is less confusing to more people. Their competitors already do this (Gemini 3 Fast & Gemini 3 Thinking).
ChatGPT 5.2 Intuitive
ChatGPT 5.2 Ponderous
“I had this dream the other night…” – https://www.youtube.com/watch?v=6gYIbMwswKM
They had ~800k people still using GPT-4o daily, presumably for their girlfriends. They need to address them somehow. Plus, serving "thinking" models is much more expensive than "instant" models, so they want to keep the horny people hornying on their platform, at a cheaper cost.
Are you not vibe coding in girlfriend mode?
I can't fathom using LLMs like this. Does ChatGPT actually do this? I thought people who were into this stuff used dedicated apps or Grok?
We'll need to wait for real benchmarks, but based on OpenAI's marketing, Instant is their latency-optimized offering. For a voice interface you don't actually need high tok/s, because speech is slow; time to first token matters much more.
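Rough numbers behind that claim (both constants are my assumptions: a typical speaking rate of ~150 words/min and ~1.3 tokens per English word):

```python
words_per_min = 150      # typical speaking rate (assumption)
tokens_per_word = 1.3    # rough English average (assumption)

# Throughput needed for generation to keep pace with speech.
needed_tok_per_s = words_per_min * tokens_per_word / 60
# ~3.25 tok/s - orders of magnitude below 1000 tok/s, which is
# why time-to-first-token dominates perceived voice latency.
```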
Reminder that OpenAI serves a lot of customers for free; most of the people I know use the free tier. There's a strict limit on thinking queries on the free tier, so a decent non-thinking model is probably positive ROI for them.