This Week I Learned about GPT-5
In the announcement post, OpenAI made some bold claims about GPT-5, including:
The best response, every time
ChatGPT is now designed to think deeply when you need it to.
Great at coding
As a coding collaborator, GPT‑5 tackles complex tasks end-to-end and delivers more readily usable code, better design, and is more effective at debugging.
An expressive writing partner
Create clearer, more compelling messaging for everything from stories to speeches and beyond.
More useful health answers
Our best model yet for health-related questions, providing more precise and reliable responses while acting as more of a proactive thought partner.
Safer and more accurate
Our most reliable model yet. It’s less prone to hallucinations and pretending to know things.
The last two points seemed particularly sus to me. Health answers? I still wouldn’t trust that. “Less prone to hallucinations”? That’s a big claim. Hallucinations have consistently been one of the biggest issues with LLMs. Even though it’s not as bad as in the ChatGPT 3.5 days, it’s not clear whether this problem will ever be fully solved without a paradigm shift.
My Experimentation
During the week, I experimented with GPT-5. My first impressions on Friday weren’t too positive. Sam Altman tweeted that “the autoswitcher broke” and promised improvements.
GPT-5 did feel better the following week, but still not as good as it was hyped to be. My favorite model until now was OpenAI o3, and even ChatGPT 5 Thinking often didn’t seem as good as I remembered o3 being. Not that I was able to compare, since access to previous models was removed altogether! I eventually got 4o back (after massive backlash from people who had formed deep bonds with 4o), but I didn’t get o3 back on ChatGPT Team, so I couldn’t compare directly.
After a few days, I learned to use GPT-5 better. Because of the auto-router, GPT-5 benefits from more prompt engineering (“think deeply”). This felt like a step back, since previous models had gotten good at knowing what I wanted. Or maybe I had just already learned how to use them. A few months ago I was confused by ChatGPT offering so many confusingly named models (what’s better, o3 or o4-mini-high?). However, I eventually learned to use them, and now I miss the choice that was taken away from me. From a user perspective, a model that knows how to choose for you and gives you the best answer should be the better option. On the other hand, I suspect GPT-5’s auto-router is actually a cost-cutting measure in disguise, too often opting behind the scenes for cheaper models even though they give noticeably worse answers.
The results I’ve been getting with ChatGPT 5 have been really inconsistent. Some answers are great, others are dumb. I guess that’s par for the course for AI, but I was expecting an improvement, and this doesn’t feel like one. The jumps from ChatGPT 3.5 to 4, or from 4o to o3, felt more considerable to me.
As for the claim of reduced hallucinations? This has not been my experience. I caught ChatGPT 5 lying on many occasions. It’s hard to tell whether it’s any worse than previous models (again, because I lost access to them), but it does sometimes feel like it.
Even ChatGPT 5 Thinking hallucinates. In one instance, I asked ChatGPT 5 (Auto) how to configure a certain GitHub setting for an Organization. ChatGPT 5 confidently answered that it’s impossible. I then switched to ChatGPT 5 Thinking to see if I would get a different answer. After several minutes of “thinking”, ChatGPT 5 Thinking confidently answered that it’s possible and even gave me exact instructions. Except the instructions were impossible to follow, because the answer was entirely hallucinated. In this case, ChatGPT 5 was more correct than ChatGPT 5 Thinking. The setting didn’t exist (even though both I and ChatGPT 5 Thinking wish that it did).
Cursor CLI with GPT-5
A few hours after GPT-5 was announced, Cursor announced the release of Cursor CLI, plus free GPT-5 credits for one week. My one-month Claude Pro subscription was just ending, so I decided to use Cursor CLI with GPT-5 for the week to try out both and compare them against Claude Code with Sonnet 4.
Cursor CLI is clearly inspired by Claude Code. I don’t mind the rip-off personally since I like Claude Code. In Claude Code with Sonnet 4 the agent is far more transparent about what it is doing and tends to consult more; it even shows a checklist of the tasks the agent plans and executes. That clarity is missing in Cursor CLI for now: it explains less, simply makes changes, and sometimes it’s not clear why - though you can always stop it and ask questions.
Another thing missing in Cursor CLI is support for MCP, even though regular Cursor already has solid MCP support. But Cursor CLI came out less than a week ago. I assume they will improve it over time.
Aside from those gaps, I got decent results with Cursor CLI. The quality felt comparable to Claude Code, and the interface is almost a complete copy.
After my free GPT-5 credits ran out, I decided to go back to Claude Code for now (I resubscribed to Claude Pro for one month). While Cursor CLI might improve in the future, for now it’s not as good as Claude Code. I also worry that the CLI might be an afterthought for Cursor.
Microsoft Copilot with GPT-5
At my current client, the only approved AI tool is Microsoft 365 Copilot (not GitHub Copilot). I had subpar results with it in the past, so I was glad that it was now updated to use GPT-5.
This was also a good way to experiment with GPT-5 for free. Even without an account, Microsoft Copilot offers a generous number of GPT-5 requests (you have to remember to enable GPT-5 every time you start a new chat).
Still, using GPT-5 in Microsoft Copilot feels different from using it in ChatGPT, even though both claim to use the same model. I suspect the infamous auto-router gives Microsoft Copilot the cheaper models more often than not, unless you specifically prompt it not to. Even when prompting heavily, I still got considerably faster results than with ChatGPT 5 Thinking. Perhaps the Azure backend is better optimized, or maybe Microsoft Copilot rarely gets routed to the best models.
Regardless, I did feel an improvement in the answers compared to the previous Microsoft Copilot models (“Quick response” and “Think Deeper”, which I believe are based on some variation of GPT-4). Even so, Microsoft Copilot is still limited in other ways (compared to ChatGPT), such as a small context window.
Overall, Microsoft Copilot is usable for basic work but far from my preference. I wouldn’t use it unless I had no other choice (which is the case at my current client).
Usage Notes
- 3,000 GPT-5 Thinking messages per week is a massive bump; it used to be ~200, and o3 was once limited to just 50 (I hit that every week until I started pairing it with Claude).
- I had to ration o3 carefully, so I’m glad the cap is higher now. I doubt I’ll reach 3,000 messages a week even if ChatGPT were my only tool.
- Defaulting to Thinking mode takes a lot longer - sometimes minutes. Usually the answer is better (often worth the wait), but not always.
- At least once GPT-5 (Auto) gave the correct answer while GPT-5 Thinking spent minutes and returned the opposite, wrong answer.
My Overall Impressions of GPT-5
Overall a disappointment, but still useful. I will continue to use it, particularly with ChatGPT 5 Thinking.
OpenAI has addressed some of the negative feedback already and will no doubt continue to improve GPT-5.
Featured image by Igor Omilaev on Unsplash.




