<?xml version="1.0" encoding="utf-8" standalone="yes"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><script src="https://www.rss.style/js/atom-style.js" xmlns="http://www.w3.org/1999/xhtml"/><title>Tower of Kubes</title><link rel="self" type="application/atom+xml" hreflang="en" href="https://www.towerofkubes.com/tags/openai/feed.xml"/><link rel="alternate" type="application/atom+xml" hreflang="he" href="https://www.towerofkubes.com/he/tags/openai/feed.xml"/><link rel="alternate" type="application/atom+xml" hreflang="x-default" href="https://www.towerofkubes.com/he/tags/openai/feed.xml"/><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/tags/openai/"/><link rel="alternate" type="application/rss+xml" hreflang="en" href="https://www.towerofkubes.com/tags/openai/index.xml"/><id>/</id><updated>2025-08-16T00:00:00Z</updated><author><name>Ro'i Bandel</name></author><generator>Hugo 0.157.0</generator><entry><title>GPT-5</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/gpt-5/"/><id>https://www.towerofkubes.com/articles/gpt-5/</id><updated>2025-08-16T00:00:00Z</updated><summary type="html">Hands-on impressions of GPT-5 across ChatGPT, Cursor CLI, and Microsoft Copilot, plus notes on quotas, hallucinations, and the auto-router trade-offs.</summary><content type="html"><![CDATA[
<h2 class="relative group">This Week I Learned about GPT-5
    <div id="this-week-i-learned-about-gpt-5" class="anchor"></div>
    
</h2>
<p>In the <a href="https://openai.com/gpt-5/"  target="_blank" rel="noreferrer">announcement post</a>, OpenAI made some bold claims about GPT-5, <strong>including:</strong></p>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>The best response, every time</span>
      </summary>
      <div class="admonition-content">
        <p>ChatGPT is now designed to think deeply when you need it to.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>Great at coding</span>
      </summary>
      <div class="admonition-content">
        <p>As a coding collaborator, GPT‑5 tackles complex tasks end-to-end and delivers more readily usable code, better design, and is more effective at debugging.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>An expressive writing partner</span>
      </summary>
      <div class="admonition-content">
        <p>Create clearer, more compelling messaging for everything from stories to speeches and beyond.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>More useful health answers</span>
      </summary>
      <div class="admonition-content">
        <p>Our best model yet for health-related questions, providing more precise and reliable responses while acting as more of a proactive thought partner.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>Safer and more accurate</span>
      </summary>
      <div class="admonition-content">
        <p>Our most reliable model yet. It’s less prone to hallucinations and pretending to know things.</p>
      </div>
    </details><p>The last two points seemed particularly <em>sus</em> to me. Health answers? I still wouldn’t trust that. “Less prone to hallucinations”? That’s a big claim. Hallucinations have consistently been one of the biggest issues with LLMs. Even though the problem isn’t as bad as it was in the ChatGPT 3.5 days, it’s not clear whether it will ever be fully solved without a paradigm shift.</p>

<h2 class="relative group">My Experimentation
    <div id="my-experimentation" class="anchor"></div>
    
</h2>
<p>During the week, I experimented with GPT-5. My first impressions on Friday weren’t too positive. Sam Altman tweeted that “the autoswitcher broke” and promised improvements.</p>
<p>GPT-5 did feel better the following week, but still not as good as it was hyped to be. My favorite model until now was OpenAI o3, and even ChatGPT 5 Thinking often didn’t seem as good as I remembered o3 being. Not that I was able to compare directly, since access to previous models was removed altogether! I eventually got 4o back (after massive backlash from people who had formed deep bonds with 4o), but o3 never came back on ChatGPT Team, so a side-by-side comparison wasn’t possible.</p>
<p>After a few days, I learned to use GPT-5 better. Because of the auto-router, GPT-5 benefits from more prompt engineering (“think deeply”). This felt like a step back, since previous models had gotten good at knowing what I wanted. Or maybe I had simply already learned how to use them. A few months ago I was confused by ChatGPT’s many oddly named models (what’s better, o3 or o4-mini-high?). However, I eventually learned to use them, and now I miss the choice that was taken away from me, even though, from a user’s perspective, a model that knows how to choose for you and give you the best answer <em>should</em> be a better option. On the other hand, I suspect GPT-5’s auto-router is actually a cost-cutting measure in disguise, too often opting behind the scenes for cheaper models even though they give noticeably worse answers.</p>
<p>The results I’ve been getting with ChatGPT 5 have been really inconsistent. Some answers are great, others are dumb. I guess that’s par for the course for AI, but I was expecting an improvement, and this doesn’t feel like one. The jumps from ChatGPT 3.5 to 4, or from 4o to o3, felt more considerable to me.</p>
<p>As for the claim of reduced hallucinations? This has not been my experience. I caught ChatGPT 5 lying on many occasions. It’s hard to tell whether it’s any worse than previous models (again, because I lost access to them), but it does sometimes feel like it.</p>
<p>Even ChatGPT 5 Thinking hallucinates. In one instance, I asked ChatGPT 5 (Auto) how to configure a certain GitHub setting for an Organization. ChatGPT 5 confidently answered that it’s <em>impossible</em>. I then switched the model to ChatGPT 5 Thinking to see if I would get a different answer. After several minutes of “thinking”, ChatGPT 5 Thinking confidently answered that it’s <em>possible</em> and even gave me exact instructions. Except the instructions were impossible to follow, because the answer was entirely hallucinated. In this case, ChatGPT 5 was more correct than ChatGPT 5 Thinking: the setting didn’t exist (even though both I and ChatGPT 5 Thinking wish that it did).</p>

<h2 class="relative group">Cursor CLI with GPT-5
    <div id="cursor-cli-with-gpt-5" class="anchor"></div>
    
</h2>
<p>A few hours after GPT-5 was announced, Cursor announced the release of <a href="https://cursor.com/cli"  target="_blank" rel="noreferrer">Cursor CLI</a> plus <strong>free GPT-5 credits for one week</strong>. My one-month Claude Pro subscription was just ending, so I decided to use Cursor CLI with GPT-5 for the week to try out both, and to compare them with Claude Code running Sonnet 4.</p>
<p>Cursor CLI is clearly inspired by Claude Code. I don’t mind the rip-off personally, since I like Claude Code. In Claude Code with Sonnet 4, the agent is far more transparent about what it is doing and tends to consult more; it even shows a checklist of the tasks the agent plans and executes. That clarity is missing in Cursor CLI for now: it explains less, simply makes changes, and sometimes it’s not clear why, though you can always stop it and ask questions.</p>
<p>Another thing missing in Cursor CLI is support for MCP, even though regular Cursor already has solid MCP support. But Cursor CLI came out less than a week ago. I assume they will improve it over time.</p>
<p>Aside from those gaps, I got decent results with Cursor CLI. The quality felt comparable to Claude Code, and the interface is almost a complete copy.</p>
<p>After my free GPT-5 credits ran out, I decided to go back to Claude Code for now (I resubscribed for one month of Claude Pro). While Cursor CLI might improve in the future, for now it’s not as good as Claude Code. I also worry that the CLI might be an afterthought for Cursor.</p>

<h2 class="relative group">Microsoft Copilot with GPT-5
    <div id="microsoft-copilot-with-gpt-5" class="anchor"></div>
    
</h2>
<p>At my current client, the only approved AI tool is Microsoft 365 Copilot (<em>not</em> GitHub Copilot). I had subpar results with it in the past, so I was glad that it was now updated to use GPT-5.</p>
<p>This was also a good way to experiment with GPT-5 for free. Even without an account, Microsoft Copilot offers a generous amount of GPT-5 requests (you have to remember to enable GPT-5 every time you start a new chat).</p>
<p>Still, the experience of using GPT-5 in Microsoft Copilot feels different from using it in ChatGPT, even though both supposedly use the same model. I suspect the infamous auto-router gives Microsoft Copilot the cheaper models more often than not, unless you specifically prompt it not to. Even when prompting heavily, I still got considerably faster results than with ChatGPT 5 Thinking. Perhaps the Azure backend is more optimized here, or maybe Microsoft Copilot rarely gets routed to the best models by GPT-5.</p>
<p>Regardless, I did feel an improvement in the answers compared to the previous Microsoft Copilot models (“Quick response” and “Think Deeper”, which I believe are based on some variation of GPT-4). Even so, Microsoft Copilot is still limited in other ways (compared to ChatGPT), such as a small context window.</p>
<p>Overall, Microsoft Copilot is usable for basic work but far from my preference. I wouldn’t use it unless I had no other choice (which is the case at the current client).</p>

<h2 class="relative group">Usage Notes
    <div id="usage-notes" class="anchor"></div>
    
</h2>
<ul>
<li>3,000 GPT-5 Thinking messages per week is a massive bump; it used to be ~200, and o3 was once limited to just 50 (I hit that every week until I started pairing it with Claude).</li>
<li>I had to ration o3 carefully, so I’m glad the cap is higher now. I doubt I’ll reach 3,000 messages a week even if ChatGPT were my only tool.</li>
<li>Defaulting to Thinking mode takes a lot longer - sometimes minutes. Usually the answer is better (often worth the wait), but not always.</li>
<li>At least once GPT-5 (Auto) gave the correct answer while GPT-5 Thinking spent minutes and returned the opposite, wrong answer.</li>
</ul>

<h2 class="relative group">My Overall Impressions on GPT-5
    <div id="my-overall-impressions-on-gpt-5" class="anchor"></div>
    
</h2>
<p>Overall a disappointment, but still useful. I will continue to use it, particularly with ChatGPT 5 Thinking.</p>
<p>OpenAI has addressed some of the negative feedback already and will no doubt continue to improve GPT-5.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@omilaev?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Igor Omilaev</a> on <a href="https://unsplash.com/?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><category term="gpt" label="Gpt" scheme="https://www.towerofkubes.com/tags/gpt/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="openai" label="Openai" scheme="https://www.towerofkubes.com/tags/openai/"/><published>2025-08-16T00:00:00Z</published></entry><entry><title>OpenAI o3 Review</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/openai-o3/"/><id>https://www.towerofkubes.com/articles/openai-o3/</id><updated>2025-05-28T00:00:00Z</updated><summary type="html">Hands-on review of OpenAI o3: deep research-style answers, multi-source web lookups, latency tradeoffs, and comparisons to ChatGPT 4o/4.5/4.1.</summary><content type="html"><![CDATA[<p>I’ve used o3 extensively, and I think it’s a really strong model compared to earlier ChatGPT models.</p>
<p>It doesn’t just “think”; it also does web research and cross-checks sources to reach a conclusion. Other ChatGPT models can browse too, but o3 digs deeper and pulls more sources (when I read its “thoughts,” it said it tries to fetch at least 10 sources).</p>
<p>This is similar to what Deep Research does, which makes sense because ChatGPT’s DR was already using the o3 model before o3 itself launched. However, DR returns essay-length answers (and is limited to 10 uses per month on ChatGPT Plus), which isn’t always practical. o3 gives answers closer in length to those of the other ChatGPT models. There are Plus usage limits, but I had to use it quite a lot before hitting them.</p>
<p>The model “thinks” for several minutes before responding. Usually it’s worth the wait, except for simple questions another model could answer faster. For complex questions, o3 is often noticeably better. I tried tough coding prompts that 4o struggled with (confident answers with hallucinations), then asked o3 and got much better results. For bigger tasks I sometimes had to tweak the prompt a few times, but in most cases o3 eventually delivered (unlike 4o).</p>
<p>The model isn’t perfect. There are still hallucinations and mistakes. Nevertheless, in my experience there are fewer of them than with other models I’ve tried.</p>
<p>AI moves so fast that it’s hard to keep up. Last month o3 was probably the best model around, and now people say Gemini 2.5 has overtaken it. It takes time to use a new model enough to really understand its strengths and weaknesses.</p>
<p>I also played a bit with ChatGPT 4.5 and 4.1. I haven’t used them much yet and so far I’m less impressed.</p>
<p>I haven’t tried o4-mini or o4-mini-high yet. Assuming o3 is better, I’d rather wait a few minutes for deeper reasoning. For simpler questions I still default to 4o.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@siva_photography?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Levart_Photographer</a> on <a href="https://unsplash.com/photos/a-computer-screen-with-a-bunch-of-buttons-on-it-drwpcjkvxuU?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="openai" label="Openai" scheme="https://www.towerofkubes.com/tags/openai/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><category term="gpt" label="Gpt" scheme="https://www.towerofkubes.com/tags/gpt/"/><category term="chatgpt" label="Chatgpt" scheme="https://www.towerofkubes.com/tags/chatgpt/"/><published>2025-05-28T00:00:00Z</published></entry></feed>