<?xml version="1.0" encoding="utf-8" standalone="yes"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><script src="https://www.rss.style/js/atom-style.js" xmlns="http://www.w3.org/1999/xhtml"/><title>Tower of Kubes</title><link rel="self" type="application/atom+xml" hreflang="en" href="https://www.towerofkubes.com/tags/ai/feed.xml"/><link rel="alternate" type="application/atom+xml" hreflang="he" href="https://www.towerofkubes.com/he/tags/ai/feed.xml"/><link rel="alternate" type="application/atom+xml" hreflang="x-default" href="https://www.towerofkubes.com/tags/ai/feed.xml"/><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/tags/ai/"/><link rel="alternate" type="application/rss+xml" hreflang="en" href="https://www.towerofkubes.com/tags/ai/index.xml"/><id>/</id><updated>2026-05-05T00:00:00Z</updated><author><name>Ro'i Bandel</name></author><generator>Hugo 0.157.0</generator><entry><title>OpenCode: The Agentic Tool That Anthropic and Google Don't Want You To Use</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/opencode/"/><id>https://www.towerofkubes.com/articles/opencode/</id><updated>2026-05-05T00:00:00Z</updated><summary type="html">OpenCode is the open-source agentic CLI tool that both Anthropic and Google moved to block from their subscription APIs. Here’s a hands-on look at what makes it genuinely different, and whether it’s worth switching from Claude Code.</summary><content type="html"><![CDATA[<p>For the past four months, <a href="https://opencode.ai"  target="_blank" rel="noreferrer">OpenCode</a> has been my primary agent tool. A piece of AI industry drama is what brought it to my attention.</p>
<figure><img
    class="my-0 rounded-md"
    loading="lazy"
    decoding="async"
    fetchpriority="auto"
    alt="OpenCode logo with tagline: “The open source AI coding agent”"
    width="1280"
    height="721"
    src="/articles/opencode/opencode-logo-with-tagline_hu_897512d6efaab35c.webp"
    srcset="/articles/opencode/opencode-logo-with-tagline_hu_897512d6efaab35c.webp 800w, /articles/opencode/opencode-logo-with-tagline.webp 1280w"
    sizes="(min-width: 768px) 50vw, 65vw"
    data-zoom-src="/articles/opencode/opencode-logo-with-tagline.webp"></figure>

<h2 class="relative group">Background
    <div id="background" class="anchor"></div>
    
</h2>
<p>In January 2026, I started seeing drama online: <a href="https://github.com/anomalyco/opencode/issues/7410"  target="_blank" rel="noreferrer">Anthropic blocks third-party use of Claude subscriptions</a>. The most surprising part to me wasn’t that Anthropic decided to block this type of usage; that’s unfortunate but expected. What surprised me was that I hadn’t known this was even possible in the first place.</p>
<p>I had briefly read about OpenCode and Crush during my <a href="/articles/agentic-cli-tools-comparison/" >Agentic CLI Tools Comparison</a>, but hadn’t used them due to their <a href="/articles/agentic-cli-tools-comparison/#byo-bring-your-own-api-keys" >BYO (Bring Your Own) API key requirement</a>, which in most cases is significantly more expensive than subscription tiers. As it turns out, people had found ways to use those subscriptions anyway. OpenCode had implemented an OAuth flow that spoofed Claude Code’s HTTP headers to authenticate against Anthropic’s API with a Claude Pro or Max subscription. This gave OpenCode users access to Claude models at subscription pricing, a significant cost advantage.</p>

<h3 class="relative group">The Crackdown
    <div id="the-crackdown" class="anchor"></div>
    
</h3>
<p>Anthropic’s response came in several phases. Active enforcement began on January 9, 2026, when Anthropic deployed server-side protections blocking all unofficial OAuth access. On February 19, Anthropic updated its legal compliance page to make the OAuth restriction explicit: OAuth tokens obtained from Claude subscription accounts are only permitted for use with official Claude tools.</p>
<p>Legal requests followed, and in mid-March OpenCode’s maintainers <a href="https://github.com/anomalyco/opencode/pull/18186"  target="_blank" rel="noreferrer">merged a PR</a> removing the Anthropic OAuth plugin from the project. By early April, Anthropic extended restrictions to OpenClaw and other third-party harnesses. Google ran the same playbook with Gemini around the same period, banning third-party OAuth access and issuing account-level suspensions.</p>

<h3 class="relative group">The Community Reaction
    <div id="the-community-reaction" class="anchor"></div>
    
</h3>
<p>The <a href="https://news.ycombinator.com/item?id=46549823"  target="_blank" rel="noreferrer">Hacker News thread</a> filled with genuine disappointment. Many users felt OpenCode was a significantly better tool than Claude Code. The main advantages cited were its open-source <a href="https://github.com/anomalyco/opencode#MIT-1-ov-file"  target="_blank" rel="noreferrer">MIT license</a>, an optional web UI and client/server architecture, and the absence of flickering, a complaint about Claude Code that hasn’t gone away. OpenCode had also grown remarkably fast, reaching over 150,000 GitHub stars.</p>
<p>OpenAI and GitHub went the other direction. Tibo, OpenAI’s Codex lead, <a href="https://x.com/thsottiaux/status/2009742187484065881"  target="_blank" rel="noreferrer">announced on X</a> that Codex subscribers could use their subscription directly within OpenCode, and GitHub formally <a href="https://github.blog/changelog/2026-01-16-github-copilot-now-supports-opencode/"  target="_blank" rel="noreferrer">announced support for OpenCode</a> across all GitHub Copilot subscriptions. That’s what originally got me to give OpenCode a real try, paired with GitHub Copilot and ChatGPT subscriptions, and I’ve been using it regularly since.</p>

<h2 class="relative group">My Impressions of OpenCode
    <div id="my-impressions-of-opencode" class="anchor"></div>
    
</h2>
<p>OpenCode immediately seemed appealing when I started using it. Until that point, Claude Code had remained my preferred agentic CLI tool. In the months since I wrote <a href="/articles/agentic-cli-tools-comparison/" >Agentic CLI Tools Comparison</a>, I had continued experimenting with different CLI tools and models, notably <a href="/articles/claude-sonnet-4.5-and-claude-code-2.0/" >Claude Code 2.0</a>, Codex CLI, Gemini CLI, and GitHub Copilot CLI. Claude Code consistently remained the best tool in my opinion, both in terms of UI design and features, and in terms of Anthropic’s models feeling the strongest at coding and agentic tool usage based on my experience. The other tools felt like UI imitations of Claude Code running different models, with no meaningful improvements. OpenCode is genuinely different, though. It runs on a client/server model with an HTTP API, supports 75+ AI providers including local models, and has native multi-session support.</p>
<p>When opening OpenCode in a terminal, it feels familiar but different. The starting screen looks a lot like a classic search engine, with the prompt box centered on the screen, rather than being off to the bottom like in most other agentic CLI tools.</p>
<figure><img
    class="my-0 rounded-md"
    loading="lazy"
    decoding="async"
    fetchpriority="auto"
    alt="OpenCode welcome screen"
    width="1280"
    height="640"
    src="/articles/opencode/opencode-welcome-screen_hu_74a83788b244a153.webp"
    srcset="/articles/opencode/opencode-welcome-screen_hu_74a83788b244a153.webp 800w, /articles/opencode/opencode-welcome-screen.webp 1280w"
    sizes="(min-width: 768px) 50vw, 65vw"
    data-zoom-src="/articles/opencode/opencode-welcome-screen.webp"></figure>
<p>However, once you enter an initial prompt, the prompt box moves to the bottom of the terminal, making for a more familiar look. In my opinion, OpenCode strikes a good balance: it will feel familiar to users who have used Claude Code (and similar tools) before, but at the same time it does not feel like a clone of other tools. OpenCode does a lot of unique things that other tools don’t do. For example, OpenCode has a useful sidebar that displays information about active MCPs, LSPs (language servers) and token usage for the current session.</p>
<figure><img
    class="my-0 rounded-md"
    loading="lazy"
    decoding="async"
    fetchpriority="auto"
    alt="OpenCode sidebar showing MCP connections, LSP status, and token usage"
    width="1920"
    height="900"
    src="/articles/opencode/opencode-sidebar_hu_c305c873a185037.webp"
    srcset="/articles/opencode/opencode-sidebar_hu_c305c873a185037.webp 800w, /articles/opencode/opencode-sidebar_hu_116abb3292d419d0.webp 1280w"
    sizes="(min-width: 768px) 50vw, 65vw"
    data-zoom-src="/articles/opencode/opencode-sidebar.webp"></figure>
<p>The look of OpenCode becomes even more unique when using its <a href="https://opencode.ai/docs/web/"  target="_blank" rel="noreferrer">web UI</a> or the OpenCode desktop app.</p>
<figure><img
    class="my-0 rounded-md"
    loading="lazy"
    decoding="async"
    fetchpriority="auto"
    alt="OpenCode Web - New Session"
    width="1400"
    height="997"
    src="/articles/opencode/opencode-web-homepage-new-session_hu_2e42a15a6c01ef0b.webp"
    srcset="/articles/opencode/opencode-web-homepage-new-session_hu_2e42a15a6c01ef0b.webp 800w, /articles/opencode/opencode-web-homepage-new-session_hu_9c29e52cad0c0b1.webp 1280w"
    sizes="(min-width: 768px) 50vw, 65vw"
    data-zoom-src="/articles/opencode/opencode-web-homepage-new-session.webp"></figure>
<p><em>Image source: <a href="https://opencode.ai/docs/web/"  target="_blank" rel="noreferrer">Web | OpenCode</a></em></p>

<h3 class="relative group">Models and Providers
    <div id="models-and-providers" class="anchor"></div>
    
</h3>
<p>On first use, OpenCode defaults to the OpenCode Zen models. As of today, <a href="https://opencode.ai/docs/zen/#pricing"  target="_blank" rel="noreferrer">OpenCode Zen offers several free models</a>, as well as paid models.</p>

    <div class="admonition tip">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 384 512"><path d="M272 384c9.6-31.9 29.5-59.1 49.2-86.2c0 0 0 0 0 0c5.2-7.1 10.4-14.2 15.4-21.4c19.8-28.5 31.4-63 31.4-100.3C368 78.8 289.2 0 192 0S16 78.8 16 176c0 37.3 11.6 71.9 31.4 100.3c5 7.2 10.2 14.3 15.4 21.4c0 0 0 0 0 0c19.8 27.1 39.7 54.4 49.2 86.2l160 0zM192 512c44.2 0 80-35.8 80-80l0-16-160 0 0 16c0 44.2 35.8 80 80 80zM112 176c0 8.8-7.2 16-16 16s-16-7.2-16-16c0-61.9 50.1-112 112-112c8.8 0 16 7.2 16 16s-7.2 16-16 16c-44.2 0-80 35.8-80 80z"/></svg>
        <span>When using OpenCode Zen, it’s recommended to read about the <a href="https://opencode.ai/docs/zen/#privacy"  target="_blank" rel="noreferrer">privacy for each model</a>.</span>
      </div>
    </div><p>These paid models can be used either by paying for credits (similar to OpenRouter) or through the <a href="https://opencode.ai/go"  target="_blank" rel="noreferrer">OpenCode Go subscription</a>. However, OpenCode does not limit you to its own offering. One of the best features of OpenCode is its wide <a href="https://opencode.ai/docs/providers/"  target="_blank" rel="noreferrer">provider</a> support. Models can be used from practically any provider (that hasn’t outright blocked OpenCode), and you can even run local models. This gives users a lot of flexibility to use the same tool across many different models, with one unified agent harness. It also means users are not “locked in” to one provider if they want to continue using OpenCode. When providers change their terms, such as Claude and Gemini limiting usage of OpenCode, or <a href="https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/"  target="_blank" rel="noreferrer">GitHub Copilot changing the terms of their subscriptions</a>, OpenCode users can just move to other providers and continue their existing workflow.</p>

<h3 class="relative group">Agentic Tool Usage
    <div id="agentic-tool-usage" class="anchor"></div>
    
</h3>
<p>Using one tool for all providers also means that I can have a unified place to configure my <a href="https://modelcontextprotocol.io"  target="_blank" rel="noreferrer">MCP</a> servers, <a href="https://agentskills.io"  target="_blank" rel="noreferrer">Skills</a> and <a href="https://agents.md/"  target="_blank" rel="noreferrer">AGENTS.md</a> files. While there have been attempts to standardize the agents world, including the <a href="https://aaif.io/"  target="_blank" rel="noreferrer">Agentic AI Foundation (AAIF)</a>, the reality is that agentic tools are still configured in different ways. For example, Anthropic to date has refused to adopt the usage of the <code>AGENTS.md</code> file, instead referring only to the <code>CLAUDE.md</code> file.</p>
<p>OpenCode supports these emerging agent standards, as well as <a href="https://opencode.ai/docs/lsp/"  target="_blank" rel="noreferrer">LSP servers</a> (Language Server Protocol, which predates agents and gives code editors better support for programming languages). At the same time, <a href="https://opencode.ai/docs/config/"  target="_blank" rel="noreferrer">OpenCode also has its own config file</a>.</p>
<p>As an example, if you want to configure <a href="/articles/chrome-devtools-mcp" >Chrome DevTools MCP server</a>, add the following to your <a href="https://opencode.ai/docs/config/"  target="_blank" rel="noreferrer">OpenCode config</a>:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"$schema"</span><span class="p">:</span> <span class="s2">"https://opencode.ai/config.json"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"mcp"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"chrome-devtools"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"local"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"command"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"npx"</span><span class="p">,</span> <span class="s2">"-y"</span><span class="p">,</span> <span class="s2">"chrome-devtools-mcp@latest"</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>
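<p>For comparison, here is a minimal sketch of how a remote (HTTP) MCP server might be declared in the same file. It assumes the <code>"type": "remote"</code>, <code>"url"</code> and <code>"enabled"</code> keys behave as described in the OpenCode MCP docs. The server name <code>example-remote</code> and its URL are hypothetical placeholders, so check the field names against the version you are running:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"$schema"</span><span class="p">:</span> <span class="s2">"https://opencode.ai/config.json"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"mcp"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"example-remote"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"remote"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"url"</span><span class="p">:</span> <span class="s2">"https://mcp.example.com/mcp"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"enabled"</span><span class="p">:</span> <span class="kc">true</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>
<p>Local and remote entries can presumably live side by side under the same <code>mcp</code> key, which is what makes a single config file convenient across providers.</p>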
<p>OpenCode also supports a range of <a href="https://opencode.ai/docs/tools/"  target="_blank" rel="noreferrer">built-in tools</a>, including web searches. One of my personal favorite tools is the <a href="https://opencode.ai/docs/tools/#question"  target="_blank" rel="noreferrer">question tool</a>. It allows the model to ask you questions mid-task: for gathering preferences, clarifying instructions, or getting decisions on implementation choices. Each question includes a header, question text, and a list of options, with the ability to type a custom answer. When there are multiple questions, you can navigate between them before submitting.</p>
<figure><img
    class="my-0 rounded-md"
    loading="lazy"
    decoding="async"
    fetchpriority="auto"
    alt="OpenCode question tool prompting a choice of rollout strategy"
    width="1280"
    height="500"
    src="/articles/opencode/opencode-question-tool_hu_88c5c43675966c68.webp"
    srcset="/articles/opencode/opencode-question-tool_hu_88c5c43675966c68.webp 800w, /articles/opencode/opencode-question-tool.webp 1280w"
    sizes="(min-width: 768px) 50vw, 65vw"
    data-zoom-src="/articles/opencode/opencode-question-tool.webp"></figure>

<h3 class="relative group">It’s Dangerous: Permissions and Safety
    <div id="its-dangerous-permissions-and-safety" class="anchor"></div>
    
</h3>
<p>OpenCode is a powerful tool, and with great power comes great responsibility. By default, it does not ask permission for anything: it will happily edit any file, run <em>any</em> command, and delete anything, which can feel great for vibe-coding but can also wreak havoc on your machine and codebases if left unchecked. For users coming from Claude Code, the default permissions feel similar to running with the <code>claude --dangerously-skip-permissions</code> flag. Even in “Plan” mode (instead of “Build” mode), OpenCode can still run commands (by default, “Plan” mode only disallows file edits). Fortunately, this is fairly easy to fix. To get a locked-down OpenCode, add this to your <a href="https://opencode.ai/docs/config/"  target="_blank" rel="noreferrer">OpenCode config</a>:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"$schema"</span><span class="p">:</span> <span class="s2">"https://opencode.ai/config.json"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"permission"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"*"</span><span class="p">:</span> <span class="s2">"ask"</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>
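<p>If being asked about everything proves too noisy, the rules can be made more granular. The following is a rough sketch, assuming the per-tool keys (<code>edit</code>, <code>bash</code>) and command patterns described in the permissions docs, and assuming that a more specific pattern listed after the catch-all <code>"*"</code> overrides it. The specific commands are only illustrative:</p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"$schema"</span><span class="p">:</span> <span class="s2">"https://opencode.ai/config.json"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"permission"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"*"</span><span class="p">:</span> <span class="s2">"ask"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"edit"</span><span class="p">:</span> <span class="s2">"ask"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"bash"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"*"</span><span class="p">:</span> <span class="s2">"ask"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"git status"</span><span class="p">:</span> <span class="s2">"allow"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"git diff"</span><span class="p">:</span> <span class="s2">"allow"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"rm *"</span><span class="p">:</span> <span class="s2">"deny"</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>
<p>The idea is to keep the safe default of asking, let a few read-only commands through without a prompt, and refuse destructive ones outright. Verify the rule precedence in your OpenCode version before relying on it.</p>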

    <div class="admonition tip">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 384 512"><path d="M272 384c9.6-31.9 29.5-59.1 49.2-86.2c0 0 0 0 0 0c5.2-7.1 10.4-14.2 15.4-21.4c19.8-28.5 31.4-63 31.4-100.3C368 78.8 289.2 0 192 0S16 78.8 16 176c0 37.3 11.6 71.9 31.4 100.3c5 7.2 10.2 14.3 15.4 21.4c0 0 0 0 0 0c19.8 27.1 39.7 54.4 49.2 86.2l160 0zM192 512c44.2 0 80-35.8 80-80l0-16-160 0 0 16c0 44.2 35.8 80 80 80zM112 176c0 8.8-7.2 16-16 16s-16-7.2-16-16c0-61.9 50.1-112 112-112c8.8 0 16 7.2 16 16s-7.2 16-16 16c-44.2 0-80 35.8-80 80z"/></svg>
        <span><a href="https://opencode.ai/docs/permissions/"  target="_blank" rel="noreferrer">OpenCode Permissions</a> can be customized further.</span>
      </div>
    </div><p>It is also worth running OpenCode in a sandboxed environment. Refer to my previous article on <a href="/articles/claude-code-sandboxing" >Claude Code Sandboxing</a> for examples of how to achieve this.</p>

<h2 class="relative group">Final Verdict: Is OpenCode Better Than Claude Code?
    <div id="final-verdict-is-opencode-better-than-claude-code" class="anchor"></div>
    
</h2>
<p>Overall, OpenCode is a very compelling agent tool, with wide model support and lots of features. It is certainly among the best AI tools I have ever used.</p>
<p>On the question of “OpenCode vs. Claude Code”, I would say both tools are honestly equally strong. OpenCode felt like a breath of fresh air after months of using Claude Code, with many unique features such as mouse support, which Claude Code has only recently gained and which is still a preview feature. At the same time, going back to Claude Code after several months of using only OpenCode, I noticed that Anthropic has not been resting and has been frantically adding new features to Claude Code, including plugins and a plugin marketplace, Agent Teams for multi-agent orchestration, the <code>/btw</code> command for lightweight side questions, and Auto mode, a new permission tier that sits between manual approval and skipping permissions entirely.</p>
<p>Overall, OpenCode feels surprisingly more polished (despite being developed by a much smaller team), while Claude Code has the edge in raw features. Nevertheless, the tools feel very close in quality. The choice between them ultimately comes down to one question: do you have a Claude subscription?</p>
<p>As I explained at the opening of this article, Anthropic has made their stance clear that Claude subscriptions are only for use within official Claude tools, and third-party tool usage is blocked for subscribers. Claude Code also locks you into Claude models exclusively, with no support for other providers.</p>
<p>If you’re already paying for a Claude subscription, Claude Code is the natural fit, as it’s the only tool where Anthropic’s subscriptions are officially supported. If you’re not, OpenCode’s model flexibility and open-source nature make it a compelling alternative that gives you full control over both your models and your costs.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@sonance?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Viktor Forgacs</a> on <a href="https://unsplash.com/photos/red-and-white-open-neon-signage-LNwIJHUtED4?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="cli" label="Cli" scheme="https://www.towerofkubes.com/tags/cli/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><category term="backman-feed" label="Backman-Feed" scheme="https://www.towerofkubes.com/tags/backman-feed/"/><published>2026-05-05T00:00:00Z</published></entry><entry><title>Claude Code Sandboxing</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/claude-code-sandboxing/"/><id>https://www.towerofkubes.com/articles/claude-code-sandboxing/</id><updated>2026-01-13T00:00:00Z</updated><summary type="html">Ways to run Claude Code in a sandbox</summary><content type="html"><![CDATA[<p>A couple of days ago, my coworker Roey Wullman wrote this article: <a href="https://www.develeap.com/claude-code-sandboxing-stop-babysitting-your-ai-assistant/roey/"  target="_blank" rel="noreferrer">Claude Code Sandboxing: Stop Babysitting Your AI Assistant</a> (published in <a href="https://www.develeap.com/magazine/"  target="_blank" rel="noreferrer">Develeap’s Magazine</a>).</p>
<p>This morning, I saw the latest announcement by Anthropic: <a href="https://claude.com/blog/cowork-research-preview"  target="_blank" rel="noreferrer">Introducing Cowork | Claude</a>, then read the <a href="https://news.ycombinator.com/item?id=46593022"  target="_blank" rel="noreferrer">comments on Hacker News</a>. Some of the comments discussed how secure Cowork is (or isn’t) and how its sandboxing works. Other comments mentioned different approaches to sandboxing <a href="/articles/claude-sonnet-4.5-and-claude-code-2.0/" >Claude Code</a> (e.g. <a href="https://news.ycombinator.com/item?id=46594916"  target="_blank" rel="noreferrer">this comment</a> and <a href="https://news.ycombinator.com/item?id=46594059"  target="_blank" rel="noreferrer">these comments</a>).</p>

<h2 class="relative group">Ways to Sandbox Claude Code
    <div id="ways-to-sandbox-claude-code" class="anchor"></div>
    
</h2>
<ul>
<li><a href="https://www.develeap.com/claude-code-sandboxing-stop-babysitting-your-ai-assistant/roey/"  target="_blank" rel="noreferrer">Claude Code Sandboxing: Stop Babysitting Your AI Assistant - Develeap</a></li>
<li><a href="https://github.com/nezhar/claude-container"  target="_blank" rel="noreferrer">nezhar/claude-container: Container workflow for Claude Code. Complete isolation from host system while maintaining persistent credentials and workspace access.</a></li>
<li><a href="https://github.com/ashishb/amazing-sandbox"  target="_blank" rel="noreferrer">ashishb/amazing-sandbox: Amazing Sandbox  - inspired from https://ashishb.net/programming/run-tools-inside-docker/</a></li>
<li><a href="https://github.com/dagger/container-use"  target="_blank" rel="noreferrer">dagger/container-use: Development environments for coding agents. Enable multiple agents to work safely and independently with your preferred stack.</a></li>
<li><a href="https://github.com/mensfeld/claude-on-incus"  target="_blank" rel="noreferrer">mensfeld/claude-on-incus: Run coding agents in isolated Incus containers with session persistence, workspace isolation, and multi-slot support.</a></li>
</ul>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@markusspiske?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Markus Spiske</a> on <a href="https://unsplash.com/photos/green-and-black-tractor-toy-KU3lOAiP-tQ?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="claude" label="Claude" scheme="https://www.towerofkubes.com/tags/claude/"/><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><category term="security" label="Security" scheme="https://www.towerofkubes.com/tags/security/"/><published>2026-01-13T00:00:00Z</published></entry><entry><title>Chrome DevTools MCP server</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/chrome-devtools-mcp/"/><id>https://www.towerofkubes.com/articles/chrome-devtools-mcp/</id><updated>2025-11-16T00:00:00Z</updated><summary type="html">Comparison of Playwright MCP server vs. Chrome DevTools MCP server</summary><content type="html"><![CDATA[<p>I have recently been using <a href="https://github.com/ChromeDevTools/chrome-devtools-mcp"  target="_blank" rel="noreferrer">Chrome DevTools MCP server</a> (which I tend to call Chrome MCP) to work on personal projects, notably <a href="https://github.com/CALMe25"  target="_blank" rel="noreferrer">CALMe</a>. On my first day of using MCP, I added <a href="https://github.com/microsoft/playwright-mcp"  target="_blank" rel="noreferrer">Playwright MCP server</a> to my <code>.mcp.json</code>. Both Playwright MCP and Chrome DevTools MCP are MCP <em>servers</em> that work in similar ways: they give MCP <em>clients</em> (<a href="/articles/agentic-cli-tools-comparison/" >agentic CLI tools</a>) tools to browse web pages, click on buttons, read console logs and even “see” how the web page looks by taking screenshots/snapshots. Playwright MCP is based on the <a href="https://github.com/microsoft/playwright"  target="_blank" rel="noreferrer">Playwright</a> framework for Web Testing and Automation, and is developed by Microsoft.
Chrome DevTools MCP is based on the world’s most popular browser, and specifically its <a href="https://developer.chrome.com/docs/devtools"  target="_blank" rel="noreferrer">DevTools</a>, and is developed by Google. Both come from big tech giants, which means these MCP servers are well developed.</p>

<h2 class="relative group">The comment that prompted me to try Chrome DevTools MCP
    <div id="the-comment-that-prompted-me-to-try-chrome-devtools-mcp" class="anchor"></div>
    
</h2>
<p>While Playwright MCP was working okay for me, I saw that Chrome DevTools MCP was released afterwards and wondered whether it was any better.</p>
<p>A comment from this thread (which I also linked in Cool MCP Servers) prompted me to try it: <a href="https://www.reddit.com/r/ClaudeCode/comments/1olhiam/what_mcps_are_you_using_with_claude_code_right_now/#nmkg5oz"  target="_blank" rel="noreferrer">What MCPs are you using with Claude Code right now? : r/ClaudeCode</a></p>

    <div class="admonition question">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M256 512A256 256 0 1 0 256 0a256 256 0 1 0 0 512zM169.8 165.3c7.9-22.3 29.1-37.3 52.8-37.3l58.3 0c34.9 0 63.1 28.3 63.1 63.1c0 22.6-12.1 43.5-31.7 54.8L280 264.4c-.2 13-10.9 23.6-24 23.6c-13.3 0-24-10.7-24-24l0-13.5c0-8.6 4.6-16.5 12.1-20.8l44.3-25.4c4.7-2.7 7.6-7.7 7.6-13.1c0-8.4-6.8-15.1-15.1-15.1l-58.3 0c-3.4 0-6.4 2.1-7.5 5.3l-.4 1.2c-4.4 12.5-18.2 19-30.6 14.6s-19-18.2-14.6-30.6l.4-1.2zM224 352a32 32 0 1 1 64 0 32 32 0 1 1 -64 0z"/></svg>
        <span>Question</span>
      </div>
      <div class="admonition-content">
        <p>What’s the advantage of chrome devtools vs playwright mcp?</p>
      </div>
    </div><hr>

    <div class="admonition conclusion">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 576 512"><path d="M0 64C0 28.7 28.7 0 64 0L224 0l0 128c0 17.7 14.3 32 32 32l128 0 0 38.6C310.1 219.5 256 287.4 256 368c0 59.1 29.1 111.3 73.7 143.3c-3.2 .5-6.4 .7-9.7 .7L64 512c-35.3 0-64-28.7-64-64L0 64zm384 64l-128 0L256 0 384 128zM288 368a144 144 0 1 1 288 0 144 144 0 1 1 -288 0zm211.3-43.3c-6.2-6.2-16.4-6.2-22.6 0L416 385.4l-28.7-28.7c-6.2-6.2-16.4-6.2-22.6 0s-6.2 16.4 0 22.6l40 40c6.2 6.2 16.4 6.2 22.6 0l72-72c6.2-6.2 6.2-16.4 0-22.6z"/></svg>
        <span>Conclusion</span>
      </div>
      <div class="admonition-content">
        <p>Faster, more capable. Reads the console logs, and can execute scripts. The long screenshots are great too</p>
<p>I used to use playwright but Chrome dev tools blew me away</p>
      </div>
    </div>
<h2 class="relative group">Guide: Using Chrome DevTools MCP
    <div id="guide-using-chrome-devtools-mcp" class="anchor"></div>
    
</h2>

<h3 class="relative group">Claude Code
    <div id="claude-code" class="anchor"></div>
    
</h3>
<p><strong>At the project level, run:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">claude mcp add --scope project chrome-devtools npx chrome-devtools-mcp@latest</span></span></code></pre></div></div>
<p><strong>This configures the following in the <code>.mcp.json</code> file:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"mcpServers"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"chrome-devtools"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"type"</span><span class="p">:</span> <span class="s2">"stdio"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"command"</span><span class="p">:</span> <span class="s2">"npx"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"args"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="s2">"chrome-devtools-mcp@latest"</span>
</span></span><span class="line"><span class="cl">      <span class="p">],</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"env"</span><span class="p">:</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>
<p>Then simply open a new instance of <code>claude</code> and confirm that you trust the folder and MCP server. Run the <code>/mcp</code> slash command to verify that the MCP server appears as “✔ connected”.</p>
<p>To use the MCP server, I simply tell Claude something like “use chrome mcp to test and troubleshoot website x”. I would add more context depending on the specific task, but in general this is enough to let Claude know that it can use this MCP server.</p>

<h3 class="relative group">Codex CLI
    <div id="codex-cli" class="anchor"></div>
    
</h3>
<p>The Codex CLI sandbox makes working with Chrome DevTools MCP more challenging, though I managed to make it work (<strong>Source:</strong> <a href="https://github.com/ChromeDevTools/chrome-devtools-mcp?tab=readme-ov-file#connecting-to-a-running-chrome-instance"  target="_blank" rel="noreferrer">Connecting to a running Chrome instance | ChromeDevTools/chrome-devtools-mcp: Chrome DevTools for coding agents</a>).</p>
<p><strong>Run the following command:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">codex mcp add chrome-devtools -- npx chrome-devtools-mcp@latest --browser-url<span class="o">=</span><span class="s2">"http://127.0.0.1:9222"</span></span></span></code></pre></div></div>
<p><strong>In addition, if live websites need to be tested, allow network access by adding the following lines to the global Codex config:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-toml" data-lang="toml"><span class="line"><span class="cl"><span class="p">[</span><span class="nx">mcp_servers</span><span class="p">.</span><span class="nx">chrome-devtools</span><span class="p">]</span> 
</span></span><span class="line"><span class="cl"><span class="nx">command</span> <span class="p">=</span> <span class="s2">"npx"</span> 
</span></span><span class="line"><span class="cl"><span class="nx">args</span> <span class="p">=</span> <span class="p">[</span><span class="s2">"chrome-devtools-mcp@latest"</span><span class="p">,</span> <span class="s2">"--browser-url=http://127.0.0.1:9222"</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="p">[</span><span class="nx">sandbox_workspace_write</span><span class="p">]</span> 
</span></span><span class="line"><span class="cl"><span class="nx">network_access</span> <span class="p">=</span> <span class="kc">true</span> </span></span></code></pre></div></div>
<p><strong>Now, every time we want to use Codex CLI with Chrome DevTools MCP, we must first run this command in the background:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">nohup /usr/bin/google-chrome --remote-debugging-port<span class="o">=</span><span class="m">9222</span> --user-data-dir<span class="o">=</span>/tmp/chrome-debug-headful --no-first-run --disable-gpu about:blank >/tmp/chrome-launch.log 2><span class="p">&</span><span class="m">1</span> <span class="p">&</span></span></span></code></pre></div></div>
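<p>Since Codex CLI can only attach to a Chrome instance that is already running, it can help to check the debugging port before starting a session. The following is a minimal sketch, not part of Chrome DevTools MCP or Codex CLI: the helper name <code>is_port_open</code> is hypothetical, and it only assumes the same <code>127.0.0.1:9222</code> address used in the commands above.</p>

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 9222 matches the --remote-debugging-port value used when launching Chrome.
    if is_port_open("127.0.0.1", 9222):
        print("Chrome is listening on port 9222; Codex CLI should be able to attach.")
    else:
        print("Nothing is listening on port 9222; start Chrome first.")
```

<p>This only confirms that something accepts connections on the port. It does not verify that the listener is actually Chrome.</p>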

<h3 class="relative group">Gemini CLI
    <div id="gemini-cli" class="anchor"></div>
    
</h3>
<p><strong>At the project level, run:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-shell" data-lang="shell"><span class="line"><span class="cl">gemini mcp add chrome-devtools npx chrome-devtools-mcp@latest</span></span></code></pre></div></div>
<p><strong>This configures the following project settings:</strong></p>
<div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl">  <span class="nt">"mcpServers"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nt">"chrome-devtools"</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"command"</span><span class="p">:</span> <span class="s2">"npx"</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">      <span class="nt">"args"</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="s2">"chrome-devtools-mcp@latest"</span>
</span></span><span class="line"><span class="cl">      <span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></div></div>

<h3 class="relative group">Other MCP clients
    <div id="other-mcp-clients" class="anchor"></div>
    
</h3>
<p>Follow the instructions in <a href="https://github.com/ChromeDevTools/chrome-devtools-mcp?tab=readme-ov-file#mcp-client-configuration"  target="_blank" rel="noreferrer">MCP Client configuration | ChromeDevTools/chrome-devtools-mcp: Chrome DevTools for coding agents</a>.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@growtika?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Growtika</a> on <a href="https://unsplash.com/?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="chrome" label="Chrome" scheme="https://www.towerofkubes.com/tags/chrome/"/><category term="browser" label="Browser" scheme="https://www.towerofkubes.com/tags/browser/"/><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="agents" label="Agents" scheme="https://www.towerofkubes.com/tags/agents/"/><category term="mcp" label="Mcp" scheme="https://www.towerofkubes.com/tags/mcp/"/><category term="google" label="Google" scheme="https://www.towerofkubes.com/tags/google/"/><published>2025-11-16T00:00:00Z</published></entry><entry><title>MCP Security</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/mcp-security/"/><id>https://www.towerofkubes.com/articles/mcp-security/</id><updated>2025-11-04T00:00:00Z</updated><summary type="html">How to harden MCP security: real-world horror stories, supply-chain risks, malicious servers, and practical defenses for agentic CLI tools.</summary><content type="html"><![CDATA[<p><a href="https://zivawernick.wixstudio.com/home"  target="_blank" rel="noreferrer">Ziva Wernick</a> did a Google AI workshop today and learned about MCP. She raised valuable concerns about MCP security and privacy.</p>
<ol>
<li><strong>Security:</strong> Has to do with the security risk of using MCP servers, and the possibility that those servers facilitate malicious actions.</li>
<li><strong>Privacy:</strong> Has to do with AI tools constantly collecting private information. In some cases there may be an option to opt-out, or pay for an enterprise license that limits what the provider can do with the data.</li>
</ol>
<p>I will focus on <strong>Security</strong>, specifically how it applies to <a href="/articles/agentic-cli-tools-comparison/" >agentic CLI tools</a> and MCP servers.</p>

<h2 class="relative group">MCP Horror Stories
    <div id="mcp-horror-stories" class="anchor"></div>
    
</h2>
<p><a href="https://www.docker.com/blog/"  target="_blank" rel="noreferrer">Docker Blog</a> wrote a series called <strong>MCP Horror Stories</strong>:</p>
<ol>
<li><strong>Part 1:</strong> <a href="https://www.docker.com/blog/mcp-security-issues-threatening-ai-infrastructure/"  target="_blank" rel="noreferrer">MCP Security Issues Threatening AI Infrastructure | Docker</a></li>
<li><strong>Part 2:</strong> <a href="https://www.docker.com/blog/mcp-horror-stories-the-supply-chain-attack/"  target="_blank" rel="noreferrer">MCP Horror Stories: The Supply Chain Attack | Docker</a></li>
<li><strong>Part 3:</strong> <a href="https://www.docker.com/blog/mcp-horror-stories-github-prompt-injection/"  target="_blank" rel="noreferrer">The GitHub Prompt Injection Data Heist | Docker</a></li>
<li><strong>Part 4:</strong> <a href="https://www.docker.com/blog/mpc-horror-stories-cve-2025-49596-local-host-breach/"  target="_blank" rel="noreferrer">MCP Horror Stories: The Drive-By Localhost Breach | Docker</a></li>
</ol>
<p>Unrelated to Docker, there’s also this article that features “Five Horror Stories That Actually Happened”: <a href="https://www.ajeetraina.com/the-day-i-told-800-engineers-their-ai-dreams-could-become-security-nightmares/"  target="_blank" rel="noreferrer">The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares</a></p>

    <div class="admonition abstract">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 384 512"><path d="M64 0C28.7 0 0 28.7 0 64L0 448c0 35.3 28.7 64 64 64l256 0c35.3 0 64-28.7 64-64l0-288-128 0c-17.7 0-32-14.3-32-32L224 0 64 0zM256 0l0 128 128 0L256 0zM112 256l160 0c8.8 0 16 7.2 16 16s-7.2 16-16 16l-160 0c-8.8 0-16-7.2-16-16s7.2-16 16-16zm0 64l160 0c8.8 0 16 7.2 16 16s-7.2 16-16 16l-160 0c-8.8 0-16-7.2-16-16s7.2-16 16-16zm0 64l160 0c8.8 0 16 7.2 16 16s-7.2 16-16 16l-160 0c-8.8 0-16-7.2-16-16s7.2-16 16-16z"/></svg>
        <span>Five Horror Stories That Actually Happened 😱</span>
      </div>
      <div class="admonition-content">
        <ol>
<li>The GitHub Data Heist (CVSS: 9.6/10)</li>
<li>The mcp-remote Catastrophe (437,000 Environments Compromised)</li>
<li>Container Escape via Tool Poisoning (CVSS: 9.4/10)</li>
<li>The Great Secrets Exposure</li>
<li>WhatsApp MCP Shadowing</li>
</ol>
      </div>
    </div><hr>

    <div class="admonition info">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M256 512A256 256 0 1 0 256 0a256 256 0 1 0 0 512zM216 336l24 0 0-64-24 0c-13.3 0-24-10.7-24-24s10.7-24 24-24l48 0c13.3 0 24 10.7 24 24l0 88 8 0c13.3 0 24 10.7 24 24s-10.7 24-24 24l-80 0c-13.3 0-24-10.7-24-24s10.7-24 24-24zm40-208a32 32 0 1 1 0 64 32 32 0 1 1 0-64z"/></svg>
        <span>For more information on each “horror story”, read the full article:</span>
      </div>
      <div class="admonition-content">
        <p><a href="https://www.ajeetraina.com/the-day-i-told-800-engineers-their-ai-dreams-could-become-security-nightmares/"  target="_blank" rel="noreferrer">The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares</a></p>
      </div>
    </div>
<h2 class="relative group">First Malicious MCP in the Wild
    <div id="first-malicious-mcp-in-the-wild" class="anchor"></div>
    
</h2>
<p>On 2025-09-25, <a href="https://www.koi.ai/blog"  target="_blank" rel="noreferrer">Koi Blog</a> wrote this article: <a href="https://www.koi.ai/blog/postmark-mcp-npm-malicious-backdoor-email-theft"  target="_blank" rel="noreferrer">First Malicious MCP in the Wild: The Postmark Backdoor That’s Stealing Your Emails | Koi Blog</a></p>

    <div class="admonition quote">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>Quote</span>
      </div>
      <div class="admonition-content">
        <p><code>postmark-mcp</code> - downloaded <strong>1,500 times every single week</strong>, integrated into hundreds of developer workflows. Since version <code>1.0.16</code>, it’s been quietly copying every email to the developer’s personal server. I’m talking password resets, invoices, internal memos, confidential documents - everything.</p>
<p>This is the <strong>world’s first sighting of a real world malicious MCP server</strong>. The attack surface for endpoint supply chain attacks is slowly becoming the enterprise’s biggest attack surface.</p>
      </div>
    </div><p>The article generated some discussion, including on Hacker News: <a href="https://news.ycombinator.com/item?id=45395957"  target="_blank" rel="noreferrer">A Postmark backdoor that’s downloading emails | Hacker News</a>. Some of the comments pointed out that the MCP risk isn’t really different from existing software risks:</p>

    <div class="admonition quote">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>Quote</span>
      </div>
      <div class="admonition-content">
        <p>This has nothing to do with MCP really, the same flaw is there in all software: you have to trust the author and the distributor. Nothing stops Microsoft from copying all your Outlook mail. Nothing stops Google from copying all your gmail. Nothing stops the Mutt project from copying all your email. Open source users like to think that “many eyes” keep the code clean and they probably do help, especially on popular projects where all commits get reviewed in detail, but the chance is still there. And the rest of us just trust the developers. This problem is as old as software.</p>
      </div>
    </div>
<h2 class="relative group">Are MCP Security risks real or overblown?
    <div id="are-mcp-security-risks-real-or-overblown" class="anchor"></div>
    
</h2>
<p>MCP security <strong>risks are a real concern</strong> and I do not want to downplay that. In many ways though, these risks have existed for as long as software itself; MCP is just the latest attack vector.</p>
<p>I will note that the blogs I featured here, from Docker and Koi Security, are from companies that attempt to sell solutions to this problem. This does not mean that the problem is not real or that the solutions are not needed, just something to note. I actually do find <a href="https://www.docker.com/products/mcp-catalog-and-toolkit/"  target="_blank" rel="noreferrer">Docker’s MCP solutions</a> to be very interesting (I mention <a href="https://hub.docker.com/mcp"  target="_blank" rel="noreferrer">Docker MCP Catalog</a> below in <a href="/articles/mcp-security/#supply-chain-security" >Supply-Chain Security</a>).</p>

<h2 class="relative group">MCP Defense
    <div id="mcp-defense" class="anchor"></div>
    
</h2>
<p>The article “<a href="https://www.ajeetraina.com/the-day-i-told-800-engineers-their-ai-dreams-could-become-security-nightmares/"  target="_blank" rel="noreferrer">The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares</a>” (mentioned above in <a href="/articles/mcp-security/#mcp-horror-stories" >MCP Horror Stories</a>) suggests five defense solutions:</p>

    <div class="admonition abstract">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 384 512"><path d="M64 0C28.7 0 0 28.7 0 64L0 448c0 35.3 28.7 64 64 64l256 0c35.3 0 64-28.7 64-64l0-288-128 0c-17.7 0-32-14.3-32-32L224 0 64 0zM256 0l0 128 128 0L256 0zM112 256l160 0c8.8 0 16 7.2 16 16s-7.2 16-16 16l-160 0c-8.8 0-16-7.2-16-16s7.2-16 16-16zm0 64l160 0c8.8 0 16 7.2 16 16s-7.2 16-16 16l-160 0c-8.8 0-16-7.2-16-16s7.2-16 16-16zm0 64l160 0c8.8 0 16 7.2 16 16s-7.2 16-16 16l-160 0c-8.8 0-16-7.2-16-16s7.2-16 16-16z"/></svg>
        <span>The Solution: Defense in Depth (That Actually Works) 🛡</span>
      </div>
      <div class="admonition-content">
        <ol>
<li>Component Isolation</li>
<li>Attack Surface Reduction</li>
<li>Supply Chain Security</li>
<li>Input/Output Sanitization</li>
<li>WhatsApp MCP Shadowing</li>
</ol>
      </div>
    </div><hr>

    <div class="admonition info">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M256 512A256 256 0 1 0 256 0a256 256 0 1 0 0 512zM216 336l24 0 0-64-24 0c-13.3 0-24-10.7-24-24s10.7-24 24-24l48 0c13.3 0 24 10.7 24 24l0 88 8 0c13.3 0 24 10.7 24 24s-10.7 24-24 24l-80 0c-13.3 0-24-10.7-24-24s10.7-24 24-24zm40-208a32 32 0 1 1 0 64 32 32 0 1 1 0-64z"/></svg>
        <span>For more information on each solution, read the full article:</span>
      </div>
      <div class="admonition-content">
        <p><a href="https://www.ajeetraina.com/the-day-i-told-800-engineers-their-ai-dreams-could-become-security-nightmares/"  target="_blank" rel="noreferrer">The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares</a></p>
      </div>
    </div>
<h2 class="relative group">What I Do
    <div id="what-i-do" class="anchor"></div>
    
</h2>
<p>So far I have been limiting my MCP usage to personal projects and learning. Below are some of the things I have noted while learning about how to use MCP “safely”:</p>

<h3 class="relative group">Supply-Chain Security
    <div id="supply-chain-security" class="anchor"></div>
    
</h3>
<p>When Ziva asked about the MCP security risks, she was told to “read the code”. While it’s true that many MCP servers are open-source, reviewing all of them is not exactly feasible. I often take a surface-level look at the repo, its activity and its number of stars, but this is not the same as reviewing the code in depth. For this reason, I believe it is worth using <strong>MCP servers by known publishers</strong>. Docker does come in handy here with their <a href="https://hub.docker.com/mcp"  target="_blank" rel="noreferrer">Docker MCP Catalog</a>. While this catalog is not as extensive as other MCP galleries, it focuses on quality over quantity. All of the MCP servers in the Docker MCP Catalog are by known publishers. Note that I still refuse to use Docker Desktop (due to its license), but these MCP servers can also be used in Docker CLI together with an MCP client.</p>

<h3 class="relative group">MCP Server Configuration
    <div id="mcp-server-configuration" class="anchor"></div>
    
</h3>
<p>Some MCP servers may have permissive default permissions, but can be configured to be more “locked-down” and limited in what they can do and access.</p>
<p>As an example, <a href="https://github.com/containers/kubernetes-mcp-server"  target="_blank" rel="noreferrer">Kubernetes MCP Server</a> can be run in <a href="https://github.com/containers/kubernetes-mcp-server?tab=readme-ov-file#configuration-options"  target="_blank" rel="noreferrer"><strong>read-only mode</strong></a> (this is not the default but can be set with a flag when setting up the MCP server). In this mode, the Kubernetes MCP server cannot make changes to clusters (for example, it is unable to apply manifests, but can still view existing resources). Note that even in this mode there can be security risks. One example is viewing secrets. In Kubernetes, Secret values are exposed as Base64-encoded strings, which are trivial to decode for anyone who has full read access to the cluster. I have personally witnessed Claude Code attempt to read and decode Kubernetes Secrets (either with Kubernetes MCP Server or just <code>kubectl</code> commands) when asked to help troubleshoot my homelab cluster. For this reason, when using <a href="/articles/agentic-cli-tools-comparison/" >agentic CLI tools</a>, I prefer to approve each command individually. Further, Kubernetes access can be regulated with <a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/"  target="_blank" rel="noreferrer">Role-based access control (RBAC)</a>.</p>
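<p>To illustrate why Base64 offers no protection, here is a minimal sketch that decodes the <code>data</code> field of a Secret. The keys and values are made up for the example, and <code>decode_secret_data</code> is a hypothetical helper rather than anything provided by Kubernetes or an MCP server.</p>

```python
import base64

# Hypothetical excerpt of a Secret's `data` field, as an API read would return it.
secret_data = {
    "username": "YWRtaW4=",
    "password": "aHVudGVyMg==",
}


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode every Base64 value in a Secret's data mapping to plain text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


if __name__ == "__main__":
    for key, value in decode_secret_data(secret_data).items():
        print(f"{key}: {value}")
```

<p>No key or password is involved in the decoding step, so any agent that can read the Secret can also recover its plain-text values.</p>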

<h3 class="relative group">Ignore files
    <div id="ignore-files" class="anchor"></div>
    
</h3>
<p>Similar to <a href="https://git-scm.com/docs/gitignore"  target="_blank" rel="noreferrer"><code>.gitignore</code></a> files, most <a href="/articles/agentic-cli-tools-comparison/" >agentic CLI tools</a> have a way to exclude specific files from the context. For example, a <code>.env</code> file (which may include secrets) should be specifically excluded (when I have not done this, I have seen Claude Code attempt to read these files). Unfortunately, there isn’t really a standard “ignore file” for this; each tool has its own way to achieve it. If using multiple tools, multiple files might be needed.</p>
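<p>The general idea behind these ignore mechanisms can be sketched as matching paths against a list of patterns and keeping the matches out of the context. This is only an illustration: the pattern list and the <code>is_ignored</code> helper are hypothetical, and each real tool defines its own file name and matching rules, which may differ from Python’s <code>fnmatch</code>.</p>

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns; real tools each use their own syntax and config file.
IGNORE_PATTERNS = [".env", ".env.*", "*.pem", "secrets/*"]


def is_ignored(path: str, patterns: list[str] = IGNORE_PATTERNS) -> bool:
    """Return True if the full path or its file name matches any pattern."""
    name = PurePosixPath(path).name
    return any(fnmatch(path, pattern) or fnmatch(name, pattern) for pattern in patterns)


files = [
    "src/app.py",
    ".env",
    "config/.env.production",
    "secrets/db.txt",
    "certs/tls.pem",
]

# Only files that match no pattern would be offered to the agent.
visible = [f for f in files if not is_ignored(f)]

if __name__ == "__main__":
    print(visible)
```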

<h4 class="relative group">Documentation on excluding/ignoring files
    <div id="documentation-on-excludingignoring-files" class="anchor"></div>
    
</h4>
<ul>
<li><a href="https://developers.google.com/gemini-code-assist/docs/create-aiexclude-file"  target="_blank" rel="noreferrer">Exclude files from Gemini Code Assist use  |  Google for Developers</a></li>
<li><a href="https://docs.claude.com/en/docs/claude-code/settings#excluding-sensitive-files"  target="_blank" rel="noreferrer">Claude Code settings - Claude Docs</a></li>
<li><a href="https://cursor.com/docs/context/ignore-files"  target="_blank" rel="noreferrer">Ignore files | Cursor Docs</a></li>
<li><a href="https://docs.github.com/en/copilot/how-tos/configure-content-exclusion/exclude-content-from-copilot"  target="_blank" rel="noreferrer">Excluding content from GitHub Copilot - GitHub Docs</a></li>
<li><a href="https://github.com/charmbracelet/crush?tab=readme-ov-file#ignoring-files"  target="_blank" rel="noreferrer">charmbracelet/crush: The glamourous AI coding agent for your favourite terminal 💘</a></li>
<li><a href="https://opencode.ai/docs/config/#watcher"  target="_blank" rel="noreferrer">Config | OpenCode</a></li>
</ul>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@flyd2069?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">FlyD</a> on <a href="https://unsplash.com/photos/red-and-black-love-lock-zAhAUSdRLJ8?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="mcp" label="Mcp" scheme="https://www.towerofkubes.com/tags/mcp/"/><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="security" label="Security" scheme="https://www.towerofkubes.com/tags/security/"/><published>2025-11-04T00:00:00Z</published></entry><entry><title>My Experience with Claude Sonnet 4.5 and Claude Code 2.0</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/claude-sonnet-4.5-and-claude-code-2.0/"/><id>https://www.towerofkubes.com/articles/claude-sonnet-4.5-and-claude-code-2.0/</id><updated>2025-10-06T00:00:00Z</updated><summary type="html">Hands-on review of Claude Sonnet 4.5 and Claude Code 2.0 for developers: coding experience, benchmarks, usage limits, and workflow tips.</summary><content type="html"><![CDATA[<p>After the announcement of Claude Sonnet 4.5 and Claude Code 2.0, I finally had a little bit of time to experiment with the new Claude versions today.</p>
<p>My first impression is that Claude Sonnet 4.5 feels <em>slightly</em> better than Sonnet 4. At least that’s more than I can say for <a href="/articles/gpt-5/" >GPT-5</a>, of which my first impressions weren’t as positive (it felt like a downgrade compared to o3, but I’ve gotten used to it).</p>
<p>Honestly, it’s hard to tell though. I find it hard to give objective feedback on LLM models. There are benchmarks that claim to be objective, but benchmarks don’t tell the full story of how a model actually feels in real world use. It’s kind of similar to how phone benchmarks don’t necessarily tell the full story of how smooth a phone actually feels in real world use; for example, Google Pixel models are not technically as powerful as some of the competition, but have optimized software that makes them feel smooth to use.</p>
<p>When evaluating LLM models, I try to use them as normal. Sometimes I give the same prompt to different LLM models to gauge the differences in answers and which gives the “best” response. However, even that is not always effective, since LLM answers are non-deterministic: even giving the same model the same prompt twice inside the same tool can produce different answers (sometimes wildly different). The differences can be even larger when using the same model across different tools. I feel like I get significantly different answers when using <a href="/articles/gpt-5/" >GPT-5</a> in ChatGPT 5, Microsoft Copilot, Cursor CLI, Codex CLI and Perplexity Pro.</p>
<p>Which brings me back to today. I was working on documentation frameworks, specifically setting up Docusaurus, with Claude Code 2.0 and Sonnet 4.5. This is actually a task I’ve done several times in the past with previous versions of Claude Code using the Sonnet 4 model. This time, I was trying to vibe code less and actually understand every line of code I was writing so that I would eventually feel confident deploying Docusaurus in production (using <a href="/series/static-website-hosting/" >static website hosting</a>). Nevertheless, I still used Claude Code to help me with some menial tasks, while making an effort to read every single line of code (rather than just “vibe coding”). Because I have done this task before, it might have been a decent benchmark if I had actually tried to examine it in that way, but really I was just trying to get a task done.</p>
<p>As for the results? I managed to achieve what I was trying to do, but really my goal in the first place was to rely less on AI. I still consulted Claude Code frequently. It gave some good responses, some dumb responses and some mid responses. Not too different from usual, maybe <em>slightly</em> better, but again hard to tell. I don’t plan to make a more rigorous test of Sonnet 4 vs Sonnet 4.5, I don’t mind trusting the benchmarks in this case. In many benchmarks Sonnet 4.5 even beats Opus 4.1!</p>

<h2 class="relative group">Usage Limits
    <div id="usage-limits" class="anchor"></div>
    
</h2>
<p>Before I even had a chance to try it myself, I saw many posts on <a href="https://www.reddit.com/r/ClaudeCode/"  target="_blank" rel="noreferrer">r/ClaudeCode</a> complaining about usage limits getting worse. Many of these posts were from users paying for the expensive $100-$200/month Claude MAX plans. A lot of them complained about reaching usage limits faster than before while using Claude Opus 4.1 in Claude Code. It’s not clear to me why those users insisted on still using Opus 4.1 despite some benchmarks showing that Sonnet 4.5 has surpassed it, but to be fair the ability to use Opus in Claude Code is one of the selling points of the MAX plans. On my $20/month Claude Pro plan, I can only use Opus 4.1 on <a href="https://claude.ai"  target="_blank" rel="noreferrer">claude.ai</a>, not inside Claude Code. I haven’t found that a huge limitation though since I was still getting good results with Sonnet 4 and will presumably get even better results with Sonnet 4.5.</p>
<p>One of the most useful features added in Claude Code 2.0 is <a href="https://www.reddit.com/r/ClaudeAI/comments/1ntq8tv/introducing_claude_usage_limit_meter/"  target="_blank" rel="noreferrer"><code>/usage</code></a>, which lets you see daily and weekly usage. It still doesn’t show how much the tokens you use really cost; for that I still use <a href="https://ccusage.com/"  target="_blank" rel="noreferrer">ccusage</a>.</p>
<p>Unfortunately, this comes with new <a href="https://www.reddit.com/r/ClaudeAI/comments/1mbo1sb/updating_rate_limits_for_claude_subscription/"  target="_blank" rel="noreferrer">weekly rate limits</a>. I missed this at first, but now I believe this might be the main cause of what the community has been complaining about. Weekly rate limits were one of the features I disliked most about ChatGPT. Back when o3 was limited to 50 prompts a week, I was genuinely rationing my usage of o3. Since the launch of <a href="/articles/gpt-5/" >GPT-5</a>, the limits for ChatGPT 5 Thinking have been raised significantly, to the point that I don’t reach those limitations anymore.</p>
<p>As for Claude Code, until now I found the usage limits to be fairly reasonable. The limits were in 5 hour blocks, not daily or weekly. It would take me two full hours of heavy vibe coding before a limit was actually reached. In cases where I was taking a more active role in coding I often did not reach the limit at all. Even when the limit was reached, it was unlikely I would have to wait the full 5 hours, since often I would be either in the middle or near the end of the 5 hour block anyway (one time I only had to wait 5 minutes for the limits to reset). The end result was that I felt like I could practically use Claude Code as much as I wanted without really worrying about limits, since worst case I would just take a break and wait a few hours for all of the limits to reset. I also saw little value in the more expensive Claude MAX plans.</p>
<p>Now with the weekly limits, there is a larger risk of reaching them. After just one day of medium usage, I already used 11% of the weekly limit (which resets on 2025-10-12). I’m not that worried though, since reaching the limits if anything would give me more time to experiment with other <a href="/articles/agentic-cli-tools-comparison/" >agentic CLI tools</a>. I read that Codex CLI also <a href="https://www.reddit.com/r/codex/comments/1ncbocw/codex_weekly_limit/"  target="_blank" rel="noreferrer">has a weekly limit</a>; one user claimed that Codex is so much better than Claude Code that they ration it, use CC for easier tasks and save Codex for the more complex tasks. In any case, I believe using a combination of free AI tools and paid subscriptions is both more cost-effective and more insightful compared to committing to one tool and paying an expensive “MAX” subscription.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@almoya?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Aerps.com</a> on <a href="https://unsplash.com/photos/a-person-reads-restaurant-recommendations-on-their-phone-_c9iPLn7emA?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="claude" label="Claude" scheme="https://www.towerofkubes.com/tags/claude/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><published>2025-10-06T00:00:00Z</published></entry><entry><title>Agentic CLI Tools Comparison</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/agentic-cli-tools-comparison/"/><id>https://www.towerofkubes.com/articles/agentic-cli-tools-comparison/</id><updated>2025-09-28T00:00:00Z</updated><summary type="html">Comparison of Claude Code vs. Cursor CLI vs. Gemini CLI vs. Codex CLI</summary><content type="html"><![CDATA[<p>GitHub Copilot CLI is the latest Agentic CLI tool. Yet another Agentic CLI tool in the same style of Claude Code, Cursor CLI, Gemini CLI, Codex CLI and Qwen Code (and probably others that I am forgetting). So far I have tried all of these except for Qwen, and am now trying GitHub Copilot CLI as well.</p>

<h2 class="relative group">All Agentic CLI tools look the same
    <div id="all-agentic-cli-tools-look-the-same" class="anchor"></div>
    
</h2>
<p>All of these tools are superficially similar. Claude Code, Cursor CLI, Gemini CLI, Qwen Code and now GitHub Copilot CLI all have a TUI design that looks almost exactly the same, not even trying to hide that they’re copying each other. The notable exception is Codex CLI, which has its own TUI design. Honestly though I find Codex’s TUI to be inferior and kind of wish it also copied the others. I think the common design works well and don’t mind it, it’s just funny that all of these companies copy each other.</p>
<p>Another similarity is that all these tools have npm as their primary installation option. While most tools can also be installed in other ways (such as <a href="https://brew.sh/"  target="_blank" rel="noreferrer">Homebrew</a>), npm is usually recommended first in their respective README files. Of course, npm has been widely used for years and many developers already have it installed (these tools are primarily for developers, though they can do more than coding); however, I had personally never seen npm recommended as the primary installation method before this wave of Agentic CLI tools started. Some of the tools are written in TypeScript, so for them it makes sense. On the other hand, there’s Codex CLI, which has its own design and is written in Rust, but was nevertheless <a href="https://www.npmjs.com/package/@openai/codex"  target="_blank" rel="noreferrer">adapted to work with npm</a> (TIL <a href="https://dev.to/kennethlarsen/how-to-distribute-a-rust-binary-on-npm-75n"  target="_blank" rel="noreferrer">Rust binaries can be distributed on npm</a>).</p>

<h2 class="relative group">Agentic CLI tools have differences
    <div id="agentic-cli-tools-have-differences" class="anchor"></div>
    
</h2>
<p>I <a href="/articles/agentic-cli-tools-comparison/#all-agentic-cli-tools-look-the-same" >mentioned</a> that these tools are <em>superficially</em> similar, but that doesn’t mean they all work the same. Beyond design and installation method, there’s the matter of <em>functionality</em> and how well these tools actually work. Differences include:</p>

<h3 class="relative group">Model
    <div id="model" class="anchor"></div>
    
</h3>
<p>Some tools are designed to work with one company’s models. Claude Code of course uses Claude Sonnet and Claude Opus. OpenAI’s Codex CLI uses GPT-5 models (including GPT‑5-Codex). Gemini CLI uses Gemini 2.5 and 3 (Pro with a fallback to Flash). Other tools support a variety of different models through one service, for example Cursor CLI and GitHub Copilot CLI (the same is true for their non-CLI offerings). Others allow you to <a href="/articles/agentic-cli-tools-comparison/#byo-bring-your-own-api-keys" >BYO (Bring Your Own) API keys</a> (notably <a href="https://opencode.ai/"  target="_blank" rel="noreferrer">OpenCode</a>).</p>

<h3 class="relative group">Tools & Agentic Abilities
    <div id="tools--agentic-abilities" class="anchor"></div>
    
</h3>
<p>Even when two tools use the same AI model, that doesn’t necessarily mean they will work the same. These tools have agentic abilities, enhanced with tools and prompts. Tools can be built in or provided through MCP. As an example, Claude Code has a wide variety of built-in tools that allow it to read and write local files, browse the web (Search and Fetch websites) and more. On the other hand, while Codex is improving, it still does not have as many built-in tools as Claude Code. When tools are missing or limited, the gap can be bridged either with other CLI programs (that these agentic tools know how to run directly) or MCP servers. Most if not all of these tools support both running CLI commands and interacting with MCP servers. Notably, <a href="https://cursor.com/docs/cli/mcp"  target="_blank" rel="noreferrer">Cursor CLI now supports MCP</a> as well (when I first tried it, Cursor CLI was missing MCP support).</p>

<h3 class="relative group">License
    <div id="license" class="anchor"></div>
    
</h3>
<p>Not all of these tools are open source. Somewhat deceptively, several of these tools have a GitHub repo that is little more than a closed-source LICENSE and README and does not actually include any code. At present, this even includes GitHub Copilot CLI, which is marked as Public Preview and has <a href="https://docs.github.com/en/site-policy/github-terms/github-pre-release-license-terms"  target="_blank" rel="noreferrer">Pre-release License Terms</a> (it is not clear to me what the license terms would be <em>after</em> release). Claude Code and Cursor CLI are also closed source (others may have copied CC’s design, but not its code). Gemini CLI is open source and was later forked to Qwen Code, which is also open source (both Apache-2.0). OpenCode is also open source (as its name implies), under MIT. <a href="https://github.com/charmbracelet/crush"  target="_blank" rel="noreferrer">charmbracelet/crush</a> (from the same people who created some of my favorite Go CLI and TUI Frameworks) uses this weird license: <a href="https://github.com/charmbracelet/crush/blob/main/LICENSE.md"  target="_blank" rel="noreferrer">Functional Source License, Version 1.1, MIT Future License</a>.</p>

<h3 class="relative group">Pricing & Usage Limits
    <div id="pricing--usage-limits" class="anchor"></div>
    
</h3>
<p>These tools differ in pricing and usage limits.</p>

<h4 class="relative group">Claude Code
    <div id="claude-code" class="anchor"></div>
    
</h4>
<p>Out of all of these tools I have (so far) used Claude Code the most and am most familiar with its <a href="https://claude.com/pricing"  target="_blank" rel="noreferrer">pricing</a> and usage limits. I am using Claude Pro on the $20 a month plan. Claude Code also has the crazy expensive Max plans ($100 or $200 a month). I previously described my experience with the Claude Code $20 plan in my Claude Code notes. My experience honestly hasn’t changed much. While there was some drama about Claude Code changing usage limits, I still rarely run into usage limits. When I do, I have to wait at most a few hours for the usage limits to reset. In that time I can either use other tools or take a break. Other than not having access to the Opus model on CC, I don’t feel like I’m missing anything by not being on Max and am still baffled at how people justify the price of those Max plans. ccusage implies I use more than $100 a month, significantly more than what I pay. Anthropic either operates at a loss or can somehow afford to do that since these are its own models.</p>

<h4 class="relative group">Gemini CLI
    <div id="gemini-cli" class="anchor"></div>
    
</h4>
<p>Gemini CLI has a generous free tier and is what I currently recommend for people wanting to try an agentic tool for free. I’m not sure whether my Google AI Pro trial increases my Gemini CLI usage limits or if it’s unrelated; I’m honestly kind of confused by Google’s various AI plans (in typical Google fashion).</p>

    <div class="admonition note">
      <div class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 576 512"><path d="M0 64C0 28.7 28.7 0 64 0L224 0l0 128c0 17.7 14.3 32 32 32l128 0 0 125.7-86.8 86.8c-10.3 10.3-17.5 23.1-21 37.2l-18.7 74.9c-2.3 9.2-1.8 18.8 1.3 27.5L64 512c-35.3 0-64-28.7-64-64L0 64zm384 64l-128 0L256 0 384 128zM549.8 235.7l14.4 14.4c15.6 15.6 15.6 40.9 0 56.6l-29.4 29.4-71-71 29.4-29.4c15.6-15.6 40.9-15.6 56.6 0zM311.9 417L441.1 287.8l71 71L382.9 487.9c-4.1 4.1-9.2 7-14.9 8.4l-60.1 15c-5.5 1.4-11.2-.2-15.2-4.2s-5.6-9.7-4.2-15.2l15-60.1c1.4-5.6 4.3-10.8 8.4-14.9z"/></svg>
        <span>Note</span>
      </div>
      <div class="admonition-content">
        <p><strong>UPDATE:</strong> <a href="https://blog.google/technology/developers/gemini-cli-code-assist-higher-limits/"  target="_blank" rel="noreferrer">Google AI Pro and Ultra subscribers now get Gemini CLI and Gemini Code Assist with higher limits.</a></p>
      </div>
    </div>
<h4 class="relative group">Codex
    <div id="codex" class="anchor"></div>
    
</h4>
<p>Codex is included with paid <a href="https://chatgpt.com/pricing/"  target="_blank" rel="noreferrer">ChatGPT plans</a>, including Plus, Pro and Team.</p>

<h3 class="relative group">BYO (Bring Your Own) API keys
    <div id="byo-bring-your-own-api-keys" class="anchor"></div>
    
</h3>
<p>Ironically, the FOSS tools such as opencode and crush might actually be more expensive in this case. When using an API key you have to pay the “real” cost of running the AI model, which can end up significantly more expensive than a set plan. The same is true when using Claude Code with an API key instead of a plan; for anything beyond very light use, a plan makes more sense. Even the expensive Max plans often end up cheaper than what equivalent API use would cost.</p>

<h2 class="relative group">My Opinion
    <div id="my-opinion" class="anchor"></div>
    
</h2>
<p>Claude Code remains my most used agentic CLI tool. Nevertheless, I am still actively experimenting with other tools: I have used Gemini CLI increasingly in recent weeks (Gemini’s free tier is really good), and am also trying Codex due to its improvements. However, while these tools feel similar in many ways and the competition is closer than ever, I still feel that Claude Code with <a href="/articles/claude-sonnet-4.5-and-claude-code-2.0/" >Claude Sonnet 4.5</a> is noticeably better than all other tools that I have used. This may change in the near future as all of these tools are actively developed and new ones are introduced all the time.</p>
<p>This is in addition to other AI tools which I am also actively using. Right now I am mainly using the web and app versions of ChatGPT, Gemini, Claude and Perplexity Pro (I also use <a href="/articles/gpt-5/#microsoft-copilot-with-gpt-5" >Microsoft Copilot</a> at work, but it’s not very good).</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@steve_j?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Steve Johnson</a> on <a href="https://unsplash.com/?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="cli" label="Cli" scheme="https://www.towerofkubes.com/tags/cli/"/><category term="tui" label="Tui" scheme="https://www.towerofkubes.com/tags/tui/"/><published>2025-09-28T00:00:00Z</published></entry><entry><title>GPT-5</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/gpt-5/"/><id>https://www.towerofkubes.com/articles/gpt-5/</id><updated>2025-08-16T00:00:00Z</updated><summary type="html">Hands-on impressions of GPT-5 across ChatGPT, Cursor CLI, and Microsoft Copilot, plus notes on quotas, hallucinations, and the auto-router trade-offs.</summary><content type="html"><![CDATA[
<h2 class="relative group">This Week I Learned about GPT-5
    <div id="this-week-i-learned-about-gpt-5" class="anchor"></div>
    
</h2>
<p>In the <a href="https://openai.com/gpt-5/"  target="_blank" rel="noreferrer">announcement post</a>, OpenAI made some bold claims about GPT-5, <strong>including:</strong></p>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>The best response, every time</span>
      </summary>
      <div class="admonition-content">
        <p>ChatGPT is now designed to think deeply when you need it to.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>Great at coding</span>
      </summary>
      <div class="admonition-content">
        <p>As a coding collaborator, GPT‑5 tackles complex tasks end-to-end and delivers more readily usable code, better design, and is more effective at debugging.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>An expressive writing partner</span>
      </summary>
      <div class="admonition-content">
        <p>Create clearer, more compelling messaging for everything from stories to speeches and beyond.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>More useful health answers</span>
      </summary>
      <div class="admonition-content">
        <p>Our best model yet for health-related questions, providing more precise and reliable responses while acting as more of a proactive thought partner.</p>
      </div>
    </details><hr>

    <details class="admonition quote">
      <summary class="admonition-header"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 448 512"><path d="M448 296c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72zm-256 0c0 66.3-53.7 120-120 120l-8 0c-17.7 0-32-14.3-32-32s14.3-32 32-32l8 0c30.9 0 56-25.1 56-56l0-8-64 0c-35.3 0-64-28.7-64-64l0-64c0-35.3 28.7-64 64-64l64 0c35.3 0 64 28.7 64 64l0 32 0 32 0 72z"/></svg>
        <span>Safer and more accurate</span>
      </summary>
      <div class="admonition-content">
        <p>Our most reliable model yet. It’s less prone to hallucinations and pretending to know things.</p>
      </div>
    </details><p>The last two points seemed particularly <em>sus</em> to me. Health answers? I still wouldn’t trust that. “Less prone to hallucinations”? That’s a big claim. Hallucinations have consistently been one of the biggest issues with LLMs. Even though it’s not as bad as in the ChatGPT 3.5 days, it’s not clear if this problem will ever be fully solved without a paradigm shift.</p>

<h2 class="relative group">My Experimentation
    <div id="my-experimentation" class="anchor"></div>
    
</h2>
<p>During the week, I experimented with GPT-5. My first impressions on Friday weren’t too positive. Sam Altman tweeted that “the autoswitcher broke” and promised improvements.</p>
<p>GPT-5 did feel better the following week, but still not as good as it was hyped to be. My favorite model until now was OpenAI o3, and even ChatGPT 5 Thinking often didn’t seem as good as I remembered o3 being. Not that I was able to compare, since access to previous models was removed altogether! I eventually got 4o back (after massive backlash from people who had formed deep bonds with 4o), but I didn’t get o3 on ChatGPT Team and so was not able to compare.</p>
<p>After a few days, I learned to use GPT-5 better. Because of the auto-router, GPT-5 benefits from more prompt engineering (“think deeply”). This felt like a step back, since previous models had gotten good at knowing what I want. Or maybe I had simply already learned how to use them. A few months ago I was confused by ChatGPT offering so many confusingly named models (what’s better, o3 or o4-mini high?). However, I eventually learned to use them and now missed having that choice, which had been taken away from me. From a user perspective, a model that knows how to choose for you and give you the best answer <em>should</em> be a better option. On the other hand, I suspect GPT-5’s auto-router is actually a cost-cutting measure in disguise, too often opting behind the scenes for cheaper models even though they give noticeably worse answers.</p>
<p>The results I’ve been getting with ChatGPT 5 have been really inconsistent. Some answers are great, others are dumb. I guess that’s par for the course for AI, but I was expecting an improvement, and this doesn’t feel like one. The jumps from ChatGPT 3.5 to 4, or from 4o to o3, felt more considerable to me.</p>
<p>As for the claim of reduced hallucinations? This has not been my experience. I caught ChatGPT 5 lying on many occasions. It’s hard to tell whether it’s any worse than previous models (again because I lost access to them), but it does sometimes feel like it.</p>
<p>Even ChatGPT 5 Thinking hallucinates. In one instance, I asked ChatGPT 5 (Auto) how to configure a certain GitHub setting for an Organization. ChatGPT 5 confidently answered that it’s <em>impossible</em>. I then switched the model to ChatGPT 5 Thinking to see if I would get a different answer. After several minutes of “thinking”, ChatGPT 5 Thinking confidently answered that it’s <em>possible</em> and even gave me exact instructions. Except the instructions were impossible to follow, because the answer was entirely hallucinated. In this case, ChatGPT 5 was more correct than ChatGPT 5 Thinking. The setting didn’t exist (even though both I and ChatGPT 5 Thinking wish that it did).</p>

<h2 class="relative group">Cursor CLI with GPT-5
    <div id="cursor-cli-with-gpt-5" class="anchor"></div>
    
</h2>
<p>A few hours after GPT-5 was announced, Cursor announced the release of <a href="https://cursor.com/cli"  target="_blank" rel="noreferrer">Cursor CLI</a> plus <strong>free GPT-5 credits for one week</strong>. My 1 month Claude Pro subscription was just ending, so I decided to use Cursor CLI with GPT-5 for the week to experiment with both (compared to Claude Code with Sonnet 4).</p>
<p>Cursor CLI is clearly inspired by Claude Code. I don’t mind the rip-off personally since I like Claude Code. In Claude Code with Sonnet 4 the agent is far more transparent about what it is doing and tends to consult more; it even shows a checklist of the tasks the agent plans and executes. That clarity is missing in Cursor CLI for now: it explains less, simply makes changes, and sometimes it’s not clear why - though you can always stop it and ask questions.</p>
<p>Another thing missing in Cursor CLI is support for MCP, even though regular Cursor already has solid MCP support. But Cursor CLI came out less than a week ago. I assume they will improve it over time.</p>
<p>Aside from those gaps, I got decent results with Cursor CLI. The quality felt comparable to Claude Code, and the interface is almost a complete copy.</p>
<p>After the GPT-5 free credits ended for me, I decided to go back to Claude Code for now (I resubscribed for 1 month of Claude Pro). While Cursor CLI might improve in the future, for now it’s not as good as Claude Code. I also worry that the CLI might be an afterthought for Cursor.</p>

<h2 class="relative group">Microsoft Copilot with GPT-5
    <div id="microsoft-copilot-with-gpt-5" class="anchor"></div>
    
</h2>
<p>At my current client, the only approved AI tool is Microsoft 365 Copilot (<em>not</em> GitHub Copilot). I had subpar results with it in the past, so I was glad that it was now updated to use GPT-5.</p>
<p>This was also a good way to experiment with GPT-5 for free. Even without an account, Microsoft Copilot offers a generous amount of GPT-5 requests (you have to remember to enable GPT-5 every time you start a new chat).</p>
<p>Still, the experience of using GPT-5 in Microsoft Copilot feels different from using it in ChatGPT, even though both claim to use the same model. I suspect the infamous auto-router gives Microsoft Copilot the cheaper models more often than not, unless you prompt it specifically not to. Even when prompting heavily, I still got considerably faster results than with ChatGPT 5 Thinking. Perhaps the Azure backend is more optimized here, or maybe Microsoft Copilot rarely gets routed to the best models by GPT-5.</p>
<p>Regardless, I did feel an improvement in the answers compared to the previous Microsoft Copilot models (“Quick response” and “Think Deeper”, which I believe are based on some variation of GPT-4). Even so, Microsoft Copilot is still limited in other ways (compared to ChatGPT), such as a small context window.</p>
<p>Overall, Microsoft Copilot is usable for basic work but far from my preference. I wouldn’t use it unless I had no other choice (which is the case at the current client).</p>

<h2 class="relative group">Usage Notes
    <div id="usage-notes" class="anchor"></div>
    
</h2>
<ul>
<li>3,000 GPT-5 Thinking messages per week is a massive bump; it used to be ~200, and o3 was once limited to just 50 (I hit that every week until I started pairing it with Claude).</li>
<li>I had to ration o3 carefully, so I’m glad the cap is higher now. I doubt I’ll reach 3,000 messages a week even if ChatGPT were my only tool.</li>
<li>Defaulting to Thinking mode takes a lot longer - sometimes minutes. Usually the answer is better (often worth the wait), but not always.</li>
<li>At least once GPT-5 (Auto) gave the correct answer while GPT-5 Thinking spent minutes and returned the opposite, wrong answer.</li>
</ul>

<h2 class="relative group">My Overall Impressions on GPT-5
    <div id="my-overall-impressions-on-gpt-5" class="anchor"></div>
    
</h2>
<p>Overall a disappointment, but still useful. I will continue to use it, particularly with ChatGPT 5 Thinking.</p>
<p>OpenAI has addressed some of the negative feedback already and will no doubt continue to improve GPT-5.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@omilaev?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Igor Omilaev</a> on <a href="https://unsplash.com/?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><category term="gpt" label="Gpt" scheme="https://www.towerofkubes.com/tags/gpt/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="openai" label="Openai" scheme="https://www.towerofkubes.com/tags/openai/"/><published>2025-08-16T00:00:00Z</published></entry><entry><title>OpenAI o3 Review</title><link rel="alternate" type="text/html" hreflang="en" href="https://www.towerofkubes.com/articles/openai-o3/"/><id>https://www.towerofkubes.com/articles/openai-o3/</id><updated>2025-05-28T00:00:00Z</updated><summary type="html">Hands-on review of OpenAI o3: deep research-style answers, multi-source web lookups, latency tradeoffs, and comparisons to ChatGPT 4o/4.5/4.1.</summary><content type="html"><![CDATA[<p>I’ve used o3 extensively, and I think it’s a really strong model compared to earlier ChatGPT models.</p>
<p>It doesn’t just “think”; it also does web research and cross-checks sources to reach a conclusion. Other ChatGPT models can browse too, but o3 digs deeper and pulls more sources (when I read its “thoughts,” it said it tries to fetch at least 10 sources).</p>
<p>This is similar to what Deep Research does, which makes sense because ChatGPT’s DR used the o3 model even before o3 itself launched. However, DR returns essay-length answers (and is limited to 10 uses per month on ChatGPT Plus), which isn’t always practical. o3 gives answers closer in length to the other ChatGPT models. There are Plus usage limits, but I had to use it quite a lot before hitting them.</p>
<p>The model “thinks” for several minutes before responding. Usually it’s worth the wait, except for simple questions another model could answer faster. For complex questions, o3 is often noticeably better. I tried tough coding prompts that 4o struggled with (confident answers with hallucinations), then asked o3 and got much better results. For bigger tasks I sometimes had to tweak the prompt a few times, but in most cases o3 eventually delivered (unlike 4o).</p>
<p>The model isn’t perfect. There are still hallucinations and mistakes. Nevertheless, in my experience it makes fewer of them than other models I’ve tried.</p>
<p>AI moves so fast that it’s hard to keep up. Last month o3 was probably the best model around, and now people say Gemini 2.5 has overtaken it. It takes time to use a new model enough to really understand its strengths and weaknesses.</p>
<p>I also played a bit with ChatGPT 4.5 and 4.1. I haven’t used them much yet and so far I’m less impressed.</p>
<p>I haven’t tried o4-mini or o4-mini-high yet. Assuming o3 is better, I’d rather wait a few minutes for deeper reasoning. For simpler questions I still default to 4o.</p>
<hr>
<p><em>Featured image by <a href="https://unsplash.com/@siva_photography?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Levart_Photographer</a> on <a href="https://unsplash.com/photos/a-computer-screen-with-a-bunch-of-buttons-on-it-drwpcjkvxuU?utm_source=hugo&utm_medium=referral"  target="_blank" rel="noreferrer">Unsplash</a>.</em></p>
]]></content><author><name>Ro'i Bandel</name></author><category term="ai" label="Ai" scheme="https://www.towerofkubes.com/tags/ai/"/><category term="tools" label="Tools" scheme="https://www.towerofkubes.com/tags/tools/"/><category term="openai" label="Openai" scheme="https://www.towerofkubes.com/tags/openai/"/><category term="llm" label="Llm" scheme="https://www.towerofkubes.com/tags/llm/"/><category term="gpt" label="Gpt" scheme="https://www.towerofkubes.com/tags/gpt/"/><category term="chatgpt" label="Chatgpt" scheme="https://www.towerofkubes.com/tags/chatgpt/"/><published>2025-05-28T00:00:00Z</published></entry></feed>