Tower of Kubes

OpenCode: The Agentic Tool That Anthropic and Google Don't Want You To Use

2026-05-05T00:00:00Z

For the past four months, OpenCode has been my primary agent tool. A piece of AI industry drama is what brought it to my attention.

Background

In January 2026, I started seeing drama online: Anthropic blocks third-party use of Claude subscriptions. The most surprising part to me wasn’t that Anthropic decided to block this type of usage; that’s unfortunate but expected. What surprised me was that I hadn’t known this was even possible in the first place.

I had briefly read about OpenCode and Crush during my Agentic CLI Tools Comparison, but hadn’t used them due to their BYO (Bring Your Own) API key requirement, which in most cases is significantly more expensive than subscription tiers. As it turns out, people had found ways to use those subscriptions anyway. OpenCode had implemented an OAuth flow that spoofed Claude Code’s HTTP headers to authenticate against Anthropic’s API with a Claude Pro or Max subscription. This gave OpenCode users access to Claude models at subscription pricing, a significant cost advantage.

The Crackdown

Anthropic’s response came in several phases. Active enforcement began on January 9, 2026, when Anthropic deployed server-side protections blocking all unofficial OAuth access. On February 19, Anthropic updated its legal compliance page to make the OAuth restriction explicit: OAuth tokens obtained from Claude subscription accounts are only permitted for use with official Claude tools.

Legal requests followed, and in mid-March OpenCode’s maintainers merged a PR removing the Anthropic OAuth plugin from the project. By early April, Anthropic extended restrictions to OpenClaw and other third-party harnesses. Google ran the same playbook with Gemini around the same period, banning third-party OAuth access and issuing account-level suspensions.

The Community Reaction

The Hacker News thread filled with genuine disappointment. Many users felt OpenCode was a significantly better tool than Claude Code. The main advantages cited were its open-source MIT license, an optional web UI and client/server architecture, and the absence of flickering, a complaint about Claude Code that hasn’t gone away. OpenCode had also grown remarkably fast, reaching over 150,000 GitHub stars.

OpenAI and GitHub went the other direction. Tibo, OpenAI’s Codex lead, announced on X that Codex subscribers could use their subscription directly within OpenCode, and GitHub formally announced support for OpenCode across all GitHub Copilot subscriptions. That’s what originally got me to give OpenCode a real try, paired with GitHub Copilot and ChatGPT subscriptions, and I’ve been using it regularly since.

My Impressions of OpenCode

OpenCode immediately seemed appealing when I started using it. Until that point, Claude Code had remained my preferred agentic CLI tool. In the months since I wrote Agentic CLI Tools Comparison, I had continued experimenting with different CLI tools and models, notably Claude Code 2.0, Codex CLI, Gemini CLI, and GitHub Copilot CLI. Claude Code consistently remained the best tool in my opinion, both in terms of UI design and features, and in terms of Anthropic’s models feeling the strongest at coding and agentic tool usage based on my experience. The other tools felt like UI imitations of Claude Code running different models, with no meaningful improvements. OpenCode is genuinely different, though. It runs on a client/server model with an HTTP API, supports 75+ AI providers including local models, and has native multi-session support.

When opening OpenCode in a terminal, it feels familiar but different. The starting screen looks a lot like a classic search engine, with the prompt box centered on the screen, rather than being off to the bottom like in most other agentic CLI tools.

However, once you enter an initial prompt, the prompt box moves to the bottom of the terminal, making for a more familiar look. In my opinion, OpenCode strikes a good balance: it will feel familiar to users who have used Claude Code (and similar tools) before, but at the same time it does not feel like a clone of other tools. OpenCode does a lot of unique things that other tools don’t do. For example, OpenCode has a useful sidebar that displays information about active MCPs, LSPs (language servers) and token usage for the current session.

The look of OpenCode becomes even more unique when using its web UI or the OpenCode desktop app.

Image source: Web | OpenCode

Models and Providers

When first using OpenCode, it defaults to using the OpenCode Zen models. As of today, OpenCode Zen offers several free models, as well as paid models.

When using OpenCode Zen, it’s recommended to read about the privacy for each model.

These paid models can be used either by paying for credits (similar to OpenRouter) or through the OpenCode Go subscription. However, OpenCode does not limit you to its own offering. One of the best features of OpenCode is its wide provider support. Models can be used from practically any provider (that hasn’t outright blocked OpenCode), and local models are supported as well. This gives users a lot of flexibility to use the same tool across many different models, with one unified agent harness. It also means users are not “locked in” to one provider if they want to continue using OpenCode. When providers change their terms, such as Claude and Gemini limiting usage of OpenCode, or GitHub Copilot changing the terms of its subscriptions, OpenCode users can just move to other providers and continue their existing workflow.

Agentic Tool Usage

Using one tool for all providers also means that I have a single place to configure my MCP servers, Skills and AGENTS.md files. While there have been attempts to standardize the agents world, including the Agentic AI Foundation (AAIF), the reality is that agentic tools still differ in how they are configured. For example, Anthropic to date has refused to adopt the AGENTS.md file, instead referring only to the CLAUDE.md file.

OpenCode supports these emerging agent standards, as well as LSP servers (Language Server Protocol, which predates agents and gives code editors better support for programming languages). At the same time, OpenCode also has its own config file.

As an example, if you want to configure Chrome DevTools MCP server, add the following to your OpenCode config:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "chrome-devtools": {
      "type": "local",
      "command": ["npx", "-y", "chrome-devtools-mcp@latest"]
    }
  }
}

OpenCode also supports a range of built-in tools, including web searches. One of my personal favorite tools is the question tool. It allows the model to ask you questions mid-task: for gathering preferences, clarifying instructions, or getting decisions on implementation choices. Each question includes a header, question text, and a list of options, with the ability to type a custom answer. When there are multiple questions, you can navigate between them before submitting.

It’s Dangerous: Permissions and Safety

OpenCode is a powerful tool, and with great power comes great responsibility. By default, it does not ask permission for anything: it will happily edit any file, run any command, and delete anything, which can feel great for vibe-coding but can also wreak havoc on your machine and codebases if left unchecked. For users coming from Claude Code, the default permissions feel similar to the claude --dangerously-skip-permissions flag. Even in “Plan” mode (instead of “Build” mode), OpenCode can still run commands (by default, “Plan” mode only disallows file edits). Fortunately, this is fairly easy to fix. To get a locked-down OpenCode, add this to your OpenCode config:

{
  "$schema": "https://opencode.ai/config.json",
  "permission": {
    "*": "ask"
  }
}

OpenCode Permissions can be customized further.
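Both config fragments in this article (the MCP server and the permission block) belong to the same OpenCode config, so in practice they end up merged. Here is a minimal Python sketch of that merge. The function name lock_down is my own, and the only keys it uses are the ones shown above; treat it as an illustration rather than an official tool.

```python
import json


def lock_down(config: dict) -> dict:
    """Return a copy of an OpenCode config dict with every permission set to "ask"."""
    updated = dict(config)  # shallow copy so the caller's dict is left untouched
    updated.setdefault("$schema", "https://opencode.ai/config.json")
    updated["permission"] = {"*": "ask"}  # same block as in the article
    return updated


# The MCP fragment from earlier in this article.
existing = {
    "mcp": {
        "chrome-devtools": {
            "type": "local",
            "command": ["npx", "-y", "chrome-devtools-mcp@latest"],
        }
    }
}

print(json.dumps(lock_down(existing), indent=2))
```

The printed JSON keeps the existing MCP entry and adds the permission block next to it.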

It is also worth running OpenCode in a sandboxed environment. Refer to my previous article on Claude Code Sandboxing for examples on how to achieve this.

Final Verdict: Is OpenCode Better Than Claude Code?

Overall, OpenCode is a very compelling agent tool, with wide model support and lots of features. It is certainly among the best AI tools I have ever used.

On the question of “OpenCode vs. Claude Code”, I would say both tools are honestly equally strong. OpenCode felt like a breath of fresh air after months of using Claude Code, with many unique features. One example is mouse support, which Claude Code has only recently gained and which is currently still a preview feature. At the same time, going back to Claude Code after several months of only using OpenCode, I have noticed that Anthropic has not been resting and has been frantically adding new features to Claude Code, including plugins and a plugin marketplace, Agent Teams for multi-agent orchestration, the /btw command for lightweight side questions, and Auto mode, a new permission tier that sits between manual approval and skipping permissions entirely.

Overall, OpenCode feels surprisingly more polished (despite being developed by a much smaller team), while Claude Code has the edge in raw features. Nevertheless, the tools feel very close in quality. The choice between them ultimately comes down to one question: do you have a Claude subscription?

As I explained at the opening of this article, Anthropic has made their stance clear that Claude subscriptions are only for use within official Claude tools, and third-party tool usage is blocked for subscribers. Claude Code also locks you into Claude models exclusively, with no support for other providers.

If you’re already paying for a Claude subscription, Claude Code is the natural fit, as it’s the only tool where Anthropic’s subscriptions are officially supported. If you’re not, OpenCode’s model flexibility and open-source nature make it a compelling alternative that gives you full control over both your models and your costs.

Featured image by Viktor Forgacs on Unsplash.

Claude Code Sandboxing

2026-01-13T00:00:00Z

A couple of days ago, my coworker Roey Wullman wrote this article: Claude Code Sandboxing: Stop Babysitting Your AI Assistant (published in Develeap’s Magazine).

This morning, I saw the latest announcement by Anthropic: Introducing Cowork | Claude, then read the comments on Hacker News. Some of the comments discussed how secure Cowork is (or isn’t) and how its sandboxing works. Other comments mentioned different approaches to sandboxing Claude Code (e.g. this comment and these comments).

Ways to Sandbox Claude Code

Featured image by Markus Spiske on Unsplash.

Chrome DevTools MCP server

2025-11-16T00:00:00Z

I have recently been using Chrome DevTools MCP server (which I tend to call Chrome MCP) to work on personal projects, notably CALMe. On my first day of using MCP, I added Playwright MCP server to my .mcp.json. Playwright MCP and Chrome DevTools MCP are servers that work in similar ways: they give MCP clients (agentic CLI tools) tools to browse web pages, click on buttons, read console logs and even “see” how the web page looks by allowing the client to take screenshots/snapshots. Playwright MCP is based on the Playwright framework for Web Testing and Automation, and is developed by Microsoft. Chrome DevTools MCP is based on the world’s most popular browser, and specifically its DevTools, and is developed by Google. Both come from big tech giants, which means these MCPs are well developed.

The comment that prompted me to try Chrome DevTools MCP

While Playwright MCP was working okay for me, I saw that Chrome DevTools MCP was released later and wondered whether it was any better.

A comment from this thread (which I also linked in Cool MCP Servers) prompted me to try it: What MCPs are you using with Claude Code right now? : r/ClaudeCode

Question

What’s the advantage of chrome devtools vs playwright mcp?

Conclusion

Faster, more capable. Reads the console logs, and can execute scripts. The long screenshots are great too

I used to use playwright but Chrome dev tools blew me away

Guide: Using Chrome DevTools MCP

Claude Code

At the project level, run:

claude mcp add --scope project chrome-devtools npx chrome-devtools-mcp@latest

This configures the following in the .mcp.json file:

{
  "mcpServers": {
    "chrome-devtools": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "chrome-devtools-mcp@latest"
      ],
      "env": {}
    }
  }
}

Then simply open a new instance of claude and confirm that you trust the folder and MCP server. Run the /mcp slash command to verify that the MCP server appears as “✔ connected”.

To use the MCP server, I simply tell Claude something like “use chrome mcp to test and troubleshoot website x”. I would add more context depending on the specific task, but in general this is enough to let Claude know that it can use this MCP server.

Codex CLI

The Codex CLI sandbox makes working with Chrome DevTools MCP more challenging, though I managed to make it work (Source: Connecting to a running Chrome instance | ChromeDevTools/chrome-devtools-mcp: Chrome DevTools for coding agents).

Run the following command:

codex mcp add chrome-devtools -- npx chrome-devtools-mcp@latest --browser-url="http://127.0.0.1:9222"

In addition, if live websites need to be tested, allow network access by adding the following lines to the global Codex config:

[mcp_servers.chrome-devtools] 
command = "npx" 
args = ["chrome-devtools-mcp@latest", "--browser-url=http://127.0.0.1:9222"]

[sandbox_workspace_write] 
network_access = true 

Now, every time we want to use Codex CLI with Chrome DevTools MCP, we must first run this command in the background:

nohup /usr/bin/google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug-headful --no-first-run --disable-gpu about:blank >/tmp/chrome-launch.log 2>&1 &
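Before starting Codex, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below queries the /json/version endpoint that Chrome exposes when remote debugging is enabled. The function name and the 2-second timeout are my own choices, and the base URL simply mirrors the --browser-url value used above.

```python
import http.client
import json
import urllib.error
import urllib.request


def chrome_debugger_info(base_url: str = "http://127.0.0.1:9222", timeout: float = 2.0):
    """Return the /json/version payload as a dict, or None if nothing answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all mean "not ready".
        return None


if __name__ == "__main__":
    info = chrome_debugger_info()
    if info is None:
        print("Chrome is not listening on port 9222; start it first.")
    else:
        print("Connected to", info.get("Browser"))
```

If the check prints the "not listening" message, the Chrome launch log in /tmp/chrome-launch.log is the first place I would look.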

Gemini CLI

At the project level, run:

gemini mcp add chrome-devtools npx chrome-devtools-mcp@latest

This configures the following project settings:

{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": [
        "chrome-devtools-mcp@latest"
      ]
    }
  }
}

Other MCP clients

Follow the instructions in MCP Client configuration | ChromeDevTools/chrome-devtools-mcp: Chrome DevTools for coding agents.
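The Claude Code, Codex CLI and Gemini CLI entries above all carry the same two fields (command and args), only in different file formats. As a rough sketch, this Python helper renders a .mcp.json-style dict as the Codex TOML table shown earlier. The function name is hypothetical, it ignores the env field, and it assumes server names are valid bare TOML keys (letters, digits, dashes and underscores).

```python
import json


def to_codex_toml(mcp_json: dict) -> str:
    """Render each server of a .mcp.json-style dict as a [mcp_servers.<name>] table."""
    lines = []
    for name, server in mcp_json.get("mcpServers", {}).items():
        lines.append(f"[mcp_servers.{name}]")
        # JSON string and array literals are also valid TOML for these simple values.
        lines.append(f"command = {json.dumps(server['command'])}")
        lines.append(f"args = {json.dumps(server.get('args', []))}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


claude_config = {
    "mcpServers": {
        "chrome-devtools": {
            "type": "stdio",
            "command": "npx",
            "args": ["chrome-devtools-mcp@latest"],
            "env": {},
        }
    }
}

print(to_codex_toml(claude_config))
```

Note that the Codex entry in this article also passes --browser-url, so that argument would still need to be added to args by hand.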

Featured image by Growtika on Unsplash.

MCP Security

2025-11-04T00:00:00Z

Ziva Wernick did a Google AI workshop today and learned about MCP. She raised valuable concerns about MCP security and privacy.

Security: Has to do with the security risk of using MCP servers, and the possibility of those servers to facilitate malicious actions.
Privacy: Has to do with AI tools constantly collecting private information. In some cases there may be an option to opt-out, or pay for an enterprise license that limits what the provider can do with the data.

I will focus on Security, specifically how it applies to agentic CLI tools and MCP servers.

MCP Horror Stories

Docker Blog wrote a series called MCP Horror Stories:

Part 1: MCP Security Issues Threatening AI Infrastructure | Docker
Part 2: MCP Horror Stories: The Supply Chain Attack | Docker
Part 3: The GitHub Prompt Injection Data Heist | Docker
Part 4: MCP Horror Stories: The Drive-By Localhost Breach | Docker

Unrelated to Docker, there’s also this article that features “Five Horror Stories That Actually Happened”: The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares

Five Horror Stories That Actually Happened 😱

The GitHub Data Heist (CVSS: 9.6/10)
The mcp-remote Catastrophe (437,000 Environments Compromised)
Container Escape via Tool Poisoning (CVSS: 9.4/10)
The Great Secrets Exposure
WhatsApp MCP Shadowing

For more information on each “horror story”, read the full article:

The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares

First Malicious MCP in the Wild

On 2025-09-25, Koi Blog wrote this article: First Malicious MCP in the Wild: The Postmark Backdoor That’s Stealing Your Emails | Koi Blog

Quote

postmark-mcp - downloaded 1,500 times every single week, integrated into hundreds of developer workflows. Since version 1.0.16, it’s been quietly copying every email to the developer’s personal server. I’m talking password resets, invoices, internal memos, confidential documents - everything.

This is the world’s first sighting of a real world malicious MCP server. The attack surface for endpoint supply chain attacks is slowly becoming the enterprise’s biggest attack surface.

The article generated some discussion, including on Hacker News: A Postmark backdoor that’s downloading emails | Hacker News. Some of the comments pointed out that the MCP risk isn’t really different from existing software risks:

Quote

This has nothing to do with MCP really, the same flaw is there in all software: you have to trust the author and the distributor. Nothing stops Microsoft from copying all your Outlook mail. Nothing stops Google from copying all your gmail. Nothing stops the Mutt project from copying all your email. Open source users like to think that “many eyes” keep the code clean and they probably do help, especially on popular projects where all commits get reviewed in detail, but the chance is still there. And the rest of us just trust the developers. This problem is as old as software.

Are MCP Security risks real or overblown?

MCP security risks are a real concern and I do not want to downplay that. In many ways though, these risks have existed for as long as software itself; MCP is just the latest attack vector.

I will note that the blogs I featured here, from Docker and Koi Security, are from companies that attempt to sell solutions to this problem. This does not mean that the problem is not real or that the solutions are not needed, just something to note. I actually do find Docker’s MCP solutions to be very interesting (I mention Docker MCP Catalog below in Supply-Chain Security).

MCP Defense

The article “The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares” (mentioned above in MCP Horror Stories), suggests five defense solutions:

The Solution: Defense in Depth (That Actually Works) 🛡

Component Isolation
Attack Surface Reduction
Supply Chain Security
Input/Output Sanitization
WhatsApp MCP Shadowing

For more information on each solution, read the full article:

The Day I Told 800+ Engineers Their AI Dreams Could Become Security Nightmares

What I Do

So far I have been limiting my MCP usage to personal projects and learning. Below are some of the things I have noted while learning about how to use MCP “safely”:

Supply-Chain Security

When Ziva asked about the MCP security risks, she was told to “read the code”. While it’s true that many MCP servers are open-source, reviewing all of them is not exactly feasible. I often take a surface-level look at the repo, its activity and number of stars, but this is not the same as reviewing the code in depth. For this reason, I believe it is worth using MCP servers by known publishers. Docker does come in handy here with their Docker MCP Catalog. While this catalog is not as extensive as other MCP galleries, it focuses on quality over quantity. All of the MCP servers in the Docker MCP Catalog are by known publishers. Note that I still refuse to use Docker Desktop (due to its license), but these MCP servers can also be used in Docker CLI together with an MCP client.

MCP Server Configuration

Some MCP servers may have permissive default permissions, but can be configured to be more “locked-down” and limited in what they can do and access.

As an example, Kubernetes MCP Server can be run in read-only mode (this is not the default but can be set with a flag when setting up the MCP server). In this mode, the Kubernetes MCP server cannot make changes to clusters (for example, it is unable to apply manifests, but can still view existing resources). Note that even in this mode there can be security risks. One example is viewing secrets. In Kubernetes, secrets are stored as Base64-encoded strings, which are trivial to decode for anyone who has full read access to the cluster. I have personally witnessed Claude Code attempt to read and decode Kubernetes Secrets (either with Kubernetes MCP Server or just kubectl commands) when asked to help troubleshoot my homelab cluster. For this reason, when using agentic CLI tools, I prefer to approve each command individually. Further, Kubernetes access can be regulated with Role-based access control (RBAC).
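To show just how trivial the decoding is, here is a short Python sketch. The dict mimics the data field of a Secret manifest; the keys and values are made up for illustration and do not come from a real cluster.

```python
import base64

# Shape of the `data` field in a Kubernetes Secret manifest (made-up values).
secret_data = {
    "username": "YWRtaW4=",
    "password": "czNjcjN0",
}


def decode_secret(data: dict) -> dict:
    """Base64-decode every value; no key or password is needed."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


print(decode_secret(secret_data))
# -> {'username': 'admin', 'password': 's3cr3t'}
```

Base64 is an encoding, not encryption, so read access to a Secret is effectively access to its plaintext.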

Ignore files

Similar to .gitignore files, most agentic CLI tools have a way to exclude specific files from the context. For example, a .env file (that may include secrets) should be specifically excluded (when not doing this, I have seen Claude Code attempt to read these files). Unfortunately, there isn’t really a standard “ignore file” for this; each tool has its own way to achieve this. If using multiple tools, multiple files might be needed.
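Whichever ignore mechanism a tool uses, the first step is knowing which files to exclude. The Python sketch below walks a project and lists files whose names look sensitive. The pattern list and the skipped directories are my own assumptions, not a standard; adjust them to whatever holds secrets in your project.

```python
import fnmatch
import os

# Hypothetical patterns for files that commonly hold secrets.
SENSITIVE_PATTERNS = [".env", ".env.*", "*.pem", "*.key"]
# Directories that are not worth scanning (also an assumption).
SKIP_DIRS = {".git", "node_modules"}


def is_sensitive(path: str, patterns=SENSITIVE_PATTERNS) -> bool:
    """True if the file name (not the full path) matches any pattern."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def find_sensitive_files(root: str) -> list:
    """Return sorted paths, relative to root, of files that look sensitive."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]  # prune in place
        for name in filenames:
            if is_sensitive(name):
                matches.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_sensitive_files("."):
        print(path)
```

The resulting list can then be copied into each tool’s own ignore or deny configuration.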

Documentation on excluding/ignoring files

Featured image by FlyD on Unsplash.

My Experience with Claude Sonnet 4.5 and Claude Code 2.0

2025-10-06T00:00:00Z

After the announcement of Claude Sonnet 4.5 and Claude Code 2.0, I finally had a little bit of time to experiment with the new Claude versions today.

My first impression is that Claude Sonnet 4.5 feels slightly better than Sonnet 4. At least that’s more than I can say for GPT-5, of which my first impressions weren’t as positive (it felt like a downgrade compared to o3, but I’ve gotten used to it).

Honestly, it’s hard to tell though. I find it hard to give objective feedback on LLM models. There are benchmarks that claim to be objective, but benchmarks don’t tell the full story of how a model actually feels in real world use. It’s kind of similar to how phone benchmarks don’t necessarily tell the full story of how smooth a phone actually feels in real world use; for example, Google Pixel models are not technically as powerful as some of the competition, but have optimized software that makes them feel smooth to use.

When evaluating LLM models, I try to use them as normal. Sometimes I give the same prompt to different LLM models to gauge the differences in answers and which gives the “best” response. However, even that is not always effective, since LLM answers are non-deterministic: even asking the same model inside the same tool the same prompt twice can give different answers (sometimes even wildly different). The differences can be even larger when using the same model across different tools. I feel like I get significantly different answers when using GPT-5 in ChatGPT 5, Microsoft Copilot, Cursor CLI, Codex CLI and Perplexity Pro.

Which brings me back to today. I was working on documentation frameworks, specifically setting up Docusaurus, with Claude Code 2.0 and Sonnet 4.5. This is actually a task I’ve done several times in the past with previous versions of Claude Code using the Sonnet 4 model. This time, I was trying to vibe code less and actually understand every line of code I was writing so that I would eventually feel confident deploying Docusaurus in production (using static website hosting). Nevertheless, I still used Claude Code to help me with some menial tasks, while making an effort to read every single line of code (rather than just “vibe coding”). Because I have done this task before, it might have been a decent benchmark if I had actually tried to examine it in that way, but really I was just trying to get a task done.

As for the results? I managed to achieve what I was trying to do, but really my goal in the first place was to rely less on AI. I still consulted Claude Code frequently. It gave some good responses, some dumb responses and some mid responses. Not too different from usual, maybe slightly better, but again hard to tell. I don’t plan to run a more rigorous test of Sonnet 4 vs Sonnet 4.5; I don’t mind trusting the benchmarks in this case. In many benchmarks Sonnet 4.5 even beats Opus 4.1!

Usage Limits

Before I even had a chance to try it myself, I saw many posts on r/ClaudeCode complaining about usage limits getting worse. Many of these posts were from users paying for the expensive $100-$200/month Claude MAX plans. A lot of them complained about reaching usage limits faster than before while using Claude Opus 4.1 in Claude Code. It’s not clear to me why those users insisted on still using Opus 4.1 despite some benchmarks showing that Sonnet 4.5 has surpassed it, but to be fair the ability to use Opus in Claude Code is one of the selling points of the MAX plans. On my $20/month Claude Pro plan, I can only use Opus 4.1 on claude.ai, not inside Claude Code. I haven’t found that a huge limitation though since I was still getting good results with Sonnet 4 and will presumably get even better results with Sonnet 4.5.

One of the most useful features added in Claude Code 2.0 is /usage, which shows daily and weekly usage. It still doesn’t show how much the tokens you use really cost; for that I still use ccusage.

Unfortunately, this comes with new weekly rate limits. I missed this at first, but now I believe this might be the main cause of what the community has been complaining about. Weekly rate limits were one of the features I disliked most about ChatGPT; back when o3 was limited to 50 prompts a week, I was genuinely rationing my usage of o3. Since the launch of GPT-5, the limits for ChatGPT 5 Thinking have been raised significantly, to the point that I don’t reach those limitations anymore.

As for Claude Code, until now I found the usage limits to be fairly reasonable. The limits were in 5 hour blocks, not daily or weekly. It would take me two full hours of heavy vibe coding before a limit was actually reached. In cases where I was taking a more active role in coding I often did not reach the limit at all. Even when the limit was reached, it was unlikely I would have to wait the full 5 hours, since often I would be either in the middle or near the end of the 5 hour block anyway (one time I only had to wait 5 minutes for the limits to reset). The end result was that I felt like I could practically use Claude Code as much as I wanted without really worrying about limits, since worst case I would just take a break and wait a few hours for all of the limits to reset. I also saw little value in the more expensive Claude MAX plans.

Now with the weekly limits, there is a larger risk of reaching them. After just one day of medium usage, I already used 11% of the weekly limit (which resets on 2025-10-12). I’m not that worried though, since reaching the limits if anything would give me more time to experiment with other agentic CLI tools. I read that Codex CLI also has a weekly limit; one user claimed that Codex is so much better than Claude Code that they ration it, use CC for easier tasks and save Codex for the more complex tasks. In any case, I believe using a combination of free AI tools and paid subscriptions is both more cost-effective and more insightful compared to committing to one tool and paying an expensive “MAX” subscription.
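As a back-of-the-envelope check on that 11% figure, this small Python sketch projects when the weekly limit would be reached. It assumes usage continues at the same constant daily rate, which is a simplification; real usage varies from day to day.

```python
def days_until_limit(used_fraction: float, days_elapsed: float) -> float:
    """Days from the start of the window until 100% is reached at a constant rate."""
    daily_rate = used_fraction / days_elapsed
    return 1.0 / daily_rate


projected = days_until_limit(used_fraction=0.11, days_elapsed=1)
print(f"Limit reached after about {projected:.1f} days")  # -> about 9.1 days
print("Stays within a 7-day window:", projected > 7)      # -> True
```

At 11% per day, a full week adds up to roughly 77% of the limit, so medium usage fits with some headroom while heavier days would not.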

Featured image by Aerps.com on Unsplash.

Agentic CLI Tools Comparison

2025-09-28T00:00:00Z

GitHub Copilot CLI is the latest Agentic CLI tool. It is yet another Agentic CLI tool in the same style as Claude Code, Cursor CLI, Gemini CLI, Codex CLI and Qwen Code (and probably others that I am forgetting). So far I have tried all of these except for Qwen, and am now trying GitHub Copilot CLI as well.

All Agentic CLI tools look the same

All of these tools are superficially similar. Claude Code, Cursor CLI, Gemini CLI, Qwen Code and now GitHub Copilot CLI all have a TUI design that looks almost exactly the same, not even trying to hide that they’re copying each other. The notable exception is Codex CLI, which has its own TUI design. Honestly though, I find Codex’s TUI to be inferior and kind of wish it also copied the others. I think the common design works well and don’t mind it; it’s just funny that all of these companies copy each other.

Another thing that is similar is that all these tools have npm as their primary installation option. While most tools can also be installed in other ways (such as Homebrew), npm is usually recommended first in their respective README files. Of course, npm has been widely-used for years and many developers already have it installed (these tools are primarily for developers, though they can do more than coding); however, I’ve personally never before seen npm recommended as the primary installation method before this wave of Agentic CLI tools started. Some of the tools are written in TypeScript so it makes sense. On the other hand, there’s Codex CLI, which has its own design and is written in Rust, but nevertheless adapted to work with npm (TIL Rust binaries can be distributed on npm).

Agentic CLI tools have differences

I mentioned these tools are superficially similar, however that doesn’t mean they all work the same. Outside of design and installation method, there’s the matter of functionality and how well these tools actually work. Differences include:

Model

Some tools are designed to work with one company’s models. Claude Code of course uses Claude Sonnet and Claude Opus. OpenAI’s Codex CLI uses GPT-5 models (including GPT‑5-Codex). Gemini CLI uses Gemini 2.5 and 3 (Pro with a fallback to Fast). Other tools support a variety of different models through one service, for example Cursor CLI and GitHub Copilot CLI (the same is true for their non-CLI offerings). Others allow you to BYO (Bring Your Own) API keys (notably OpenCode).

Tools & Agentic Abilities

Even when two tools use the same AI model, that doesn’t necessarily mean they will work the same. These tools have agentic abilities, enhanced with tools and prompts. Tools can be built in or provided through MCP. As an example, Claude Code has a wide variety of built-in tools that allow it to read and write local files, browse the web (Search and Fetch websites) and more. On the other hand, while Codex Is Improving, it still does not have as many built-in tools as Claude Code. When tools are missing or limited, the gap can be bridged either with other CLI programs (that these agentic tools know how to run directly) or MCP servers. Most if not all of these tools support both running CLI commands and interacting with MCP servers. Notably, Cursor CLI now supports MCP as well (when I first tried it, Cursor CLI was missing MCP support).

License

Not all of these tools are open source. In a way that is somewhat deceiving, several of these tools have a GitHub repo that is little more than a closed-source LICENSE and README, but does not actually include any code. At present, this even includes GitHub Copilot CLI, which is marked as Public Preview and has Pre-release License Terms (it is not clear to me what the license terms would be after release). Claude Code and Cursor CLI are also closed source (others may have copied CC’s design, but not its code). Gemini CLI is open source and was later forked to Qwen Code, which is also open source (both Apache-2.0). OpenCode is also open source (as its name implies), under MIT. charmbracelet/crush (from the same people who created some of my favorite Go CLI and TUI Frameworks) uses this weird license: Functional Source License, Version 1.1, MIT Future License.

Pricing & Usage Limits

These tools have different limits.

Claude Code

Out of all of these tools I have (so far) used Claude Code the most and am most familiar with its pricing and usage limits. I am using Claude Pro on the $20 a month plan. Claude Code also has the crazy expensive Max plans ($100 or $200 a month). I have previously written in my Claude Code notes about my experience using the Claude Code $20 plan. My experience honestly hasn’t changed much. While there was some drama about Claude Code changing usage limits, I still rarely run into usage limits. When I do, I have to wait at most a few hours for the usage limits to reset. In that time I can either use other tools or take a break. Other than not having access to the Opus model on CC, I don’t feel like I’m missing anything by not being on Max and am still baffled at how people justify the price of those Max plans. ccusage implies I use more than $100 a month, significantly more than what I pay. Anthropic either operates at a loss or can somehow afford to do that since it’s their own models.

Gemini CLI

Gemini CLI has a generous free tier and is what I currently recommend for people who want to try an agentic tool for free. I’m not sure whether my Google AI Pro trial increases my Gemini CLI usage limits or if it’s unrelated. I’m honestly kind of confused by Google’s various AI plans (in typical Google fashion).

Note

UPDATE: Google AI Pro and Ultra subscribers now get Gemini CLI and Gemini Code Assist with higher limits.

Codex

Codex is included with paid ChatGPT plans, including Plus, Pro and Team.

BYO (Bring Your Own) API keys

Ironically, the FOSS tools such as opencode and crush might actually be more expensive in this case. When using an API key, you pay the “real” cost of running the AI model, which can end up significantly more expensive than a fixed-price plan. The same is true when using Claude Code with an API key instead of a plan: for anything beyond very light use, a plan makes more sense. Even the expensive Max plans often end up cheaper than what equivalent API use would cost.
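The plan-versus-API-key tradeoff comes down to simple arithmetic, which can be sketched roughly as below. This is only an illustration: the per-token prices, the token counts and the helper names (`ApiPricing`, `monthly_api_cost`, `cheaper_option`) are all made up for the example and are not any provider’s actual rates. Real usage also involves things like cached tokens and different prices per model, which are ignored here.

```python
from dataclasses import dataclass


@dataclass
class ApiPricing:
    """Per-million-token prices in USD (hypothetical values, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_api_cost(pricing: ApiPricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate a month of pay-per-token API spend for the given token counts."""
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_mtok
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_mtok
    return input_cost + output_cost


def cheaper_option(plan_price: float, api_cost: float) -> str:
    """Return "plan" when the flat subscription costs no more than API use."""
    return "plan" if plan_price <= api_cost else "api"


# Assumed prices, for illustration only.
pricing = ApiPricing(input_per_mtok=3.0, output_per_mtok=15.0)

# Assumed monthly token counts for a light and a heavy user.
light = monthly_api_cost(pricing, input_tokens=2_000_000, output_tokens=200_000)
heavy = monthly_api_cost(pricing, input_tokens=40_000_000, output_tokens=4_000_000)

for label, cost in (("light", light), ("heavy", heavy)):
    print(f"{label}: ~${cost:.2f}/month via API, "
          f"cheaper option vs a $20 plan: {cheaper_option(20.0, cost)}")
```

With these assumed numbers the light user comes out ahead on an API key, while the heavy user is far better off on a flat plan, which matches the pattern described above.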

My Opinion

Claude Code remains my most used agentic CLI tool. Nevertheless, I am still actively experimenting with other tools: I have used Gemini CLI more and more in recent weeks (Gemini’s free tier is really good), and am also trying Codex due to its improvements. However, while these tools feel similar in many ways and the competition is closer than ever, I still feel that Claude Code with Claude Sonnet 4.5 is noticeably better than all the other tools I have used. This may change in the near future, as all of these tools are actively developed and new ones are introduced all the time.

This is in addition to other AI tools which I am also actively using. Right now I am mainly using the web and app versions of ChatGPT, Gemini, Claude and Perplexity Pro (I also use Microsoft Copilot at work, but it’s not very good).

Featured image by Steve Johnson on Unsplash.

GPT-5

2025-08-16T00:00:00Z

This Week I Learned about GPT-5

In the announcement post, OpenAI made some bold claims about GPT-5, including:

The best response, every time

OpenAI o3 Review

2025-05-28T00:00:00Z

I’ve used o3 extensively, and I think it’s a really strong model compared to earlier ChatGPT models.

It doesn’t just “think”; it also does web research and cross-checks sources to reach a conclusion. Other ChatGPT models can browse too, but o3 digs deeper and pulls more sources (when I read its “thoughts,” it said it tries to fetch at least 10 sources).

This is similar to what Deep Research does, which makes sense because ChatGPT’s DR used the o3 model even before o3 itself launched. However, DR returns essay-length answers (and is limited to 10 uses per month on ChatGPT Plus), which isn’t always practical. o3 gives answers closer in length to the other ChatGPT models. There are Plus usage limits, but I had to use it quite a lot before hitting them.

The model “thinks” for several minutes before responding. Usually it’s worth the wait, except for simple questions another model could answer faster. For complex questions, o3 is often noticeably better. I tried tough coding prompts that 4o struggled with (confident answers with hallucinations), then asked o3 and got much better results. For bigger tasks I sometimes had to tweak the prompt a few times, but in most cases o3 eventually delivered (unlike 4o).

The model isn’t perfect. There are still hallucinations and mistakes. Nevertheless, in my experience there are fewer of them than with other models I’ve tried.

AI moves so fast that it’s hard to keep up. Last month o3 was probably the best model around, and now people say Gemini 2.5 has overtaken it. It takes time to use a new model enough to really understand its strengths and weaknesses.

I also played a bit with ChatGPT 4.5 and 4.1. I haven’t used them much yet and so far I’m less impressed.

I haven’t tried o4 or o4-mini-high yet. Assuming o3 is better, I’d rather wait a few minutes for deeper reasoning. For simpler questions I still default to 4o.

Featured image by Levart_Photographer on Unsplash.