Claude Opus 4.7 Just Dropped. Here's Everything New.

Anthropic released Claude Opus 4.7 today with a 13% coding benchmark gain, a new xhigh effort tier, a /ultrareview mode, and high-resolution vision. Pricing is unchanged at $5 and $25 per million tokens, but a new tokenizer can raise effective costs by up to 35%. Here is what changed.

What Anthropic Just Shipped

Anthropic released Claude Opus 4.7 on April 16, 2026, rolling it out the same day across the Claude apps, the Application Programming Interface (API), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The pitch is straightforward: better coding and agentic performance over Opus 4.6, sharper vision, a new /ultrareview command in Claude Code, and an extra effort tier called xhigh.

The model ID is claude-opus-4-7, and pricing is unchanged from 4.6 at $5 per million input tokens and $25 per million output tokens (the “MTok” unit in Anthropic’s pricing tables, covering text going in and out). It is the same sticker number as last quarter, with one asterisk that gets its own section below.

The Benchmark Numbers Anthropic and Customers Cite

Between Anthropic’s own claims and the early-access customer testimonials on the launch page, several benchmark gains over Opus 4.6 stand out:

  • +13% resolution on Anthropic’s 93-task coding benchmark. Anthropic’s own figure.
  • Three times as many production tasks resolved on Rakuten-SWE-Bench, a software engineering benchmark named for the Japanese internet company Rakuten. Anthropic’s reported figure.
  • 90.9% accuracy on BigLaw Bench for Harvey at high effort, a benchmark run by legal AI firm Harvey that tests models on real lawyer tasks like document review and drafting.
  • A state-of-the-art score on the Finance Agent evaluation, which Anthropic flags as a headline result without publishing the exact number on the launch page.
  • A tied top score of 0.715 across six modules on an internal research-agent benchmark, per a testimonial from Michal Mucha, Lead AI Engineer at Applied AI, quoted on Anthropic’s launch page.

These figures come from Anthropic’s own claims and from early-access customer testimonials published on its launch page. Independent third-party evaluations will trickle in over the coming weeks. If you are shopping the current frontier against GPT or Gemini, wait for outside scores before locking in a provider.

What’s Actually New in the API

Benchmark claims aside, a handful of features and behaviors matter if you are a developer or a power user:

  • A new xhigh effort level. Anthropic now exposes five effort levels that trade off intelligence against token spend: low, medium, high, xhigh, and max. The company explicitly recommends starting at xhigh for coding and agentic use cases.
  • Task budgets (public beta). You tell the model roughly how many tokens it has for a full agentic loop, including thinking, tool calls, tool results, and final output. The model sees a running countdown and paces itself. Minimum budget is 20,000 tokens.
  • High-resolution vision. Opus 4.7 is the first Claude that accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels, up from a 1,568-pixel cap on prior models. Bounding-box coordinates returned by the model now map 1:1 to actual pixels, so there is no rescaling step in your front-end.
  • /ultrareview in Claude Code. A dedicated slash command that runs a longer, more thorough review pass on code.
  • 1 million token context at standard pricing. The 1M window that 4.6 already had, now with no long-context premium.
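Putting those controls together, a request that opts into them might look like the sketch below. The model ID, the effort tiers, and the 20,000-token budget floor come from this release; the exact wire shape of the task-budget field (here `task_budget`) is an assumption, so verify it against the API reference before shipping.

```python
# Sketch of an Opus 4.7 request using the new controls. The model ID,
# effort tier, and 20,000-token budget floor come from the release notes;
# the exact shape of the task_budget field is an assumption.
request = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    # Anthropic's recommended starting tier for coding and agentic work.
    "effort": "xhigh",
    # Public-beta budget for the whole agentic loop: thinking, tool calls,
    # tool results, and final output. The minimum allowed is 20,000 tokens.
    "task_budget": {"tokens": 50_000},
    "messages": [
        {"role": "user", "content": "Refactor this module and fix the failing tests."}
    ],
}

assert request["task_budget"]["tokens"] >= 20_000  # stay above the floor
```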

About That Unchanged Price

Opus 4.7 ships a new tokenizer. That sounds like pedantic plumbing, but it has a direct effect on your bill.

The migration guide states that the new tokenizer “may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content).” Per-token prices are unchanged. Per-request costs on the same prompt can rise, depending on what kind of text you send. Code-heavy and structured content is likelier to sit closer to the 1.35x ceiling; plain prose sits closer to 1x.

If you care about AI spend:

  • Re-run cost estimates using Anthropic’s /v1/messages/count_tokens endpoint against representative prompts before switching production traffic.
  • Expect the biggest deltas on vision-heavy workloads. Full-resolution images can consume up to roughly 4,784 tokens per image on 4.7, versus a previous cap near 1,600 tokens. That is close to triple if you stop sending downsampled images.

Anthropic suggests downsampling images when you do not need the added fidelity, and using the new task_budget control to cap spend. The math is not sneaky, but it is also not free.
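To see what the multiplier does to a bill, here is a back-of-envelope projection. The per-MTok prices and the 1x–1.35x range come from this article; the monthly token volumes are invented placeholders, so substitute your own counts (ideally from the count_tokens endpoint).

```python
# Back-of-envelope cost projection for the tokenizer change. Prices and
# the 1x-1.35x multiplier range are from the launch notes; the workload
# numbers below are made-up placeholders.

INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens

def monthly_cost(input_tokens: int, output_tokens: int,
                 tokenizer_multiplier: float = 1.0) -> float:
    """Estimate monthly spend after scaling token counts by the new tokenizer."""
    scaled_in = input_tokens * tokenizer_multiplier
    scaled_out = output_tokens * tokenizer_multiplier
    return (scaled_in / 1e6 * INPUT_PRICE_PER_MTOK
            + scaled_out / 1e6 * OUTPUT_PRICE_PER_MTOK)

# Hypothetical workload: 200M input and 40M output tokens per month.
baseline = monthly_cost(200_000_000, 40_000_000)          # same prompt on 4.6
worst_case = monthly_cost(200_000_000, 40_000_000, 1.35)  # code-heavy ceiling
```

On those placeholder numbers the same workload goes from $2,000 to $2,700 a month without any price change, which is exactly the asterisk in the pricing section above.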

Breaking Changes You Need to Know

If you already ship on Claude, a few things now return HTTP 400 errors on Opus 4.7:

  • temperature, top_p, and top_k are gone. Setting any of these to a non-default value errors out. Remove them from your request payloads.
  • Manual extended thinking is removed. thinking: {type: "enabled", budget_tokens: N} no longer works. Switch to thinking: {type: "adaptive"} and use the effort parameter instead.
  • Assistant-message prefilling errors. Use structured outputs or system prompts to steer format.
  • Thinking summaries are off by default. If your interface streams reasoning to users, explicitly set thinking.display to "summarized", or the UI shows a long pause before output begins.
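The first two rewrites are mechanical enough to apply by hand to a raw request body. A minimal sketch, assuming the field names shown above (the exact wire format may differ, and prefilling or thinking.display changes still need manual review):

```python
# Hand-rolled sketch of the parameter migration described above. Field
# names follow the article; verify against Anthropic's migration guide
# before running this over production payloads.

def migrate_payload(payload: dict) -> dict:
    """Rewrite an Opus 4.6-era request body for Opus 4.7."""
    out = dict(payload)
    # 1. Sampling knobs now return HTTP 400; drop them entirely.
    for key in ("temperature", "top_p", "top_k"):
        out.pop(key, None)
    # 2. Manual thinking budgets are removed; switch to adaptive thinking
    #    and steer depth with the effort parameter instead.
    if out.get("thinking", {}).get("type") == "enabled":
        out["thinking"] = {"type": "adaptive"}
        out.setdefault("effort", "high")
    return out

old = {
    "model": "claude-opus-4-6",
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Review this diff."}],
}
new = migrate_payload(old) | {"model": "claude-opus-4-7"}
```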

Anthropic also ships a /claude-api migrate helper inside Claude Code that rewrites these patterns automatically across a codebase, with a confirmation prompt before it touches files.

Behavior Changes That Will Surprise You

Not every difference breaks the API. Some just change the feel of the model.

The biggest one: Opus 4.7 follows instructions more literally than 4.6. In Anthropic’s own wording, the model “will not silently generalize an instruction from one item to another, and it will not infer requests you didn’t make.” Anthropic frames this as a precision win for structured extraction and pipelines. The trade-off is that any prompt that quietly relied on the old model filling in gaps will need a review.

Other behavior shifts:

  • Response length adapts to task complexity. Shorter answers on simple lookups, much longer ones on open-ended analysis.
  • More direct tone. Less validating language, fewer emoji, and less of 4.6’s warmer style.
  • Fewer subagents and fewer tool calls by default. The model reasons more on its own and delegates less. You can steer this back up with prompting or higher effort.
  • Strict effort calibration. At low and medium, the model scopes its work tightly to what you asked. On complex tasks, Anthropic recommends raising effort to high or xhigh rather than prompting around shallow reasoning.
  • New runtime cybersecurity safeguards. Requests involving prohibited or high-risk security topics can now trigger refusals. Anthropic runs a Cyber Verification Program for legitimate penetration testing, vulnerability research, and red-team work to get reduced restrictions.

Where You Get It

Opus 4.7 landed on GitHub Copilot the same day, for Copilot Pro+, Business, and Enterprise plans, at a 7.5x premium request multiplier through April 30, 2026. After that, expect the multiplier to reset to whatever Copilot settles on as the permanent rate. Bedrock, Vertex AI, Microsoft Foundry, and Claude Code all get the model on launch day.

Opus 4.6, released in February, stays available as claude-opus-4-6. If you want a migration window before flipping production traffic, 4.6 is not going anywhere yet. For context on the prior release, see the Opus 4.6 launch and its “Manager” model framing.

The Bottom Line

Claude Opus 4.7 is a credible, developer-focused upgrade. Measurably better coding scores, high-resolution vision, a new effort tier, a genuinely useful /ultrareview mode, and the same $5 and $25 sticker price as 4.6. The caveat is real: a new tokenizer and a 3x image-token jump mean bills on the same workload can rise without a single headline number changing. If you ship on Claude, treat this like any other model upgrade. Re-benchmark cost, re-tune prompts for literal instruction following, and test before flipping production traffic. If you just use Claude to write or review code, flip the effort dial to xhigh and see what happens.
