BMClogo

Anthropomorphic release Claude’s sonnet 4.5 And set new benchmarks for end-to-end software engineering and real-world computer use. The update also uses the same scaffolding for the ship concrete product surface changes (Claude Code checkpoint, native VS code extension, API memory/context tool) and the proxy SDK, which uses the same scaffolding internally to human use. Pricing for SONNET 4 ($3 input/$15 per million token production) remains the same.

What is actually?

  • Records of SWE basic verification. Human Report 77.2% Using a simple two-tool scaffold (bash + file editing), accuracy on datasets verified on 500 problem pallets, run on average 10 times, no test time calculations, 200k “Think” the budget. 1m-context setting reaches 78.2%and high axiom settings with parallel sampling and rejection raise it to 82.0%.
  • SOTA used by the computer. exist OSWorld Verificationsonnet 4.5 lead 61.4%Starting with 42.2% of SONNet 4, it reflects stronger tool control and UI manipulation of browser/desktop tasks.
  • Long-distance running autonomy. The team observed > 30 hours Continuous focus on multi-step coding tasks – actually jumping high early limits, directly related to proxy reliability.
  • Reasoning/Mathematics. The release describes the “substantial benefits” of common reasoning and mathematical floats; accurate per base number (e.g., AIME configuration). The safety posture is ASL-3, and the defense is quickly strengthened.
https://www.anththropic.com/news/claude-sonnet-4-5

What are the agents?

Sonnet 4.5 targets the fragile parts of the real agent: expansion planning, memory and reliable tool arrangement. Human Claude Agent SDK Reveal its production mode (long-term running tasks, licenses, secondary coordinated memory management), not just bare LLM endpoints. This means that the team can copy the same scaffolding as using Claude Code (now has checkpoints, refreshed terminals and VS code integration) to keep jobs consistent and reversible for multiple hours.

On the measurement task that simulates the “using computer” 19-point jumps are worth noting. It can track the ability of the model to navigate, fill spreadsheets and complete web streams in Anthropic’s browser demonstration. For businesses experimenting with Agesic RPA style jobs, higher OSWorld scores are often associated with lower rates of intervention during execution.

Where can you run it?

  • Human APIs and applications. Model ID claude-sonnet-4-5; Price parity with sonnet 4. File creation and code execution are now available directly in the paid layer of the Claude application.
  • AWS Bedrock. Agent department available through bedrock and with integrated pathways; AWS highlights long horse proxy sessions, memory/context features, and operational controls (observability, session isolation).
  • Google Cloud Vertex AI. GA On Vertex AI, through ADK/Agent Engine, multi-agent orchestration is supported through configuration throughput, 1M token Analysis jobs and timely caches.
  • Github sub-brand. Public previews for cross-copy chat (VS code, web, mobile) and Copilot CLI are launched; organizations can be enabled through policies and BYO keys are supported in VS code.

Summary

With records 77.2% SWE bench Verified Score under transparent constraints 61.4% OSWorld Verification Computer usage clues and actual updates (checkpoints, SDK, secondary copper/bedrock/vertex availability), Claude’s sonnet 4.5 It’s for development Heavier agent workloads for long-running tools Instead of a brief demonstration prompt. Independent replication will determine the durability of the “best coding” claim, but the design goals (autonomy, scaffolding and computer control) are consistent with today’s actual production pain relief points.


Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels in transforming complex data sets into actionable insights.

🔥 (Recommended Reading) NVIDIA AI Open Source VIPE (Video Pose Engine): a powerful and universal 3D video annotation tool for spatial AI



Source link