TLDR:
- Anthropic debuts computer control capabilities for AI in latest Claude update
- New models (Claude 3.5 Sonnet upgrade and Haiku) show significant performance gains
- Beta testing partners include major tech companies like Amazon and Canva
- AI can now navigate computers, websites, and applications like a human user
- Models demonstrate improved coding abilities with higher benchmark scores
Anthropic has unveiled a new capability that allows its AI to interact with computers just like human users do. The announcement, made on Tuesday, October 22, 2024, introduces an upgraded version of Claude 3.5 Sonnet and a new model called Claude 3.5 Haiku.
The standout feature of this release is the Computer Use capability, now available in public beta. This groundbreaking function enables Claude to perform tasks that previously required human intervention, such as moving a cursor, clicking buttons, and navigating through websites and applications.
Several major technology companies have already begun testing these new features. Amazon received early access to the technology, while companies like Asana, Canva, and Notion have been exploring its potential applications since early 2024. These early adopters have been using the system to automate complex tasks that typically require multiple steps to complete.
According to Anthropic’s chief science officer, Jared Kaplan, the system’s capabilities extend far beyond simple tasks.
“We’re looking at an AI that can handle operations requiring tens or even hundreds of steps,”
Kaplan explained in a recent interview. The technology aims to assist with everyday tasks such as travel booking, appointment scheduling, and expense report filing.
The upgraded Claude 3.5 Sonnet has shown remarkable improvements in its coding abilities. The model achieved a 49% score on the SWE-bench Verified test, marking a substantial increase from its previous performance of 33.4%. This score places it ahead of other publicly available AI models, including those specifically designed for coding tasks.
Meanwhile, the new Claude 3.5 Haiku model has managed to match the performance of the previous generation’s top model while maintaining cost efficiency and speed. It scored an impressive 40.6% on SWE-bench Verified, outperforming many competing systems including earlier versions of Claude and GPT-4o.
Early feedback from users has been encouraging. GitLab’s testing revealed improved reasoning capabilities of up to 10% across various use cases, without any increase in processing time. The Browser Company reported that the new Claude 3.5 Sonnet performed better than any other model they had previously tested.
Safety remains a priority in this release. Both the US AI Safety Institute and the UK Safety Institute participated in pre-deployment testing of the new models. Anthropic has also developed special monitoring systems to detect potential misuse of the computer control features.
The Computer Use feature works through a specialized API that gives Claude the ability to understand and interact with computer interfaces. In testing on the OSWorld platform, which measures how well AI systems can use computers like humans, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, significantly higher than the next best system’s score of 7.8%.
However, the system still faces some challenges. Basic actions like scrolling, dragging, and zooming can be difficult for the AI to execute smoothly. Anthropic acknowledges these limitations and recommends that developers start with simpler tasks during the beta phase.
This release comes at a time of intense competition in the AI industry. Major technology companies including Google, Microsoft, and Meta are all working to advance their AI capabilities in a market expected to generate over $1 trillion in revenue within the next ten years.
Anthropic’s growth has been remarkable since it first released Claude in March 2023. The company has expanded its offerings to include mobile apps, business team plans, and recently launched Claude Enterprise for corporate integration.
The new Computer Use capability will be accessible through multiple platforms, including Anthropic’s own API, Amazon Bedrock, and Google Cloud’s Vertex AI. The Claude 3.5 Haiku model will be released later this month, starting with text capabilities and adding image processing features in future updates.
Developers can now access the Computer Use beta through various cloud platforms, while the upgraded Claude 3.5 Sonnet is available to all users at the same price point as its predecessor.
Stay Ahead of the Market with Benzinga Pro!
Want to trade like a pro? Benzinga Pro gives you the edge you need in today's fast-paced markets. Get real-time news, exclusive insights, and powerful tools trusted by professional traders:
- Breaking market-moving stories before they hit mainstream media
- Live audio squawk for hands-free market updates
- Advanced stock scanner to spot promising trades
- Expert trade ideas and on-demand support