AI Model Release Tracker: Anthropic releases Sonnet 5


Anthropic

Samuel Boivin/NurPhoto via Getty Images

Follow ZDNET: Add us as a preferred source on Google.


AI labs are shipping new models nonstop. Besides being better and faster than their predecessors, every new model isn’t guaranteed to be a major step change, despite how the company’s PR may wax poetic about them. Model strengths really emerge in context: Where are competitor models lacking or excelling? Which models have outstanding specialties, and which are just catching up to industry standards?  

Also: How we test AI at ZDNET

Our Model Release Tracker helps you make sense of where models stand relative to each other, and whether they’re worth a deeper look. While we don’t test every model or model update on this list, we’ll always include the key elements you need to know, along with our hands-on expert test, where applicable. We also include an Expert Score for certain models. Curious about how we test AI? Check out this breakdown of our process

Here are the biggest model releases of 2026 so far and what to know about them. We’ll update this list whenever a notable new model arrives. 


Sonnet 5

Anthropic | June 30, 2026

What it does: Pivoting from its recent focus on Opus models, Anthropic released Sonnet 5. According to Anthropic, Sonnet 5 can “make plans, use tools like browsers and terminals, and run autonomously at a level that, just a few months ago, required larger and more expensive models.” The company said it performs similarly to Opus 4.8, released a month ago, but costs less. Sonnet 5 starts at $2 per million input tokens but will jump to $3 per million in September. It’s now the default in Free and Pro plans and available to all other plan types (Max, Team, and Enterprise). 

Also: Why AI tokens will send your enterprise cloud bill sky-high again

In keeping with the agentic focus of the AI industry at large, Sonnet 5 scores notably high on computer use benchmarks and agentic coding — testing found it completed complex tasks that earlier Sonnet models couldn’t. It also comes with safeguards implemented automatically, a nod to both recent PR around Mythos and overall improvements in general intelligence. 

Why it matters: With the rate of AI model development continuing to accelerate, it’s a fair bet that Sonnet 5 is in fact a significant upgrade over Sonnet 4.6, which was released back in February (eons ago in current shipping time). The timing of this release is also significant; Anthropic noted that Sonnet 5 “has a much lower ability to perform dangerous cybersecurity tasks than our current Opus models,” perhaps a stabilizing mention after the complex rollout of its powerful Fable 5 and Mythos 5 models. Ironically, however, Sonnet 5 showed a higher rate of misaligned behavior than Mythos Preview.


Fable 5 and Mythos 5  

Anthropic | June 9, 2026

What it does: ZDNET senior contributing editor David Gewirtz called Fable 5 a “defanged” version of Mythos, made safe for public use. Anthropic released Mythos 5 to those who already had Mythos Preview access through Project Glasswing. Fable 5 is restricted from responding to high-risk queries about topics like cybersecurity and biological weapons, but still “Mythos-class,” which, by Anthropic’s telling, means a step change in capability. As for Mythos 5, Anthropic said it had plans to expand access beyond initial partners through a systematic program. But both models were pulled barely four days after their release on orders from the US government (though on June 26, the government re-allowed access to Mythos 5 for certain partners). 

Also: Why Anthropic suddenly pulled Fable 5 and Mythos 5 for everyone

Why it matters: The models created a lot of commotion in the few days they were available, not just because of their supposed basis in Mythos. Fable 5 beguiled safety testers who were unaware that the model was set to downgrade to Opus in response to certain questions, which created trust issues between researchers and Anthropic. Despite being guardrailed, the fact that security authorities felt compelled to shut down the models so quickly after their release suggests the government found some way to jailbreak Fable 5, though Anthropic framed this as “narrow” in their understanding. 

While the details remain unclear, it’s a rare instance of government authorities intervening in an American AI product in public, which could mean Mythos-level models have caused a shift in the relatively hands-off approach to AI labs the Trump administration has had thus far. 


MAI-Thinking-1 

Microsoft AI | June 2, 2026

What it does: At its Build developer conference, Microsoft said that this new 35-billion-parameter model is, unsurprisingly, designed for multi-step agentic tasks. It scored similarly on the SWE Bench Pro benchmark test for coding as Anthropic Opus 4.6. The company also noted that enterprise users can trust this model for any use because it was trained only on clean, commercially safe data — an important tidbit given mounting AI copyright lawsuits. 

Also: Microsoft’s first reasoning model is one of 7 AIs just released at Build – what we know so far

Why it matters: This is the first reasoning model from Microsoft AI, a notable milestone for any AI lab, but especially so this late in this race. Most labs have created multiple advanced reasoning models at this point, so it’s unclear yet where Microsoft’s approach stands relative to those, though the company is focusing on appealing to enterprise clients specifically. 


Claude Opus 4.8 

Anthropic | May 28, 2026

What it does: Replacing Opus 4.7 starting May 28 (at the same price), Opus 4.8 offers faster thinking modes for one-third the cost of the earlier version, according to Anthropic. Like most of Anthropic’s models, 4.8 prioritizes coding abilities, scoring higher than 4.7 on two coding benchmarks but not fully besting OpenAI’s GPT 5.5. It also “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest,” the company noted in the release, though definitions for what that means remain murky. 

Also: I compared Claude Opus 4.8 with 4.7 in a 10-round honesty test – and a legal prompt broke it

Why it matters: Anthropic has always prioritized model safety and interpretability, but appears to be further emphasizing that standard with this release. The company said Opus 4.7 had a 92% honesty rate, in addition to being less sycophantic and hallucination-prone overall. The fact that it claims 4.8 shows “substantially” lower rates of misalignment than 4.7 indicates an increasingly high standard for model safety, especially because Anthropic compared 4.8’s alignment to that of Mythos Preview


Gemini 3.5 Flash

Google | May 19, 2026

What it does: At Google I/O, the company launched its 3.5 model family, designed for building agents (following suit with where all of the AI industry is focused at the moment). The first model available now is Gemini 3.5 Flash, which beat Gemini 3.1 Pro on a few coding and agentic benchmarks and now undergirds AI Mode in Search and the Gemini App as the default model. Like all Flash models, it’s optimized for speed and a cheaper, more lightweight user experience, but can still handle “long-horizon” agentic tasks, Google noted. 3.5 Pro should roll out in June. 

Also: Anthropic launches Opus 4.8, with honesty as its killer feature

Why it matters: The most important place 3.5 will operate is in Search. As Google doubles down on an AI-first search experience, the models running it shape the responses people will passively consume every day, meaning it has to be especially up to snuff in terms of accuracy and hallucinations. Notably, though, the System Card for 3.5 Flash lacks any mention of hallucination rate or sycophancy. 


GPT-5.5 Instant  

OpenAI | May 5, 2026

What it does: OpenAI said in its announcement that the lighter version of OpenAI’s just-released GPT-5.5 is less verbose than its predecessor, GPT-5.3 Instant. It also touted fewer hallucinations and improved factuality, saying “GPT‑5.5 Instant produced 52.5% fewer hallucinated claims than GPT‑5.3 Instant on high-stakes prompts covering areas like medicine, law, and finance.” 

Also: Anthropic’s Mythos is evolving faster than expected, reports AI safety agency

Why it matters: GPT-5.5 Instant replaces GPT-5.3 as the default model in ChatGPT. Again, while the expectation is that each new AI model gets more efficient, easier to use,  and makes up less stuff, a significant improvement in hallucinations for a model most people use for fast queries could mean less misinformation spreading among the masses. That’s especially critical given how many people are using ChatGPT for everyday health questions, for example. 

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)


Nemotron 3 Nano Omni 

Nvidia | April 28, 2026

What it does: The latest in Nvidia’s open Nemotron family, this model provides agents with multimodal input. That means they can “perceive and reason across visual, audio, and textual inputs within a single shared perception‑to‑action loop,” according to Nvidia, thereby unifying multiple capabilities into a single system. 

Also: AI is an arms race, and the US wants $9 billion in Nvidia superchips to keep up

Why it matters: Normally, systems of agents need to use separate models for speech, vision, and text, meaning they jump across documents, video, and audio to complete multi-step tasks. That slows down workflows, undermines the context agents gather, and racks up inference costs. Nvidia’s approach, if it works, would streamline this process and reduce token use, saving you money. Try it on Hugging Face


GPT-5.5  

OpenAI | April 23, 2026 

Expert Score: 93/100

What it does: ZDNET tester-in-residence David Gewirtz technically gave GPT-5.5 an A- score, but said it “can be reductively described as better and faster than GPT-5.4,” which is hopefully the bare-minimum expectation for a new model. Specifically, though, the model got better at agentic coding, clearly identifying concepts, scientific research, and factual accuracy. 

Also: I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

Why it matters: While the model itself may not be leaps and bounds ahead of its immediate predecessor, the quick turnaround from 5.4 to 5.4 — less than two months — indicates how rapidly agentic coding is accelerating OpenAI’s model release cycle. As ZDNET’s David Gewirtz breaks down, the company, much like other frontier labs using AI to build AI, is shipping updates at an exponentially increasing rate. 


ChatGPT Images 2  

OpenAI | April 23, 2026 

What it does: Soon after sunsetting Sora, its generative video model and social platform, OpenAI somewhat confusingly announced Images 2. ZDNET model tester David Gewirtz got an early look at Images 2 before its release and was impressed. While he didn’t give this model a formal Expert Score, he said it’s fun, a huge leap, and actually useful for work

Why it matters: OpenAI seemed to be getting out of the more consumer-minded AI product game when it discontinued Sora, having been beaten by Anthropic at securing lucrative enterprise contracts. That OpenAI still came out with Images 2 within that redirection narrative indicates that it sees image generators as relevant enough to enterprise AI — especially on the heels of Anthropic’s Claude Design


Claude Opus 4.7 

Anthropic | April 16, 2026 

What it does: Arriving relatively quickly after Opus 4.6, this model boasts new highs in honesty, reduced sycophancy, and hallucinations. It also appears to have a knack for cybersecurity, as it backs the new Claude Security, released shortly after the model itself — but no, it’s not Mythos, as many suspected. 

Also: Anthropic’s new Claude Security tool scans your codebase for flaws – and helps you decide what to fix first

Why it matters: Hallucinations and honesty are among the most difficult, hard-to-solve issues plaguing even the best models. For Anthropic to claim such significant gains in those areas is no small feat for an AI lab that takes safety seriously. 


Claude Mythos (Preview) 

Anthropic | April 7, 2026 

What it does: This is a tough one because Mythos isn’t actually available to the public. Anthropic created quite a media storm when it positioned the new general-purpose model as too powerful to release as usual. While the model is apparently a step change from earlier Anthropic models, the company was especially alarmed because of the security threat it posed, stating that “it is strikingly capable at computer security tasks.” 

In response to that, Anthropic spearheaded Project Glasswing, a collaborative effort with several rival AI labs, including Google, Nvidia, and Microsoft, as well as security authorities like Palo Alto Networks, “to help secure the world’s most critical software, and to prepare the industry for the practices we all will need to adopt to keep ahead of cyberattackers.” 

Also: Apple, Google, and Microsoft join Anthropic’s Project Glasswing to defend world’s most critical software

Why it matters: If we’re to believe Anthropic’s guidance that Mythos poses a significant threat to the world’s software — so much so that only a select few partners can access it — cybersecurity apparatuses as they stand may not be prepared to meet the rapidly evolving frontier of model capabilities. Mythos may not be the only model of its caliber, but simply the first of many to come once other labs achieve similar breakthroughs. 

For now, just a few weeks into its release, Mythos is helping catch software bugs in droves. 


GPT-5.4  

OpenAI | March 5, 2026 

What it does: OpenAI framed this new model, released barely three months after GPT-5.2, as specifically designed for professional work. According to the company’s own testing (which should always be taken with a grain of salt until verified by a third party), GPT-5.4 matches or outperforms human professionals 83% of the time. 

Why it matters: As AI companies focus more on gaining enterprise trust (and contracts) while lauding what agentic AI can do, they need models that can handle complex work-related tasks with minimal risk, delay, or prohibitively high costs. Any model advancement that shows prowess in professional workflows has a better chance of being taken seriously by companies struggling to adopt AI, though nothing guarantees seamless integration. 

Also: OpenAI’s new GPT-5.4 clobbers humans on pro-level work in tests – by 83%


Claude Opus 4.6 

Anthropic | Feb. 5, 2026 

What it does: This model quickly redefined the standard for autonomous agentic work, especially for coding. That’s no surprise given Anthropic’s authority in building models especially adept at programming tasks. Opus 4.6 also demonstrated improvement in complex, longer-running tasks overall. 

Why it matters: Opus 4.6’s ability to handle tasks better on its own means you can reliably offload more of your workflow to it — something agentic offerings usually struggle with. 

Also: Anthropic says its new Claude Opus 4.6 can nail your work deliverables on the first try


GPT-5.3-Codex  

OpenAI | Feb. 5, 2026 

What it does: This new coding model — which OpenAI said helped build and debug itself — can be interrupted and redirected mid-task, which, if true, is a huge boon for developers using it on complex or shifting projects with tons of trial-and-error. GPT-5.3-Codex also boasts run times of over a day and a better grasp on user intent. 

Also: OpenAI’s new Spark model codes 15x faster than GPT-5.3-Codex – but there’s a catch

Why it matters: OpenAI is trying to catch up to Anthropic’s lead in agentic coding (and, coincidentally or not, released 5.3 Codex on the same day as Anthropic launched Opus 4.6). While ZDNET experts often prefer Claude Code to other tools for vibe coding, OpenAI’s rumored shift toward enterprise clients and away from fun consumer tools could eventually close that gap. 





Source link

Leave a Reply

Subscribe to Our Newsletter

Get our latest articles delivered straight to your inbox. No spam, we promise.

Recent Reviews


When to watch Brazil vs. Panama

  • Saturday, May 30, at 5:30 p.m. ET (2:30 p.m. PT).

Where to watch

  • There are no confirmed broadcasters for this match in the US.

73% off with 2yr plan (+4 free months). Now only $3.49/month


See more details

Brazil plays its final World Cup warm-up before heading to the US as it hosts Panama today at the iconic Estadio do Maracanã in Rio.

Having overseen a disappointing five wins, two draws and three defeats since taking over as Brazil boss last year, head coach Carlo Ancelotti will be hoping to improve on that record on home turf today in the penultimate friendly before next month’s tournament. 

Facing the Seleção is a Panama team preparing to step on world football’s biggest stage for only the second time in the nation’s history. Today’s tricky fixture looks set to provide an ideal test for Thomas Christiansen’s men, who face a real battle to come through a tough-looking Group L, which also features England, Croatia and Ghana. 

Brazil takes on Panama at the Estadio do Maracanã in Rio de Janeiro on Sunday, May 30. Kickoff is set for 6:30 p.m. BRT local time. That makes it a 5:30 p.m. ET or 2:30 p.m. PT kickoff in the US and Canada. For football fans in the UK, it’s a 10:30 p.m. BST start, while for viewers in Australia, the game gets underway at 7:30 a.m. AEDT on Sunday morning. 

Carlo Ancelotti, Head Coach of Brazil, looking onwards.

Brazil boss Carlo Ancelotti will have to make do without veteran star Neymar in his World Cup preparations, with the Santos forward set to miss today’s game and next week’s friendly against Egypt with a calf injury.

Maddie Meyer/Getty Images

Can I livestream the Brazil vs. Panama match in the US?

No broadcaster in the US has the rights to show this match live.

That also means that if you’re traveling in the US, you’re unlikely to be able to watch the game as you normally would at home due to geoblocking.     

Livestream the Brazil vs. Panama match in Brazil

This World Cup warm-up will be broadcast on free-to-air Globo Brazil, which means it will also be available to stream online via the network’s Globoplay streaming service.

Globo

Globo’s streaming service Globoplay is free to use for viewers in Brazil, with dedicated apps for Android and Apple devices, as well as Amazon Fire and other smart TVs.

Can I livestream the Brazil vs. Panama match in the UK, Canada or Australia? 

It’s the same story as the US, with no broadcaster currently scheduled to show this World Cup warm-up match live in any of these three regions.

How to watch the Brazil vs. Panama match online from anywhere using a VPN

If you’re traveling abroad and want to keep up with all the international soccer action while away from home, a VPN can help enhance your privacy and security when streaming.  

It encrypts your traffic and prevents your internet service provider from throttling your speeds. Additionally, it can be helpful when connecting to public Wi-Fi networks while traveling, providing an extra layer of protection for your devices and logins. VPNs are legal in many countries, including the US and Canada, and can be used for legitimate purposes such as improving online privacy and security.  

However, some streaming services may have policies restricting VPN use to access region-specific content. If you’re considering a VPN for streaming, check the platform’s terms of service to ensure compliance.

If you choose to use a VPN, follow the provider’s installation instructions to ensure you’re connected securely and in compliance with applicable laws and service agreements. Some streaming platforms may block access when a VPN is detected, so verify whether your streaming subscription allows VPN use.

James Martin/CNET

Price $78 for two yearsLatest Tests No DNS leaks detected, 18% speed loss in 2025 testsJurisdiction British Virgin IslandsNetwork 3,000 plus servers in 105 countries

ExpressVPN is our current best VPN pick for people who want a reliable and safe VPN, and it works on a variety of devices. Prices start at $3.49 a month on a two-year plan for the service’s Basic tier.

Note that ExpressVPN offers a 30-day money-back guarantee.

73% off with 2yr plan (+4 free months). Now only $3.49/month





Source link