Microsoft’s New AI Models Go Beyond Just Text


Microsoft is doubling down on AI models that aren't large language models. The company announced on Thursday that it's releasing three new models: brand-new voice and text transcription models, and the second generation of its in-house image model.

The voice and text transcription models are the first of their kind from Microsoft. The transcription model can transcribe recordings into text in 25 languages and is built for video captioning, meeting transcription and voice agents. The voice model can generate audio recordings up to 60 seconds long. The company says its second-generation image model, MAI-Image-2, generates images faster and with more lifelike detail than its predecessor. All three are available now in Microsoft's Foundry and MAI playground, with plans to bring MAI-Image-2 to Bing and PowerPoint. Developers can check out pricing info here.

These new models are a clear sign that Microsoft is looking to expand its offerings across the AI market. Microsoft's Copilot is one of the most popular chatbots for businesses, especially those that already use Microsoft's Microsoft 365 suite and Azure cloud service. Aside from the now-outdated original image model, Microsoft has primarily focused on text-based models, trying to distinguish itself among its many competitors as a secure, enterprise-friendly option. Its newest AI tools, Copilot Cowork and Copilot Health, are proof of that.


The models are also a reminder that Microsoft, as a legacy tech company, has the cash and compute to burn on the kinds of "side quests" that even billion-dollar startups like OpenAI can't always afford. Last week, OpenAI confirmed that it will discontinue its Sora AI video app, saying it will refocus on core activities. The AI industry in 2026 has been aiming to prove its tools are useful in the workplace, especially with Anthropic's Claude Code leapfrogging the competition.

Generative media models, like those that power AI image and video generation, require a lot of compute and energy to run, resources that could be spent elsewhere. Google, another legacy tech company with billions of its budget allocated to AI research, indicated this week that it won't be giving up on generative media but will be trying to make its models more cost- and energy-efficient, as with its new Veo 3.1 Lite video model.

A new class-action lawsuit, filed on Monday by three teenage girls and their guardians, alleges that Elon Musk’s xAI created and distributed child sexual abuse material featuring their faces and likenesses with its Grok AI tech.

“Their lives have been shattered by the devastating loss of privacy, dignity, and personal safety that the production and dissemination of this CSAM have caused,” the filing says. “xAI’s financial gain through the increased use of its image- and video-making product came at their expense and well-being.”

From December to early January, Grok allowed many X users to create AI-generated nonconsensual intimate images, sometimes known as deepfake porn. Reports estimate that Grok users made 4.4 million "undressed" or "nudified" images, 41% of the total number of images created, over a period of nine days.

X, xAI and its safety and child safety divisions did not immediately respond to a request for comment.

The wave of "undressed" images stirred outrage around the world. The European Commission quickly launched an investigation, while Malaysia and Indonesia banned X within their borders. Some US government representatives called on Apple and Google to remove the app from their app stores for violating their policies, but no federal investigation into X or xAI has been opened. A similar, separate class-action lawsuit was filed by a South Carolina woman in late January.

The dehumanizing trend highlighted just how capable modern AI image tools are of creating content that seems real. The new complaint compares Grok's self-proclaimed "spicy AI" generation to the "dark arts" for the ease with which it subjects children to "any pose, however sick, however fetishized, however unlawful."

“To the viewer, the resulting video appears entirely real. For the child, her identifying features will now forever be attached to a video depicting her own child sexual abuse,” the complaint reads.


The complaint says xAI is at fault because it did not employ industry-standard guardrails that would prevent abusers from making this content. It says xAI licensed its tech to third-party companies abroad, which sold subscriptions that abusers used to make child sexual abuse images featuring the faces and likenesses of the victims. The requests ran through xAI's servers, which makes the company liable, the complaint argues.

The lawsuit was filed by three Jane Does, pseudonyms given to the teens to protect their identities. Jane Doe 1 was first alerted to the fact that abusive, AI-generated sexual material of her was circulating on the web by an anonymous Instagram message in early December. The filing says the anonymous Instagram user told her about a Discord server where the material was shared. That led Jane Doe 1 and her family, and eventually law enforcement, to find and arrest one perpetrator.

Ongoing investigations led the families of Jane Does 2 and 3 to learn their children’s images had been transformed with xAI tech into abusive material.
