I Chatted With Google’s Lifesize, Hyperreal AI Companion


Sitting down in a comfortably air-conditioned booth in the overheated outdoor demo landscape at Google I/O this week, I had a pleasant and uncanny chat with a smiling person who wasn’t actually there. Nor were they even a real human being. 

The booth was a demo setup of Google Beam, a large-screen, camera-studded telepresence device. As we talked eye to eye, it all felt shockingly real.

I’ve experienced Google Beam demos before, back when it was called Project Starline: with actual humans on the other end, in holographic glasses-free 3D. This was a whole new twist, and while the demo was 2D, it could easily adopt 3D rendering, too.


Google’s planning a larger-scale rollout later this year for Beam, its business-focused collaborative video chat technology created with HP. Beam’s central idea is connecting two people remotely as if they were sitting across a table in person. That requires two people with Beams, though. Google’s next step is imagining situations where others are more easily brought in from other devices, or there isn’t another person, period.


The custom-made AI video agent, created with in-house models Google didn’t share, actually shocked me. Like a true deepfake, the woman-like agent was photoreal, and it smiled, gestured and talked with me casually. (It didn’t have a name; I just started talking with it.) The agent was there simply to help and chat, just like Gemini or any AI chatbot.

I asked it to generate a photo of me doing magic tricks at a New York Jets game, and it happily served it up. It asked about prop bananas on the table and complimented my cameraperson’s backpack. In a contained demo run by a Google employee while I watched, it also gave map search recommendations, gesturing at the maps and images next to me.

A video AI agent showing an AI generated image of Scott Stein making a football hover at a stadium

Yes, me doing magic at a New York Jets game again — it’s my Will Smith pasta AI test.

Scott Stein/CNET

It was uncanny. This video agent, even though it wasn’t in 3D (just standard 2D), was one of the realest almost-people I’ve ever chatted with.

But do we need Beam? Many of us already have telepresence readily at hand, in video chats on our phones. For those who want it, there’s connecting via VR headsets, or future iterations of telepresence as it might emerge on AR glasses. Microsoft just pivoted away from trying to evolve its telepresence realism in Teams. Will Google Beam make enough of a difference to businesses and institutions to convey a larger-scale sense of actual presence? Would an agentic video AI assistant make things helpful or awkward? And what jobs could a video agent like this possibly replace?


According to Andrew Nartker, the general manager for Google Beam, who guided me through demos, this video AI agent is very much an experiment. But it’s also something that could land in an office as much as a public-facing location, like a talking interactive kiosk. 

On the whimsical side, the demo made me think of theme parks. Would a 3D light field version of an AI character greet me at a food stand in some future Disney Star Wars Galaxy’s Edge expansion, serving me interplanetary food as it wrinkled its alien face? Would a theme hotel use it to create a magical experience? Or would it be used to replace actual concierges and workers in certain situations, a more advanced concept of the AI chatbot drive-through windows that already exist?

I can see all of these pathways at once. The Google Beam demo did impress me, though. It wowed me, and concerned me, more than almost anything else at this year’s Google I/O.






Recent Reviews


Researchers in South Korea developed a wearable system that uses seven smart rings to read finger and hand motions to translate American Sign Language and International Sign Language into text. The purpose is to make communicating easier between those who sign and nonsigners without needing a separate human interpreter. 


According to the study, published Friday in the journal Science Advances, the system reliably recognized 100 ASL and ISL words during testing. It also performed well with users the system had not seen before, and it didn’t require recalibration for each person. Because the system detects words in sequence, it can produce sentence-level translations without extra training on grammar. 

ASL and ISL are the everyday languages of more than 72 million deaf and hard-of-hearing people. However, most hearing people either do not know any words in these languages or have only a very basic understanding of them. That gap makes certain tasks, like ordering at a restaurant or asking for help, much more difficult.

A graphic shows two illustrated people talking in sign language, ASL and ISL. The graphic also shows the different components of the ring as well as pictures of hands modeling the rings.

A concept of how the rings work in the real world. 

American Association for the Advancement of Science (AAAS)

Existing sign language translator prototypes often rely on bulky gloves that can distract from or block natural hand movement, or feel uncomfortable for the wearer, which limits real-world adoption. Camera-based technologies can work well in controlled environments but are often limited to places where a camera can be set up with a clear line of sight, the researchers wrote.

To solve these problems, the researchers designed sensing rings for each finger that can capture precise motion and finger position while letting the hands move naturally. The rings can detect both signs that involve movement, like the words for “dance,” “fly” and “sun,” and signs that are held still, like “I” and “you.”

“These advances suggest that [the device could enable] barrier-free public translation systems for unseen users and unrestricted daily assistive interfaces,” the authors wrote in the study. 

The authors are affiliated with Yonsei University, Hankuk University of Foreign Studies and the Korea Institute of Science and Technology, among others. While the technology is still experimental, the authors wrote that it has the potential to ease communication difficulties. The underlying idea could also help improve controls for other systems, like virtual or augmented reality.

“Beyond sign language translation, the ring-type, wireless, and modular architecture of (wirelessly connected, ring-type sign language translators) may also be extended to other gesture-driven applications such as virtual or augmented reality control, touchless device interfaces, or rehabilitation monitoring systems where fine-grained hand movement tracking is essential,” they wrote.




