Google Introduces ‘Vlogger AI’ That Turns Your Avatar Into A Surrogate YouTuber
By Mikelle Leow, 18 Mar 2024
Screenshot via Corona et al
Google’s researchers have cracked open a window for those who have toyed with becoming social media influencers but happen to be camera-shy. The new, aptly named VLOGGER AI whips up lifelike talking videos from little more than a selfie and a recording of your voice, which makes it a fit for users who struggle to talk in front of the lens and for high-performing YouTubers who want to generate content quickly.
You provide a headshot and a voice clip, and VLOGGER gets to work, crafting a video where the person in the photo appears to be speaking the words from the audio. This animation process includes syncing lip movements and adding natural body language, aiming for a result that feels real, not robotic.
Video via Corona et al
The team is still workshopping the technology, so it is not yet available to the public. Still, VLOGGER shows promise in making digital avatar creation less of a hassle. It is designed to be more efficient than current methods, which require more data and processing power, potentially making it easier for anyone to create animated avatars that look and move in a lifelike manner.
The software uses a diffusion architecture, the same kind of model found in other cutting-edge tools for generating images, videos, and 3D models. The technology allows VLOGGER to animate avatars with detailed gestures such as head tilts, eye movements, and even facial expressions.
Video via Corona et al
Google trained the AI on an extensively labeled dataset named MENTOR, which contains videos of 800,000 different people talking. Thanks to the breadth of inputs, VLOGGER can supposedly predict how a person naturally moves while speaking, using just a still image as a starting point and the audio clip as a guide.
VLOGGER’s potential uses are wide-ranging, from creating video translations that match the speaker’s original movements to generating animated avatars for various applications. It could also pave the way for more efficient video communications, especially in scenarios where bandwidth is limited.
However, this automated YouTuber surrogate is not without its challenges. The realism of its animations can sometimes venture into the uncanny valley, and it has limitations in handling large movements or diverse settings. Plus, the ethical implications of such technology, including the risk of creating deepfakes, are a concern that needs addressing.
[via Tom’s Guide and The Register, videos and cover screenshot via Corona et al]