Meta Unveils ‘Voicebox AI’ That Can Mimic And Edit Your Voice For You
By Nicole Rodrigues, 20 Jun 2023
Meta’s newly unveiled Voicebox AI is a piece of technology that promises to change how we interact with our own voices, and allows users to generate content in the speech styles of their friends and loved ones.
Unlike traditional text-to-speech systems that deliver generic, robotic voices, Voicebox leverages Meta’s neural network architecture, which the company claims can create realistic vocal replications. By analyzing just a few seconds of sample audio from the desired individual, the system can capture the subtle nuances, intonations, and quirks that make each voice unique.
Meta trained the model on a dataset comprising 60,000 hours of English audiobooks and 50,000 hours of multilingual audiobooks in six languages: English, Spanish, German, Portuguese, French, and Polish. This gives users the chance to produce their desired output in a range of different languages.
Voicebox aims to open up a wide range of possibilities for both personal and professional use. In personal communication, for example, visually impaired users could feed a sample of a loved one’s voice into the system and later have it read written messages aloud in that voice.
It can also edit audio for professional users and content creators, for example by removing background noise from a clip and regenerating the speaker’s voice without any awkward cuts.
Underlying all of this are the ethical implications of putting such a generator into the hands of the public. A tool like this could raise a host of moral issues, since it makes it much easier to create clips of people saying things they never said. A paper released by Meta claims that a binary classification model can determine whether an audio clip contains generated speech or a recording of a real person.
Voicebox is still not ready for public use, but there are demos for those interested in learning more about the model.
Recently, Apple announced similar features for iOS 17, called ‘Personal Voice’ and ‘Live Speech’, which let users create a synthesized version of their own voice and have typed text spoken aloud.