Meta Debuts Open-Source ‘ImageBind’ AI That Learns To Mimic Human Perception

By Alexa Heah, 10 May 2023

Image ID 251724386 © via Daniel Constante | Dreamstime.com

Last month, Meta released an open-source artificial intelligence (AI) tool that helped creatives turn doodles into animation at a click of a button. Now, the technology giant is making yet another system open to the public, dubbed ‘ImageBind’.

Unlike common image generators that create images from a series of words, the software allows users to link text, images, videos, audio clips, 3D measurements, temperature data, and motion data. More impressively, it offers all of these options without having to train on every possibility.

ImageBind works by predicting the connections between different sets of data groups, similar to how humans perceive the environment around them. Instead of just conjuring up an image, the model can attach sounds, temperatures, or even precise locations relevant to the scene.

Through this, AI moves closer to human comprehension, mimicking the way the brain works to absorb various stimulants in the surroundings, such as the sights, sounds, and other sensory experiences that come as second nature to us.

“For example, a creator could couple an image with an alarm clock and a rooster crowing, and use a crowing audio prompt to segment the rooster or the sound of an alarm to segment the block and animate both into a video sequence,” Meta explained.

According to graphs provided by the company, it seems ImageBind far outperforms other algorithms in both audio accuracy and depth accuracy, showing that it’s possible to create a “joint embedding space” across multiple modalities without having to train the AI on every single one.

“This is important because it’s not feasible for researchers to create datasets with samples that contain, for example, audio data and thermal data from a busy city street, or depth data and a text description of a seaside cliff,” the blog post continued.

Soon, Meta envisions the technology will expand beyond the current “six senses” to include the likes of touch, speech, smell, and even brain fMRI signals that will give a “holistic” approach to machine learning and future mixed-reality gadgets.

Check out the code on GitHub.

[via Engadget and Screen Rant, images via various sources]

Advertise here

Also check out these recent news

Retro

Meta Debuts Open-Source ‘ImageBind’ AI That Learns To Mimic Human Perception

7-Eleven Brings Nostalgic ‘Tetris’ To Playable Slurpee Cups

Puppy Yoga Banned In Italy As Animal Lovers Maintain Downward-Facing Posture

Crayola Reconnects Adults To Their Early Masterpieces From 40 Years Ago

Patagonia Is Designing Wetsuits That Can Be Recycled & Remade Forever And Ever

Panera Takes A Dip Into Elegance With ‘Bread Head’ Hat