Meta Touts New AI Model As ‘State-Of-The-Art’ Version Of Text & Image Generators

By Alexa Heah, 18 Jul 2023

Meta recently unveiled CM3leon, an artificial intelligence (AI) model designed for text-to-image generation that it claims is “state-of-the-art.”

This new model follows a recipe that includes retrieval-augmented pre-training and supervised fine-tuning stages, resulting in improved performance while maintaining lower computational requirements compared to previous transformer-based methods.

Images generated based on these prompts: (1) A small cactus wearing a straw hat and neon sunglasses in the Sahara desert. (2) A close-up photo of a human hand, hand model. High quality. (3) A raccoon main character in an Anime preparing for an epic battle with a samurai sword. Battle stance. Fantasy, Illustration. (4) A stop sign in a Fantasy style with the text “1991.” Image via Meta

CM3leon, which takes its name from the causal masked mixed-modal (CM3) model it is built on, is said to represent a significant advance in text and image generation. Unlike earlier models that handled only text-to-image or only image-to-text generation, CM3leon is designed to do both.

By leveraging large-scale multitask instruction tuning, Meta says it has made strides in image caption generation, visual question answering, text-based editing, and conditional image generation.

Example of text-guided image editing. Image via Meta

The generator uses scaling recipes that were originally developed for text-only models. Meta has applied these recipes to tokenization-based image generation models, which it says improves the performance and coherence of the generated images.
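To make "tokenization-based" more concrete, here is a minimal Python sketch of how text and image content could share one token sequence. It is an illustration only, not Meta's implementation: the vocabulary sizes, the `BOI`/`EOI` marker tokens, and the averaging-based `quantize_image` function are all hypothetical stand-ins (real systems typically use a learned image tokenizer).

```python
# Toy illustration of a mixed text-and-image token sequence.
# All constants and function names are hypothetical, chosen for this sketch.

TEXT_VOCAB_SIZE = 1000      # assumed size of the text vocabulary
IMAGE_CODEBOOK_SIZE = 512   # assumed number of discrete image codes
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # "begin of image" marker id
EOI = BOI + 1                                # "end of image" marker id


def quantize_image(pixels, patch=2, codebook_size=IMAGE_CODEBOOK_SIZE):
    """Map each patch of a grayscale image (values 0-255) to one code.

    A stand-in for a learned tokenizer: it buckets the mean intensity
    of each patch into one of `codebook_size` bins.
    """
    height, width = len(pixels), len(pixels[0])
    codes = []
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            block = [
                pixels[i][j]
                for i in range(r, min(r + patch, height))
                for j in range(c, min(c + patch, width))
            ]
            mean = sum(block) / len(block)
            codes.append(min(int(mean * codebook_size / 256), codebook_size - 1))
    return codes


def build_sequence(text_ids, image_codes):
    """Concatenate text ids and offset image codes into one sequence."""
    image_ids = [TEXT_VOCAB_SIZE + code for code in image_codes]
    return list(text_ids) + [BOI] + image_ids + [EOI]


pixels = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [128, 128, 64, 64],
    [128, 128, 64, 64],
]
sequence = build_sequence([5, 6, 7], quantize_image(pixels))
print(sequence)
```

Because the image becomes an ordinary run of integer tokens, a single sequence model can in principle be trained on text and images together, which is the general idea that lets text-only training recipes carry over.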

Compared to other generators on the market, CM3leon purportedly produces imagery that is more coherent and better aligned with input prompts.

Example of segmentation-to-image generation. Image via Meta

Key highlights include text-guided image generation and editing. According to Meta, the model can generate accurate images even when a prompt is complex or contains multiple constraints, and it can take a given image and a text prompt and edit the picture according to the user's instructions.

In addition, CM3leon can follow a variety of prompts to generate captions and answer questions about an image. Given a text description of an image's bounding-box segmentation, the model can also generate an image from that description alone.
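As a rough illustration of what a "text description of the bounding box segmentation" might look like, the Python sketch below turns a list of labeled boxes into a single prompt string. The article does not give the format CM3leon actually expects, so the `Box` class, the `layout_to_prompt` function, and the wording of the output are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A labeled rectangle in pixel coordinates (hypothetical structure)."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


def layout_to_prompt(boxes, width, height):
    """Describe a bounding-box layout as plain text.

    The output format is invented for illustration; it is not the
    format used by CM3leon.
    """
    parts = []
    for box in boxes:
        inside = 0 <= box.x0 < box.x1 <= width and 0 <= box.y0 < box.y1 <= height
        if not inside:
            raise ValueError(f"box for {box.label!r} is outside the canvas")
        parts.append(f"{box.label} at ({box.x0}, {box.y0}, {box.x1}, {box.y1})")
    return f"Image of size {width}x{height} containing: " + "; ".join(parts)


layout = [
    Box("cactus", 40, 60, 120, 200),
    Box("straw hat", 50, 30, 110, 70),
]
prompt = layout_to_prompt(layout, width=256, height=256)
print(prompt)
```

A model that accepts this kind of input would receive the string as an ordinary text prompt and would be expected to place each named object roughly inside its box.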

All of the generated images are raw outputs from the CM3leon model, though a separately trained super-resolution stage can also be added to produce higher-resolution images from the original model outputs.

The research paper behind CM3leon has further details on the model and its capabilities.

[via TechCrunch and The Statesman, images via Meta]

This article was crafted with assistance from an AI engine, and has been reviewed, edited, and fact-checked.
