Meta Touts New AI Model As ‘State-Of-The-Art’ Version Of Text & Image Generators
By Alexa Heah, 18 Jul 2023
Meta recently unveiled CM3leon, an artificial intelligence (AI) model designed for text-to-image generation that it claims is “state-of-the-art.”
This new model follows a recipe that includes retrieval-augmented pre-training and supervised fine-tuning stages, resulting in improved performance while maintaining lower computational requirements compared to previous transformer-based methods.
CM3leon, which takes its name from the causal masked mixed-modal (CM3) model it is built on, is said to represent a significant leap forward in text and image generation capabilities. Unlike previous models that handled only text-to-image or only image-to-text generation, CM3leon can do both.
Meta says that large-scale multitask instruction tuning has improved the model’s performance in image caption generation, visual question answering, text-based editing, and conditional image generation.
The generator relies on scaling recipes that were originally developed for text-only models. Applied to tokenization-based image generation models, these recipes are said to improve both performance and the coherence of the generated images.
Compared to other generators on the market, CM3leon purportedly produces more coherent imagery that is better aligned with the input prompts.
Key highlights include text-guided image generation and editing. The model can generate accurate images even when the prompt is complex or comprises multiple constraints, and it can take a given image and a text prompt and edit the existing picture according to the user’s instructions.
In addition, CM3leon can follow a range of prompts to generate captions and answer questions about an image. When given a text description of an image’s bounding-box segmentation, the model can also generate an image from that description alone.
While the generated images shown are raw outputs from the CM3leon model, a separately trained super-resolution stage can also be added to produce higher-resolution images from the original model outputs.
Read the research paper behind CM3leon for more on its abilities.
[via TechCrunch and The Statesman, images via Meta]
This article was crafted with assistance from an AI engine, and has been reviewed, edited, and fact-checked.