Meta has taken a significant leap in generative AI with the introduction of Emu Video and Emu Edit, two projects designed to transform how Facebook and Instagram users create and customize visual content.
These tools, derived from Meta’s Emu AI research project, are set to redefine user interaction with visual media on social platforms.
Emu Video, the first of the two projects, is a text-to-video model that generates short video clips from text prompts. It extends Meta’s Emu image generation model and can process text-only, image-only, or combined text-and-image inputs to produce high-quality video.
Its factorized approach makes training video generation models more efficient and significantly improves the quality of the generated videos: in human evaluations, Emu Video’s outputs were preferred over those of previous models by 96% of respondents.
The model uses a two-step process: it first generates an image from the text prompt, then generates a video conditioned on both the text and that image. This factorized method, with the video step implemented as a single diffusion model, yields a unified architecture that can respond to text-only, image-only, or combined inputs.
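To make the two-step flow concrete, here is a minimal Python sketch of the factorized pipeline. All names and stub implementations (generate_image, generate_video, emu_video_factorized) are hypothetical placeholders for illustration, not Meta’s actual code or API.

```python
from dataclasses import dataclass
from typing import List

# Placeholder types; in the real system these would be pixel or latent tensors.
@dataclass
class Image:
    prompt: str

@dataclass
class Video:
    frames: List[Image]

def generate_image(prompt: str) -> Image:
    """Step 1 (hypothetical stub): text-to-image generation,
    playing the role of the underlying Emu image model."""
    return Image(prompt=prompt)

def generate_video(prompt: str, first_image: Image, num_frames: int = 16) -> Video:
    """Step 2 (hypothetical stub): a single video diffusion model
    conditioned on BOTH the text prompt and the step-1 image."""
    return Video(frames=[first_image] * num_frames)

def emu_video_factorized(prompt: str) -> Video:
    image = generate_image(prompt)        # text -> image
    return generate_video(prompt, image)  # (text, image) -> video

clip = emu_video_factorized("a corgi surfing at sunset")
print(len(clip.frames))  # 16 placeholder frames
```

The design choice this illustrates is the strengthened conditioning signal: the video model always receives a concrete first image in addition to the text, rather than having to synthesize a clip from text alone.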
Key design decisions, such as adjusted noise schedules for video diffusion and multi-stage training, allow the model to generate higher-resolution videos directly rather than through a deep cascade of models.
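One concrete example of such a noise-schedule adjustment, offered as an illustration rather than as Meta’s published configuration, is rescaling the schedule to reach zero terminal signal-to-noise ratio (SNR), a known fix from the diffusion literature for a train/test mismatch that hurts high-resolution generation. The parameter values below are assumptions.

```python
import numpy as np

def linear_beta_schedule(T: int = 1000,
                         beta_start: float = 1e-4,
                         beta_end: float = 2e-2) -> np.ndarray:
    """A common linear beta (noise) schedule for diffusion training."""
    return np.linspace(beta_start, beta_end, T)

def rescale_to_zero_terminal_snr(betas: np.ndarray) -> np.ndarray:
    """Rescale a schedule so the final timestep has zero SNR, i.e. the
    model is trained all the way to pure noise (Lin et al., 2023,
    'Common Diffusion Noise Schedules and Sample Steps Are Flawed')."""
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)
    sqrt_ab = np.sqrt(alphas_bar)

    # Shift so the last value is exactly zero, then rescale so the
    # first value is unchanged.
    sqrt_ab_0, sqrt_ab_T = sqrt_ab[0], sqrt_ab[-1]
    sqrt_ab = (sqrt_ab - sqrt_ab_T) * sqrt_ab_0 / (sqrt_ab_0 - sqrt_ab_T)

    # Convert the cumulative products back to per-step betas.
    alphas_bar = sqrt_ab ** 2
    alphas = alphas_bar / np.concatenate(([1.0], alphas_bar[:-1]))
    return 1.0 - alphas

betas = rescale_to_zero_terminal_snr(linear_beta_schedule())
print(np.cumprod(1.0 - betas)[-1])  # ~0.0: zero terminal SNR
```

With the standard linear schedule, the last training timestep still retains a small amount of signal, so the model never learns to start from pure noise; the rescaled schedule removes that mismatch.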
Emu Edit, the second project, focuses on instruction-based image editing directly within the app: users describe a desired change in plain text, and the model applies it while altering only the pixels relevant to the request. This tool could have numerous applications, from personalizing photos to creating unique visual content for broader engagement.
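In interface terms, the interaction reduces to an image plus a plain-text instruction. The stub below is purely illustrative; emu_edit and its signature are hypothetical placeholders, not Meta’s actual API.

```python
from dataclasses import dataclass

@dataclass
class EditResult:
    source_path: str
    instruction: str  # in the real system: the edited image itself

def emu_edit(image_path: str, instruction: str) -> EditResult:
    """Hypothetical stub: apply a text instruction to an image,
    ideally changing only the pixels the instruction refers to."""
    return EditResult(source_path=image_path, instruction=instruction)

result = emu_edit("beach_photo.jpg", "replace the sky with a sunset")
print(result.instruction)
```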
These latest generative AI tools from Meta represent a significant advancement in the digital creative process, offering users novel ways to engage with and produce visual content.