Meta Platforms (NASDAQ:META) on Thursday introduced two new AI-based tools, Emu Video for video generation and Emu Edit for image editing, both of which carry out tasks based on text instructions.
The company noted that the technology from Emu, its first foundational model for image generation, underpins many of its generative AI experiences, such as AI image editing tools for Instagram that let users take a photo and change its visual style or background, among other things.
The tech giant said that Emu Video, which uses the Emu model, is a simple method for text-to-video generation based on diffusion models. It can take text only, an image only, or both text and an image as input.
The company has split the process into two steps — generating images conditioned on a text prompt, and then generating video conditioned on both the text and the generated image.
Meta added that, unlike prior work that requires a deep cascade of models (such as five models for Make-A-Video), the new approach is simple to implement and uses only two diffusion models to generate four-second-long 512×512 videos at 16 frames per second.
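The two-step flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow and the output dimensions Meta cites, not Meta's code: the function and class names (generate_image, generate_video, text_to_video, Image, Video) are hypothetical, and the two "models" are placeholders that return metadata rather than pixels.

```python
from dataclasses import dataclass
from typing import Optional

# Output figures cited by Meta: 512x512, four seconds, 16 frames per second.
WIDTH, HEIGHT = 512, 512
DURATION_S = 4
FPS = 16


@dataclass
class Image:
    width: int
    height: int
    prompt: str  # text the image was conditioned on


@dataclass
class Video:
    width: int
    height: int
    num_frames: int
    fps: int


def generate_image(prompt: str) -> Image:
    """Step 1 (placeholder): stands in for a text-conditioned diffusion model."""
    return Image(WIDTH, HEIGHT, prompt)


def generate_video(prompt: str, image: Image) -> Video:
    """Step 2 (placeholder): stands in for a model conditioned on text and image."""
    return Video(image.width, image.height, DURATION_S * FPS, FPS)


def text_to_video(prompt: str, image: Optional[Image] = None) -> Video:
    """Run step 1 only when the caller has not supplied an image."""
    if image is None:
        image = generate_image(prompt)
    return generate_video(prompt, image)


video = text_to_video("a dog surfing a wave")
print(video.num_frames)  # 4 s * 16 fps = 64 frames
```

At the stated settings, each clip works out to 64 frames of 512×512 pixels.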
Meta also introduced Emu Edit, which can do free-form editing via instructions, including tasks such as local and global editing, removing and adding backgrounds, color and geometry transformations, and detection and segmentation.
The company noted that the main goal should not be just to produce a ‘believable’ image; rather, a model should focus on precisely altering only the pixels relevant to the edit request.
Unlike many generative AI models today, Emu Edit precisely follows instructions, making sure that pixels in the input image unrelated to the instructions remain untouched, according to the company.
The company said that to train the model, it developed a dataset of 10 million synthesized samples, each including an input image, a description of the task to be performed, and the targeted output image. Meta believes it is the largest dataset of its kind to date.
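As a rough picture of what one such training sample might look like, the three parts Meta lists can be modeled as a simple record. The class name, field names, and example values below are hypothetical, chosen only for illustration; Meta has not described its storage format here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditSample:
    """One training sample with the three parts Meta lists (names are assumed)."""
    input_image: bytes   # the image before editing
    instruction: str     # description of the task to be performed
    target_image: bytes  # the targeted output image


# Made-up placeholder values, not real data from the dataset.
sample = EditSample(
    input_image=b"<input image bytes>",
    instruction="remove the background",
    target_image=b"<edited image bytes>",
)
print(sample.instruction)
```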
Meta added that although the work is purely fundamental research right now, potential use cases could include generating one’s own animated stickers or GIFs to send in chats, editing one’s own photos without needing technical skills, improving an Instagram post by animating static photos, or generating something entirely new.
Meta already has several AI models, such as AudioCraft, SeamlessM4T, and the large language model, or LLM, Llama 2. Generative AI services have taken the world by storm since the launch of Microsoft (MSFT)-backed OpenAI’s ChatGPT last year.
Alibaba’s (BABA) Tongyi Qianwen 2.0 and Tongyi Wanxiang, Baidu’s (BIDU) Ernie Bot, OpenAI’s text-to-image tool DALL·E 3, Alphabet (GOOG) (GOOGL) unit Google’s Bard, Samsung’s (OTCPK:SSNLF) Gauss, and Getty Images’ (GETY) model called Generative AI by Getty Images are among the many generative AI models being developed by companies worldwide.