
Meta's FAIR Team Releases AI Models for Image-to-Text and Text-to-Music Generation

Written by
ArticleGPT

Reviewed and fact-checked by the HIX.AI Team


In a Nutshell

Meta has unveiled five new AI research models, including those capable of generating both text and images and detecting AI-generated speech within larger audio excerpts.

Meta's Fundamental AI Research (FAIR) team has recently released five new artificial intelligence (AI) research models. These models have wide-ranging applications, including image-to-text and text-to-music generation, as well as improved code completion and detection of AI-generated speech.

Chameleon Model: Image and Text Generation

One of the noteworthy models released is Chameleon, a family of mixed-modal models capable of generating both images and text.

Unlike traditional models that handle only a single modality, Chameleon can process inputs that combine text and images and produce outputs that mix the two. This capability opens up new possibilities, such as generating creative captions for images or using text prompts and images together to create entirely new scenes.
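To make the idea concrete, here is a minimal, hypothetical sketch of the early-fusion approach used by mixed-modal models of this kind: text and images are both reduced to discrete tokens and interleaved into a single sequence that one transformer can read and emit. The Segment class and marker token ids below are illustrative assumptions, not Meta's released code.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    modality: str      # "text" or "image"
    tokens: list       # discrete ids from a text or image tokenizer (illustrative)

def interleave(segments, boi=1, eoi=2):
    """Flatten mixed-modal segments into a single token stream.

    Image spans are wrapped in begin/end-of-image markers so the model and the
    decoder know which parts of the sequence to render as pixels rather than text.
    """
    stream = []
    for seg in segments:
        if seg.modality == "image":
            stream.append(boi)
            stream.extend(seg.tokens)
            stream.append(eoi)
        else:
            stream.extend(seg.tokens)
    return stream

# "Caption this image:" as text tokens, followed by the image's discrete tokens.
prompt = [Segment("text", [101, 102, 103]), Segment("image", [9001, 9002, 9003])]
print(interleave(prompt))   # -> [101, 102, 103, 1, 9001, 9002, 9003, 2]
```

Because everything lives in one token stream, the same model can answer with text tokens, image tokens, or any interleaving of the two.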

Multi-Token Prediction Model

Meta's FAIR team has also made significant advancements in code completion models by introducing a new approach called multi-token prediction. Unlike the conventional one-word-at-a-time training objective, this method trains language models to predict several future words at once, which can improve training efficiency and help the models generate text faster.
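The sketch below illustrates the general technique under simplified assumptions (it is not Meta's implementation): a shared transformer trunk feeds several output heads, and head k is trained to predict the token k+1 positions ahead rather than only the next one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    """Shared transformer trunk with one output head per future offset."""
    def __init__(self, vocab_size=1000, d_model=128, n_future=4):
        super().__init__()
        self.n_future = n_future
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, tokens):                              # tokens: (batch, seq)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.trunk(self.embed(tokens), mask=causal)     # (batch, seq, d_model)
        return [head(h) for head in self.heads]             # one logits tensor per offset

def multi_token_loss(model, tokens):
    """Head k is supervised with the token k+1 positions ahead of each context position."""
    n = model.n_future
    seq = tokens.size(1)
    logits_per_head = model(tokens[:, : seq - n])           # drop the tail so every head has a target
    total = 0.0
    for k, logits in enumerate(logits_per_head):
        targets = tokens[:, k + 1 : seq - n + k + 1]        # shifted by k+1 positions
        total = total + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )
    return total / n

# Smoke test on random token ids.
model = MultiTokenPredictor()
batch = torch.randint(0, 1000, (2, 32))
print(multi_token_loss(model, batch))
```

Each training step therefore provides several supervision signals per position instead of one, which is the intuition behind the efficiency gains.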

JASCO: AI Music Generation

The third model released by Meta's FAIR team is JASCO, which offers improved control over AI music generation. Unlike existing text-to-music models that rely solely on text inputs, JASCO can accept various inputs, including chords and beats, allowing for more versatility and creativity in generating music.
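As a rough illustration of what multi-signal conditioning can look like, the sketch below bundles a text description with chord changes and a tempo. The MusicPrompt structure and its field names are hypothetical, chosen for clarity; they are not JASCO's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class MusicPrompt:
    """Hypothetical bundle of conditioning signals for a music generator."""
    text: str                                     # natural-language description
    chords: list = field(default_factory=list)    # (chord, start_seconds) pairs
    bpm: float = 0.0                              # tempo / beat conditioning (0 = unset)

def describe(p: MusicPrompt) -> str:
    """Summarize which conditioning signals the generator would receive."""
    parts = [f"text='{p.text}'"]
    if p.chords:
        parts.append(f"{len(p.chords)} chord changes")
    if p.bpm:
        parts.append(f"{p.bpm} BPM")
    return ", ".join(parts)

prompt = MusicPrompt(
    text="warm lo-fi guitar with soft drums",
    chords=[("Am", 0.0), ("F", 4.0), ("C", 8.0), ("G", 12.0)],
    bpm=80.0,
)
print(describe(prompt))
```

The point of accepting chords and beats alongside text is that a musician can pin down harmony and rhythm explicitly instead of hoping a text prompt implies them.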

AudioSeal: Locating AI-Generated Speech

AudioSeal is a groundbreaking system that can embed watermarks in AI-generated audio clips. This technique enables the precise detection of AI-generated segments within longer audio snippets, providing a valuable tool for identifying misinformation and scams.
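A conceptual sketch of the localized-detection idea follows, assuming a detector that scores each short audio frame for the presence of a watermark (the scoring model itself is omitted): contiguous high-scoring frames mark the AI-generated span inside a longer clip. The helper function and its parameters are illustrative, not AudioSeal's API.

```python
import numpy as np

def locate_watermarked_spans(frame_scores, threshold=0.5, frame_ms=20):
    """Return (start_ms, end_ms) spans where the per-frame watermark score exceeds the threshold."""
    spans, start = [], None
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i                                        # span begins
        elif score < threshold and start is not None:
            spans.append((start * frame_ms, i * frame_ms))   # span ends
            start = None
    if start is not None:
        spans.append((start * frame_ms, len(frame_scores) * frame_ms))
    return spans

# Example: 1 s of ordinary audio followed by 0.5 s of watermarked (AI-generated) audio.
scores = np.concatenate([np.full(50, 0.05), np.full(25, 0.97)])
print(locate_watermarked_spans(scores))   # -> [(1000, 1500)]
```

Frame-level scoring is what lets the system point to where in a recording the AI-generated speech begins and ends, rather than only flagging the whole file.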

Diversity Enhancement in Text-to-Image Models

To ensure text-to-image models reflect the geographical and cultural diversity of the world, Meta's FAIR team has developed automatic indicators to evaluate potential geographical disparities in these models.

By conducting a large-scale annotation study and collecting extensive feedback, Meta aims to improve evaluations of text-to-image models and promote diversity in AI-generated images.

By releasing its geographic-disparity evaluation code and annotations, Meta aims to help researchers improve the representation and inclusivity of their own generative models.

Meta's Investment in AI Development

Meta's commitment to AI development is evident in its substantial capital expenditures on AI and on Reality Labs, its metaverse-development division.

With expenditures projected to reach between $35 billion and $40 billion[1] by the end of 2024, Meta aims to build various AI services and platforms, including AI assistants, augmented reality apps, and business AIs.

“We’re building a number of different AI services, from our AI assistant to augmented reality apps and glasses, to APIs [application programming interfaces] that help creators engage their communities and that fans can interact with, to business AIs that we think every business eventually on our platform will use,” said Meta CEO Mark Zuckerberg.

Sources

1. Meta Releases AI Models That Generate Both Text and Images: Meta has released five new artificial intelligence (AI) research models, including ones that can generate both text and images and that can detect AI-generated speech within larger audio snippets.

2. Releasing New AI Research Models to Accelerate Innovation at Scale: Meta’s Fundamental AI Research team is publicly releasing several models to accelerate future research and allow others to innovate and apply AI at scale.

3. Meta has created a way to watermark AI-generated speech: The tool, called AudioSeal, could eventually help tackle the growing use of voice cloning tools for scams and misinformation.
