How to Evaluate the Reliability of General-Purpose AI Models Before Deployment

Foundation models, like those powering AI tools such as ChatGPT and DALL-E, are massive deep learning systems pretrained on vast amounts of general-purpose, unlabeled data. These models are adaptable to various tasks, from generating images to answering customer queries, making them incredibly versatile.

However, there is a catch: these models sometimes produce incorrect or misleading outputs. In high-stakes situations, such as a self-driving car approaching a pedestrian, those errors could have severe consequences.

Mitigating AI Errors with a New Technique

To minimize such risks, researchers from MIT and the MIT-IBM Watson AI Lab have engineered a novel technique to estimate how reliable these foundation models are before they’re applied to specific tasks.

Their approach uses a set of foundation models that are broadly similar but differ slightly from one another. An algorithm then measures how consistent the representations are that each model generates for the same test data point. If the representations agree, the model is deemed reliable for that input.
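
To make the idea concrete, here is a minimal sketch in Python, not the authors' published algorithm: a few toy linear encoders stand in for the near-identical foundation models, and average pairwise cosine similarity stands in for the consistency measure. Direct cosine comparison only works here because the toy encoders share one representation space; the actual method handles models with differing spaces via the neighborhood-consistency idea described later in this article.

```python
import numpy as np

def consistency_score(encoders, x):
    """Average pairwise cosine similarity between the representations
    that each ensemble member produces for the same test point x.
    (Illustrative consistency measure, not the paper's exact metric.)"""
    reps = [enc(x) for enc in encoders]
    reps = [r / np.linalg.norm(r) for r in reps]
    pairs = [(i, j) for i in range(len(reps)) for j in range(i + 1, len(reps))]
    return float(np.mean([reps[i] @ reps[j] for i, j in pairs]))

# Toy ensemble: linear encoders that share a common base but differ slightly,
# standing in for foundation models that are similar with minor differences.
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 8))
encoders = [
    (lambda x, W=base + 0.01 * rng.normal(size=base.shape): W @ x)
    for _ in range(5)
]

x_test = rng.normal(size=8)
print(consistency_score(encoders, x_test))  # near 1.0 -> representations agree
```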

Superior to Existing Methods

Compared to existing state-of-the-art methods, this new technique has proven to be more effective at capturing the reliability of foundation models across various downstream classification tasks. This allows users to gauge a model’s reliability without needing a real-world dataset, which could be extremely useful when data is inaccessible due to privacy concerns, such as in healthcare settings. Furthermore, this method enables the ranking of models based on reliability scores, allowing users to select the most reliable model for their specific needs.

The work also fits into a broader effort in the field to keep AI systems from becoming overconfident in wrong answers.

Understanding the Challenge

“All models can be wrong, but models that know when they’re wrong are far more useful,” says senior author Navid Azizan, an Assistant Professor at MIT. Foundation models, however, present a unique challenge in quantifying uncertainty because their abstract representations are difficult to compare. The technique developed by the MIT team offers a way to assess how reliable a model is for any input data, making it a valuable asset in AI deployment.

Assessing Reliability with Ensemble Models

Traditional machine-learning models are built for specific tasks and give concrete predictions. Foundation models work differently: they are pretrained on general data without knowing which tasks they will later be applied to, and they output abstract representations rather than concrete predictions, which makes their reliability harder to assess.

To tackle this, the researchers employed an ensemble approach. They trained multiple models that are slightly different but share many core characteristics. By measuring the consistency of representations across these models, they can estimate how reliable the model will be in a real-world scenario.

“It’s like measuring consensus,” explains Young-Jin Park, lead author of the study. If all models provide consistent representations for a given dataset, the model is considered reliable.
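
Continuing the earlier sketch (and reusing its `consistency_score` and `encoders`), one plausible way to turn per-point consistency into a single reliability score for a dataset is to average it over an unlabeled test set. The averaging step is an illustrative assumption, not necessarily how the paper aggregates its scores.

```python
import numpy as np

def reliability_estimate(encoders, test_points):
    """Mean per-point consistency over an unlabeled test set.
    Averaging is an illustrative aggregation choice."""
    return float(np.mean([consistency_score(encoders, x) for x in test_points]))

# Reusing the toy ensemble from the earlier sketch:
test_points = np.random.default_rng(1).normal(size=(100, 8))
print(f"estimated reliability: {reliability_estimate(encoders, test_points):.3f}")

# Given such estimates for several candidate model families, one could rank
# them and pick the highest-scoring one, as the article describes.
```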

The Power of Neighborhood Consistency

But how do you compare abstract representations? The researchers address this with a method called neighborhood consistency: they prepare a set of reliable reference points and, for each model in the ensemble, look at which reference points sit near the test point's representation. If those neighboring reference points are consistent across the models, the model's representation of that input can be trusted.
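
One plausible reading of neighborhood consistency, again as a hedged sketch rather than the paper's exact procedure: embed the test point and the reference points with each ensemble member, find the references nearest the test point in each model's space, and measure how much those neighbor sets overlap. The k-nearest-neighbor rule and Jaccard overlap below are illustrative choices; the code reuses the toy `encoders` and `x_test` from the earlier sketches.

```python
import numpy as np

def knn_indices(encoder, x, references, k=5):
    """Indices of the k reference points closest to x in this encoder's space."""
    rep_x = encoder(x)
    rep_refs = np.stack([encoder(r) for r in references])
    dists = np.linalg.norm(rep_refs - rep_x, axis=1)
    return set(np.argsort(dists)[:k].tolist())

def neighborhood_consistency(encoders, x, references, k=5):
    """Average pairwise Jaccard overlap of the neighbor sets across the ensemble.
    High overlap means the models agree on which references x resembles."""
    sets = [knn_indices(enc, x, references, k) for enc in encoders]
    pairs = [(i, j) for i in range(len(sets)) for j in range(i + 1, len(sets))]
    overlaps = [len(sets[i] & sets[j]) / len(sets[i] | sets[j]) for i, j in pairs]
    return float(np.mean(overlaps))

# Usage with the toy ensemble and a batch of reference points:
references = np.random.default_rng(2).normal(size=(50, 8))
print(neighborhood_consistency(encoders, x_test, references, k=5))
```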

Aligning Representations for Better Accuracy

Foundation models map data into what is known as a representation space. Think of this space as a sphere where similar data points are grouped together. For example, a model might group images of cats in one part of the sphere and dogs in another. But different models might map these animals to different locations on their own spheres.

The researchers used neighboring points as anchors to align these spheres, allowing them to evaluate the reliability of the model’s output. When tested against a variety of classification tasks, this approach was much more consistent than existing methods, even when dealing with challenging data points.
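
The article says anchor points are used to align the different models' spaces. Orthogonal Procrustes is one standard way to perform such an alignment, so the sketch below uses it as an assumption rather than as the paper's exact method: it rotates model A's space onto model B's using their embeddings of shared anchor points, then compares the two models' representations of a test point. It reuses the toy `encoders`, `references`, and `x_test` from the earlier sketches.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal matrix R minimizing ||A @ R - B||_F, computed via SVD.
    A and B hold the two models' embeddings of the same anchors, row-wise."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def aligned_agreement(enc_a, enc_b, anchors, x):
    """Align model A's space to model B's using the anchors, then compare
    their representations of the test point x with cosine similarity."""
    A = np.stack([enc_a(p) for p in anchors])
    B = np.stack([enc_b(p) for p in anchors])
    R = procrustes_rotation(A, B)
    ra = enc_a(x) @ R           # model A's embedding, rotated into model B's space
    rb = enc_b(x)
    return float(ra @ rb / (np.linalg.norm(ra) * np.linalg.norm(rb)))

# Usage with two of the toy encoders and the reference points as anchors:
print(aligned_agreement(encoders[0], encoders[1], references, x_test))
```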

Future Directions

Despite its effectiveness, the method does have some limitations. Training an ensemble of foundation models is computationally expensive. In the future, the researchers aim to explore more efficient ways to create these ensembles, possibly by introducing small perturbations to a single model.

This novel approach represents a significant step forward in ensuring the reliability of AI models before they are deployed. As AI continues to evolve, ensuring that models can be trusted in real-world applications is becoming more critical than ever.

This work was supported by the MIT-IBM Watson AI Lab, MathWorks, and Amazon.
