Artificial intelligence has become a powerful tool, offering answers to almost any query through systems like ChatGPT. But here’s the catch: these models often sound confident even when they’re not — and that can be dangerous in high-stakes industries like healthcare, autonomous driving, and critical infrastructure.
Solving AI’s Confidence Problem
Enter Themis AI, an MIT spinout on a mission to make AI systems more trustworthy by helping them recognize when they don’t know something. Their platform, Capsa, is designed to wrap around any machine learning model and detect unreliable or ambiguous outputs — in real time.
“The idea is to wrap the model with Capsa, identify uncertainty and failure modes, and then enhance the model’s performance,” explains Daniela Rus, co-founder of Themis AI and director of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “This gives teams peace of mind that their model is functioning as intended.”
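Capsa's actual API is not shown in the article, but the wrapping idea itself can be sketched in a few lines: treat disagreement among several predictors as an uncertainty signal and flag outputs where that disagreement is too high. Everything below — the class name, the ensemble, the threshold — is a hypothetical illustration, not Capsa's implementation.

```python
import statistics

# Illustrative sketch of "wrapping" a model to surface uncertainty.
# Here, the spread of an ensemble's predictions stands in for the
# uncertainty estimate; all names are hypothetical, not Capsa's API.
class UncertaintyWrapper:
    def __init__(self, models, threshold):
        self.models = models        # list of callables: x -> prediction
        self.threshold = threshold  # maximum tolerated disagreement

    def predict(self, x):
        preds = [m(x) for m in self.models]
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)  # disagreement as uncertainty proxy
        return mean, spread, spread > self.threshold  # flag unreliable outputs

# Three toy "models" that agree near x = 0 but diverge far from it,
# mimicking confident in-distribution vs. shaky out-of-distribution behavior.
models = [
    lambda x: 2 * x,
    lambda x: 2 * x + 0.1 * x * x,
    lambda x: 2 * x - 0.1 * x * x,
]
wrapped = UncertaintyWrapper(models, threshold=0.5)
```

A familiar input stays below the threshold, while an unusual one trips the flag, which is the behavior the quote describes: the wrapper leaves confident outputs alone and surfaces the failure modes.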
Built for High-Stakes AI Applications
Themis AI was launched in 2021 by Rus alongside Alexander Amini and Elaheh Ahmadi, former researchers at MIT. Since then, they’ve collaborated with telecom providers on network automation, helped oil and gas firms interpret seismic data, and advanced chatbot reliability through academic research.
“We’ve all seen AI hallucinate or make mistakes,” notes Amini. “As AI becomes more integrated into critical systems, those mistakes become unacceptable. Our software helps make these systems more transparent and accountable.”
Helping Models Recognize Gaps in Knowledge
The foundation of Themis AI’s work lies in years of research on model uncertainty. In a 2018 project backed by Toyota, Rus’s lab aimed to improve reliability in self-driving car systems — a context where a single misjudgment can be fatal.
Later research led to the development of algorithms that could identify and correct biases in facial recognition systems. The algorithm reweighted training data to eliminate racial and gender biases, demonstrating how model introspection can lead to more equitable AI outcomes.
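The reweighting idea can be shown with a deliberately simplified sketch: give each training sample a weight inversely proportional to its subgroup's frequency, so underrepresented groups are not drowned out. This is a toy version of the concept only — the lab's published algorithm learned the relevant structure from the data itself rather than from pre-labeled groups.

```python
from collections import Counter

# Simplified illustration of debiasing by reweighting: each sample's
# weight is inversely proportional to its group's frequency, so every
# group contributes equally in expectation during training.
def balance_weights(groups):
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# A dataset where group "A" outnumbers group "B" three to one.
weights = balance_weights(["A", "A", "A", "B"])
```

With these weights, the lone "B" sample carries three times the weight of each "A" sample, and the total weight still equals the dataset size, so the effective training set is balanced without discarding data.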
In 2021, the same approach was applied to drug development, helping pharmaceutical companies predict the properties of drug candidates more reliably. That experience became the catalyst for launching Themis AI.
Bringing Confidence to AI Outputs
Today, Themis AI is partnering with companies across industries — many of which are building their own large language models (LLMs). Capsa allows these models to analyze their outputs and report their confidence levels, helping flag potentially unreliable results before they’re acted upon.
“Organizations want to use LLMs trained on their own data,” says Stewart Jamieson, Themis AI’s Head of Technology. “But they’re wary of hallucinations or errors. We enable these models to self-assess their reliability, improving response quality and user trust.”
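One simple way a language model can self-assess, assuming it exposes per-token log probabilities (as some LLM APIs do), is to score an answer by its geometric-mean token probability and flag low-scoring responses for review. This is a generic sketch of the idea, not Themis AI's method; the threshold and function names are hypothetical.

```python
import math

# Hypothetical self-assessment sketch: given the per-token log
# probabilities of a generated answer, compute the geometric-mean
# token probability and flag the answer if it falls below a floor.
def confidence(token_logprobs, floor=0.5):
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    score = math.exp(avg_logprob)   # geometric-mean token probability
    return score, score < floor     # (confidence score, needs_review flag)

score_hi, review_hi = confidence([-0.05, -0.10, -0.02])  # confident answer
score_lo, review_lo = confidence([-1.5, -2.0, -1.2])     # shaky answer
```

The confident answer sails through while the shaky one is routed to a human, which is the "flag potentially unreliable results before they're acted upon" behavior described above.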
This capability aligns with emerging trends in edge computing, where lightweight AI models operate outside of cloud environments — on mobile devices or embedded systems. Paired with Capsa, these models can maintain efficiency while knowing when to escalate complex tasks to more powerful servers. It complements efforts like Google’s Gemma 3n, which likewise aims to bring capable AI to mobile devices.
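The escalation pattern described here has a simple shape: try the on-device model first, and fall back to a larger remote model only when local confidence is too low. The sketch below uses stand-in models and a made-up confidence threshold purely to illustrate the routing logic.

```python
# Sketch of edge-to-server escalation: answer on-device when the
# lightweight model is confident, otherwise hand the query to a
# larger remote model. Both "models" are illustrative stand-ins.
def run_with_escalation(query, local_model, remote_model, min_conf=0.8):
    answer, conf = local_model(query)
    if conf >= min_conf:
        return answer, "edge"             # cheap path: answered on-device
    return remote_model(query), "server"  # escalate the hard cases

# Toy models: the local one is confident only on short queries.
local = lambda q: ("short answer", 0.9 if len(q) < 20 else 0.3)
remote = lambda q: "detailed server answer"
```

The design choice worth noting is that the confidence estimate, not the query itself, drives the routing — the edge model stays fast for the common case and pays the network round-trip only when it knows it is out of its depth.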
Accelerating Drug Discovery with Explainable AI
Pharmaceutical companies are leveraging Capsa to improve how AI predicts the performance of drug candidates in clinical trials. These predictions are typically difficult to interpret, but Capsa provides instant insights into whether a model’s output is grounded in its training data or simply speculative.
“This can streamline the identification of strong candidates and reduce costly bottlenecks in drug development,” Amini explains. “It’s about making AI more helpful, more explainable, and ultimately more impactful for society.”
Shaping the Future of AI Reliability
Themis AI continues to push the boundaries of what’s possible, especially in complex reasoning tasks. Their team is currently exploring how Capsa can improve the performance of chain-of-thought reasoning — a method used by LLMs to break down their logic step-by-step.
“We believe Capsa can guide models to choose the most confident reasoning paths,” Amini says. “That reduces latency, improves accuracy, and lowers computational costs — a high-impact goal with wide-reaching applications.”
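Choosing the most confident reasoning path can be sketched as a selection step over sampled candidates, similar in spirit to self-consistency decoding. The candidates and scores below are hard-coded stand-ins for model samples; how Capsa actually scores paths is not described in the article.

```python
# Sketch of confidence-guided reasoning: sample several candidate
# chain-of-thought paths, score each, and keep the most confident.
# Candidates are (answer, confidence) pairs standing in for real samples.
def pick_most_confident(paths):
    return max(paths, key=lambda p: p[1])

candidates = [("42", 0.91), ("41", 0.40), ("42", 0.88)]
best_answer, best_conf = pick_most_confident(candidates)
```

Pruning low-confidence paths early is also where the claimed latency and cost savings would come from: the model stops spending compute on reasoning chains it already distrusts.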
From Research to Real-World Impact
For Rus, Themis AI represents more than a startup — it’s a continuation of her lab’s mission to build AI that’s both powerful and safe.
“AI is transforming every industry, but it comes with real risks,” she says. “What excites me is building the technical guardrails that allow people to trust the technology they use every day.”