As artificial intelligence rapidly evolves into autonomous systems capable of interacting with the real world and each other, researchers are sounding the alarm: existing frameworks for understanding their behavior may no longer suffice.
AI Agents Are Changing the Game
Zico Kolter, a Carnegie Mellon professor and technical adviser in AI security, is leading efforts to understand and mitigate the challenges posed by increasingly autonomous AI agents. These agents go beyond chatbots—they can act online, manipulate digital environments, and in some cases, even influence the physical world.
From Chatbots to Autonomous Operators
Traditional AI models such as chatbots pose relatively low risk in isolation. But once these models gain the ability to perform actions—like sending emails, controlling software, or interfacing with real-world systems—the stakes rise dramatically. Kolter emphasizes that once AI systems become agents with ‘end-effectors’ capable of real-world impact, the conversation shifts from theoretical to urgent.
Building Safer AI from the Ground Up
Kolter’s lab is working on training AI models that are secure by design. Unlike today’s massive 700-billion parameter models, these are smaller, more efficient systems focused on resilience. However, even these require significant computing power to train from scratch—something that academia often lacks compared to industry giants.
A new partnership with Google is helping bridge that gap, providing Carnegie Mellon with the compute resources necessary to push forward on safety research and innovation.
Risks Multiply When Agents Interact
Kolter warns that when multiple agents begin interacting—especially on behalf of different users or companies—emergent behaviors can arise. These interactions are unpredictable and can result in behaviors not intended by any single developer.
This is where traditional game theory falls short. Developed to model human decisions, it struggles to encompass the complexities of autonomous AI agents negotiating, cooperating, or even competing with each other.
Kolter advocates for a new discipline of “agent-focused game theory” to navigate this emerging landscape—a theory designed specifically for ecosystems where intelligent systems outnumber and outpace human actors.
Security Threats Are Already Emerging
Even in their infancy, agentic systems have demonstrated vulnerabilities. Improperly connected tools could allow data leaks or unauthorized access to private systems. While most current implementations still include human-in-the-loop safeguards, Kolter notes that users will eventually demand more automation and fewer security interruptions.
This makes it essential to bake in protections at the foundational level. If the underlying model can be compromised, attackers could hijack the agent’s behavior, similar to a software buffer overflow—except with far more dynamic consequences.
These concerns mirror recent advancements where AI agents were shown to be susceptible to manipulation. For example, in DeepSeek’s exploration of AI alignment and reward systems, researchers highlighted the difficulty of keeping AI behavior aligned with human values under complex conditions.
Guardrails and Human Oversight—For Now
To mitigate these risks, most agent platforms today maintain strict guardrails. For instance, OpenAI’s Operator tool requires human approval before executing sensitive tasks like sending emails on the user’s behalf. However, as adoption grows, reducing friction will become a priority—putting pressure on developers to find safer, more autonomous solutions.
Preparing for an Autonomous Future
Kolter believes we are approaching a future where AI agents will regularly interact with each other, forming networks of cooperation, competition, and influence. This will require a deep rethinking of how intelligent systems are governed and secured.
Understanding how agent ecosystems behave—and how they can be manipulated—will be critical. As with past technological revolutions, from nuclear deterrence to the internet, society will need new rules and frameworks to keep the benefits of AI from being overshadowed by its risks.
The Call for a New Game Theory
Kolter draws parallels to the historical development of game theory, which emerged from the need to understand geopolitical tensions in the 20th century. Similarly, today’s AI explosion requires a fresh theoretical foundation—one that can model not just human choices, but the autonomous decisions of machines interacting in a shared, digital world.
“We’re heading toward a future that’s not just automated,” Kolter says, “but one where intelligence is distributed across agents. And that changes everything.”