Amazon has officially launched Nova Act, an AI model for building agents that carry out complex, multi-step tasks in a web browser with minimal human input.
Redefining What AI Agents Can Do
Large language models (LLMs) have traditionally been used to answer questions or retrieve information, for example through Retrieval-Augmented Generation (RAG). Nova Act aims higher: it is engineered to handle multi-step, real-world tasks, from organizing large events to executing intricate IT workflows.
“Our vision is to create agents that can autonomously complete multi-faceted digital and physical tasks,” Amazon stated in its official release.
Addressing Current Agent Limitations
Most existing AI agents rely heavily on API integrations and human oversight, which limits how far they can scale. Nova Act addresses this by letting agents operate directly in a web browser, so they can work with sites and applications without requiring deep backend integrations.
Nova Act SDK: Automating the Web
Along with the model, Amazon released a research preview of the Nova Act SDK, which lets developers build agents that automate tasks such as scheduling meetings, submitting out-of-office requests, or replying to emails, all within a web interface.
The SDK breaks activities down into small "atomic commands" such as clicking a button, filling in a form, or choosing an option from a dropdown menu. Developers can refine these commands with more detailed instructions for specific scenarios, such as declining an insurance upsell during checkout.
To improve accuracy and efficiency, the SDK integrates Playwright for direct browser control, supports API calls and Python scripting, and allows multithreading so that slow-loading pages do not hold up the rest of a workflow.
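The announcement does not spell out the SDK's internal command format, so the following Python sketch only illustrates the general idea: a task decomposed into atomic commands, plus one developer-supplied rule for a specific scenario (declining an insurance upsell). `AtomicCommand`, `plan_checkout`, and `describe` are hypothetical names chosen for illustration; they are not part of the Nova Act SDK, and a real agent would pass such steps to the model or a browser driver rather than print them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AtomicCommand:
    """One small browser action. Hypothetical type, not the Nova Act SDK API."""
    action: str                  # "click", "fill", or "select"
    target: str                  # plain-language description of the page element
    value: Optional[str] = None  # text to type or option to choose, if any


def plan_checkout(item: str, decline_insurance: bool = True) -> List[AtomicCommand]:
    """Decompose a 'buy this item' task into atomic commands (hypothetical helper)."""
    steps = [
        AtomicCommand("fill", "search box", item),
        AtomicCommand("click", "first search result"),
        AtomicCommand("click", "add to cart button"),
        AtomicCommand("click", "proceed to checkout button"),
    ]
    if decline_insurance:
        # Scenario-specific instruction supplied by the developer:
        # explicitly decline the protection-plan upsell instead of accepting it.
        steps.append(AtomicCommand("click", "decline protection plan button"))
    steps.append(AtomicCommand("select", "shipping speed dropdown", "Standard"))
    steps.append(AtomicCommand("click", "place order button"))
    return steps


def describe(cmd: AtomicCommand) -> str:
    """Render a command as the short instruction an agent would carry out."""
    if cmd.action == "fill":
        return f"type '{cmd.value}' into the {cmd.target}"
    if cmd.action == "select":
        return f"choose '{cmd.value}' in the {cmd.target}"
    return f"click the {cmd.target}"


if __name__ == "__main__":
    for step in plan_checkout("coffee maker"):
        print(describe(step))
```

Keeping each step this small makes individual actions easy to check and retry, and a scenario rule like the upsell decline becomes one extra step rather than a change to the whole plan.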
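How the SDK uses multithreading internally is not described here. As a rough illustration of the general technique, this Python sketch uses the standard library's `concurrent.futures.ThreadPoolExecutor` to run several independent tasks at once, with `time.sleep` standing in for a slow-loading page. `run_browser_task` and `run_in_parallel` are hypothetical placeholders, not Nova Act functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_browser_task(task: str, load_seconds: float) -> str:
    """Hypothetical stand-in for one agent session; sleep simulates a slow page load."""
    time.sleep(load_seconds)  # only this thread waits; other threads keep running
    return f"done: {task}"


def run_in_parallel(tasks: List[Tuple[str, float]], max_workers: int = 4) -> List[str]:
    """Run independent tasks on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_browser_task, name, secs) for name, secs in tasks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    jobs = [("check flight prices", 0.3), ("submit OOO request", 0.3), ("reply to email", 0.3)]
    start = time.perf_counter()
    print(run_in_parallel(jobs))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because each simulated page load blocks only its own thread, the three tasks finish in roughly the time of the slowest one rather than the sum of all three.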
Benchmarking Nova Act’s Capabilities
Amazon positions Nova Act as more than just another generative model, emphasizing performance and reliability. In Amazon's internal evaluations, Nova Act scored over 90% on tasks that typically challenge other models.
Amazon reports a score of 0.939 on the ScreenSpot Web Text benchmark, ahead of competitors such as Claude 3.7 Sonnet and OpenAI's CUA. It also scored 0.879 on the ScreenSpot Web Icon benchmark, which evaluates how well a model interacts with visual elements such as icons on a page. Nova Act slightly trailed its competitors on the GroundUI Web test, which Amazon treats as an area for improvement in future iterations.
Practical Use Cases and Flexibility
Nova Act stands out for its ability to adapt to new environments with minimal retraining. For example, it has shown success in browser-based games, despite not being specifically trained for gaming scenarios.
This adaptability makes it suitable for a wide range of applications, including Amazon's own Alexa+ ecosystem. There, Nova Act lets the assistant navigate the web on its own even when complete API access is not available, a step toward smarter, more self-reliant virtual assistants.
Vision for the Future of AI Agents
Amazon sees Nova Act as the first step in a larger roadmap toward scalable, intelligent AI agents. Amazon expects these agents to eventually handle more complex tasks by training with reinforcement learning across diverse, real-world environments, not just demo-based scenarios.
“The most impactful use cases for AI agents haven’t been built yet,” Amazon noted. “Our SDK is designed to empower forward-thinking developers to explore these opportunities through rapid prototyping and iterative development.”
For those interested in the broader evolution of AI-powered autonomous agents, this initiative shares key principles with Microsoft's recent announcement at Hannover Messe 2025, where the company showed how AI agents are transforming factory automation.
Conclusion: Empowering the Next Generation of Developers
Nova Act represents a significant step forward for AI agents that can handle real-world tasks with minimal oversight. By emphasizing reliability, adaptability, and scalability, Amazon is paving the way for a new era of digital automation, one in which AI doesn't just answer questions but takes action.
Developers now have the tools to push the boundaries of what’s possible with AI-powered automation, unlocking new efficiencies and experiences across industries.