Gemini Robotics 1.5 Unleashes Smart Robots That Perceive, Plan and Act

Google DeepMind has taken a giant leap forward in robotics with the release of Gemini Robotics 1.5, a suite of advanced AI models that bring intelligent agents into the physical world. These models use vision, language, and reasoning to help robots perform complex tasks with precision and adaptability.

Introducing the Next Generation of AI-Powered Robots

Gemini Robotics 1.5 is designed to empower robots with the ability to understand their surroundings, think through multi-step processes, and execute tasks with remarkable fluency. This includes sorting items, creating plans, and adapting to new environments—all using natural language and visual cues.

The system includes two core components:

  • Gemini Robotics 1.5: A vision-language-action (VLA) model that translates instructions and visual data into motor commands. It lets robots think before acting, improving both decision-making and transparency.
  • Gemini Robotics-ER 1.5: A high-level embodied-reasoning agent for advanced planning, spatial reasoning, tool use, and mission execution. It can call digital tools such as Google Search to solve problems in real-world scenarios (see the sketch after this list).
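To make the division of labor concrete, here is a minimal sketch of how the two components could be wired together. It assumes Google's google-genai Python SDK and the preview model ID for the ER model; vla_execute is a hypothetical stand-in for the on-robot VLA model, which is not publicly callable.

```python
# A minimal sketch of the orchestrator/executor split, assuming the
# google-genai SDK. vla_execute() is a hypothetical stand-in: the real
# Gemini Robotics 1.5 VLA model runs on the robot and is not public.
from google import genai

client = genai.Client()  # reads the API key from the environment

ER_MODEL = "gemini-robotics-er-1.5-preview"  # preview model ID; may change

def plan_steps(instruction: str) -> list[str]:
    """Ask the ER model to decompose a task into short, executable steps."""
    response = client.models.generate_content(
        model=ER_MODEL,
        contents=(
            "Break this task into short robot-executable steps, "
            f"one per line: {instruction}"
        ),
    )
    return [line.strip() for line in response.text.splitlines() if line.strip()]

def vla_execute(step: str) -> None:
    """Stand-in for the VLA model, which would turn a step plus camera
    frames into motor commands on the robot."""
    print(f"[VLA] executing: {step}")

for step in plan_steps("Sort the laundry into lights and darks."):
    vla_execute(step)
```

In this pattern the ER model plays the orchestrator, thinking in natural language, while the VLA model handles the low-level perception-to-motion loop.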

Smarter Task Execution Through Agentic Thinking

Unlike traditional systems that directly map commands to actions, Gemini Robotics 1.5 introduces an agentic framework—robots can now internally reason in natural language before executing tasks. For example, when asked to sort laundry by color, the model breaks down the task, reasons through each step, and performs actions accordingly, explaining its logic throughout the process.
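DeepMind has not published the model's internal traces, but the behavior described above resembles an interleaved think/act loop. The trace below is purely illustrative, reusing the laundry example from the text:

```python
# Purely illustrative: the kind of interleaved thought/action trace the
# article describes, not DeepMind's actual internals.
task = "Sort the laundry by color."

trace = [
    ("think", "Whites go in the white bin; colored items go in the black bin."),
    ("act",   "pick up the red sweater"),
    ("think", "The sweater is colored, so it belongs in the black bin."),
    ("act",   "place the red sweater in the black bin"),
]

for kind, step in trace:
    print(f"{kind.upper():>5} | {step}")
```

Surfacing the intermediate "think" entries is what makes the process transparent: a human supervisor can audit why each action was taken.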

Cross-Embodiment Learning: One Model, Many Robots

One of the standout capabilities of Gemini Robotics 1.5 is its ability to learn across different robot embodiments. Whether it’s Apptronik’s Apollo humanoid, the bi-arm Franka robot, or the ALOHA 2 platform, the model transfers skills between bodies without specialized tuning. This vastly accelerates the development of general-purpose robotic assistants.

State-of-the-Art Performance in Embodied Reasoning

Gemini Robotics-ER 1.5 has demonstrated leading performance across 15 academic benchmarks, including ERQA, Point-Bench, and RoboSpatial-VQA. It excels at tasks like object detection, segmentation, trajectory planning, and task success estimation.
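Several of these skills, such as pointing and object detection, map directly onto prompts developers can try themselves. The sketch below assumes the google-genai SDK and the preview model ID; the image file and the exact JSON schema are illustrative assumptions rather than a guaranteed output format:

```python
# A hedged sketch: asking the ER model to point at objects in an image.
# The filename, prompt wording, and JSON schema are illustrative.
from google import genai
from google.genai import types

client = genai.Client()

with open("workbench.jpg", "rb") as f:
    image_bytes = f.read()

prompt = (
    "Point to the screwdriver and the tape measure. Answer as JSON: "
    '[{"point": [y, x], "label": "<name>"}], with coordinates '
    "normalized to 0-1000."
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        prompt,
    ],
)
print(response.text)  # e.g. [{"point": [512, 340], "label": "screwdriver"}]
```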

This aligns with DeepMind’s broader mission to create AI that can operate safely and responsibly across all real-world environments.

Safety and Transparency at the Core

To ensure safe deployment, both models were developed in close coordination with DeepMind’s Responsibility & Safety Council. They incorporate safety-first reasoning, collision avoidance subsystems, and alignment with Gemini’s safety policies. The updated ASIMOV benchmark is also used to rigorously test semantic and physical safety.

Building the Future of Physical AI Agents

This release marks a pivotal step toward achieving Artificial General Intelligence (AGI) in the physical realm. By combining perception, cognition, and action, Gemini Robotics 1.5 enables robots to function autonomously and adaptively in human-centric environments.

Developers can now access Gemini Robotics-ER 1.5 through the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is being rolled out to select partners. These tools are set to redefine how machines interact with the world, and with us.

For more on how Gemini’s capabilities are evolving, check out our related article on Gemini 2.5 Deep Think’s success at the world’s top programming competition.
