Google DeepMind's robotics team has recently published a research paper showing how they are teaching Google's RT-2 robots to learn and adapt to their environment using Gemini AI.
Rather than relying solely on traditional programming methods, the team is using videos to train the robots, allowing them to learn in a way similar to human interns. By recording video tours of designated areas, such as homes or offices, the robots can watch and absorb information about their surroundings.
Google's Robots Navigate with Gemini AI
The model can produce both verbal and visual outputs, so the robots can carry out tasks based on what they already know about a space. This showcases the potential for robots to interact with their environment in ways that resemble human behavior.
In practical tests, the Gemini-powered robots operated in a 9,000-square-foot area and successfully followed more than 50 different user instructions with a 90 percent success rate.
This high level of accuracy opens up numerous real-world applications for AI-powered robots, including assisting with household chores or performing more complex tasks in the workplace.
The robots are equipped with the Gemini 1.5 Pro generative AI model, which has a long context window. This allows the AI to multitask and process information efficiently, enabling the robots to learn about their environment in detail.
For example, if a user asks if a specific drink is available, the robot can navigate to the refrigerator, visually assess its contents, and then provide an answer based on that information. This level of understanding and execution represents a significant advancement in the capabilities of AI-powered robots.
Despite the promising results achieved with Gemini 1.5 Pro, there are still challenges to overcome. The robots currently take between 10 and 30 seconds to process each instruction, which in most cases is slower than a human would be.
Additionally, the complexities and unpredictability of real-world environments pose challenges for the robots' navigation abilities.
Although the Gemini-powered robots are not yet ready for mass commercialization, their potential impact across various industries is promising. Integrating AI models such as Gemini 1.5 Pro into robotics could transform sectors such as healthcare, shipping, and janitorial services.