
Google has launched the Gemini 2.5 Computer Use model, a new system that allows AI agents to interact directly with computer interfaces much as a human would. Built on the company’s Gemini 2.5 Pro model, it combines visual understanding with reasoning to perform on-screen tasks such as clicking buttons, filling forms, and navigating apps or websites.
The new model is available to developers via the Gemini API in Google AI Studio and Vertex AI. Google says it outperforms leading alternatives on multiple web and mobile control benchmarks, all while offering lower latency.
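For developers, access looks like an ordinary Gemini API call. The sketch below uses the google-genai Python SDK; the model identifier shown is an assumption for illustration, and the exact name plus any computer-use-specific tool configuration should be taken from the Google AI Studio or Vertex AI documentation.

```python
# Minimal sketch of calling the model through the google-genai Python SDK.
# The model id below is assumed for illustration; consult the Gemini API docs
# for the exact identifier and the computer-use tool configuration.
from google import genai

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview",  # assumed model id
    contents="Find the pricing page on example.com and summarize the plans.",
)
print(response.text)
```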

Traditionally, AI models connect to apps using structured APIs. But many real-world digital tasks, like logging into an account, scrolling through a webpage, or submitting a form, still require manual interface interaction. Gemini 2.5 Computer Use is said to bridge that gap by giving AI agents the ability to handle these tasks through direct UI control.
At its core, the system works through a loop. The model receives a user request, a screenshot of the current screen, and a history of recent actions. It then decides what to do next, whether that’s clicking, typing, or selecting a dropdown option, and sends those instructions to be executed. After each step, it re-analyzes the updated screen until the task is complete.
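To make that loop concrete, here is a simplified Python sketch of the client-side structure it implies. The helpers capture_screenshot, ask_model_for_next_action, and execute_action are hypothetical placeholders standing in for the screenshot capture, model call, and UI-automation steps a real integration would supply; they are not part of the Gemini API.

```python
from typing import Any


def capture_screenshot() -> bytes:
    """Hypothetical: grab the current browser/screen state as an image."""
    raise NotImplementedError


def execute_action(action: dict[str, Any]) -> None:
    """Hypothetical: perform a click, type, or select via a UI-automation layer."""
    raise NotImplementedError


def ask_model_for_next_action(request: str, screenshot: bytes,
                              history: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical: send the request, screenshot, and action history to the
    model and parse its proposed next action."""
    raise NotImplementedError


def run_agent(user_request: str, max_steps: int = 20) -> None:
    history: list[dict[str, Any]] = []        # recent actions already taken
    for _ in range(max_steps):
        screenshot = capture_screenshot()     # observe the updated screen
        action = ask_model_for_next_action(user_request, screenshot, history)
        if action.get("type") == "done":      # model signals the task is complete
            break
        execute_action(action)                # click, type, select, etc.
        history.append(action)                # carried into the next turn
```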
While the model is currently optimized for web browsers, Google says it also shows “strong promise” for mobile UI control, though it’s not yet tuned for desktop-level tasks.
The company has also trained safety features into the model to reduce unintended or harmful behavior. Developers are encouraged to test thoroughly before deployment and to use the provided safety tools to ensure responsible use.
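As one illustration of what a client-side safeguard might look like (an assumed pattern, not one of Google's provided safety tools), a developer could gate actions they consider high-risk behind human confirmation before executing them:

```python
# Illustrative client-side guard: require human approval for risky action types.
# The RISKY_ACTIONS labels and confirm() helper are assumptions for this sketch;
# execute_action is the hypothetical UI-automation helper from the loop sketch above.

RISKY_ACTIONS = {"purchase", "delete", "send_email"}  # assumed labels


def confirm(action: dict) -> bool:
    """Hypothetical: surface the proposed action to a human reviewer."""
    answer = input(f"Allow action {action}? [y/N] ")
    return answer.strip().lower() == "y"


def guarded_execute(action: dict) -> None:
    if action.get("type") in RISKY_ACTIONS and not confirm(action):
        return              # skip the action unless a human approves it
    execute_action(action)  # otherwise run it as usual
```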