
With rapid advances in generative AI, the world is moving beyond simple text-based models. Top AI labs like OpenAI are pushing into the agentic AI era with tools like Operator and Deep Research.
The former is designed to control a computer and perform tasks autonomously, while the latter helps users conduct multi-step research on the internet.
As you may know, Microsoft and OpenAI recently renewed their "vows," making significant changes to their multi-billion-dollar partnership. Among the adjustments is a provision that prevents OpenAI from prematurely declaring artificial general intelligence (AGI); the milestone can only be confirmed by an independent expert panel. Microsoft is also free to pursue AGI independently or in collaboration with third parties.
Microsoft has been making waves in the AI space ever since it signed the new definitive agreement with OpenAI, including forming a special team dubbed MAI Superintelligence to bolster its efforts. More recently, the company introduced Fara-7B, a Computer Use Agent that can carry out complex tasks on behalf of users directly on their devices (via VentureBeat).
This 7-billion-parameter model is small enough to run on relatively modest hardware with lower latency. What's more, because it doesn't depend on cloud-based models, it is more reliable, private, and secure.
However, it's worth noting that the model is still in the experimental phase and has not yet reached broad availability. Fara-7B's size lets users run the model locally on their devices, making it easier to automate sensitive workflows, such as those involving confidential company data.
The model essentially mimics how humans interact with and navigate user interfaces. It interprets the web through screenshots and predicts specific pixel coordinates for actions like clicking, typing, and scrolling, much as a person would with a mouse and keyboard.
Rather than relying on the browser's underlying code (the kind of markup that describes pages to screen readers), it works directly from pixel-level visual data. As such, the model can interact with websites even when that code is difficult to parse.
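To make the screenshot-driven approach concrete, here is a minimal sketch of what such a perception-action loop could look like. This is not Microsoft's actual Fara-7B interface; the predict_next_action function and the action schema are illustrative placeholders, and pyautogui simply stands in for whatever executes the clicks and keystrokes.

```python
# Illustrative sketch of a screenshot-driven agent loop, NOT Fara-7B's real API.
# predict_next_action() is a hypothetical stand-in for the model: it takes a
# screenshot plus the task and returns an action with pixel coordinates.
import pyautogui  # pip install pyautogui

def predict_next_action(screenshot, task: str) -> dict:
    """Placeholder for the model call. A real implementation would return
    something like {"type": "click", "x": 412, "y": 230} or
    {"type": "type", "text": "hello"}."""
    return {"type": "done"}  # stub so the sketch runs end to end

def run_agent(task: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()            # the model only sees pixels
        action = predict_next_action(screenshot, task)
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])  # act at predicted coordinates
        elif action["type"] == "type":
            pyautogui.write(action["text"])            # simulate keyboard input
        elif action["type"] == "scroll":
            pyautogui.scroll(action["amount"])
        elif action["type"] == "done":
            break                                      # model reports task complete

run_agent("Find the cheapest flight to Seattle")
```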
Speaking to VentureBeat, Microsoft Research Senior PM Yash Lara explained:
Processing all visual input on-device creates true "pixel sovereignty," since screenshots and the reasoning needed for automation remain on the user’s device. This approach helps organizations meet strict requirements in regulated sectors, including HIPAA and GLBA.
Yash Lara, Microsoft
Notably, Fara-7B outperformed sophisticated proprietary models such as OpenAI’s GPT-4o on a standard benchmark for web agents, scoring 73.5% compared to GPT-4o’s 65.1%.
To mitigate the risks of letting agentic AI models handle complex and sensitive tasks, Microsoft says Fara-7B was trained to identify Critical Points. At these moments, the model pauses and asks for human approval before proceeding, especially when actions involve personal data or require explicit consent.
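As a rough illustration of how such a Critical Point gate might work, the snippet below pauses for explicit user approval before executing any action the model has flagged as sensitive. The field names and helper functions here are assumptions made for the example, not Fara-7B's real interface.

```python
# Illustrative "Critical Point" gate, assuming the model can flag actions that
# need consent (e.g. submitting personal data). Field names and helpers are
# hypothetical, not Fara-7B's actual interface.
def perform(action: dict) -> None:
    """Placeholder for actually carrying out an approved action."""
    print(f"Executing: {action['description']}")

def execute_with_oversight(action: dict) -> None:
    if action.get("critical", False):  # model flagged this step as a Critical Point
        prompt = f"Agent wants to: {action['description']}. Approve? [y/N] "
        if input(prompt).strip().lower() != "y":
            print("Action declined; agent paused.")
            return
    perform(action)

execute_with_oversight({"critical": True, "description": "submit payment details"})
```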
