Grok 4 is crushing it — Elon Musk’s AI just topped the leaderboard that matters most

When it comes to chatbots, it's easy to forget about Grok because other big tech companies always seem to be in the news. With Google's Nano Banana starting new trends and OpenAI hyping its latest ChatGPT models, Elon Musk's chatbot simply exists in the background.

I've definitely found myself rolling my eyes at some of Grok's decisions, especially when it comes to image generation. However, it's clear that there are some reasons to sit in awe of what Elon Musk calls “the smartest AI in the world.”

Having spent hours testing it, I can say it's not just hype. From near-instant web searches to jaw-dropping results on complex engineering queries, Grok 4 is delivering in ways its predecessors and rivals haven’t quite managed. Whether you love the direction or cringe at the controversies, Grok 4 may always be the underdog that quietly crushes it.

What makes xAI's Grok different

I now think @xAI has a chance of reaching AGI with @Grok 5. Never thought that before. https://t.co/FaBUYegl3D (September 17, 2025)

Elon Musk posted on X highlighting that Grok 4 is at the top of the ARC-AGI leaderboard. To understand why that's impressive, it's important to become familiar with how models are tracked on it.

Essentially, the ARC-AGI leaderboard is a scoreboard for AI that tracks not only how many problems a model can solve, but also how efficiently it solves them. In other words, it's measuring both the brain and the resourcefulness of the model. High performance with low cost per task is what matters most.
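To picture how a score-plus-cost comparison can work, here is a minimal Python sketch. It is an illustration only, not the leaderboard's actual method: the model names and numbers are made up, and the "nobody beats you on both score and cost" rule is an assumption chosen to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float          # fraction of tasks solved, 0.0 to 1.0
    cost_per_task: float  # dollars spent per task


def pareto_front(entries):
    """Return entries that no other entry beats on both score and cost."""
    front = []
    for e in entries:
        dominated = any(
            o.score >= e.score
            and o.cost_per_task <= e.cost_per_task
            and (o.score > e.score or o.cost_per_task < e.cost_per_task)
            for o in entries
        )
        if not dominated:
            front.append(e)
    # Cheapest first, so the trade-off reads left to right.
    return sorted(front, key=lambda e: e.cost_per_task)


# Hypothetical entries for illustration only, not real leaderboard results.
entries = [
    Entry("model_a", 0.16, 2.00),
    Entry("model_b", 0.10, 0.50),
    Entry("model_c", 0.09, 1.50),
]
front_names = [e.name for e in pareto_front(entries)]
print(front_names)
```

In this made-up example, model_c drops out because model_b solves more tasks for less money, while model_a and model_b both survive: one scores higher, the other is cheaper.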

So, Grok's position at the very top is extremely significant because it means the xAI model is not only keeping up with rivals like Gemini and ChatGPT, but outpacing them on some of the toughest benchmark criteria possible.

Beating every other chatbot suggests that Grok 4 is powerful and efficient, which is exactly the type of breakthrough that supports true progress in the evolution of artificial general intelligence (AGI).

Where Grok still stumbles

Whether you use Grok on X or on the standalone platform, its real-time search pulls in fresh information from both the web and X, so it can keep up with breaking news at a moment's notice.

However, the accuracy and bias concerns are what critics keep coming back to. Grok has made some claims that turned out false, and there are questions about how its alignment is being guided (e.g. how much Musk’s own views factor in).

The model has also struggled with content moderation: xAI scrambled to pull posts and update filters when antisemitic content popped up.

The takeaway

Despite the model beating its rivals, questions remain. Will it stay reliable as usage increases? Will “garbage data” or bias creep back in under pressure? How well will xAI handle moderation long-term? The past controversies suggest it’s an ongoing battle.

There is no doubt that Grok is not perfect. It carries some extremely controversial baggage, but what it does better in terms of speed, real-time data and flexible thinking makes it a serious contender in the AI race.
