Geekflare
Keval Vachharajani

Google Launches Stax to Take the Guesswork Out of AI Testing

Google has launched a new experimental tool called Stax, designed to make testing large language model (LLM)-powered applications more reliable. Built by Google DeepMind and Google Labs, the tool aims to help developers move past the trial-and-error “vibe testing” approach that has long been a pain point in prompt engineering.

Unlike traditional software, AI models are non-deterministic; they don’t always return the same answer for the same input. That makes evaluation tricky, often requiring developers to manually compare outputs or build their own testing pipelines. Stax is designed to solve that problem by offering both human evaluation and LLM-as-a-judge methods, with support for custom testing criteria.

How Stax Works

Stax lets developers either bring their own datasets by uploading test cases as a CSV file or create new ones directly within the tool. For those who don’t want to start from scratch, Google has included pre-built “autoraters” that can check outputs for common criteria like coherence, factual accuracy, and conciseness.
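To make the CSV-plus-autorater idea concrete, here is a minimal sketch in Python. It is not Stax's API: the article does not describe the CSV layout or how the built-in autoraters score outputs, so the column names (`prompt`, `model_output`), the helper functions, and the word-budget heuristic are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical CSV layout; the article does not specify Stax's real columns.
SAMPLE_CSV = """prompt,model_output
Summarize the meeting,The team agreed to ship on Friday.
Explain recursion,Recursion is when a function calls itself and it keeps calling itself again and again until some base case is finally reached and then it returns.
"""


def load_test_cases(csv_text):
    """Parse CSV text into a list of dicts, one per test case."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def conciseness_score(output, max_words=15):
    """Toy stand-in for a conciseness autorater (an assumption, not Stax's logic).

    Returns 1.0 when the output fits the word budget, otherwise the
    ratio of the budget to the actual length.
    """
    word_count = len(output.split())
    return 1.0 if word_count <= max_words else max_words / word_count


cases = load_test_cases(SAMPLE_CSV)
scores = [conciseness_score(case["model_output"]) for case in cases]

for case, score in zip(cases, scores):
    print(f"{case['prompt']}: conciseness={score:.2f}")
```

A real autorater for coherence or factual accuracy would likely rely on a judge model rather than a word count; the fixed heuristic here only shows how a dataset of test cases and a scoring function fit together.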

However, the standout feature is the ability to build custom autoraters. These let developers define their own rules, such as ensuring a chatbot remains “helpful but not overly chatty,” making sure a summarizer never includes personal information, or even enforcing a team’s specific code style.
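One way to picture a custom rule set like the one described above is as a list of named checks applied to each output. The sketch below is a rule-based approximation in Python, not how Stax implements custom autoraters (the article does not say). The `Rule` class, the email regex as a proxy for "personal information," and the 40-word limit as a proxy for "not overly chatty" are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named pass/fail check on a model output (hypothetical structure)."""
    name: str
    check: Callable[[str], bool]  # returns True when the output passes


# Rough email pattern used as a stand-in for "personal information".
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

RULES = [
    Rule("no_email_addresses", lambda text: EMAIL_RE.search(text) is None),
    # Assumed threshold: treat more than 40 words as "overly chatty".
    Rule("not_overly_chatty", lambda text: len(text.split()) <= 40),
]


def rate(output: str) -> dict:
    """Run every rule against one output and report pass/fail per rule."""
    return {rule.name: rule.check(output) for rule in RULES}


report = rate("Contact jane.doe@example.com for the full notes.")
print(report)
```

Criteria such as "helpful" are hard to capture with fixed rules, which is presumably why the tool pairs custom criteria with LLM-as-a-judge evaluation; the checks here cover only the mechanically testable part.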

As for availability, Stax is not currently accessible in India, something several X users have also pointed out, and Google has not said when it will reach everyone. Given Google’s track record with gradual rollouts, though, Stax will likely expand to more regions in the coming weeks.

Google’s Other AI Updates

Stax is not Google’s only recent AI update. A couple of days ago, the company also announced improvements to its image editing model, focusing on more consistent likeness preservation. The update helps Gemini maintain key details across edits, whether that’s putting a friend in a Halloween costume or trying a new hairstyle, addressing one of the most common complaints about AI image generation.

