PC Gamer
Andy Edser

Intel's fancy new AI tool measures image quality in games in real time, so upscaling artifacts and visual nasties have nowhere to hide

A still from a short clip showing the difference in image quality using Intel XeSS 1.3, with a samurai walking past a bamboo wall.

Image quality perception, I've often found, varies massively from person to person. Some can't tell the difference between a game running with DLSS set to Performance and one running at Native, while others can easily ignore the blurriness of a poor TAA implementation while their peers are busy climbing the walls. Intel's new tool, however, attempts to drill down on image quality and provide a quantifiable end result to give game developers a helping hand.

The Computer Graphics Video Quality Metric (CGVQM) tool aims to detect and rate distortions introduced by modern rendering techniques like neural supersampling, path tracing, and variable rate shading, and to condense the findings into a useful evaluation result.

The Intel team took 80 short video sequences depicting a range of visual artifacts introduced by supersampling methods like DLSS, FSR, and XeSS, and various other modern rendering techniques. They then conducted a subjective study with 20 participants, each rating the perceived quality of the videos compared to a reference version.
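For a sense of what a study like that produces, here's a minimal sketch, using made-up numbers rather than Intel's data, of how per-clip ratings from a panel of viewers are typically aggregated into mean opinion scores, the targets a metric is later calibrated against.

```python
# Minimal sketch of aggregating subjective ratings: each of 20 participants
# scores each distorted clip against its reference, and the per-clip mean
# opinion score (MOS) becomes the calibration target. Numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
num_clips, num_participants = 80, 20

# ratings[i, j]: participant j's quality score for clip i (e.g. on a 0-100 scale)
ratings = rng.uniform(20, 100, size=(num_clips, num_participants))

# Mean opinion score and its spread for each clip
mos = ratings.mean(axis=1)
mos_std = ratings.std(axis=1, ddof=1)

print(f"clip 0: MOS = {mos[0]:.1f} ± {mos_std[0]:.1f}")
```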

Distortions shown in the videos include flickering, ghosting, moiré patterns, fireflies, and blurry scenes. Oh, and straight up hallucinations, in which a neural model reconstructs visual data in entirely the wrong way.

I'm sure you were waiting for this part: a 3D CNN model (i.e. the sort of AI model used in many traditional AI image-enhancement techniques) was then calibrated on the participants' ratings to predict image quality by comparing the reference and distorted videos. The tool then uses the model to detect and rate visual errors, providing a global quality score along with per-pixel error maps that highlight artifacts, and it even attempts to identify how they may have occurred.
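For the curious, here's a toy sketch of the general idea behind a full-reference 3D-CNN quality metric: extract spatio-temporal features from the reference and distorted clips, turn the feature difference into a per-pixel error map, and pool that map into a single global score. It's written in PyTorch but is purely illustrative; it is not CGVQM's actual architecture.

```python
import torch
import torch.nn as nn

class ToyFullReferenceVQM(nn.Module):
    """Toy full-reference video quality model: 3D convolutions extract
    spatio-temporal features from reference and distorted clips, the feature
    difference becomes a per-pixel error map, and the map is pooled into a
    single global quality score. Not the real CGVQM architecture."""
    def __init__(self, channels=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Collapse the feature channels into one error value per pixel
        self.to_error = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, reference, distorted):
        # Both inputs: (batch, 3, frames, height, width)
        diff = self.features(distorted) - self.features(reference)
        error_map = self.to_error(diff.abs())             # per-pixel, per-frame error
        global_score = error_map.mean(dim=(1, 2, 3, 4))   # one score per clip
        return global_score, error_map

model = ToyFullReferenceVQM()
ref = torch.rand(1, 3, 8, 64, 64)    # 8-frame reference clip
dist = torch.rand(1, 3, 8, 64, 64)   # same clip with simulated artifacts
score, err_map = model(ref, dist)
print(score.shape, err_map.shape)    # torch.Size([1]) torch.Size([1, 1, 8, 64, 64])
```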

(Image credit: Guerilla)

What you end up with after all those words, according to Intel, is a tool that outperforms all the other current metrics when it comes to predicting how humans will judge visual distortions. Not only does it predict how distracting a human player will find an error, but it also provides easily interpretable maps to show exactly where it's occurring in a scene. Intel hopes it will be used to optimise quality and performance trade-offs when implementing upscalers, and provide smarter reference generation for training denoising algorithms.
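"Outperforms other metrics at predicting human judgement" generally means a stronger correlation with the subjective scores. Here's a brief sketch, on synthetic data rather than Intel's results, of how that agreement is typically measured in quality-metric research, using Pearson and Spearman correlation against the mean opinion scores.

```python
# Sketch of benchmarking a quality metric against human judgement: correlate
# the metric's per-clip predictions with the panel's mean opinion scores.
# All data here is synthetic and purely illustrative.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
mos = rng.uniform(20, 100, size=80)              # human scores, one per clip
metric_pred = mos + rng.normal(0, 8, size=80)    # a well-aligned metric
baseline_pred = rng.uniform(20, 100, size=80)    # an uninformative baseline

for name, pred in [("candidate", metric_pred), ("baseline", baseline_pred)]:
    plcc, _ = pearsonr(pred, mos)     # linear agreement
    srocc, _ = spearmanr(pred, mos)   # rank-order agreement
    print(f"{name}: PLCC = {plcc:.2f}, SROCC = {srocc:.2f}")
```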

"Whether you’re training neural renderers, evaluating engine updates, or testing new upscaling techniques, having a perceptual metric that aligns with human judgment is a huge advantage", says Intel.

"While [CGVQM's] current reliance on reference videos limits some applications, ongoing work aims to expand CGVQM’s reach by incorporating saliency, motion coherence, and semantic awareness, making it even more robust for real-world scenarios."

Cool. You don't have to look far on the interwebs to find people complaining about visual artifacts introduced by some of these modern image-quality-improving and frame rate-enhancing techniques (this particular sub-Reddit springs to mind). So, anything that allows devs to get a better bead on how distracting they might be seems like progress to me. The tool is now available on GitHub as a PyTorch implementation, so have at it, devs.
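And as a final illustration of why those per-pixel error maps are handy for devs, here's a small self-contained sketch, using synthetic data rather than the tool's real output, of overlaying an error map on a frame so the worst-offending regions jump out at a glance.

```python
# Sketch of what a developer might do with a per-pixel error map once a metric
# has produced one: overlay it on the frame as a heatmap so artifact hotspots
# are easy to spot. The frame and map here are random stand-ins.
import numpy as np
import matplotlib.pyplot as plt

frame = np.random.rand(64, 64, 3)         # stand-in for a decoded game frame
error_map = np.random.rand(64, 64) ** 3   # stand-in for the metric's error map

fig, ax = plt.subplots()
ax.imshow(frame)
overlay = ax.imshow(error_map, cmap="inferno", alpha=0.6)  # highlight hotspots
fig.colorbar(overlay, ax=ax, label="predicted error")
ax.set_title("Per-pixel error map overlaid on frame (synthetic data)")
fig.savefig("error_overlay.png")
```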
