Fortune
Jonathan Vanian

The quest to measure A.I.

The quest to measure progress in A.I. (Credit: Getty)

Artificial intelligence is so new that researchers are only now figuring out how companies can best evaluate it, along with the technologies, like computer chips, that A.I. relies on.

Semiconductor vendors, for instance, may claim that their chips outperform rivals’ at model training, the crucial process that “teaches” machine-learning systems to recognize objects in photos. But without independent auditors, it’s difficult for companies to assess whether those claims are true.

Several efforts are underway, however, to help companies and researchers evaluate A.I.’s performance across different tasks, like model training. One such endeavor is MLPerf, a set of software tools and computing methods that monitor A.I.’s progress through benchmark tests.

The most recent MLPerf benchmarking round analyzed how A.I. chips from companies like Nvidia and Google performed at tasks including training a machine-learning model to recognize objects in photos. The results are highly technical, but they should help buyers decide which A.I. chips are best suited to model training.
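At its core, an MLPerf training benchmark measures how quickly a system reaches a fixed quality target. The toy sketch below illustrates that “time to target accuracy” idea with a tiny perceptron; the model, data, and accuracy target are illustrative placeholders, not the actual MLPerf workloads or rules.

```python
import random
import time

# A minimal, self-contained sketch of the "time to target quality" idea behind
# MLPerf's training benchmarks: train until the model hits a fixed accuracy
# target, then report how long that took. The perceptron, toy data, and
# target below are illustrative placeholders, not actual MLPerf workloads.

random.seed(0)

# Toy linearly separable task: label is 1 when x + y > 1, else 0.
data = []
for _ in range(2000):
    x, y = random.random(), random.random()
    data.append(((x, y), 1 if x + y > 1 else 0))
train, val = data[:1500], data[1500:]

w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0

def accuracy(split):
    return sum(predict(p) == label for p, label in split) / len(split)

TARGET = 0.98  # stand-in for a benchmark's quality bar
start = time.perf_counter()
epochs = 0
while accuracy(val) < TARGET and epochs < 100:  # guard against non-convergence
    epochs += 1
    for p, label in train:
        err = label - predict(p)  # classic perceptron update
        w[0] += lr * err * p[0]
        w[1] += lr * err * p[1]
        b += lr * err
elapsed = time.perf_counter() - start
print(f"accuracy {accuracy(val):.3f} after {epochs} epochs in {elapsed:.3f}s")
```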

Providing more transparency into how different A.I. chips perform at specific tasks is part of the overall goal of the non-profit consortium MLCommons, which oversees MLPerf. The organization was founded in 2018 by corporate and academic players including Google, Intel, AMD, and Harvard University.

David Kanter, executive director of MLCommons, told Fortune that “benchmarks and metrics are about really defining what ‘better’ means.” Because A.I. is so new, there’s no agreed-upon standard for measuring the technology, as there is for more established ones. Kanter hopes that MLCommons can act as a sort of Switzerland for the A.I. industry.

Because A.I. depends on data to function correctly, MLCommons is also expanding its mission to create datasets for testing A.I. software. One project the group is working on involves curating over 87,000 hours of transcribed speech across several languages, which Kanter hopes will help researchers build systems that understand more languages than just English, the dominant language in A.I. research.

“Let’s take a pretty popular language like Portuguese,” Kanter said. “There’s like 300 million people who speak that language, and there’s not much out there.”
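To make the coverage gap concrete, a manifest for such a multilingual speech corpus might pair language tags with hours of transcribed audio, tallied per language. The records and figures below are made up for illustration; MLCommons has not published this format.

```python
from collections import defaultdict

# Hypothetical sketch of a multilingual speech-corpus manifest: each record
# pairs a language tag with hours of transcribed audio. All entries and
# numbers here are invented for illustration only.

manifest = [
    {"language": "en", "hours": 50_000.0},
    {"language": "pt", "hours": 3_500.0},
    {"language": "es", "hours": 8_200.0},
    {"language": "pt", "hours": 1_100.0},
]

hours_by_language = defaultdict(float)
for record in manifest:
    hours_by_language[record["language"]] += record["hours"]

# Print languages from best- to worst-covered.
for lang, hours in sorted(hours_by_language.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {hours:,.0f} hours")
```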

Kanter hopes that MLCommons’s upcoming language dataset, slated for public release later this year, becomes as popular as the ImageNet dataset, which contains some 14 million photos annotated by humans with descriptions. ImageNet, overseen by A.I. luminaries like Stanford University’s Fei-Fei Li, helped spur the modern deep-learning renaissance, in which researchers built A.I. systems that can spot dogs in photos, among other tasks.

If others use the MLCommons-curated language dataset to power their own A.I. technology, MLCommons will be better able to evaluate how those systems perform, explained MLCommons president Peter Mattson. One problem in evaluating current language-recognition systems is that it’s often unclear what data they were trained on.

It’s important that the “data you use to build the system, and the data used to evaluate the system are drawn from the same source,” Mattson said. 
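In code terms, Mattson’s rule amounts to carving both splits from a single shuffled pool so they share the same distribution. A minimal sketch, assuming a hypothetical list of corpus records:

```python
import random

# Sketch of Mattson's point: draw the training and evaluation sets from one
# source so they share the same distribution. `records` stands in for any
# hypothetical corpus, e.g. (speech clip, transcript) pairs.

def same_source_split(records, eval_fraction=0.1, seed=42):
    rng = random.Random(seed)
    shuffled = list(records)  # leave the caller's list untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - eval_fraction))
    return shuffled[:cut], shuffled[cut:]  # (train split, eval split)

records = [(f"clip_{i}.wav", f"transcript {i}") for i in range(100)]
train_set, eval_set = same_source_split(records)
print(len(train_set), len(eval_set))  # 90 10
```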


Jonathan Vanian 
@JonathanVanian
jonathan.vanian@fortune.com
