The Guardian - AU
Nick Evershed and Josh Nicholas

Social media ban trial data reveals racial bias in age checking software: just how inaccurate is it?

People from Indigenous or south-east Asian backgrounds have to wait longer for a result when using age-estimation software, data from the age assurance technology trial shows. Photograph: Daniel de la Hoz/Getty Images

A Guardian analysis of age assurance technology trial data, which will underpin the government’s teen social media ban, shows the impact of introducing age checks will fall hardest on already-marginalised groups.

Data from the trial, published alongside the report, shows that the age estimation software tested is less accurate for people with an Indigenous or south-east Asian background. This means young people from these backgrounds are more likely to be wrongly categorised as over the age limit, and older people are more likely to be wrongly categorised as underage.

People from these backgrounds also had to wait significantly longer for a result when using age-estimation software, which relies on face scanning to estimate age.

The data shows that age-verification software, which relies on scanning documents such as a driver’s licence or passport, was also unreliable for Indigenous people, though the number of people tested was below that required for statistical significance.

Bias downplayed

The report summary downplays differences in accuracy, saying “systems performed broadly consistently across demographic groups, including Indigenous populations”, and “… despite an acknowledged deficit in training age analysis systems with data about Indigenous populations, we found no substantial difference in the outcomes for First Nations and Torres Strait Islander Peoples and other multi-cultural communities using the age assurance systems”.

Later sections of the report are, however, more equivocal about differences in accuracy across demographic groups, saying there were “some known challenges in accuracy for underrepresented skin tones or facial features”.

The $6.5m age assurance technology trial, run by the UK-based Age Check Certification Scheme (ACCS), tested various types of technology that could be used by social media platforms and adult websites to keep out under-16s or under-18s, respectively, when Australia’s under-16s social media ban comes into force in December.

The three main technologies tested were age inference, age estimation and age verification. Age estimation and verification were tested by ACCS in real-world scenarios with school students around Australia, and through “mystery shopper” testing. The ACCS also conducted automated testing using an image dataset.

The ACCS published some of the data behind the report, and the report itself shows results that combine both the public and non-public data. The Guardian’s analysis is limited to the public data.

The figures show the accuracy rate for Indigenous people was seven percentage points lower than for people categorised as having an “Oceania and Antarctica” background, which includes people who simply answered “Australian” in the trial.

The accuracy rate for people who had a south-east Asian background was five percentage points lower.

The report from the ACCS only looked at performance of age estimation software by skin tone using automated testing, and did not publish accuracy rates by demographic background, despite including the raw information in the public files.

The trial’s analysis of skin tone showed that age estimation systems had a higher error rate on average for people with darker skin. However, because no group had 20% more errors than the average, the report concluded: “This indicates no significant evidence of adverse impact or systemic disparity in error rates across skin tone groups enabling us to state that they are broadly consistent across demographic groups.”

Recent studies, including one by the US government and one from one of the leading age estimation companies, have shown higher age estimation error rates for women and people with darker skin tones.

The age assurance technology trial did not publish a detailed analysis of age estimation accuracy by gender. There are, however, two references in the report, with one section saying “fairness testing showed acceptable parity across age and gender”, and another saying “demographic performance disparities were still observed … [including] variability in model output by gender presentation”.

The ACCS said such an analysis was not included in the report based on guidance from the independent ethics committee, saying: “The committee advised that publishing detailed gender-based analyses could risk misinterpretation and would not directly serve the trial’s primary evaluation objectives.”

Questions raised about methods

When examining the report, the Guardian raised numerous questions about the trial methodology with the ACCS, the organisation that conducted the trial.

The age verification tests were conducted on a smaller scale, with the public data showing results for just 328 people. The report does not look at the accuracy of age verification software by skin tone or demographic background because the sample sizes would be far too small to be statistically valid.

However, the data shows that six of the 328 people had an Indigenous background, and the age verification software failed for three of these people. While this number is too small for meaningful statistical analysis, it is still a 50% error rate for Indigenous people.
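A quick calculation (ours, not the report’s) shows just how uncertain a three-out-of-six result is. A standard Wilson score interval around that 50% failure rate spans from roughly one in five to four in five:

```python
# Why 3 failures out of 6 people is too small a sample to draw conclusions:
# the 95% Wilson confidence interval around that 50% error rate is very wide.
# This is an illustrative calculation, not part of the trial's analysis.
from math import sqrt

def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion failures/n."""
    p = failures / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin

low, high = wilson_interval(3, 6)
print(f"95% CI: {low:.0%} to {high:.0%}")  # prints "95% CI: 19% to 81%"
```

In other words, the true failure rate for Indigenous users could plausibly be anywhere between about 19% and 81% on this sample, which is why the report declined to analyse it, but the observed rate is still striking.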

The age assurance trial did not test age inference systems in the same way, but noted that age inference would be difficult in some situations as “some groups (e.g. new migrants, remote First Nations communities) may not have consistent participation in certain systems”.

The report also identified that “[digital] gaps persist in remote and very remote communities where digital exclusion and lack of foundational credentials continue to limit access”.

When the Guardian asked the ACCS why the report summary said “systems performed broadly consistently across demographic groups” when the public data for the age estimation tests shows the opposite, the ACCS said the basis of these claims is “an aggregation of multiple strands of evidence (school trial, mystery shoppers, lab observations, and vendor documentation)”.

The Guardian raised questions about the overall accuracy of the age verification software tested. The age verification report says the accuracy of the software when tested by the 328 “mystery shopper” trial was 97% overall.

However, the Guardian’s analysis of the public data puts the overall accuracy at 92%.

The ACCS initially said in an email to the Guardian that the 97% figure in the report was based on the same 328-person sample as in the public data. When questioned again on the accuracy of the results, the ACCS said in a subsequent email that the report instead included unpublished “repeated lab tests in addition to the mystery shopper trials”.

When asked why some results have not been included in the public dataset, the ACCS said this was due to “lab tests” which were “recorded in KJR’s internal assurance system and remain outside the scope of the anonymised dataset release”. KJR is a consultancy contracted to conduct some of the age assurance testing.

In the report’s analysis of how accurate age estimation providers were for distinguishing people above or below the age of 16, the report notes that two underperforming providers were excluded from the results “due to consistent poor performance at this age gate unfairly skewing the results”. ACCS did not say why these results were removed.

Most adults will be fine

Despite the issues raised with the report, the results show that most older people are unlikely to be inconvenienced by age assurance technology.

Toby Walsh is a professor of AI at the University of New South Wales who provided independent advice on the report methodology, but was not otherwise involved in the trial or the presentation of the results.

He says that adults will be the least affected by the new requirements, because the report recommends a “cascade” of technological solutions for determining someone’s age, from age inference, which is the least invasive technique, through to age verification, which requires uploading official documents.

“One of the concerns people have is that all of us are now going to be inconvenienced by having to assure our age. But the results of the trial are that if you’re well away from the age threshold, you won’t be inconvenienced.”

Walsh says this is because many adults will be able to rely on age inference for existing social media services, where, for example, a company like Meta already has a good idea of their age. And if that doesn’t work then age estimation via face scanning will work for the vast majority of adults, as these are quite accurate for older people.

Kids unlikely to be able to rely on face-scanning

The report shows there is a lot of uncertainty in the age estimation tools. On average the difference between a person’s actual and predicted ages can be a couple of years. Age estimation is particularly hard for teenagers, experts say, as they can be in the “throes of puberty” and are “changing on a day-to-day basis”.

“[The tools] might be good enough to distinguish between a 16-year-old and a 30-year-old, but they certainly are not good enough to distinguish between a 16-year-old and a 15, 14, or indeed 17-year-old,” says Prof Tama Leaver from Curtin University.

The trial data shows false positive rates (where a person has falsely been predicted to exceed the age requirement) of between 25% and 73% for kids under 16. It is only when someone’s actual age is 18 or higher that the error rate generally falls below 5%.

Due to the difficulty estimating exact ages, the report notes that a “buffer” will probably need to be applied – requiring further verification for anyone predicted to be two to three years younger or older than 16.

This means that more than 1.3 million Australians aged 16 to 19 who are eligible for social media will likely be required to provide some other information that can prove their age if they want to create an account.
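The buffer described above amounts to a simple triage rule. The sketch below is an assumed reading of the report’s recommendation, not an actual implementation, with a two-year buffer (the report suggests two to three years):

```python
# Minimal sketch (assumed logic, not the trial's implementation) of the
# "buffer" around the age threshold: anyone whose face-scan estimate falls
# within a couple of years of 16 is sent on for further verification.

AGE_LIMIT = 16
BUFFER_YEARS = 2  # the report suggests two to three years

def triage(estimated_age: float) -> str:
    """Decide an outcome from a face-scan age estimate alone."""
    if estimated_age < AGE_LIMIT - BUFFER_YEARS:
        return "blocked"   # clearly under the limit
    if estimated_age > AGE_LIMIT + BUFFER_YEARS:
        return "allowed"   # clearly over the limit
    return "verify"        # too close to call: ask for documents or other proof

for age in (12, 15, 17, 19):
    print(age, triage(age))
```

Because everyone estimated between 14 and 18 lands in the “verify” band, it is the 16-to-19-year-olds who are legally allowed on social media who bear most of the extra verification burden.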

Guardian Australia contacted the minister for communications, Anika Wells, with the results of the analysis. Wells referred Guardian Australia’s questions to the eSafety Commissioner. A spokesperson for the commission said: “The Department’s independent trial run by the Age Check Certification Scheme conducted some important testing producing independent evaluation results for a range of technologies.

“Improvement of all age assurance tools, including classifiers and facial age estimation require consistent training and retraining to ensure improvement and accuracy.

“This is important when it comes to homing in on the 13-15 age range and when it comes to better identifying the broad range of ethnicities reflected in Australia.”

• Thanks to Prof Falk Scholer at RMIT for providing feedback on our analysis. Any errors remain the fault of the authors.

• Nick Evershed is the data and interactives editor and Josh Nicholas is a data journalist for Guardian Australia
