MusicRadar
Will Simpson

“The largest intellectual property theft in human history”: Big tech companies accused of scraping millions of copyrighted songs to train AI models

It really is the great rock ’n’ roll swindle. Billboard has published evidence, collated by the international music publishing trade association the ICMP, that some of the world’s biggest tech companies, including Google, Microsoft, Meta, Twitter/X and OpenAI, have illegally scraped copyright-protected music from millions of artists and songwriters to train their AI models.

Artists affected include The Beatles, Mariah Carey, The Weeknd, Beyoncé and Bob Dylan – and those are only the megastars. The ICMP has compiled a dossier from open-source repositories of training content, leaked material, research papers and independent research. It says this shows “comprehensive and clear” evidence of the unlicensed use of digital music on a “global and highly extensive scale.”

The Billboard article (paywall) details plenty of evidence which should – if they were aware of the concept – make the tech titans feel thoroughly ashamed. Llama 3, the open-source large language model from Mark Zuckerberg’s Meta, has, apparently, been trained on copyright-protected music and lyrics by Lorde, Ed Sheeran, Alicia Keys and many, many others.

OpenAI’s Jukebox app, asserts the ICMP, has been trained on a wide range of copyright-protected music by artists including Elton John, Drake and Madonna. Other systems named in the dossier include Microsoft’s Copilot AI app and Google’s Gemini AI system.

ICMP Director General John Phelan has described the accumulated evidence as: “The largest IP theft in human history. That’s not hyperbole. We are seeing tens of millions of works being infringed daily.

“Within any one model training data set, you’re often talking about tens of millions of musical works often gained from individual YouTube, Spotify and GitHub URLs, which are being collated in direct breach of the rights of music publishers and their songwriter partners.”

“Despite their public claims that they’re not training upon copyright-protected works, we’ve caught many of them (tech companies) red-handed,” Phelan continued.

“We have extensive evidence of serious copyright infringement. Many of these companies are scraping the lyric datasets from the internet of millions of works and putting them into their models.

“Aside from amounting to breaches of copyright laws and often contract laws, this is often done despite the music sector’s consistent and clear statements that licences are both required and available for legal AI training and GenAI.”

Billboard contacted the tech companies mentioned in the dossier, but none commented on the allegations.

In the US, tech companies have argued that training AI systems on copyrighted material represents ‘fair use’, despite the fact that this defence isn’t a legal standard outside the US.

These arguments remain in the balance in the States. In March, a California federal judge rejected a preliminary bid from the music publishers UMPG, Concord and ABKCO to block the Amazon-backed firm Anthropic from using copyright-protected lyrics to train its systems, saying that “it is an open question” whether training generative AI models with copyrighted material is “infringement or fair use.”

Things are more clear-cut in Europe, where the EU’s AI Act provides rights holders with robust protections against fair use claims. It instructs tech companies to respect existing copyright law, irrespective of where in the world the training data was sourced or if it was acquired by a third-party ‘offshore’ company.

The ICMP evidence clearly shows, as many long suspected, that when it comes to AI, big tech have been doing what they always do: moving fast and breaking stuff, with little or no respect for the law. The question now for policymakers on both sides of the Atlantic is where they stand in this struggle over copyright: are they on the side of the creative industries? Or big tech?
