Nvidia’s ISP piracy defense backfires as judge refuses to dismiss copyright lawsuit over more than 197,000 pirated books — scripts in NeMo Framework allegedly ‘have no other purpose’ than to speed up infringement

U.S. District Judge Jon Tigar has denied Nvidia's request to dismiss a copyright infringement case filed against it, rejecting the company's argument that it’s not liable for how clients use its AI-powered NeMo Megatron Framework. According to TorrentFreak, Nvidia asked the court to dismiss the direct copyright infringement claims that are connected to its use of the Bibliotik eBook torrent tracker, the Books3 dataset, and 'The Pile' dataset for language modeling. Nvidia cited the Cox v. Sony ruling, in which the U.S. Supreme Court held that a service provider is not liable for piracy that its users might carry out.

Nvidia said that its NeMo Megatron Framework has significant “non-infringing uses” and that it did not promote it as a piracy tool. In the company's view, that places the framework under Justice Clarence Thomas’ opinion in Cox, which states, “Under our precedents, a company is not liable as a copyright infringer for merely providing a service to the general public with knowledge that it will be used by some to infringe copyrights.” Unfortunately for the company, Judge Tigar disagreed, saying that the claims target not the framework as a whole but specific scripts within it.

He said that these scripts were intended to make it easier for users to automatically download and preprocess The Pile dataset, which the complainants allege contains copyrighted work. “The scripts are alleged to have no other purpose than to speed up the process of infringement, unlike the digital video recorder systems at issue in Sony Corp. or the internet service provided in Cox,” Judge Tigar wrote. Bibliotik is a private eBook torrent tracker whose library allegedly contains over 197,000 books. That collection was included in the Books3 dataset, which in turn was included in The Pile, a dataset of more than 800 gigabytes. The Pile was then used to train Nvidia’s LLMs, leading several authors to file a class action lawsuit against the company for copyright infringement.

There have been previous cases of copyright infringement related to AI companies scraping data for training their models. Aside from this case against Nvidia, Meta has also been facing a similar lawsuit since last year. Meta even defended itself by arguing that using pirated material is legal if you don’t seed content. Google, for its part, has been pushing to have AI scraping tagged as fair use, saying that it wants “copyright systems that enable appropriate and fair use of copyrighted content to enable the training of AI models in Australia on a broad and diverse range of data while supporting workable opt-outs for entities that prefer their data not to be trained in using AI systems.”

With this decision, the authors’ class action against Nvidia is set to move forward, and we will likely hear more details as the case progresses. No date has been set yet for the next hearing. Still, we expect this to be a multi-year fight as the AI giant battles it out with the writers whose work was allegedly infringed.
