Tom’s Guide
Technology
Alex Hughes

Perplexity accused of scraping websites even when told not to — here's their response

Perplexity on iPhone.

Perplexity is riding high in the AI world right now. But after launching its Comet browser and leading the way in agentic browsing, the company has run into some controversy.

In a blog post, Cloudflare published research showing that Perplexity has been crawling and scraping content from websites that explicitly stated they don’t want to be scraped.

The research accuses Perplexity of obscuring its identity when scraping web pages. Cloudflare says it received complaints from customers who had both disallowed Perplexity from crawling their sites and created rules specifically to block it.

Cloudflare performed its own tests to confirm this, creating brand new domains and then querying Perplexity with questions about these specific domains. Perplexity was able to answer queries on these pages, even though Cloudflare had stated it didn’t want these websites to be analyzed.

How Perplexity is able to get around these rules is complicated. It appears that Perplexity is changing its bot’s “user agent”, the label a program sends to identify itself to a website. In other words, it is pretending to be a normal visitor rather than an AI crawler.
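To see why the user agent matters, here is a minimal Python sketch. It assumes the "do not scrape" signal is a robots.txt file, which the article does not name explicitly. The rules, the `example.com` URL, the browser string and the helper `is_allowed` are all made up for illustration, and `PerplexityBot` is used only as an example crawler name. The point is that opt-out rules are keyed to the name a crawler declares, so they rely on the crawler being honest about who it is.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not taken from
# Cloudflare's test domains or any real site.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL and browser-style user agent string (both assumptions).
PAGE = "https://example.com/article"
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0.0.0"

declared = is_allowed("PerplexityBot", PAGE)  # crawler identifies itself honestly
generic = is_allowed(BROWSER_UA, PAGE)        # same request, browser-like label

print(f"Declared crawler allowed: {declared}")
print(f"Generic browser allowed:  {generic}")
```

With these example rules, the declared crawler is refused and the browser-like label is let through. That is why a site's opt-out only works if crawlers announce themselves truthfully.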

Perplexity and many other AI tools need large amounts of information to work. They gather it by analyzing the internet, including forums, web pages and other online sources.

However, there is a growing backlash to this approach, along with an expectation that AI companies be transparent about how they gather data. Some of Perplexity’s competitors, like Claude and ChatGPT, offer ways to opt out of data gathering, and it is likely we’ll see more rules as time goes on.

(Image credit: Perplexity AI)

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” says Cloudflare’s post.

Jesse Dwyer, a spokesperson for Perplexity, accused Cloudflare’s blog of being a sales pitch for the company in an email to TechCrunch on the subject.

Dwyer went on to say that the screenshots in the blog “show that no content was accessed” and that the bot named in the Cloudflare blog “isn’t even ours”.

Cloudflare is now taking a strong stance on AI crawlers, including Perplexity. The company has claimed that AI is breaking the business model of the internet and wants to help fight back.

While Perplexity has denied the allegations, the company has been in hot water before for similar problems, having been accused of stealing news sites' content and of struggling to define plagiarism.
