Perplexity accused of scraping websites even when told not to — here's their response

Perplexity is riding high in the AI world right now. But after launching its Comet browser and leading the way in agentic browsing, the company has run into some controversy.

Cloudflare has published research in a blog post showing that Perplexity has been crawling and scraping content from websites that explicitly stated they don’t want to be scraped.

The research accuses Perplexity of obscuring its identity when trying to scrape web pages. Cloudflare says it had received complaints from customers who had both disallowed Perplexity’s crawlers in their robots.txt files and created rules to specifically block them.

Cloudflare performed its own tests to confirm this, creating brand new domains and then querying Perplexity with questions about these specific domains. Perplexity was able to answer queries on these pages, even though Cloudflare had stated it didn’t want these websites to be analyzed.

How Perplexity is able to get around these rules is complicated. It appears that Perplexity is changing its bot’s “user agent”, the label a visitor sends to identify itself to a website. In other words, it is pretending not to be an AI crawler but just a normal visitor.
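To make the user agent point concrete, here is a minimal sketch in Python of how a robots.txt rule aimed at a named crawler is evaluated. It is an illustration under stated assumptions, not a reconstruction of what Cloudflare or Perplexity actually ran: the robots.txt contents, the example.com URL, the browser string and the is_allowed helper are all hypothetical, and "PerplexityBot" is used here as the crawler name a site owner might list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from the article).
# It blocks one named crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative user agent strings (assumptions for this sketch).
DECLARED_BOT = "PerplexityBot"
GENERIC_BROWSER = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the robots.txt rules above permit this user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The rule is matched against the name the visitor declares,
    # not against who the visitor really is.
    print("declared bot allowed:", is_allowed(DECLARED_BOT))
    print("generic browser allowed:", is_allowed(GENERIC_BROWSER))
```

Under these assumptions the declared bot is refused while the browser-style string is allowed, because the rule only matches the name a visitor chooses to send. Robots.txt is also a voluntary convention: nothing in it technically stops a crawler from requesting the page anyway, which is why site owners add separate blocking rules on top.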

Perplexity and many other AI tools require large amounts of information to work. They analyse the internet, drawing on forums, web pages, and other online sources.

However, there is growing backlash to this approach, along with an expectation that AI companies be transparent about how they gather data. Some of Perplexity’s competitors, like Claude and ChatGPT, are offering ways to opt out of data gathering, and it is likely we’ll see more rules as time goes on.

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” says Cloudflare’s post.

Jesse Dwyer, a spokesperson for Perplexity, dismissed Cloudflare’s blog post as a sales pitch in an email to TechCrunch on the subject.

Dwyer went on to say that the screenshots in the blog “show that no content was accessed” and that the bot named in the Cloudflare blog “isn’t even ours”.

Cloudflare is now taking a strong stance on AI crawlers, including Perplexity. The company has claimed that AI is breaking the business model of the internet and wants to help fight back.

While Perplexity has denied the allegations, the company has been in hot water before over similar problems, having been accused of stealing news sites' content and of struggling to define plagiarism.
