
Scraped and Blocked: Cloudflare Takes on Perplexity's AI Crawling

Cloudflare alleges Perplexity masked its crawler to scrape sites that opted out, sparking debate about AI access to web content and the future of publisher monetization

What Cloudflare found

Cloudflare published a detailed report alleging that Perplexity AI systematically accessed and scraped content from websites that had explicitly signalled they did not want to be crawled by AI bots. According to Cloudflare and independent investigators, the crawler reportedly changed its user-agent string to impersonate popular browsers such as Chrome on macOS and rotated through Autonomous System Numbers (ASNs) to avoid detection and bypass blocks set by site owners.

Cloudflare says the activity affected tens of thousands of domains and produced millions of requests every day. The company used machine learning and other network signals to fingerprint the crawler and concluded it was hiding its identity to access data from sites that had opted out via robots.txt or other blocking mechanisms.

How Perplexity responded

Perplexity disputed Cloudflare's characterization, calling parts of the blog post a promotional move and denying that the screenshots showed actual content access. The company later argued that a significant portion of the traffic Cloudflare observed could be user-driven fetching, where an AI agent retrieves content in response to a direct user request rather than covert automated crawling. Perplexity has faced similar content disputes before, and the company continues to grapple with how to define its content use policies.

Why the accusations matter

For years, robots.txt has functioned as a customary agreement between websites and bots. While it carries little legal force in most jurisdictions, major AI companies have typically respected these signals. If Perplexity or similar services routinely circumvent such blocks, it undermines site owners' expectations and the informal norms that have governed the web.
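To make the mechanism concrete, here is a minimal sketch in Python of how a well-behaved crawler might consult robots.txt before fetching a page. It uses the standard library's urllib.robotparser. The bot name ExampleAIBot, the example.com URL, and the robots.txt contents are hypothetical and chosen only for illustration; they are not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is blocked, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

# A generic desktop browser user-agent string (illustrative only).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/articles/1"

# A crawler that announces itself honestly is matched by its own rule.
declared = is_allowed(ROBOTS_TXT, "ExampleAIBot", url)

# The same rules evaluated against a browser-like user agent fall through
# to the wildcard entry, because robots.txt only sees the declared name.
disguised = is_allowed(ROBOTS_TXT, BROWSER_UA, url)

print(f"declared bot allowed: {declared}")
print(f"browser-like UA allowed: {disguised}")
```

The sketch shows why the convention depends on good faith: the rules are keyed on the name a client chooses to declare, and the check runs on the crawler's side, so nothing in robots.txt itself stops a client that presents a different identity or skips the check entirely.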

This controversy surfaces just as Cloudflare has launched a Pay Per Crawl marketplace that lets publishers charge for AI access and that blocks most crawlers by default. High-profile publishers have joined, and millions of websites now explicitly disallow AI training, signaling a shift toward monetizing access rather than relying solely on advertising.

The broader debate and consequences

Reactions are split. Cloudflare positions itself as defending publishers' business models and enforcing block signals. Perplexity argues that user-initiated AI browsing should not be treated differently from a human user opening a page in a browser. Observers on social platforms are divided: some say user-requested retrieval by an AI agent is equivalent to normal browsing, while others stress the harm to publishers who depend on ad revenue and control over their content.

The bigger picture is a changing internet economy. Content monetization models are shifting, transparency and compliance are becoming essential, and AI firms face reputational and legal risks if they are found evading blocks or misusing content. Many major AI companies are increasingly pursuing licensing deals with publishers instead of relying on unconsented scraping.

Whether Perplexity is being unfairly singled out or is genuinely violating web norms, the debate marks a turning point. The era of freely harvesting web data for AI training looks to be giving way to paid access arrangements, stricter enforcement of publishers' preferences, and a more fragmented landscape for how models are trained and fed content.
