One company’s devious plan to stop AI web scrapers from stealing your content



AI is stealing your content. We know this is how AI companies have built their highly valued businesses: by scraping the web and using your data to train their chatbots.

Web scraping isn’t new. In the past, websites could rely on simple protocols like robots.txt to define what could, and could not, be used by web crawlers. Companies scraping the web to, say, build search engine results generally respected those guidelines. Many AI companies, however, are not abiding by this social contract and are ignoring those instructions.

Cloudflare, a global network service that helps some of the biggest websites in the world deliver content to users, has devised a new plan to deal with AI companies’ web scrapers. And the idea is as positively devious as it is ingenious. 

In a new blog post, Cloudflare has shared how it’s now “trapping misbehaving bots in an AI labyrinth.” Basically, bots that ignore the rules set out in protocols such as robots.txt (a simple text file that tells web crawlers what they are allowed to do on a site) will be deliberately led astray, wasting the time and resources of the company running the bot.
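For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler is supposed to consult it, using Python’s standard-library `urllib.robotparser`. The rules are a made-up example, assumed for illustration: they block OpenAI’s GPTBot user agent from the whole site and keep every other crawler out of a hypothetical `/private/` directory, with `example.com` standing in for any site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(allowed("GPTBot", "https://example.com/articles/1"))     # → False
print(allowed("Googlebot", "https://example.com/articles/1"))  # → True
print(allowed("Googlebot", "https://example.com/private/x"))   # → False
```

Nothing in this check is enforced by the website: a crawler that simply skips the `can_fetch` call can request any page it likes, which is exactly the gap the labyrinth is meant to exploit.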

“AI-generated content has exploded…at the same time, we’ve also seen an explosion of new crawlers used by AI companies to scrape data for model training,” Cloudflare said in its post. “AI Crawlers generate more than 50 billion requests to the Cloudflare network every day, or just under 1% of all web requests we see.”


Cloudflare says it previously just blocked AI web crawlers and scrapers. However, doing so alerted those behind the bots that their access had been denied, and as a result they would shift strategies in order to continue their scraping campaigns.

So, Cloudflare came up with an idea to build a honeypot: a series of fake webpages created with AI-generated content.

The fact that Cloudflare is using AI-generated content to fight AI web scrapers isn’t just for schadenfreude. When AI models are trained on AI-generated content, their quality can actually degrade. The industry even has a term for it: “model collapse.” Cloudflare is essentially making sure that bots that break the rules are punished for doing so.

Cloudflare’s post gets into the technical details of building the AI labyrinth. But the gist is that Cloudflare designed the system so that a human visitor should never see these AI-generated honeypot pages, and any human who did stumble onto one would recognize the “AI-generated nonsense” for what it is. Bots, however, fall down the rabbit hole, wasting computational resources as they crawl deeper and deeper through page after page of AI-generated content.
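To make the “rabbit hole” idea concrete, here is a toy sketch of why a maze of linked decoy pages drains a rule-breaking crawler. It is not Cloudflare’s implementation: the `FILLER` sentences, the `/maze` path, and the `decoy_page` and `naive_crawl` helpers are all hypothetical. Each decoy page is derived deterministically from its URL path and links to three deeper decoys, so a crawler that blindly follows every link spends its whole budget inside the maze.

```python
import hashlib
from collections import deque

# Hypothetical stand-in for AI-generated filler text; a real deployment
# would serve plausible-looking generated prose instead.
FILLER = [
    "Photosynthesis converts light energy into chemical energy.",
    "Granite is an igneous rock made mostly of quartz and feldspar.",
    "Tides are driven by the gravitational pull of the Moon and the Sun.",
]


def decoy_page(path: str, fanout: int = 3) -> tuple[str, list[str]]:
    """Return (text, links) for a decoy page, derived deterministically from its path."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    text = FILLER[int(digest[:8], 16) % len(FILLER)]
    # Each decoy links to `fanout` deeper decoys, so the maze never ends.
    links = [f"{path.rstrip('/')}/{digest[i * 8:(i + 1) * 8]}" for i in range(fanout)]
    return text, links


def naive_crawl(start: str, budget: int) -> int:
    """Breadth-first crawler that follows every link until its page budget runs out.

    Returns the number of pages fetched; inside the maze, every one is a decoy.
    """
    queue = deque([start])
    seen = set()
    fetched = 0
    while queue and fetched < budget:
        path = queue.popleft()
        if path in seen:
            continue
        seen.add(path)
        _text, links = decoy_page(path)
        fetched += 1
        queue.extend(links)
    return fetched


print(naive_crawl("/maze", 1000))  # → 1000
```

Because pages are generated from their paths rather than stored, the maze costs the site almost nothing to serve, while the crawler’s queue only ever grows.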

Cloudflare customers can opt in to using the AI labyrinth right now to protect their content from web scrapers.
