
One company’s devious plan to stop AI web scrapers from stealing your content



AI is stealing your content. We know this is how AI companies have built their highly valued businesses: by scraping the web and using your data to train their chatbots.

Web scraping isn’t new. In the past, websites could rely on simple protocols like robots.txt to define what could, and could not, be used by web crawlers. Those guidelines were respected by the companies doing the scraping to, say, build results for search engines. AI companies, however, are not abiding by this social contract and are ignoring those instructions.

Cloudflare, a global network service that helps some of the biggest websites in the world deliver content to users, has devised a new plan to deal with AI companies’ web scrapers. And the idea is as positively devious as it is ingenious. 

In a new blog post, Cloudflare has shared how it’s now “trapping misbehaving bots in an AI labyrinth.” Basically, bots that ignore the rules a site sets out in robots.txt, a simple text file that tells web crawlers what they may and may not access, will be steered into decoy content in order to waste the time and resources of the company running the bot.
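For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler might consult it before fetching a page, using Python's standard-library parser. The bot name "ExampleAIBot", the paths, and the example.com URLs are made up for illustration; they are not taken from Cloudflare's post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred from the whole site,
# every other crawler is only barred from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SearchBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/notes"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in the file enforces these rules. A crawler has to choose to run a check like this and honor the answer, which is why the arrangement works only as long as crawlers cooperate.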

“AI-generated content has exploded…at the same time, we’ve also seen an explosion of new crawlers used by AI companies to scrape data for model training,” Cloudflare said in its post. “AI Crawlers generate more than 50 billion requests to the Cloudflare network every day, or just under 1% of all web requests we see.”


Cloudflare says it previously just blocked AI web crawlers and scrapers. However, doing so alerted those behind the bots that their access had been denied, and as a result they would shift strategies in order to continue their scraping campaigns.

So, Cloudflare came up with an idea to build a honeypot: a series of fake webpages created with AI-generated content.

Cloudflare’s use of AI-generated content to fight AI web scrapers isn’t just for schadenfreude. When an AI model is trained on AI-generated content, the model itself can degrade. The industry even has a term for it: “model collapse.” Cloudflare is essentially making sure that bots that break the rules are punished for doing so.

Cloudflare’s post gets into the technical details of building the AI labyrinth. The gist is that Cloudflare designed the system so that a human visitor should never see these AI-generated honeypot pages, and a human who did land on one would recognize the “AI-generated nonsense” for what it is. Bots, however, would fall down the rabbit hole, wasting computational resources as they go deeper and deeper through the multiple pages of AI-generated content.
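To make the rabbit-hole idea concrete, here is a toy sketch of a link maze and a crawler that blindly follows every link it finds. It is an illustration only and not Cloudflare's implementation: the "/maze/" paths, the fan-out of three links per page, and the function names are all assumptions, and no real pages or network requests are involved.

```python
import hashlib
from collections import deque


def decoy_links(path: str, fanout: int = 3) -> list[str]:
    """Derive the links on a decoy page; the same path always yields the same links."""
    links = []
    for i in range(fanout):
        digest = hashlib.sha256(f"{path}:{i}".encode()).hexdigest()[:8]
        links.append(f"/maze/{digest}")
    return links


def crawl(start: str, budget: int) -> int:
    """Follow decoy links breadth-first and return how many pages were fetched."""
    queue = deque([start])
    seen = set()
    fetched = 0
    while queue and fetched < budget:
        path = queue.popleft()
        if path in seen:
            continue
        seen.add(path)
        fetched += 1  # each fetch is a request spent on a decoy page
        queue.extend(decoy_links(path))
    return fetched


if __name__ == "__main__":
    print(crawl("/maze/entry", budget=50))
```

Because every decoy page links to more decoy pages, the crawler in this sketch never runs out of links. It stops only when its own request budget is spent, and every one of those requests went to a page with nothing worth scraping.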

Cloudflare customers can opt in to the AI labyrinth right now to protect their content from web scrapers.




