AI companies and notably AI scrapers are a cancer that is destroying what's left of the WWW.
I was hit with a pretty substantial botnet "distributed scraping" attack yesterday.
- About 400,000 different IP addresses over about 3 hours
- Mostly residential IP addresses
- Valid and unique user agents and referrers
- Each IP address would make only a few requests with a long delay in between requests
It would hit the server hard until the server became slow to respond, then it would back off for about 30 seconds, then hit hard again. I was able to block most of the requests with a combination of user agent and referrer patterns, though some legit users may be blocked.
The attack was annoying, but, the even bigger problem is that the data on this website is under license - we have to pay for it, and it's not cheap. We are able to pay for it (barely) with advertising revenue and some subscriptions.
If everyone is getting this data from their "agent" and scrapers, that means no advertising revenue, and soon enough no more website to scrape, jobs lost, nowhere for scrapers to scrape for the data, nowhere for legit users to get the data for free, etc.