In the last week I've had to deal with two large-scale influxes of traffic on one particular web server in our organization.
The first involved requests from 300,000 unique IPs in a span of a few hours. I analyzed them and found that ~250,000 were from Brazil. I'm used to using ASNs to block network ranges sending this kind of traffic, but in this case they were spread thinly over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
A few days later this same web server was on fire again. I performed the same analysis on IPs and found a similar number of unique addresses, but spread across Turkey, Russia, Argentina, Algeria and many more countries. What is going on?! Eventually I think I found a pattern to identify the requests, in that they were using ancient Chrome user agents. Chrome 40, 50, 60 and up to 90, all released 5 to 15 years ago. Then, just before I could implement a block based on these user agents, the traffic stopped.
In both cases the traffic from datacenter networks was limited because I already rate limit a few dozen of the larger ones.
Sysadmin life...
Try Anubis: <https://anubis.techaro.lol>
It's a reverse proxy that presents a PoC challenge to every new visitor. It shifts the initial cost of accessing your server's resources back at the client. Assuming your uplink can handle 300k clients requesting a single 70kb web page, it should solve most of your problems.
For science, can you estimate your peak QPS?
Anubis is a good choice because it whitelists legitimate and well behaved crawlers based on IP + user-agent. Cloudflare works as well in that regard but then you're MITM:ing all your visitors.
Also, I was just watching brodie robertson video about how United Nations has this random search page of unesco which actually has anubis.
Crazy how I remember the HN post where anubis's blog post was first made. Though, I always thought it was a bit funny with anime and it was made by frustration of (I think AWS? AI scrapers who won't follow general rules and it was constantly giving requests to his git server and it actually made his git server down I guess??) I didn't expect it to blow up to ... UN.
Her*
It was frustration at AWS' Alexa team and their abuse of the commons. Amusingly if they had replied to my email before I wrote my shitpost of an implementation this all could have turned out vastly differently.
I've seen a few attacks where the operators placed malicious code on high-traffic sites (e.g. some government thing, larger newspapers), and then just let browsers load your site as an img. Did you see images, css, js being loaded from these IPs? If they were expecting images, they wouldn't parse the HTML and not load other resources.
It's a pretty effective attack because you get large numbers of individual browsers to contribute. Hosters don't care, so unless the site owners are technical enough, they can stay online quite a bit.
If they work with Referrer Policy, they should be able to mask themselves fairly well - the ones I saw back then did not.