• bad bot behavior

    From Ben Collver@bencollver@tilde.pink to comp.misc on Tue Mar 18 15:17:56 2025
    From Newsgroup: comp.misc

    Please stop externalizing your costs directly into my face
    ===========================================================
    March 17, 2025 on Drew DeVault's blog

    Over the past few months, instead of working on our priorities at
    SourceHut, I have spent anywhere from 20-100% of my time in any given
    week mitigating hyper-aggressive LLM crawlers at scale. This isn't
    the first time SourceHut has been at the wrong end of some malicious
    bullshit or paid someone else's externalized costs – every couple of
    years someone invents a new way of ruining my day.

    Four years ago, we decided to require payment to use our CI services
    because it was being abused to mine cryptocurrency. We alternated
    between periods of designing and deploying tools to curb this abuse
    and periods of near-complete outage when they adapted to our
    mitigations and saturated all of our compute with miners seeking a
    profit. It was bad enough having to beg my friends and family to
    avoid "investing" in the scam without having the scam break into my
    business and trash the place every day.

    Two years ago, we threatened to blacklist the Go module mirror
    because for some reason the Go team thinks that running terabytes of
    git clones all day, every day for every Go project on git.sr.ht is
    cheaper than maintaining any state or using webhooks or coordinating
    the work between instances or even just designing a module system
    that doesn't require Google to DoS git forges whose entire annual
    budgets are considerably smaller than a single Google engineer's
    salary.

    Now it's LLMs. If you think these crawlers respect robots.txt then
    you are several assumptions of good faith removed from reality. These
    bots crawl everything they can find, robots.txt be damned, including
    expensive endpoints like git blame, every page of every git log, and
    every commit in every repo, and they do so using random User-Agents
    that overlap with end-users and come from tens of thousands of IP
    addresses – mostly residential, in unrelated subnets, each one making
    no more than one HTTP request over any time period we tried to
    measure – actively and maliciously adapting and blending in with
    end-user traffic and avoiding attempts to characterize their behavior
    or block their traffic.

    We are experiencing dozens of brief outages per week, and I have to
    review our mitigations several times per day to keep that number from
    getting any higher. When I do have time to work on something else,
    often I have to drop it when all of our alarms go off because our
    current set of mitigations stopped working. Several high-priority
    tasks at SourceHut have been delayed weeks or even months because we
    keep being interrupted to deal with these bots, and many users have
    been negatively affected because our mitigations can't always
    reliably distinguish users from bots.

    All of my sysadmin friends are dealing with the same problems. I was
    asking one of them for feedback on a draft of this article and our
    discussion was interrupted to go deal with a new wave of LLM bots on
    their own server. Every time I sit down for beers or dinner or to
    socialize with my sysadmin friends it's not long before we're
    complaining about the bots and asking if the other has cracked the
    code to getting rid of them once and for all. The desperation in
    these conversations is palpable.

    Whether it's cryptocurrency scammers mining with FOSS compute
    resources or Google engineers too lazy to design their software
    properly or Silicon Valley ripping off all the data they can get
    their hands on at everyone else's expense… I am sick and tired of
    having all of these costs externalized directly into my fucking face.
    Do something productive for society or get the hell away from my
    servers. Put all of those billions and billions of dollars towards
    the common good before sysadmins collectively start a revolution to
    do it for you.

    Please stop legitimizing LLMs or AI image generators or GitHub
    Copilot or any of this garbage. I am begging you to stop using them,
    stop talking about them, stop making new ones, just stop. If blasting
    CO2 into the air and ruining all of our freshwater and traumatizing
    cheap laborers and making every sysadmin you know miserable and
    ripping off code and books and art at scale and ruining our fucking
    democracy isn't enough for you to leave this shit alone, what is?

    If you personally work on developing LLMs et al, know this: I will
    never work with you again, and I will remember which side you picked
    when the bubble bursts.

    From: <https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html>
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From D Finnigan@dog_cow@macgui.com to comp.misc on Tue Mar 18 12:00:07 2025
    From Newsgroup: comp.misc

    On 3/18/25 10:17 AM, Ben Collver wrote:
    Please stop externalizing your costs directly into my face
    ===========================================================
    March 17, 2025 on Drew DeVault's blog

    Over the past few months, instead of working on our priorities at
    SourceHut, I have spent anywhere from 20-100% of my time in any given
    week mitigating hyper-aggressive LLM crawlers at scale.

    This is happening at my little web site, and if you have a web site,
    it's happening to you too. Don't be a victim.

    Actually, I've been wondering where they're storing all this data; and
    how much duplicate data is stored from separate parties all scraping the
    web simultaneously, but independently.
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From not@not@telling.you.invalid (Computer Nerd Kev) to comp.misc on Wed Mar 19 08:19:22 2025
    From Newsgroup: comp.misc

    D Finnigan <dog_cow@macgui.com> wrote:
    On 3/18/25 10:17 AM, Ben Collver wrote:
    Please stop externalizing your costs directly into my face
    ==========================================================
    March 17, 2025 on Drew DeVault's blog

    Over the past few months, instead of working on our priorities at
    SourceHut, I have spent anywhere from 20-100% of my time in any given
    week mitigating hyper-aggressive LLM crawlers at scale.

    This is happening at my little web site, and if you have a web site,
    it's happening to you too. Don't be a victim.

    Meh, my little Web site runs so light that even when Amazon's bot
    got stuck in a recursive loop grabbing the same dynamic page tens of
    times a second from different IPs, the server load was near nil as
    usual. The main problem that caused was access logs of hundreds of
    megabytes per day. Amazon is still scraping the hell out of
    everything I put online (even a mirror that's tens of GBs), and
    other bots squeeze into the logs too, maybe even a few humans view
    things sometimes? I don't care, they're welcome to it, and they
    helped me find the bug in the Apache configuration which allowed
    that recursive loop (though I still don't get why bots started
    forming such URLs in the first place).
    --
    __ __
    #_ < |\| |< _#
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Toaster@toaster@dne3.net to comp.misc on Tue Mar 18 18:20:06 2025
    From Newsgroup: comp.misc

    On Tue, 18 Mar 2025 12:00:07 -0500
    D Finnigan <dog_cow@macgui.com> wrote:

    On 3/18/25 10:17 AM, Ben Collver wrote:
    Please stop externalizing your costs directly into my face
    ===========================================================
    March 17, 2025 on Drew DeVault's blog

    Over the past few months, instead of working on our priorities at
    SourceHut, I have spent anywhere from 20-100% of my time in any
    given week mitigating hyper-aggressive LLM crawlers at scale.

    This is happening at my little web site, and if you have a web site,
    it's happening to you too. Don't be a victim.

    Actually, I've been wondering where they're storing all this data;
    and how much duplicate data is stored from separate parties all
    scraping the web simultaneously, but independently.

    But what can be done to mitigate this issue? Crawlers and bots ruin the internet.

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From ${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com to comp.misc on Wed Mar 19 12:06:13 2025
    From Newsgroup: comp.misc

    On 2025-03-18, Toaster <toaster@dne3.net> wrote:

    But what can be done to mitigate this issue? Crawlers and bots ruin the internet.

    #mode=evil

    How about a script that spews out an endless stream of junk from
    /usr/share/dict/words, parked on a random URL that's listed in
    robots.txt as forbidden. Any bot choosing to chew on that gets what
    it deserves, though you might need to bandwidth limit it.
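
    A minimal sketch of that idea, using only Python's standard library
    (the port, the served path, and the one-line-per-second pacing are
    made-up values for illustration, not anything prescribed in the
    thread):

        # tarpit.py -- word-salad tarpit sketch.  The real site's
        # robots.txt is assumed to Disallow whatever URL gets routed
        # here, so only rule-ignoring bots ever land on it.
        import random
        import time
        from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

        with open("/usr/share/dict/words") as f:
            WORDS = f.read().split()

        class Tarpit(BaseHTTPRequestHandler):
            def do_GET(self):
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                try:
                    while True:
                        line = " ".join(random.choices(WORDS, k=12)) + "\n"
                        self.wfile.write(line.encode())
                        self.wfile.flush()
                        time.sleep(1)  # crude bandwidth limit
                except (BrokenPipeError, ConnectionResetError):
                    pass  # the bot (or its timeout) finally gave up

            def log_message(self, *args):
                pass  # keep the access log from growing by the second

        if __name__ == "__main__":
            ThreadingHTTPServer(("", 8080), Tarpit).serve_forever()

    The point is asymmetry: each connection trickles junk indefinitely
    while costing the server very little.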
    --
    Ian

    "Tamahome!!!" - "Miaka!!!"
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Rich@rich@example.invalid to comp.misc on Wed Mar 19 16:59:19 2025
    From Newsgroup: comp.misc

    Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> wrote:
    On 2025-03-18, Toaster <toaster@dne3.net> wrote:

    But what can be done to mitigate this issue? Crawlers and bots ruin the
    internet.

    #mode=evil

    How about a script that spews out an endless stream of junk from
    /usr/share/dict/words, parked on a random URL that's listed in
    robots.txt as forbidden. Any bot choosing to chew on that gets what
    it deserves, though you might need to bandwidth limit it.

    Another option could be to craft a "gzip bomb" (a carefully crafted
    gzip-compressed file that is compressed to the maximum limits of the
    zlib/gzip algorithm) and return it with the HTTP response header
    "Content-Encoding: gzip".

    Then you only have to output a few tens of megs, but if the AI
    decompresses the gzip bomb it has to consume multiple gigabytes of
    data.
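
    A rough sketch of building such a payload with Python's standard
    gzip module (the ~10 GiB decompressed target and the zero-filled
    content are illustrative assumptions):

        # gzip_bomb.py -- a body that is small on the wire but enormous
        # once a client honours "Content-Encoding: gzip".
        import gzip
        import io

        def make_bomb(decompressed_gib=10):
            buf = io.BytesIO()
            zeros = b"\0" * (1024 * 1024)  # 1 MiB of zeros, ~1000:1 ratio
            with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
                for _ in range(decompressed_gib * 1024):
                    gz.write(zeros)
            return buf.getvalue()

        if __name__ == "__main__":
            body = make_bomb()
            # Serve `body` with the headers "Content-Encoding: gzip" and a
            # plausible Content-Type, only on URLs that robots.txt forbids.
            print(f"{len(body) / 1e6:.1f} MB on the wire for ~10 GiB inflated")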

    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.misc on Thu Mar 20 02:22:31 2025
    From Newsgroup: comp.misc

    On Wed, 19 Mar 2025 12:06:13 -0000 (UTC), Ian wrote:

    How about a script that spews out an endless stream of junk ...

    Quite a few others are ahead of you <https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/>.
    Some of their countermeasures are quite sophisticated.

    TIL a new term: “Markov babble”.
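
    For anyone else new to the term, a toy sketch of what a
    Markov-babble generator looks like (word-level bigrams over an
    assumed corpus.txt; not taken from the article):

        # markov_babble.py -- text built from word-pair statistics, so it
        # looks vaguely plausible to a crawler but carries no information.
        import random
        from collections import defaultdict

        def build_chain(text):
            chain = defaultdict(list)
            words = text.split()
            for a, b in zip(words, words[1:]):
                chain[a].append(b)  # every word that ever followed `a`
            return chain

        def babble(chain, n_words=60):
            word = random.choice(list(chain))
            out = [word]
            for _ in range(n_words - 1):
                followers = chain.get(word)
                word = random.choice(followers or list(chain))
                out.append(word)
            return " ".join(out)

        if __name__ == "__main__":
            with open("corpus.txt") as f:
                print(babble(build_chain(f.read())))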
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From ${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com to comp.misc on Thu Mar 20 08:33:39 2025
    From Newsgroup: comp.misc

    On 2025-03-20, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Wed, 19 Mar 2025 12:06:13 -0000 (UTC), Ian wrote:

    How about a script that spews out an endless stream of junk ...

    Quite a few others are ahead of you
    <https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/>.
    Some of their countermeasures are quite sophisticated.

    TIL a new term: “Markov babble”.

    Ha!

    How long before it's made illegal :(
    --
    Ian

    "Tamahome!!!" - "Miaka!!!"
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Toaster@toaster@dne3.net to comp.misc on Thu Mar 20 19:01:20 2025
    From Newsgroup: comp.misc

    On Thu, 20 Mar 2025 08:33:39 -0000 (UTC)
    Ian
    <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> wrote:
    On 2025-03-20, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Wed, 19 Mar 2025 12:06:13 -0000 (UTC), Ian wrote:

    How about a script that spews out an endless stream of junk ...

    Quite a few others are ahead of you
    <https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/>.
    Some of their countermeasures are quite sophisticated.

    TIL a new term: “Markov babble”.

    Ha!

    How long before it's made illegal :(

    I love the idea though.
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.misc on Fri Mar 21 08:05:58 2025
    From Newsgroup: comp.misc

    On Thu, 20 Mar 2025 08:33:39 -0000 (UTC), Ian wrote:

    How long before it's made illegal :(

    Hard to see how things you do on your own server, particularly
    involving uninvited guests, can be made illegal ...
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From ${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com to comp.misc on Fri Mar 21 08:42:08 2025
    From Newsgroup: comp.misc

    On 2025-03-21, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Thu, 20 Mar 2025 08:33:39 -0000 (UTC), Ian wrote:

    How long before it's made illegal :(

    Hard to see how things you do on your own server, particularly
    involving uninvited guests, can be made illegal ...

    If it inconveniences $BIGCORPS, it will be.

    Though one such $BIGCORP seems to be actually doing this:

    https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/
    --
    Ian

    "Tamahome!!!" - "Miaka!!!"
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.misc on Sun Mar 23 14:30:04 2025
    From Newsgroup: comp.misc

    Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> wrote at 12:06 this Wednesday (GMT):
    On 2025-03-18, Toaster <toaster@dne3.net> wrote:

    But what can be done to mitigate this issue? Crawlers and bots ruin the
    internet.

    #mode=evil

    How about a script that spews out an endless stream of junk from
    /usr/share/dict/words, parked on a random URL that's listed in
    robots.txt as forbidden. Any bot choosing to chew on that gets what
    it deserves, though you might need to bandwidth limit it.


    I heard Cloudflare is doing something like that, but ironically
    with their own generative AI..
    --
    user <candycane> is generated from /dev/urandom
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.misc on Sun Mar 23 14:30:04 2025
    From Newsgroup: comp.misc

    Rich <rich@example.invalid> wrote at 16:59 this Wednesday (GMT):
    Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> wrote:
    On 2025-03-18, Toaster <toaster@dne3.net> wrote:

    But what can be done to mitigate this issue? Crawlers and bots ruin the
    internet.

    #mode=evil

    How about a script that spews out an endless stream of junk from
    /usr/share/dict/words, parked on a random URL that's listed in
    robots.txt as forbidden. Any bot choosing to chew on that gets what
    it deserves, though you might need to bandwidth limit it.

    Another option could be to craft a "gzip bomb" (a carefully crafted
    gzip-compressed file that is compressed to the maximum limits of the
    zlib/gzip algorithm) and return it with the HTTP response header
    "Content-Encoding: gzip".

    Then you only have to output a few tens of megs, but if the AI
    decompresses the gzip bomb it has to consume multiple gigabytes of
    data.


    Good idea, but you should still put in some text to warn a real user
    just in case.
    --
    user <candycane> is generated from /dev/urandom
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From D Finnigan@dog_cow@macgui.com to comp.misc on Wed Mar 26 08:38:15 2025
    From Newsgroup: comp.misc

    On 3/23/25 9:30 AM, candycanearter07 wrote:
    Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> wrote at 12:06 this Wednesday (GMT):
    On 2025-03-18, Toaster <toaster@dne3.net> wrote:

    But what can be done to mitigate this issue? Crawlers and bots ruin the
    internet.

    #mode=evil

    How about a script that spews out an endless stream of junk from
    /usr/share/dict/words, parked on a random URL that's listed in
    robots.txt as forbidden. Any bot choosing to chew on that gets what
    it deserves, though you might need to bandwidth limit it.


    I heard Cloudflare is doing something like that, but ironically
    with their own generative AI..

    https://tech.slashdot.org/story/25/03/26/016244/open-source-devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries

    They're abusing everyone. If you have a web site, don't allow it to be
    abused this way.
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.misc on Wed Mar 26 17:00:03 2025
    From Newsgroup: comp.misc

    D Finnigan <dog_cow@macgui.com> wrote at 13:38 this Wednesday (GMT):
    On 3/23/25 9:30 AM, candycanearter07 wrote:
    Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> wrote at 12:06 this Wednesday (GMT):
    On 2025-03-18, Toaster <toaster@dne3.net> wrote:

    But what can be done to mitigate this issue? Crawlers and bots ruin
    the internet.

    #mode=evil

    How about a script that spews out an endless stream of junk from
    /usr/share/dict/words, parked on a random URL that's listed in
    robots.txt as forbidden. Any bot choosing to chew on that gets what
    it deserves, though you might need to bandwidth limit it.


    I heard Cloudflare is doing something like that, but ironically
    with their own generative AI..

    https://tech.slashdot.org/story/25/03/26/016244/open-source-devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries

    They're abusing everyone. If you have a web site, don't allow it to be abused this way.


    Agreed, it is getting ridiculous how many bad things can be
    attributed to AI at this point.
    --
    user <candycane> is generated from /dev/urandom
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From not@not@telling.you.invalid (Computer Nerd Kev) to comp.misc on Thu Mar 27 07:55:52 2025
    From Newsgroup: comp.misc

    D Finnigan <dog_cow@macgui.com> wrote:
    https://tech.slashdot.org/story/25/03/26/016244/open-source-devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries

    They're abusing everyone. If you have a web site, don't allow it to be abused this way.

    On the other hand, I run websites on the cheapest VPSs available and
    they have no load problems even without any robots.txt rules to
    block bots, let alone active blocking. Yet solutions like the
    "Anubis" proof-of-work thing mentioned in the link require
    JavaScript, which blocks the JS-less web browsers I like to use for
    browsing other people's websites (and FF with NoScript too, unless I
    decide to allow the random JS).

    So basically those websites are making their slow code on the
    server _my_ problem by forcing me to do bot-tests which I fail
    (sometimes in Firefox even with NoScript disabled too!).

    Don't abuse _users_ that way, just to block bots from your
    too-slow website!
    --
    __ __
    #_ < |\| |< _#
    --- Synchronet 3.20c-Linux NewsLink 1.2