#data-poisoning

Public notes from activescott tagged with #data-poisoning

Friday, October 31, 2025

This software is not made for making the Crawlers go away. It is an aggressive defense mechanism that tries its best to take the brunt of the assault, serve them garbage, and keep them off of upstream resources. Even though a lot of work went into making iocaine efficient, and nigh invisible for the legit visitor, it is an aggressive defender nevertheless, and will require a few resources - a whole lot less than if you’d let the Crawlers run rampant, though.

lol

It works by generating an endless sequence of pages, each with dozens of links that simply lead back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to waste the crawlers' time and keep them from bogging down your server. Lastly, Markov-babble is added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.
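
To make the idea concrete, here is a minimal sketch of that technique - deterministic pages seeded from the URL, links that only lead deeper into the maze, an artificial delay, and Markov babble. This is not iocaine's actual implementation (iocaine is written in Rust); the corpus, paths, and parameters below are illustrative assumptions.

```python
# Illustrative tarpit sketch -- not iocaine's code. All names and values
# here are assumptions chosen for brevity.
import hashlib
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny seed corpus for the Markov babble; a real deployment would use a
# much larger text so the output looks plausible to a scraper.
CORPUS = (
    "the quick brown fox jumps over the lazy dog while the dog sleeps "
    "under the tree and the fox runs over the hill toward the river"
).split()

# First-order Markov chain: word -> list of observed next words.
CHAIN = {}
for a, b in zip(CORPUS, CORPUS[1:]):
    CHAIN.setdefault(a, []).append(b)

def babble(rng, length=60):
    """Generate Markov babble, deterministic for a given seeded RNG."""
    word = rng.choice(CORPUS)
    out = [word]
    for _ in range(length - 1):
        word = rng.choice(CHAIN.get(word, CORPUS))
        out.append(word)
    return " ".join(out)

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed the RNG from the request path, so every URL always renders
        # the same page -- it looks like a static file that never changes.
        seed = int.from_bytes(
            hashlib.sha256(self.path.encode()).digest()[:8], "big"
        )
        rng = random.Random(seed)

        # Intentional delay: waste the crawler's time and slow its request rate.
        time.sleep(rng.uniform(1.0, 3.0))

        # Dozens of links that only lead deeper into the tarpit.
        links = "".join(
            f'<li><a href="/maze/{rng.getrandbits(64):016x}">page {i}</a></li>'
            for i in range(24)
        )
        body = f"<html><body><p>{babble(rng)}</p><ul>{links}</ul></body></html>"

        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

In practice you would put something like this behind the same reverse proxy that detects crawler user agents, so legit visitors never see it; the determinism matters because it keeps the maze cacheable and makes it look like ordinary static content.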