@Amthystreamm If it does, it's because the organizations training AI either don't care about the small number of Nightshaded images or have a human in the loop who would catch any errors caused by Nightshade or Glaze. Nightshade and Glaze are based on adversarial noise, and Stable Diffusion (one of the main methods from when generative AI started to get big) is trained to remove noise. That should give you an idea of how hard a problem the creators of Nightshade and Glaze have given themselves.
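To see why that's an uphill battle, here is a minimal sketch of the standard diffusion training step (PyTorch-style, with a simplified noise schedule and a hypothetical model(noisy, t) network, so a sketch rather than any particular lab's code): corrupt an image with random noise and train the network to predict that noise so it can be removed.

    import torch
    import torch.nn.functional as F

    def diffusion_training_step(model, images, optimizer):
        # Pick a random noise level per image (simplified schedule).
        t = torch.rand(images.shape[0], 1, 1, 1)
        noise = torch.randn_like(images)
        # Blend the clean images with random noise.
        noisy = (1 - t).sqrt() * images + t.sqrt() * noise
        # Train the model to predict the noise that was added,
        # i.e. to learn how to strip noise back out of an image.
        predicted_noise = model(noisy, t)  # model(noisy, t) is a stand-in network
        loss = F.mse_loss(predicted_noise, noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The whole objective is literally "learn to undo added noise", which is why a defense built out of carefully crafted noise is fighting the model on its home turf.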
What I mean by my first sentence is that the two main types of generative image AI are large image models and small image models. Large image models are trained on so many images that the small number of Nightshaded or Glazed images is a statistical blip; what's more, their builders are moving beyond the scrape-the-web phase to a more targeted training regime. How this factors into Nightshade and Glaze is that training for large image models depends less on the image-tagging models that Glaze and Nightshade target and more on the context around the image (captions, alt text, surrounding page text) to figure out what is in the image.
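To illustrate what "context around the image" means in practice, here is a toy sketch (Python standard library only, with a made-up page snippet) of a scraper pairing an image with the alt text already sitting in the page, so no tagging model ever has to look at the pixels Nightshade or Glaze perturbed:

    from html.parser import HTMLParser

    class ImageContextCollector(HTMLParser):
        """Collect (image URL, surrounding-context caption) pairs from HTML."""
        def __init__(self):
            super().__init__()
            self.pairs = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                attrs = dict(attrs)
                src = attrs.get("src")
                # The "caption" comes from page context (alt text here),
                # not from a classifier looking at the pixels.
                caption = attrs.get("alt", "")
                if src:
                    self.pairs.append((src, caption))

    collector = ImageContextCollector()
    collector.feed('<p>My latest painting</p><img src="cat.png" alt="oil painting of a cat">')
    print(collector.pairs)  # [('cat.png', 'oil painting of a cat')]

The real pipelines are much bigger, but the pairing idea is roughly the same: an image URL plus the text that sits next to it on the page.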
Small image models are trained by individual AI artists using hand-selected images; some are even trained on their own mid-training output. My point is that a human is involved closely enough to notice any issues caused by Nightshade or Glaze (on the flip side, they are involved enough that they should know the original artist doesn't want their art used to train AI).
I understand artists want to protect their work from copyright infringement; I just don't want them wasting CPU cycles on things that do practically nothing. My best advice is to put their art on sites that have anti-scraping terms of use. I have read statements from some judges indicating they think LLM-based AI is getting good enough that the organizations running it can be held to those terms of use. If you want to be really sure, put your art behind a login wall on such sites to more firmly bind the organizations to the anti-scraping terms. That way there is already a class (the users of that website) ready to sue any organization that violates those terms of use.
As a postscript, I think robots.txt and related standards need to be updated. They were created in an age when the major web crawlers were search-engine indexing bots and the like. Nowadays we have a wide variety of web crawlers that sites might want to treat differently. For example, one schema a website might want to set up: web archivers can archive all pages that don't create a heavy load on the server, including some behind the login wall; search-engine indexing bots are restricted to publicly accessible pages plus metadata for pages behind the login wall; AI training bots are disallowed entirely. With such an update, the size of the organization wouldn't matter, because the information it needs to check whether its bot is allowed would be machine-readable by a program as simple and as CPU-modest as the web crawler itself.
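As a rough sketch of the kind of schema I mean (the per-bot blocks at the top use real user agents and work in robots.txt today; the Crawler-purpose style directives below them are hypothetical and not part of any current standard):

    # Works today: name each known AI-training crawler and block it.
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Hypothetical extension: declare rules by crawler purpose so a
    # site doesn't have to enumerate every bot by name.
    Crawler-purpose: archival
    Allow: /
    Allow-authenticated: /members/light-pages/

    Crawler-purpose: search-indexing
    Allow: /public/
    Metadata-only: /members/

    Crawler-purpose: ai-training
    Disallow: /

Parsing something like that is no harder than parsing robots.txt today, which is the point: a crawler run by an organization of any size could check it with a few lines of code.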