This post directly addresses the recently disclosed activity by Meta AI: they disregarded crawler instructions and just took what they wanted, because they "can".
When uploading your content, and in your profile, you have the ability to specify whether you want that data indexed, crawlable or used for AI training.
We communicate your preferences by returning headers in the HTML pages that list what the content can be used for. These protocols are well defined, and they establish trust between you, the content producers, and them, the content consumers. The protocols were designed to give you the right to specify what consumers can do with your data.
Now, because one big company has decided they want to disregard well-established protocols for determining what content they are allowed to spider, and just take whatever they damn well please, we're left in a situation where I can't let them take anything.
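As a rough illustration of the idea (not this instance's actual code), the sketch below turns one user's preferences into a robots directive string and wraps it in a `<meta name="robots">` tag. The function names and the mapping of "crawlable" to `nofollow` are assumptions made for the example. `noindex` and `nofollow` are widely recognised directives, while `noai` is a non-standard, advisory convention that crawlers are free to ignore.

```python
def robots_directives(indexable: bool, crawlable: bool, ai_training: bool) -> str:
    """Build a robots directive string from one user's preferences.

    Hypothetical helper: the preference names mirror the options described
    in the post, and the mapping to directives is an assumption.
    """
    directives = []
    if not indexable:
        directives.append("noindex")   # ask search engines not to index the page
    if not crawlable:
        directives.append("nofollow")  # ask crawlers not to follow links from it
    if not ai_training:
        directives.append("noai")      # non-standard and advisory only
    # "all" is the conventional value meaning "no restrictions".
    return ", ".join(directives) or "all"


def as_meta_tag(directives: str) -> str:
    """Render the directives as an HTML robots meta tag."""
    return f'<meta name="robots" content="{directives}">'


if __name__ == "__main__":
    # A user who allows indexing and crawling but opts out of AI training.
    print(as_meta_tag(robots_directives(indexable=True, crawlable=True, ai_training=False)))
```

Every one of these signals depends on the crawler choosing to honour it. Nothing in the tag itself prevents a client from downloading the page.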
Blåhaj.zone has taken the step of implementing ACTIVE anti-bot / scraper counter-measures on our CDN servers to curb all detected scraping by known bots/crawlers. When our CDN detects a direct request for any hosted content from a bot or scraper, it will now block that request with a 403 Forbidden response.
I'm sorry that I have had to take this action, and remove your ability to specify a preference for your content to be indexable, but the protection of those who do not want to have their content indexed must take priority.
Frankly, I am deeply disappointed that it has come to this, that big companies resort to stealing data. It's not my data. It's your data. Data that you entrusted to us, and I let them steal it from our pockets. I feel angry and misled, and I'm sorry.
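A minimal sketch of what blocking by user agent can look like, assuming the check is a simple substring match on the `User-Agent` header. The token list and the function name are illustrative assumptions, not the rules actually deployed on the CDN.

```python
from typing import Optional

# Illustrative, lowercase substrings of crawler user agents (assumed list).
BLOCKED_AGENT_TOKENS = ("meta-externalagent", "gptbot", "ccbot")


def status_for_request(user_agent: Optional[str]) -> int:
    """Return the HTTP status to answer with for a given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return 403  # Forbidden: request came from a known bot or scraper
    return 200      # otherwise serve the content as usual


if __name__ == "__main__":
    print(status_for_request("meta-externalagent/1.1"))
    print(status_for_request("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

Matching on the user agent only catches crawlers that identify themselves honestly, so a real deployment would likely combine it with other signals.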
Kaity A
@supakaity@blahaj.zone
She/Her. A woman, polyam, pansexual and transgender.
I'm also an admin of the Blåhaj Zone instance running #Sharkey (a fork of #Misskey), currently open for registration. https://activitypub.software/TransFem-org/Sharkey
Aug 09, 2025