where my fellow monster fuckers at 👅👅👅👅👅👅

And things get even more terrifying!! I saw someone mention this in Jon's insta comments.
Everyone is at risk and not just artists...
Here's the full article!
I know a lot of artists are antsy about art theft right now (myself included, I literally just had a terrible nightmare about fighting the physical manifestation of AI, The Mitchells vs The Machines style…). I can’t claim that any of these things can prevent it. But here’s a few things I’ve found useful:
Opening a free account on Pixsy.com. This website does a decent job at letting me know when my images have been reposted. 99% of the time, the results are just Tumblr-copying zombie websites that repost everything that is already here. But it’s sensitive enough that it alerted me when my old college posted my work. They were harmlessly using my stuff as an example of alumni work, but I was glad to be in the know, AND they had mistakenly credited my deadname, so I was able to reach out and correct that. I would never have seen it otherwise. The website has subscription options, but you can ignore them and still use the monitoring services it provides.
Reverse image searching my most widely shared pieces on haveibeentrained.com. This website checks to see if your work has been fed to AI.
Looking up legal takedown letters and referencing them to draft a generic letter for my own use. This takes a bit of the stress off what is already a stressful and often time-consuming ordeal. Taking time to craft a Very Scary, Legally Threatening, Yet Coldly Professional Memo has been worth it.
Remaining careful about what and how I post online. My living depends on sharing my work, so I have to post it. I’ve learned through trial and error how to post lower resolution images that still look good, but aren’t easily used for anything beyond the intended post, and of course, strategic watermarking. Never, ever post full res, print quality stuff for the general public. Half the time it ends up looking unflattering on social media anyways, cause the files get crunched for being large. I try to downsize my images, while set to bicubic smoothening, to head that off. Look up the optimal image resolutions and proportions for individual sites before posting your web versions. For some work, cropping the piece, or posting chunks of detail shots instead of a full view, is a more protective measure.
Look out for other artists! Reach out when in doubt. Don’t steal from others. Learn the difference between theft, and a study/master copy/fanart/inspiration. Don’t assume that all posted art has the same intended purpose as a “how to” instructional like 5 Minute Crafts. Ask permission. Artists are often helpful and supportive towards people who want to study their work! And, the best tip-offs I’ve received have all been from other people who were watching my back. Thank you to everybody who keeps an eye out for my work, and who have been thoughtful enough to reach out to me when they see theft happening 💖 y’all are the real MVPs. All we have is each other.
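The downsizing tip above (post a smaller web copy, resampled with bicubic interpolation, instead of the print-quality file) can be sketched in Python. This is only an illustration, assuming the Pillow library is installed; `web_size`, `save_web_copy`, the 1280-pixel limit and the JPEG quality of 85 are made-up names and values, not recommendations from any particular site, so look up the sizes your platform actually prefers.

```python
def web_size(width, height, max_side=1280):
    """Return (w, h) scaled so the longer side is at most max_side, keeping proportions."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def save_web_copy(src_path, dst_path, max_side=1280, quality=85):
    """Write a reduced JPEG copy of src_path to dst_path (hypothetical helper)."""
    # Pillow is imported here so web_size() can be used without it installed.
    from PIL import Image

    with Image.open(src_path) as img:
        size = web_size(img.width, img.height, max_side)
        # Bicubic resampling; Image.Resampling needs Pillow 9.1 or newer.
        small = img.convert("RGB").resize(size, Image.Resampling.BICUBIC)
        small.save(dst_path, "JPEG", quality=quality)
```

Calling `save_web_copy("full_res.png", "web_copy.jpg")` would leave the original untouched and write the smaller copy next to it; both file names are placeholders.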
The LAION-5B search and opt-out tool returns with advanced safeguards to identify and prevent access to CSAM
Good news?
DAVID LUIZ IS GOING TO CYPRUS!?!?!?!??!?!?!?!!?!?!?!??!!?!?!?!?!?!?!?!??!!?!?!?!?!?!?!??!!??!?!!!!!!
The biggest dataset used for AI image generators had CSAM in it
Link to the original tweet with more info
The LAION dataset has had ethical concerns raised over its contents before, but the public now has proof that there was CSAM used in it.
The dataset was essentially created by scraping the internet and using a mass tagger to label what was in the images. Many of the images were already known to contain identifying or personal information, and several people have been able to use EU privacy laws to get images removed from the dataset.
However, LAION itself has known about the CSAM issue since 2021.
LAION was a pretty bad data set to use anyway, and I hope researchers drop it for something more useful that was created more ethically. I hope that this will lead to more ethical databases being created, and companies getting punished for using unethical databases. I hope the people responsible for this are punished, and the victims get healing and closure.
Okay, tech people:
Can anybody tell me what the LAION-5B data set is in layman's terms, as well as how it is used to train actual models?
Everything I have read online is either so technical that it provides zero information to me, or so dumbed down that it provides almost zero information to me.
Here is what I *think* is going on (and I already have enough information to know that in some ways this is definitely wrong.)
LAION uses a web crawler to essentially randomly check publicly accessible web pages. When this crawler finds an image, it creates a record of the image URL, a set of descriptive words from the image ALT text, (and other sources I think?) and some other stuff.
This is compiled into a big giant list of image URLs and descriptive text associated with the URL.
When a model is trained on this data it... I guess... essentially goes to every URL in the list, checks the image, extracts some kind of data from the image file itself, and then associates the data extracted from the image with the descriptive text that LAION has already associated with the image URL?
The big pitfall, apparently, is that there are a lot of images that have been improperly or even illegally posted on the internet publicly with the ability to let crawlers access them even though they shouldn't be public (e.g. medical records or CSAM) and the dataset is too large to actually hand-curate every single entry? So that therefore models trained on the dataset contain some amount of data that legally they should not have, outside and beyond copyright considerations. A secondary problem is that the production of image ALT text is extremely opaque to ordinary users, so certain images that a user might be comfortable posting may, unbeknownst to them, contain ALT text that the user would not like to be disseminated.
Am I even in the ballpark here? It is incredibly frustrating to read multiple news stories about this stuff and still lack the basic knowledge you would need to think about this stuff systematically.
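For what it's worth, the mental model above can be written down as a toy Python sketch. It assumes the dataset is a list of image URLs with their associated text rather than the image files themselves, and that whoever trains a model has to fetch each image separately. The `Record` class, the `fetch` function and the example URLs are all invented for illustration; this is not LAION's actual format or tooling.

```python
from dataclasses import dataclass


@dataclass
class Record:
    url: str      # where the image lived when the crawler saw it
    caption: str  # text found with the image, e.g. its ALT text


# The "dataset" in this sketch: rows of (URL, text), no image files.
dataset = [
    Record("https://example.com/cat.jpg", "a grey cat on a sofa"),
    Record("https://example.com/gone.jpg", "sunset over the sea"),
]

# Stand-in for the live web: only one of the URLs still resolves.
fake_web = {"https://example.com/cat.jpg": b"fake-image-bytes"}


def fetch(url):
    """Hypothetical downloader; returns image bytes, or None if unavailable."""
    return fake_web.get(url)


def build_training_pairs(records):
    """Pair each image that can still be downloaded with its caption."""
    pairs = []
    for rec in records:
        data = fetch(rec.url)
        if data is None:
            continue  # dead link: nothing to train on for this row
        pairs.append((data, rec.caption))
    return pairs


pairs = build_training_pairs(dataset)
```

In this picture nobody looks at an individual image before it is paired with its text, which is the curation problem described above: whatever a URL points to at download time is what ends up in the training pairs.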
So, SO fucking tired of this year, honestly.
So, AI generated pictures ( I REFUSE to call these 'art' ) are the current bane of my existence as an artist, but if you want to check whether your art has been scraped, reaped and stolen for the LAION dataset (you know, the one that blatantly steals and resells private and copyrighted data?) you can use this site.
Now, I have no idea if their tool to opt out of the database even works, but if I have news about it (or if you have, drop a comment or a DM) I'll update this post.
Yeah, I found several pieces of mine there, including some of my portfolio art, a personal gift for a friend's birthday and the contents of my Redbubble gallery.
As a French woman, my ancestral instinct to find and guillotine* someone is getting stronger every day.
*in a metaphorical sense, of course, I'm usually** nonviolent and more prone to crying and blowing my nose on the perpetrator's shirt than going straight to killing.
** usually anyway.