aenigmae @aenigmae - Tumblr Blog

incredibly funny how a bunch of people interpreted “ao3 was almost certainly scraped as part of the gpt training dataset because it’s a big easily accessible body of english language text, so you can prompt gpt with surprisingly vague stuff and it will autocomplete with snarry underage or wangxian a/b/o” as “elon musk Personally is Currently scraping ao3 and training an ai to plagiarize fic, going to go lock ALL my works on ao3 IMMEDIATELY”

its. its already in the dataset. how do you think these things work. “locking my works to registered users only until after the scraping stops!” my dude the ao3 team just needs to like add a robots.txt and check the useragent and stuff to prevent this from happening in the future*, and theyre already on it, but not only is the existing body of work presumably In the Dataset, the model has ALREADY BEEN TRAINED. that omelet isnt going to get unscrambled

(*im assuming that everyone gathering datasets for large language models is being reasonably Polite about it bc these are both very simple to circumvent — if this assumption is false then ao3 might need to graduate to Offensive Measures but also we would definitely need to bully the culprits off of hacker news)

anyway im not taking any Stance one way or the other on the “ai art debate” (other than maybe “none of you know what the hell you’re talking about”) but we’re definitely going to see a whole new world of copyright claims against the big art models and ml researchers developing new tools for “removing” stuff from a trained model, and i for one think that it will be SO entertaining to watch

gender-trash

right ok i did some cursory googling and the main two datasets gpt-3 got trained on that ao3 works might appear in are common crawl and webtext2. commoncrawl.org has a faq page that tells you exactly what to add to your robots.txt to stop the crawler, and their terms of use say you’re not allowed to use the dataset to violate ip rights, although in terms of actual legal force i think that probably has about as much oomph as a whiffle ball.

since the common crawl dataset is used for a broad range of internet research, not just ai training, i personally wouldn’t want to block their crawler — but ao3 might decide differently, as is their right. (it’d be really lovely if common crawl let you indicate to their crawler that you’re ok with your data being included in the full dataset for some purposes and not others, but i digress.) they also check the robots.txt regularly although i couldn’t find any info about what they do with previously scraped data when it gets updated (and at any rate i would be SHOCKED if openai hadn’t downloaded a copy of the dataset, independent of common crawl’s updates).
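For anyone who wants to see what the robots.txt part looks like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which evaluates the rules the way a polite crawler would. `CCBot` is the user agent Common Crawl documents for its crawler. The robots.txt contents, the `SomeOtherBot` name, and the example.org URL are made up for illustration and are not AO3's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; this is NOT AO3's real file.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.org and the /works/123 path are placeholders.
ccbot_allowed = is_allowed(ROBOTS_TXT, "CCBot", "https://example.org/works/123")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.org/works/123")
print(ccbot_allowed, other_allowed)
```

Note that this only models crawlers that choose to honor robots.txt. Nothing in the file is enforced by the server.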

unlike common crawl which is provided by an independent organization, webtext2 is a dataset generated by openai themselves, composed of every outbound reddit link with at least 3 karma. i couldn’t immediately find any info on how to block their web scraper, but if you had a website that was actually being scraped by them, you could figure out what their bot is called and add it to your robots.txt, or just blacklist everything except googlebot etc. or block reddit as a referrer or something so people don’t link to your stuff from there, idk. for ao3 specifically the best solution is probably gonna be blocking the openai scraper bot, in my opinion as someone who only vaguely knows shit about websites.
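The "check the useragent" idea can be sketched server-side as a simple substring match on the request's User-Agent header. This is an assumption-heavy sketch: the bot names below are hypothetical placeholders (the post above never finds the real name of the scraper), and `should_block` is an illustrative helper, not part of any web framework.

```python
from typing import Optional

# Hypothetical blocklist; these are placeholder names, not real crawler names.
BLOCKED_AGENT_SUBSTRINGS = ("examplescraperbot", "anotherhypotheticalbot")


def should_block(user_agent_header: Optional[str]) -> bool:
    """Return True if the User-Agent header matches a blocklisted substring."""
    if not user_agent_header:
        # A missing header is not blocked in this sketch; a stricter site might differ.
        return False
    lowered = user_agent_header.lower()
    return any(name in lowered for name in BLOCKED_AGENT_SUBSTRINGS)


print(should_block("Mozilla/5.0 (compatible; ExampleScraperBot/1.0)"))
print(should_block("Mozilla/5.0 (X11; Linux x86_64) Firefox/110.0"))
```

A check like this only stops scrapers that identify themselves honestly, since the header is whatever the client chooses to send.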

but the most important takeaway if you didn’t understand ANY of that is that these are long term solutions for preventing future stuff from being included in the dataset; there’s really no point in locking your back catalog, it’s already in there and openai provides no tools for letting you take it out. go harass them about it if you wanna do something.

the other thing you may be interested in knowing is that github is in DEEP SHIT for doing similar stuff with code — to oversimplify, they trained a big language model called copilot on a dataset including all public github repositories (non-programmers: github repositories are one of the most popular places to keep source code), completely ignoring license and copyright, and they’ve started selling a subscription service that lets you use copilot to write code. unsurprisingly to anyone who knows anything about ml, copilot immediately started regurgitating verbatim snippets of code from those public repos.

now, a lot of repos on github are public because they are open-source projects licensed under gpl, which is a “copyleft” license that “infects” other projects — if you use gpl’d code in your commercially licensed projects, ooops! your project is now also gpl-licensed. when i interned at google i got a WHOLE spiel during orientation about how if you want to use an open source project you have to get approval before doing anything ESPECIALLY IF IT IS gpl-licensed, and i’ve gotten similar spiels at other internships.

and, uh, copilot can recite the entire gpl license word for word, so it… definitely has seen a lot of gpl’d code. this means if you use copilot in your proprietary project the chances are good that, eventually, it will start puking out code that was gpl licensed and that is now being blithely reproduced in complete and total violation of that license, and theres no way for you to tell this has happened because after all copilot doesn’t fucking know any better, and the original project can rightfully now force YOUR entire project to be free and open-source.

anyway i think that, Real Soon Now, ml dataset gatherers are going to have some nasty realizations about copyright law, but the field of battle is probably going to be software licensing (a field that has just SO much legal firepower to throw around) and not fanfiction or digital art (fields which, uh, don’t).

irradiate-space

It's good to see you bring up CoPilot's use of code, because there are some interesting things going on in code. Sometimes, there's only one way to write a specific function concisely. That same way of writing the function may have been independently derived by three different program authors. The first author released the code under a proprietary license to make it open-source but not free or libre. The second author released the code under the WTFPL or the DAMAIL or the CC0 License, which are maximally permissive. The third author took the middle path and released the code under the GPL-3, a very infectious open-source license that is both free and libre while still reserving rights to the author.

If you looked at the output of CoPilot, you would not be able to tell which license the copied code was from, or whether it came from a single source instead of all three sources.

Likewise, there are only so many ways to write a concise sentence about Steve slapping Bucky on the shoulder.

Human languages are large, but they are not infinite, and there is only a certain amount of complexity that can be expressed in a text string of a given length.
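As a rough illustration of that bound, the number of distinct strings of a fixed length is the alphabet size raised to that length, so the information such a string can carry is capped. The sketch below assumes a 27-symbol alphabet (26 lowercase letters plus space), which is an arbitrary simplification chosen for the example; the function names are illustrative.

```python
import math


def count_strings(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of exactly `length` symbols."""
    return alphabet_size ** length


def max_bits(alphabet_size: int, length: int) -> float:
    """Upper bound on the information (in bits) a string of that length can carry."""
    return length * math.log2(alphabet_size)


ALPHABET_SIZE = 27  # assumption: 26 lowercase letters plus space

for n in (5, 10, 20):
    print(n, count_strings(ALPHABET_SIZE, n), round(max_bits(ALPHABET_SIZE, n), 1))
```

The count grows very quickly but stays finite for any fixed length, and far fewer of those strings are valid, concise code or prose, which is why short snippets written independently can end up identical.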

One of the open questions of law is at what point a work legally ceases to be derivative and becomes transformative. There are no clear guidelines in US law about this; if you want to be sure, you have to go to court.

The Organization for Transformative Works runs AO3, and they favor a broad interpretation of what counts as transformative. A fanfiction author's Stucky fanfic novelization of Captain America: Winter Soldier is generally regarded as transformative, except maybe by Disney's lawyers, even though it has the same plot beats as the original film and the same spoken lines.

Is it transformative if I write a proprietary Drupal module that contains significant similarities to a GPL-2 WordPress plugin? Is it transformative if CoPilot does the writing of the code?

Is it transformative if GPT-3 writes Stucky slashfic, or Rogers/Barnes/Danvers A/B/O?

These are the things that lawyers are paid to argue about.

But if you're someone who writes unlicensed fanfic, I think you should be cautious about too strongly endorsing an author's right to control derivative works. Don't vote for the Leopards Eating People's Faces party.

gender-trash

YEAH thanks for bringing this up! iirc this was also a major point of contention when oracle and google were fighting over the android api, because 99% of google’s reimplementation was different but there was one nine-line function for something really simple that both oracle and google happened to implement identically. so part of the fight was “can you copyright an api” and part of it was “did this identical reimplementation infringe on oracle’s ip if nobody involved in writing it saw oracle’s code”.

but also if the ao3 legal team does decide to pick a fight over it i think thats one of the better options, because of their interest in preserving transformative works. i’ve been seeing a lot of fic authors and fanartists make very strong claims about the legitimacy of “ai” creative pursuits that im not sure they’d like turned around on them. law in its majestic equality and all that

irradiate-space

OTW might block the simpler scrapers, but if anything, I think their mission statement is more in favor of preserving the rights of fans to create transformative works through machine generation than it is opposed to machine generation. Transformative works created with the aid of a machine are still transformative works, in my eyes.

If there's a legal judgement which preserves human ability to create transformative works but puts limits on computer-generated transformative works, that judgement will certainly be splitting some extremely fine hairs, especially with regards to human-prompted computer-generated works.

I worry that the specifics of any judgement against computer-generated transformative works will lead to restrictions on those human-composed transformative works in which a computer was in any way involved beyond stenographically reproducing the human's inputs. Possibly even on all human-composed transformative works. There's no way to do clean-room rederivation of cultural artifacts.

(Tangent: Part of Oracle v. Google was about whether Google could use the same names for functions as Oracle. Google could have used different names, but then their reimplementation of the Java API wouldn't have worked with existing third-party code. If you replace all the character names, your Twilight fanfic becomes Fifty Shades of Grey, which is no longer recognizable as Twilight fanfic and which doesn't attract Twilight readers who don't use a compatibility layer to change the names back.)

#“none of you know what the hell you are talking about” is my favourite take on the ai art debate so far tbh #i can understand why creators don't want their work used for ai training but the amount of misinformation going around is not even funny #anyway #as a quantitative fandom studies researcher I'm glad that the general section of ao3s robots.txt remains somewhat permissive #(i.e. the part that's for user agents that aren't Google or Common Crawl etc) #but I am worried about the parts of the announcements that make it sound like they consider any large-scale scraping abusive #Sure there's no policy against data collection for academic purposes but the wording of “abusive” vs “responsible” scraping is very vague #AO3

roscolux

(gripping the sink) vienna waits for me

aenigmae

Wait a god damn second- du sprichst auch deutsch??

Deutsche auf tumblr be like

dracenathe6th

Turns out there's way more Deutschsprachige on this site than you think xD

aenigmae

Wait, I just realised that I first followed you because of your post about how funny the dwarves' names in Neverafter are to German speakers. So I did know you spoke German, I just forgor 💀

#actually that probably makes it funnier now that I think about it

Wait a god damn second- du sprichst auch deutsch??

Deutsche auf tumblr be like

#I didn't know either lmao #dracenathe6th

forestlion

Reblog and add a picture of the Scout-Schulranzen you had in der Grundschule

forestlion

I asked my mommey and she said mine did have cats instead of horses and no side bags... damn... it really was just a Kasten

aenigmae

Mine had this pattern but it was the square boxy kind. Young me really chose the knockoff anime girlies. and boys I guess. Anyway the one with the long brown hair was my favourite

#I think the matching Umhängegeldbeutel still exists at my parents' house somewhere

S.O.S. I'm having feelings about Adrian Clairmont again

#mostly thirsty feelings I'll admit it #or should I say... I'll confess it? #anyways help #adrian clairmont #la by night

sevenfactorial

Any algorithm can be a black box algorithm if you're too lazy to read the documentation

#jokes on you i will read the documentation and it will still be a black box to me #it's all black boxes to me baby! #and then i'll trial and error myself to an understanding #don't be me basically

utenah

“Well, let it pass; April is over, April is over. There are all kinds of love in the world, but never the same love twice.”

— F. Scott Fitzgerald, “The Sensible Thing” in The Short Stories

lakevida

i'm so glad i'm not a teen anymore i'm sorry for teens that you guys still have to do that. wholeheartedly prefer the ways my 20s suck to the ways my teen years sucked

raytoroapologist

the future is bulletproof. by the fucking way.

#also the aftermath is secondary. if you even care

arsonistblue

“the horrors” this “the agonies” that. pick one of the wonders

sunshine

the sound of laughter

humanity’s ability to commit to a bit

“weird” animals and bugs

the curiosity of an autistic 9yo at recess

spicy food

learning new languages

the feeling after finishing a workout or pt exercises

talking to your friends

other (in the tags!!)

Voting ended on Mar 20, 2023

#how surprising that committing to the bit is winning here on the committing to the bit website

420technoblazeit

i know we've already made a hundred jokes about it but oh my god. dean winchester escaping heaven with his car to save the multiverse is a real thing that happened. like that was airing on live tv in the year 2023. he drove. the car. and it took him to an alternate dimension where his parents were better people. you cant make this shit up

yetizeus

im sorry WHAT happened

420technoblazeit

are you in a position to receive information that might harm you

ruffboi-mags

Wait I'm sorry WHAT???

420technoblazeit

okay so. tldr for anyone who didn't know anything about the supernatural prequel. november 19th 2020, the show ends with a mediocre at best ending. dean dies young and goes to heaven where he waits for sam to die for 30 years i guess idk. it was bad. no one is happy with it. one of the people who's maybe the least happy with it is jensen ackles, who's been playing dean winchester for the past 15 years

so half a year later he buys the rights to supernatural to make his own prequel show. kind of insane, i know. what's more insane is that the premise of the prequel contradicts canon. way back in season 5-ish it was established that mary and john winchester only got together through divine intervention because sam and dean's births were pivotal in the plan to start the apocalypse. it's long and complicated and i don't really want to explain the plot of the best two seasons of this show because it would take way too long but.

what you need to know is that the premise of the prequel does not make sense within the context of canon. jensen ackles and the writers are aware that this does not make sense. supernatural is also a show that has messed with the concept of alternate universes since the 5th season

the winchesters is a show revolving around mary and john winchester, who are sam and dean's parents. it covers their first meeting in the 70s and their hunting adventures. theyre also fighting against a previously unmentioned group of monsters called the akrida. central to defeating the akrida is this man who's apparently been fighting them and knows exactly what they are and what they're after. this man is dean

dean shows up in the last episode to explain everything. apparently this is in fact post-spn finale dean who escaped from heaven after his death. he caught wind of the akrida (who are monsters created as god's failsafe in case he was defeated. god was evil in the end, just roll with it) and left to help kill them. the world of the prequel show is an alternate dimension, presumably one where mary and john winchester get to leave the hunting life and be their own people free from the tragedy that eventually befalls them in the original show. no, destiel does not happen. yeah.

ruffboi-mags

I unironically fucking love this. Absolutely losing my mind. Fucking hell Jenny.

#i cannot keep finding out news about spn like this #please

hey tumblr mobile app? uhh so this may come as a shock to you, are you sitting down? okay yeah so you know about the porn blogs? yeah the ones that have been following me in droves. sooo did you know that when i go to their blogs, it's actually to report and block them. yeah i don't want to give them gifts. they're not people i love and i definitely don't want to give back to them. sorry you had to find out this way

#sorry staff not trying to clown I realise tumblr needs to make revenue #but I think it would be better to only suggest the gift thing on blogs you already follow #tumblr mobile

gayvampyr

i hate that every time i look for color studies and tips to improve my art and make it more dynamic and interesting all that comes up are rudimentary explanations of the color wheel that explain it to me like im in 1st grade and just now discovering my primary colors

gayvampyr

“red and green are opposites 🥰” cool now how do i paint a tree with pinks and blues without it looking like a child’s finger painting or incongruous blobs of rainbow vomit

gayvampyr

ok i can’t explain it very well but im looking for tips and techniques for rendering art like

with specifically the highlights and colors being hues that complement each other, don’t distract from the scene, and make it more interesting/visually appealing

is it too much to ask

moreclaypigeons

gonna drop some sources I have saved on Pinterest! I don't know if these all link back to the original sources so apologies for that

cohesive but still contrasting

This kind of talks about color and composition

This is a bit about landscape specifically

Values & composition

Contrast in composition

Balance in colors & values

This one's more for palette building but I think it's useful and can be applied to the other ones

Cohesion within compositions/lighting

"Chromatic fringe" - I also see people using this with shading, they bring in a transition color that is a different hue than the base color or shadow, it makes it so that less vibrancy is lost and it doesn't get muddy!

This one specifically has a lot of process behind the style of painting you're looking for!

Also one of my favorite artists who makes bright and colorful art like this is Not Sorry Art on TikTok & YouTube, her website is here and it's <3 my fav. She has some videos where you can see her process

With the oranges painting you put as an example, I noticed they painted the lighter values more toward yellow - they also exaggerated the hues of the undertones of the photo, so I'm guessing they either did it in their head or bumped the saturation up to get a closer look! I really love these paintings you shared and I definitely share your desire to paint/draw like that :)

#ref

forestlion

might stand in the corner or in the spotlight in a couple minutes. i better not lose my religion

forestlion

you guys wont fucking believe this

regging

being a humanities major who’s friends with stem majors is so funny because you’ll ask your friends what they’re doing today and they’re like “UGH it’s so stressful i have to stabilize the reactor core for my nuclear power midterm and then i have to build the supercomputer from i have no mouth yet i must scream for my electrical engineering homework :/ what about you” and you’re like “oh well i have to read a fun little book and write an essay about gender.” and they still think you have it worse

freakinflipflop

Being a stem major who's friends with humanities majors is ALSO funny bc you ask what's goin on with them and they're like "oh yeah my day's pretty good! I only have to read 50 pages for this one class today and half a book for another one. It's much better than last week where I read three books and wrote a 10 page paper about their overlapping motifs for one class while also researching a niche period of time that our library doesn't have any resources on. How's it been for you?" and you're like "oh I have a lil packet of fun math puzzles due tomorrow." and they look at you like you're carrying the weight of the universe on your back

#me in my cultural studies classes vs. me in my formal logic classes #give me that packet of fun little math problems any day AS LONG AS it's formal logic and nothing else