Hiya! I know it's been a little while but I just wanted to let you know I finally got around to making the web version of that fic poisoning tool I made about a month ago. It's at https://tricksofloki.github.io/ficpoison.html if you're interested :)
OHOHOHO!
Alright, I gave this a little test on my own fic over here. Quick little review/notes for anyone interested! (But the tl;dr is that I approve based on my initial review of the original code and based on using this web tool to automate running the code.)
This version is super easy to use. I'll be honest; I was struggling to figure out how to run the code locally before because it's not written in a coding language I personally use, and this website takes out all the hard parts of doing that. You need to do the one-time task of creating a work skin to enable the "poison" CSS used, and you need to make sure that work skin is enabled for any work you're going to use this on. The code to put into your work skin is available at the link. If you already have a work skin you use, you can just add this class to it. (I think the tutorial I linked to does a good job walking you through how, but I'm open to doing a tutorial on this blog if anyone wants that.)
If you're poisoning an existing fic, make a backup copy first. Keep a private, unpoisoned copy on your PC or phone or wherever, because once you do this, your copy on AO3 is poisoned, and it would take a fair amount of effort to unpoison. Upside: as the author, you can see all the CSS stuff in the background on the edit page, so if you really need to unpoison a copy, it's not impossible. Just really annoying.
For reference, here's what I can see as the author with access to the edit page:
I can clearly see where the poison is if I really wanted to go back through and unpoison.
And here is what I can see in a copy scraped with nyuuzyou's code:
You can definitely see it's messed up by looking, but you don't see an active callout to where exactly the poison code is. Keep in mind that not every scraper uses the same code as nyuuzyou, and a more sophisticated scraper may pull more than the plain text that nyuuzyou's tool grabs. Other scrapers may be pulling fics with the formatting and everything, and I don't know exactly what that output looks like. If their output includes the class for the poison, they can pretty easily code something to remove it. That's me being overly conservative, I suspect. I haven't heard of any scrapers who have bothered with anything more than plain text, and this isn't an issue unless they're grabbing the full HTML. (Translation: From what I know, this is NOT an issue. Yet. So this is not a weakness of the poison tool. Yet.)
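To make the plain text vs. full HTML difference concrete, here's a rough Python sketch. It is not nyuuzyou's code or trash-prince's code. The class name `poison`, the sample sentence, and the `extract_text` helper are all my own assumptions for illustration; the real class name is whatever the work skin code at the link defines.

```python
from html.parser import HTMLParser

# Hypothetical class name; substitute whatever your work skin actually uses.
POISON_CLASS = "poison"


class TextExtractor(HTMLParser):
    """Collects text from HTML, optionally skipping elements with a given class."""

    def __init__(self, skip_class=None):
        super().__init__()
        self.skip_class = skip_class
        self.skip_tag = None   # tag name of the element currently being skipped
        self.skip_depth = 0    # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Track nested tags of the same name so we know where the skip ends.
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_class and self.skip_class in classes:
            self.skip_tag = tag
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if self.skip_depth and tag == self.skip_tag:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def extract_text(html, skip_class=None):
    parser = TextExtractor(skip_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Made-up sample of what a poisoned paragraph might look like in HTML mode.
sample = (
    '<p>The cat<span class="poison">lamp</span>s slept on the '
    'wind<span class="poison">river</span>owsill.</p>'
)

# A plain-text scrape ignores classes, so the hidden words end up in the text.
print(extract_text(sample))

# With the full HTML and the class name, the hidden words can be dropped.
print(extract_text(sample, skip_class=POISON_CLASS))
```

The second call is also roughly how you could unpoison your own copy as the author, since you have the HTML with the classes in it. A scraper working only from plain text has nothing to key off of.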
Based on the output, anyone who's doing a half-decent job of cleaning up the data they scrape would toss my fic out of the dataset. It's full of what look like typos because the poison got placed mid-word, so it looks like I just suck at writing. If your goal is to get tossed out of the dataset, this is perfect. If a scraper isn't paying attention at all and leaves your fic in the set, you can contribute some really terrible training data, because your poisoned fic is going to be full of words that don't even exist thanks to the mid-word placement.
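For a sense of what "cleaning up the data" could look like, here's a minimal Python sketch of one kind of quality filter: count how many words aren't in a known vocabulary and drop the text if the share is too high. I don't know what any real scraper actually does, so the function names, the tiny vocabulary, and the 0.2 threshold are all made up for illustration.

```python
import re


def unknown_word_ratio(text, vocabulary):
    """Return the fraction of words in text that are not in vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocabulary)
    return unknown / len(words)


def keep_document(text, vocabulary, max_unknown_ratio=0.2):
    """Hypothetical filter: keep a text only if few of its words are unknown."""
    return unknown_word_ratio(text, vocabulary) <= max_unknown_ratio


# Toy vocabulary; a real filter would use a full dictionary or word list.
vocabulary = {"the", "cats", "slept", "on", "windowsill"}

clean = "The cats slept on the windowsill."
poisoned = "The catlamps slept on the windriverowsill."

print(unknown_word_ratio(clean, vocabulary), keep_document(clean, vocabulary))
print(unknown_word_ratio(poisoned, vocabulary), keep_document(poisoned, vocabulary))
```

In this toy case, two of the six words in the poisoned sentence are non-words, which is enough to fail the filter, while the clean sentence passes.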
As far as using the tool, I used an existing fic. I went into the edit page for the chapter, scrolled to the bottom and left the text editor on the default HTML mode. I copied everything in that box. (Easy method: click into the box where you can type out the fic, and press "Ctrl" and "A" to select all, then "Ctrl" and "C" to copy.) I went to the tab with all-hail-trash-prince's tool, and I pasted it into the box on the left.
I clicked "Apply poison" and the poisoned fic appeared in the right box. I copied the poisoned fic from the right box, went back to my fic on AO3 with my custom work skin already enabled, and I pasted the poison fic in place of the original fic. I clicked the preview button to make sure it would look normal, and it did. So I clicked to update the chapter with the poison block included.
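If you're curious what "Apply poison" is doing in spirit, here's a rough Python sketch based only on what I can observe: decoy words get dropped mid-word, wrapped in something the work skin hides. This is NOT trash-prince's actual code. The `poison_text` function, the `poison` class name, the `rate` setting, and the decoy list are all assumptions of mine, and this toy version only handles plain text with no HTML tags in it.

```python
import random


def poison_text(text, decoys, rate=0.5, css_class="poison", seed=None):
    """Insert hidden decoy words in the middle of some words (plain text only)."""
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # Only poison longer words, and only some of them, chosen at random.
        if len(word) > 3 and rng.random() < rate:
            cut = rng.randint(1, len(word) - 1)  # split point inside the word
            decoy = rng.choice(decoys)
            word = (
                f'{word[:cut]}<span class="{css_class}">{decoy}</span>{word[cut:]}'
            )
        out.append(word)
    return " ".join(out)


# Made-up decoy words; the real tool pulls from its own word list.
decoys = ["lamp", "river", "cloud"]

print(poison_text("The cats slept on the windowsill.", decoys, rate=1.0, seed=1))
```

With the work skin enabled, a reader's browser hides the spans and shows the original sentence. Without it, or in a plain-text scrape, the decoy words run straight into the real ones.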
I loaded the chapter with the default Microsoft screen reader turned on, and it didn't read any of the poison data, only the real fic that is visible on the screen, so success there.
So that brings us to applying this to a brand new fic. For those, you're going to go through the motions of posting a fic as usual, but instead of clicking post when you're done, you're going to swap that text editing mode over to HTML and copy everything in there. Take it to the poison tool, paste it in, and grab your poisoned copy. Go back to AO3, make sure your poison work skin is enabled, and then replace the original fic with the poison fic, making sure to stay in the HTML editing mode while you do.
(Sneaky quick edit after posting: sometimes the tool leaves you with a dangling <p> or </p> or <em>. Make sure you always preview the chapter after poisoning it, and you can go back into the rich text editor to delete any of the floating tags that were accidentally put in by the poison.)
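If you'd rather not hunt for floating tags by eye, here's a small Python sketch that flags tags that were opened but never closed, or closed but never opened. It's my own helper, not part of the poison tool, and the `find_dangling_tags` name and the short list of self-closing tags are assumptions. Previewing on AO3 is still the real check.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (short list, enough for typical fic HTML).
VOID_TAGS = {"br", "hr", "img"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags and records any that don't pair up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Pop back to the matching opener; anything skipped was never closed.
        while self.open_tags:
            opened = self.open_tags.pop()
            if opened == tag:
                break
            self.problems.append(f"unclosed <{opened}>")


def find_dangling_tags(html):
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    # Whatever is still open at the end never got a closing tag.
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.open_tags]


print(find_dangling_tags("<p>Hello <em>there</em></p>"))
print(find_dangling_tags("<p>Hello <em>there</p>"))
print(find_dangling_tags("<p>Hello</p></p>"))
```

Paste the poisoned HTML in as the string, and an empty list means nothing looked out of place to this checker.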
The last downside I notice is that your word count is immediately wrong. My 34k fic looks like a 43k fic after poisoning the first 16k words. Technically, you don't have to tell people the true word count of your fic but like. That feels a little rude to the reader, so I think it would be kind to briefly put the true word count either at the bottom of your summary or in your first author's note.
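To put rough numbers on that, here's the arithmetic as a tiny Python sketch. It uses the rounded figures from my fic and assumes the poison adds words at a steady rate, which may not hold exactly. The `poison_stats` name is just something I made up.

```python
def poison_stats(true_words, displayed_words, poisoned_words):
    """Estimate how much the poison inflates the displayed word count."""
    added = displayed_words - true_words   # extra words the counter sees
    rate = added / poisoned_words          # extra words per real poisoned word
    projected_full = true_words + rate * true_words  # if the whole fic were poisoned
    return added, rate, projected_full


# Rounded figures: 34k real words, 43k displayed, first 16k words poisoned.
added, rate, projected = poison_stats(34_000, 43_000, 16_000)
print(added, rate, projected)
```

So the poison added about 9k counted words over 16k real ones, a bit more than one extra word for every two real words. At that rate, poisoning the whole fic would show roughly 53k words for a 34k fic.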
To me, the downsides of having to create a custom work skin (that trash-prince has kindly already written for everyone) and having the wrong word count displayed... are nothing. In comparison to having my fic be easy to scrape, I'll take those slight downsides any day. From what I know of the current scraping landscape, this is a reasonably effective way to make your fic useless to anyone who scrapes it, and there are people out there who will be scraping AO3 again.
I'm curious to hear anyone else's thoughts if they check this tool out or try it for themselves, so don't be shy! I'm one person, so maybe I can't catch everything. If you're seeing something that I'm not, I want to hear about it.
And if anyone wants a more visual step by step, you are welcome to yell my way. If this text post is clear enough for everyone, I won't bother, but if a more visual walkthrough will help anyone, then I'm happy to do it!
EDIT: Just tossing in a summary of feedback I've seen from others below!
The tool is pulling from a list of most popular English words, which means it may add inappropriate verbiage to G-rated fics. See this ask for info. trash-prince has made adjustments based on the initial words spotted, but please kindly report any other concerning poison words you find, particularly slurs and other wording that cannot be interpreted in a SFW way.