So, more than a month ago, I started looking at the kudos information I scraped from AO3 D:BH fics (very basic descriptive stuff).
I assumed that (1) choosing to write a fic about a certain ship (or ships) and/or (2) choosing to read a fic with a certain ship and leave kudos on it can serve as a proxy measure of interest in that ship. On that basis, I made a simple plot summarising the number of interested users each ship has and the number it shares with every other ship in the fandom.
Resulting plot:
An interactive version of the plot can be found on my GitHub (link in description/Tumblr heading). My GitHub page also links to tables with the raw counts shown when hovering over the nodes/links, for easier reference.
Again, note that this scrape was done in June 2020. I removed any fics that were non-English, were crossovers, or had fewer than 10 words. 16211 fics remained for analysis.
Details of the process under the cut.
1. Preprocessing relationship tags
Ship tags can get pretty unstandardised! I kept only tags that had a ‘/’ between the characters (e.g. Connor/RK900). I then split each tag at the ‘/’ character and standardised the resulting two (or more) names. For example, Connor (one-sided), RK 800, Connie and Connor all fall under ‘Connor’.
Ships may involve more than two characters. For the sake of visualisation, I converted each of those into a list of every possible pair of characters within the ship (combinations rather than permutations, since order doesn't matter). So Connor/RK900/Hank becomes [Connor/RK900, Connor/Hank, RK900/Hank].
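As a rough illustration of the splitting and standardising step, here is a minimal Python sketch. The `ALIASES` table and the `split_ship_tag` helper are hypothetical. The real name mapping was built by hand and is much longer, and its exact form isn't described in this post.

```python
import re
from typing import List, Optional

# Hypothetical alias table (lowercased, whitespace-collapsed key -> canonical name).
# The real hand-built mapping covers far more variants than this.
ALIASES = {
    "connor": "Connor",
    "connor (one-sided)": "Connor",
    "rk 800": "Connor",
    "rk800": "Connor",
    "connie": "Connor",
    "rk900": "RK900",
    "rk 900": "RK900",
    "hank": "Hank",
    "hank anderson": "Hank",
}


def split_ship_tag(tag: str) -> Optional[List[str]]:
    """Split a relationship tag at '/' and map each name to a canonical form.

    Returns None for tags without a '/', e.g. AO3's platonic '&' tags.
    Names missing from ALIASES are kept as written (just stripped).
    """
    if "/" not in tag:
        return None
    names = []
    for raw in tag.split("/"):
        key = re.sub(r"\s+", " ", raw).strip().lower()
        names.append(ALIASES.get(key, raw.strip()))
    return names
```

For example, `split_ship_tag("Connie/Hank Anderson")` gives `["Connor", "Hank"]`, while `"Connor & Hank"` is skipped entirely.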
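Expanding a multi-character ship into its pairs is exactly what Python's `itertools.combinations` does. The sketch below uses a hypothetical helper name, `ship_to_pairs`, and reproduces the Connor/RK900/Hank example.

```python
from itertools import combinations
from typing import List


def ship_to_pairs(names: List[str]) -> List[str]:
    """Expand an n-character ship into every unordered pair of its characters."""
    return [f"{a}/{b}" for a, b in combinations(names, 2)]


print(ship_to_pairs(["Connor", "RK900", "Hank"]))
# ['Connor/RK900', 'Connor/Hank', 'RK900/Hank']
```

This keeps the characters in the order they were tagged. Using `combinations(sorted(names), 2)` instead would also make Hank/Connor and Connor/Hank the same key. The post doesn't say whether the original pipeline did this.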
2. Extracting user interest in ship(s)
Again, I took user interest in a ship to be signalled by:
(1) being an author of a fic with the ship tagged, or
(2) leaving a kudos on a fic with the ship tagged
(1) was simple to extract, since each fic has one or more authors tied to it. (2) was taken from the kudos list shown at the end of every fic. There's no way on my end to tell guests apart or match them across different fics, so I discarded guest kudos and focused only on registered site users.
At the end of this process, I had a long list of user-ship interest pairs. 354382 entries, to be exact.
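A minimal sketch of turning scraped fics into user-ship interest pairs follows. It assumes each fic is a dict with hypothetical keys `authors`, `kudos_users` (registered users only, guests already dropped) and `ships` (already split into pairs). It also assumes repeated pairs are collapsed, e.g. a user who left kudos on several fics with the same ship. The post doesn't say whether the original count deduplicates.

```python
from typing import Dict, Iterable, List, Set, Tuple


def user_ship_pairs(fics: Iterable[Dict[str, List[str]]]) -> Set[Tuple[str, str]]:
    """Collect unique (user, ship) interest pairs from authors and kudos-givers.

    Hypothetical fic shape: {"authors": [...], "kudos_users": [...], "ships": [...]}
    """
    pairs: Set[Tuple[str, str]] = set()
    for fic in fics:
        # Authorship and kudos both count as a signal of interest.
        interested = set(fic["authors"]) | set(fic["kudos_users"])
        for user in interested:
            for ship in fic["ships"]:
                pairs.add((user, ship))
    return pairs
```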
3. Preparing the plot
Given the information I have (user-ship interest pairs), I started with a bipartite graph. This just means there's one set of nodes containing all the users and another set containing all the D:BH ships in the dataset. Links between the two sets connect a user to a ship if they've expressed interest in it. There are no links within the set of user nodes, and the same goes for the set of ship nodes.
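One lightweight way to hold that bipartite graph in plain Python is as two adjacency maps, one per side. This is a sketch with a hypothetical helper name, not necessarily how the original was built. In NetworkX, the usual convention is a single `nx.Graph` whose nodes carry a `bipartite=0`/`bipartite=1` attribute.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return both sides of a user-ship bipartite graph as adjacency sets.

    Every link joins a user to a ship, so user-user and ship-ship links
    cannot exist in this representation.
    """
    user_to_ships: Dict[str, Set[str]] = defaultdict(set)
    ship_to_users: Dict[str, Set[str]] = defaultdict(set)
    for user, ship in pairs:
        user_to_ships[user].add(ship)
        ship_to_users[ship].add(user)
    return user_to_ships, ship_to_users
```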
The bipartite graph can then be flattened (projected) into a regular graph with just one type of node. In this case the nodes are ships, since I'm interested in how many interested users each pair of ships shares. So we keep only the ship nodes, and two ship nodes are now linked if they share at least one common interested user. Each link can then be weighted by the number of common interested users the two ships share.
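The weighted projection can be sketched by letting every user add 1 to each pair of ships they're interested in. The helper name is hypothetical. NetworkX's `bipartite.weighted_projected_graph` computes the same "number of shared neighbours" weights if the graph is already in NetworkX.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def project_onto_ships(user_to_ships: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Weighted one-mode projection: link weight = number of shared interested users."""
    weights: Counter = Counter()
    for ships in user_to_ships.values():
        # Each user contributes +1 to every pair of ships they're interested in.
        # Sorting gives each ship pair a single canonical key.
        for a, b in combinations(sorted(ships), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

A user interested in only one ship contributes no links, which matches the rule that a link needs at least one shared user.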
This graph has 311 ship nodes and 34429 links. It's not huge, but it's very unwieldy to visualise. Many of those links are also likely to be very weak (e.g. only a couple of shared users). Since my end goal is really just visualisation, I decided to prune the graph.
4. Filtering the graph
I reused the same filter from Serrano et al. (2009), the disparity filter, that I applied to my character co-occurrence graph. I set it at a relatively strict level of α = 0.001. This filtered the graph down to 185 ship nodes and 1251 links.
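For reference, the disparity filter tests each link against a null model in which a node's strength is spread uniformly at random over its links. For node i with degree k_i and strength s_i (sum of its link weights), a link of weight w has p = w / s_i and significance α_ij = (1 − p)^(k_i − 1). The sketch below keeps a link if it is significant from at least one endpoint. That is a common reading of the method, assumed here rather than confirmed as the exact rule used for this plot.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]


def disparity_filter(weights: Dict[Edge, float], alpha: float = 0.001) -> Dict[Edge, float]:
    """Disparity-filter backbone of a weighted, undirected graph (Serrano et al., 2009)."""
    degree: Dict[str, int] = defaultdict(int)
    strength: Dict[str, float] = defaultdict(float)
    for (a, b), w in weights.items():
        for node in (a, b):
            degree[node] += 1
            strength[node] += w

    def significant(node: str, w: float) -> bool:
        k = degree[node]
        if k < 2:
            return False  # (1 - p) ** 0 == 1, so a degree-1 node never passes
        p = w / strength[node]
        return (1 - p) ** (k - 1) < alpha

    # Keep a link if it stands out for either of its two endpoints.
    return {
        (a, b): w
        for (a, b), w in weights.items()
        if significant(a, w) or significant(b, w)
    }
```

In this sketch, ships left with no surviving links simply don't appear in the result.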
5. Visualising the graph
I made this one less springy than the previous ones, since I realised how annoying it is to explore when the nodes keep bouncing back into place as you drag them out.
Ship node size is determined by the number of users who have indicated interest in the ship (bigger = more). Link thickness is determined by the number of common interested users the pair of ships shares (thicker = more).
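Here is a minimal sketch of mapping raw counts to marker sizes, assuming simple linear min-max scaling. The post doesn't say which scaling was used. The helper name and the default size range are hypothetical, and the same function could map link weights onto a narrower width range.

```python
from typing import Dict, Hashable


def scale_sizes(
    counts: Dict[Hashable, int], min_size: float = 10.0, max_size: float = 60.0
) -> Dict[Hashable, float]:
    """Linearly map raw counts onto [min_size, max_size]: a bigger count gives a bigger mark."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    return {
        key: min_size if span == 0 else min_size + (c - lo) / span * (max_size - min_size)
        for key, c in counts.items()
    }
```

For nodes, the counts would be the number of interested users per ship. For links, they would be the shared-user weights, e.g. `scale_sizes(link_weights, 1.0, 8.0)`.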
I also realised it's still a little tough to pick the links apart and get a good look at them, so I've uploaded the tables with the raw counts to my GitHub page. Unfortunately, I can't add a search bar. As far as I can tell, a Dash app needs a running Python server, and GitHub Pages only serves static files, so looking through the link table may be a bit tedious.