Good Omens: a stylometric analysis, or who wrote what?
As part of the master's programme at the University of Antwerp, my friends and I did a study of the novel Good Omens by Neil Gaiman and Terry Pratchett. We did a stylometric analysis, a technique mainly used for authorship attribution. Methods like this were used, for instance, to detect J.K. Rowling's style in The Cuckoo's Calling, which was published under the pseudonym Robert Galbraith. We apply this technique to Good Omens to detect the styles of Pratchett and Gaiman and get an idea of who wrote what. We then visualise this in graphs that show, for each part of the novel, which author's style that passage is closer to.
How do we do this? We train a model on lots of training data, i.e. we present it with novels whose author we know. The model then tries to learn the writing style of both authors. An author's style seems hard to define, but in stylometry we basically look at a fixed number of the most frequent words, which are almost always small function words such as 'the', 'of' and 'you'. They are the best indicators of someone's writing style, because they occur in every text (and therefore transcend genres and time periods) and we are not really conscious of of them. For instance, most people will have ignored the second 'of' in the previous sentence because we've learned not to pay too much attention to these filler words. The same is true when writing a text. We are hardly aware of how we use function words, which makes them very difficult to forge: they are kind of like the fingerprints of language. So we use these seemingly insignificant words to determine the writing style of Gaiman on the one hand and Pratchett on the other. The model was trained on six novels by Neil Gaiman and six novels by Terry Pratchett.
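To make the idea of a function-word "fingerprint" concrete, here is a minimal sketch in Python. It is not the code used in the study: the function names, the tokenisation rule and the two toy sentences are all assumptions made for illustration. It simply counts the most frequent words across some training texts and then expresses a text as the relative frequency of each of those words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(texts, n):
    """Return the n most frequent words across all training texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocabulary]


# Toy stand-ins for training texts (invented, not from any real novel).
sample_a = "the cat sat on the mat and the dog sat on the rug"
sample_b = "you know of the end of the world and of you"

vocabulary = most_frequent_words([sample_a, sample_b], 3)
print(vocabulary)
print(profile(sample_a, vocabulary))
```

With real novels the vocabulary would be much longer (typically a few hundred words) and would be dominated by function words, which is exactly why these profiles say more about the author than about the plot.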
After the model has learned this, we present it with a new, ambiguous text (in this case Good Omens). We chop the novel up into smaller pieces (specifically chunks of 5000 words) and ask the model, for each chunk, whether that segment more closely resembles Gaiman's or Pratchett's style. The model then classifies it as either Pratchett (P) or Gaiman (G) and also says how sure it is of its decision. The graph below illustrates this: on the x-axis you'll find the words of Good Omens (just over 1 000 000 words). The colour that is below the horizontal line gets the vote (so if green is below the line, the model says that segment was written by Pratchett). What appears above the line indicates how sure the model is of its choice. If there is some red above the line, it basically means this segment was probably written by Pratchett, but Gaiman's style is also detected, so it might be his as well.
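The chunk-and-classify step can be sketched roughly as follows. This is an illustrative stand-in, not the classifier from the study: it assumes a simple nearest-profile rule with Manhattan distance, and the vocabulary, the two author profiles and the confidence formula are all hypothetical values chosen for the example.

```python
from collections import Counter


def chunks(tokens, size=5000):
    """Split a list of word tokens into consecutive segments of `size` words."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def profile(tokens, vocabulary):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def distance(a, b):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(segment, author_profiles, vocabulary):
    """Return (label, confidence) for one segment.

    Confidence is an assumed measure: 0.0 when the segment is equally far
    from both authors, approaching 1.0 when it matches the winner exactly.
    """
    seg = profile(segment, vocabulary)
    dists = {author: distance(seg, p) for author, p in author_profiles.items()}
    best, other = sorted(dists, key=dists.get)[:2]
    total = dists[best] + dists[other]
    confidence = (dists[other] - dists[best]) / total if total else 0.0
    return best, confidence


# Hypothetical vocabulary and author profiles (invented numbers).
vocabulary = ["the", "of", "you"]
author_profiles = {"P": [0.06, 0.03, 0.01], "G": [0.04, 0.02, 0.03]}

# A toy 100-word segment whose function-word rates match the "P" profile.
segment = ["the"] * 6 + ["of"] * 3 + ["you"] * 1 + ["filler"] * 90
label, confidence = classify(segment, author_profiles, vocabulary)
print(label, round(confidence, 2))
```

In a real run, each 5000-word chunk returned by `chunks` would be passed to `classify` in turn, and the resulting labels and confidence values are what a graph like the one described above would plot along the x-axis.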
The graph shows that Pratchett's style is overwhelmingly dominant throughout the novel. However, there are some pieces where Gaiman shows up. These are (nearly) all passages involving the Four Horsemen of the Apocalypse. Gaiman said in the appendix of Good Omens that he wrote most of those passages, and since this also shows up in the model, it inspires confidence that the method works.
But we weren't happy yet! We have the unbelievable advantage of having another version of the story that was written by only one of the two authors: the screenplay of the TV show (2019). Since we know this was written by Gaiman alone, the model should attribute it in its entirety to him, which it kind of does!
Without making too many claims, the results are very interesting to us because of Pratchett's overwhelming presence in the novel. Even in the screenplay, which we know he had nothing to do with, the model picks up his writing style. We are quite happy with the results, especially since we could reproduce them with other methods.
This study was peer-reviewed and presented at the Computational Humanities Research Conference (CHR) on December 14th, 2022. You can read the full article via the link below!









