DeepTingle
I’m tempted to just leave this without comment. But there’s a serious point here too:
There’s no denying that many of these systems can provide real benefits to us, such as faster text entry, useful suggestions for new music to listen to, or the correct spelling for Massachusetts. However, they can also constrain us. Many of us have experienced trying to write an uncommon word, a neologism, or a profanity on a mobile device just to have it “corrected” to a more common or acceptable word. Word’s grammar-checker will underline in aggressive red grammatical constructions that are used by Nobel prize-winning authors and are completely readable if you actually read the text instead of just scanning it. These algorithms are all too happy to shave off any text that offers the reader resistance and unpredictability. And the suggestions for new books to buy you get from Amazon are rarely the truly left-field ones: the basic principle of a recommender system is to recommend things that many others also liked.
What we experience is an algorithmic enforcement of norms. These norms are derived from the (usually massive) datasets the algorithms are trained on. In order to ensure that the datasets do not encode biases, “neutral” datasets are used, such as dictionaries and Wikipedia. (Some creativity support tools, such as Sentient Sketchbook (Liapis, Yannakakis, and Togelius 2013), are not explicitly based on training on massive datasets, but the constraints and evaluation functions they encode are chosen so as to agree with “standard” content artifacts.) However, all datasets and models embody biases and norms. In the case of everyday predictive text systems, recommender systems and so on, the model embodies the biases and norms of the majority.
It is not always easy to see biases and norms when they are taken for granted and pervade your reality. Fortunately, for many of the computational assistance tools based on massive datasets there is a way to drastically highlight or foreground the biases in the dataset, namely to train the models on a completely different dataset. In this paper we explore the role of biases inherent in training data in predictive text algorithms through creating a system trained not on “neutral” text but on the works of Chuck Tingle.
In a world where recommender systems try to sell us things we already own and AI projects are trying to revive phrenology and sell it to police departments, it is worth remembering that no dataset is truly neutral.
http://www.deeptingle.net/index.html














