Exercise 2
Working with Voyant Tools to Create Data Visualizations
For this exercise, I experimented with Voyant Tools and used Dostoevsky's novella Notes from Underground, sourced from Project Gutenberg.
I first read this novella in my first-year Philosophy class, and I still reread it from time to time.
I chose this text because Fyodor Dostoevsky's novella is a unique case study. The protagonist, the infamous "Underground Man", is extremely self-aware, yet he is trapped and constantly contradicts himself. So, if any literary work could benefit from having its linguistic patterns quantified and visualized, I thought this would be a good one.
With a total of 44,481 words and 7,479 unique word forms, the corpus was substantial enough to reveal patterns, yet still manageable for my first attempt at distant reading.
First Impressions: The Interface
A feature I found distinctive in Voyant Tools is its multi-panel dashboard. Each corner of the site displays a different tool that I could reorganize and customize. At first I thought this was just a visual quirk, but now I see it as a philosophical approach to text analysis: multiple perspectives need to be visible simultaneously, which helps with comparative thinking.
Each panel is interactive and can be swapped for another, which allows me to customize my analytical workspace, akin to a digital historian's workstation.
The design of the interface is itself a methodological lesson: my analysis worked best when I could see the different types of evidence side by side.
Source Criticism
Before any analysis could start, I had to clean the data. The Summary Tool revealed an immediate issue with my corpus: the word "gutenberg", which appeared 75 times. This word was not part of Dostoevsky's lexicon but metadata embedded by Project Gutenberg.
My distant reading began with criticising my source by filtering out administrative words. The Stopwords Tool handles this, and its changes affect the word cloud and the frequency statistics in real time. This digital cleaning is equivalent to removing stamps or other residue from a physical source or document.
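For anyone who wants to try the same cleaning step outside Voyant, the idea can be sketched in a few lines of Python. This is only an illustration, not how Voyant implements its Stopwords Tool: the stopword set, the function names, and the sample sentence are all assumptions made up for the example.

```python
import re

# Hypothetical stopwords for Project Gutenberg boilerplate (an assumption,
# not Voyant's actual stopword list).
GUTENBERG_STOPWORDS = {"gutenberg", "project", "ebook", "license"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]

# Made-up sample line that mixes boilerplate with text from the novella.
sample = "Project Gutenberg eBook of Notes from Underground. I am a sick man."
tokens = tokenize(sample)
cleaned = remove_stopwords(tokens, GUTENBERG_STOPWORDS)
print(cleaned)
```

Any frequency count made afterwards only sees the cleaned tokens, which is why removing a single administrative word can change the whole ranking.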
Sources, digital or not, often come with baggage.
Cirrus and Document Terms
After cleaning the data, I used Cirrus (the word cloud) and the Document Terms tool to identify which words appeared most frequently.
"man" used 107 times.
"know" used 88 times.
"time" used 75 times.
"like" used 75 times.
"love" used 67 times.
I think this highlights the narrator's preoccupations: his identity, epistemology, temporality and affect.
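A frequency list like the one above can also be approximated with Python's standard library. The sketch below is a rough equivalent, assuming a simple regular-expression tokenizer; the function name `top_terms`, the tiny stopword set, and the sample text are invented for illustration, and Voyant's own tokenization may count words differently.

```python
import re
from collections import Counter

def top_terms(text, stopwords=frozenset(), n=5):
    """Return the n most frequent non-stopword terms with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Made-up sample text, far shorter than the real corpus.
sample = "I am a sick man. I am a tired man. I know that I know nothing."
result = top_terms(sample, stopwords={"i", "am", "a", "that"}, n=2)
print(result)
```

Run on the full novella with a proper stopword list, the same approach should give counts close to, though not necessarily identical with, the Document Terms figures.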
The word cloud turned this data into a visual map. The narrator's use of "man" 107 times is more than just a word count: the repetition shows me that questions of identity, self, and what it means to be human are central to the novella. With this, I could narrow my focus for close reading and for philosophical or historical questions.
The Context Tool was like a highlighter, finding every instance of the word "man" and displaying the sentences it appeared in. One example is "I am a sick man", found in Part I.
At this point the data showed me not just a number but a pattern: the narrator uses the word to describe himself negatively. From this, I concluded that the Underground Man's negative labels for himself might reveal notions of masculinity and identity in 19th-century Russia. The Context Tool didn't directly answer this for me, but it gave me textual evidence, so I could focus on interpretation.
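The keyword-in-context view described above is simple enough to sketch by hand. The Python below is a minimal illustration under my own assumptions: the function name `keyword_in_context`, the three-word window, and the short sample string are hypothetical, and Voyant's tool offers more options than this.

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return (left, word, right) for each occurrence of keyword,
    with up to `window` words of context on either side."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

# Short sample string used only for illustration.
sample = "I am a sick man. I am a spiteful man."
hits = keyword_in_context(sample, "man")
for left, word, right in hits:
    print(f"{left} [{word}] {right}")
```

Reading the left-hand context down the list is what makes a pattern such as the narrator's negative self-description easy to spot.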
Fancy Visual Voyant Tools: Knots and DreamScape
After trying the basic tools, I jumped into the fancier tools Voyant offered, such as Knots and DreamScape. These were the most appealing to me.
Knots is a tool that turns words into spiral or twisty lines. The more frequently a word is used, the more twists and turns its line displays. The word "know", used 88 times, showed more tangles than "man", used 107 times. It's a weird but also cool way of seeing repetition without the use of numbers.
Discovering the DreamScape Tool was the most exciting part for me. Its aim is to find location names within the corpus and plot them on a map. I originally thought it would show me Russia, since the novella takes place there. Instead, it showed multiple locations in North America, places that were not in the book at all.
Before I experimented with the tool, the Voyant Tools Help page did warn me that it is still experimental, with the note "do not trust this data". This taught me to always double-check what digital tools show me.
Just because a digital tool looks cool does not mean it is reliable. If Voyant had not shown a warning, I would have simply accepted the tool's output as fact.
A historian shouldn't take any information at face value. We are surrounded by forged or fabricated documents, and they exist in the digital world too.
Final Thoughts
I think Voyant is a helpful pair of glasses for near-sightedness, but not a magnifying glass. If I take the glasses off, I can read the small print myself, analyse the tone and feel what the Underground Man is trying to tell me.
I think Voyant is a tool I can use at the beginning of any project to help me decide where my focus should be. But the "thinking" job is mine to do, using the data it provides.
The limitation of this digital tool is that the real work of understanding people and meanings remains in the hands of an old-fashioned approach: close reading.














