Discover Top Posts Tagged with #nltk

POURING

"Pouring"

Pity In The Southern Clime Slow

Do Their Nest Merrily Morneault

Father Sold Me How They Buis

Where The Beetle Goes His Work Says

---

Eat Hoarse With Joy In Die

Moon Arise In The Lily White Di

And The Heat Till She Imo

Upon A Thorn And Wroe

#poem #poetry #computationallygenerated #poet #poets #poems #nltk #python #linguistics #fauxe #robot_poetry #poemtype2

Playing around with the corpus of all transcripts from all episodes of CR. After removing the most common English words, these are the top 4-word groupings. Mentally constructing my prototypical CR episode now.
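A minimal sketch of the kind of counting described above, assuming whitespace tokenization, a tiny hypothetical stopword set, and a made-up transcript line (none of these come from the original post). NLTK's `nltk.ngrams` produces the same sliding windows; plain Python is used here so the example runs without extra downloads.

```python
from collections import Counter

# Hypothetical stand-in for a "most common English words" list.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "i", "you", "it", "that", "is"}

def top_ngrams(text, n=4, k=3, stopwords=STOPWORDS):
    """Return the k most common n-word groupings after dropping stopwords."""
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in stopwords]
    # Slide a window of width n over the filtered token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)

# Made-up transcript fragment, for illustration only.
transcript = ("make a constitution saving throw please "
              "make a constitution saving throw now")
top = top_ngrams(transcript)
print(top[0])
```

Because stopwords are dropped before windowing, a grouping can span words that were not adjacent in the raw transcript, which is worth keeping in mind when reading the top results.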

#critical role #python #jupyter #ngrams #critical role data #critical role wiki #nltk #I knew it was always a constitution saving throw That is 941 times of rolling damage #critical role things

Exploring Phantasms of the Living (1886) through Machine Learning: Presentiment, Crisis Apparitions and Thought Transference


Phantasms of the Living, published in 1886 by the Society for Psychical Research (SPR), was a landmark ESP study that presented the case for “telepathy” or thought transference from mind to mind. The study consisted of 702 cases spanning over 1,400 pages that considered several varieties of telepathic experiences collectively referred to as “phantasms of the living”

The case collection examined non-sensory and internalized impressions, many of which were presentiment experiences involving dreams, clairvoyance, visions, feelings or an awareness in connection with the deaths of family members or friends. These experiences often coincided with the approximate time of death

Cases also considered sensory and externalized impressions, in particular apparitional representations of living persons, who were perceived to be in moments of crisis or danger. These situations appeared evidential of shock-induced forms of thought transference from a distressed agent to a percipient in the form of telepathic hallucinations

As a follow-on to the earlier wordcloud project, we wondered whether unsupervised machine learning could discover main topics within Phantasms of the Living. For the project, two varieties of generative topic models were used: Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Analysis (pLSA)

Both models view documents as having a latent semantic structure of topics that can be inferred from co-occurrences of words in documents. The mathematics underlying both models are beyond the scope of this post, but on an intuitive level there are key differences between the two methods

pLSA views topics as probability distributions over words. Given a topic, words and documents are treated as conditionally independent of one another. Non-Negative Matrix Factorization (NMF) is a method for finding topic clusters that, when fitted with a Kullback-Leibler divergence objective, is equivalent to pLSA

LDA by contrast views documents as probability distributions over topics and topics as probability distributions over words. All documents share the same collection of topics, but each document contains those topics in different proportions. Fitting LDA iteratively refines the topic and word assignments until it arrives at the topics and word distributions that most likely generated the documents

The project used several Python packages and methods for natural language processing: the Natural Language Toolkit (NLTK) to process the data set; scikit-learn to prepare and fit the LDA and NMF models; and pyLDAvis to display the results, with t-Distributed Stochastic Neighbor Embedding (t-SNE) used to map topic distances

The end-to-end project pipeline involved: data set processing; conversion of words and documents into matrix and vector space; fitting the LDA and NMF models; and then displaying the results

Processing. For the data set, the book was decomposed into several documents based on its constituent sections, chapters and volumes. Stopwords such as common prepositions and conjunctions were removed using the wordcloud application

Since telepathic experiences are spontaneous and can occur at any time or place, words conveying times and locations were removed, as were ordinal and cardinal numbers

Nouns or titles representing persons were removed (e.g. man, woman, Mr., Mrs., etc.); however, interpersonal relationships were preserved (i.e. family, friends, acquaintances or strangers)

Conversion. Vector transformations converted the data set into a document-term matrix for mathematical processing

The rows of the matrix correspond to documents and the columns to terms, with each cell holding the frequency of a term in a document. Count vectorizers store raw word counts. Term Frequency-Inverse Document Frequency (TF-IDF) vectorizers weight each count by the inverse of the number of documents in which the term appears, so that terms found in most documents count for less
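For reference, the textbook form of the TF-IDF weight can be written as below. This is the general definition rather than a statement of the exact settings used in the project; the symbols are introduced here for illustration.

```latex
\[
\operatorname{tfidf}(t, d) \;=\; \operatorname{tf}(t, d)\,\cdot\,\log\frac{N}{\operatorname{df}(t)}
\]
```

Here tf(t, d) is the number of times term t occurs in document d, N is the number of documents, and df(t) is the number of documents that contain t. By default, scikit-learn's TfidfVectorizer uses a smoothed variant, ln((1 + N) / (1 + df(t))) + 1, and then scales each document row to unit length.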

Both vectorizers converted words to lower case and removed non-word expressions. The vectorizers were also instructed to look for bigrams (pairs of adjacent words that were often used together) such as "thought-transference" and "telepathic hallucination"

Model Fit/Display. The LDA and NMF models were fitted using ten topics. Words within topics were sorted and ranked with respect to their frequency in and relevance within a topic

The LDA model was fitted using both Count and TF-IDF vectorization and was run with a maximum of 10 iterations. LDA model results were displayed using pyLDAvis, with t-SNE used to map topic distances

The NMF model was fitted with TF-IDF vectorization only and was run with a maximum of 200 iterations. NMF model results were displayed via spreadsheet
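A hedged sketch of the fitting step with scikit-learn, using the iteration limits mentioned above. The toy documents are invented, and two topics are used instead of ten only because the toy corpus is tiny; `random_state` is an added assumption to make the run repeatable.

```python
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented stand-ins for the book's documents.
docs = [
    "dream of death of a brother at the hour of death",
    "vision and impression of the death of a friend",
    "thought transference between agent and percipient",
    "telepathic hallucination of the percipient and the agent",
]
n_topics = 2  # the project used ten topics

X_counts = CountVectorizer().fit_transform(docs)
X_tfidf = TfidfVectorizer().fit_transform(docs)

# LDA on raw counts, at most 10 iterations.
lda = LatentDirichletAllocation(n_components=n_topics, max_iter=10, random_state=0)
doc_topic_lda = lda.fit_transform(X_counts)   # rows: per-document topic proportions

# NMF on TF-IDF weights, at most 200 iterations.
nmf = NMF(n_components=n_topics, max_iter=200, random_state=0)
doc_topic_nmf = nmf.fit_transform(X_tfidf)    # rows: non-negative topic loadings

print(doc_topic_lda.shape, doc_topic_nmf.shape)
```

Each row of the LDA output sums to one and can be read as a document's mix of topics. NMF loadings are non-negative but not normalized, so they are comparable within a model rather than across the two.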

Results. The topics produced by the models are unlabeled. However, the words within a topic can often be woven into a coherent theme
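One simple way to read a theme off an unlabeled topic is to list its highest-weighted words. The helper below is a hypothetical illustration that works on any topic-by-term weight matrix, such as the `components_` attribute of a fitted scikit-learn LDA or NMF model; the toy matrix and vocabulary are invented.

```python
import numpy as np

def top_words(components, terms, n_top=10):
    """For each topic row, return the n_top terms with the largest weights."""
    # Sort column indices by descending weight within each topic.
    order = np.argsort(components, axis=1)[:, ::-1][:, :n_top]
    return [[terms[i] for i in row] for row in order]

# Invented vocabulary and topic-by-term weights (2 topics x 4 terms).
terms = ["death", "dream", "agent", "percipient"]
components = np.array([
    [5.0, 3.0, 0.1, 0.2],   # reads like a presentiment topic
    [0.1, 0.3, 4.0, 2.0],   # reads like a thought-transference topic
])

themes = top_words(components, terms, n_top=2)
print(themes)
```

This sorts by raw weight only. pyLDAvis additionally offers a relevance ranking that balances a word's probability within a topic against how distinctive it is for that topic, so its word lists can differ from a plain sort.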

The first two pyLDAvis graphs provide the top 30 words and bigrams in Topics 1 and 2 using Count vectorization

Words in Topic 1 include: “dreams”; “visions”; “impressions”; and “experiences” in connection with the “death”(s) of family members and friends. This can be considered a presentiment topic and it generated 67% of the content. This mirrors results from the prior wordcloud project

Words and bigrams in Topic 2 include: “thought-transference”, “hallucination(s)”, “phantasms”, “mind(s)”, “percipients”, “agent” and “telepathy.” This can be considered a telepathic hallucinations topic and it produced 27% of the content

The third pyLDAvis graph provides the top 30 words in Topic 1 using TF-IDF vectorization

Topic 1 combines all the aforementioned words into one topic. This can be considered a “presentiment and telepathic hallucinations” topic and it accounts for 95% of the content, rendering all other topics practically insignificant in influence. A likely reason for this consolidation is that TF-IDF vectorization lowers the weight of commonly used words

The spreadsheets compare LDA and NMF model runs using TF-IDF vectorizations with results limited to the top 10 words. Although topic weights and distances are not available, some topics appear more meaningful and cohesive, and are likely more impactful than others

There is overlap between topics 5 and 6 in the LDA model and together they form the presentiment and telepathic hallucinations topic. Topics 0 and 1 in the NMF model respectively appear to correspond to presentiment and crisis apparitions topics

The bigram “thought-transference” arises in both the LDA and NMF results and appears associated with the “Society” for “Psychical” Research and the late F.W.H. “Myers”, who coined the term “telepathy”

This project had an extended preparation and production pipeline. The results indicate that unsupervised machine learning using LDA and NMF effectively and comprehensively summarized topical content in Phantasms of the Living. Moreover, key topics approximately corresponded to the types of internalized and externalized telepathic experiences described in the book

This project demonstrates the usefulness of topic generation models for finding meaningful patterns in masses of unlabeled or unstructured data. Moreover, visualization and graphing tools are essential for fully comprehending these patterns. Elsewhere in parapsychology, LDA or NMF could also be applied to survey data, case collections, and web or social media content of interest.

REFERENCES

Anaya, L. A. Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers. University of North Texas, 2011.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.

Christou, D. (2016). Feature extraction using Latent Dirichlet Allocation and Neural Networks: A case study on movie synopses. arXiv preprint arXiv:1604.01272.

Deerwester, S. (1988). Improving information retrieval with latent semantic indexing.

Gurney, E., Myers, F. W., & Podmore, F. (1886). Phantasms of the Living (2 vols.). London: Trübner. Reprinted at the Esalen Center.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.

Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63-70).


Genius.com + NLTK + WordArt.com = visualizing Kendrick Lamar's DAMN. lyrics

#data #genius #nltk #wordart #kendrick lamar #damn

SHEEP

"Sheep"

Think Not And Smell The Wurth

Benighted And The Rushes Kerth

Weeping In His Baits Sank

Thee Then The Wolvish Howl Schank

---

Ever See The Night Maskell

Wise Guardians Of Love Because Burble

Heard Her Head Er The Rebirth

To Peace Peace For Werth

#poem #poetry #computationallygenerated #poet #poets #poems #nltk #python #linguistics #fauxe #robot_poetry #poemtype2

LITTLE

"Little"

Lamb So Do Not Thou Prause

I Was Wet With Soft Baus

Sweet Is Not Alone Nor Rejoice

These Flowers While Thou Complainest Now Intervoice

---

Thee The Tiger Tiger Thuma

Moon Lovely Lyca Sleep Fama

Little Bird That In Her Haws

On Earth To Drive Their Son Naus

#poem #poetry #computationallygenerated #poet #poets #poems #nltk #python #linguistics #fauxe #robot_poetry #poemtype2

POEM

"POEM"

Pity The Lilly Of My Friend

The Mire Was Wet With The

Skies Earth S Descend

Lyca D In A Little A

---

Dacre And Mutual Fear

And They Know I Vanish Innocent

Hands Full Of The Voices Appear

Gifts Coined Gold Struggling Millisent

#poem #poetry #computationallygenerated #poet #poets #poems #nltk #python #linguistics #fauxe #robot_poetry #poemtype1

"LL"

Shade And The Ground Near

My House And Gentle Sleep

If The Earth A River Appear

Why It With Cruelty Didst Mould Asleep

---

Eyes O Life Betray Besides

Their Duty If Thou Know What

A Shade Him For I Asides

Bed And I Lay Sleeping But

#poem #poetry #computationallygenerated #poet #poets #poems #nltk #python #linguistics #fauxe #robot_poetry #poemtype1