T-SNE graphs look like snacks or a ball pit and I want to crawl into the screen

t-distributed stochastic neighbour embedding of WikiDoc
by Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei
WikiDoc: the entire set of English Wikipedia articles (articles containing less than 1000 words are removed). Each article is a data point. We label the articles with the top 1,000 Wikipedia categories and label all the other articles with a special category named “others.”
WikiWord: the vocabulary in the Wikipedia articles (words with frequency less than 15 are removed). Each word is a data point.
CSAuthor: the co-authorship network in the computer science domain, collected from Microsoft Academic Search. Each author is a data point.
Font Map
There are a lot of fonts out there, and there isn’t always a good way to see which fonts are similar to the look you’re going for without doing laborious searches. Inspired by previous machine learning projects, developer Kevin Ho decided to map a collection of fonts to a unified visualization. The result is this 2D t-SNE projection of the font manifold.
I like the practical application here: for graphic designers, finding a font that has the right style but isn’t over-used is a common task. Or finding one that’s similar to one style, but just a bit different. So a tool to help with that is welcome.
When I talk about how generative approaches can help artists and designers, this is exactly the kind of thing I’m talking about: tools that let us see relationships that would otherwise be invisible.
Can we extend this approach to other graphic tasks that could use a better interface for viewing potential outcomes? For example, what about a map of the different outcomes for a shader’s settings? Or a visualization for picking models in a Bethesda-style modular kit? Or a map of outcomes for different settings of a procedural generator, a bit like some of the visualizations in Mike Cook’s Danesh? (Bonus points if you manage to use the manifold to add interpolation.)
http://fontmap.ideo.com/
https://medium.com/ideo-stories/organizing-the-world-of-fonts-with-ai-7d9e49ff2b25
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets.
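To make that definition a little more concrete, here is a minimal pure-Python sketch of the idea behind t-SNE. It is not the reference implementation. It assumes a fixed Gaussian bandwidth (`sigma`) instead of the usual perplexity search, and plain gradient descent with step halving instead of momentum and early exaggeration. The function names and the toy data are made up for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_affinities(points, sigma=1.0):
    """Joint probabilities P in the original space.
    Assumption: one fixed sigma for every point (real t-SNE tunes it per point)."""
    n = len(points)
    cond = []
    for i in range(n):
        row = [0.0 if i == j else
               math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])
    # Symmetrize the conditional probabilities so the whole matrix sums to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]


def student_t_affinities(embedding):
    """Joint probabilities Q in the low-dimensional map (heavy-tailed kernel)."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w], w


def kl_divergence(p, q):
    """KL(P || Q), the cost that t-SNE minimizes."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)


def gradient(p, embedding):
    """Gradient of the KL cost with respect to each map point."""
    q, w = student_t_affinities(embedding)
    n, dim = len(embedding), len(embedding[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coeff = 4.0 * (p[i][j] - q[i][j]) * w[i][j]
            for d in range(dim):
                grad[i][d] += coeff * (embedding[i][d] - embedding[j][d])
    return grad


def embed(points, dim=2, steps=200, seed=0):
    """Simplified optimizer: accept a step only if it lowers the cost."""
    rng = random.Random(seed)
    p = gaussian_affinities(points)
    y = [[rng.gauss(0, 0.01) for _ in range(dim)] for _ in points]
    cost = kl_divergence(p, student_t_affinities(y)[0])
    history, step = [cost], 10.0
    for _ in range(steps):
        g = gradient(p, y)
        trial = [[yi - step * gi for yi, gi in zip(yrow, grow)]
                 for yrow, grow in zip(y, g)]
        trial_cost = kl_divergence(p, student_t_affinities(trial)[0])
        if trial_cost < cost:
            y, cost, step = trial, trial_cost, step * 1.1
        else:
            step *= 0.5
        history.append(cost)
    return y, history


# Toy data (hypothetical): two clusters of four points each in 5 dimensions.
data_rng = random.Random(1)
points = [[c + data_rng.gauss(0, 0.3) for _ in range(5)]
          for c in (0.0, 3.0) for _ in range(4)]
embedding, history = embed(points)
```

Here `embedding` holds one 2D coordinate pair per input point, ready to scatter-plot, and `history` records the cost after every step. For real datasets a library implementation such as scikit-learn's `sklearn.manifold.TSNE` is the sensible choice. This sketch only shows the moving parts.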
Feature vectors extracted by neural network
A.I. Experiments: Visualizing High-Dimensional Space
Dimensionality Reduction and Intuition
“I call our world Flatland, not because we call it so, but to make its nature clearer to you, my happy readers, who are privileged to live in Space.”
So reads the first sentence of Edwin Abbott Abbott’s 1884 work of science fiction and social satire, Flatland: A Romance of Many Dimensions. At the time, Abbott used contemporary developments in the fields of geometry and topology (he was a contemporary of Poincaré) to illustrate the rigid social hierarchies in Victorian England. A century later, with machine learning algorithms playing an increasingly prominent role in our daily lives, Abbott’s play on the conceptual leaps required to cross dimensions is relevant again. This time, however, the dimensionality shifts lie not between two human social classes, but between the domains of human reasoning and intuition and machine reasoning and computation.
Much of the recent excitement around artificial intelligence stems from the fact that computers are newly able to process data historically too complex to analyze. At Fast Forward Labs, we’ve been excited by new capabilities to use computers to perceive objects in images, extract the most important sentences from long bodies of text, and translate between languages. But making complex data like images or text tractable for machines involves representing the data in high-dimensional vectors, long strings of numbers that encode the complexity of pixel clusters or relationships between words. The problem is these vectors become so large that it’s hard for humans to make sense of them: plotting them often requires a space of way more than the three dimensions we live in and perceive!
On the other hand, machine learning techniques that entirely remove humans from the loop, like automatic machine learning and unsupervised learning, are still active areas of research. For now, machines perform best when nudged by humans. And that means we need a way to project the high-dimensional vectors machines compute with back down to the two- and three-dimensional spaces our visual systems have evolved to make sense of.
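As a rough illustration of what bringing vectors "back down" to two dimensions can look like, here is a small pure-Python sketch of principal component analysis (PCA). PCA is a different and simpler technique than t-SNE, and it is used here purely as an example. The sketch finds the two directions of greatest variance with power iteration and projects each vector onto them. The function names, the iteration count, and the toy 6-dimensional data are all assumptions made for the example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_components(rows, k=2, iters=200, seed=0):
    """Return the column means and the k leading principal directions."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    centered = [[r[d] - means[d] for d in range(dim)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(dim)]
        # Power iteration: repeated multiplication converges to the
        # direction with the largest remaining variance.
        for _ in range(iters):
            v = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflation: remove the direction just found before the next pass.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(dim)]
               for a in range(dim)]
    return means, components


def project(rows, means, components):
    """Coordinates of each centered row along each component."""
    return [[dot([x - m for x, m in zip(r, means)], c) for c in components]
            for r in rows]


# Toy data (hypothetical): 30 vectors in 6 dimensions that mostly vary
# along two hidden directions, plus a little noise.
data_rng = random.Random(2)
vectors = []
for _ in range(30):
    a, b = data_rng.gauss(0, 3), data_rng.gauss(0, 1)
    base = [a, a, b, b, 0.0, 0.0]
    vectors.append([x + data_rng.gauss(0, 0.1) for x in base])

means, components = top_components(vectors)
projected = project(vectors, means, components)
```

Each entry of `projected` is an (x, y) pair that can go straight into a scatter plot. PCA only captures linear structure, which is part of why nonlinear methods like t-SNE exist.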
What follows is a brief survey of some tools available to reduce and visualize high-dimensional data. Send us a note at [email protected] if you know of others!