t-distributed stochastic neighbour embedding of WikiDoc
by Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei
WikiDoc: the entire set of English Wikipedia articles (articles containing fewer than 1,000 words are removed). Each article is a data point. We label the articles with the top 1,000 Wikipedia categories and label all other articles with a special category named “others.”
WikiWord: the vocabulary in the Wikipedia articles (words with frequency less than 15 are removed). Each word is a data point.
CSAuthor: the co-authorship network in the computer science domain, collected from Microsoft Academic Search. Each author is a data point.
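
The filtering and labelling rules described for WikiDoc and WikiWord can be illustrated with a minimal sketch. This is not the authors' preprocessing code. It assumes that each article arrives as a hypothetical (text, category) pair with a single category, that "top" categories means the most frequent ones, and that words are split on whitespace; none of these details are stated in the dataset descriptions. The function names and parameters are illustrative only.

```python
from collections import Counter


def filter_and_label(articles, min_words=1000, top_k=1000, other_label="others"):
    """Drop short articles, then keep only the top_k most frequent categories.

    `articles` is a list of (text, category) pairs. This input format is an
    assumption made for illustration, not the dataset's actual layout.
    """
    # Remove articles with fewer than min_words whitespace-separated tokens.
    kept = [(text, cat) for text, cat in articles if len(text.split()) >= min_words]
    # Rank categories by how many kept articles carry them (assumed meaning of "top").
    counts = Counter(cat for _, cat in kept)
    top = {cat for cat, _ in counts.most_common(top_k)}
    # Articles outside the top categories get the special label.
    return [(text, cat if cat in top else other_label) for text, cat in kept]


def filter_vocabulary(documents, min_freq=15):
    """Return the set of words occurring at least min_freq times overall."""
    counts = Counter(word for doc in documents for word in doc.split())
    return {word for word, count in counts.items() if count >= min_freq}
```

With the defaults, `filter_and_label` mirrors the WikiDoc thresholds (1,000 words, 1,000 categories) and `filter_vocabulary` mirrors the WikiWord frequency cutoff of 15. Smaller values make it easy to try the functions on toy data.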






