Discover Top Posts Tagged with #compling

So, I’ve noticed some people using “folx” as an alternative to “folks”, analogous to Mx as an alternative to Mr/Mrs, but like... “folks” is already gender-neutral. But people obviously have something they want to communicate by using “folx” instead of “folks”, so this makes me think that the “x” suffix has an additional meaning beyond just “gender-neutral”. It might be worth noting that in Spanish, at least (and I think maybe some other Romance languages), “x” is used in contrast to using @ instead of a/o, because @ is sort of implicitly binary. (My understanding is that most people actually speaking the language either pronounce it as e, or both pronounce it and spell it as e, but you still see e.g. “amigxs” from time to time on twitter.) However, English doesn’t have this problem: there isn’t a gender-neutral ending that is explicitly binary that x is an alternative to, and words like “folks” don’t imply a binary. But I wonder if this specific meaning for x carried over somehow. It’s like, instead of just removing gender from the word, it specifically adds an assertion that this word applies to anyone no matter how non-standard their gender identity. Or something?

This reminds me of something else related to @: a woman I work with was recently working on a paper related to the meaning of @ in Spanish (the paper wound up being about something different, after she failed to find an effect for the particular thing she wanted to write about). She was talking about how e.g. “compañero/a” can mean a variety of different things in Spanish, but if someone writes “Compañer@s!” they’re probably about to start singing the Internationale. So there is an additional meaning to @ beyond “either male or female”, and this was reflected in the data. (We use ConceptNet, which you might remember I reblogged something about not too long ago; it can be used to compute vectors representing the meaning of words. It’s pretty cool.)

#linguistics #compling

(mostly) right...for the wrong reasons

you’re in the midst of writing an email when suddenly, in grey text, google suggests the next sentence. mind reading? actually, it’s an application of natural language inference (nli), the process by which computers determine if one sentence “entails” another. while google might seem to predict your email structure with uncanny accuracy, it can only handle the simple stuff: “let me know...”, “...does that work for you?”, and other phrases found in millions of emails. so at what point do these programs fail, and how can we fix them? that’s what computational linguists r. thomas mccoy, ellie pavlick, and tal linzen are trying to find out.

#computational linguistics #compling #machine learning #cogsci

You can’t just tack on “ok to make these results more stable we just added mutual information onto the other thing, calculated using ngrams”

Like, what is the distribution? Is it the probability of the term appearing in an n-gram? What size n-gram? What else is in the n-gram? Is it the probability of the term appearing in the document? If so, what do n-grams have to do with this? Like give me the formula so I know what we’re calculating here. Mutual information is like the vaguest statistic ever without context
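To make the ambiguity concrete, here is a minimal sketch of just one possible reading: pointwise mutual information between the two words of a bigram, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), with p(x, y) estimated from bigram counts and p(x), p(y) from unigram counts. Everything here (the function name `bigram_pmi`, the toy sentence, the choice of adjacent-word bigrams) is an assumption for illustration, not the formula the paper in question actually used.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return PMI in bits for every adjacent word pair in `tokens`.

    Assumed reading: p(x, y) is the relative frequency of the bigram,
    p(x) and p(y) are relative unigram frequencies.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return pmi

# Toy sentence, invented for the example.
tokens = "new york is a big city and new york is busy".split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Swap in a different event space (term-in-document, longer n-grams, windows instead of adjacency) and you get different numbers from the same corpus, which is exactly why the formula needs to be spelled out.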

Also you know what is frustrating? When your system doesn’t use a stemmer for reasons known only to the ancient civilizations of programmers who designed it but all of the research of course uses a stemmer because why wouldn’t you, so you have to guesstimate, is this still gonna work without the stemmer? And then no one in charge of the business wants to devote time and money to generating labelled data so haha good luck with your evaluation of the thing

#adventures in research #compling #i'm sorry for the jargon #i just want to vent

Probably one of the best ways I've ever ended a night in my life: I got accepted into the M.A. program in Computational Linguistics at the CUNY Graduate Center! Getting into this program has been my focus for the past year. I spent last summer lugging around a GRE prep book to get ready for that exam, asking both professors and advisors of mine for letters of recommendation, and going back time and time again to revise my personal statement over the course of several months. I'd be kidding myself if I said it didn't require quite a bit of work. After submitting every required document by November of last year, I had to wait in uncertainty up until this very point. This institution is the only one in the city to offer the grad program in Computational Linguistics -- with an average of 25 candidates. The times the uncertainty got to me have been countless...but even then, I did my best to prepare and plan ahead to the best of my abilities. Whenever I had free time, I'd find a little hole in the wall and teach myself Python in order to prepare. It was the only program I initially intended to apply to -- a risky and foolish move on my part. I'm speechless, overjoyed, and even a little emotional over the news. Although it's far from over, the hard work and focus paid off. Of course, I'll be happily accepting! I want to thank all my friends, advisors, professors, loved ones, and family for your support...you guys have no idea how much it means to me. Onward to new horizons we go, I won't let any of you down.

#nyc #queens #computationallinguistics #acceptance #compling #flushing #newyork #cunygradcenter #cuny #gradschool

Comp Ling in Italy?

this is a long shot, but i was wondering if anyone knows of italian universities which offer computational linguistics courses?

i want to do an erasmus in italy but i can’t decide on a place because none seem to have everything i want OR specifically a comp ling program.

so any italian computational linguists here who can offer some insight?

(maybe reblog to help me get an answer :D)

#compling #comp ling #computational linguistics #linguistics #italy #studying abroad #question #please reblog #studyblr #students

Bamman, Eisenstein and Schnoebelen 2014, Journal of Sociolinguistics

Gender Identity and Lexical Variation in Social Media

By: David Bamman, Jacob Eisenstein and Tyler Schnoebelen

Published by: Journal of Sociolinguistics, Volume 18, Issue 2, April 2014, Pages 135–160

LL Abstract:

With a blend of quantitative and qualitative analysis, Bamman, Eisenstein and Schnoebelen explain their study of the relationship between gender, linguistic style, and social networks. They cluster Twitter users by lexical items instead of a male-female binary distinction, discovering that many clusters have strong gender orientations but conflict with the overall language statistics found in the general population in previous research. The authors then train a machine-learning statistical classifier and measure the confidence with which it classifies individuals. The findings show that social network homophily is correlated with the use of same-gender markers, while individuals in general position themselves according to certain norms in order to perform gender.

LL Summary:

The article begins with a brief overview of the study, a computational analysis of the impact of gender on lexical choices and social networks in a novel corpus of 14,000 individuals on Twitter. The authors continue with a background of the topic of gender in sociolinguistics, mentioning the theoretical outlook of gender as being constructed, maintained, and disrupted by linguistic practices. Computational analysis has looked at large data sets quantitatively to identify which words are the most accurate predictors of ‘attributes’ like age, gender, and regional origin. One study, Argamon et al. (2007), analyzed blogs and built a predictive model of gender that was more accurate than human judges. Computational literature distinguishes men and women in this way on the dimensions of ‘informativeness’ (resources that communicate propositions, associated with men) and ‘involvement’ (resources that create interactions between speakers and environment, associated with women). However, the complex role of gender in identity poses problems for quantitative analysis that only treats gender as an independent variable. The goal of this article is to bring in the theory that linguistic resources have indexical fields of meaning, which speakers use to create various stances and personae. In other words, social categories are performances by speakers, depending on the situational context.

The next section of the article describes the data, collected from Twitter due to its public nature and ease of collection from the streaming API. After filtering for social network and gender and name distributions from census data, the data set contained over nine million tweets from 14,464 users.

The first analysis was on the lexical markers of gender, a standard computational approach of dividing the tweets by gender and training a logistic regression classifier to identify gender (the dependent variable) from the independent variables (the 10,000 most frequent lexical items in the corpus). The classifier had 88 percent accuracy, and this analysis determined the words most strongly associated with each gender. Pronouns, emotion terms, kinship terms, abbreviations, and some assent and negation terms were associated with female authors. Swears and taboo words were found to be more often associated with male authors. Overall these findings were similar to those of previous research, but this analysis also sorted words into eight categories. Each word was assigned to exactly one category, according to an ordered list of all eight. All differences between genders in these eight categories were statistically significant, but the direct association of word types with social categories is problematic.

In the next section, the authors address this by using a quantitative method of clustering users according to similar lexical items or sets of words. Probabilistic clustering grouped linguistically similar authors together without regard to gender, resulting in seventeen clusters, fourteen of which still had strong gender orientations. This organization showed the importance of intersectionality, or the impossibility of pulling different dimensions of social life into separate strands. These clusters are more like verbal repertoires, a complex interaction of social positions. Several differences between clusters reversed the earlier computational findings, suggesting that social categories like gender could not be separated from other aspects of identity.

The data in this study exhibited gender homophily, the tendency of people to prefer similar people in their social networks. The authors also found a strong correlation between individuals with a greater proportion of same-gender ties and the use of more gendered language. They trained a classifier to identify gender based on the gender skew of the social network, and it was found that the higher the gender skew, the more confident the classifier’s prediction would be.

The article concludes with a discussion of building a data-driven model, how gender is constructed, and how machine learning in large data sets should be further investigated.

LL Recipe Comparison:

With its detailed convergence of fields of linguistics, this article is most like Linguine with Scallops Red Pepper and Broccolini:

In their article, the authors blend computational and sociolinguistic theories of gender and lexical use to examine how Twitter users construct gender through language. In this recipe, the fresh vegetables blend deliciously well with the fried scallops. I highly recommend both!

MWV 11/19/15