Discover Top Posts Tagged with #language modelling

Our human understanding of coherence derives from our ability to recognize interlocutors’ beliefs and intentions within context. That is, human language use takes place between individuals who share common ground and are mutually aware of that sharing (and its extent), who have communicative intents which they use language to convey, and who model each others’ mental states as they communicate. As such, human communication relies on the interpretation of implicit meaning conveyed between individuals. The fact that human-human communication is a jointly constructed activity is most clearly true in co-situated spoken or signed communication, but we use the same facilities for producing language that is intended for audiences not co-present with us (readers, listeners, watchers at a distance in time or space) and in interpreting such language when we encounter it. It must follow that even when we don’t know the person who generated the language we are interpreting, we build a partial model of who they are and what common ground we think they share with us, and use this in interpreting their words.

Text generated by an LM [language model] is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Emily M. Bender, Timnit Gebru, Angela McMillan-Major, Shmargaret Shmitchell [cheeky alias], & 3 others suppressed by Google.

https://doi.org/10.1145/3442188.3445922

[emphasis added]

#reading #linguistics #language modelling #ok sorry done with the Bender bender for now #lowkey assembling materials for probably my last syllabus

There are two different perspectives from which one can look at the progress of a field. Under a bottom-up perspective, the efforts of a scientific community are driven by identifying specific research challenges. A scientific result counts as a success if it solves such a specific challenge, at least partially. As long as such successes are frequent and satisfying, there is a general atmosphere of sustained progress. By contrast, under a top-down perspective, the focus is on the remote end goal of offering a complete, unified theory for the entire field. This view invites anxiety about the fact that we have not yet fully explained all phenomena and raises the question of whether all of our bottom-up progress leads us in the right direction.

There is no doubt that NLP [natural language processing] is currently in the process of rapid hill-climbing. Every year, states of the art across many NLP tasks are being improved significantly—often through the use of better pretrained LMs [language models]—and tasks that seemed impossible not long ago are already old news. Thus, everything is going great when we take the bottom-up view. But from a top-down perspective, the question is whether the hill we are climbing so rapidly is the right hill. How do we know that incremental progress on today’s tasks will take us to our end goal, whether that is “General Linguistic Intelligence” (Yogatama et al., 2019) or a system that passes the Turing test or a system that captures the meaning of English, Arapaho, Thai, or Hausa to a linguist’s satisfaction?

It is instructive to look at the past to appreciate this question. Computational linguistics has gone through many fashion cycles over the course of its history. Grammar- and knowledge-based methods gave way to statistical methods, and today most research incorporates neural methods. Researchers of each generation felt like they were solving relevant problems and making constant progress, from a bottom-up perspective. However, eventually serious shortcomings of each paradigm emerged, which could not be tackled satisfactorily with the methods of the day, and these methods were seen as obsolete. This negative judgment— we were climbing a hill, but not the right hill—can only be made from a top-down perspective. We have discussed the question of what is required to learn meaning in an attempt to bring the top-down perspective into clearer focus.

Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, Emily M. Bender & Alexander Koller. DOI: 10.18653/v1/2020.acl-main.463 [italics in the original]

#reading #linguistics #language modelling #this is a Good Article

Zhiying Jiang, Matthew Yang, Mikhail Tsirlin, Raphael Tang, Yiqin Dai, Jimmy Lin. Findings of the Association for Computational Linguistics:

In this paper, we propose a non-parametric alternative to DNNs that’s easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages.

Our approach consists of a lossless compressor, a compressor-based distance metric, and a k-Nearest-Neighbor classifier. Lossless compressors aim to represent information using as few bits as possible by assigning shorter codes to symbols with higher probability. The intuition of using compressors for classification is that (1) compressors are good at capturing regularity; (2) objects from the same category share more regularity than those from different categories.

Being parameter-free, our method doesn’t rely on GPU force but CPU resources only. Thus, it does not bring negative environmental impacts revolving around GPU. In terms of overgeneralization, we conduct our experiments on both in-distribution and out-of-distribution datasets, covering six languages. As compressors are data-type agnostic, they are more inclusive to datasets, which allows us to classify low-resource languages like Kinyarwanda, Kirundi, and Swahili and to mitigate the underexposure problem.
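The compressor-plus-kNN idea quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the "compressor-based distance metric" is the Normalized Compression Distance (the excerpt does not name the metric), it joins the two texts with a space before compressing, and the helper names (`compressed_len`, `ncd`, `knn_classify`) and the toy `TRAIN` data are hypothetical.

```python
import gzip
from collections import Counter

# Toy labelled examples, invented purely for illustration.
TRAIN = [
    ("the striker scored a late goal and the team won the football match", "sports"),
    ("the goalkeeper saved a penalty in the final minutes of the match", "sports"),
    ("the central bank raised interest rates to slow rising inflation", "finance"),
    ("investors sold shares as the stock market fell on inflation fears", "finance"),
]


def compressed_len(text: str) -> int:
    """Return the size in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance (assumed metric, see lead-in).

    If x and y share regularities, compressing them together costs
    little more than compressing the larger one alone, so the
    distance is small.
    """
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training texts."""
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify("the team lost the match after a missed penalty", TRAIN, k=3))
```

Nothing here is trained: the only "model" is the compressor itself, which is why the approach needs no GPU. On strings this short, gzip's fixed header overhead makes the distances noisy, so the sketch shows the mechanics rather than realistic accuracy.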

#tech #cs #hello???????????? #language modelling

Publications talking about the application of large LMs to meaning-sensitive tasks tend to describe the models with terminology that, if interpreted at face value, is misleading. Here is a selection from academically-oriented pieces (emphasis added):

(1) In order to train a model that understands sentence relationships, we pre-train for a binarized next sentence prediction task. (Devlin et al., 2019)

(2) Using BERT, a pretraining language model, has been successful for single-turn machine comprehension . . . (Ohsugi et al., 2019)

(3) The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. (Petroni et al., 2019)

If the highlighted terms are meant to describe human-analogous understanding, comprehension, or recall of factual knowledge, then these are gross overclaims. If, instead, they are intended as technical terms, they should be explicitly defined.

Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, Emily M. Bender & Alexander Koller. DOI: 10.18653/v1/2020.acl-main.463

#reading #linguistics #language modelling #searle mentioned later in this paper is ee

Interview with Jon Dehdari, computational linguist, part 1/2

Today we talk to Jon Dehdari, a well-known computational linguist from Saarland University. Jon, let me ask you to introduce yourself: what did you study before, how did you reach the point you are at now?

Hi, sure, I’m Jon Dehdari. I’m doing a post-doc here at University of Saarland and DFKI, Germany. And before that I was working on a PhD at Ohio State University in the US. I was working on different kinds of NLP related topics. I started out working on parsing and formal analysis of syntax as well. And then I drifted into statistical NLP, machine translation, and then to neuroscience-informed NLP, I guess.