
Sunday, November 10, 2013

Computational Linguistics: Too Computational for Linguistics?

Even though I have recently moved on from the bed of nails that is the current job market to a cushy tenure track job, I still find myself reading the job announcements on LinguistList on a daily basis. There are of course all kinds of professional reasons for doing so, but the actual driving force behind this minor obsession of mine is more twisted. For you see, I have an existentialist streak that allows me to derive perverse amounts of joy from things that should cause me grief, worry, pain, and outbreaks of homicidal rage. And job searches for computational linguists offer all of that aplenty.


If you also frequent that other blog about linguistics, then you are probably familiar with their annual number crunching of the linguistics job market. Computational linguistics usually does extremely well in those, with the number of job searches vastly outstripping the number of freshly minted PhDs. But my beef isn't so much with the number of available jobs; it's with the kind of jobs advertised. Not at all with the industry jobs: those are straightforward NLP searches, and the skills they require are exactly those you will acquire in an MA/MS or PhD program in computational linguistics that's geared towards NLP (startups tend to list more requirements than big companies like Google and Nuance, but anecdotal evidence tells me that they are also more willing to settle for less). But not every computational linguist wants to do NLP, so he or she will look at the computational linguistics searches run by linguistics departments. And that's where the dramedy starts.

Why Linguists Love Numbers

Having seen many, many job announcements over the course of the last five years, I have come to the conclusion --- and you have to read the rest of this sentence in deadpan monotone, there is no outrage on my part here --- that linguistics departments have no interest in computational linguistics; what they want is linguistics with numbers.

Given a choice between somebody who studies OT from the perspective of formal language theory and somebody who couples Stochastic OT with some MaxEnt algorithm to model gradience effects, they'll pick the latter. Given a choice between somebody who works with Minimalist grammars or Tree Adjoining Grammars and somebody who does corpus-based sociolinguistics, they'll pick the latter. Given a choice between somebody who studies the learnability of formal language classes and somebody who presents a Bayesian learner for word segmentation, they'll pick the latter. (Just to be clear, all these examples are made up.) Of course there are exceptions --- I got a job, after all --- but they are rare. And a job search for a computational linguist that explicitly favors the formal type over all others is unheard of.

Even if one considers quantitative research the greatest thing since sliced bread, this is an unfortunate state of affairs. In my case, it makes it way more difficult to convince my students that they have a good chance of landing a job if they join the glorious enterprise of computational linguistics. More importantly, though, it means that linguistics departments, and by extension the field as a whole, are missing out on tons of interesting work. This is the part where the rules of blogosphere click-baiting dictate that I turn the rant-ometer to 11 and diss the narrow-mindedness of linguists and their aversion to rigorous work. But I prefer to only do things that I do well, and my rants tend to be less like precision surgery and more like beating a paraplegic to death with their own severed leg. Not a beautiful sight to behold, so let's instead try to be rational here and answer the question of why linguists do not care about theoretical computational linguistics.

The reason is very simple: it's about as approachable as a porcupine in heat. That's due to at least three factors that mutually reinforce each other to create an environment where the average linguist cannot be expected to have even the hint of a shadow of a clue what computational linguistics is all about.


Reason 1: Abstractness/Few Empirical Results

The theoretical work in computational linguistics is mostly "big picture linguistics". The major issues are the generative capacity of language and linguistic formalisms, the memory structures they require, their worst-case parsing performance, and so on. This is appealing because it carves out broad properties of language that are mostly devoid of theory-specific assumptions and pinpoints in what respects formalisms differ from each other if one abstracts away from substantive universals. Eventually this does result in empirical claims, such as what kind of string patterns should not occur in natural language or in which contexts Principle B is expected to break down. But the formal pipeline leading to the empirical applications is long, and it is often hard to see how the mathematical theorems inform the empirical claims. At the same time, many computational linguists are not linguists by training and hence are reluctant to move out of their formal comfort zone. On the Bayesian and probabilistic side, however, we find tons of linguists who want to explain a specific empirical phenomenon and simply use these tools because they get the job done (or at least it looks like they do). When asked why their work matters, they can give a 5 second summary and point to a never-ending list of publications for further background. A theoretical computational linguist needs at least 5 minutes, and they can't give you any papers because they are all too hard.

Reason 2: Difficulty/Lack of Prior Knowledge/Lack of Good Intros

There is no denying that computational linguistics isn't exactly noob-friendly. Before you can even get started you need to be able to read mathematical notation and understand general proof strategies, and that's just the baseline for learning the foundational math and CS stuff you have to know before you can get started on computational linguistics proper. Only once all of that is in place can you really start to think about the implications for linguistics. To add insult to injury, none of the math is anything like what you know from high school or the calculus courses you took as an undergrad: formal language theory, complexity theory, abstract algebra, mathematical logic, finite model theory, type theory, learnability, just what the heck is all this stuff? Do you really want to wrestle with all this arcane lore for at least two years in the vague hope that eventually it might tell you something really profound about language, or would you rather hop on the Bayesian gravy train where your first baby steps only require addition and multiplication?

What's more, the overwhelming majority of the math you will have to learn on your own, because no linguistics department teaches courses on these topics and the corresponding courses in the math and CS departments focus on things you do not need. There's also a dearth of good textbooks. The best option is to compile your own reader from various textbook chapters and survey papers --- the kind of fun activity that even Sir Buzz Killington wouldn't put on his weekend schedule. On the other hand, you could just drop this whole idea of learning about computational linguistics, take your department's statistics and experimental methodology courses, and spend the weekend marathoning Orange is the New Black. Tough choice.

Reason 3: Lack of Visibility/Lack of Practitioners/Lack of Publications

Computational linguistics as a whole is a minority enterprise within linguistics, but its theoretical spin is downright exotic. The odds of coming across a formal paper in a mainstream journal are rather low. LI sometimes features formal work in its Remarks and Replies section, and some full-blown technical papers got published in Lingua.[1] Formal talks are also far from common at the major conferences such as NELS and WCCFL (CLS is a positive exception in this respect, and the LSA is downright abysmal). But this is not enough to reach critical mass, the level where linguists start to develop a general awareness of this kind of work, its goals, its merits. This of course means fewer hires, which means fewer publications and fewer students, which means reduced visibility, which means fewer hires. It's the textbook definition of a Catch-22, assuming that there are any textbooks that define Catch-22.

We're Doomed, Doomed! But with Every Year We're Slightly Less Doomed

The above isn't exactly great PR for computational linguistics: a steep learning curve with few immediate rewards, permanent outsider status in the linguistic community, and colleagues that waste their Sunday afternoons ululating on some obscure blog. But things aren't quite as bad. People do get jobs (good ones to boot), but the road leading there takes a few more unexpected twists and turns. If you're a syntactician or a phonologist, you have a clear career trajectory: you know how many papers to publish in which journals, what conferences to attend, and what courses you'll wind up teaching. Most importantly, you are part of a big community, with all the boons that entails. The same is also true for quantitative computational linguists. As a theoretical computational linguist who wishes to be hired in a linguistics department, you have to find your niche, your own research program, and you have to figure out how to market it in a field where almost everybody you talk to lacks even the most basic prerequisites.

I imagine that things must have been similar for generativists back in the 50s and 60s, when linguistics departments were dominated by descriptivists who knew little about Transformational grammar, didn't care about it at best, and actively loathed it at worst. But Transformational grammar succeeded despite the higher degree of rigor it demanded because it offered a new and highly insightful perspective. The same is true for theoretical computational linguistics. Scientifically, we are already on our merry way towards a bright future, as evidenced by breakthroughs in learnability, the unifying force of a transductive perspective on grammar formalisms, and many other results (all of which deserve their own posts). On a sociological level, there are still PR issues to overcome, but things are improving in this respect, too.

The three problems listed above can all be solved by publishing more work that solves empirical problems with computational tools that are easy to understand on an intuitive level. And thanks to the efforts of people like Robert Berwick, Aravind Joshi, Paul Smolensky, Ed Stabler, Mark Steedman, and all their students, this has become a lot more common since the early oughts. Jeff Heinz and Jim Rogers have done some remarkable work with their students that factors phonology in appealing ways yet is pretty easy to follow --- if you know how to play dominoes, you can get started right away.[2] Tim Hunter uses MGs in his unification of freezing effects, adjunct islands, and extraposition constraints.[3] The Stabler parser for MGs is already used to model and predict processing difficulties.[4] I, too, have moved from constraints and feature coding to empirical applications of these results to binding and island constraints.[5] All of these topics are simple enough that they can be discussed in an advanced grad course without having to presuppose years of formal training. As a matter of fact, that's exactly what I am doing in my computational seminar this semester, in an attempt to get students hooked and eager to try some of the harder stuff --- yes, it's deliberately designed as a gateway drug.
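To give a rough idea of the dominoes point: much of the subregular phonology in footnote 2 boils down to banning a finite list of adjacent sound pairs (possibly computed on a tier). Here is a toy sketch of that idea; the code and the banned clusters are my own illustration, not material from any of the cited papers:

```python
# Toy strictly 2-local (SL2) phonotactic grammar: a word is well formed
# iff none of its adjacent pairs ("dominoes") is on the forbidden list.
# Word edges are marked with '#'. The banned clusters below are invented.

FORBIDDEN = {("s", "b"), ("n", "p"), ("#", "r")}

def bigrams(word):
    """Adjacent symbol pairs of the word, with '#' marking both edges."""
    padded = ["#"] + list(word) + ["#"]
    return zip(padded, padded[1:])

def well_formed(word):
    """A word is licit iff it contains no forbidden domino."""
    return all(pair not in FORBIDDEN for pair in bigrams(word))

if __name__ == "__main__":
    for w in ["tampa", "tanpa", "spa", "sba", "tar", "rat"]:
        print(w, "ok" if well_formed(w) else "*")
```

Long-distance patterns such as sibilant harmony work the same way once the dominoes are allowed to skip over irrelevant segments, which is the tier-based and subsequential machinery of the papers in footnote 2, and it is also why the learning results there come out so cleanly.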

Of course it would be nice to also have some textbooks for students at departments without a computational linguist, maybe even an online course (yes Norbert, small fields actually stand to profit a lot from MOOCs, although the enrollment probably wouldn't be all that massive; OOCs rather than MOOCs). Well, I'm working on some of that, but this takes more time than I have right now --- patience, my young Padawan, patience. In the meantime, I'll just keep you guys briefed on all the interesting computational work that's going on now. Who knows, maybe you'll think of it when your department is looking for a computational linguist.


  1. Idsardi, William (2006): A Simple Proof that Optimality Theory is Computationally Intractable. Linguistic Inquiry 37, 271-275.
    Heinz, Jeffrey, Gregory M. Kobele, and Jason Riggle (2009): Evaluating the Complexity of Optimality Theory. Linguistic Inquiry 40, 277-288.
    Bane, Max, Jason Riggle, and Morgan Sonderegger (2010): The VC Dimension of Constraint-Based Grammars. Lingua 120, 1194-1208.
  2. Heinz, Jeffrey (2010): Learning Long-Distance Phonotactics. Linguistic Inquiry 41, 623-661.
    Chandlee, Jane, Angeliki Athanasopoulou, and Jeffrey Heinz (2011): Evidence for Classifying Metathesis Patterns as Subsequential. Proceedings of WCCFL 2011, 303-309.
  3. Hunter, Tim: Deconstructing Merge and Move to Make Room for Adjunction. To appear in Syntax.
    Hunter, Tim and Robert Frank: Eliminating Rightward Movement: Extraposition as Flexible Linearization of Adjuncts. To appear in Linguistic Inquiry.
  4. Kobele, Gregory M., Sabrina Gerth, and John T. Hale (2013): Memory Resource Allocation in Top-Down Minimalist Parsing. Proceedings of FG 2012/2013, LNCS 8036, 32-51.
  5. Graf, Thomas and Natasha Abner (2012): Is Syntactic Binding Rational? Proceedings of TAG+11, 189-197.
    Graf, Thomas (2013): The Syntactic Algebra of Adjuncts. Ms., Stony Brook University.

21 comments:

  1. Completely agree about the lack of good textbooks or introductory articles. There aren't even any good ones for modern formal language theory -- i.e. the MCS hierarchy.

    1. And advanced textbooks like Gécseg & Steinby 1984 have been out of print for years now. As a matter of fact, even the Handbook of Formal Languages has been hard to obtain for quite a while (Springer only lists the first two volumes, and the missing third one is --- naturally --- the most important one for our line of work; makes me wonder how I got my PDF of the complete handbook).

      There seems to be no viable market for these things, which makes it hard to publish a handbook. Writing a good textbook without any financial reimbursement requires more altruism than most people can muster up. And it's not exactly the kind of topic where you can crowd-source expertise through a wiki. So, yeah, not sure what to do about the lack of good intro material.

  2. I completely agree with this statement: "The three problems listed above can all be solved by publishing more work that solves empirical problems with computational tools that are easy to understand on an intuitive level." The works that you're citing in that paragraph are all excellent examples. One thing that I'm confused about is your definition of computational linguistics, which seems to exclude probabilistic methods. Why is that a useful dichotomy? Where does the work you mention on modeling reading times using probabilistic minimalist grammars fall on this spectrum?

    1. It's a useful dichotomy only in the sense that if your work is predominantly probabilistic, you have an easier time talking to linguists because they know what probabilities are and what they can be used for. That's only a rough guideline, of course: if you work on weighted tree transducers over probability semirings, you'll still have a hard time, probabilities notwithstanding.

      In case it came across like that: I'm not saying that probabilistic work is less computational, but that non-probabilistic work in computational linguistics has a harder time reaching and engaging linguists. In a universe where high schools taught algebra and formal language theory, and probability theory was an esoteric oddity of higher mathematics with few practical applications, things would probably be the other way round.

    2. Yeah, I don't know about that. There's a lot more going on in most probabilistic work than just basic probability theory. I don't think the bottleneck in either the probabilistic or the formal language theory case is lack of mathematical sophistication on the part of linguists; it's the computational linguists' job to make the case for why their work matters. Just to mention one example from the Bayesian gravy train, not that many linguists are familiar with, say, the Pitman-Yor process or MCMC sampling, but they can still get the potential theoretical significance of something like the Perfors et al. Cognition paper that comes up on this blog occasionally.

    3. it's the computational linguists' job to make the case for why their work matters.
      No major disagreement there. I still think it's a harder job, but well, life ain't fair. We'll just have to work even harder to get the message across.

    4. Regarding I don't think the bottleneck in either the probabilistic or the formal language theory case is lack of mathematical sophistication on the part of linguists;
      Not saying that's what you're implying, but better safe than sorry: despite the occasional snark, I do not blame linguists for not knowing enough math to appreciate the intricacies of my research. The way things are right now is not some great cosmic injustice; you can see how we wound up in this situation for very pragmatic reasons (including the ones I listed in the post).

  3. This comment has been removed by the author.

  4. Well, "computational linguistics" is a bit of a catch-all. In addition to NLP, "linguistics with numbers" and shall we say "linguistics with proofs", one could easily distinguish "linguistics plus computer programs", "descriptive work plus computer programs", "proofs tangentially related to linguistics", "psycholinguistics with computer programs", "psycholinguistics with statistical models", "psycholinguistics with fancy statistical analysis" etc etc etc etc. All these are very very different things and I personally am just glad that in recent years linguistics departments have ventured out and helped make it at least a bit kosher to want "computational X" in whatever sense.

    Anyhow I'm tempted to make double sure to spin what you're saying in the direction of "be aware, here's what non-computational linguists seem to be primed to hear" -- so that we can work with that as a starting point in this uphill battle. I think that's where you were going with the deadpan thing. There's a tendency, on the other hand, to say "linguists are unfairly shutting formal people out because They Just Don't Get It." I think that's exactly NOT what you were going for, and it's the same old other-blaming mope you hear from theoretical linguists - "I can't talk to Those psychol{og/ingu}ists because They Just Don't Get It." But it's no coincidence that high-performing 20th century physicists doing weird 20th century physics like Feynman and Einstein seemed to have nice, low-dimensional explanation-to-a-five-year-old versions of their work ready at all times. People who think big are much better at what they do when they have thought so deeply about complicated things that they are simple.

    As much as I have encountered the problem myself and continue to do so, I am still not convinced that the absence of a 5-second version is a necessary problem of the territory. I think the special problem formal work brings is precisely that it is NOT computational - at least not in the sense of "this is something I did on a computer." The entities one needs to refer to are not computers but really quite ethereal abstract objects. But, hey, the problems better darn well still lie squarely in the common ground or else I'm not sure why we want to be talking to linguists anyway. So, as they say in Quebec, "don't drop the potato" - keep on trying to find new and better ways of communicating with "normal" linguists and I think we will all be doing better science for it. Their willingness to accept computational anything into their lives is a foot in the door, even if they don't KNOW they want "that" kind of computational something - yet.

    1. Full ack. While things are still far from ideal, from what I've heard they are a lot better than even 15 years ago. But this also means that now we have to do everything we can to get linguists interested before this window of opportunity closes again.

  5. I agree with your major point that hiring committees are very interested in "linguistics with numbers". My own sobering opinion is that many such committees are not even fully capable of evaluating whether a linguist uses numbers well or poorly, for good or for evil. Much like the parable of the drunk and the lamppost, they evaluate "impact" instead.

    Speaking of impact, though, you haven't really sold me on the impact of proof-based computational linguistics such as the work you cite (modulo some of the work you've explained on this blog). Take Bane et al. for instance. They show that the VC dimension of HG/OT is finite. From where I sit, and with all due respect to the authors, the impact of this finding is quite minimal. First, this tells us next to nothing about learnability. I believe (as do most here) that children learn something like a minimalist grammar, yet the MCSGs we know about have an infinite VC dimension, so as Galilean biolinguists we should at least suspect that finitude of VC dimension is irrelevant to learnability. Secondly, if I understand correctly, the finiteness result assumes that CON is finite, which is a matter of some contention (I believe Bill Idsardi has written on this). Third, OT had been the dominant paradigm in phonology for almost two decades before Bane et al.; does anyone really believe that a negative result would have changed that one bit?

    So, I guess what I'm asking is whether you could explain why hiring committees should prefer a proof-writer candidate over a Bayesian word segmentation candidate. I am not trying to be critical, just asking out of my own ignorance.

    1. Bane & Riggle I cited mainly because it is one of the few cases where an unabashedly technical paper was published in a mainstream journal. I do not think it is actually a good example for selling computational linguistics, precisely because it is so technical. That being said, imho the interesting contribution of the paper isn't that OT has a finite VC dimension, but that Harmonic Grammar does, too. One reason that OTistas reject HG is their belief that it makes the grammar more complicated, but at least with respect to PAC-learnability that isn't really the case. This holds as long as every grammar uses only finitely many constraints --- the set of constraints furnished by UG may nonetheless be infinite, though. So it applies even if you do not think that CON is fixed across grammars. (For what a finite VC dimension buys you in PAC terms, see the sketch at the end of this reply.)

      Regarding your second question, there are two factors to take into consideration: scientific merit and socio-political advantages. The latter vary a lot between departments. If you're a small department, for instance, having a proof-writer will allow you to carve out a niche for yourself, while being yet another department with a Bayesianist does very little to put you on the map (unless this person quickly becomes a rock star, but then you'll also quickly lose them to a more prestigious department).

      Anyways, the more interesting factor to discuss is scientific merit. I can't go into much detail here (more posts on the way). Let me just quickly touch on two points. First, the papers cited in fn 2--5 are good examples of how formal considerations can inform empirical research in syntax, phonology and psycholinguistics. It's always good to look at a problem from as many angles as possible, and the computational perspective complements the generative one really well.

      Second, I would argue that a computational perspective is indispensable given the mentalist commitment of generative grammar. I think most generativists would agree that the cognitive aspirations haven't worked all that well so far, at least in syntax. For example, it is unclear how specific syntactic assumptions affect processing or learnability. Are there parsing reasons to avoid remnant movement? Are island constraints learnable? If so, always, or only for some grammars?

      Somebody who's not a proof-writer will tackle these issues by designing models or doing case studies. Provided that this actually yields a useful answer, you're immediately faced with the question of which parameters the result depends on. And this is very hard to determine if all you do is build models. In the worst case, you're stuck with trial-and-error.

      The abstractness of the proof-writer approach has the advantage that it quantifies over grammars, and the proofs make fully explicit which assumptions your theorems depend on. So we can give very general answers that hold for a variety of frameworks under various conditions. That's not just a matter of having wide coverage and keeping your results from being outdated as soon as Chomsky writes a new paper; it's also about efficiently isolating the constitutive parameters of a complex problem. In other words, moving from simulation to genuine understanding.

      Of course this sounds rather self-important without any concrete examples to back it up, but for now I can only refer you to the papers above or ask you to wait until the next post.
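      To spell out what a finite VC dimension buys you here (this is the standard PAC bound from learning theory, quoted from memory as rough orientation, not anything specific to Bane et al.): if the class of grammars has VC dimension $d$, then on the order of

      \[
        m = O\!\left(\frac{1}{\epsilon}\left(d \log \frac{1}{\epsilon} + \log \frac{1}{\delta}\right)\right)
      \]

      training samples suffice to pick, with probability at least $1 - \delta$, a grammar whose error rate is below $\epsilon$, no matter which distribution the data is drawn from. With an infinite VC dimension no such distribution-free guarantee exists; that is the precise sense in which the finiteness result matters.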

    2. Argh, unfortunate typo: "This holds as long as every grammar uses only finitely many constraints" should be "this holds as long as every grammar uses only a fixed finite number of constraints". So there is an upper bound on the maximum number of constraints in a grammar, but I don't think you have to stipulate that they all use the same constraints. I don't remember if that is shown in the paper, though; it might just be a conjecture on my part that I somehow remembered as fact.

    3. I think most generativists would agree that the cognitive aspirations haven't worked all that well so far, at least in syntax.

      Really? I beg to differ. Indeed I suspect my whole department would disagree. Indeed, though I am sympathetic to your kind of work, I don't think that much bearing on these issues has come from it, whereas quite a bit has come from the abstract musings of syntacticians. For example, are islands learnable? Not from the data provided. Given what we take to be the PLD, how are island effects to be acquired? No data, no learning. You and Alex C have regaled us with problems, but I have yet to convince myself these are real. Yes, we need some kind of priors in addition to constrained hypothesis spaces, but I guess I have not been terribly convinced that the sky is falling. I'll await the detailed insights you provide. Wrt parsing, we have made some insights about how it takes place and how to model it. I am still a big fan of versions of Berwick and Weinberg as well as the stuff by Rick Lewis. No theorems, but insightful formal work. I guess I don't know anything else that advances the real parsing issues, how sentences are mapped to meanings from sounds in real time, that comes from the very formal world you allude to. But I am sure I'll get some good examples soon. I'm waiting with bated breath. But for the record, no, I think we have learned a lot about syntax and cognition. Maybe formal stuff will add to what we know, but the proof is in the eating.

    4. not all that well != not at all

      I'm not denying that progress has been made, quite a lot actually. But I would locate it mostly on a descriptive level. For example, I don't see any coherent proposals for parsing models that could be evaluated independently of the substantive universals they assume. If I look at the processing literature, I see ideas like serial vs parallel parsing, top-down and bottom-up, reanalysis, memory decay, but few of them form a cohesive whole that could be called a parser. So any kind of claim that a certain model predicts or cannot predict a certain processing effect has to be taken with more than just a grain of salt. There are also claims that Minimalist syntax is unsuitable for parsing because of its bottom-up nature, which confuses the specification of a grammar with how it is processed (CFGs generate top-down but can be parsed in either direction; see the little sketch at the end of this comment). So there are many claims being thrown around that seem to be based solely on "I can't see how it could be done" rather than actually proving that it is impossible.

      To reiterate, I'm not saying that everything done in psycholinguistics is hogwash and all problems would already be solved if only we could have chained all those experimentalists to their desks to prove some theorems rather than have them bombard innocent undergrads with garden path sentences. But I do not see any worked-out linking theory between theoretical syntax and psycholinguistics; the assumptions that are made to tie the two together strike me as rather ad hoc and also have a tendency to differ wildly among researchers. That's where a computational approach could clarify things quite a bit, methinks.
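      To make the CFG point concrete, here is a minimal sketch: a made-up two-rule grammar, S -> a S b | a b, recognized once top-down and once bottom-up. It is a toy illustration of my own, not a parser from the literature.

```python
# The same toy grammar S -> a S b | a b, recognized in both directions.
# Grammar and code are purely illustrative.

def top_down(s):
    """Recursive-descent (top-down) recognizer."""
    def parse_S(i):
        # Try S -> a S b first, then S -> a b; return the position
        # right after the matched S, or None on failure.
        if i < len(s) and s[i] == "a":
            j = parse_S(i + 1)
            if j is not None and j < len(s) and s[j] == "b":
                return j + 1                  # matched a S b
            if i + 1 < len(s) and s[i + 1] == "b":
                return i + 2                  # matched a b
        return None
    return parse_S(0) == len(s)

def bottom_up(s):
    """Naive shift-reduce (bottom-up) recognizer for the same grammar."""
    stack = []
    for tok in s:
        stack.append(tok)                     # shift
        while True:                           # reduce greedily
            if stack[-2:] == ["a", "b"]:
                stack[-2:] = ["S"]
            elif stack[-3:] == ["a", "S", "b"]:
                stack[-3:] = ["S"]
            else:
                break
    return stack == ["S"]

if __name__ == "__main__":
    for w in ["ab", "aabb", "aaabbb", "aab", "abab"]:
        print(w, top_down(w), bottom_up(w))
```

      The grammar itself never changes between the two runs; only the control regime of the recognizer does, which is all the parenthetical above was meant to say.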

  6. Random thought from the peanut gallery: Thomas G: "This holds as long as every grammar uses only finitely many constraints --- the set of constraints furnished by UG may nonetheless be infinite, though. So it applies even if you do not think that CON is fixed across grammars." It occurred to me a few days ago that you might be able to get learnability in the absence of any absolute limit on grammar complexity (e.g. infinite VC dimension) by imposing an upper bound on grammar complexity as a function of age. So for each age, there would be a finite VC dimension of grammars learnable by that age.

  7. Thomas: The problem is more general. Those working on formal models--not just the formal/statistical variety you discuss--bear a responsibility of making their work relevant to other researchers. I think a major reason for the success of GB/MP, which is an abstract model based on the study of very familiar languages, was that it was very useful in the study of very unfamiliar languages. The range of complex specific examples in LGB, including languages that appeared very unfamiliar back then, was amazing. I wasn't around at the time, obviously, but I can easily imagine one running to his or her favorite language since the theoretical devices are presented clearly--LGB style--with worked-out case studies.

    Let me offer an example from personal experience. In a term paper for Ken Wexler's class, I essentially completed the variational model of parameter setting that became part of my thesis (it's not that complicated, after all). It worked better than triggering and I showed it to some people, thinking my job was done. The reaction was largely negative (for many reasons and that's for another day), but two very senior colleagues told me that in order for the work to have any bite, I needed to find empirical evidence from child language to show the reality of the variational model. (They don't even work on child language.) *That* took a lot of work--at a time I had no interest in child language whatsoever--and I am still learning. But the work became a lot stronger and more relevant, thanks to their advice.

    I think the (reasonable) empirical worker's skepticism toward abstract models is well justified: In what way does it help us understand child language development? How does it affect the action of the parser? Is the CED reducible to ECP/Subjacency/Barriers (to cite one of my, and I'm sure your, favorite examples)? What's in it for me? The works you mentioned are all good progress in that direction, and we need to do more, by taking initiative ourselves. Some "brands" of work may be perceived to be more relevant to empirical matters, as you lament: time will tell.

    1. Charles, sorry for the late reply, I had completely missed your comment and if it weren't for Jeff Heinz (thanks Jeff!) I would still be unaware of it. I'll have to figure out how to set up email notifications >.>

      Anyways, I agree with everything you say. Yes, the problem is better expressed as a split between model builders and proof writers; phrasing it in terms of probabilistic vs discrete is misleading (which Tal also tried to hammer through my thick skull). And of course mathematical proof in the absence of any link to empirical issues is highly unsatisfying. Not just for linguists, but also for proof-writers. Or at least it should be; we're mathematical/computational linguists, not mathematicians.

      That being said, I think that there's actually a lot of computational work that has linguistic implications even if they are not readily apparent. The examples above are kind of "in your face" about what it means for language, to the extent that it is impossible to miss the point. But a lot of work has implications that are harder to pick up on because they are big picture issues or rather subtle. In my experience it is very hard to get those across because they require a certain way of thinking about language and computation, which isn't something a reader can acquire from just a single paper. So I'm hoping that as we get more exposure through the applied work, people will become at least slightly curious about these abstract issues, too.

  8. This comment has been removed by the author.

  10. Oh, I was kind of occupied when this blog post was posted, but I would have had so much to say about this from very personal experience. This thread is not fresh any more but I will say that trying to walk the tightrope of being a computer scientist who does language and a linguist who is computational leads to a bit of a "neither fish nor fowl" problem, career-wise. It turns out that one isn't both/and, but neither/nor, at least some of the time.
