If you also frequent that other blog about linguistics, then you are probably familiar with their annual number crunching of the linguistics job market. Computational linguistics usually does extremely well in those, with the number of job searches vastly outstripping the number of freshly minted PhDs. But my beef isn't so much with the number of available jobs as with the kind of jobs advertised. Not at all with the industry jobs: those are straightforward NLP searches, and the skills they require are exactly those you will acquire in an MA/MS or PhD program in computational linguistics that's geared towards NLP (startups tend to list more requirements than big companies like Google and Nuance, but anecdotal evidence tells me that they are also more willing to settle for less). But not every computational linguist wants to do NLP, so he or she will look at the computational linguistics searches run by linguistics departments. And that's where the dramedy starts.
Why Linguists Love Numbers

Having seen many, many job announcements over the course of the last five years, I have come to the conclusion --- and you have to read the rest of this sentence in deadpan monotone, there is no outrage on my part here --- that linguistics departments have no interest in computational linguistics; what they want is linguistics with numbers.
Given a choice between somebody who studies OT from the perspective of formal language theory and somebody who couples Stochastic OT with some MaxEnt algorithm to model gradience effects, they'll pick the latter. Given a choice between somebody who works with Minimalist grammars or Tree Adjoining Grammars and somebody who does corpus-based sociolinguistics, they'll pick the latter. Given a choice between somebody who studies the learnability of formal language classes and somebody who presents a Bayesian learner for word segmentation, they'll pick the latter. (Just to be clear, all these examples are made up.) Of course there are exceptions --- I got a job, after all --- but they are rare. And a job search for a computational linguist that explicitly favors the formal type over all others is unheard of.
Even if one considers quantitative research the greatest thing since sliced bread, this is an unfortunate state of affairs. In my own case, it makes it way more difficult to convince my students that they have a good chance of landing a job if they join the glorious enterprise of computational linguistics. More importantly, though, it means that linguistics departments, and by extension the field as a whole, are missing out on tons of interesting work. This is the part where the rules of blogosphere click-baiting dictate that I turn the rant-ometer to 11 and diss the narrow-mindedness of linguists and their aversion to rigorous work. But I prefer to only do things that I do well, and my rants tend to be less like precision surgery and more like beating a paraplegic to death with their own severed leg. Not a beautiful sight to behold, so let's instead try to be rational here and answer the question why linguists do not care about theoretical computational linguistics.
The reason is very simple: it's about as approachable as a porcupine in heat. That's due to at least three factors that mutually reinforce each other to create an environment where the average linguist cannot be expected to have even the hint of a shadow of a clue what computational linguistics is all about.
Reason 1: Abstractness/Few Empirical Results

The theoretical work in computational linguistics is mostly "big picture linguistics". The major issues are the generative capacity of language and linguistic formalisms, the memory structures they require, their worst-case parsing performance, and so on. This is appealing because it carves out broad properties of language that are mostly devoid of theory-specific assumptions and pinpoints in what respects formalisms differ from each other if one abstracts away from substantive universals. Eventually this does result in empirical claims, such as what kind of string patterns should not occur in natural language or in which contexts Principle B is expected to break down. But the formal pipeline leading to the empirical applications is long, and it is often hard to see how the mathematical theorems inform the empirical claims. At the same time, many computational linguists are not linguists by training and hence are reluctant to move out of their formal comfort zone. On the Bayesian and probabilistic side, however, we find tons of linguists who want to explain a specific empirical phenomenon and simply use these tools because they get the job done (or at least it looks like they do). When asked why their work matters, they can give a 5-second summary and point to a never-ending list of publications for further background. A theoretical computational linguist needs at least 5 minutes, and they can't give you any papers because they are all too hard.
Reason 2: Difficulty/Lack of Prior Knowledge/Lack of Good Intros

There is no denying that computational linguistics isn't exactly noob-friendly. Before you can even get started you need to be able to read mathematical notation and understand general proof strategies, and that's just the baseline for learning the foundational math and CS material you have to know before you can tackle computational linguistics proper. Only once all of that is in place can you really start to think about the implications for linguistics. To add insult to injury, none of the math is anything like what you know from high school or the calculus courses you took as an undergrad: formal language theory, complexity theory, abstract algebra, mathematical logic, finite model theory, type theory, learnability --- just what the heck is all this stuff? Do you really want to wrestle with all this arcane lore for at least two years in the vague hope that eventually it might tell you something really profound about language, or would you rather hop on the Bayesian gravy train, where your first baby steps only require addition and multiplication?
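And lest you think I'm exaggerating the jab, here's what such a first baby step looks like: a minimal Python sketch of Bayes' rule deciding between two hypothesized segmentations of a word, with numbers I made up on the spot (and, to be pedantic, one division for normalization on top of the additions and multiplications).

```python
# Toy Bayesian baby step: which of two made-up segmentation hypotheses
# better explains an observed word? All numbers are invented for
# illustration; nothing here comes from any real model or corpus.
priors = {"see+the": 0.5, "seethe": 0.5}          # P(hypothesis)
likelihoods = {"see+the": 0.02, "seethe": 0.001}  # P(data | hypothesis)

# Multiply prior by likelihood, add up the results, divide to normalize.
joint = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(joint.values())
posteriors = {h: joint[h] / total for h in joint}
print(posteriors)  # {'see+the': ~0.952, 'seethe': ~0.048}
```

That really is the whole trick at the entry level; the conceptual payoff comes later, but the arithmetic never gets scarier than this for your first project.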
What's more, you will have to learn the overwhelming majority of this math on your own, because no linguistics department teaches courses on these topics and the corresponding courses in the math and CS departments focus on things you do not need. There's also a dearth of good textbooks. The best option is to compile your own reader from various textbook chapters and survey papers --- the kind of fun activity that even Sir Buzz Killington wouldn't put on his weekend schedule. On the other hand, you could just drop this whole idea of learning about computational linguistics, take your department's statistics and experimental methodology courses, and spend the weekend marathoning Orange Is the New Black. Tough choice.
Reason 3: Lack of Visibility/Lack of Practitioners/Lack of Publications

Computational linguistics as a whole is a minority enterprise within linguistics, but its theoretical strand is downright exotic. The odds of coming across a formal paper in a mainstream journal are rather low. LI sometimes features formal work in its Remarks and Replies section, and some full-blown technical papers got published in Lingua.[1] Formal talks are also far from common at the major conferences such as NELS and WCCFL (CLS is a positive exception in this respect, and the LSA is downright abysmal). But this is not enough to reach critical mass, the level where linguists start to develop a general awareness of this kind of work, its goals, and its merits. This of course means fewer hires, which means fewer publications and fewer students, which means reduced visibility, which means fewer hires. It's the textbook definition of a Catch-22, assuming that there are any textbooks that define Catch-22.
We're Doomed, Doomed! But With Every Year We're Slightly Less Doomed

The above isn't exactly great PR for computational linguistics: a steep learning curve with few immediate rewards, permanent outsider status in the linguistic community, and colleagues who waste their Sunday afternoons ululating on some obscure blog. But things aren't quite as bad as they sound. People do get jobs (good ones to boot), but the road leading there takes a few more unexpected twists and turns. If you're a syntactician or a phonologist, you have a clear career trajectory: you know how many papers to publish in which journals, what conferences to attend, and what courses you'll wind up teaching. Most importantly, you are part of a big community, with all the boons that entails. The same is also true for quantitative computational linguists. As a theoretical computational linguist who wishes to be hired in a linguistics department, you have to find your niche, your own research program, and you have to figure out how to market it in a field where almost everybody you talk to lacks even the most basic prerequisites.
I imagine that things must have been similar for generativists back in the 50s and 60s, when linguistics departments were dominated by descriptivists who knew little about Transformational grammar and who at best didn't care about it and at worst actively loathed it. But Transformational grammar succeeded despite the higher degree of rigor it demanded because it offered a new and highly insightful perspective. The same is true for theoretical computational linguistics. Scientifically, we are already on our merry way towards a bright future, as evidenced by breakthroughs in learnability, the unifying force of a transductive perspective on grammar formalisms, and many more (all of which deserve their own posts). On a sociological level, there are still PR issues to overcome, but things are improving in this respect, too.
The three problems listed above can all be solved by publishing more work that solves empirical problems with computational tools that are easy to understand on an intuitive level. And thanks to the efforts of people like Robert Berwick, Aravind Joshi, Paul Smolensky, Ed Stabler, Mark Steedman, and all their students, this has become a lot more common since the early aughts. Jeff Heinz and Jim Rogers have done some remarkable work with their students that factors phonology in appealing ways yet is pretty easy to follow --- if you know how to play dominoes, you can get started right away (see the toy snippet below).[2] Tim Hunter uses MGs in his unification of freezing effects, adjunct islands, and extraposition constraints.[3] The Stabler parser for MGs is already used to model and predict processing difficulties.[4] I, too, have moved from constraints and feature coding to empirical applications of this line of work to binding and island constraints.[5] All of these topics are simple enough that they can be discussed in an advanced grad course without having to presuppose years of formal training. As a matter of fact, that's exactly what I am doing in my computational seminar this semester, in an attempt to get my students hooked and eager to try some of the harder stuff --- yes, it's deliberately designed as a gateway drug.
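For the curious, here is the domino intuition in a few lines of Python. This is my own toy sketch, not code from any of the papers cited above: in a strictly 2-local grammar, the kind of object that underlies a lot of this subregular work, a word is well-formed just in case every pair of adjacent sounds (with word boundaries added) is a licensed domino. The alphabet and the banned cluster are made up for illustration.

```python
def sl2_wellformed(word, licensed_bigrams):
    """Strictly 2-local check: every adjacent pair must be a licensed domino."""
    padded = "#" + word + "#"  # '#' marks the word boundaries
    return all(padded[i:i + 2] in licensed_bigrams
               for i in range(len(padded) - 1))

# A made-up grammar over {a, b} that bans the cluster "ab".
bigrams = {"#a", "#b", "aa", "ba", "bb", "a#", "b#"}  # note: no "ab"

print(sl2_wellformed("bbaa", bigrams))  # True: every domino is licensed
print(sl2_wellformed("abba", bigrams))  # False: contains the banned "ab"
```

Even learning such a grammar from positive data is domino-like: collect the bigrams you observe, and you're done. That's the level of entry, and the maths only gets deeper from there if you want it to.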
Of course it would be nice to also have some textbooks for students at departments without a computational linguist, maybe even an online course (yes, Norbert, small fields actually stand to profit a lot from MOOCs, although the enrollment probably wouldn't be all that massive; OOCs rather than MOOCs). Well, I'm working on some of that, but it takes more time than I have right now --- patience, my young Padawan, patience. In the meantime, I'll just keep you guys briefed on all the interesting computational work that's going on right now. Who knows, maybe you'll think of it when your department is looking for a computational linguist.
1. Idsardi, William (2006): A Simple Proof that Optimality Theory is Computationally Intractable. Linguistic Inquiry 37, 271-275.
   Heinz, Jeffrey, Gregory M. Kobele, and Jason Riggle (2009): Evaluating the Complexity of Optimality Theory. Linguistic Inquiry 40, 277-288.
   Bane, Max, Jason Riggle, and Morgan Sonderegger (2010): The VC Dimension of Constraint-Based Grammars. Lingua 120, 1194-1208.
2. Heinz, Jeffrey (2010): Learning Long-Distance Phonotactics. Linguistic Inquiry 41, 623-661.
   Chandlee, Jane, Angeliki Athanasopoulou, and Jeffrey Heinz (2011): Evidence for Classifying Metathesis Patterns as Subsequential. Proceedings of WCCFL 2011, 303-309.
3. Hunter, Tim: Deconstructing Merge and Move to Make Room for Adjunction. To appear in Syntax.
   Hunter, Tim, and Robert Frank: Eliminating Rightward Movement: Extraposition as Flexible Linearization of Adjuncts. To appear in Linguistic Inquiry.
4. Kobele, Gregory M., Sabrina Gerth, and John T. Hale (2013): Memory Resource Allocation in Top-Down Minimalist Parsing. Proceedings of FG 2012/2013, LNCS 8036, 32-51.
5. Graf, Thomas, and Natasha Abner (2012): Is Syntactic Binding Rational? Proceedings of TAG+11, 189-197.
   Graf, Thomas (2013): The Syntactic Algebra of Adjuncts. Ms., Stony Brook University.