Never thought I would say this, but I found that I resonated positively with a recent small comment by Chris Manning on Deep Learning (DL) that Aaron White sent my way (here). It seems that DL has computational linguistics (CL) of the Manning variety in its sights. Some DLers apparently believe that CL is just nano-moments away from extinction. Here’s a great quote from one of the DL doyens:
NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened.
DL wise men like Geoff Hinton have already announced that they expect that machines will soon be able to watch videos and “tell a story about what happened” and be downsized onto an in-your-ear chip that can translate into English on the fly. Great things are clearly expected. Personally, I am skeptical as I’ve heard such hyperbole before. We have been five years away from this sort of stuff for a very long time.
Moreover, I am not alone. If I read Manning correctly, he is skeptical (though very politely so) as well. But, like me, he sees an opportunity here, one I noted before (here and here). Of course we likely disagree about what kind of linguistics will be most useful for advancing these technological ends, but when it comes to engineering projects I am very catholic in my tastes.
What does the opportunity consist in? It relies on a bet: that generic machine learning (even of the DL variety) will not be able to solve the “domain problem.” Behind the latter is the belief that how a domain of knowledge is structured matters a lot even if one’s aim is to solve an engineering problem.
An aside: shouldn’t those who think that the domain problem is a serious engineering hurdle also think that modularity is a good biological design feature? And shouldn’t these people therefore think that the domain specificity of FoL is a no-brainer? In other words, shouldn’t the idea that humans have domain specific knowledge that allows them to “solve” language problems (and that supports facile human acquisition and use) be the default position? Chris? What think you? Dump general learning approaches and embrace domain specificity?
Back to the main point: The bet. So, if you think that using word contexts can only get you so far (and not interestingly far either), then you are ready to bet that knowing something about language will be useful in solving these engineering problems. And that provides linguists with an opportunity to ply their trade. In fact, Manning points to a couple of projects aimed at developing “a common syntactic dependency representation and POS (‘part of speech,’ NH) and feature label sets which can be used with reasonable linguistic fidelity and human usability across all human languages” (3). He also advocates developing analogous representations for “Abstract Meaning.” This looks like the kind of thing that GGers could usefully contribute to. In other words, what we do directly fits into the Manning project.
Another aside: do not confuse this with investigating the structure of FL. What matters for this project is a reasonable set of Greenberg “Universals.” Indeed, being too abstract might not be that useful practically, and being truly universal is not that important (what is important is finding those categories that best fit the particular languages of interest). This is not a bad thing. Engineering is not to be disparaged. It’s just not the same project as the one that GG has scientifically set for itself. Of course, should the Chomsky version of GG succeed, it is possible that it will contribute to the engineering problem. But then again, it might not. As I understand it, General Relativity has yet to make a big impact on land surveying. It really all depends (to fix ideas, think birds and planes or fish and submarines; last time I looked, plane wings don’t flap and sub bodies don’t undulate).
Manning makes lots of useful comments about DL, many of which I didn’t understand. He makes some, however, that I did. For example, his observation that DL has mainly proved useful in signal processing contexts (2) (i.e. where the problem is to get the generalization that is in the data, the pattern from (noisy) patternings). The language problem, as I’ve argued, is different from this (see here), so the limits of brute force DL will, I predict, become evident when the new wise men turn their attention to it. In fact, I make a more refined prediction: to “solve” this problem DLers will either (i) ignore it, (ii) restrict the domain of interest to finesse it or (iii) promise repeatedly that the solution is but 5 years away. This has happened before and will happen again unless the intricate structural constraints that characterize language are recognized and incorporated.
Manning also makes several points that I would take issue with. For example, IMO he (like many others) mistakes squishy data for squishy underlying categories. See, in particular, Manning’s discussion of gerunds on p. 4. That the data do not exhibit sharp boundaries does not imply that the underlying structures are not sharp. In fact, at some level they must be, for under every probabilistic theory there is a categorical algebra. I leave it to you out there to come up with an alternative analysis of Manning’s observed data set. I give you a 30 second time limit to make it challenging.
At any rate, you will not be surprised to find out that I disagree with many of Manning’s comments. What might surprise you is that I think he is right in his reaction to DL hubris and he is right that there is an opportunity for what GGers know to be of practical value. There is no reason for DL (or Bayes or stats) to be inimical to GG. It’s just technology. What makes its practice often anathema is the hard-core empiricism gratuitously adopted by its practitioners. But this is not inherent to the technology. It is only a bias of the technologists. And there are some like Jordan and Manning and Reisinger who seem to get this. It looks like an opportunity for GGers to make a contribution. One, incidentally, that can have positive repercussions for the standing of GG. Scientific success does not require technological application. But having technological relevance does not hurt either.
I confess to a touch of schadenfreude given that this is the kind of thing that Manning and Co like to say about my kind of linguistics wrt their CL approaches.
 Though I am not confident about this. I am pretty confident about what kind of linguistics one needs to advance the cognitive project. I am far less sure about what one needs to advance the engineering one. In fact, I suspect that a more “surfacy” syntax will fit the latter’s design requirements better than a more abstract one given its NLPish practical aims. See below for a little more discussion.
 I have it from a reliable source that this project is being funded by Google to the tune of millions. I have no idea how many millions, but given that billions are rounding errors to these guys, I suspect that there is real gold in them thar hills.