Faculty of Language: Never thought I would say this

Tuesday, November 17, 2015

Never thought I would say this, but I found that I resonated positively to a recent small comment by Chris Manning on Deep Learning (DL) that Aaron White sent my way (here). It seems that DL has computational linguistics (CL) of the Manning variety in its sights. Some DLers apparently believe that CL is just nano-moments away from extinction. Here’s a great quote from one of the DL doyens:

NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened.

DL wise men like Geoff Hinton have already announced that they expect that machines will soon be able to watch videos and “tell a story about what happened” and be downsized onto an in-your-ear chip that can translate into English on the fly. Great things are clearly expected. Personally, I am skeptical as I’ve heard such hyperbole before. We have been five years away from this sort of stuff for a very long time.

Moreover, I am not alone. If I read Manning correctly, he is skeptical (though very politely so) as well.[1] But, like me, he sees an opportunity here, one I noted before (here and here). Of course we likely disagree about what kind of linguistics will be most useful for advancing these technological ends,[2] but when it comes to engineering projects I am very catholic in my tastes.

What does the opportunity consist in? It relies on a bet: that generic machine learning (even of the DL variety) will not be able to solve the “domain problem.” Behind the latter lies the belief that how a domain of knowledge is structured matters a lot even if one’s aim is to solve an engineering problem.

An aside: shouldn’t those that think that the domain problem is a serious engineering hurdle also think that modularity is a good biological design feature? And shouldn’t these people therefore think that the domain specificity of FoL is a no-brainer? In other words, shouldn’t the idea that humans have domain specific knowledge that allows them to “solve” language problems (and support facile human acquisition and use) be the default position? Chris? What think you? Dump general learning approaches and embrace domain specificity?

Back to the main point: The bet. So, if you think that using word contexts can only get you so far (and not interestingly far either), then you are ready to bet that knowing something about language will be useful in solving these engineering problems. And that provides linguists with an opportunity to ply their trade. In fact, Manning points to a couple of projects aimed at developing “a common syntactic dependency representation and POS (‘part of speech,’ NH) and feature label sets which can be used with reasonable linguistic fidelity and human usability across all human languages” (3).[3] He also advocates developing analogous representations for “Abstract Meaning.” This looks like the kind of thing that GGers could usefully contribute to. In other words, what we do directly fits into the Manning project.

Another aside: do not confuse this with investigating the structure of FL. What matters for this project is a reasonable set of Greenberg “Universals.” Indeed, being too abstract might not be that useful practically, and being truly universal is not that important (what is important is finding those categories that best fit the particular languages of interest). This is not a bad thing. Engineering is not to be disparaged. It’s just not the same project as the one that GG has scientifically set for itself. Of course, should the Chomsky version of GG succeed, it is possible that it will contribute to the engineering problem. But then again, it might not. As I understand it, General Relativity has yet to make a big impact on land surveying. It really all depends (to fix ideas think birds and planes or fish and submarines. Last time I looked plane wings don’t flap and sub bodies don’t undulate).

Manning makes lots of useful comments about DL, many of which I didn’t understand. He makes some, however, that I did. For example, his observation that DL has mainly proved useful in signal processing contexts (2) (i.e. where the problem is to get the generalization that is in the data, the pattern from (noisy) patternings). The language problem, as I’ve argued, is different from this (see here) so the limits of brute force DL will, I predict, become evident when the new wise men turn their attention to it. In fact, I make a more refined prediction: to “solve” this problem DLers will either (i) ignore it, (ii) restrict the domain of interest to finesse it or (iii) promise repeatedly that the solution is but 5 years away. This has happened before and will happen again unless the intricate structural constraints that characterize language are recognized and incorporated.

Manning also makes several points that I would take issue with. For example, IMO he (like many others) confuses squishy data for squishy underlying categories. See, in particular, Manning’s discussion of gerunds on p. 4. That the data does not exhibit sharp boundaries does not imply that the underlying structures are not sharp. In fact, at some level they must be for under every probabilistic theory there is a categorical algebra. I leave it to you out there to come up with an alternative analysis of Manning’s observed data set. I give you a 30 second time limit to make it challenging.

At any rate, you will not be surprised to find out that I disagree with many of Manning’s comments. What might surprise you is that I think he is right in his reaction to DL hubris and he is right that there is an opportunity for what GGers know to be of practical value. There is no reason for DL (or Bayes or stats) to be inimical to GG. It’s just technology. What makes its practice often anathema is the hard-core empiricism gratuitously adopted by its practitioners. But this is not inherent to the technology. It is only a bias of the technologists. And there are some like Jordan and Manning and Reisinger who seem to get this. It looks like an opportunity for GGers to make a contribution? One, incidentally, that can have positive repercussions for the standing of GG. Scientific success does not require technological application. But having technological relevance does not hurt either.

[1] I confess to a touch of schadenfreude given that this is the kind of thing that Manning and Co like to say about my kind of linguistics wrt their CL approaches.

[2] Though I am not confident about this. I am pretty confident about what kind of linguistics one needs to advance the cognitive project. I am far less sure about what one needs to advance the engineering one. In fact, I suspect that a more “surfacy” syntax will fit the latter’s design requirements better than a more abstract one given its NLPish practical aims. See below for a little more discussion.

[3] I have it from a reliable source that this project is being funded by Google to the tune of millions. I have no idea how many millions, but given that billions are rounding errors to these guys, I suspect that there is real gold in them thar hills.

17 comments:

Unknown, November 17, 2015 at 4:35 PM
I'm not particularly surprised to hear this from Manning; he is one of a handful of people in NLP who

1) have a firm grasp of the linguistic literature,
2) have good taste for what constitutes an interesting problem,
3) realize that a hypertrophic focus on incremental performance improvements is always an intellectual dead end that inhibits progress in the long run.

Manning's brief one-line remark on the ACL also shows that his views are not reflective of the NLP community at large. Maybe this will change if deep learning does end up obsoleting the simple models that currently dominate much of NLP.

One more remark regarding your conclusion: [A contribution], incidentally, that can have positive repercussions for the standing of GG. Scientific success does not require technological application. But having technological relevance does not hurt either.

This seems to focus on the institutional boons that come with applications, i.e. more prestige, money, jobs in other fields, and so on. But applications are also important sources for new empirical questions. For example, designing a wide-coverage grammar requires analyses for phenomena that are usually considered boring or part of the periphery. But these questions can turn out to be much more interesting than initially thought. Very little work in Minimalism has looked at the syntax of if-then in comparison to auxiliary inversion, but it's actually far from obvious what is going on in those constructions. Science has often profited from technology pushing new questions to the forefront, and in an ideal world that's what NLP should be doing for linguistics.

Utpal, November 18, 2015 at 3:13 AM
By the way, Manning did a straight syntax thesis (LFG) on ergativity, if my memory serves me right.

hal, November 20, 2015 at 11:04 AM
There seems to be a general belief in the deep learning community that FoL is "nothing special." In the sense that it's "okay" to put some known domain structure in our models, so long as it's of the type that they approve of. (Somewhat circular, yes.) But in particular when pushed on linguistic domain knowledge, that's typically seen as too specific and not broad enough. From this I conclude that the underlying theory is basically that the language faculty is an artefact of other general problem solving skills. Whether you agree with that or not is a reasonable question. (I believe I can guess where Norbert falls, and you can perhaps guess from my tone where I fall, though perhaps not as far and for slightly different reasons.)

That said, I cannot complain too vehemently because when we lowly NLPers who happen to do some DL on the side integrate "linguistic knowledge" into our models, it's perhaps right to say that it's too specific because it typically is. We're not yet at a point where we really even use things like Greenbergian properties adequately, and I don't think the pudding there will bear much proof until we have far more than 30 languages we're looking at. But to go all the way to what Norbert might find appropriate (which I think hardcore DL folks would still consider too "FoL specific") is far from what we can do.

Unknown, November 24, 2015 at 12:11 AM
If anything, the deep learning community is much more averse than the "old-fashioned" statistical NLP community to knowledge that's hard-coded into the model by the architect of the system. It's essentially an article of faith for deep learning: the whole pitch is that the system is supposed to learn the best representations for the data (and the task) automatically.

From the empirical point of view, there's a debate on whether even having hierarchical structure as part of the architecture of the model is useful, or whether you can get away with modeling language as a linear phenomenon with a slightly fancier neural network that can keep track of long-distance dependencies. My reading of that debate is that it's not at all clear that hierarchy helps you perform many of the standard tasks, at least when evaluated using standard metrics. So I find it hard to imagine a system that explicitly incorporates specific insights from generative grammar, like the ECP or Principle A or subjacency, outperforming a massive "dumb" deep neural network.

Finally, for what it's worth, I read Manning to be suggesting not that generative grammar can be used to improve deep learning, but that we should use neural networks to explain certain linguistic phenomena.