Monday, January 7, 2013

Complexity redux redux

Alex has had some interesting comments on an earlier post. I thought that I would move the discussion from the comments to the main text.  Thanks Alex.

I think that I have found the nub of my disagreement with Alex. It can be localized to the first sentence of paragraph 2 of his last comment. 

We don't know what the grammars are so *of course* we have to abstract away from the properties of grammars.

No, No, No!!! We know a great deal about grammars. What I want from computational theories/considerations is an explanation of why what I know to be the case has the properties it does.  For Alex, grammatical properties have to earn their computational keep by solving antecedent problems of various sorts.  For me the shoe is on the other foot: I am interested in computational theories to the degree that they earn their linguistic keep by enlightening me as to why grammars have the properties I know that they have. For example: Why are there locality restrictions like minimality?  Why are the phase heads v, C and D?  Why do antecedents of moved elements c-command their launch sites?  Note that these why questions all presuppose that there are locality restrictions like minimality, and "subjacency." That there is displacement and that c-command describes a required grammatical restriction etc.  I go further: I think that we KNOW that the structure of FL/UG are roughly described by the generalizations outlined in LGB. This I know. What I don't know is why, and here is where I look to computational considerations to possibly enlighten me. If such enlightenment is forthcoming, great. If not, I look somewhere else. This opportunism has led me to the tentative conclusion that much of what goes on in CS is beside the point.  Oddly some of what goes on in software design is not.  So, maybe the right way of thinking of computation in linguistics should look to what they are doing in software engineering rather than in CS departments.  I am not dogmatic about this for after all I am being deliberately opportunistic (a variant on Tom Lehrer’s Lobyachevsky strategy).  So were there to be something interesting coming from core CS considerations, I would be happy to embrace the results. Here I am open minded. Where I am not open-minded concerns whether we know what FL/UG (roughly) looks like. We do. There’s more to discover, but we know a lot and should NOT set this knowledge aside in doing our computational investigations.

My impression is that this is not how Alex see things. What he wants is a computational theory able to handle linguistic data without assuming that these properties of grammar hold. He is skeptical about them and so wants to finesse them. I am convinced they are true and so they form boundary conditions on further discussion.  We have made different judgments and they determine what we are looking for. Much of our disagreement stems from this, I believe.

There is one further source: we have very different conceptions concerning the utility of formalization.  I am not sold on the view that without formalization science grinds to a halt. Indeed, I might go further, it often impedes. I was recently reminded of the fact that physics used the calculus quite effectively for about 200 years before it was put on solid foundations (Cauchy-Wierstrass). Till then it was riddled with all sorts of foundational problems that philosophers and mathematicians were eager to parade and probe.  Doing so was, I have been told, very central to work on physics in England post Newton.  The problems were ignored on the continent. Consequently, physics thrived on the continent and died in England for about 100 years.  It seems that despite the foundational flaws lots of good work could be done and was done. 

The tools we use in syntax are clear enough in my view and formalization has (putting a few exceptions aside) yet to reveal anything very deep, at least in syntax.  I have nothing against people working on doing this, but I don't really see this as a pressing need. Our problems in linguistics don't arise, I believe, because we misunderstand our tools.  Nor, have I so far seen many deep results arise from the various formalizations that have been wrought. Formalizations can be useful, but are by no means required. Indeed, my impression from Stabler’s work is that should one want to formalize GB or Minimalism or GPSG or whatever it seems doable.  The question then is not if it can be done, but why, and here I am less sanguine than Alex seems to be that it is worth the effort.

Very last point: Alex notes that the ease of parsing long sentences needs explanation. Yup. In fact, Berwick and Weinberg in other work provide one. Certain left corner parsers seem to do the trick. However, as BW show, these require bounded left contexts. Interestingly, as they show, grammars that embed something like phases/subjacency/barriers deliver a grammar format of the right type, one that snugly fits into an efficient left corner parser. Now, I don't know if left corner parsers are the "right" parsing theory. BW’s was deterministic for example and whether grammars are deterministic is contentious, I am told. But I do like the style of argument: it provides a nice computational reason for a certain kind of grammatical code. One might be tempted to say that a grammar would be well designed (i.e. the code would be “efficient,” “optimal”) to efficiently parse language, if it had  subjacency/phases/barrier like restrictions as a design characteristic. I like this kind of computational explanation.  It aims to explain a known design feature of FL/UG in computational terms. We need more results like this. Get to work Alex!

1 comment:

  1. Apologies for not getting back to this before. Norbert was arguing that it's important that the faculty of language produces data structures that are well suited to their interactions with other parts of cognition, lets say, primarily, certain kinds of thinking (say those that involve situations, participants in them, their temporal and locational structure, asserting and denying aspects of those situations, etc; things we admittedly know little about). Well suited meaning that the mapping between properties of parts of the data structures produced by FoL and properties of those in thinking is simple (where an isomorphic mapping would be the most simple, and we measure complexity by departures from that in a Goodmanesque way). Then the relevant notion of computational complexity is, I think, a notion about the selection of an optimal algorithm for producing the relevant kind of data structures. Before I understood this, it used to puzzled me, for example, why Merge should be taken to be binary set formation, not n-ary set formation, which might be seen as less restrictive and hence simpler in some sense; but if the data structures require some kind of embedding, then an optimal way of getting that kind of data structure might be via an algorithm that creates binary structures. Ditto for things like short vs long dependencies: a long dependency has to be created out of shorter ones because the relevant data structure needs to map neatly to, say, situations or individuals. The relevant notion of computational complexity is then just about optimising algorithms for the creation of certain kinds of data structures. But it's crucial that the optimisation is bounded by the empirical conditions on the data structures' interactions with other systems of cognition. As Norbert hinted above, that doesn't require that issues of actual memory limitations in processing etc are relevant to the optimisation: that would be something extra and unexpected on my reading of this.