Comments on Faculty of Language: Complexity redux: Best Coding Practice

Alex Clark (2013-01-04):

Much of BW is now out of date -- for example it predates the whole mildly context-sensitive program (Joshi, Weir, ... all the way down to Stabler). So while it makes some quite correct points about efficient parsing, they need updating.

We don't know what the grammars are, so *of course* we have to abstract away from the properties of grammars. They are not irrelevant at all -- on the contrary, CS tells us that some of these properties are crucially important, and it helps to tell us which properties are not. In particular, it tells me that Stabler's unification of movement and phrase structure is a reasonable solution, and the Peters and Ritchie result tells me that unconstrained transformations are bound to be wrong. Regardless of whether you agree with these conclusions, clearly this is an architecturally relevant distinction that is cut by our CS butter knife.

Their arguments against asymptotic complexity (which I have not reread, so apologies if I misrepresent them) make one good point, which is that one wants to think about the size of the grammar. In the jargon this means looking at the uniform membership problem rather than the membership problem. So I think this is the main point you refer to. This means the parsing problem has to be efficient not just in the length of the string but also in the size of the grammar (see the sketch after this comment). This is a fair point and a valid criticism of GPSG. So this is a point where the initial CS analysis is too weak, and we need to make it stronger (sharper).

They also make one point which I find unconvincing, which is that we only need to think about sentences of short length, and so we can ignore the asymptotic complexity which only kicks in at longer sentences. But the sentence I just wrote has 37 words, and no-one has any problem understanding it, and that is a fact that needs explanation.

In an earlier interaction you gave the example of: 'The boy from Montreal that I met last week while playing baseball recruited a lady for cheerleader that he met the day before while she was at the automat.' If that sort of example is part of what you want to explain, then you can't just consider short strings, and you need a computationally efficient process to recognise and parse.

"The most critical Berwick and Weinberg observation IMO is the observation that properties of the grammar can matter in important ways and that considering the asymptotic case simply presupposes that this is irrelevant."
I am not sure whether what I have said answers this point -- if there is some other point that I missed then please point it out -- I think it is worth plugging away at this distinction.
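A minimal sketch of the uniform membership point above, as a CKY-style recogniser in Python. The toy grammar, category names, and test sentences are invented purely for illustration; the only claim is that the running time grows with the size of the grammar as well as with the cube of the sentence length, which is exactly the factor the uniform membership problem asks us to keep track of.

```python
# Minimal CKY recogniser sketch over a hypothetical toy grammar in Chomsky
# Normal Form. Runtime is roughly O(|G| * n^3): cubic in sentence length,
# and multiplied by a factor that grows with the size of the grammar.

from itertools import product

# Binary rules A -> B C and lexical rules A -> 'w' (all invented for illustration).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "the": {"Det"},
    "boy": {"N"},
    "lady": {"N"},
    "recruited": {"V"},
}

def cky_recognise(words, start="S"):
    n = len(words)
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL_RULES.get(w, set()))
    for span in range(2, n + 1):              # O(n) span lengths
        for i in range(n - span + 1):         # O(n) start positions
            j = i + span
            for k in range(i + 1, j):         # O(n) split points
                # The work done here scans pairs of categories licensed by the
                # grammar, so the grammar's size multiplies the n^3 factor --
                # this is the "efficient in the size of the grammar" demand.
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY_RULES.get((b, c), set())
    return start in chart[0][n]

print(cky_recognise("the boy recruited the lady".split()))  # True
print(cky_recognise("boy the recruited".split()))           # False
```

Nothing here depends on this particular toy grammar: swapping in a larger rule set multiplies the work at every split point, which is why efficiency in the size of the grammar is a separate demand from efficiency in the length of the string.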
Norbert (2013-01-03):

The problem is not that it is dull (though many of the results really are), but that it cuts in the wrong places. Dull would be fine if it cut along the right seams. But the asymptotic complexity results that are the bread and butter of the CS approach simply abstract away from what is interesting, viz. the architecture of the grammar. The most critical Berwick and Weinberg observation IMO is the observation that properties of the grammar can matter in important ways and that considering the asymptotic case simply presupposes that this is irrelevant. As my interests are in the properties of grammars, I don't see why I should pay too much attention to methods of investigation that abstract away from these.

One more point: I agree that if you want to do things in what you believe to be the right way (nice move, btw, invoking Fodor here), then formalism is a must. But if the price of formalism is abstracting from what is important, then I can learn to live without it (think of the current discussions in economics about DSGE theories and the abstractions required to make them run). BTW, physics has been mathematical for a long time, but what makes it quantitative is NOT the formal side but the fact that we have been able to give numerical values to key constants (like G, or the charge of the electron, or the value of the fine structure constant). As Freeman Dyson has observed, the theoretical results are generally qualitative. At any rate, the right question is not whether it looks mathematical but whether it's the right math. Does it fit the problem, or are the important issues sacrificed to tractability (again, think DSGEs)? If it fits the problem, it's useful. If not, it's not. I have nothing against going formal, so long as the formalism does not throw the interesting questions out the window. It's less clear to me than it is to you that the CS style as currently practiced fits the two problems we should be interested in: what UG looks like and why.

Alex Clark (2013-01-02):

Numerical coding is a nice example. So from a CS point of view, unary/base 1 (your tally system) is much less efficient than binary or decimal systems, and binary and decimal systems are broadly speaking similar in power. Obviously some particular computations are a lot easier in base 2 than in base 10 (dividing by 2, for example), but let's say that from a CS point of view we can't draw a distinction between base 10 and base 2. This means that from a CS perspective we can draw a distinction between base 1 and base n (where n > 1), but nothing else.
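A small illustration of this base-1 versus base-n distinction (my own sketch, not from the comment): the length of a number's representation grows linearly in unary but only logarithmically in any base greater than 1, so binary and decimal differ by a constant factor while unary lags behind exponentially.

```python
# Representation lengths: unary grows linearly with the number, any base n > 1
# grows only logarithmically, so base 2 and base 10 differ by a constant factor.

def unary_length(x: int) -> int:
    return x                      # one tally mark per unit

def base_n_length(x: int, base: int) -> int:
    digits = 0
    while x > 0:
        x //= base
        digits += 1
    return max(digits, 1)

for x in (10, 1000, 1_000_000):
    print(x, unary_length(x), base_n_length(x, 2), base_n_length(x, 10))
# Representing 1,000,000 takes a million tally marks, 20 bits, or 7 decimal digits.
```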
<br /><br />The 'wrong' way is to say CS complexity does not apply at all, and to try to slide out from worst case asymptotic complexity analysis as Berwick and Weinberg 82 does. Being fair here, I think one thing which is very clear now, which maybe wasn't so clear 30 years ago (30 years is a *long* time in CS) is that that notion of complexity is in exactly the right place. I think the argument that we only need to consider short sentences is very weak; and that argument was only deployed to make a specific rhetorical point in a specific debate which is now over.<br /><br />Obviously as the terminology suggests I think the 'right' way is the right way to look at things. *If* you accept the right way, then I think there are two conclusions. One is that even though complexity is a dull knife you can still cut a lot with it. We can point to the internet and the success of modern CS as evidence that even these very general notions of complexity can help us design efficient computational systems. The second is more cautious: it certainly is the case that the mind/brain is architecturally very different from the laptop I am writing this on -- and it can do some things very quickly that are impossibly slow on an electric computer, so we need to stay abstract, and only try to draw general conclusions -- i.e. efficient versus non efficient computationans. In other words not try to chop things too fine. <br />There is a price of entry to the 'right way' -- you need to be formal, as e.g. Stabler is. -- but you need to be formal to do physics too. Qualitative physics never got anywhere, it was only when it became mathematical, with Galileo and Newton, that it made progress. (Shamelessly trying to exploit your physics envy here ..)<br /><br />On the other hand if you choose the *wrong* way,<br />then I don't think you are doomed to hand-waving. <br />Of course we have no idea what the architecture of the HFL is, but that doesn't necessarily rule out making one up. One of the parts of the neural network/PDP program was to find an alternative architecture, and something similar could be a viable research program -- define some architecture and see what consequences flow. <br /><br />(BTW I recommend Iris van Rooij's writings on complexity in cognitive science, which are very relevant to this discussion -- e.g. van Rooij, I. (2008). The tractable cognition thesis. Cognitive Science, 32, 939-984.)Alex Clarkhttps://www.blogger.com/profile/04634767958690153584noreply@blogger.com