Monday, March 28, 2016

Linguistics from a Marrian perspective 1

This was intended to be a short post. It got out of hand. So, to make reading easier, I am breaking it into two parts, both of which I will post this week.

For the cognitively inclined linguist ‘Marr’ is almost as important a name as ‘Chomsky.’ Marr’s famous book (Vision) is justly renowned for providing a three-step program for the hapless investigator. Every problem should be considered from three perspectives: (i) the computational problem posed by the phenomenon at hand, (ii) the representations and algorithms that the system uses to solve the identified computational problem and (iii) the incarnation of these representations and algorithms in brain wetware. Cover these three bases, and you’ve taken a pretty long step in explaining what’s going on in one or another CN domain. The poster child for this Marrian decomposition is auditory localization in the barn owl (see here for discussion and references). A central point of Marr’s book is that too much research eschews step (i), and this has had baleful effects. Why? Because if you have no specification of the relevant computational problem, it is hard to figure out what representations and algorithms would serve to solve that problem and how brains implement them to allow them to do what they do while solving it. Thus, a moral of Marr’s work is that a good description of the computational problem is a critical step in understanding how a neural system operates.[1]

I’m all on board with this Marrian vision (haha!) and what I would like to do in what follows is try to clarify what the computational problems that animate linguistics have been. They are very familiar, but it never hurts to rehearse them. I will also observe one way in which GG does not quite fit into the tripartite division above. Extending Marr to GG requires distinguishing between algorithms and generative procedures, something that Marr with his main interest in early vision did not do. I believe that this is a problem for his schema when applied to linguistic capacities.  At any rate, I will get to that. Let’s start with some basics.

What are the computational problems GG has identified? There are three:

1.     Linguistic Creativity
2.     Plato’s Problem
3.     Darwin’s Problem

The first was well described in the first chapter, first page, second paragraph of Chomsky’s Current Issues. He describes it as “the central fact to which any significant linguistic theory must address.” What is it? The fact that a native speaker “can produce a new sentence of his language on the appropriate occasion, and that other speakers can understand it correctly, though it is equally new to them” (7). As Chomsky goes on to note: “Most of our linguistic experience…is with new sentences…the class of sentences with which we can operate fluently and without difficulty or hesitation is so vast that for all practical purposes (and, obviously, for all theoretical purposes) we may regard it as infinite” (7).

So what’s the first computational problem? To explain the CN sources of this linguistic creativity. What’s the absolute minimum required to explain it? The idea that native speaker linguistic facility rests in part on the internalization of a system of recursive rules that specify the available sound/meaning pairs (<s,m>) over which the native speaker has mastery. We call such rules a grammar (G) and given (1), part of any account of human linguistic capacity must involve the specification of these internalized Gs.
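To make the "recursive rules" point concrete, here is a minimal sketch in Python. The three-rule grammar below is a hypothetical toy of my own choosing, not a proposal about any actual G, and it generates only word strings, leaving the meaning half of the <s,m> pairs aside. All it is meant to show is that a finite rule set containing one recursive rule (a VP that can embed an S) yields ever more sentences as the permitted depth of embedding grows.

```python
from itertools import product

# Hypothetical toy grammar (an illustration only, not a real G).
# The rule VP -> "thinks that" S is the recursive one: S can contain S.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Mary"], ["John"]],
    "VP": [["left"], ["thinks", "that", "S"]],
}


def expand(symbol, depth):
    """Return every word sequence derivable from `symbol` when at most
    `depth` S nodes may be stacked inside one another."""
    if symbol not in GRAMMAR:
        # Anything without a rule is a terminal word.
        return [[symbol]]
    if symbol == "S":
        if depth == 0:
            # Embedding budget used up: no derivation from here.
            return []
        depth -= 1
    sentences = []
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine
        # the alternatives in every possible way.
        parts = [expand(child, depth) for child in rhs]
        for combo in product(*parts):
            sentences.append([word for part in combo for word in part])
    return sentences


if __name__ == "__main__":
    for depth in range(1, 5):
        print(depth, len(expand("S", depth)))
    for words in expand("S", 2):
        print(" ".join(words))
```

The rule set never changes, yet every extra level of embedding adds sentences that were not available before, so there is no longest sentence and no finite list that exhausts the language. That is the sense in which a finite G can underwrite mastery over an unbounded domain.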

It is also worth noting that providing such Gs is not sufficient. Humans not only have mastery over an infinite domain of <s,m>s, they also can parse them, produce them, and call them forth “on the appropriate occasion.”[2] Gs do not by themselves explain how this gets accomplished, though that there is a generative procedure implicated in all these behaviors is as certain as anything can be once one recognizes the first computational problem.

The second problem, (2), shifts attention from the properties of specific Gs to how any G gets acquired. We know that Gs are very intricate objects. They contain some kinds of rules and representations and not others. Many of their governing principles are not manifest in simple data of the kind that it is reasonable to suppose that children have easy access to and that they can easily use. This means that Gs are acquired under conditions where the input is poor relative to the capacity attained. How poor? Well, the input is sparse in many places, degraded in some, and non-existent in others.[3] Charles Yang’s recent Zipfian observations (here) demonstrate how sparse the input is even in seemingly simple cases like adjective placement. Nor is the input optimal (e.g. see how sub-optimal word “learning” is in real world contexts (here and here)). And last, but by no means least, for many properties of Gs there is virtually zero relevant data in the input to fix their properties (think islands, ECP effects, and structure dependence).
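The sparsity point can be illustrated with a back-of-the-envelope calculation. The sketch below is not Yang's model and uses no real corpus data. It simply assumes a hypothetical inventory of item types whose probabilities follow a textbook Zipf distribution (probability proportional to 1/rank) and asks what fraction of those types a learner should expect never to encounter in a sample of a given size. The inventory size and sample sizes are arbitrary choices made for illustration.

```python
def zipf_probs(num_types):
    """Probabilities proportional to 1/rank, normalized to sum to 1."""
    weights = [1.0 / rank for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_unseen_fraction(num_types, num_tokens):
    """Expected share of types that never occur in `num_tokens`
    independent draws from the Zipf distribution over `num_types`."""
    probs = zipf_probs(num_types)
    # A type with probability p is missed by all draws with
    # probability (1 - p) ** num_tokens.
    unseen = sum((1.0 - p) ** num_tokens for p in probs)
    return unseen / num_types


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 types, samples of increasing size.
    for n in (1_000, 10_000, 100_000):
        print(n, round(expected_unseen_fraction(10_000, n), 3))
```

Under these assumptions, a sample with as many tokens as there are types still leaves most types unattested, because the frequent few soak up most of the draws. Larger samples help, but slowly, which is the general shape of the sparsity problem.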

So what’s the upshot given the second computational problem? G acquisition must rely on given properties of the acquirer that are instrumental to the process of G acquisition. In other words, the Language Acquisition Device (LAD) (aka, child) comes to the task of language acquisition with lots of innate knowledge that the LAD crucially exploits in acquiring its particular G. Call this system of knowledge the Faculty of Language (FL). Again, that LADs have FLs is a necessary part of any account of G acquisition. Of course, it cannot be the whole story and Yang (here) and Lidz (here) (a.o.) have offered models of what more might be involved. But, given the poverty of the linguistic stimulus relative to the properties of the G attained, any adequate solution to the computational problem (2) will be waist deep in innate mental mechanisms.

This leaves the third problem. This is the “newest” on the GG docket, and rightly so, for its investigation relies on (at least partial) answers to the first two. The problem addressed is how much of what the learner brings to G acquisition is linguistically specific and how much is cognitively and/or computationally general. This question can be cast in computational terms as follows: assume a pre-linguistic primate with all of the cognitive and computational capacities this entails; what must be added to these cognitive/computational resources to derive the properties of FL? Call the linguistically value-added parts “Universal Grammar” (UG). The third question comes down to trying to figure out the fine structure of FL: how much of FL is UG and how much is generic computational and cognitive operations?

A little thought places interesting restrictions on any solution to this problem. There are two relevant facts, the second being more solid than the first.

The first one is that FL has emerged relatively recently in the species (roughly 100kya) and when it emerged it did so rapidly. The evidence for this is “fancy culture” (FC). Evidence for FC consists of elaborate artifacts/tools, involved rituals, urban centers, farming, forms of government etc., and these are hard to come by before about 50kya (see here). If we take FC as evidence for linguistic facility of the kind we have, then it appears that FL emerges on the scene within roughly the last 100k years.

The second fact is much more solid. It is clear that humans of diverse ethnic and biological lineage have effectively the same FL. How do we know? Put a Piraha child in Oslo and the child will develop a Norwegian G at the same rate and trajectory as other Norwegians do and with the same basic properties. Ditto with a Norwegian child in the forests of the Amazon living with the Piraha. If FL is what underlies G acquisition, then all people have the same basic FL given that any one of them could acquire any G if appropriately situated. Or, whatever FL is, it has not changed over (at least) the last 50ky. This makes sense if the emergence of FL rested on very few moving parts (i.e. it was a “simple” change).[4]

Given these boundary conditions, the solution to Darwin’s problem must bottom out on an FL with a pretty slight UG, most of the computational apparatus of FL being computationally and cognitively generic.[5]

So three different computational problems, which circumscribe the class of potential solutions. How’s this all related to Marr? This is what is somewhat unclear, at least to me. I will explain what I mean in the next post.

[1] The direction of inference is not always from level 1 to 2 then to 3. Practically, knowing something about level 2 could inform our understanding of the level 1 problem. Ditto wrt level 3. The point is that there are 3 different kinds of questions one can use to decompose the CN problem, and that whereas level 2 and 3 questions are standard, level 1 analyses are often ignored to the detriment of the inquiry. But I return to the issue of cross-talk between levels at the end.
[2] This last bit, using them when appropriate, is somewhat of a mystery. Language use is not stimulus bound. In Chomsky’s words, it is “appropriate to circumstance without being caused by them.” Just how this happens is entirely opaque, a mystery rather than a problem in Chomsky’s terminology. For a recent discussion of this point (among others) see his Sophia lectures in Sophia Linguistica #64 (2015).
[3] Charles Yang’s recent work demonstrates how sparse it is even in seemingly simple cases like adjective placement.
[4] It makes sense if what we have now is not the result of piecemeal evolutionary tinkering, for if it were the result of such a gradual process, it would raise the obvious question of why the progress stopped about 50kya. Why didn’t FLs further develop to advantage Piraha to acquire Piraha and Romance speakers to acquire Romance? Why stop with an all-purpose FL when one more specialized to the likely kind of language the LAD would be exposed to was at hand? One answer is that this more bespoke FL was never on offer; all you get is the FL based on the “simple” addition or nothing at all. Well, we all got the same one.
[5] So, much of the innate knowledge required to acquire Gs from PLD is not domain specific. However, I personally doubt that there is nothing proprietary to language. Why? Because nothing does language like we do it, and given its obvious advantages, it would be odd if other animals had the wherewithal to do it but didn’t. Sort of like a bird that could fly never doing so. Thus, IMO, there is something special about us and I suspect that it was quite specific to language. But, this is an empirical question, ultimately.


  1. I wonder if vision today is considered to be domain-specific (Marr-style, on my interpretation) or domain-general?

    1. Perceptual theories are always domain specific. Visual info is nothing like auditory info. As Gallistel likes to say, there is no general sensing mechanism. So early visual perception is tied closely to the details of visual information processing, and auditory perception to sound properties, and these are not the same.

      Higher-order processing, e.g. the principles for determining what counts as a visual object or an auditory object, may share features. Though if geons are on the right track, these principles are very domain specific. But geons are not widely endorsed, I am told, so it is not clear what to conclude.

  2. I generally like the three part division into: linguistic creativity, Plato's problem and Darwin's problem, but it seems inappropriate here. Marr applies his three levels to information processing problems like the problem of vision: taking the retinal information and working out what objects are visually present. Linguistic creativity is not a computational process in this sense. Rather there are several different computational problems: most importantly production and comprehension of spoken speech. Plato's problem corresponds quite neatly to the computational process of language acquisition: the LAD in classic terms. The final one, Darwin's problem, is not a computational problem at all: it is rather a non-computational constraint on the other theories.

    1. I agree that the fit is not perfect. I talk about this in part 2. I also agree that the most natural fit is with the computational problems in production and comprehension and real time language acquisition for the second. Indeed, the quote from Chomsky says as much wrt the creativity problem. Gs are intended as partial descriptions of the relevant computational problem, partial because it needs further supplementation. Ditto Plato's problem and FL/UG. This will be part of a solution.

      Concerning Darwin, I agree here too. We can turn it into a computational problem and maybe an information processing one, but it concerns how to map one genome to another given the evo conditions of the time and the genome of our ancestors. There is a real time evo problem here, and I don't see why it cannot be treated on a par with Marr's others, albeit it is not a CN problem per se.

      So, no disagreement on my end. Maybe part 2 will clarify the take I am proposing. The bottom line is that Marr can be made to fit, but it is not a perfect fit.

    2. Evolution as a computational problem has in fact been suggested; to my knowledge, most (computationally) rigorously by Les Valiant as a formal learnability problem. See this paper in JACM. Les wrote a popular science book on this and the problem of learning in general. There are both positive and negative results.