This was intended to be a short post. It got out of hand.
So, to make reading easier, I am breaking it into two parts, which I will post
this week.
For the cognitively inclined linguist ‘Marr’ is almost as important
a name as ‘Chomsky.’ Marr’s famous book (Vision) is justly renowned for
providing a three-step program for the hapless investigator. Every problem should
be considered from three perspectives: (i) the computational problem posed by
the phenomenon at hand, (ii) the representations and algorithms that the system
uses to solve the identified computational problem and (iii) the incarnation of
these representations and algorithms in brain wetware. Cover these three bases,
and you’ve taken a pretty long step in explaining what’s going on in one or
another CN domain. The poster child for this Marrian decomposition is auditory
localization in the barn owl (see
here for discussion and references). A central point of Marr’s book is that
too much research eschews step (i), and this has had baleful effects. Why?
Because if you have no specification of the relevant computational problem, it
is hard to figure out what representations and algorithms would serve to solve
that problem, or how brains implement them in solving it. Thus, a moral of Marr’s work is that a good description of
the computational problem is a critical step in understanding how a neural
system operates.[1]
I’m all on board with this Marrian vision (haha!) and what I
would like to do in what follows is try to clarify what the computational
problems that animate linguistics have been. They are very familiar, but it
never hurts to rehearse them. I will also observe one way in which GG does not
quite fit into the tripartite division above. Extending Marr to GG requires
distinguishing between algorithms and generative procedures, something that
Marr with his main interest in early vision did not do. I believe that this is
a problem for his schema when applied to linguistic capacities. At any rate, I will get to that. Let’s start
with some basics.
What are the computational problems GG has identified? There
are three:
1. Linguistic Creativity
2. Plato’s Problem
3. Darwin’s Problem
The first was well described in the first chapter, first
page, second paragraph of Chomsky’s Current Issues. He describes it as
“the central fact to which any significant linguistic theory must address itself.”
What is it? The fact that a native speaker “can produce a new sentence of his
language on the appropriate occasion, and that other speakers can understand it
correctly, though it is equally new to them” (7). As Chomsky goes on to note:
“Most of our linguistic experience…is with new sentences…the class of sentences
with which we can operate fluently and without difficulty or hesitation is so
vast that for all practical purposes (and, obviously, for all theoretical
purposes) we may regard it as infinite” (7).
So what’s the first computational problem? To explain the CN
sources of this linguistic creativity. What’s the absolute minimum required to explain it? The idea that native speaker
linguistic facility rests in part on
the internalization of a system of recursive rules that specify the available
sound/meaning pairs (<s,m>) over which the native speaker has mastery. We
call such rules a grammar (G) and given (1), part of any account of human linguistic capacity must involve the
specification of these internalized Gs.
It is also worth noting that providing such Gs is not sufficient. Humans not only have
mastery over an infinite domain of <s,m>s, they also can parse them,
produce them, and call them forth “on the appropriate occasion.”[2]
Gs do not by themselves explain how
this gets accomplished, though that there is a generative procedure implicated
in all these behaviors is as certain as anything can be once one recognizes the
first computational problem.
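To see how little machinery unboundedness requires, here is a toy sketch in Python (entirely my own illustration; the sentence frames and the tuple encoding of “meanings” are invented for the example, not drawn from any actual G):

```python
# A minimal recursive "grammar" pairing sounds (strings) with meanings
# (nested tuples). Frames and meaning encoding are invented for
# illustration only.

def generate(depth):
    """Return a <sound, meaning> pair with `depth` levels of embedding."""
    if depth == 0:
        return ("Mary left", ("LEFT", "Mary"))
    sound, meaning = generate(depth - 1)
    # The embedding rule reuses the sentence inside itself, so for every
    # sentence the rule yields a longer one: the set of <s,m> pairs it
    # specifies is unbounded.
    return ("John thinks " + sound, ("THINK", "John", meaning))

for d in range(3):
    print(generate(d))
```

One self-embedding rule is enough: there is no longest output, which is the formal core of the creativity observation.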
The second problem, (2), shifts attention from the properties
of specific Gs to how any G gets acquired. We know that Gs are very intricate
objects. They contain some kinds of rules and representations and not others.
Many of their governing principles are not manifest in simple data of the kind
that it is reasonable to suppose that children have easy access to and that
they can easily use. This means that Gs are acquired under conditions where the
input is poor relative to the capacity attained. How poor? Well, the input
is sparse in many places, degraded in some, and non-existent in others.[3] Charles Yang’s recent Zipfian observations (here) demonstrate how
sparse the input is even in seemingly simple cases like adjective placement.
Nor is the input optimal (e.g. see how sub-optimal word “learning” is in real
world contexts (here
and here)).
And last, but by no means least, for many properties of Gs there is virtually
zero relevant data in the input to fix their properties (think islands, ECP
effects, and structure dependence).
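To get a feel for the Zipfian sparsity point, here is a back-of-the-envelope simulation (my own sketch; the vocabulary size, token count, and the 1/rank exponent are stipulated for illustration, not figures from Yang’s work):

```python
# Sample word tokens whose frequencies follow Zipf's law (freq ~ 1/rank)
# and count how many vocabulary items the "learner" never encounters.
import random

random.seed(0)
V = 10_000                                   # assumed vocabulary size
N = 100_000                                  # assumed tokens heard
weights = [1 / r for r in range(1, V + 1)]   # Zipfian frequencies

sample = random.choices(range(V), weights=weights, k=N)
attested = len(set(sample))
print(f"types attested: {attested}/{V}")
print(f"never heard:    {V - attested} ({100 * (V - attested) / V:.0f}%)")
```

Even with 100k tokens, a sizable chunk of the vocabulary goes entirely unattested, and the sparsity compounds for combinations (e.g. adjective-noun pairs), which is the sense in which the input underdetermines the G attained.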
So what’s the upshot given the second computational problem?
G acquisition must rely on given
properties of the acquirer that are instrumental to the process. In other words, the Language Acquisition Device (LAD) (aka, the child)
comes to the task of language acquisition with lots of innate knowledge that
the LAD crucially exploits in acquiring its particular G. Call this system of
knowledge the Faculty of Language (FL). Again, that LADs have FLs is a necessary part of any account of G
acquisition. Of course, it cannot be the whole story and Yang (here) and Lidz (here)
(a.o.) have offered models of what more might be involved. But, given the poverty
of the linguistic stimulus relative to the properties of the G attained, any
adequate solution to the computational problem (2) will be waist deep in innate
mental mechanisms.
This leaves the third problem. This is the “newest” on the
GG docket, and rightly so, for its investigation relies on (at least partial)
answers to the first two. The problem addressed is how much of what the learner
brings to G acquisition is linguistically specific and how much is cognitively
and/or computationally general. This question can be cast in computational
terms as follows: assume a pre-linguistic primate with all of the cognitive and
computational capacities this entails, what must be added to these
cognitive/computational resources to derive the properties of FL? Call the linguistically
value added parts “Universal Grammar” (UG). The third question comes down to
trying to figure out the fine structure of FL: how much of FL is UG and how
much consists of generic computational and cognitive operations?
A little thought places interesting restrictions on any
solution to this problem. There are two relevant facts, the second being more
solid than the first.
The first one is that FL has emerged relatively recently in
the species (roughly 100kya) and when it emerged it did so rapidly. The evidence
for this is “fancy culture” (FC). Evidence for FC consists of elaborate
artifacts/tools, involved rituals, urban centers, farming, forms of government
etc. and these are hard to come by before about 50kya (see here).
If we take FC as evidence for linguistic facility of the kind we have, then it
appears that FL emerges on the scene within roughly the last 100k years.
The second fact is much more solid. It is clear that humans
of diverse ethnic and biological lineage have effectively the same FL. How do
we know? Put a Piraha child in Oslo and she will develop a Norwegian G at the same
rate and along the same trajectory as other Norwegians do, and with the same basic properties.
Ditto with a Norwegian in the forests of the Amazon living with the Piraha. If
FL is what underlies G acquisition, then all people have the same basic FL
given that any one of them could acquire any G if appropriately situated. Or,
whatever FL is, it has not changed over (at least) the last 50ky. This makes
sense if the emergence of FL rested on very few moving parts (i.e. it was a
“simple” change).[4]
Given these boundary conditions, the solution to
Darwin’s problem must bottom out on an FL with a pretty slight UG; most of the
computational apparatus of FL being computationally and cognitively generic.[5]
[1]
The direction of inference is not always from level 1 to 2 then to 3.
Practically, knowing something about level 2 could inform our understanding of
the level 1 problem. Ditto wrt level 3. The point is that there are 3 different
kinds of questions one can use to decompose the CN problem, and that whereas
level 2 and 3 questions are standard, level 1 analyses are often ignored to the
detriment of the inquiry. But I return to the issue of cross-talk between
levels at the end.
[2]
This last bit, using them when appropriate, is somewhat of a mystery. Language
use is not stimulus bound. In Chomsky’s words, it is “appropriate to
circumstance without being caused by them.” Just how this happens is entirely
opaque, a mystery rather than a problem in Chomsky terminology. For a recent
discussion of this point (among others) see his Sophia lectures in Sophia Linguistica #64 (2015).
[3]
Charles Yang’s recent work demonstrates how sparse it is even in seemingly
simple cases like adjective placement.
[4]
It makes sense if what we have now is not the result of piecemeal evolutionary
tinkering, for if it were the result of such a gradual process, the obvious
question arises of why the process stopped about 50kya. Why didn’t FLs
further develop to advantage Piraha to acquire Piraha and Romance speakers to
acquire Romance? Why stop with an all purpose FL when one more specialized to
the likely kind of language the LAD would be exposed to was at hand? One answer
is that this more bespoke FL was never on offer; all you get is the FL based on
the “simple” addition or nothing at all. Well, we all got the same one.
[5]
So, much of the innate knowledge required to acquire Gs from PLD is not domain specific. However, I
personally doubt that there is nothing proprietary to language. Why? Because
nothing does language like we do it, and given its obvious advantages, it would
be odd if other animals had the wherewithal to do it but didn’t. Sort of like a
bird that could fly never doing so. Thus, IMO, there is something special about
us and I suspect that it was quite specific to language. But, this is an
empirical question, ultimately.
I wonder if vision today is considered to be domain-specific (Marr-style, on my interpretation) or domain-general?
Perceptual theories are always domain specific. Visual info is nothing like auditory info. As Gallistel likes to say, there is no general sensing mechanism. So early perception is very much tied to the details of visual information processing, and auditory perception to sound properties, and these are not the same.
Higher-order visual processing, e.g. the principles determining what counts as a visual object or an auditory object, may share features. Though if geons are on the right track, they are very domain specific. But geons are not widely endorsed, I am told, so it is not clear what to conclude.
I generally like the three part division into: linguistic creativity, Plato's problem and Darwin's problem, but it seems inappropriate here. Marr applies his three levels to information processing problems like the problem of vision: taking the retinal information and working out what objects are visually present. Linguistic creativity is not a computational process in this sense. Rather there are several different computational problems: most importantly production and comprehension of spoken speech. Plato's problem corresponds quite neatly to the computational process of language acquisition: the LAD in classic terms. The final one, Darwin's problem, is not a computational problem at all: it is rather a non-computational constraint on the other theories.
I agree that the fit is not perfect. I talk about this in part 2. I also agree that the most natural fit is with the computational problems in production and comprehension, and real-time language acquisition for the second. Indeed, the quote from Chomsky says as much wrt the creativity problem. Gs are intended as partial descriptions of the relevant computational problem, partial because they need further supplementation. Ditto Plato's problem and FL/UG. This will be part of a solution.
Concerning Darwin, I agree here too. We can turn it into a computational problem, and maybe an information processing one, but it concerns how to map one genome to another given the conditions of the time and the genome of our ancestors. There is a real-time evolutionary problem here, and I don't see why it cannot be treated on a par with Marr's others, albeit it is not a CN problem per se.
So, no disagreement on my end. Maybe part 2 will clarify the take I am proposing. The bottom line is that Marr can be made to fit, but it is not a perfect fit.
Evolution as a computational problem has in fact been suggested; to my knowledge, most (computationally) rigorously by Les Valiant as a formal learnability problem. See this paper in JACM (http://people.seas.harvard.edu/~valiant/evolvability-2008.pdf). Les wrote a popular science book on this and the problem of learning in general. There are both positive and negative results.