In a recent book (here),
Chomsky wants to run an argument to explain
why Merge, the Basic Operation, is so simple. Note the ‘explain’ here. And
note how ambitious the aim is. It goes beyond explaining the “Basic Property” of
language (i.e. that natural language Gs (NLG) generate an unbounded number of
hierarchically structured objects that are both articulable and meaningful) by
postulating the existence of an operation like Merge. It goes beyond explaining
why NLGs contain both structure building and displacement operations and why
displacement is necessarily to c-commanding positions and why reconstruction is
an option and why rules are structure dependent. These latter properties are
explained by postulating that NLGs must contain a Merge operation and arguing
that the simplest possible Merge
operation will necessarily have these properties. Thus, the best Merge
operation will have a bunch of very nice properties.
This latter argument is interesting enough. But in the book
Chomsky goes further and aims to explain “[w]hy language should be optimally
designed…” (25). Or to put this in Merge terms, why should the simplest possible Merge operation be the one that we
find in NLGs? And the answer Chomsky is looking for is metaphysical, not epistemological.
What’s the difference? It’s roughly this: even granted that
Chomsky’s version of Merge is the
simplest and granted that on methodological grounds simple explanations trump
more complex ones, the question remains: given
all of this, why should the conceptually simplest operation be the one that
we in fact have? Why should methodological superiority imply
truth in this case? That’s the question Chomsky is asking and,
IMO, it is a real doozy and so worth considering in some detail.
Before starting, a word about the epistemological argument.
We all agree that simpler accounts trump more complex ones. Thus if some account A involves fewer
assumptions than some alternative account A’ then if both are equal in their
empirical coverage (btw, none of these ‘if’s ever hold in practice, but were they to hold then…) then we all
agree that A is to be preferred to A’. Why? Well because in an obvious sense
there is more independent evidence in
favor of A than there is for A’ and we all prefer theories whose premises have
the best empirical support. To get a feel for why this is so let’s analogize
hypotheses to stools. Say A is a three-legged stool and A’ a four-legged one. Say that evidence is
weight that these stools support. Given a constant weight, each leg on the A
stool supports more weight than each leg of the A’ stool: a third of the load rather than a quarter, about 8 percentage points more of the total. So each of A’s assumptions is better
empirically supported than each of those made by A’. Given that we prefer
theories whose assumptions are better supported to those that are less well
supported, A wins out.[1]
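To make the arithmetic of the analogy explicit (my back-of-the-envelope numbers, not anything from the book): with a total evidential load $W$ shared evenly across the legs,

$$\frac{W}{3} \approx 0.33\,W \qquad \text{vs.} \qquad \frac{W}{4} = 0.25\,W,$$

so each leg of the three-legged stool carries roughly a third more of the load, i.e. about 8 percentage points more of the total.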
None of this is suspect. However, none of this implies that the simpler theory is the true one. The epistemological
privilege carries metaphysical consequences only if buttressed by the
assumption that empirically better supported accounts are more likely to be
true and, so far as I know, there is actually no obvious story as to why this
should be the case short of asking Descartes’s God to guarantee that our clear
and distinct ideas carry ontological and metaphysical weight. A good and just
God would not deceive us, would she?
Chomsky knows all of this and indeed often argues in the conventional
scientific way from epistemological superiority to truth. So, he often argues
that Merge is the simplest operation that yields unbounded hierarchy with many
other nice properties and so Merge is the true Basic Operation. But this is not what Chomsky is attempting here. He
wants more! Hence the argument is interesting.[2]
Ok, Chomsky’s argument. It is brief and not well fleshed
out, but again it is interesting. Here it is, my emphasis throughout (25).
Why should language be optimally
designed, insofar as the SMT [Strong Minimalist Thesis, NH] holds? This
question leads us to consider the origins of language. The SMT hypothesis fits
well with the very limited evidence we have about the emergence of language,
apparently quite recently and suddenly
in the evolutionary time scale…A fair guess today…is that some slight rewiring
of the brain yielded Merge, naturally in
its simplest form, providing the basis for unbounded and creative thought,
the “great leap forward” revealed in the archeological record, and the
remarkable difference separating modern humans from their predecessors and the
rest of the animal kingdom. Insofar as the surmise is sustainable, we would
have an answer to questions about apparent optimal design of language: that is
what would be expected under the postulated circumstances, with no selectional or other pressures operating, so the emerging
system should just follow laws of nature,
in this case the principles of Minimal
Computation – rather the way a snowflake forms.
So, the argument is that the evolutionary scenario for the
emergence of FL (in particular its recent vintage and sudden emergence) implies
that whatever emerged had to be “simple” and to the degree we have the evo
scenario right then we have an account for why Merge has the properties it has
(i.e. recency and suddenness implicate a simple change).[3]
Note again that this goes beyond any methodological arguments for Merge. It
aims to derive Merge’s simple features from the nature of selection and the
particulars of the evolution of language. Here Darwin’s Problem plays a very
big role.
So how good is the argument? Let me unpack it a bit more
(and here I will be putting words into Chomsky’s mouth, always a fraught
endeavor (think lions and tamers)). The argument appears to make a four-way
identification: conceptual simplicity = computational simplicity = physical
simplicity = biological simplicity. Let me elaborate.
The argument is that Merge in its “simplest form” is an
operation that combines expressions into sets
of those expressions. Thus, for any A, B: Merge (A, B) yields {A, B}. Why sets?
Well the argument is that sets are the simplest kinds of complex objects there are. They are simpler than ordered
pairs in that the things combined are not
ordered, just combined. Also, the operation of combining things into sets does
not change the expressions so combined (no tampering). So the operation is
arguably as simple a combination operation as one can imagine. The assumption
is that the rewiring that occurred triggered the emergence of the conceptually simplest
operation. Why?
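Before moving to step two, here is a concrete rendering of what set formation versus ordered combination amounts to; this is a toy sketch of my own, not Chomsky’s formalism, and the function names are just labels:

```python
# Toy illustration (not Chomsky's formalism): Merge as bare set formation.
def merge(a, b):
    """Combine two syntactic objects into an unordered set: no order, no tampering."""
    return frozenset([a, b])

def pair_merge(a, b):
    """A hypothetical alternative that also imposes an order on its inputs."""
    return (a, b)  # carries extra information: which element comes first

print(merge("the", "boy") == merge("boy", "the"))            # True: sets are unordered
print(pair_merge("the", "boy") == pair_merge("boy", "the"))  # False: pairs encode order

# Hierarchy comes from re-merging outputs: {saw, {the, boy}}
vp = merge("saw", merge("the", "boy"))
print(vp)
```

The set-forming version adds nothing beyond combination itself: no order, no labels, and the inputs come out exactly as they went in.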
Step two: say that conceptually simple operations are also computationally
simple. In particular assume that it is computationally less costly to combine
expressions into simple sets than to combine them as ordered elements (e.g.
ordered pairs). If so, the conceptually simpler an operation, the less
computational effort is required to execute it. So, simple concepts imply minimal
computations and physics favors the computationally minimal. Why?
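A standard set-theoretic fact gives the intuition a little content (an illustration of mine, not an argument from the book): an ordered pair can itself be coded as sets, but only at the cost of extra structure,

$$\langle A, B \rangle = \{\{A\}, \{A, B\}\},$$

so whatever mechanism builds ordered pairs has to do at least what bare set formation does, and then some.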
Step three: identify computational with physical simplicity.
This puts some physical oomph into “least effort”; it’s what makes minimal
computation minimal. Now, as it happens, there are physical theories that tie
issues in information theory with physical operations (e.g. erasure of
information plays a central role in explaining why Maxwell’s demon cannot
compute its way to entropy reversal (see here on the
Landauer Limit)).[4]
The argument above seems to be assuming something similar here, something tying
computational simplicity with minimizing some physical magnitude. In other
words, say computationally efficient systems are also physically efficient so
that minimizing computation affords physical advantage (minimizes some physical
variable). The snowflake analogy plays a role here, I suspect, the idea being
that just as snowflakes arrange themselves in a physically “efficient” manner,
simple computations are also more physically efficient in some sense to be
determined.[5]
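For what it is worth, the Landauer result mentioned above is one concrete place where information processing meets a physical magnitude: erasing a single bit of information at temperature $T$ dissipates at least

$$E_{\min} = k_B\, T \ln 2$$

of energy, with $k_B$ Boltzmann’s constant. Whether anything like this bound supplies the relevant sense of “least effort” for Merge is, of course, exactly the open question.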
And physical simplicity has biological implications. Why?
The last step: biological complexity is a function of
natural selection, thus if no selection, no complexity. So, one expects
biological simplicity in the absence of selection,
the simplicity being the direct reflection of simply “follow[ing] the laws of
nature,” which just are the laws of minimal computation, which just reflect
conceptual simplicity.
So, why is Merge simple? Because it had to be! It’s what
physics delivers in biological systems in the absence of selection,
informational simplicity tied to conceptual simplicity and physical efficiency.
And there could be no significant selection pressure because the whole damn
thing happened so recently and suddenly.
How good is this argument? Well, let’s just say that it is
somewhat incomplete, even given the motivating starting points (i.e. the great
leap forward).
Before some caveats, let me make a point about something I
liked. The argument relies on a widely held assumption, namely that complexity
is a product of selection and that this requires long stretches of time. This suggests that if a given property is relatively simple then it was not selected
for but reflects some evolutionary forces other than selection. One aim of the
Minimalist Program (MP), one that I think has been reasonably well established,
is that many of the fundamental features of FL and the Gs it generates are in
fact products of rather simple operations and principles. If this impression is
correct (and given the slippery nature of the notion “simple” it is hard to
make this impression precise) then we should not be looking to selection as the
evolutionary source for these operations and principles.
Furthermore, this conclusion makes independent sense.
Recursion is not a multi-step process, as Dawkins among others has rightly
insisted (see here
for discussion) and so it is the kind of thing that plausibly arose (or could have arisen) from a single
mutation. This means that properties of FL that follow from the Basic Operation
will not themselves be explained as products of selection. This is an important
point for, if correct, it argues that much of what passes for contemporary work
on the evolution of language is misdirected. To the degree that the property is
“simple,” Darwinian selection mechanisms are beside the point. Of course, what
features are simple is an empirical issue, one that lots of ink has been
dedicated to addressing. But the more mid-level features of FL a “simple” FL
explains, the less reason there is for thinking that the fine structure of FL
evolved via natural selection. And this goes completely against current research
in the evo of language. So hooray.
Now for some caveats: First, it is not clear to me what
links conceptual simplicity with computational simplicity. A question: versions
of the propositional calculus based on negation and disjunction or negation and
conjunction are expressively equivalent. Indeed, one can get away with just one primitive Boolean operation, the
Sheffer Stroke (see here).
Is this last system more computationally efficient than one with two primitive
operations, negation and/or conjunction/disjunction? Is one with three
(negation, disjunction and conjunction) worse?
I have no idea. The more primitives we have, the shorter proofs can be.
Does this save computational power? How about sets versus ordered pairs? Is
having both computationally profligate? Is there reason to think that a “small
rewiring” can bring forth a NAND gate but not a negation gate and a conjunction
gate? Is there reason to think that a small rewiring naturally begets a merge
operation that forms sets but not one that would form, say, ordered pairs? I
have no idea, but the step from conceptually simple to computationally more
efficient does not seem to me to be straightforward.
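To make the trade-off concrete, here is a small sketch (my example, not anything from the sources linked above): NAND alone is expressively complete, but expressing the other connectives through it takes more applications of the primitive than having them built in would:

```python
# Toy illustration: one primitive (NAND) suffices, but definitions get longer.
def nand(p, q):
    return not (p and q)

def neg(p):        # 1 NAND
    return nand(p, p)

def conj(p, q):    # 2 NANDs: NOT(p NAND q)
    return nand(nand(p, q), nand(p, q))

def disj(p, q):    # 3 NANDs: (NOT p) NAND (NOT q)
    return nand(nand(p, p), nand(q, q))

# Sanity check over all truth-value assignments.
for p in (True, False):
    for q in (True, False):
        assert neg(p) == (not p)
        assert conj(p, q) == (p and q)
        assert disj(p, q) == (p or q)
```

Fewer primitives, longer expressions; more primitives, shorter ones. Which side of that trade-off counts as “computationally simpler” is exactly what the argument leaves open.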
Second, why think that the simplest biological change did
not build on pre-existing wiring? So, it is not hard to imagine that
non-linguistic animals have something akin to a concatenation operation. Say
they do. Then one might imagine that it is just as “simple” to modify this operation to deliver
unbounded hierarchy as it is to add an entirely different operation which does
so. So even if a set-forming
operation were simpler than concatenation tout
court (which I am not sure is so), it is not clear that it is biologically simpler
to ignore the available operation and introduce an
entirely new one (Merge) than it is to derive hierarchical recursion from a
modified conception of concatenation, given that concatenation already obtains
in the organism. If it isn’t (and how to tell, really?), then the emergence of
Merge is surprising, given that there might be a simpler evolutionary route to
the same functional end (unbounded hierarchical objects via descent with
modification, in this case modification of concatenation).[6]
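A crude way to picture the alternative (purely illustrative; nothing in the evolutionary story specifies how the modification would go): if concatenation originally returned flat strings, a small change that lets an output feed back in as a single unit already yields nested, unboundedly deep objects:

```python
# Illustrative only: flat concatenation vs. a minimally "modified" version.
def concat_flat(a, b):
    """Ancestral operation: outputs are flat sequences of atoms."""
    left = a if isinstance(a, tuple) else (a,)
    right = b if isinstance(b, tuple) else (b,)
    return left + right              # no hierarchy, just a longer string

def concat_modified(a, b):
    """Same combination, except prior outputs are kept as single units."""
    return (a, b)                    # nesting, hence hierarchy

print(concat_flat(concat_flat("the", "boy"), "ran"))          # ('the', 'boy', 'ran')
print(concat_modified(concat_modified("the", "boy"), "ran"))  # (('the', 'boy'), 'ran')
```

Whether such a tweak counts as biologically “smaller” than introducing set-Merge from scratch is, again, anyone’s guess.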
Third, the relation between complexity of computation and
physical simplicity is not crystal clear for the case at hand. What physical
magnitude is being minimized when computations are more efficient? There is a
branch of complexity theory where real physical magnitudes (time, space) are
considered, but this is not the kind
of consideration that Chomsky has generally thought relevant. Thus, there is a
gap that needs more than rhetorical filling: what links the computational
intuitions with physical magnitudes?
Fourth, how good are the motivating assumptions provided by
the great leap forward? The argument is built by assuming that Merge is what
gets the great leap forward leaping. In other words, the cultural artifacts
are a proxy for the time when the “slight rewiring” that afforded Merge, and so
FL and NLGs, took place. Thus the recent, sudden dating of the great leap
forward is the main evidence for dating the slight change. But why assume that
the proximate cause of the leap is a rewiring relevant to Merge, rather than,
say, the rewiring that licenses externalization of the Mergish thoughts so that
they can be communicated?
Let me put this another way. I have no problem believing
that the small rewiring can stand independent of externalization and be of
biological benefit. But even if one believes this, it may be that large scale
cultural artifacts are the product of not just the rewiring but the capacity to
culturally “evolve,” and models of cultural evolution generally have
communicative language as the necessary medium for cultural evolution. So, the
great leap forward might be less a proxy for Merge than it is of whatever
allowed for the externalization of FL formed thoughts. If this is so, then it
is not clear that the sudden emergence of cultural artifacts shows that Merge
is relatively recent. It shows, rather, that whatever drove rapid cultural change is relatively recent, and this
might not be Merge per se but the
processes that allowed for the externalization of Merge-generated structures.
So how good is the whole argument? Well let’s say that I am
not that convinced. However, I admire it, for it tries to do something really
interesting. It tries to explain why Merge is simple in a perfectly natural
sense of the word. So let me end with
this.
Chomsky has made a decent case that Merge is simple: it involves
no tampering, it is a very simple “conjoining” operation resulting in hierarchical
sets of unbounded size, and it has other nice properties (e.g. displacement,
structure dependence). I think that Chomsky’s case for such a Merge operation
is pretty nice (not perfect, but not at all bad). What I am far less sure of is
that it is possible to take the next step fruitfully: explain why Merge has these properties and not
others. This is the aim of Chomsky’s
very ambitious argument here. Does it work? I don’t see it (yet). Is it
interesting? Yup! Vintage Chomsky.
[1]
All of this can be given a Bayesian justification as well (which is what lies
behind derivations of the subset principle in Bayes accounts) but I like my
little analogy so I leave it to the sophisticates to court the stately
Reverend.
[2]
Before proceeding it is worth noting that Chomsky’s argument is not just a
matter of axiom counting as in the simple analogy above. It involves more
recondite conceptions of the “simplicity” of one’s assumptions. Thus even if
the number of assumptions is the same it can still be that some assumptions are
simpler than others (e.g. the assumption that a relation is linear is “simpler”
than that a relation is quadratic). Making these arguments precise is not
trivial. I will return to them below.
[3]
So does the fact that FL has been basically stable in the species ever since it
emerged (or at least since humans separated). Note, the fact that FL did not continue to evolve after the trek out of
Africa also suggests that the “simple” change delivered more or less all of what we think of as FL today. So,
it’s not like FLs differ wrt Binding Principles or Control theory but are
similar as regards displacement and movement locality. FL comes as a bundle and
this bundle is available to any kid
learning any language.
[4]
Let me fess up: this is WAY beyond my understanding.
[5]
The growth of snowflakes (or of any substance changing
from a liquid to a solid state) is known as crystallization. During this
process, the molecules (in this case, water molecules) align themselves to maximize attractive forces and minimize
repulsive ones. As a result, the water molecules arrange themselves in predetermined
spaces and in a specific arrangement. This process is much like tiling a floor
in accordance with a specific pattern: once the pattern is chosen and the first
tiles are placed, then all the other tiles must go in predetermined spaces in
order to maintain the pattern of symmetry. Water molecules simply arrange
themselves to fit the spaces and maintain symmetry; in this way, the different
arms of the snowflake are formed.