Darwin's Problem

Humans are uniquely linguistically facile. This raises an interesting evolutionary question, an abstract version of which Minimalists have taken very much to heart: How did this linguistic capacity arise in the species? Following Cedric Boeckx, let’s dub this “Darwin’s Problem.” Answers to this problem have two separable parts: (i) an account of how that which made language a cognitive option became mentally available, (ii) an account of how the available option became fixed in the species.

Most give a “miracle theory” account of (i). What I mean is that it is regularly assumed that some kind of adventitious genetic change/mutation occurred that, when added to the cognitive apparatus already there, combined with it to allow for the emergence of a mental faculty with the key features of FL. The “miracle” is meant to mark the observation that this change “just happened”; it’s a brute fact. Minimalists try to (abstractly) characterize the nature of this change (what was added), but there is no attempt to explain why the change occurred. It just did. What’s up for grabs is the nature of the change (adding Merge being the currently favored candidate, though there have been other proposals, one by yours truly) and the number of these. Given the logic of the case (short time span etc.), one miracle is acceptable, two maybe barely tolerable, three fuhgeddaboudit! At any rate, a miracle occurred sometime in the last (roughly) 100,000 years in at least one member of the species. This brings us to (ii).

Once the miracle occurs, it must be fixed in the population, presumably by giving its bearers some selective advantage (this is the Darwin part). With respect to language there are basically two possible sources for this advantage, which correspond to two classical views about the utility of language: language as a vehicle for communication and language as a vehicle for thought. Pinker and Bloom are perhaps the most famous advocates of the first conception. Chomsky is a well-known advocate of the second.

There are two main problems with the communication view.

First, it requires double the number of miracles. This follows from two observations: it takes at least two to communicate, and mutations (the required miracle) originate in individuals and spread to populations via the reproductive success of the favored individuals. Thus, as improbable as it is for Merge, say, to pop into an individual ape mind once, the chances of it doing so twice, in two different individuals (assuming that communication is between at least two individuals) who are also proximate (if they are not near each other, then the capacity to communicate won’t be realized), are far lower still. Indeed, if the events are independent, then it’s the square of the probability of the unique event.
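To make the squaring explicit, here is a minimal sketch in Python. The function name `joint_probability` and the sample value for a single miracle are illustrative assumptions, not estimates of anything; the only substantive assumption is the independence mentioned above.

```python
def joint_probability(p_single: float, carriers: int = 2) -> float:
    """Probability that `carriers` individuals each independently
    undergo the same rare event, given it has probability `p_single`
    for any one individual. Independence is assumed."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability in [0, 1]")
    if carriers < 1:
        raise ValueError("carriers must be at least 1")
    return p_single ** carriers


# Purely illustrative value: a one-in-a-million event for one individual.
p_one_miracle = 1e-6
p_two_miracles = joint_probability(p_one_miracle, carriers=2)

print(f"one miracle:  {p_one_miracle:.1e}")
print(f"two miracles: {p_two_miracles:.1e}")
```

Whatever the real value of the single-event probability, the point survives: for any probability well below 1, requiring two independent occurrences makes the scenario dramatically less likely, and the proximity requirement only lowers it further.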

Second, we need a story about why the particular form of communication, a communication system based on Merge-like grammars, is so much more advantageous than a simpler system would be. Here’s what I mean. Consider a simple linear N-V-(N) grammar with a vocabulary of 500 verbs and 1,000 nouns. This can support roughly 500,000,000 different messages (1,000 × 500 × 1,000 for the N-V-N frames alone). That’s a good number of messages, all without hierarchical recursion. We know that animal communication doesn’t require recursion. The evolutionary question then is: what communicative benefit does the miracle promote that would be particularly advantageous?
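The count is easy to check. The following short Python sketch does the arithmetic for the flat N-V-(N) grammar; the function name `count_messages` is just a label chosen for illustration, and the sketch assumes every noun can combine with every verb in either slot.

```python
def count_messages(num_verbs: int, num_nouns: int) -> int:
    """Count the distinct strings of a flat N-V-(N) grammar.

    Assumes any noun may fill either noun slot and any verb the verb
    slot, with the second noun optional. No hierarchy, no recursion.
    """
    intransitive = num_nouns * num_verbs            # N-V
    transitive = num_nouns * num_verbs * num_nouns  # N-V-N
    return intransitive + transitive


total = count_messages(num_verbs=500, num_nouns=1000)
print(f"{total:,}")  # 500,500,000
```

The N-V-N frames contribute 500,000,000 messages and the N-V frames another 500,000, which is where "roughly 500,000,000" comes from.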

Considerations like these led many to conclude that the main selective advantage of language was its enrichment of thought rather than its communicative efficacy. Here’s Francois Jacob’s take:

…the role of language as a communication system between individuals would have come about secondarily…Its primary function would rather have been, as with earlier evolutionary steps in mammals, the representation of a finer and “richer” reality, a way of handling more efficiently a greater amount of information. As exemplified throughout the whole animal kingdom, communication can be easily established between individual organisms. Even among hominids which had to hunt and live in community, most of the information to be shared with others and concerning immediate features of life could be handled by means of rather simple codes. In contrast, to translate a visual and auditory world so that objects and events can be precisely labeled and recognized weeks or years later requires a much more elaborate coding system. The quality of language that makes it unique does not seem to be so much its role in communicating directives for action as its role in symbolizing, in invoking cognitive images. We mold our “reality” with our words and our sentences in the same way as we mold it with our vision and our hearing. And the versatility of human language also makes it a unique tool for the development of imagination. It allows infinite combinations of symbols and, therefore, mental creation of possible worlds (58).[1]

Thus, the proposal is that grammatical structures enhance the class of entertainable and easily retrievable thoughts. They allow for the imagination of alternatives, thereby, one might suppose, enhancing planning and action (as well as making dawdling that much more enjoyable!). At any rate, were this so, then it is not hard to imagine how a miracle that enabled this would immediately endow its individual bearer with the kinds of advantages that natural selection cares about and how, therefore, this miracle could go forth and multiply via its bearers going forth and multiplying.

Before going on, we should appreciate that all of this is speculative. As Lewontin has made clear, there is a big, perhaps ultimately insurmountable, step between this and a serious scientifically grounded selective explanation. As he demonstrates in detail (cf. his "The Evolution of Cognition" in volume 4 of An Invitation to Cognitive Science), it’s extremely hard to move beyond just-so stories and provide empirically justified evolutionary accounts of cognitive capacities.

This said, there are tantalizing hints, and I want to point to one. I have just reread some fascinating work (from 1999) by Hermer-Vazquez, Spelke and Katsnelson (H-VSK) that bears on these questions. They provide evidence for the kind of scenario that Jacob describes above. Here’s what they found (from the abstract):

Under many circumstances, children and rats reorient themselves through a process which operates only on information about the shape of the environment... In contrast, human adults relocate themselves more flexibly by conjoining geometric and non-geometric information to specify their position. The present experiments used a dual-task method to investigate the processes that underlie the flexible conjunction of information…Together the experiments suggest that humans’ flexible spatial memory depends on the ability to combine diverse information sources rapidly into unitary representations and that this ability, in turn, depends on natural language.

The experiments all involve disorienting children and adults in a rectangular room. The task is to find something in a prescribed corner. Sometimes the indicated corner abuts a wall with a certain color, thereby distinguishing it from the geometrically analogous opposite corner. Adults are able to exploit the additional color information to locate themselves and thus to identify the right corner (i.e. color serves to disambiguate the geometrical information). Prelinguistic kids cannot. Nor can rats. More interesting still, H-VSK found a way of stopping adults from using the color information by having them engage in a language task while reorienting themselves. Presto, the adults start acting like kids and rats. Importantly, engaging in additional non-linguistic tasks during reorientation does not stop successful identification of the correct corner. This strongly implicates language use in facilitating spatial orientation.

I hope I have piqued your interest. The experiments are a delight to read (so do so) and the implications for Jacob’s (and Chomsky’s) evolutionary scenario very suggestive. Here we have a case where linguistic facility directly enhances something as basic as spatial orientation, a capacity that it does not take much imagination to suppose would be useful to our hunter-gatherer ancestors and would endow selective advantage in a wide range of plausibly relevant environments.

How exactly does language help?  H-VSK speculate that language constitutes a kind of interlingua allowing diverse information from separately encapsulated cognitive modules to combine into single thoughts.  The capacity to so combine diverse concepts allows for more complex thoughts and thereby allows, in Jacob’s words, for “the representation of a finer and “richer” reality.” In sum, were the Jacob-Chomsky speculation on the right track we might expect to find cognitive enhancement for selectionistically valuable traits, and this seems to be what H-VSK have found. Wow!

Need I say that this is still very speculative? However, though a first step, it is very interesting and fits well with certain other assumptions out there that minimalists are sure to find congenial.

First, standard Minimalist theory proposes a strong asymmetry between the two interfaces.  Rather than syntax being a pairing of sound (AP) and meaning (CI) (the standard view since Aristotle), it is more accurately thought of as a relation between structure and meaning with sound as an add-on (Chomsky has strongly pushed this line of late).  The derivation from lexicon to CI is clean and well designed (e.g. it meets Inclusiveness, Extension and Full Interpretation). The mapping to sound is considerably messier (e.g. does not conform to Inclusiveness). This fits well with the Jacob-Chomsky conception which presumes that the real biological action starts with the generation of complex thoughts that grammar makes available, not spoken outputs, which are a later accretion. 

Second, Generative Syntax endorses the autonomy of syntax thesis (AOS). Though AOS has often been misunderstood to assert that there is no relation between the grammar and meaning, it actually means that the primitives and operations of the grammar are independent of the contents of what they are used to express. In particular, syntactic categories, principles and operations do not reduce to semantic ones. Many have taken this to be a serious defect. However, in the context of H-VSK’s results it looks like a great design feature. Precisely because the syntax is autonomous it is able to combine information from different encapsulated modules. In other words, autonomy is just the flip side of not being modularly restricted. The intramodular primitives and operations cannot do this, which is what makes it impossible for rats, young kids and linguistically distracted adults to combine different kinds of information (i.e. predicates from different modules). From the present perspective, a more revealing term for the autonomy of syntax might be the inter-modularity of syntax, autonomy being precisely the property we want in a tool required to combine diverse types of thoughts and concepts, ones otherwise confined to specialized cognitively encapsulated modules.

Last, consider hierarchy.  The kind of combination H-VSK’s tasks require is one that allows for diverse kinds of information to work together to produce finer and finer descriptions. In other words, we want the capacity to modify, viz. stack adverbs, specify events, combine nouns and adjectives, use sentences to cut down possibilities (e.g. as  relativization does) etc.  This is the conceptual value added that syntactic hierarchy provides, and it does so in spades. 

In sum, the Jacob-Chomsky “conjecture” when combined with a generative syntax with a minimalist flavor has a suggestive tang: it links what is special (viz. recursive hierarchy) with what is plausibly beneficial (viz. the capacity to entertain new and useful thoughts).

One of the novelties of the Minimalist Program has been the elevation of Darwin’s Problem to prominence alongside Plato’s. Interestingly, it appears that the empirical just-so stories of yore might finally graduate to empirically so-so stories and, one day, maybe even to thus-so stories. Wouldn’t that be nice? Can’t blame a person for dreaming. In the meantime, take a look at H-VSK. It’s a great paper.

[1] From The Possible and the Actual (1982). University of Washington Press; Seattle.


  1. Wow! What a read! Have you published anything related to this topic?

    BTW in a footnote, R. Jackendoff (2011) writes:

    "Hornstein and Boeckx 2009 (89) make such an argument: ‘…it is reasonable to suppose that an operation like Merge, one that “puts two elements together” (by joining them or concatenating them or comprehending them in a common set), is not an operation unique to FL. It is a general cognitive operation, which when applied to linguistic objects, we dub “Merge.”’ In other words, they acknowledge that recursion may be found elsewhere in cognition, but they do not call it Merge. Thus they reduce the notion that Merge is unique to language from an empirical hypothesis to a tautology." ;-)

    Could I have a copy of Hornstein and Boeckx 2009 from you?

    1. Jackendoff misread our proposal. I think that we were assuming that Merge could be broken down into two parts, Combine and Label. Combine generated endlessly long "beads on a string" structures. Labels were needed for recursive hierarchy. I discuss why in A Theory of Syntax, but effectively, without labeling, the outputs of Combine are not eligible for recombination, as we assume that Combine is restricted to lexical elements. What labeling does is close Combine in the domain of the lexical items, essentially by creating equivalence classes of items centered on LIs. Let me check to see if I have a copy of the paper on file. If I do, I'll send it.

  2. It seems to me that Tomasello's idea that an important step difference between human and ape cognition is that we have a much more powerful 'top down' view of the roles in cooperative tasks (and so can swap roles better) might fit into this story pretty well.

  3. I am not going to comment on the implausibility of the 'miracle account' but would just like to make you aware of a flaw in the logic of your argument [assuming the miracle account].

    You claim:
    First, it [language having evolved 'for' communication] requires double the number of miracles. Note, this follows from two observations: that it takes at least two to communicate and mutations (the required miracle) originate in individuals and spread to populations via the reproductive success of the favored individuals. Thus, as improbable as it is for Merge, say, to pop into an individual ape mind once, the chances of it doing so twice in two different (assuming that communication is between at least two individuals) proximate (if not near each other then the capacity to communicate won’t be realized) individuals are far lower still. Indeed if the events are independent, then it’s the square of the probability of the unique event.

    There are two problems with your claim that communication would double the need for miracle occurrence.

    1. One possible source of mutation is exposure to [cosmic] radiation. [you may recall Chomsky's cosmic ray speculation]. In this case it would be quite likely that more than one individual of a 'breeding group' got exposed to the radiation. Hence in this scenario we need no doubling or squaring of the odds.

    2. Assuming the mutation really occurred in just one individual, as you say yourself, at this point there could not have been any 'selection for' because no advantage had occurred. The mutation needs to be passed on to the next generation. This means it is not only possible but almost necessary that at least two [preferably more] of the children of the 'linguistic Prometheus' have the mutation. So hypothetically in generation 2 we already have the required minimum number of people who could communicate with one another. Again no doubling/squaring of odds is needed because everyone of interest is a descendant of the individual who had the original mutation.

    I am not suggesting this establishes communication was providing the selective advantage needed to fix the mutation in the population. I am only saying that your argument does not establish that communication would have required doubling the odds of a highly unlikely event.

    1. I think 2 is a better explanation than 1.

      With 1 it may be more likely to get two different mutations, but still unlikely to get two identical mutations, so the odds will still be multiplied. Or more precisely, if we decompose it as
      p(miracle) = p(miracle | mutation) * p(mutation)
      then even if p(mutation) is very high (say, 1), the probability of two miracles will be
      p(two miracles) <= p(miracle | mutation)^2
      assuming that which mutation happens is independent across individuals.

    2. That's right. Yet it may be even worse for the two-miracle case: the higher the number of apes, the greater the chance for the miracle to happen. However, when estimating the probability of the second miracle for a numerous species, we should rather consider a relatively high number of breeding groups (however fuzzy the boundaries of their territories may be). If N is the total number and K is the average number in a group, then
      P(two miracles) ~ K/N*P(miracle)^2.
      For big N, P(miracle) may not be quite negligible, but P(two miracles in contact) still remains improbable and, moreover, even less probable than two separated miracles.

  4. I agree, Christina, that it's just a speculation so far. Anyway, a beautiful one.

    As to your argument, I guess that massive radiation would probably destroy the breeding group. For a weak one, the squared probability looks like a good guess. But there is no use arguing on without a quantitative model.

  5. In this case it would be quite likely that more than one individual of a 'breeding group' got exposed to the radiation. Hence in this scenario we need no doubling or squaring of the odds.

    Surely you'd still be dealing with a squared probability: P(mutation|radiation)² × P(radiation) (where P(radiation) is the probability of the breeding group being exposed to radiation). Also, you seem to be ignoring the fact that the radiation is not likely to lead to the same mutation in each individual. Increasing the overall probability of mutations occurring would not greatly increase the probability of two individuals undergoing the same mutation at the same time.

  6. The two Alexs have adequately answered your first point, so I will refrain. As for the second, I don't think I understand it. Say an individual is endowed with the miracle. This gives him/her a leg up conceptually. If we buy H-VSK's results, it seems to endow advantages wrt spatial orientation, for example. So, this individual, let's call her EVE, should have a reproductive leg up as well. If this mutation is dominant, then it will spread in her kids and these kids will have a leg up in the next generation. Assuming a small breeding group, the standard assumption since (at least) Mayr, the mutation should spread rather quickly throughout the breeding group even if it all started in one member of the breeding group.

    Note, I have not said how this mutation arose. You suggest cosmic rays. I have no idea. But, all that's needed for the scenario to get off the ground is ONE mutation in one individual that allows that person to start combining predicates in thought across modules. It does not take two.

    Last point: nothing proposed here denies that communication might also play a role, maybe later on. It's just not required at the get-go, and this seems like a good thing. At least Jacob thought so and, last I checked, he was a none-too-shabby biologist.

  7. Christina's point is valid though -- your argument was

    "There are two main problems with the communication view.

    First, it requires double the number of miracles. Note, this follows from two observations: that it takes at least two to communicate".

    If Eve has the gene, then half of her children will have the gene, and that will then give an advantage to her and her children. So you don't need two miracles to explain why it would spread even if it confers no advantage on an isolated individual.

    1. The story goes on, though:

      "Second, we need a story about why the particular form of communication ... system based on Merge like grammars is so much more advantageous than a simpler system would be."

  8. I have been told that most mutations do not stick. What is required for them to stick is that they confer an advantage on the individual that bears them, otherwise they disappear. If this is true, that most mutations are washed out pretty quickly, then we need a "miracle" that confers selective advantage pretty quickly or the "miracle" will be washed out. For EVE to have a leg up on others, the miracle must confer selective advantage. One advantage is whatever is gained by the capacity to communicate with others using the advantage endowed by the miracle. This, however, takes (at least) two with the same miracle. Two, we all know, is likely an underestimate. The benefits of communication may well be pretty sparse if the communicative network is small. Real advantage accrues as the network of communicants gets bigger and bigger (as Sony's Betamax found out to its consternation and as first Microsoft and then Google know all too well). So, we want something that endows quick advantage, one that even a single individual would benefit from. If such occurred then these genes would spread relatively rapidly in a small group (meaning in dozens of generations). The H-VSK proposal identifies one such in a place where it is not too hard to imagine what the goodies accrued would be. The communication scenario is harder to explicate, in my view. I know because I tried to explicate it long ago in a galaxy far away (cf. Brandon, R. and N. Hornstein. From Icon to Symbol: Some Speculations on the Evolution of Natural Language (1986), Philosophy & Biology. Vol. 1.2 pp.169-189). The scenarios envisaged there do not require a lot of fancy grammar. So when one tries to spell out the advantages of rich communication the immediate advantages of the envisaged scenarios begin to fade a little.

    This said, I have nothing against a combined account, which I suspect is coherent. The miracle happens, endowing advantage to the individual. A communicative add-on adds yet more value. In pretty short time, the combination arises and feeds on itself. The question both Jacob and Chomsky addressed is what the first steps are, and they, reasonably I believe, pointed to conceptual advantages independent of communicative enhancement. Given this, another app to allow for communication looks pretty reasonable. However, we should be careful, for many cognitive enhancements do not come with this add-on, or not apparently so. At any rate, it is not inconceivable that it confers additional value (building on the first miracle) in the right circumstances.

  9. Most mutations don't stick, but neutral mutations can be fixed by chance in a small breeding population; that isn't a miracle.
    The probability of fixation is 1/(2N), where N is the size of the breeding population, so if it is a small group of 50 hominids then we have a 1/100 chance that it gets fixed by chance.

    Even if it isn't fixed it could take 5 or 50 generations to wash out of the population, and that seems plenty of time for a communication system to be developed to the point at which it becomes adaptive.

    But I am not a population geneticist at all, so this might contain some egregious errors ...

    1. Yes, it could. However, it could equally well have taken many thousands of years. And it’s not only about the lack of need for a more advanced communicative system.

      If the group members had communicated via gestures and/or a "protospeech", they would have possessed an interface between the conceptual and phonological systems at most. Then a couple of them got Merge. But what about the interface between syntax and the semantic system and that between syntax and the phonological system?

    2. I don't really follow, but I should say that I don't find the starting assumptions of Norbert's account particularly plausible. The one gene/mutation for language thing seems simplistic and my money is on the alternative more standard story of the gradual development of general purpose learning and processing mechanisms that eventually reach some critical mass.

      I was just trying to point out a flaw in Norbert's argument in favour of the internal monologue theory.

    3. I don't know much about mutations. But my impression was that in the potentially analogous cases, some mutation leads to prolonged embryology at some relatively late stage of development, with various cascading effects that are not predictable given current knowledge. (Cp. breeding for a behavioral trait in foxes and getting curly tails.) What's the more standard gradualist general purpose learning story? How does a population of animals gradually come to exhibit I-languages that generate unboundedly many expressions (i.e., articulation-meaning pairs) subject to specific constraints? Prima facie, the difference between humans and other animals involves at least one--and so, I hope, exactly one--saltation, whether we like it or not.

    4. I think the standard story -- translated into Chomskyan terms -- is that the ability to generate hierarchically structured expressions is a domain general ability (used also in say planning) that evolved gradually as most things do; and later a general purpose learning ability arose that was sufficiently powerful to acquire languages, and once these two components were in place language was possible.

      I am not sure what you mean by 'saltation' but my impression was that most biologists reject the idea that saltations are part of evolution. If you have to appeal to a special sort of evolution to account for the origin of your theory of language, then that is a problem for your theory of language.

      So as to how this happens I guess something along the lines of Hurford, Kirby, Komarova and Nowak, and many other recent papers like:

      Baronchelli, A., Chater, N., Pastor-Satorras, R., & Christiansen, M. H. (2012). The biological origin of linguistic diversity. PLoS ONE, 7(10). PMID: 23118922

      Chater, N., Reali, F., & Christiansen, M. H. (2009). Restrictions on biological adaptation in language evolution. Proceedings of the National Academy of Sciences, 106(4), 1015-1020.

    5. But the question is how a capacity to acquire the kinds of grammars that *kids actually acquire* could have evolved from a capacity to acquire grammars of some other kind. From my perspective, the "standard" story sketches an account of how languages *of some sort* could have emerged, leaving us with the questions we already had: how come kids acquire languages that don't merely generate hierarchically structured expressions, but do so in accord with structure-dependent rules; and more generally, why the constraints of Universal Grammar we ended up with?

      By 'saltation', I just meant a leap, hopefully explained in the way that I thought many evo-devo biologists went in for these days: some mutation leads to prolonged embryology at some relatively late stage of development, with various cascading effects that seem (and are) quite dramatic in terms of observed phenotype. I don't think anyone in this debate is positing a special *sort* of evolution, or even denying that "gradualism" is true at some level of analysis. But I also assume that nobody is still hanging on to the idea that evolution has to work as a filter on small variations on phenotypes.

    6. Very fair points; on the first one,
      once you integrate phrase structure and movement, structure dependent rules just look like hierarchically structured expressions but with discontinuous constituents (This is the Stabler insight). So if you have a story about how you can learn hierarchically structured expressions, then that turns pretty smoothly into a story about how you can learn structurally sensitive movement.
      (e.g. Yoshinaka's extension of my CFG learning to MCFGs)

      I agree that there is a lot left to explain, but miracles aren't the right type of explanation.

      The 'why' problems are particularly tricky; but once we understand the learning procedures better that may give us some answers. At the moment, I don't think we know what UG is, so it is hard to explain why it is the way that it is.

  10. I think another problem for the miracle account is that whatever pressures could push a species towards evolving language for communication (if they exist!) will be inherently localized, to the point where many individuals, often related, will develop in parallel due to identical selective pressures. This makes the gradual, communication-oriented story much more plausible because it doesn't require a miracle at all.

    1. Note that Chomsky's view (as I understand it) is that language initially interfaced with thoughts. This would give an individual an advantage over others because he/she would be way more intelligent, for the simple reason that he/she could organize his/her thoughts and make complex plans. So in the initial stages you don't have to have communication (i.e. externalization) at all. Also suppose a male mutated. This would give him a lot of advantage as he could end up mating with multiple females and thereby passing the mutation to the next (smarter) generation. It would then take several generations for communication to materialize, once there are enough individuals with the same mutation.

  11. I have re-read this very informative post as a result of reading Norbert's post today (Dec 10, 2013) on Stanley Fish's opinion piece for the NYT. I wanted to double-check what Chomsky's position on thought and language really is, as Fish's description of Chomsky's position as "language is thought rather than an addition to or clothing of thought" seemed at least ill-phrased if not incorrect. Anyway, as a result of re-reading this post I noticed something that had not occurred to me earlier.

    So, H-VSK found that prelinguistic kids or adults engaged in a language task begin to act like rats (and unlike ‘normal’ adults). My question is whether this is good or bad news for the ‘language as a vehicle for thought’ account of the evolution of language. If I understand correctly, this account links the evolutionarily advantageous advancement of thought to the emergence of Merge in a single human brain (rather than to a fully-fledged I-language). Using Norbert’s jargon, EVE, who is endowed with the miracle of Merge, can now combine different thoughts and entertain more complex thoughts. Presumably EVE still does not have much of an (I-)language yet. If so, should we not expect that any human who inherited EVE’s mutation, be it a kid or an adult engaged in a secondary language task, ought to be able to combine diverse information at all times, or in the context of H-VSK, solve complex spatial tasks at all times?

    If I-language is required for solving complex spatial tasks, the claim becomes rather Whorfian in flavour. Any opinions?