Tuesday, March 6, 2018

Evo Lang: a comment on logical distance

In a previous post, I outlined the three kinds of gaps that fuel PoS arguments. I also noted that one of the gaps (the one that focuses on utterances as imperfect exemplars of sentences in that they are subject to the exigencies that afflict (enhance?) all performances) was less serious than the other two in that it can likely be bridged using generic statistical techniques that can be applied to “raw” data to regiment it, scrub it, and normalize it. Despite its relative marginality for PoS claims, this kind of data degeneracy is often the focus of a lot of empirical investigation claiming to refute the supposition that the data is particularly poor, misleading, or incomplete. Motherese, it is often retorted, is actually mostly well-formed, smoothly uttered, and even intonationally helpful. This retort is (sometimes tacitly, sometimes not) used to support the claim that there is no real PoS problem: the data is quite clean, so the degeneracy problem is not serious, and given that the data is good and clean, wholesome even, there is no reason to doubt that it is sufficient to guide an LAD to its G. What the previous post argued is that even if this is correct (which, frankly, I doubt, but let’s be concessive[1]), it is completely irrelevant, for there are two other gaps that have little to do with the quality of the data, and this is where the real power of PoS arguments lies.

That’s what the earlier post argued at length, so why am I regurgitating the points here? Not to reiterate the main claim (though repetition is the soul of insight (or at least of belief fixation)), but to observe that, in a curious sense, providing a thorough catalogue of problems for E(mpiricist) approaches to G acquisition serves to obscure the most important part of the argument. How so? It provides critics of the PoS a weak supposition on which to concentrate their fire, which, if even partially successful (again, I am skeptical, but…), serves to move the argumentative focus away from the strong arguments (having to do with data deficiency, not degeneracy) and to allow for a very premature declaration that there is no PoS problem at all. So, being thorough and exhaustive drops bread crumbs that Eish Hansels and Gretels eagerly gather, thereby allowing them to get lost in a forest of irrelevancies.[2] And the reason I mention this is that the same kind of poor argumentative behavior infects many Evo Lang discussions, which is what I want to concentrate on today by discussing a particularly obtuse piece that appeared in Aeon, penned by, you guessed it, Dan Everett, entitled “Did Homo Erectus Speak?” (henceforth DHES (here)).[3]

The goal of DHES is to argue for the Continuity Thesis. This is roughly the idea that current human linguistic facility is qualitatively identical to what our ancestors (and indeed other animals) have. So what we have is just like what they have but more so. Here is DHES (p.2):

…the ‘leap’ to language was little more than a long series of baby steps, requiring no mutations, nor any complex grammar. In fact, the language of erectus would have been every bit as much a ‘real language’ as any modern language.

The main argument for this conclusion is that Homo Erectus (Erectus (E) to friends and family) already spoke a language largely like ours, and thus that our linguistic capacity has been gradually evolving for millions of years. Here is DHES (p. 2):

To discover the answers to these questions, we need to travel back in time at least 1.9 million years ago to the birth of Homo erectus, as they emerged from the ancient process of primate evolution. Erectus had nearly double the brain size of any previous hominim, walked habitually upright, were superb hunters, travelled the world, and sailed to ocean islands. And somewhere along the way they got language. Yes, erectus. Not Neanderthals. Not sapiens. And if erectus invented language, this means that Neanderthals, born more than a million years later, entered a world already linguistic.
Likewise, our species would have emerged into a world that already had language…

And, consequently, there is little reason to think that there is anything linguistically special about us. Our capacities are identical to E’s, give or take a little (very little). That’s the claim DHES advances, based on the “fact” (which DHES concedes is not widely accepted in the paleoanthropology community (p. 2))[4] that E’s artifacts (“their settlements, their art, their symbols, their sailing ability, and their tools”) all point to something like “an animal that can communicate via symbols.” Or, as DHES puts it: “a linguistic animal” (p. 6).

So, if E was a symbol manipulator (witness the artifacts) and was able to “transfer information by symbols” (p. 5), then E was linguistic, i.e. graced with the same FL as us, and there is no reason to assume that

…humans possess special cognitive abilities absent from the brains of all other creatures or whether, more simply, humans have language because they are smarter than other creatures (whether through higher densities of neurons, or other advantages of brain organisation). (p.6)

So what distinguishes us from other homos (and even from other animals, for the logic deployed leads here) is a bigger brain, one qualitatively the same as that of our ancestors but bigger, and bigger gives you “language.”

Before examining the argument, note why DHES emphasizes and argues for E having language and for the long time period separating E from us. The secret sauce of gradualist explanations is long expanses of time in which little changes can add up (see here for the definitive account). This is why DHES emphasizes the millions-of-years theme. The supposition seems to be that the only problem with a gradualist account of the evolution of the human FL is that FL arose relatively recently (roughly 100kya), leaving too little time for natural selection to work its magic. So, the thinking seems to go, if DHES can show that this supposition is incorrect, then the continuity thesis does apply to human FL, and the fact that humans speak as we do (in particular, have the kinds of Gs that we do) requires no novel cognitive architecture. It’s linguistic facility all the way down, with bigger brains adding more of the same doohickeys and thingamabobs we find in smaller ones, leading to an “apparent” qualitative (but in reality merely quantitative) change in capacity.

Now, as you can imagine, this is a very bad argument, though I concede that people like me (and perhaps Chomsky) have invited this kind of response. Chomsky has pointed out (following a pretty impressive bunch of people who think about this topic for a living (e.g. Tattersall)) that the indirect evidence for language is relatively recent. If one measures things using cultural artifacts, then the explosion of these around 50kya (rather than a handful of contentious ones further back) seems to indicate that something significant happened rather recently (not millions of years ago). If one takes such cultural artifacts to piggyback on our linguistic faculties, and one takes these to prominently include the capacity to acquire Gs that generate an unbounded number of hierarchically structured interpretable objects, then one has indirect evidence for something like our kinds of Gs (Merge-based ones) arising in the (relatively) recent past. If.

As I’ve noted before, this is a very indirect kind of argument for Merge, and it is not clear how much cultural artifacts implicate Merge. It is not unreasonable to suppose that the thing that goosed culture (maybe given its significance we should spell it Kultur!)[5] was hooking the system up with externalization, thereby facilitating communication and the gradual build-up of retained and retainable knowledge. Who knows, really. All of this is very speculative (i.e. it’s not as if there is a transparent logical entailment from elaborate burial rituals or cave paintings to hierarchical recursion). However, it is also not that important, for even were this false, the Evo Lang problem would remain fundamentally unaltered. Let me explain.

The hard problem for GGers is explaining how unbounded hierarchical recursion could have arisen (actually, how it did arise is the problem, but right now I would declare a small victory if we could redeem the modal). The assumption (and this is based on empirical evidence) is that there is nothing quite like it anywhere else in animal cognition. This is not to demean the powers of other animals. They really are amazing in many ways. But so far as we can tell, there is nothing formally analogous to the kinds of operations and structures we regularly find in human Gs, and there is no reason to believe that any other animal either uses or acquires systems with these properties. If this is true (and it is, really!), then one major Evo Lang problem is explaining how systems with this formal property could have biologically arisen in humans, given that nothing like it was there before. The problem, then, is not one of temporal distance but of logical distance. If the above correctly describes the current state of play, and it is true that other animals don’t have cognitive capacities with these formal properties, then explaining how these properties arose, along with the mental powers needed to deploy and acquire them, requires not just more but also different. So what was the different and how did it happen? That’s the GG Evo Lang question.

Several comments: does this mean that this is the only Evo Lang question? No, there are others, and I will return to some discussed in DHES. Does this make the temporal question irrelevant? Yes and No.

Yes, in the sense that the problem is more or less the same regardless of the time it took to arise. Why? Because the problem is how to get from non-recursive structural hierarchy to recursive structural hierarchy, and no amount of non-recursive hierarchy, recursive flat structure, or non-recursive flat structure will get you there by adding more of it. How hierarchy arose, and how, furthermore, hierarchy embedded within hierarchy ad nauseam arose, is not explained by noting finite examples of hierarchy or unbounded examples of iterated beads-on-a-string concatenation. Let me be clear: there is nothing wrong in pointing out that either of these exists in the mental repertoires of other animals, but this is not enough. There needs to be a story getting you from these non-recursive hierarchical systems to the qualitatively different recursive hierarchical one we have now. No story, no proposed additional mechanism, no Evo Lang account.

And the No? Well, if FL arose (very) recently, then there would be no temptation to look for a gradualist account. So FL’s being of very recent vintage would serve to block the bad (Eish) impulse to always look to the shaping effects of the environment to explain anything.

Think of an analogy in the domain of language acquisition. Imagine that kids popped out speaking their native language. Does anyone think we would expect Eish accounts of the process? Nope. Does anyone look for the shaping effects of the environment in explaining why kids are (normally) born with two legs, two arms, one heart, two kidneys, etc.? Analogous impulses would dominate were LADs to pop out speaking their mother’s native tongue (actually, I am not sure this is so, but I would hope it was). Ditto with a short evolutionary time span and gradualist accounts. A long time span is a prerequisite for a continuity story to even make sense. However, when you think about the issue just a little bit, even a very long time span does not bridge the logical gap, and that is the one that needs traversing.
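
To make the “Yes” answer concrete, here is a toy sketch in Python. The function names `flat`, `embed`, and `depth_of` are illustrative inventions, not anything from DHES or the GG literature, and the rules are deliberately trivial. Iterating a flat rule makes strings longer without ever making them deeper, whereas a self-embedding rule puts a constituent inside a constituent of the same kind. Running `flat` for longer never turns it into `embed`; the second needs a different rule, which is the logical (not temporal) distance at issue.

```python
def flat(n: int) -> tuple:
    """Iterate a single flat rule n times: beads on a string.
    Length grows without bound, but every bead sits at the same level."""
    return ("bead",) * n

def embed(n: int) -> tuple:
    """Apply a self-embedding rule, S -> a S b (bottoming out in a b), n times.
    Each level contains a constituent of the same type as itself."""
    if n <= 1:
        return ("a", "b")
    return ("a", embed(n - 1), "b")

def depth_of(tree) -> int:
    """Maximum nesting depth of a tuple tree; a bare symbol has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max((depth_of(child) for child in tree), default=0)

print(depth_of(flat(1000)))  # 1: longer, not deeper
print(depth_of(embed(4)))    # 4: one new level per application
```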

Curiously, the problem DHES presupposes away is one that we have seen before in a slightly different venue. Piaget was a cognitive gradualist, famously claiming that logical thinking develops gradually in children. Jerry Fodor (in the Royaumont volume, sadly under-read nowadays) pointed out that gradually developing a richer logical competence is impossible. There is no way of getting from the propositional to the predicate calculus without presupposing the resources of the latter as a precondition. This is a logical bridge too far, and it cannot be gradually navigated. The same holds true of the recursive hierarchy problem. It is not the sort of thing you get by adding more non-recursive, non-hierarchical systems of representation. You need to add something else. The Evo Lang question is what.

DHES does not say. It does make many other points. It points to another important property of natural language, namely that its atoms are symbols and that this allows for “displacement” (i.e. reference that is not stimulus-bound). DHES further insists (p. 6) that

Symbols, not grammar, are thus the sine qua non of language. They alone guarantee communication that is displaced, that is shared by an entire community of speakers, that can be transmitted between speakers and between generations, and that can represent either abstract or concrete ideas or things.

Maybe DHES is right. As I’ve noted before, Chomsky agrees that there is something interestingly different about the atoms of human language wrt their semantic properties. But even were this so (which it likely is), it doesn’t answer the GG question of how the kind of recursive hierarchy we find in human Gs (and that humans with FLs can all acquire) arose. In other words, even if we agree with DHES concerning the importance of atomic symbols as one key feature of human language (which, as I’ve noted many times before, Chomsky has often highlighted in the past), unless DHES shows us how symbolic terminals lead to recursive hierarchy, we have not progressed on the GG question.[6] And though the GG question is not the only question, it is an important one, given that one of the distinctive features of human language is that it is G-based (see here for recognition by some of the biologically informed that being G-based is indeed a critical feature of human language).

So the big problem with DHES is its failure to recognize the logical problem the GG facts present. I say this because it appears to suppose that one explains how the capacity of interest arose by showing a chart that tracks its progression. The chart is on p. 7 and has arrows pointing from one kind of representational format to another. So, indices begat icons, which begat symbols, which via duality of patterning begat compositionality, which begat linearity, which begat hierarchy, which finally begat recursion. All very impressive, but for the fact that DHES says nothing about how all this begetting took place (DHES leaves out all the salacious, prurient details). How exactly does linearity beget hierarchy, and hierarchy beget recursion? All DHES tells us is that they do. Or, more accurately, here is what it tells us (pp. 9-10; I have quoted the parts where “the miracle happens,” as the old New Yorker cartoon put it):

Once you have a set of symbols and a linear order agreed upon by a culture, you have a language.
That is really all there is to it, though of course most languages become more complex over time….
All of the embellishments of grammar such as hierarchical structures, recursion, relative clauses and other complex constructions are secondary, based on a slot-filler arrangement of and composition of symbols, in conjunction with cultural conventions and general principles of efficient computation…

Thus, once cultures and symbols appear, grammar is on the way...

So DHES does nothing to advance the GG question; it merely avoids saying anything about it while appearing to address it.

Before ending, let me admit that I may be somewhat unfair to DHES. There are times when it appears that its interest is not in engaging the Evo Lang question as GG poses it but in addressing another question: does a recursively hierarchical G have more expressive power than one without such a G? Note that this is not the GG question and, to my knowledge, it has not been a question that has occupied my GG community. However, if this is the question that interests DHES, then it seems either irrelevant to, or problematic for, the standard Evo Lang question of how our G systems and the capacity to acquire them arose. Say that the two kinds of Gs are not expressively the same: how does this help answer the question of how our FL arose? Say they have the same expressive power: then why did we evolve an FL that could acquire Gs with unbounded hierarchy even though, by assumption, these add nothing to the “expressive power” of language? Again, it really does not advance the Evo Lang question of interest (which, to repeat, does not mean that this is the only question of interest).

Let me end here. DHES is a very messy piece, and I think I know why. It really wants to argue that there is no Evo Lang problem of the kind GG (actually Herr Chomsky) poses, because what we see all arose gradually over a very long period of time in small incremental steps. The problem is that DHES nowhere suggests how these steps could have been taken and how they could have added up to what we now have. How does one get from flat systems to hierarchical ones to recursively hierarchical ones? How does one get from strongly referential terminals (Chomsky’s observations concerning animal communication systems) to those that allow for pretty radical displacement? What mental changes are required to allow this kind of symbol or representational format to arise? What kind of mental changes are required to get from linear to unboundedly hierarchical? These are hard questions, and maybe we will never be able to answer them. But better to fail to answer a real question than to fail to see what the question is.

[1] I hereby preemptively apologize to Jerry Fodor, who wisely counseled against ever conceding anything, even for the sake of argument. I am sinning here, I know.
[2] And yes, I know that they did not pick up their own crumbs and get themselves lost, but the metaphor got away from me.
[3] His work really is the gift that keeps on giving, your one-stop shopping venue for largely irrelevant arguments intended to buttress insupportable claims. I am starting to think that DE is doing this all as a public service for the enlightenment of the young. Master the non-sequiturs in the core DE oeuvre and you’ve seen through all (or at least many) of the non-sequiturs you are likely to encounter in the vast irrelevant anti-Chomsky literature. Like I said, an invaluable resource (sorta like the role that Piaget’s work played in early developmental psych: work through the many failures of logic there and you end up with modern developmental cognition of the Carey-Spelke variety (i.e. the good stuff)).
[4] Though I am not suggesting that this should be held against the view. Experts have been known to be wrong before, and for all I know E had some linguistic skills. That is not the issue, as I show below. The issue is how similar E’s capacities were to our own.
[5] The ‘K’ is also in honor of the fact that I’m posting from Germany, where I am scheduled to give a talk that I stupidly agreed to give months ago, sure that I would never have to do it because I would be lucky enough to be hit by a bus. But my luck has turned, and here I am in Stuttgart. So Kultur it is!
[6] DHES might be making a point that I am sympathetic to: that if one is interested in how Kultur arose, then the fact that Gs are recursively hierarchical is less important than that they deploy symbols closely semantically tied to the 4Fs. In other words, for communicative purposes, displacement might be critical, and for Kultur, communication might be. This does not eliminate the GG question concerning recursive hierarchy, but it does suggest that recursive hierarchy is not the sine qua non of Kultur. I don’t know if this is right, but it might be for all I know. Like I’ve said before, a simple ‘N V N’ template with 25,000 Ns and 15,000 Vs allows you to say a hell of a lot of things, maybe enough to sail the oceans and leave behind fancy artifacts.
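
For a sense of scale, here is the arithmetic behind that last claim as a quick Python check. It assumes, as an idealization, that any noun can fill either N slot and any verb the V slot (no selectional restrictions), so the figure is an upper bound. Note that it is a finite number, reached without any recursion.

```python
import math

# Illustrative vocabulary sizes from the footnote; assumes every word can fill
# its slot freely (no selectional restrictions), so this overcounts.
NOUNS, VERBS = 25_000, 15_000

# One 'N V N' template: the number of strings is the product of the slot sizes.
n_v_n = math.prod([NOUNS, VERBS, NOUNS])
print(f"{n_v_n:,}")  # 9,375,000,000,000
```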


  1. I find it hard to see the GG story as significantly better than Everett's, because I am unable to take seriously the idea that a facility for doing binary Merge arose by saltation, somehow acquiring the other machinery it needs to actually build grammatical structures (labelling and feature mechanics of some kind) and then hooking itself up to enough interfaces to do anything of sufficient value to be preserved by natural selection. Furthermore, I think Everett has a point that the achievements of Erectus are beyond what a creature without any kind of language, including some kind of syntax, could manage.

    My suggestion is that a speech generation architecture called Salix, due to Penelope Sibun ('Text generation without trees', 1992), might be a plausible precursor for our form of language. What Salix does is traverse, in an organized manner, a graph with nodes representing entities (house rooms and people in her implemented models) and arcs representing relations (family and spatial), emitting symbols as it goes. Adding nodes for events and appropriate relationships for them would seem like a plausible extension, which would make the system suitable for giving instructions of various kinds, such as how to get to places where useful resources are present (so you don't have to accompany somebody there to show them where they are).

    Importantly, the connections between cognition and externalization are present from the beginning, because they are direct. There are no sentence structures as such: the structure of the graph directly determines what symbols are produced and in what order. Of course, Salix would also need a parser to serve as a working model of Erectus proto-language.
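
    A minimal sketch of the kind of traversal described above, assuming a toy graph whose nodes are rooms and whose arcs are spatial relations. The names `externalize`, `house`, and `Graph` are hypothetical, and this is not Sibun's implementation; it only illustrates how word order can fall directly out of the walk, with no sentence tree built. The `visited` set is the memory that keeps the walker from saying the same thing twice.

```python
# Hypothetical toy graph: each room maps to its outgoing (relation, room) arcs.
Graph = dict[str, list[tuple[str, str]]]

house: Graph = {
    "kitchen": [("next-to", "hall")],
    "hall": [("next-to", "kitchen"), ("leads-to", "garden")],
    "garden": [("has", "well")],
}

def externalize(graph: Graph, start: str) -> list[str]:
    """Walk the graph depth-first from `start`, emitting words as it goes.
    Word order is fixed by the walk itself; no sentence tree is built."""
    words = [start]
    visited = {start}  # memory of what has already been said

    def walk(node: str) -> None:
        for relation, target in graph.get(node, []):
            if target in visited:
                continue  # don't say the same thing twice
            visited.add(target)
            words.extend([relation, target])
            walk(target)

    walk(start)
    return words

print(" ".join(externalize(house, "kitchen")))
# kitchen next-to hall leads-to garden has well
```

    The cycle between kitchen and hall is harmless here only because of `visited`; drop it and the walk never terminates.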

    If something like this is already present in human communities, it doesn't seem so implausible (to me, at any rate) that some kind of syntactic structure could arise as an intervening level between the conceptual (knowledge graph) level and overt performance, perhaps because it conferred advantages such as faster processing. Indeed, noting that social animals must have some way to 'parse' the activities of their peers, some kind of sentence structure might have developed immediately, with the big step forward being some kind of improvement in the capacity to process it in a much faster and more flexible manner.

    1. The Chomsky account has one thing going for it that the Everett account does not: it recognizes the problem, which is how to get to a system with unbounded hierarchy. Chomsky's solution is Merge, an operation that arrives all at once. You don't like this kind of "saltational" account because it is saltational, I assume. Oddly, as I've noted before, this does not appear to bother Dawkins, someone who is otherwise bothered by these things, because he agrees that some chasms cannot be crossed in several steps and that getting to recursion seems to him like one of these. At any rate, I have my own views on Chomsky's proposal, but what is clear is that if he has stated the problem correctly and identified the right end state (which I believe he has), then nothing Everett says even approaches a solution. What is proposed in the article is less unsatisfactory than irrelevant, and that is a big problem. Whether Chomsky is right is another matter. At least it is on target.

    2. The trouble is that it is not merely saltational; it's a jump that magically incorporates or integrates everything else that is needed for spoken language. ("Dawkins doesn't see that as a problem" is not even close to an argument; no need to address it further.) One might as well conclude that it happened by magic.

      The underlying flaw is that Chomsky seems to assume unbounded hierarchy when there's no evidence that it actually exists *in human beings* (it's a different story in a theoretical model divorced from empirically observable reality, the same sort of place that has frictionless surfaces). The framing of the problem is entirely wrong.

      I was, however, pleased to see this:

      "I also noted that one of the gaps (the one that focuses on utterances as imperfect exemplars of sentences in that they are subject to the exigencies that afflict (enhance?) all performances) was less serious than the other two in that it can likely be bridged using generic statistical techniques...."

      ...because it's the first time I've seen a Chomskyan admit anything of the sort (usually the generic statistical techniques are waved aside). Of course, now the problem they address is being dismissed as irrelevant, which increasingly is how I know they're on target. About which....

      "I hereby preemptively apologize to Jerry Fodor who wisely counseled against ever conceding anything even for the sake of argument."

      What a shame. One of the great and telling weaknesses of the Chomskyan approach is that it follows this precept. Heck, it's practically a Skinnerian stimulus-response paradigm: when someone brings up a counterargument, automatically claim it's not only wrong but it'd be irrelevant even if it were right. When someone quotes Chomsky, automatically claim that it's out of context. I guess it's fun as a rhetorical style, but it doesn't seem to have led to much progress, and it's why Chomsky's work is gradually being relegated to the realm of philosophy rather than science.

    3. [to Norbert] I think people usually get points for perceiving a problem that others ignore, so Chomsky gets some for noticing the unbounded hierarchy problem, but loses points for ignoring the issues of the interfaces and other stuff that Merge needs to do anything remotely useful.

      Everett, meanwhile, gets points for focusing on the problem of what Erectus could do, and what kinds of linguistic resources they probably needed to do it, but loses points for not seeing any problem with the emergence of any kind of grammar. I will not attempt to proclaim a winner.

      But I will point out that the Salix model suggests that many of the usual terms in which these debates are carried out are not fully appropriate. Even the simplest story-telling graph externalizer cannot be finite-state, because it has to remember where it has been before so as not to say the same thing twice (assuming that the input graphs, i.e. the storylines or navigation routes being externalized, are of unbounded length, as per the usual idealizations).

      To get basic constituent structure without 'recursive symbols/subroutines' (Bach 1964, 1976, and standard computer programming terminology), we could write the graph-traversing algorithms in Fortran; to add full X-within-X recursion, we might switch to Algol. But maybe that's not actually as big a leap as people seem to think it is.
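
      The Fortran/Algol contrast can be sketched in Python too. This is an illustration under my own assumptions, not Bach's formalism, and `house` and `externalize_iterative` are hypothetical names. The function walks a toy graph depth-first, but no function calls itself: an explicit stack of pending arcs stands in for the call stack, roughly how one simulates recursion in a language without recursive subroutines. The stack can grow without bound, which is the non-finite-state memory mentioned above, and the output order is the same as a recursive depth-first walk would give.

```python
# Hypothetical toy graph: each room maps to its outgoing (relation, room) arcs.
house = {
    "kitchen": [("next-to", "hall")],
    "hall": [("next-to", "kitchen"), ("leads-to", "garden")],
    "garden": [("has", "well")],
}

def externalize_iterative(graph, start):
    """Depth-first walk that emits words without any recursive call.
    `stack` holds one iterator of not-yet-examined arcs per open node."""
    words = [start]
    visited = {start}
    stack = [iter(graph.get(start, []))]
    while stack:
        for relation, target in stack[-1]:
            if target not in visited:
                visited.add(target)
                words.extend([relation, target])
                stack.append(iter(graph.get(target, [])))  # descend one level
                break
        else:
            stack.pop()  # this node's arcs are used up; return to its parent
    return words

print(" ".join(externalize_iterative(house, "kitchen")))
# kitchen next-to hall leads-to garden has well
```

      The design choice is the point: the "Algol" version lets the language manage the stack, the "Fortran" version manages it by hand, and the output is identical.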

      One way of interpreting Salix is as an attempt to show that sentence structure doesn't exist; I think this claim is almost certainly false, but Salix does show that sentence structure as we normally think of it does not *have* to exist, so we need to justify it better than I think we actually have so far.

    4. I agree that perceiving a problem is a big deal. Indeed, without perceiving one it is hard to solve it. That is why I think that Chomsky has made an important contribution to the Evo Lang problem: he has identified one feature that needs addressing. I am less clear on what E's contribution is to the discussion. If I understood his piece correctly, then whether or not Erectus had the properties he attributed to it does not solve any identifiable problem, or at least not one that I can see. Did Erectus have these features? I have no idea, but I am happy to concede that Erectus did. I just do not see what follows.

      Now there is a second issue: does Chomsky's proposal wrt Merge solve the problem he identified? Well, if the problem is the emergence of Merge, and Merge suffices for unbounded recursive hierarchy, then it is a potential solution if Merge could have emerged all at once. This is the saltational view. The claim seems to be that Merge could not have emerged all at once. I do not see why not, and I buttressed this view by noting that others generally skeptical of saltational accounts (Dawkins) appear to think that this kind of "hopeful monster" is not unreasonable. I take this to mean it is possible. You suggest another route to the same end: Salix to Algol. I do not know the details, but if what needs adding to Salix is trivial enough, and it is plausible that we had something like Salix before the addition, then why not? That would be another plausible route to recursive hierarchy. I also have a horse in this race: start with linear beads on a string (via something like concatenation), add labels, and we get unbounded hierarchy. Is this right? Damn if I know. But Chomsky's suggestion, yours, and mine at least have the right FORM. What is depressing is that most of the Evo Lang proposals (including E's) fail this simple prerequisite. That's what I am criticizing. It is less a defense of Chomsky than the observation that at least his answer has the right form. Given the lay of the land, this is, sadly, quite an achievement.
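
      A toy rendering of the concatenate-plus-label idea, assuming (as a simplification, not a worked-out proposal) that concatenation splices sequences together and that labeling packages a sequence as a single unit. Because plain concatenation is associative, grouping leaves no trace; once a labeled unit can itself be concatenated and labeled again, depth becomes unbounded. `Unit`, `concat`, `label`, and `depth` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """A labeled object: a concatenated sequence packaged as a single item."""
    label: str
    parts: tuple

def concat(*items) -> tuple:
    """Concatenation: splice items into one flat sequence. A bare tuple
    dissolves into its beads, so concatenation alone is associative."""
    out = []
    for item in items:
        if isinstance(item, tuple):
            out.extend(item)
        else:
            out.append(item)  # a word or a labeled Unit counts as one bead
    return tuple(out)

def label(name: str, seq: tuple) -> Unit:
    """Labeling: turn a sequence into one unit that later concatenation
    treats as a single bead, so its internal structure is preserved."""
    return Unit(name, seq)

def depth(x) -> int:
    """Number of labeled units nested along the deepest path."""
    if isinstance(x, Unit):
        return 1 + max((depth(p) for p in x.parts), default=0)
    if isinstance(x, tuple):
        return max((depth(p) for p in x), default=0)
    return 0

# Without labels, grouping leaves no trace.
assert concat(concat("the", "dog"), "barked") == concat("the", concat("dog", "barked"))

# With labels, each round embeds the previous result one level deeper.
phrase = "dog"
for _ in range(5):
    phrase = label("X", concat("and", phrase))
print(depth(phrase))  # 5
```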