Monday, July 2, 2018

Apes and the Chomsky Hierarchy

The obit for Koko in a recent post (here), though intended somewhat tongue in cheek, garnered three interesting comments. They pointed to a couple of recent papers that (in some sense that I will return to) extended the intellectual project of which Koko was an important cog when I was in grad school. It was the first EvoLang project that I was exposed to. The aim was to establish the linguistic facility of non-human apes. Which is what got me into that gorilla suit. 

In honor of Chomsky’s birthday, Elan Dresher, Amy Weinberg and yours truly decided to settle the issue once and for all by having Koko publically debate Noam on the topic of whether or not Koko was linguistically competent. IMO, the debate was a draw (though Noam (the real Noam, not Dresher-Noam) begged to differ). However, whatever the outcome, for a while after the debate I was considered somewhat of an expert on the topic as many concluded that having been the one in the suit I had an ape’s eye view of the issue and so could add a heretofore unavailable slant on the topic.  I milked this for all it was worth (as you would have too, I am sure).

So what was the debate about? The question was whether human language is continuous with similar linguistic capacities in our ape cousins. Eric Lenneberg dubbed the claim that human language is a quantitative extension of qualitatively analogous capacities in our ape cousins the Continuity Thesis (CT). Given CT, our evidently greater human linguistic capacities are just diminished ones of our ape cousins when turbo charged with greater brainpower. Human linguistic competence, in other words, is just ape linguistic competence goosed by a higher IQ. Given CT, we humans are not different in kind. We just have bigger brains (fatter rounder frontal lobes), more under the cranial hood, and our superior verbal capacities just hydroplane on that increased general intelligence.

Koko (Penny Patterson’s verbal gladiator) was not the only warrior in the CT campaign. There were many apes recruited to the good fight (see here). The Gardners had Washoe, Sue Rambaugh had Kanzi, Herb Terrace had Nym. There were others too. A lot of effort went into getting these apes to use signs (verbal, manual, computerized) and demonstrate the rudiments of linguistic competence. The aim was to show that they would/could acquire (either naturally, or more standardly after some extensive training) the capacity to articulate novel semantically composed messages using a rudimentary syntax similar to what we find in human natural language. Once this basic syntax was found, we could solve the EvoLang problem by attributing human verbal facility to this rudimentary syntax proportionatly enhanced by our bigger brains into the wondrous linguistically creative syntactic engine currently found in humans. Not surprisingly, it was believed that were it possible to show that, linguistically speaking, these other apes were attenuated versions of us, this would demonstrate that Chomsky was wrong and that there was nothing all that special about human linguistic capacity. If CT was correct then the difference between them and us is analogous to the difference between a two cylinder Citroen Deux Cheveaux and a Mercedes AMG 12 cylinder. The principles were the same even if the latter dwarfed the horsepower of the former. CT, in sum, obviated the need for (and indeed would be considerable evidence against) a dedicated linguistic mental module.

This really was all the rage for quite a while (hence my several months of apish celebrity). And I would bet (given the widespread coverage of Koko’s demise) that you could still get interviewed on NPR or published in the New Yorker with a groundbreaking (yet heartwarming) story about a talking ape (apes play well, though parrots also get good press).  In academic circles however, the story died. The research was largely shown to be weak in the extreme (“crappy” and “shoddy” are words that come to mind). Herb Terrace’s (non shoddy, not crappy) work with Nym Chimpsky (Mike Seidenberg and Laura Petito were the unfortunate grad students that did the hard slogging) more or less put to rest the idea that apes had any significant syntax at all, and showed that it is nothing like what we see in even the average three year old toddler.

Charles Yang has a really nice discussion of the contrast in capacities between Nym and your average toddler in a recent paper on Zipf’s law as it applies to the linguistic productions of toddlers vs those of Nym (here). By the global measure which that law affords Charles is able to show that:

Under the Zipfian light, however, the apparent continuity between chimps and children proves to be an illusion. Children have language; chimps do not. Young children spontaneously acquire rules within a short period of time; chimpanzees only show patterns of imitation after years of extensive training. 

Moreover, these “patterns of imitation” do not show the hallmarks of Zipfian diversity that we would expect to see if they were products of grammatical processes. As Charles bluntly puts it: Nym “was memorizing, not learning a rule.”[1]

The ineluctable conclusion is that teaching apes to talk came nowhere close to establishing CT. It failed toe establish anything like CT. In short, there is zero evidence from this quarter to gainsay the obvious observation (obvious to anyone but a sophisticated psychologist that is) that nothing does language like humans do language, not even sorta, kinda. The gap is not merely wide it is incommensurable. The gap is qualitative, not quantitative and the central distinguishing difference is that we humans are gifted with a biologically sui generic syntactic capacity. We develop unique Gs on the basis of unique FL/UGs. Yay us![2]

Now, I admit that I thought that this line of discussion was dead and buried, at least in the academic community. But I should have known better. What the commentators noted in their comments to my previous obit for Koko is that it is baaaack, albeit in a new guise, this time sporting a new fancy formal look. The recent papers adverted to, one outlining the original research conducted by a consortium of authors including Stan Dehaene (SD&Co)(here) and a comment on the work explaining its significance by Tecumseh Fitch (TF) (here), resurrects a modern version of CT, but this time in more formidable looking garb. IMO, this time around, the discussion is of even less biolinguistic or evolang relevance. I’d like to talk about these papers now to show why I find them deeply disappointing.

So what does SD&Co do? It shows that Macaques (i.e. monkeys, not apes so further from humans that is really CT optimal) can be (extensively) trained (10,000-25,000 training trials) to produce supra-regular (manual) sequences that whose patterns go beyond the generative capacity of finite state automota. This is argued to be significant for it provides evidence that crossing the regular grammar boundary is not “unique to humans” (SD&Co, 1). This fact is taken to be relevant biolinguistically for it “indicates cognitive capabilities [in non humans, NH]…that approach the computational complexity level of human syntax” (TF, R695).

The experiments taught monkeys to perform sequential operations in the spatial-motor domain (touching figures in sequential patterns on a screen). These manual patterns go beyond those describable by regular grammars. It is argued that mastering this sequential capacity implies that monkeys have acquired supra-regular grammars able to generate these patterns (Phrase Structure Grammars (PSG)), heretofore thought to be “only available in humans” (SD&Co, 1). So, assuming that the experiments with the monkeys was well done (and I don’t have the expertise to deny this, nor do I believe it (nor do I think that it is relevant to do so)), then it appears that non humans can cross over the domain of regular grammars and acquire performance compatible with computational systems in (at least) the PSG part of the Chomsky Hierarchy. Let’s stipulate that this is so. Why should we care? TF provides the argument. Here it is in rough outline:

1.    Only humans show an “unbounded” communicative “expressive power”
2.    Shared traits are a “boon to biologists interested in language” to “test evolutionary hypotheses about adaptive function”
3.    Syntax has “until now resisted the search for parallels in our animal brethren”
4.    It has heretofore been thought that “grammars” are “beyond the capabilities of nonhuman animals”
5.    SD&Co “show[s] that with adequate training, monkeys can break beyond this barrier”
6.    In particular, the monkeys could learn rules beyond those in finite state (regular) grammars
7.    So nonhumans have “supra-regular computational capacities”
8.    And this “suggest[s] that the monkey’s brain possesses the kind of cognitive mechanisms required for human linguistic syntax…”

So there is it, the modern version of CT.  Monkeys have what we have just a little less of it in that ours requires a handful of examples (5 according to SD&Co) to get it going while theirs needs on the order to 10,000-25,000. Deux Cheveaux vs Mercedes it is, once again.

So, is this argument any better or insightful than the earlier Ape versions of CT? Not as far as I can tell. Let me say why I don't think so.

Note that his version of CT is very much more modest than the earlier attempts. Let’s count the ways.

First, earlier versions looked to our immediateevo relations. Here, it’s not our immediate cousins whose capacities are investigated, but monkeys (and they are very, very, very distant relations).

Second, early CT were interested in demonstrating a productive syntax in service of semantic productivity. What’s wondrous about us is that the syntax is linked to (underlies?) semantic productivity. We are not looking at mere patterns or sequences. What is impressive in language is the fact that the relevant patterns can be semantically deployed to produce and understand an open ended number of distinct kinds of messages/thoughts. The SD&Co experiments have nothing to do with semantic productivity. It is just pattern generation that is at issue. There is no reason provided to think that the monkeys can or could use these patterns for compositional semantic ends. So, one key feature of oursyntax (the fact that it ties together meaning and articulation) is set to one side in these experiments.[3]And this is a big retreat from the earlier CT work that correctly recognized that linguistic creativity is the phenomenon of interest.

Third, and related to the second, the kind of grammars the monkeys acquire have properties that our grammars never display. Human Gs don’t have mirror image rules. And the reason we think that such kinds of rules are absent is that human Gs have rules that don’t allow them. They are “structure dependent.” Our rules exploit the hierarchy of linguistic representations, and the syntax eschews their sequential properties. So it is unclear, at least to me, why being able to teach monkeys rules of the kind we never find in human Gs would tell us about the properties of human Gs that fail to pattern in accord with these rules. And it it cannot do this, it also cannot inform us as to how our kinds of Gs evolutionarily arose in the species. 

This criticism is analogous to one raised against earlier CT work. It was regularly observed that even the most sophisticated language in nonhumans (the best actually being on dolphins by Herman and published in Cognition many, many, many years ago) made use of an ordinal encoding of “thematic” role to sequential position. The critters were taught a “grammar” to execute commands like “Bring A with B to C” with varying objects in the A,B,C positions. They were pretty good at the end, with the best able to string 5 positional roles and actions together. So, they mastered an ordinal template to roles and actions productively (though not recursively (see note 4)). But, and this was noted at the time and since, it was a sequential template without a hint of hierarchy. And this is completely unlike what we find in human Gs.[4]

Interestingly, SD&Co notes that the same thing holds in its experiments (pp. 6-7). 

Even after extensive training on length 4 sequences, behavioral analysis suggested that monkeys still relied on a simple ordinal memory encoding, whereas pre-schoolers spontaneously sued chunking and global geometric structure to compress the information. Thus, the human brain may possess additional computational devices, akin to a “language of thought,” to efficiently represent sequences using a compressed descriptor during inductive learning.

In other words humans display a sense of constituency (after 5 exposures) while highly trained monkeys never develop one (well, not even after 25,000 training examples). Maybe its because human Gs are built around the notion of a constituent, while monkey Gs never are. And as constituency is what allows human semantic productivity (it underlies our conceptions of compositionality) it really is quite an important difference between us and them. In fact, it has been what people like me who claim that human syntax is unique have pointed to forever. As such, discovering that monkeys fail to develop such a sense of hierarchy and constituency seems like a really big difference. In fact, it seems to recapitulate what the earlier CT investigations amply established. In fact, even CD&Co seem to suggest that it marks a qualitative difference, referring as it does to “additional computational devices,” albeit describing these as properties of a distinctive “language of thought” rather than what I would call “syntax.” Note, that if this is correct, then one reasonable way of understanding the SD&Co experiments is establishing that what monkeys do and what we do is qualitatively different and that looking for Gs in our ancestors is a mug’s game (the conclusion opposite to the one that TF draws). In other words, CT is wrong (again) and we should just stop assuming that general intelligence or general computational capacities will come to the evolang rescue. 

Fourth, it is important to appreciate what a modest conclusion DG&Co’s paper licenses even if true. It shows that monkeys can manage to acquire capacities wrt to kinds of patterns, including mirror image ones, that are assumed to be relevantly similar to linguistically relevant ones. What makes the patterns linguistically interesting? Regular Gs cannot generate them. They require at least Gs in the PSG part of the Chomsky Hierarchy. The conclusion is that monkeys have computational capacities beyond the regular, and lie in at least the PSG precincts. But even if correct, this is a very weak conclusion. Why? 

There are billion and billions of supra-regular rules/Gs (indeed there are even humongously many regular ones). Human Gs have zeroed in on one extremely small corner of this space. And it is true that unlessnon-humans can master supra-regular patterns they cannot have human Gish capacities because our Gs are super-regular. However even if they can master supra-regular Gs it does not follow that the kinds they can master are in any way similar to those that we deploy in natural our natural language facility. In other words, the claims in 7/8 above are wildly misleading. The fact that monkeys can have one kind of supra-regular capacity tells us nothing at all about whether they can ever have our kind of supra-regular capacity, the one underlying our linguistic facility. Assuming otherwise is simply a non-sequitur. In other words, TF’s claim that the monkey behavior “suggest[s] that the monkey’s brain possesses the kind of cognitive mechanisms required for human linguistic syntax” is entirely unfounded if this is taken to mean that they have computational mechanisms like those characteristic of our Gs.

Let me put this another way: GGers have provided pretty good descriptions of the kinds of properties human Gs have. The evolang question is how Gs with thesefeatures could have arisen in the species. Now, one feature of these Gs is their recursivity. But this is a very weak feature. Any G for a non-finite language in the Chomsky Hierarchy will have this feature.  But virtually none of them will have the properties our Gs enjoy. What we want from a decent evolang story is an account of how recursive Gs like ours(ones that generate an unbounded number of hierarchically structured objects capable of supporting meaning and articulation, that are non-counting, that support displacement, etc. etc. etc.) came to be fixed in the species. That’s what we want to understand. There is virtually no reason to think that the SD&Co advances this question even one nano-nano-nano meter (how many nano meters to a jot?) for the rules they get the monkeys to acquire look nothing like the rules that characterize our Gs (recall, human Gs eschew mirror image rules, they link meaning with articulation, they show displacement etc.).

Truth be told, GGers (and here I mean Chomsky) might be a little responsible for anyone thinking simple recursion is the central issue. We are responsible for two different reasons. 

The first sin is that many a minimalist talk talk starts by insisting that the key feature of human linguistic facility is recursion. However, it is not recursion per se that is of interest (or not only recursion) but the very specific kind of recursion that we find in human language. There are boundlessly many recursive systems (even regular languages have recursive Gs that can generate an unbounded number of outputs). Our specific recursive Gs have very distinctive properties and the relevant evolang question is not how recursion arose but how the specific hierarchical recursion we find in humans arose. And there is nothing in these papers that indicates that human linguistic facility is any less qualitatively distinctive than GGers have been arguing it is for the last 60 years. In fact, in some important ways, as I’ve noted, this recent foray into CT is several steps less interesting than the earlier failed work, as it fails to link syntactic recursion to semantic creativity, like earlier work tried to do (even if it largely failed).

The second failing can be laid at the feet of Syntactic Structures. It presents a neat little argument showing that natural language Gs cannot be modeled as finite state automata. This was worth noting at the time because Markov processes were taken to be useful models of human cognition/language.[5]So Chomsky showed that this could not be true even looking at extremely superficial properties found in the sequential patterns of natural language (e.g. if…then…patterns). He then noted that PSGs could generate such patterns and argued that PSGs are also inadequate once one considers slightly more interesting generalizations found in natural language. The second argument, however, does not claim that PSGs cannot generate the relevant patterns, but do not do so well (e.g. they miss obvious generalizations, make Gs too complicated etc.). Some concluded that this was a weak argument against PSGs (I disagree), but what’s more important is the conclusion tacitly drawn was that being PSGish is what language is all about. There is a very strong whiff of this assumption in both papers. But this is wrong. Given Chomsky’s points, being (at least) PSGish is at most a necessary condition in that being Finite State Gs cannot even get off the ground. But we know that not all PSGs are the same and that natural language shows very distinctive PSG properties (e.g. headedness, binarity) so the necessary condition is a very weak feature and that having PSG capacities does not imply having those underlying human language. It’s a little like discovering that the secret to life is the prime factorization of an even number and concluding that I am closer to finding it because I know how to factor 6 into its primes. Knowing that it is even is knowing something, just not very much.

Let’s return to the main discussion again. Fifth, if the point is to demonstrate that animals have fancy computational capacities (e.g. ones that require memory more involved than the kind we find in Finite State Machines) then we already have tons of evidence that this is so. In fact, I am willing to bet that being able to identify the position of the sun even when obscured, using that position to locate a food source, calculating a direct route back home despite a circuitous route out, communicating the position of this food source to others by systematically dancing in the dark and by being able to understand this message and reverse the trajectory of flight all the while calibrating the reliability of these messages by comparing their contents with a mental map specifying whether the communicated position is one of the possible positions for the food source is quite a bit more computationally involved than mirror reversing a sequence. In fact, I would bet that one comes close to using full Turing computational resources for this kind of calculation (e.g. memory much more involved than a stack, and computations at least as fancy as mirror reversal). So, if the aim is to show that animals have very fancy computational capacities, we already know that they do (cashing behavior is similarly fancy, as is dead-reckoning etc.). Just take a look at some of the behaviors that Gallistel is fond of describing and you can put to rest the question of whether non-human animal cognition can be computationally very involved. It can be. We know this. But, as fancy as it can be, it is fancy in ways different from the ways human linguistic capacity is fancy and there is no reason to think that mastering one gives you a leg up on mastering the other (you try dancing in the dark to tell someone where to find a cheeseburger at 4 AM). 

In sum, there is no reason to doubt that non-humans have fancy cognitive computational capacities. But there is also no way currently of going from theirvery fancy capacities to the ones that underlie our linguistic facility. And that, of course, is the problem of interest. If the aim of SD&Co is to demonstrate that non-humans can be computationally fancy shmancy, it was a waste of time. 

Sixth, let me end with a much weaker concern. There is one more sense in which these kinds of experiments strike me as weak. What we believe about humans is that it is (partly) in virtue of having Gs that generate structures of certain sorts that humans have linguistic capacities of the kinds they have. So, for example, GGers argue that it is in virtue of having Merge that Gs allow movement and reconstruction that humans can creatively understand the interpretation of sentences like Which of his1books did every author insist you review. Without Gs of this sort, these behaviors could not be accounted for.  Now, it is quite unclear to me if most of the animal literature on sequence pattern capacities in animals (can they master AnBnor mirror image sequences?) shows that they animals succeed in virtue of having PSGs that generate such sequences. Maybe they do it some other way, using systems more powerful than PSGs. Let me explain what I mean.

If asked to verify whether a sequence of As and Bs was an AnBnsequence, I personally would count the As and Bs and see if they matched up. This would allow me to do the task without using a PSG to generate the string that has n As followed by n Bs. In the first case I count. Not so the second. Now, I might be off here, but I do not see that the experiments generally reported (including the SD&Co one) show that the animals strut their cognitive stuff in virtueof mastering Gs with the right properties. Note, that PSGs can do what I do with counting without it. But another G could solve the problem with counting. So how do we know the animals don’t count (note that we are usually talking of very short sequences (3,4 unites long))? Or, more specifically, how do we know that they solve the problem by constructing/acquiring a PSG that they use to solve it? The fact that such a G could do this does not imply that this is how the monkeys actually do it. There are many ways to cognitively skin a problem. Using a non counting PSG is one specific way. There are others.[6]

Ok, enough already. I am a fan of the work of both DeHaene and of Fitch. I have learned a lot by reading them. But I don’t see the value of this computational revival of CT. It really does nothing to advance the relevant evolang or biolinguistic issues so far as I can tell. More worrisome is that it is generally oversold and, hence, misleading. It is time to bury the Continuity Thesis in all of its manifestations. Human language really is different. We even have some idea how. For evolang to have any interest, it needs to start asking how itcould have evolved, how something with its distinctive properties could have arisen. The current crop of CT inspired explorations of the Chomsky Hierarchy don’t do this, though they don’t do this in an obscure way that covers up their general irrelevance. I prefer the older ape based ways of being wrong. They at least made for some fun theater. Thx Koko. 

[1]Charles’ most interesting observation in this wonderful little paper (read it!) is the following: 

To this day, Nim has provided the only public database of signs from animal language studies. (By contrast, the ability of Koko, the famous talking gorilla who occasionally holds online chats, comes exclusively from her trainer’s interpretation and YouTube clips.) 

It is amazing how far anecdotes can take you when you are pushing a line whose truth is strongly desired. Herb Terrace and company are to be commended for studying the issue scientifically rather than a la NPR.
[2]There was always another thing odd about CT work. Say we could show that apes had linguistic capacities (nearly) identicalto ours. That would allow us to explain human linguistic facility by saying that humans inherited it from a common ape/human ancestors.
So given this assumption, how we became facile is easily explained. But this would not really explain the question we are most interested in: how linguistic capacity in general arouse. It would only explain why we have it given our ape ancestors had it. But as the really interesting question is how language arose, not how it arose in us, the very same problem pops up now as regards our ape ancestors and monkeys, unless we assume that monkeys too have/had more or less the same linguistic facility as our ape ancestors. And so on through the clades. At some point, heading backwards through our ancestry we would come to a place where animal X had language while related animal Y did not. And when we got there, the very same problem we are trying to answer in the human/ape case would arise, and arise in the very same form. As such it is unclear what kind of progress we make by attributing to our ancestors (pale) versions of our capacities unless we are willing to go all the way and attribute paler and paler versions of this linguistic capacity all the way down (or out) the evo tree/bush. 
This brings us to the real implicit assumption behind the earlier CT enterprise: everything talks. The differences we see are never qualitative ones. For unless we assume this pushing the problem back a clade or two really does nothing to solve the conceptual puzzle of interest. So the inchoate assumption has always been that language is just a product of general intelligence, intelligence is common across animals (life?) and one only sees language emerge robustly when intelligence crosses a certain quantitative threshold. The only reason that dogs don’t talk is that they are too dumb. But don’t say this around a favored pet because they might get pissed.
Last point: this anti-modularity conception of cognition is just part of the general Eish conception of minds. Gallistel and Matzel have a nice discussion of how Associationism favors general intelligence approaches to the mind because modularity requires specific bespoke architectures. 
[3]Note that earlier CT work aimed at showing that this was possible in animals. Like I note below, Dolphins didn’t do badly wrt simple sequence/thematic meaning linkages. So they showed a kind of compositionality in their acquired G.
[4]It was also not productive, capping out at 5 (at most). So there was no recursion here. What we would like to see is recursion coupled to semantic composition via a hierarchical syntax. Neither the dophins nor the macaques provide this.
[5]My recollection is that heavyweights like Suppes mooted as much.
[6]The work on ‘Most’ (here) revolves around just this theme: truth conditions can be determined by various different generative procedures. The relevant GG question is the nature of the generative procedure. So too here. Are the monkeys solving their problems by using an acquired G that generates the relevant strings or are they doing it in some other way (e.g. by counting or ordinally ordering the inputs?). 


  1. "However, it is not recursion per se that is of interest (or not only recursion) but the very specific kind of recursion that we find in human language. "
    This feels a bit "no true scotsman" to me. There must be some property, or set of properties, that humans have and crucially non-human don't have. An interesting claim has to specify what that property might be. The claim that it has to be just like it is in human languages means that one can add on an additional property when needed -- whenever someone teaches an ape a new trick.

    BTW, I think your point 6 is spot on. The Chomsky hierarchy is only one way of doing things and only one way of drawing the boundaries.

    1. Thx for the second on 6.
      As for the first point: I am not sure what you intend. Here is the GG idea: it si in virtue of our syntax that we can be linguistically creative. This is one of the really interesting fact about our "communication" system. There are others (e.g. not stimulus bound (i.e. "free") but we have no idea how to explain this at all right now), but this is a big. Ok so we want to know how FL that supports a linguistic creativity like ours arose. That's the question. You evaluate an answer to the degree that it addresses it. I don't see that discovering that monkeys may have computational capacities further out than the regular/Finite state automata boundary does that. Not even a little. So, given that we know something about what we have and what we want to know is how it arose then a question to ask concerning any evolving proposal is to what degree it explains THAT. This is my only (and, IMO, relatively trivial) point.

    2. But think if the results had turned out the opposite way and it transpired that there was a sharp division between human and non human computational abilities that corresponded somewhat to the regular/non-regular boundary? That would certainly be an interesting result; and if P is interesting, so is not P.
      And certainly I have seen people identify HCF's notion of recursion with this boundary -- even if that is a mistaken view.

    3. But the results DIDN'T do this. they at best showed, IMO, that animals have computational capacities beyond the finite state. As I noted in the post, we already knew that. What both papers suggested is that having these capacities in some way can be used to explain how we developed our linguistic capacities (dendrophilia and all that). I don't see that they did this at all. Not even a little. Moreover, the papers will serve to obfuscate the main point of what we want from an evolving account. So all in all, not much help and potential fodder for confusion.
      As for "if P is interesting, so is not P" I doubt you believe this. As an excellent friend of mine pointed out, if I could show that all cats were robots then this would be quite a find. Proving that they are not is not really research worthy.

  2. Great commentary. Really well reasoned. Thanks for providing some further insight on these issues.

  3. Great commentary. Really well reasoned. Thanks for providing some further insight on these issues.