Tuesday, February 27, 2018

Universals: structural and substantive

Linguistic theory has a curious asymmetry, at least in syntax.  Let me explain.

Aspects distinguished two kinds of universals, structural vs substantive.  Examples of the former are commonplace: the Subjacency Principle, Principles of Binding, Cross Over effects, X’ theory with its heads, complements and specifiers; these are all structural notions that describe (and delimit) how Gs function. We have discovered a whole bunch of structural universals (and their attendant “effects”) over the last 60 years, and they form part of the very rich legacy of the GG research program. 

In contrast to all that we have learned about the structural requirements of G dependencies, we have, IMO, learned a lot less about the syntactic substances: What is a possible feature? What is a possible category? In the early days of GG it was taken for granted that syntax, like phonology, would choose its primitives (atomic elements) from a finite set of options. Binary feature theories based on the V/N distinction allowed for the familiar four basic substantive primitive categories A, N, V, and P. Functional categories were more recalcitrant to systematization, but if asked, I think it is fair to say that many a GGer could be found assuming that functional categories form a compact set from which different languages choose different options. Moreover, if one buys into the Borer-Chomsky thesis (viz. that variation lives in differences in the (functional) lexicon) and one adds a dash of GB thinking (where it is assumed that there is only a finite range of possible variation), one arrives at the conclusion that there are a finite number of functional categories that Gs choose from and that determine the (finite) range of possible variation witnessed across Gs. This, if I understand things (which I probably don't (recall I got into syntax from philosophy not linguistics and so never took a phonology or morphology course)), is a pretty standard assumption within phonology tracing back (at least) to The Sound Pattern of English. And it is also a pretty conventional assumption within syntax, though the number of substantive universals we find pales in comparison to the structural universals we have discovered. Indeed, were I inclined to be provocative (not something I am inclined to be, as you all know), I would say that we have very few echt substantive universals (theories of possible/impossible categories/features) when compared to the many, many plausible structural universals we have discovered.
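For readers who have not seen it, here is the textbook cross-classification behind that four-way split, rendered as a quick sketch of my own (the feature assignments are the standard ones; nothing here goes beyond an enumeration of that familiar system):

```python
# The [±N, ±V] cross-classification of the four substantive categories:
# two binary features yield exactly four category labels.
FEATURES_TO_CATEGORY = {
    (True,  False): "N",   # [+N, -V] nouns
    (False, True):  "V",   # [-N, +V] verbs
    (True,  True):  "A",   # [+N, +V] adjectives
    (False, False): "P",   # [-N, -V] adpositions
}

for (n, v), cat in FEATURES_TO_CATEGORY.items():
    print(f"[{'+' if n else '-'}N, {'+' if v else '-'}V] -> {cat}")
```

Part of the point of the paragraph above is that no comparably small feature set has ever done the same work for the functional categories.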

Actually one could go further, so I will. One of the major ambitions (IMO, achievements) of theoretical syntax has been the elimination of constructions as fundamental primitives. This, not surprisingly, has devalued the UG relevance of particular features (e.g. A' features like topic, WH, or focus), the idea being that constructions have the properties they do not in virtue of the expressions that head them but because of the dependencies that they instantiate. Criterial agreement is useful descriptively but pretty idle in explanatory terms. Structure rather than substance is grammatically key. In other words, the general picture that emerged from GB and more recent minimalist theory is that G dependencies have the properties they have because of the structural relations they realize rather than the elements that enter into these dependencies.[1]

Why do I mention this? Because of a recent blog post by Martin Haspelmath (here, henceforth MH) that Terje Lohndal sent me. The post argues that to date linguists have failed to provide a convincing set of atomic "building blocks" on the basis of which Gs work their magic. MH disputes the following claim: that "categories and features are natural kinds, i.e. aspects of the innate language faculty" and that they form "a "toolbox" of categories that languages may use" (2-3). MH claims that there are few substantive proposals in syntax (as opposed to phonology) for such a comprehensive inventory of primitives. Moreover, MH suggests that this is not the main problem with the idea. What is? Here is MH (3-4):

To my mind, a more serious problem than the lack of comprehensive proposals is that linguistics has no clear criteria for assessing whether a feature should be assumed to be a natural kind (=part of the innate language faculty).

The typical linguistics paper considers a narrow range of phenomena from a small number of languages (often just a single language) and provides an elegant account of the phenomena, making use of some previously proposed general architectures, mechanisms and categories. It could be hoped that this method will eventually lead to convergent results…but I do not see much evidence for this over the last 50 years. 

And this failure is principled, MH argues, relying as it does on claims "that cannot be falsified."

Despite the invocation of that bugbear "falsification,"[2] I found the whole discussion disconcertingly convincing, and believe me when I tell you that I did not expect this.  MH and I do not share a common vision of what linguistics is all about. I am a big fan of the idea that FL is richly structured and contains at least some linguistically proprietary information. MH leans towards the idea that there is no FL and that whatever generalizations there might be across Gs are of the Greenberg variety.

Need I also add that whereas I love and prize Chomsky Universals, MH has little time for them and considers the cataloguing and explanation of Greenberg Universals to be the major problem on the linguist's research agenda, universals that are best seen as tendencies and contrasts explicable "through functional adaptation." For MH these can be traced to cognitively general biases of the Greenberg/Zipf variety. In sum, MH denies that natural languages have joints that a theory is supposed to cut or that there are "innate "natural kinds"" that give us "language-particular categories" (8-9).

So you can see my dilemma. Or maybe you don’t so let me elaborate.

I think that MH is entirely incorrect in his view of universals, but the arguments that I would present would rely on examples that are best bundled under the heading "structural universals." The arguments that I generally present for something like a domain specific UG involve structural conditions on well-formedness like those found in the theories of Subjacency, the ECP, Binding theory, etc. The arguments I favor (which I think are strongest) involve PoS reasoning: bridging the gap, illustrated by examples in these domains, between the PLD and the competence attained by speakers of a given G requires domain specific knowledge of a certain kind.[3]
And all of these forms of argument lose traction when the issue involves features, categories and their innate status. How so?

First, unlike with the standard structural universals, I find it hard to identify the gap between impoverished input and expansive competence that is characteristic of arguments illustrated by standard structural universals. PLD is not chock full of "corrected" subjacency violations (aka island effects) to guide the LAD in distinguishing long kosher movements from trayf ones. Thus the fact that native speakers respect islands cannot be traced to the informative nature of the PLD but rather to the structure of FL. As noted in the previous post (here), this kind of gap is where PoS reasoning lives and it is what licenses (IMO, the strongest) claims to innate knowledge. However, so far as I can tell, this gap does not obviously exist (or is not as easy to demonstrate) when it comes to supposing that such and such a feature or category is part of the basic atomic inventory of a G. Features are (often) too specific and variable, combining under a common logo various properties that seem to have little to do with one another. This is most obvious for phi-features like gender and number, but it even extends to categories like V and A and N, where what belongs where is often squishy within a G and especially so across them. This is not to suggest that within a given G the categories might not make useful distinctions. However, it is not clear how well these distinctions travel among Gs. What makes for a V or N in one G might not be very useful in identifying these categories in another. Like I said at the outset, I am no expert in these matters, but the impression I have come away with after hearing these matters discussed is that the criteria for identifying features within and across languages are not particularly sharp and there is quite a bit of cross-G variation. If this is so, then the particular properties that coagulate around a given feature within a given G must be acquired via experience with that particular feature in that particular G. And if this is so, then these features differ quite a bit in their epistemological status from the structural universals that PoS arguments most effectively deploy. Thus, not only does the learner have to learn which features his G exploits, but s/he even has to learn which particular properties these features make reference to, and this makes them poor fodder for the PoS mill.

Second, our theoretical understanding of features and categories is much poorer than our understanding of structural universals. So for example, islands are no longer basic "things" in modern theory. They are the visible byproducts of deeper principles (e.g. Subjacency). From what little I can tell, this is less so for features/categories. I mentioned the feature theory underlying the substantive N, V, A, P categories (though I believe that this theory is not that well regarded anymore). However, this theory, even if correct, is very marginal nowadays within syntax. The atoms that do the syntactic heavy lifting are the functional ones, and for these we have no good theoretical unification (at least so far as I am aware). Currently, we have the functional features we have, and there is no obvious theoretical restraint on postulating more whenever the urge arises. Indeed, so far as I can tell, there is no theoretical (and often, practical) upper bound on the number of possible primitive features, and from where I sit many are postulated in an ad hoc fashion to grab a recalcitrant data point. In other words, unlike what we find with the standard bevy of structural universals, there is no obvious explanatory cost to expanding the descriptive range of the primitives, and this is too bad, for it bleaches featural accounts of their potential explanatory oomph.

This, I take it, is largely what MH is criticizing, and if it is, I think I am in agreement (or more precisely, his survey of things matches my own). Where we part company is over what this means. For me it means that these issues will tell us relatively little about FL and so fall outside the main object of linguistic study. For MH, it means that linguistics will shed little light on FL, as there is nothing FLish about what linguistics studies. Given what I said above, we can, of course, both be right, since we are largely agreeing: if MH's description of the study of substantive universals is correct, then the best we might be able to do is Greenberg, and Greenberg will tell us relatively little about the structure of FL. If that is the argument, I can tag along quite a long way towards MH's conclusion. Of course, this leaves me secure in my conclusion that what we know about structural universals argues the opposite (viz. a need for linguistically specific innate structures able to bridge the easily detectable PoS gaps).

That said, let me add three caveats.

First, there is at least one apparent substantive universal that I think creates serious PoS problems: the Universal Base Hypothesis (UBH). Cinque's work falls under this rubric as well, but the version I am thinking about is the following. All Gs are organized into three onion-like layers, what Kleanthes Grohmann has elegantly dubbed "prolific domains" (see his thesis). Thus we find a thematic layer embedded into an agreement/case layer embedded into an A'/left periphery layer. I know of no decent argument against this kind of G organization. And if this is true, it raises the question of why it is true. I do not see that the class of dependencies that we find would significantly change if the onion were inversely layered (see here for some discussion). So why is it layered as it is? Note that this is more abstract than your typical Greenberg universal, as it is not a fact about the surface form of the string but about the underlying hierarchical structure of the "base" phrase marker. In modern parlance, it is a fact about the selection features of the relevant functional heads (i.e. about the features (aka substance) of the primitive atoms). It does not correspond to any fact about surface order, yet it seems to be true. If it is, and I have described it correctly, then we have an interesting PoS puzzle on our hands, one that deals with the organization of Gs and likely traces back to the structure of FL/UG. I mention this because, unlike many of the Greenberg universals, there is no obvious way of establishing this fact about Gs from their surface properties, and hence explaining why this onion-like structure exists is likely to tell us a lot about FL.
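To make the layering concrete, here is a toy rendering (my own illustration, not Grohmann's formalism; the layer names are just the ones used above) of the attested onion next to its conceivable but unattested inverse:

```python
# Toy sketch of the clausal "onion": an A'/left-periphery layer containing a
# case/agreement layer containing a thematic layer, versus the conceivable
# inverse ordering that Gs apparently never instantiate.

attested = ("A-bar/left periphery",
               ("agreement/case",
                  ("thematic", "V + arguments")))

inverted = ("thematic",
               ("agreement/case",
                  ("A-bar/left periphery", "V + arguments")))

def show(layer, depth=0):
    """Print each layer indented under the one that contains it."""
    label, inner = layer
    print("  " * depth + label)
    if isinstance(inner, tuple):
        show(inner, depth + 1)
    else:
        print("  " * (depth + 1) + inner)

show(attested)   # the ordering all Gs seem to respect
show(inverted)   # nothing about the dependencies themselves obviously rules this out
```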

Second, it is quite possible that many Greenberg universals rest on innate foundations. This is the message I take away from the work by Culbertson & Adger (see here for some discussion). They show that some orders within nominals relating Demonstratives, Adjectives, Numerals and head Nouns are very hard to acquire in an artificial G setting. They use this to argue that the absence of these orders as Greenberg options has a basis in how such structures are learned. It is not entirely clear that this learning bias is FL internal (it regards relating linear and hierarchical order), but it might be. At any rate, I don't want anything I said above to preclude the possibility that some surface universals might reflect features of FL (i.e. be based on Chomsky Universals), and if they do, it suggests that explaining (some) Greenberg universals might shed some light on the structure of FL.

Third, though we don't have many good theories of features or functional heads, a lazy perusal of the facts suggests that not just anything can be a G feature or a G head. We find phi-features all over the place. Among the phi-features we find that person, number and gender are ubiquitous. But if anything goes, why don't we find more obviously communicatively and biologically useful features (e.g. the +/- edible feature, or the +/- predator feature, or the +/- ready-for-sex feature, or…)? We could imagine all sorts of biologically or communicatively useful features that it would be nice for language to express structurally that we just do not find. And the ones that we do find seem, from a communicative or biological point of view, to often be idle (gender (and, IMO, case) being the poster child for this). This suggests that whatever underlies the selection of the features we tend to see (again and again) and those that we never see is more principled than "anything goes." And if that is correct, then what basis could there be for this other than some linguistically innate proclivity to press these features, as opposed to those, into linguistic service? Confession: I do not take this argument to be very strong, but it seems obvious that the range of features we find in Gs that do grammatical service is pretty small, and it is fair to ask why this is so and why many other conceivable features that we can imagine would be useful are nonetheless absent.

Let me reiterate a point about my shortcomings I made at the outset. I really don’t know much about features/categories and their uniform and variable properties. It is entirely possible that I have underestimated what GG currently knows about these matters. If so, I trust the comments section will set things straight. Until that happens, however, from where I sit I think that MH has a point concerning how features and categories operate theoretically and that this is worrisome. That we draw opposite conclusions from these observations is of less moment than that we evaluate the current state of play in roughly the same way.



[1] This is the main theme of "On Wh-Movement" and, I believe, what drives the unification behind Merge-based accounts of FL.
[2] Falsification is not a particularly good criterion of scientific adequacy, as I've argued many times before. It is usually used to cudgel positions one dislikes rather than to push understanding forward. That said, in MH's post the invocation of the F word plays little more than an ornamental role. There are serious criticisms that come into play.
[3] I abstract here from minimalist considerations, which try to delimit the domain specificity of the requisite assumptions. As you all know, I tend to think that we can reduce much of GB to minimalist principles. To the degree that this hope is not in vain, the domain specificity can be circumscribed to whatever it is that minimalism needs to unify the apparently very different principles of GB and the generalizations that follow from them.

Wednesday, February 14, 2018

The significant gaps in the PoS argument

I admit it, the title was meant to lure some in with the expectation that Hornstein was about to recant and confess (at last) to the holiness (as in full of holes) of the Poverty of Stimulus argument (PoS). If you are one of these, welcome (and gotcha!). I suspect, however, that you will be disappointed, for I am here to affirm once again how great an argument form the PoS actually is, and if not 'holy' then at least deserving of the greatest reverence. The remarks below are prompted by an observation in a recent essay by Epstein, Kitahara and Seely (EKS) (here, p. 51, emphasis mine):

…Recognizing the gross disparity between the input and the state attained (knowledge) is the first step one must take in recognizing the fascination surrounding human linguistic capacities. The chasm (the existence of which is still controversial in linguistics, the so-called poverty of the stimulus debate) is bridged by the postulation of innate genetically determined properties (uncontroversial in biology)…

This is a shocking statement! And what makes it shocking is EKS's completely accurate observation that many linguists, psychologists, neuroscientists, computer scientists and other language scientists still wonder whether there exists a learning problem in the domain of language at all. Yup, after more than 60 years of, IMO, pretty conclusive demonstrations of the poverty of the linguistic stimulus (and several hundred years of people exploring the logic of induction), the question of whether such a gap exists is still not settled doctrine. To my mind, the only thing that this could mean is that skeptics really do not understand what the PoS in the domain of language (or any other, really) is all about. If they did, it would be uncontroversial that a significant gap (in fact, as we shall see, several) exists between the evidence available to fix the capacity and the capacity attained. Of course, recognizing that significant gaps exist does not by itself suffice to bridge them. However, unrecognized problems are particularly difficult to solve (you can prove this to yourself by trying to solve a couple of problems that you do not know exist right now), and so as a public service I would like to rehearse (one more time, and with gusto) the various ways that the linguistic input (aka Primary Linguistic Data (PLD)) to the language acquisition device (LAD, aka the child) underdetermines the structure of the competence attained (the knowledge of G that a native speaker has). The deficiencies are severe, and failure to appreciate this stems from a misconception of what G acquisition consists in. Let's review.

There are three different kinds of gaps.

The first and most anodyne relates to the quality of the input. There are several ways that the quality might be problematic. Here are some.

1.     The input PLD is in the form of uttered bits of language. This gap adverts to the fact that there is a slip betwixt phrases/sentences (the structures that we know something about) and the lip(s) that utter them. So the PLD in the ambient environment of the LAD is not "perfect." There are mispronunciations, half thoughts badly expressed, lots of 'hmms' and 'likes' thrown in for no apparent purpose (except to irritate parents), misperceptions leading to misarticulations, cases of talking with one's mouth full, and more. In short, the input to the LAD is not ideal. The input data are not perfect examples of the extensional range of sentences and phrases of the language.
2.     The range of input data also falls short. Thus, since forever (Newport, Gleitman and Gleitman did the leg work on this over 30 years ago (if not more)) it has been pointed out that utterances addressed to LADs are largely in imperative or interrogative form. When talking to very young kids, native speakers tend to ask a lot of questions (to which the asker already knows the answer, so it is not a "real" question) and issue a lot of commands (actually, many of the putative questions are also rhetorically disguised commands). Kids, in contrast, are basically declarative utterers. In other words, kids don't grow up talking motherese, though large chunks of the input have a very stylized phon/syntactic contour (at least in some populations). They don't sound mothereseish and they eschew use of the kinds of sentences directed at them. So even at a gross level, what LADs hear in their ambient environment and what they themselves produce mismatch.

So, the input is not perfect and the attained competence is an idealized version of what the LAD actually has access to. Even the Structuralists appreciated this point, knowing full well that texts of actual speech needed curation to be taken as useful evidence for anything. Anyone who has ever read a non-edited verbatim text of an interview knows that the raw uttered data can be pretty messy. As indeed are data of any kind. The inputs vary in the quality of the exemplar, some being closer approximations to the ideal than others. Or to put this another way: there is a sizeable gap between the set of sentences and the set of utterances. Thus, if we assume that LADs acquire sentential competence, there is a gap to be traversed in building the former set from the latter.

The second gap between input and capacity attained is decidedly more qualitative and significant.  It is a fact that native speakers are linguistically creative in the sense of effortlessly understanding and producing linguistic objects never before experienced. Linguistic creativity requires that what an LAD acquires on the basis of PLD is not a list of sentences/phrases previously encountered in ambient utterances but a way of generating an open-ended list of acceptable sentences/phrases. In other words, what is acquired is (at least) some kind of generative procedure (rule system or G) that can recursively specify the open-ended set of linguistic objects of the native speaker's language. And herein lies the second gap: the output of the acquisition process is a G or set of rules, while the input consists of products of this G or set of rules, AND products of rules and the rules themselves (or sentences and the Gs that generate them) are ontologically different kinds of objects. Or to put this another way: an LAD does not "experience" Gs directly but only via their products, and there is a principled gap between these products and the generative procedures that generate them. Furthermore, so far as I know, nobody has ever shown how to bridge this ontological divide by, say, using conventional analytical methods. For example, so far as I know, the standard substitution methods prized by the transitional probability crowd have yet to converge on the actual VP expansion rule (one that includes ate, ate a potato, ate a potato that I bought at the store, ate a potato that I bought at the store that is around the block, ate a potato that I bought at the store that is around the block near the gin joint whose owner has a red buggy which people from Detroit want to buy for a song, and so on). All of these are VPs, but that they are all VPs is not something that artificial G learners have so far managed to capture. Recursion really is a pain.
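To see how quickly a handful of rules outruns any finite list, here is a toy sketch of a recursive VP grammar of the sort the paragraph gestures at (my own illustration, with made-up rules and vocabulary; it is not a claim about any particular learner or analysis):

```python
import random

# A few recursive rules generate an unbounded set of VPs, so no finite list of
# observed VPs exhausts the category. (Toy grammar, illustrative only.)
GRAMMAR = {
    "VP":  [["V"], ["V", "NP"]],
    "NP":  [["Det", "N"], ["Det", "N", "RC"]],
    "RC":  [["that", "I", "V", "at", "NP"]],   # the relative clause reintroduces NP: recursion
    "V":   [["ate"], ["bought"]],
    "N":   [["potato"], ["store"], ["block"]],
    "Det": [["a"], ["the"]],
}

def generate(symbol):
    """Rewrite a symbol until only words remain."""
    if symbol not in GRAMMAR:                  # terminal word
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    return [word for part in expansion for word in generate(part)]

for _ in range(3):
    print(" ".join(generate("VP")))
# e.g. "ate", "ate a potato", "bought the potato that I ate at a store that I bought at the block", ...
```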

In fact, it is worse than this. For any finite set of data specified extensionally there are an infinite number of different functions that can generate those data. This observation goes back to Hume, and has been clear to anyone that has thought about the issue for at least the last several hundred years. Wittgenstein made this point. Goodman made it. Fodor made it. Chomsky made it. Even I (standing on the shoulders of giants) have been known to make it. It is a very old point. The data (always a finite object) cannot speak for themselves in the sense of specifying a unique function that generates those data. Data cannot bootstrap a unique function. The only way to get from data to the functions that generate them is to make concrete assumptions about the specific nature of the induction, and in one way or another this means specifying (listing, ranking) the class of admissible functional targets. In the absence of this, data cannot induce functions at all. As Gs or generative procedures just are functions, there is a qualitative, irreducible ontological difference between the PLD and the Gs that the LAD acquires. There is no way to get from any data to any function (induce any G from any finite PLD) without specifying in some way the range of potential candidate Gs. Or, to say this another way, if the target is Gs and the input is a finitely specified list of data, there is no way of uniquely specifying the target in terms of the properties of the data list alone.
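A trivial arithmetic illustration of the point (mine, not the post's): any finite data set is compatible with infinitely many generating functions, and the data alone cannot choose among them.

```python
# Two functions that agree on every observed data point but diverge on unseen
# inputs; nothing in the data itself favors one over the other (or over the
# infinitely many further candidates obtained by adding multiples of x(x-1)(x-2)).
data = [(0, 0), (1, 1), (2, 4)]

f = lambda x: x ** 2                          # one hypothesis
g = lambda x: x ** 2 + x * (x - 1) * (x - 2)  # another that fits just as well

assert all(f(x) == y and g(x) == y for x, y in data)  # both fit the "PLD" perfectly
print(f(3), g(3))  # 9 vs. 15: they project differently to a novel input
```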

All of which leads to consideration of a third gap: the choice among the plausible competing Gs on which native speakers converge is underdetermined by the PLD available for making that choice. So, not only do we need some way of getting native speakers from data to functions that generate those data, but even given a list of plausible options, there exists little evidence in the PLD itself for choosing among these plausible functions/Gs. This is the problem that Chomsky (and moi) have tended to zero in on in discussions of PoS. So, for example, a perfectly simple and plausible transformation would manipulate inputs in virtue of their string properties, another in virtue of their hierarchical properties. We have evidence that native speakers do the latter, not the former. Evidence for this conclusion exists in the wider linguistic data (surveyed by the linguist and including complex cases and unacceptable data) but not in the primary linguistic data the child has access to (which are typically quite simple and well formed). Thus, the fact that LADs induce along structure-dependent lines is an induction to a particular G from a given list of possible Gs with no basis in the data justifying the induction. So not only do humans project, and not only do they do so uniformly, they do so uniformly in the absence of any available evidence that could guide this uniform projection. There are endlessly many qualitatively different kinds of inductions that could get you from PLD to a G, and native speakers project in the same way despite no evidence in the PLD constraining this projection. The choice among plausible Gs is strongly underdetermined by the PLD.
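For concreteness, here is a toy rendering of the two competing rules for yes/no question formation (the classic structure dependence example; the code is my own sketch, with the subject's span hand-coded to stand in for the hierarchical structure a real G would supply):

```python
# The string-based and the structure-dependent hypotheses for forming a yes/no
# question from "the man who is tall is happy". On simple PLD like
# "the man is happy" the two rules give the same output, which is the problem.
SENTENCE = ["the", "man", "who", "is", "tall", "is", "happy"]
AUX = {"is"}

def front_first_aux(words):
    """String-based rule: front the linearly first auxiliary."""
    i = next(i for i, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def front_matrix_aux(words, subject_len):
    """Structure-dependent rule: front the auxiliary that follows the whole subject."""
    i = next(i for i, w in enumerate(words) if w in AUX and i >= subject_len)
    return [words[i]] + words[:i] + words[i + 1:]

print(" ".join(front_first_aux(SENTENCE)))
# -> "is the man who tall is happy"   (never produced by speakers)
print(" ".join(front_matrix_aux(SENTENCE, subject_len=5)))
# -> "is the man who is tall happy"   (what speakers actually say)
```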

It is worth observing that this gap does not require that the set of grammatical objects be infinite (though making the argument is a whole lot easier if something like this is right). The point is that native speakers draw systematic conclusions about novel data. That they conclude anything at all implies something like a rule system or generative procedure. That native speakers largely do so in the same way suggests that they are not (always) just guessing.[1] The systematic projection from the data provided in the input to novel examples implies that LADs generalize from that input in the same way. Thus, native speakers project the same Gs from finite sample examples of those Gs. And that is the problem: how do native speakers bridge the gap between samples of Gs and the Gs of which they are samples in the same way, despite the fact that there are infinitely many ways to project from finite samples to functions that generate those samples? Answer: the projection is natively constrained in some way (plug in your favorite theory of UG here). The divide between sample data and the Gs that generate them is further exacerbated by the fact that native speakers converge on effectively the same kinds of Gs despite having precious little evidence in the PLD for making a decision among plausible Gs.

So there are three gaps: a gap between utterances and sentences, a gap between sentences/phrases and the Gs that generate them, and a gap between the Gs that are projected and the evidence available in the PLD for choosing among these Gs. Each gap presents its own challenges, though it is the last two that are, IMO, the most serious. Were the problem restricted to getting from exemplars to idealized examples (i.e. utterances to sentences), then the problem would be solvable by conventional statistical massaging. But that is not the problem. It is at most one problem, and not the big one. So, are there chasms that must be jumped and gaps that must be filled? Yup. And does jumping/filling them the way we do require something like native knowledge? Yup. And will this soon be widely accepted and understood? Don't bet on it.

Let me end with the following observation. Randy Gallistel recently gave a talk at UMD observing that it is trivial to construct PoS arguments in virtually every domain of animal learning. The problem is not limited to language or to humans. It saturates the animal navigation literature, the animal foraging literature, and the animal decision literature. The fact, as he pointed out, is that in the real world animals learn a whole lot from very little, while Eish theories of learning (e.g. associationism, connectionism, etc.) assume that animals learn very little from a whole lot. If this is right, then the main problem with Eish theories (which, as EKS note, still (sadly) dominate the mental and neural sciences and which lie behind the widespread skepticism concerning PoS as a basic fact of mental life in animals) is that the way they frame the problem of learning has things exactly backwards. And it is for this reason that Es have a hard time spotting the gaps. The failure of Eism, in other words, is not surprising. If you misconceive the problem, your solutions will be misconceived as well.



[1] The ‘always’ is a nod to interesting work by Lidz and friends that suggests that sometimes this is exactly what LADs do.

Friday, February 2, 2018

Evolang one more time

Footnotes to Plato is a blog run by a philosopher-biologist named Massimo Pigliucci (MP). It has lots of interesting material and I have personally learned a lot by reading it. Currently, MP is writing a multi-part commentary on a new book on evolution, Darwin's Unfinished Symphony by Kevin Laland. It's on the evolution of culture and its impact on the evolution of mind. It is actually a pretty good read and, unlike much of the literature that discusses mind and culture, it does not fall into the continuity thesis trap that takes what humans do to simply be a beefed-up version of what other animals do. In other words, it rightly treats the human case as different in kind and asks how this difference might have arisen. I don't agree with everything Laland proposes, but it starts with the right presuppositions (what humans have really is different) and proceeds from there (see here for some brief discussion).

MP’s latest installment of his running commentary on Laland’s book (here) addresses the evolution of language. In the chapter, Laland surveys traditional accounts for how language arose in the species. Here is the list that MP compiled:

  • To facilitate cooperative hunting.
  • As a costly ornament allowing females to assess male quality.
  • As a substitute for the grooming exhibited by other primate species.
  • To promote pair bonding.
  • To aid mother-child communication.
  • To gossip about others.
  • To expedite tool making.
  • As a tool for thought.
Laland finds these wanting and adds another contender: language evolved to teach relatives. Laland spends lots of time in previous parts of his book arguing that learning via imitation and observation is a key feature of biological minds and that this power promotes biological success of the evo relevant variety. In this context, it is pretty clear why the pedagogical role of language might find a spotlight: language looks like an excellent medium for molding minds (though parents and teachers might beg to differ regarding how efficient a method it is!). At any rate, Laland’s proposal is that language evolved for instructional purposes, rather than to make tool making easier, or gossip more salacious, or promote pair-bonding or, or, or… Of course, once language arrived on the evo scene it could have served all these other functions as well, but according to Laland that was not what set the whole language thing in motion. Nope, it arose so that one day we could have long and boring faculty meetings at pedagogical institutions like UMD.

MP's post critically reviews Laland's proposal and points out that it does not obviously do better on the criteria proposed than do variants of the other rejected approaches. Moreover, MP argues, all these evo scenarios share a common difficulty: because the evolution of language happened exactly once (i.e. it is a unique evo event), it's very hard to provide convincing evolutionary evidence of the sort typically on offer for the various alternative scenarios. Here is MP:

For me, though, what makes this chapter the least convincing of those we have read so far is that even if we grant Kevin everything he is arguing for, we are still left, at best, with an hypothetical scenario that falls far short of empirical verification. Yes, maybe language evolved so that we could efficiently teach valuable information to our relatives, and things then went on from there. Or maybe there is a clever variant of one of the other hypotheses now on the table that will be even more convincing than the present analysis. Or perhaps there is yet another scenario that simply nobody has thought up yet. We just don’t know. And to be honest I don’t think we are likely to know any time soon, if ever. Precisely because of a major stumbling block acknowledged by Laland himself: the evolution of language was a unique historical event, and unique historical events are exceedingly difficult (though not impossible) to study.

MP goes on to flag Lewontin’s skepticism regarding the availability of robust evolutionary accounts for cognitive traits given the paucity of footprints in the fossil record left by the exercise of such capacities (see here). Lewontin’s point, that MP endorses, is that it is unlikely that we will ever get enough evidence to build a compelling case for the evolution of any human cognitive trait, including (especially!) language given its biological uniqueness and the faint traces it physically leaves.

I agree with much of this, but I think that it misses the real problem with Laland's discussion, and with the other scenarios MP catalogues. The big hole in these accounts is that they fail to specify what exactly language is. In other words, the projects fail from the start, as they do not sufficiently specify the cognitive capacity whose evolution we are interested in explaining.[1] What exactly is it that has evolved? What are its key properties/characteristics? Only after specifying these does it make sense to ask how it and they arose. Sadly, Laland doesn't do this. Rather, he seems to presuppose that we all know what language is and so specifying the relevant capacity of interest in some detail is unnecessary. But linguists know that this is wrong. Language is not a simple thing but a very complex capacity, and so asking how it evolved is asking how all of its complex intricacies came together in humans and only in humans. So, the real problem with Laland (and MP's discussion) is not just that the relevant data bearing on evolutionary scenarios sucks (though it does) but that most of the discussions out there fail to specify what needs explaining. Only after answering this question in some detail can the evolutionary question even be broached coherently.

Let me expand on this a bit. MP starts his comment on Laland as follows:

Despite much talk of animal communication, that’s just what other species do: communicate. Language is a very special, and highly sophisticated, type of communication. Characterized by grammar, capable of recursivity, inherently open ended. Nothing like that exists anywhere else in the animal world. Why?

Given this preamble, the thing that MP (and I assume Laland) thinks needs explaining is how a certain kind of grammar based system of communication arose, with emphasis on ‘grammar’ (after all, this is one key factor that makes human communicative systems unique).

So what features does such a system have? Well, it generates unboundedly many hierarchically structured objects that pair a meaning with an articulation. But this is not all. In addition, its use is very, very labile (there is no apparent restriction on the kinds of topics it can be used to "discuss"), it exploits a lexicon several orders of magnitude larger than anything else we find in animal communication systems, and its lexical entries have semantic features quite unlike those we find with other animals. In sum, the syntax of human language, the vocab of human language and the applicability of human language are each unique.

More specifically, as GGers know, human Gs embody a very specific form of hierarchical structure (e.g. binary branching, labeled nodes), a very specific form of recursion (e.g. Merge-like rather than, say, FSG-like), and human G use is open ended in many different ways (e.g. its use is not stimulus bound (i.e. you can talk about what's not right before your eyes (viz. independently of the famous 4-Fs) or even actual), the semantics of its atoms are not referentially constricted,[2] and its domain of application seems to be topic neutral (i.e. not domain restricted like, say, bee dances or vervet alarm calls)). And all of the above is still a pretty surfacy description of just some of the distinctive features of human language (there is nothing quite like morphology evident in other communication systems either). As any GGer can attest, the empirically well-motivated descriptions available for each of these features are endless.
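Just to fix ideas about the first of these properties, here is a minimal sketch of Merge as binary, labeled set-formation (my own toy rendering; treating the label as a simple tag supplied with each application is a simplifying assumption, not a theoretical claim):

```python
# Merge combines exactly two syntactic objects into a labeled, binary-branching
# unit; because it applies to its own outputs, a single operation yields
# unbounded hierarchical recursion. (Toy sketch, not an official definition.)
def merge(alpha, beta, label):
    return (label, (alpha, beta))

dp = merge("the", "apple", label="D")   # [D the apple]
vp = merge("eat", dp, label="V")        # [V eat [D the apple]]
print(vp)                               # ('V', ('eat', ('D', ('the', 'apple'))))
```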

I could go on, but even this very cursory and brief description suffices for the main point I want to make: if these are the features that make human language unique, then the evolutionary forces Laland lists, including his own, don't in any obvious way get anywhere near explaining any of them. To wit: How does the fact that language is used to teach relatives or to gossip about them (or others) explain the fact that human Gs are hierarchically recursive, let alone recursive in the specific way that they are? How does the possibility that language promotes pair bonding or can be used to identify predators or to support good ways to hunt explain why human linguistic atoms are not particularly referentially bound? How does the claim that language can guide tool making or teach migration patterns explain why humans can use language in a non-stimulus-bound way? How do any of these "functions" explain why the domains of application of human language are so labile? They don't. Not even close. And that is the real problem. Not only is relevant evidence hard to come by (i.e. Lewontin's point) but, more importantly, the accounts are conceptually insufficient in form to explain the (acknowledged) unique features of interest. The problem, in other words, is that the proposals Laland (and MP) survey fail to make contact with the properties that need explaining. And that is far more problematic than simply being empirically hard (maybe, impossible) to verify.

Let me be a little harsher. A standard objection (again stemming from Lewontin) is that many evolutionary accounts are just-so stories. And this is correct. Many are. And this is indeed a failing. Let's even say that it is a very serious failing. But whatever their vices, just-so stories do have one vital ingredient missing from the accounts Laland and MP survey: were they accurate, they would explain the relevant feature. Why did moths go from light colored to dark when pollution arose? Because the white ones were less able to camouflage themselves and were eaten, leaving only the dark ones around. I don't care if this story is entirely correct (but see here reporting that it is). It has the right form (i.e. if correct it would explain why the moths are speckled dark). So too the stories we tell about why polar bears are white and why giraffe necks are long. However, this is precisely what is missing from most EvoLang accounts, including Laland's. Or more precisely, if the features of interest are the ones that MP notes at the outset (which, recall, MP flags as being what makes human communication systems distinctive), then it is entirely unclear how gossiping, teaching, or cooperating would fuel the emergence of a system that is recursive, non-referential, domain general and stimulus free. So, the accounts fail conceptually, not just empirically. These accounts are not even just-so adequate. And that is a big failure. A very big failure. Indeed, an irreparable one![3]

I could go further (and so I will). Given an FL like ours, which produces Gs like ours with generative procedures like ours and vocabulary items like ours, it is pretty easy to tell a story as to how such a system could be used to do wonderful things, among others teach, gossip, make tools, coordinate hunts, discuss movie reviews and more and more and more. That direction is easy. Given the characteristics of the language system, the variety of uses it can be deployed in service of is pretty easy to understand. Not so the opposite. Even if teaching or bonding or gossiping is important, it is not clear why doing any of these things demands a system with the special properties we find. One could imagine a perfectly serviceable teaching system that did not exploit lexical items with the peculiar semantic properties ours have, or did not have generative procedures that allowed for the construction of endlessly hierarchically complex structures, or that allowed for vastly different kinds of articulators (hands and tongues), or… You get the point, though, sadly, it seems to be a hard one to get. It is the point that Chomsky has been repeatedly making for quite a while now, and it correctly flags the fact that an adequate evolutionary account of a capacity logically requires a specification of the capacity whose evolution is being accounted for. This, after all, is the explanandum in any EvoLang account and, as such, is the explanatory target of any admissible explanans. Laland doesn't spend much time specifying the features that make human language unique (the ones that MP limns) and so spends no time explaining how his candidate proposal leads to communicative systems with these properties. Not surprisingly, then, the accounts he surveys and the one he provides don't explain how these capacities could have arisen, let alone how they actually did.

So, another discussion of evolang that really gets nowhere. This is nothing new, but it is sad that such smart people (and they are very very smart) are derailed in the same old uninteresting way. We really do know a lot about human language and its unique features. It would be nice if evolutionary types interested in evolang would pay some attention (though I am really not holding my breath).




[1] The very first comment on MP’s post by saphsin correctly makes this point.
[2] See here for some discussion of this and more specifically Paul Pietroski’s discussions of how little linguistic meaning has to do with truth (e.g. Paul’s contribution here and articles on his webpage here).
[3] I do know of a story that does not make this mistake and that concentrates on trying to explain some features in evolutionary terms. It's one that Bob Brandon and I provided many, many years ago here: From Icon to Symbol: Some Speculations on the Evolution of Natural Language (1986), Biology & Philosophy, Vol. 1.2, pp. 169-189. This speculative paper no doubt suffers from Lewontin's critique, but at least it tries to isolate different features of the overall capacity and say which ones might have an available evolutionary explanation. This virtue is entirely due to Robert Brandon's efforts (he is a hot shot philosopher of biology and a friend).