Thursday, October 5, 2017

Pity the non-Chomskyan!

Pity the non Chomskyans! They don’t value their work except in opposition to what Chomsky does (or doesn’t). The only glory they prize is reflected, and they will go to great lengths to sun themselves in it, even to the point of (knowingly?) distorting the mirror in which they reflect themselves. Imagine the irony that someone like me perceives. I am constantly remonstrated with for not sufficiently valuing non GG work and then when I look at some I find that the practitioners themselves only prize their research to the degree that it overturns some GG nostrum and thereby “revolutionizes” the study of language (never a brick in the wall for them, always a complete overturning of the basics). It would appear that for these investigators Chomsky has indeed defined the limits of the interesting in the study of language (a view I have some sympathy with, I would add) and that anything that does not directly address a point that he has (allegedly) made is of little value. Indeed, compared to them, my insistence that one can study language with interests orthogonal to GG’s must seem disingenuous. To non-Chomskyans, Chomsky is everywhere and always and their research is nowhere and never unless it confronts his.

A recent addition to this literature of self-loathing is making quite a splash (here, here, here, here). Part of the splash can be traced to the PR-Academy complex that I mentioned in a previous post (here). Some of the co-authors have Max Planck affiliation and so the powerful PR Wurlitzer has been fully cranked up to spread the word.

However, part of the splash is due to the claim the paper makes that Chomsky is, once again, wrong, more specifically that culture rather than biology is what drives language structure. Of course, as you all know, this is one fork in the intellectual road that any sane person should immediately take. Is it culture or biology? Well, depending on the linguistic feature of interest it could be either, neither or both.  Language, we all know, is a complex thing and the confluence of many different interacting causal forces. Everybody knows this, so it is not news (though it is often intoned as if it were a great discovery, like people noting, sagely, that any given cognitive capacity is the combination of learned and innate factors (duh!)). What is news is finding out which factor predominates for any given property of interest and how it does so. But finding this out in any given case will not (and I can guarantee this) discredit the causal efficacy of other factors in other cases. And, moreover this can hardly be news. So, this is something that GG has acknowledged for a very very very …very long time.  Even if you think, as I do, that biology (widely construed) plays the lead role in restricting the class of Generative Procedures (GP) available to human Gs, you need not think that culture (widely construed) plays no role in determining what a given G looks like. For example, why I have the G that I have is not exclusively due to my having a human FL/UG. I have an Englishy G because I grew up in an English speaking environment, was culturally exposed to Howdy Doody, Captain Kangaroo, and Rocky and Bullwinkle, and read Bill Shakespeare in high school. Many of my G’s idiosyncrasies are similarly cultural (e.g. I am a proud speaker of a dialect in which Topicalization (aka Yiddish movement) runs rampant). But I very much doubt the fact that my Topicalization forming G displays ECP effects has much to do with Rocky, Shakespeare, Bullwinkle or Sholem Aleichem. 
Here I look to biology to explain why my G obeys the ECP (and for the familiar Poverty of Stimulus reasons which I could go on about for hours (and have)). So biology AND culture, with each playing a more prominent role depending on the phenomenon of interest.

Curiously, this most obvious position is tacitly denied by non-Chomskyans. They act as if Chomskyans must think that anything languagy must reflect innate features of the mind/brain and so that if anything is shown to not be such, then this shows that Chomsky was wrong. And their obsession with showing that Chomsky is wrong suggests that they believe that unless he is, then what they have shown about, say, the influence of cultural mechanisms on some languagy fact is inherently BORING, without any possible intrinsic interest. This, at least, would neatly explain why non-Chomskyans consistently assume that Chomsky’s position consists in the absurd claim that anything involving language in any way must be innate.

You probably think that I am exaggerating here. But I am not, really. Here is the authors’ summary of the Dunn, Greenhill, Levinson, and Gray (DGLG) paper published in Nature:

Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language (1, 2). In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases (3, 4, 5), rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework (6). First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that—at least with respect to word order—cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.

Let’s engage in some initial parsing. The paper aims to show that language change (in particular word order changes in diachronically related languages) is path dependent, with different dependencies changing at different rates across different groupings of languages. DGLG concludes from this that the transitions between the languages are not driven by innate features of FL/UG, nor do they reflect systematic universal probabilistic biases. And they conclude from this that Chomsky and Greenberg must be wrong. I am not qualified to discuss Greenberg’s positions in any detail, but I would like to cast a very skeptical eye towards the claims made for Chomsky’s parameter views.

Let’s read the above précis a little more carefully.

First, DGLG focuses on “languages” and the diachronic changes between them. To be GG/Chomsky relevant, we need to unpack this and relate it to grammars. With this translation we get the following:

Grammars (G) vary widely but not without limit. The central goal of linguistics is to describe the diversity of human Gs and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language…

Second, I disagree with even this reworked version of DGLG’s claims about the aim of linguistics, at least as GG and Chomsky understand it. The ultimate aim is to describe the structure of FL/UG. A way station towards this end is to understand the structure of human Gs used by speakers of different languages. Hence, describing these (their commonalities and differences) is a useful proximate goal towards the ultimate end. Linguists have traced some of the differences between human languages to differences in the Gs that native speakers use. This implies diverse Gs and this further implies that FL must be capable of acquiring Gs at least as diverse as these (and maybe yet more diverse given that it is unclear that 7,000 (the purported number of languages out there) marks the limit of G diversity). So yes, describing the variety of human languages to the degree that it enables us to describe the variety of human Gs is a useful step in exploring the structure of FL/UG but the ultimate aim of linguistics is to understand the structure of FL/UG not to describe the diversity of human Gs, or “the diversity of human languages and the constraints on that diversity”.[1]

Third, the phrase “constraints on that diversity” is ambiguous. One reading is anodyne and correct. One aim of GG has been to describe principles of grammar invariant across Gs, the idea being that these will reveal the design features of FL/UG.[2] This does not imply that the language products of these Gs will manifest invariant patterns. To miss this is to confuse, once again, Chomsky Universals with Greenberg Universals. Two Gs may embody the very same principle and yet the products of those Gs might differ greatly. Thus, for example, Rizzi’s early proposal concerning the Fixed Subject Effect is that the ECP (which, let us say, underlies the effect) holds in both Italian and English but the two Gs derive subject A’-movement in different ways so that only English perceptibly falls under the scope of the ECP.[3] In other words, Italian manages to derivationally escape the purview of the G invariant ECP and hence does not show Fixed Subject Effects. Note, crucially, Italian does embody the ECP but it does not show Fixed Subject Effects. The “languages” (English vs Italian) differ, the invariant principle (the ECP) is the same. So the move from invariant principles to invariant language effects is not one that any GGer can or should blithely license. In sum, if you mean that a goal of Chomskyan linguistics is to describe the properties of Gs that arise as products of the design features of FL/UG then you would be correct. But this still leaves distance between this and invariant properties of languages.[4]

So rightly understood, describing G invariants is a proximate goal of GG inquiry. But this must be distinguished from a different project: explaining the limits of diversity. It is entirely possible that Gs have invariant properties without it being the case that there is a limit on G diversity. Let me explain. One of the innovations of LGB is its Principles and Parameters (P&P) architecture. The idea is that FL/UG specifies not only the invariant properties of Gs but all of the ways that Gs could possibly differ. These differences are coded as a finite number of two valued parameters with a given G being a (vector) specification of these specific values. As P&P was understood to have a finite number of parameters, say N, and as they could only bear one of two values this meant that there were at most 2^N distinct Gs that FL/UG permitted. On this LGB/P&P conception there is a reasonable sense in which FL/UG could explain the limits of G diversity. So, DGLG are correct in thinking that some version of GG, the LGB/P&P theory, aimed to place strong limits on G diversity.
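For readers who like to see the arithmetic, here is a minimal sketch of the 2^N point. It assumes, purely for illustration, that a G is nothing more than a tuple of N binary parameter values; the function name `all_grammars` and the choice N = 4 are my own inventions and correspond to no actual proposal about what the parameters are.

```python
from itertools import product

def all_grammars(n_params):
    """Enumerate every assignment of two values (0 or 1) to n_params parameters.

    Each tuple stands in for one G in a toy P&P space (a hypothetical encoding).
    """
    return list(product((0, 1), repeat=n_params))

N = 4  # illustrative only; nobody claims there are four parameters
grammars = all_grammars(N)

# The size of the space is 2 ** N, which is the upper bound on distinct Gs.
print(len(grammars) == 2 ** N)
```

The only thing the sketch shows is that a finite list of binary parameters caps the number of Gs. It is silent on how a learner picks one of them, or on how a population moves from one to another.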

However, this theory did not address how parameters were set or how parameters changed over time as Gs changed. P&P theories must be supplemented with theories of learning/acquisition to provide a theory of language change. Or, even if you think that there are only a finite number of Gs because any G is simply a list of finite parameter values, you still need a theory of how parameters are fixed to explain how parameters change over time. Now, one theory of G change would be that it tracks some intrinsic structure/fault lines of the parameter space (e.g. parameter 1 links to 2 and to 4 so that if you change the value of 1 from A to a then you need to change the values of 2 from B to b and 4 from C to c). This is one possible theory. Call it an endogenous theory of parameter setting (EnPS). EnPS accounts would provide FL/UG internal paths along which G change would occur and would provide a very strong implicit theory on the dynamics of G diversity. It would not only explain what the range of possibility is, but would also specify the possible range of changes between Gs that is available. Note that this kind of view need not endorse the position that all G change is canalized by FL/UG. It is possible that some changes are and some are not. But the strongest view would aim to predict the dynamics of G change entirely from the endogenous structure of FL/UG.

To my knowledge nobody has ever proposed such a view. In fact, to my knowledge, such a view has been understood to be very problematic, the reason being that the more the parameters are mutually dependent, the harder the problem of incremental parameter setting becomes. In fact, were all the parameters to speak to one another (i.e. the value of any parameter being conditional on the value of every other parameter) the problem of incremental parameter setting becomes effectively impossible.

Dresher and Kaye discussed this first a while ago as regards stress systems, and Fodor and Sakas have explored this in detail as regards syntactic parameters. The solution has been to try and identify linguistic “triggers,” types of data that relied exclusively on the value of a single parameter. Triggers, in other words, are ways of trying to finesse the intractability of incremental parameter setting without denying that parameters are inter-twined. The idea is that their intermingling need not appear everywhere in the PLD and that all that setting requires is that there be some PLD data that unambiguously reveals what value a given parameter has. In other words, the idea is that in some domains the parameters function as if independent of one another (do not interact) and this relieves the computational problem that intertwining presents.
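The computational point behind triggers can be put as a toy contrast. The sketch below is not Dresher and Kaye's or Fodor and Sakas's model; it is a deliberately crude illustration under two assumptions of my own: a G is a tuple of binary values, and a "trigger" is a datum that unambiguously reveals the value of exactly one parameter. The names `learn_with_triggers` and `learn_by_search` are hypothetical.

```python
from itertools import product

def learn_with_triggers(triggers, n_params):
    """Set each parameter independently from unambiguous data.

    `triggers` is an iterable of (parameter_index, value) pairs.
    The work grows with the number of parameters, not with 2 ** n_params.
    """
    grammar = [None] * n_params
    for index, value in triggers:
        grammar[index] = value
    return tuple(grammar)

def learn_by_search(is_compatible, n_params):
    """Without triggers, the data can only accept or reject a whole grammar.

    Returns (grammar, number_of_candidates_tested); in the worst case every
    one of the 2 ** n_params candidates has to be tried.
    """
    tested = 0
    for candidate in product((0, 1), repeat=n_params):
        tested += 1
        if is_compatible(candidate):
            return candidate, tested
    return None, tested

target = (1, 0, 1, 1)  # an arbitrary toy target G
triggers = list(enumerate(target))  # one unambiguous datum per parameter

print(learn_with_triggers(triggers, len(target)) == target)
found, tested = learn_by_search(lambda g: g == target, len(target))
print(found == target, tested <= 2 ** len(target))
```

With triggers the learner visits each parameter once. Without them it may have to walk through the whole space of Gs, which is why fully intertwined parameters make incremental setting look hopeless.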

Why do I mention this? Because the bulk of work on parameters has not been in trying to limn their interdependencies, but to isolate them and render them relatively independent so that incremental parameter setting be possible. In other words, so far as I know, there has been precious little work on, or commitment to, an EnPS kind of theory within Chomskyan GG. Moreover, and this is the important bit, a Chomskyan theory of GG does not need such an account. In other words, if it is true, then it is very interesting, but there is nothing as regards the Chomskyan project that requires that something like this be available. It is entirely consistent with that project that there be no explicit or implicit dynamics coded in the parameter space. So asserting that the absence of such a theory of parameters challenges the Chomsky conception of FL/UG (which is what DGLG does) is just plain wrong. Or, to put this another way: claiming that there is a richly structured FL/UG is compatible with the claim that FL/UG does not determine how populations of speakers move from one G in the space of possible Gs to another.[5]

We can in fact go further. As those who have read some linguistics since LGB know, the idea that FL/UG contains a finite list of parameters that delimit the range of possible Gs has been under debate. We have even talked about this on FoL (e.g. here, here, here, here). What’s important here is that the parameters part of P&P is not an intrinsic part of the Chomskyan problematic. It might be true, but then it might not be. There are theories like GB that endorse a P&P architecture and there are accounts like that in Aspects or LSLT (and, from the way I read it, current MP accounts) that do not. If FL/UG has a list of specified parameters, that would be an amazing and remarkable discovery. But the evidence is not overwhelming that this is the case (so far as I can tell as a non expert in these matters) and if it is not the case it does not mean that Chomsky is wrong in claiming that we have an innately specified FL/UG that limits the properties of human Gs. All it means is that there are some features of Gs about which FL/UG is mute. Happily, that would leave something for non syntacticians to do (e.g. provide theories of learning that would address how we go from PLD to Gs given FL/UG (e.g. as Yang and Lidz do, for example)).

In sum, DGLG’s claims about the implications of their results for the Chomsky enterprise are flatly wrong, and in two ways. First, nobody has proposed the kind of theory that DGLG’s data is meant to refute and second the Chomskyan conception does not need a theory of the kind that DGLG’s data is meant to refute. So, whether or not DGLG is correct is at right angles to Chomsky’s central claims. And, what is more, I suspect that DGLG knows this (or at least should have). Let me say a word about this.

DGLG cites LGB and Baker’s book as the source of the ideas that the paper argues to be incorrect. However, DGLG cites no specific passages or pages for this claim. Why not? When Chomsky goes after someone critically he does so chapter and verse. He quotes exactly what his protagonist says before arguing that it is incorrect. This is not what Chomsky’s critics generally do. Rather they assert in very broad brushstrokes what Chomsky’s views are and then go on to state that they are inadequate.[6] The problem is that what they criticize is often not his views. The fact that this happens so often leads me to think that this is not accidental. Either critics do not care what his views are (they only care to discredit them so as to discredit him) or they are too lazy to do serious criticism. I am not sure which is worse, but they are both serious intellectual failings.

What I did not realize until recently is that Chomsky’s critics might well be motivated less by malice and sloth than by a deep intellectual insecurity. Many of Chomsky’s critics are upset by the possibility (fact?) that he does (might?) not care about what they are doing. What motivates some critics, then, is the suspicion (fear?) that what they are doing is of little value. To assuage this desperation, they orient their conclusions as rebuttals of Chomsky’s putative views. Why? Because they are sure that what Chomsky does is interesting and so they reassure themselves that their work has value by arguing that it shows that his views are wrong. The implication is that if this were not the case (and very often it is in fact not the case as the empirical conclusions are generally irrelevant to Chomsky’s claims (Everett is the poster child for this)) then their own work would be boring and of dubious interest.[7] And I thought that I was in thrall to Chomsky![8]

Let me end with one more point regarding the DGLG paper and then point you to a very good review.

DGLG focuses entirely on word order. What DGLG means by “linguistic structure” is word order properties of utterances/sentences. It says this very explicitly. But if this is the focus then DGLG must know that it will be of dubious relevance to Chomsky’s central claims, since Chomsky, as everyone knows by now, considers word order effects to be at best second order (and maybe even less relevant) as regards the central features of FL/UG. Word order effects are, of course, centrally relevant to Greenberg’s conceptions and there are GGers who have concentrated on this (e.g. Kayne) and we have even covered some of this in FoL (see Culbertson and Adger here). But as regards Chomsky’s views, word order effects are decidedly secondary. Indeed, from what I know of Chomsky’s views, he might agree that word order effects are entirely “cultural” in the sense of driven by the properties of the child’s ambient linguistic environment.[9] So far as I can tell, nothing Chomsky has said in recent years (or before) would be inconsistent with this. So the fact that DGLG knowingly focuses on the kinds of effects that the authors (or at least some of the authors (I am looking at you Mr Levinson)) know are at right angles to Chomsky’s concerns further buttresses my conclusion that DGLG thinks its results worthless unless they directly gainsay Chomsky’s views. Sadly, if this is DGLG’s position, then worthless it is. The paper even if completely correct scarcely bears on Chomsky’s central claims. Fortunately, the DGLG conclusions are worth thinking about, IMO, even if they do not bear on Chomsky’s views at all. Let me turn to this briefly.

I have spent a lot of time pooping on the DGLG paper’s claim that it overturns some central Chomskyan dogma. However, unlike the authors, I am not so sure that its results are uninteresting unless they bear on Chomskyan/GG concerns. My tastes, then, are more catholic than DGLG’s. I believe that finding that G change is path dependent is potentially very interesting, even if not all that new. It is not a new idea as it is already embodied in the position that language contact can affect how Gs change (how Gs change (and maybe even their rates of change) is likely a function of the specific properties of Gs in contact). If so, change will be path dependent. Indeed, from what I know this is the standard view, which is why Bickerton’s contrary claim is so contentious.

Mark Liberman has an excellent discussion of the DGLG paper (here) that touches on this point as well as many others way above my pay grade (damn I wish I knew more stats). There is also some interesting response by Greenhill in the comment section of Mark’s post, though I personally think that he fails to engage the main point concerning the large number of ignored dimensions and the kind of structure they might contain. As Mark observes, there is lots of room in these ignored dimensions for an EnPS story should one care to make one. 

I also think that Liberman’s last point touches on something critical. As he points out, at least as regards GGers who work on G change (like Tony Kroch or David Lightfoot or Ian Roberts): “…features like “OBV” (the code for whether objects follow verbs) should be seen as superficial grammatical symptoms rather than atomic grammatical traits” (3). This points to a larger problem of the relevance of DGLG to GG research into diachrony; DGLG takes the project to be language change rather than G change (as I noted above, these are not the same thing). Greenhill responds to this that these categories are not his, but those that other people have identified, DGLG just aiming to test them. The problem Mark is pointing to is that they are the wrong things to test, at least if one’s interests lie with the structure of Gs.  Going from overt language to G rules/parameters is not straightforward (see Dresher and Kaye, and Fodor and Sakas). What is relevant to speakers qua Language Acquisition Devices is the features of the Gs, so abstracting from this is, as Mark observes, a problem.

Ok, should you read the DGLG paper? Sure. It is very short and potentially interesting (though, IMO, inevitably overhyped) and, as Mark Liberman notes, the product of a lot of hard work. But, it is also deeply misleading and, IMO, borders (well, IMO, crosses the border to) dishonest. The source of the dishonesty is likely overdetermined. I mentioned malice and sloth. But I suspect that intellectual insecurity is really what is driving the anti GG, anti Chomsky slant. Anti Chomskyans do not have the courage of their stated interests. So, when you are done reading DGLG, spend a second mourning the sad plight of the non Chomskyan. Only by being anti can they be at all. Sad really. And I would be greatly sympathetic regarding this insecurity were they not sullying the intellectual landscape in trying to convince themselves of the value of their research.

[1] An analogy: the aim of astronomy is not to describe the motion of the planets but to describe the forces that make the planets move as they do. Of course an excellent first step towards the latter is a decent description of the former. But the two goals are not identical. Moreover, as Newton discovered, describing the motion of non-planets here on earth is also a useful step in exposing the forces at work out there in the heavens.
[2] Though, IMO, looking for common detectable features across individual Gs is not as useful as generally supposed.
[3] Perceptible to the linguist that is, not the LAD as the evidence will be of the negative variety; fixed subject violations lead to unacceptability.
[4] My reading of Greenberg is that his project was to identify invariant properties of languages. From what I’ve seen (which is limited) my impression is that typologists are pretty skeptical that many of these exist. If this is right, then it seems as regards “surfacy” language properties, invariants are pretty hard to come by. This would be no surprise to a Chomskyan GGer.
[5] Which is not to say that this might not be a very interesting question to investigate, and GGers did so. See Berwick and Niyogi’s early work on this in a kind of Neo-Darwinian setting.
[6] I want to emphasize the broad brushstroke nature here. Like any interesting view, Chomsky’s has several interacting sub-parts and is based on several assumptions. It is entirely possible (probable?) that he may be right in some ways and wrong in others (as is true of most everyone’s views). The goal of a critic is to isolate how someone is wrong, and that means getting into details. What specific assumption is false? What particular inference would we like to challenge? Quoting chapter and verse forces one to zero in on the problem. Citing LGB as the source does not. So, not only is this lazy, but it allows a critic to let him/herself off the hook and allows him/her to avoid explaining in detail how someone is wrong. As criticism is valuable in allowing us to isolate troublesome assumptions, this kind of lazy citation promotes obfuscation. So what part of Chomsky’s assumptions are wrong? Well you know the LGB part. Really? C’mon.
[7] I personally would not conclude this. But it provides a reasonable motive for the otherwise inexplicable regular incapacity of Chomsky’s critics to get his views right.
[8] It is interesting that Chomsky does not feel equally threatened by work different from his own. He just gets on with it. Yes, he responds to critics. But what he mainly does is define the project and get on with it. It would be nice if this were the norm.
[9] To be honest, I do not know what ‘cultural’ means. I am assuming it means to contrast with biological, memes and all that. In effect, fixed via something like learning rather than fixed by biological make-up.


  1. David Foster Wallace wrote in Infinite Jest that 'defining yourself in opposition to something is still being anaclitic on that thing, isn’t it?'

    Orienting one's scientific pursuits solely around a quest to undermine and out-do some figurehead usually stymies originality and, as a consequence, reduces the ultimate long-term impact - probably one of the reasons why so many of these anti-generative papers hit the headlines for a few weeks and then quietly disappear and are never cited again.

  2. Are you conflating (or confusing) two very different papers here? A quick glance shows that the new Greenhill et al. PNAS paper has no mention of Chomsky or GG or UG anywhere, nor do their press materials. Just as well since it's irrelevant: the paper deals with other matters. So when you write "However, part of the splash is due to the claim the paper makes that Chomsky is, once again, wrong (...)" I wonder what claim you're referring to.

    The tired "UG is wrong" Newsweek coverage you link to does refer to the paper but appears to be concocted entirely by Newsweek and bears no relation to the actual claims of the paper or of the "PR Wurlitzer" materials as far as I can see. Finally, the Dunn et al. Nature paper that you devote most of this post to is old news — it's from 2011.