Comments

Tuesday, October 31, 2017

Scientific myths

Like any other organized body of doctrine, science has its founding myths. Modern science has two central ones: (i) That there exists a non-trivial scientific method (SM)[1] and (ii) that theory in science plays second fiddle to observation/experiment. Combine (i) and (ii) and we reach a foundational principle: good science practice discards theories when they clash with experimental observation.  And, maybe just as important, bad science involves holding onto theories that conflict with experiment/observation. Indeed, from the perspective of SM perhaps the cardinal irrationality is theoretical obstinacy in the face of recalcitrant data.[2] 

There are several reasons for the authority of this picture. One is that it fits snugly with the Empiricist (E) conception of knowledge. As I’ve noted before (much too often for many of you I suspect) E is both metaphysically (see here) and epistemologically (see here) suspicious of the kind of generalizations that are required by theory.

Metaphysically, for E, observation comes first and generalizations second, the latter being summations of the former. As theories, at least good ones, rest on generalizations, and given that they are only as good as the data that they “generalize,” it is not surprising that when the two come into conflict, theories are the things that must yield. Theories and laws are, for E, useful shorthand compendia of the facts/data/observations, and shorthands are useful to the degree that they faithfully reflect that which they hand shortly.

Epistemologically, generalizations are etiologically subsequent to observations. In fact they are inductively based on them and, in the limit, should do nothing more than summarize them. Good scientific practice should teach how to do this, should teach how to eliminate the “irrationalities” that waylay legit inductions, forcing them away from the data that they are (or should be) built on. So again, in practice, when data and generalization/theory conflict, the problem likely lies with some illegitimate bias tripping up the induction from data to generalization/theory.

There is a less highfalutin reason that makes the twofold myth above attractive to scientists. It lends them authority. On this view, scientists are people who know how to see the world without illusion or distortion. They are privy to a method that allows them to find the truth (or at least not be distracted from it). So armed, scientists have a kind of epistemological expertise that makes their opinions superior to those of the untrained. The speculations of scientists are grounded in and regulated by the facts, unlike the theories (and prejudices) of the unwashed. Observe that here the scientific opinion is grounded in reality, and is not just one opinion among many. Being scientific endows legitimacy. Being non-scientific removes it.

This thinking is a holdover from the old demarcation debates between science and non-science (i.e. religion, ethics, prejudice, etc.) that the Positivists loved to engage in. Within philosophy proper the idea that there exists a sharp demarcation between science and non-science is not well regarded anymore. In fact, it has proven very difficult to find any non-circular ways of establishing what belongs on either side of the divide. But scientists don’t always hold the wisdom of philosophers in high regard, especially when, IMO, it serves to puncture their self-regard and forces them to reconsider the degree to which their scientific credentials entitle them to automatic deference in the public sphere. Most everyone enjoys the deference that legitimate authority confers, and science’s access to such revolves around the combo of (i) and (ii) above. The bottom line is that the myth buttresses a very flattering view: scientists think more clearly and so see better because their views are based in the facts and so deserve deferential respect.

So here are two reasons that the founding myth has proven so strong. But there are other reasons too. We tend to tell our stories (at least our mythical ones) about how science advances largely in terms that fit this picture. A recent short Aeon paper discusses one such founding myth involving that great scientific hero Galileo and the Copernican world view. Btw, I am one of those that count both as heroes. They really were pretty great thinkers. But as this paper notes, what made them such is not that they were unafraid to look at the facts while their opponents were mired in prejudice. Nope. There was a real debate, a scientific one, based on a whole series of interacting assumptions, both empirical and theoretical. And given the scientific assumptions of the time, the push back against Galileo’s Copernicanism was not irrational, though it proved to be wrong.[3]

The hero of the short piece is one Johann Locher. He was a Ptolemaist and believed that the earth was the center of the universe. But he made his case in purely scientific terms, the biggest problem for the Copernican vision being the star-size problem, which it took some advances in optics to square away. But, as the piece makes clear, this is not the view we standardly have. The myth is that opposition to Galileo/Copernicus involved disregard for the facts driven by religious prejudice. This convenient account is simply false, though part of the reason it became standard is Galileo’s terrific popular polemic in his Dialogue Concerning the Two Chief World Systems.

Christopher Graney, the author of the short piece, thinks that one baleful result of the scientific caricature Galileo unleashed is today’s science skepticism. He believes that today’s skeptics “… wrap themselves in the mantle of Galileo, standing (supposedly) against a (supposedly) corrupted science produced by the ‘Scientific Establishment’” (3). This may be so. But I doubt that this is the only, or most important problem with the myth. The real problem (or another problem) is that the myth sustains a ghostly version of the demarcation criterion among working scientists. Here’s what I mean.

As I’ve noted before, there is a general view among scientists that data trumps, or should trump, theory. The weak version of this is unassailable: when data and theory clash then this constitutes a prima facie problem for theory. But this weak version is compatible with another weak view: when data and theory clash this constitutes a prima facie problem for the data. When there is a clash, all we know, if the clash is real, is that there is either a problem with the theory or a problem with the data, and that is not knowing very much. No program of action follows from this. Is it better to drop/change the theory to handle the data or to reanalyze the data to save the theory? Dunno. The clash tells us nothing. However, due to the founding myth, the default view is that there is something wrong with the theory. This view, as I’ve noted, is particularly prevalent in linguistics IMO and leads the field to dismiss theory and to exalt description over explanation. So, for example, missing a data point is considered a far worse problem than having a stilted explanation. Ignoring data is being unscientific. Eschewing explanation is just being cautious. The idea really is that facts/data have an integrity that theories do not. This asymmetric attitude is a reflection of Science’s founding myths.

So where does this leave us? The aim of science is to understand why things are as they are. This involves both data and theory in complex combinations. Adjudicating between accounts requires judgment that rarely delivers unequivocal conclusions. The best we can do is hold onto two simple dicta and try to balance them: (i) never believe a theory unless grounded in the facts and (ii) never believe a fact unless grounded in a theory. It is the mark of a truly scientific temperament, in my view, that it knows how to locally deploy these two dicta to positive effect in particular circumstances. Unfortunately, doing this is very difficult (and politically it is not nearly as powerful as holding the first of these exclusively). As Graney notes, “science has always functioned as a contest of ideas,” not just a contest of competing observations and data points. Facts (carefully curated) can tell us how things are. But scientific explanation aims to explain how things must be, and for this, facts are not enough.



[1] By this I mean a substantive set of precepts rather than cheers of the sort “do your best in the circumstances at hand,” or as Percy Bridgman said “use your noodle and no holds barred.”
[2] The reply to this is well known: a theory is responsible for relevant data and what exactly counts as relevant often requires theory to determine. But I put such niceties aside here.
[3] A most amusing discussion of this period can be found in Feyerabend’s writings. He notes that there were many reasons to think that looking through a telescope was hardly an uncontroversial way of establishing observations. He is also quite funny and does a good job, IMO, of debunking the idea that a non-trivial SM exists. By non-trivial I intend something other than “do your best in the circumstances.”

Tuesday, October 24, 2017

So not only me on why only us

The facts are clear: nothing does language like humans do. Nothing even comes close. I've repeatedly made this point. But it comes with pushback, often from Darwinian acolytes who insist that this cannot be so. Such qualitative divides are biologically unbridgeable, and so what I deem obvious cannot be so. It must be that other animals also do language, at least in part, and that what we do is just a souped-up version of what they do.

I mention this because every now and then I come across an evo biologist who sees exactly what I do: that we are linguistically unique. And sees this in roughly the way I do, as obvious! Here is a recent discovery.

Massimo Pigliucci has a blog, Footnotes to Plato (which I recommend btw), and here he discusses various issues in biology and philosophy. He also gives extended reviews of books. His latest post (here) discusses a recent book by Kevin Laland which touches on the topic of human uniqueness. Not only does nothing do language like we do, but nothing does culture like we do and nothing does mind reading like we do and ... (no doubt all of these facts are related, though how is as yet unclear). At any rate, the facts are clear: "...if a complex mind, language and a sophisticated culture are truly advantageous for survival and reproduction, why did they evolve only in the human lineage?" (1).

Thems the facts. The biological problem is how to explain this. A good first step involves understanding the contours of the problem and this involves recognizing the obvious.

It will also require more: precisely identifying those properties that we have that are unique. If it is language, then what about language is species specific? You know the MP line; it's recursion. But there may be a lot more (e.g. the labile nature of our lexical items). And once one has identified these features we need to ask what mental powers they require. These are first steps towards a rational discussion, not final ones.  Sadly, they are rarely taken. But don't believe me on this. Read Pigliucci's post and his discussion of the push back one gets from spotting the obvious.

So, take a look at the post and at the book (something I have not yet done but intend to do). It looks like there may be someone worth talking to out there.

Monday, October 23, 2017

The future of (my kind of) linguistics

I have been pessimistic of late concerning the fate of linguistics.  It’s not that I think it is in intellectual trouble (I actually cannot think of a more exciting period of linguistic research), but I do think that the kind of linguistics I signed up for as a youth is currently lightly prized, if at all. I have made no secret of this view. I even have a diagnosis. I believe that the Minimalist Program (MP) has forced to the surface a tension that was inchoate in the field since its inception 60 or so years ago. Sociologically, within the profession, this tension is being resolved in ways that disfavor my conception of the enterprise. You have no doubt guessed where the tension resides: in the languist-linguist divide. Languists and linguists are interested in different problems and objects of study. Languists mainly care about the subtle ways that languages differ. Linguists mainly care about the invariances and what these tell us about the overarching capacities that underlie linguistic facility. Languists are typologists. Linguists are cognitivists.

Before the MP era, it was pretty easy to ignore the different impulses that guide typological vs cognitive work (see here for more discussion). But MP has made this harder, and the field has split. And not evenly.  The typologists have largely won, at least if one gauges this by the kind of work produced and valued. The profession loves languages with all of their intricacies and nuances. The faculty of language, not so much. As I’ve said many times before, and will repeat again here, theoretical work aimed at understanding FL is not highly valued (in fact, it is barely tolerated) and the pressures to cover the data far outweigh demands to explain it. This is what lies behind my pessimistic view about the future of (my kind of) linguistics. Until recently. So what happened? 

I attended a conference at UMD sponsored by BBI (Brain and behavior initiative) (here). The workshop brought together people studying vocalization in animals and linguists and cog-neuro types interested in language.  The goal was to see if there was anything these two groups could say to one another. The upshot is that there were potential points of contact, mainly revolving around sound in natural language, but that as far as syntax was concerned, there is little reason to think that animal models would be that helpful, at least at this point in time. Given this, why did I leave hopeful?  Mainly because of a great talk by David Poeppel that allowed me to glimpse what I take to be the future of my brand of linguistics. I want to describe to you what I saw.

Cog-neuro is really really hard. Much harder than what I do. And it is not only hard because it demands mastery of distinct techniques and platforms (i.e. expensive toys) but also because (and this is what David’s talk demonstrated) to do it well presupposes a very solid acquaintance with results on some branch of cognition. So to study sound in humans requires knowing a lot about acoustics, brain science, computation, and phonology. This, recall, is a precondition for fruitful inquiry, not the endpoint. So you need to have a solid foundation in some branch of cognition and then you need to add to this a whole bunch of other computational, statistical, technical and experimental skills. One of the great things about being a syntactician is that you can do excellent work and still be largely technically uneducated and experimentally inept. I suspect that this is because FL is such a robust cognitive system that shoddy methods suffice to get you to its core general properties, which is the (relatively) abstract level that linguists have investigated. Descending into wetware nitty gritty demands loosening the idealizations that the more abstract kind of inquiry relies on and this makes things conceptually (as well as practically) more grubby and difficult. So, it is very hard to do cog-neuro well. And if this is so, then the aim of cognitive work (like that done in linguistics) is to lighten cog-neuro’s investigative load. One way of doing this is to reduce the number of core operations/computations that one must impute to the brain. Let me explain.

What we want out of a cog-neuro of language is a solution to what Embick and Poeppel call the mapping problem: how brains execute different kinds of computations (see here). The key concept here is “the circuit,” some combination of brain structures that embody different computational operations. So part of the mapping problem is to behaviorally identify the kinds of operations that the brain uses to chunk information in various cognitive domains and to figure out which brain circuits execute them and how (see here for a discussion of the logic of this riffing on a paper by Dehaene and friends). And this is where my kind of linguistics plays a critical role. If successful, Minimalism will deliver a biologically plausible description of all the kinds of operations that go into making a FL.  In fact, if successful it will deliver a very small number of operations very few of which are language specific (one? Please make it one!) that suffice to compute the kinds of structures we find in human Gs. In this context, the aim of the Minimalist Program (MP) is to factor out the operations that constitute FL and to segregate the cognitively and computationally generic ones from the more bespoke linguistic ones. The resulting descriptive inventory provides a target for the cog-neuro types to shoot at.

Let me say this another way. MP provides the kind of parts list Embick and Poeppel have asked for (here) and identifies the kinds of computational structures that Dehaene and company focus on (here). Putting this another way, MP descriptions are at the right grain for cog-neuro redemption. It provides primitives of the right “size” in contrast to earlier (e.g. GBish) accounts and primitives that in concert can yield Gs with GBish properties (i.e. ones that have the characteristics of human Gs).

So that’s the future of my brand of linguistics, to be folded into the basic wisdom of the cog-neuro of language. And what makes me hopeful is that I think that this is an attainable goal. In fact, I think that we are close to delivering a broadly adequate outline of the kinds of operations that go into making a human FL (or something with the broad properties of our FL) and separating out the linguistically special from the cognitively/computationally generic. Once MP delivers this, it will mark the end of the line of investigation that Chomsky initiated in the mid 1950s into human linguistic competence (i.e. into the structure of human knowledge of language). There will, of course, be other things to do and other important questions to address (e.g. how do FLs produce Gs in real time? How do Gs operate in real time? How do Gs and FLs interact with other cognitive systems?) but the fundamental “competence” problems that Chomsky identified over 60 years ago will have pretty good first order answers.

I suspect that many reading this will find my views delusional, and I sympathize. However, here are some reasons why I think this.

First, I believe that the last 20 years of work has largely vindicated the GB description of FL. I mean this in two ways: (i) the kinds of dependencies, operations, conditions and primitives that GB has identified have proven to be robust in that we find them again and again across human Gs. (ii) these dependencies, operations, conditions and primitives have also proven to be more or less exhaustive in that we have not found many additional novel dependencies, operations, conditions and primitives despite scouring the world’s Gs (i.e. over the last 25 years we have identified relatively few new potential universals). What (i) and (ii) assert is that GB identified more or less all the relevant G dependencies and (roughly) accurately described them. If this is correct (and I can hear the howls as I type) then the MP strategy of taking these to be legit explananda (in the sense of providing solid probes into the fundamental structure of FL) is solid, and explaining these features of FL will suffice to explain why human FLs have the features they do. In other words, deriving GB in a more principled way will be a solid step in explaining why FL is built as it is and not otherwise.

Second, perhaps idiosyncratically, I think that the project of unifying the modules and reducing them to a more principled core of operations and principles has been quite successful (see three part discussion ending here). As I’ve argued before, the principal criticisms I have encountered wrt MP rest on a misapprehension of what its aims are.  If you think of MP as a competitor to GB (or LFG or GPSG or Construction Grammar or…) then you’ve misunderstood the point of the program. It does not compete with GB. It cannot, for it presupposes it. The aim is to explain GB (or its many cousins) by deriving its properties in a more principled and perspicuous way. This would be folly if the basic accuracy of GB was not presupposed. Furthermore, MP so understood has made real progress IMO, as I’ve argued elsewhere. So GB is a reasonable explanandum given MP aims, and Minimalist theories have gone some way in providing non-trivial explanantia.

Third, the MP conception has already animated interesting work in the cog-neuro of language. Dehaene, Friederici, Poeppel, Moro and others have clearly found the MP way of putting matters tractable and fecund. This means that they have found the basic concepts engageable, and this is what a successful MP should do. Furthermore, this is no small thing. This suggests that MP “results” are of the right grain (or “granularity” in Poeppel parlance). MP has found the right level of abstraction to be useful for cog-neuro investigation and the proof of this is that people in this world are paying attention in ways that they did not do before. The right parts list will provoke investigation of the right neural correlates, or at least spur such an investigation.

Say I am right. What comes next? Well, I think that there is still some theoretical work to do in unifying the modules and then investigating how syntactic structures relate to semantic and phonological ones (people like Paul Pietroski, Bill Idsardi, Jeff Heinz, and Thomas Graf are doing very interesting work along these lines). But I think that this further work relies on taking MP to have provided a pretty good account of the fundamental features of human syntax.

This leaves as the next big cognitive project figuring out how Gs and FL interact with other cognitive functions (though be warned, interaction effects are very tough to investigate!). And here I think that typological work will prove valuable. How so?

We know that Gs differ, and appear to differ a lot. The obvious question revolves around variation: how does FL build Gs that have these apparently different features (are they really different or only apparently so? And how are the real differences acquired and used?). Studying the factors behind language use will require having detailed models of Gs that differ (I am assuming the standard view that performance accounts presuppose adequate competence models). This is what typological work delivers: solid detailed descriptions of different Gs and how they differ. And this is what theories of G use require as investigative fodder.

Moreover, the kinds of questions will look and feel somewhat familiar: is there anything linguistically specific about how language is used or does language use exploit all the same mechanisms as any other kind of use once one abstracts from the distinctive properties of the cognitive objects manipulated? So for example, do we parse utterances differently than we do scenes? Are there linguistic parsers fitted with their own special properties or is parsing something we do pretty much in the same way in every domain once we abstract away from the details of what is being parsed?[1] Does learning a G require different linguistically bespoke learning procedures/mechanisms? [2] There is nothing that requires performance systems to be domain general. So are they? Because this kind of inquiry will require detailed knowledge of particular Gs it will allow for the useful blurring of the languistics/linguistics divide and allow for a re-emergence of some peaceful co-existence between those mainly interested in the detailed study of languages and their differences and those interested in the cognitive import of Gs.

Let me end this ramble: I see a day (not that far off) when the basic questions that launched GG will have been (more or less) answered. The aim will be achieved when MP distills syntax down to something simple enough for the cog-neuro types to find in wet ware circuits, something that can be concisely written onto a tee shirt. This work will not engage much with the kinds of standard typological work favored by working linguists. It addresses different kinds of questions.

Does this mean that typological work is cognitively idle? No, it means that the kinds of questions it is perfect for addressing are not yet being robustly asked, or at least not in the right way. There are some acquisitionists (e.g. Yang, Lidz) that worry about the mechanisms that LADs use to acquire different Gs, but there is clearly much more to be done. There are some that worry about how different Gs differentially affect parsing or production. But, IMO, a lot of this work is at the very early stages and it has not yet exploited the rich G descriptions that typologists have to offer. There are many reasons for this, not the least of which is that it is very hard to do and that typologists do not construct their investigations with the aim of providing Gs that fit these kinds of investigations. But this is a topic for another post for another time. For now, kick back and consider the possibility that we might really be close to having answered one of the core questions in GG: what does linguistic knowledge consist in?



[1] Jeff Lidz once put this as the following question: is there a linguistic parser or does the brain just parse? On the latter view, parsing is an activity that the brain does using knowledge it has about the objects being parsed. On the former view, linguistic parsing is a specific activity supported by brain structure special to linguistic parsing. There is actually not much evidence that I am aware of that parsing is dedicated. In this sense there may be parsing without parsers, unless by parser you mean the whole mind/brain.
[2] Lisa Pearl’s thesis took this question on by asking whether the LAD is built to ignore data from embedded clauses or if it just “happens” to ignore it because it is not statistically robust. The former view treats language acquisition as cognitively special (as it comes equipped with blinders of a special sort), the latter as like everything else (rarer things are causally less efficacious than more common things). Lisa’s thesis asked the question but could not provide a definitive answer, though it provided a recipe for a definitive answer.

Monday, October 16, 2017

When Moocs bestrode the world like colossi

The Moocs, it seems, have come and gone. A short but bright life that likely made some people a lot of money, occupied the waking hours of countless deans and provosts and university presidents and promised to radically redesign higher education. Some were skeptical. Some even said as much (and yes, I was one of these! Type ‘Moocs’ in the search box and you will see my little contribution). Here (by John Warner) is an obituary written for Inside Higher Ed. It gloats a little, and with good reason. Now that the owl of Minerva has spread her wings, we see that Moocs were just the latest attempt to mechanize education (click on the Audrey Watters’ time line in the link). Like the others, it flamed for a brief period and disappeared. How do we know this one is over, at least till memories fade and someone else tries to shake the money tree (it seems that “personalized learning” is the next big thing, until it is not)? Because Udacity has thrown in the towel. As Warner writes about Udacity:

From transforming all of higher ed to targeted training in five short years.

It is worth thinking about the Mooc explosion for a while before we happily purge it from our memories. It is worth recalling all the awards offered for this new disruptive technology, all the hype about transforming education, all the frenzy to get in on the action, all the money thrown at those ready to redesign courses to fit the format, all the techno enthusiasm and the hailing of a new messianic age of education. Remember it all, because believe me it is coming again. In fact, it has already come even if we are not sure what it is. So keep this one in mind to inoculate yourself against the next big thing.

We live in the age of techno hucksters. It will come again.

Thursday, October 5, 2017

Pity the non-Chomskyan!

Pity the non Chomskyans! They don’t value their work except in opposition to what Chomsky does (or doesn’t). The only glory they prize is reflected, and they will go to great lengths to sun themselves in it, even to the point of (knowingly?) distorting the mirror in which they reflect themselves. Imagine the irony that someone like me perceives. I am constantly remonstrated with for not sufficiently valuing non GG work and then when I look at some I find that the practitioners themselves only prize their research to the degree that it overturns some GG nostrum and thereby “revolutionizes” the study of language (never a brick in the wall for them, always a complete overturning of the basics). It would appear that for these investigators Chomsky has indeed defined the limits of the interesting in the study of language (a view I have some sympathy with, I would add) and that anything that does not directly address a point that he has (allegedly) made is of little value. Indeed, compared to them, my insistence that one can study language with interests orthogonal to GG’s must seem disingenuous. To non-Chomskyans, Chomsky is everywhere and always and their research is nowhere and never unless it confronts his.

A recent addition to this literature of self-loathing is making quite a splash (here, here, here, here). Part of the splash can be traced to the PR-Academy complex that I mentioned in a previous post (here). Some of the co-authors have Max Planck affiliation and so the powerful PR Wurlitzer has been fully cranked up to spread the word.

However, part of the splash is due to the paper’s claim that Chomsky is, once again, wrong; more specifically, that culture rather than biology is what drives language structure. Of course, as you all know, this is one fork in the intellectual road that any sane person should immediately take. Is it culture or biology? Well, depending on the linguistic feature of interest it could be either, neither, or both. Language, we all know, is a complex thing, the confluence of many different interacting causal forces. Everybody knows this, so it is not news (though it is often intoned as if it were a great discovery, like people noting, sagely, that any given cognitive capacity is a combination of learned and innate factors (duh!)). What is news is finding out which factor predominates for any given property of interest and how it does so. But finding this out in any given case will not (and I can guarantee this) discredit the causal efficacy of other factors in other cases. And this can hardly be news either. It is something that GG has acknowledged for a very very very …very long time.

Even if you think, as I do, that biology (widely construed) plays the lead role in restricting the class of Generative Procedures (GP) available to human Gs, you need not think that culture (widely construed) plays no role in determining what a given G looks like. For example, why I have the G that I have is not exclusively due to my having a human FL/UG. I have an Englishy G because I grew up in an English speaking environment, was culturally exposed to Howdy Doody, Captain Kangaroo, and Rocky and Bullwinkle, and read Bill Shakespeare in high school. Many of my G’s idiosyncrasies are similarly cultural (e.g. I am a proud speaker of a dialect in which Topicalization (aka Yiddish movement) runs rampant). But I very much doubt that the fact that my Topicalization-forming G displays ECP effects has much to do with Rocky, Shakespeare, Bullwinkle, or Sholem Aleichem. Here I look to biology to explain why my G obeys the ECP (and for the familiar Poverty of Stimulus reasons, which I could go on about for hours (and have)). So biology AND culture, with each playing a more or less prominent role depending on the phenomenon of interest.

Curiously, this most obvious position is tacitly denied by non-Chomskyans. They act as if Chomskyans must think that anything languagy must reflect innate features of the mind/brain, and so that if anything is shown not to be such, then this shows that Chomsky was wrong. And their obsession with showing that Chomsky is wrong suggests that they believe that unless he is, then what they have shown about, say, the influence of cultural mechanisms on some languagy fact is inherently BORING, without any possible intrinsic interest. This, at least, would neatly explain why non-Chomskyans consistently assume that Chomsky’s position consists in the absurd claim that anything involving language in any way must be innate.

You probably think that I am exaggerating here. But I am not, really. Here is the authors’ summary of the Dunn, Greenhill, Levinson, and Gray (DGLG) paper published in Nature:

Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language (1, 2). In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases (3, 4, 5), rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework (6). First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that—at least with respect to word order—cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.

Let’s engage in some initial parsing. The paper aims to show that language change (in particular, word order changes in diachronically related languages) is path dependent, with different dependencies changing at different rates across different groupings of languages. DGLG concludes from this that the transitions between the languages are not driven by innate features of FL/UG, nor do they reflect systematic universal probabilistic biases. And they conclude from this that Chomsky and Greenberg must be wrong. I am not qualified to discuss Greenberg’s positions in any detail, but I would like to cast a very skeptical eye on the claims made about Chomsky’s parameter views.

Let’s read the above précis a little more carefully.

First, DGLG focuses on “languages” and the diachronic changes between them. To be GG/Chomsky relevant, we need to unpack this and relate it to grammars. With this translation we get the following:

Grammars (G) vary widely but not without limit. The central goal of linguistics is to describe the diversity of human Gs and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language…

Second, I disagree with even this reworked version of DGLG’s claims about the aim of linguistics, at least as GG and Chomsky understand it. The ultimate aim is to describe the structure of FL/UG. A way station towards this end is to understand the structure of human Gs used by speakers of different languages. Hence, describing these (their commonalities and differences) is a useful proximate goal towards the ultimate end. Linguists have traced some of the differences between human languages to differences in the Gs that native speakers use. This implies diverse Gs and this further implies that FL must be capable of acquiring Gs at least as diverse as these (and maybe yet more diverse given that it is unclear that 7,000 (the purported number of languages out there) marks the limit of G diversity). So yes, describing the variety of human languages to the degree that it enables us to describe the variety of human Gs is a useful step in exploring the structure of FL/UG but the ultimate aim of linguistics is to understand the structure of FL/UG not to describe the diversity of human Gs, or “the diversity of human languages and the constraints on that diversity”.[1]

Third, the phrase “constraints on that diversity” is ambiguous. One reading is anodyne and correct. One aim of GG has been to describe principles of grammar invariant across Gs, the idea being that these will reveal the design features of FL/UG.[2] This does not imply that the language products of these Gs will manifest invariant patterns. Missing this is, once again, to confuse Chomsky Universals with Greenberg Universals. Two Gs may embody the very same principle and yet the products of those Gs might differ greatly. Thus, for example, Rizzi’s early proposal concerning the Fixed Subject Effect is that the ECP (which, let us say, underlies the effect) holds in both Italian and English, but the two Gs derive subject A’-movement in different ways so that only English perceptibly falls under the scope of the ECP.[3] In other words, Italian manages to derivationally escape the purview of the G-invariant ECP and hence does not show Fixed Subject Effects. Note, crucially, that Italian does embody the ECP but does not show Fixed Subject Effects. The “languages” (English vs Italian) differ; the invariant principle (the ECP) is the same. So the move from invariant principles to invariant language effects is not one that any GGer can or should blithely license. In sum, if you mean that a goal of Chomskyan linguistics is to describe the properties of Gs that arise as products of the design features of FL/UG, then you would be correct. But this still leaves distance between this and invariant properties of languages.[4]

So, rightly understood, describing G invariants is a proximate goal of GG inquiry. But this must be distinguished from a different project: explaining the limits of diversity. It is entirely possible that Gs have invariant properties without it being the case that there is a limit on G diversity. Let me explain. One of the innovations of LGB is its Principles and Parameters (P&P) architecture. The idea is that FL/UG specifies not only the invariant properties of Gs but all of the ways that Gs could possibly differ. These differences are coded as a finite number of two-valued parameters, with a given G being a (vector) specification of these specific values. As P&P was understood to have a finite number of parameters, say N, and as each parameter could bear only one of two values, this meant that there were at most 2^N distinct Gs that FL/UG permitted. On this LGB/P&P conception there is a reasonable sense in which FL/UG could explain the limits of G diversity. So, DGLG are correct in thinking that some version of GG, the LGB/P&P theory, aimed to place strong limits on G diversity.
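The counting arithmetic can be made concrete with a toy sketch. To be clear, this is purely illustrative: the parameter count is arbitrary and the vectors stand for no one’s actual parameter inventory; the point is only that N binary parameters generate exactly 2^N candidate Gs.

```python
from itertools import product

def grammar_space(n_params):
    """Enumerate every vector of values for n two-valued parameters.

    Each tuple is one candidate G in the LGB/P&P sense: a full
    specification of all parameter values.
    """
    return list(product((0, 1), repeat=n_params))

# With 3 binary parameters there are exactly 2**3 = 8 possible Gs;
# with 30 there are already over a billion.
space = grammar_space(3)
```

Note that the space grows exponentially, which is why (as discussed below) the question of how a learner navigates it incrementally becomes pressing.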

However, this theory did not address how parameters were set or how parameters changed over time as Gs changed. P&P theories must be supplemented with theories of learning/acquisition to provide a theory of language change. Or, to put it another way: even if you think that there are only a finite number of Gs because any G is simply a finite list of parameter values, you still need a theory of how parameters are fixed to explain how parameters change over time. Now, one theory of G change would be that it tracks some intrinsic structure/fault lines of the parameter space (e.g. parameter 1 links to 2 and to 4, so that if you change the value of 1 from A to a then you need to change the values of 2 from B to b and of 4 from C to c). This is one possible theory. Call it an endogenous theory of parameter setting (EnPS). EnPS accounts would provide FL/UG-internal paths along which G change would occur and would thereby provide a very strong implicit theory of the dynamics of G diversity. It would not only explain what the range of possible Gs is, but would also specify the range of possible transitions between Gs. Note that this kind of view need not endorse the position that all G change is canalized by FL/UG. It is possible that some changes are and some are not. But the strongest view would aim to predict the dynamics of G change entirely from the endogenous structure of FL/UG.

To my knowledge nobody has ever proposed such a view. In fact, to my knowledge, such a view has been understood to be very problematic, the reason being that to the degree that the parameters are mutually dependent, to that degree the problem of incremental parameter setting gets harder. Indeed, were all the parameters to speak to one another (i.e. the value of any parameter being conditional on the value of every other parameter), the problem of incremental parameter setting would become effectively impossible.

Dresher and Kaye discussed this first a while ago as regards stress systems, and Fodor and Sakas have explored this in detail as regards syntactic parameters. The solution has been to try to identify linguistic “triggers”: types of data that rely exclusively on the value of a single parameter. Triggers, in other words, are a way of finessing the intractability of incremental parameter setting without denying that the parameters are intertwined. The idea is that their intermingling need not appear everywhere in the PLD and that all that setting requires is that there be some PLD data that unambiguously reveals what value a given parameter has. In other words, in some domains the parameters function as if independent of one another (they do not interact), and this relieves the computational problem that intertwining presents.
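Here is a minimal sketch of the trigger idea. The parameter names (head_initial, pro_drop) and trigger descriptions below are invented for illustration; the actual proposals (Dresher and Kaye’s cue-based learning, Fodor and Sakas’s work on unambiguous triggers) are far more sophisticated. The sketch only shows why a datum that depends on a single parameter lets the learner set parameters one at a time, with no global search over the whole 2^N space and no backtracking.

```python
# Hypothetical triggers: each kind of PLD datum unambiguously
# fixes the value of exactly ONE parameter.
TRIGGERS = {
    "verb precedes object": ("head_initial", True),
    "object precedes verb": ("head_initial", False),
    "tensed clause with no overt subject": ("pro_drop", True),
}

def set_parameters(pld):
    """Set parameters incrementally from unambiguous triggers in the PLD.

    Data that match no trigger are simply ignored; each parameter is
    set once, by its first trigger, and never revised.
    """
    params = {}
    for datum in pld:
        if datum in TRIGGERS:
            name, value = TRIGGERS[datum]
            params.setdefault(name, value)  # first trigger wins; no backtracking
    return params

g = set_parameters(["verb precedes object",
                    "tensed clause with no overt subject"])
# g is {"head_initial": True, "pro_drop": True}
```

The design point is that the learner’s work is linear in the amount of PLD, not exponential in the number of parameters, precisely because each trigger is (by stipulation here) independent of all the other parameter values.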

Why do I mention this? Because the bulk of work on parameters has not been aimed at limning their interdependencies, but at isolating them and rendering them relatively independent so that incremental parameter setting is possible. In other words, so far as I know, there has been precious little work on, or commitment to, an EnPS kind of theory within Chomskyan GG. Moreover, and this is the important bit, a Chomskyan theory of GG does not need such an account. In other words, if it is true, then it is very interesting, but there is nothing as regards the Chomskyan project that requires that something like this be available. It is entirely consistent with that project that there be no explicit or implicit dynamics coded in the parameter space. So asserting that the absence of such a theory of parameters challenges the Chomskyan conception of FL/UG (which is what DGLG does) is just plain wrong. Or, to put this another way: claiming that there is a richly structured FL/UG is compatible with the claim that FL/UG does not determine how populations of speakers move from one G in the space of possible Gs to another.[5]

We can in fact go further. As those who have read some linguistics since LGB know, the idea that FL/UG contains a finite list of parameters that delimit the range of possible Gs has been under debate. We have even talked about this on FoL (e.g. here, here, here, here). What’s important here is that the parameters part of P&P is not an intrinsic part of the Chomskyan problematic. It might be true, but then again it might not be. There are theories like GB that endorse a P&P architecture, and there are accounts like those in Aspects or LSLT (and, on my reading, current MP accounts) that do not. If FL/UG has a list of specified parameters, that would be an amazing and remarkable discovery. But the evidence that this is the case is not overwhelming (so far as I can tell as a non-expert in these matters), and if it is not the case, that does not mean that Chomsky is wrong in claiming that we have an innately specified FL/UG that limits the properties of human Gs. All it means is that there are some features of Gs about which FL/UG is mute. Happily, that would leave something for non-syntacticians to do (e.g. provide theories of learning that address how we go from PLD to Gs given FL/UG, as Yang and Lidz do).

In sum, DGLG’s claims about the implications of their results for the Chomskyan enterprise are flatly wrong, and in two ways. First, nobody has proposed the kind of theory that DGLG’s data is meant to refute, and second, the Chomskyan conception does not need a theory of the kind that DGLG’s data is meant to refute. So, whether or not DGLG is correct is at right angles to Chomsky’s central claims. And, what is more, I suspect that DGLG knows this (or at least should). Let me say a word about this.

DGLG cites LGB and Baker’s book as the source of the ideas that the paper argues to be incorrect. However, DGLG cites no specific passages or pages for this claim. Why not? When Chomsky goes after someone critically he does so chapter and verse. He quotes exactly what his target says before arguing that it is incorrect. This is not what Chomsky’s critics generally do. Rather, they assert in very broad brushstrokes what Chomsky’s views are and then go on to state that they are inadequate.[6] The problem is that what they criticize is often not his views. The fact that this happens so often leads me to think that it is not accidental. Either critics do not care what his views are (they only care to discredit them so as to discredit him) or they are too lazy to do serious criticism. I am not sure which is worse, but both are serious intellectual failings.

What I did not realize until recently is that Chomsky’s critics might well be motivated less by malice and sloth than by a deep intellectual insecurity. Many of Chomsky’s critics are upset by the possibility (fact?) that he does not (might not?) care about what they are doing. What motivates some critics, then, is the suspicion (fear?) that what they are doing is of little value. To assuage this desperation, they frame their conclusions as rebuttals of Chomsky’s putative views. Why? Because they are sure that what Chomsky does is interesting, and so they reassure themselves that their work has value by arguing that it shows his views are wrong. The implication is that if this were not the case (and very often it is in fact not the case, as the empirical conclusions are generally irrelevant to Chomsky’s claims (Everett is the poster child for this)), then their own work would be boring and of dubious interest.[7] And I thought that I was in thrall to Chomsky![8]

Let me end with one more point regarding the DGLG paper and then point you to a very good review.

DGLG focuses entirely on word order. What DGLG means by “linguistic structure” is word order properties of utterances/sentences. It says this very explicitly. But if this is the focus then DGLG must know that it will be of dubious relevance to Chomsky’s central claims, which, as everyone knows by now, treat word order effects as at best second order (and maybe even less relevant) as regards the central features of FL/UG. Word order effects are, of course, centrally relevant to Greenberg’s conceptions, and there are GGers who have concentrated on this (e.g. Kayne), and we have even covered some of this on FoL (see Culbertson and Adger here). But as regards Chomsky’s views, word order effects are decidedly secondary. Indeed, from what I know of Chomsky’s views, he might agree that word order effects are entirely “cultural” in the sense of being driven by the properties of the child’s ambient linguistic environment.[9] So far as I can tell, nothing Chomsky has said in recent years (or before) would be inconsistent with this. So the fact that DGLG knowingly focuses on the kinds of effects that the authors (or at least some of the authors (I am looking at you, Mr Levinson)) know are at right angles to Chomsky’s concerns further buttresses my conclusion that DGLG thinks its results worthless unless they directly gainsay Chomsky’s views. Sadly, if this is DGLG’s position, then worthless it is. The paper, even if completely correct, scarcely bears on Chomsky’s central claims. Fortunately, the DGLG conclusions are worth thinking about, IMO, even if they do not bear on Chomsky’s views at all. Let me turn to this briefly.

I have spent a lot of time pooping on the DGLG paper’s claim that it overturns some central Chomskyan dogma. However, contrary to the authors, I am not as sure as they seem to be that the results are uninteresting unless they bear on Chomskyan/GG concerns. My tastes, then, are more catholic than DGLG’s. I believe that finding that G change is path dependent is potentially very interesting, even if not all that new. It is not a new idea, as it is already embodied in the position that language contact can affect how Gs change (how Gs change (and maybe even their rates of change) is likely a function of the specific properties of the Gs in contact). If so, change will be path dependent. Indeed, from what I know this is the standard view, which is why Bickerton’s contrary claim is so contentious.

Mark Liberman has an excellent discussion of the DGLG paper (here) that touches on this point as well as many others way above my pay grade (damn I wish I knew more stats). There is also some interesting response by Greenhill in the comment section of Mark’s post, though I personally think that he fails to engage the main point concerning the large number of ignored dimensions and the kind of structure they might contain. As Mark observes, there is lots of room in these ignored dimensions for an EnPS story should one care to make one. 

I also think that Liberman’s last point touches on something critical. As he points out, at least as regards GGers who work on G change (like Tony Kroch or David Lightfoot or Ian Roberts): “…features like “OBV” (the code for whether objects follow verbs) should be seen as superficial grammatical symptoms rather than atomic grammatical traits” (3). This points to a larger problem with the relevance of DGLG to GG research into diachrony: DGLG takes the project to be language change rather than G change (and, as I noted above, these are not the same thing). Greenhill responds to this that the categories are not his, but ones that other people have identified, DGLG aiming only to test them. The problem Mark is pointing to is that they are the wrong things to test, at least if one’s interests lie with the structure of Gs. Going from overt language to G rules/parameters is not straightforward (see Dresher and Kaye, and Fodor and Sakas). What is relevant to speakers qua Language Acquisition Devices are the features of Gs, so abstracting away from these is, as Mark observes, a problem.

Ok, should you read the DGLG paper? Sure. It is very short and potentially interesting (though, IMO, inevitably overhyped) and, as Mark Liberman notes, the product of a lot of hard work. But it is also deeply misleading and, IMO, borders on (well, crosses the border into) the dishonest. The source of the dishonesty is likely overdetermined. I mentioned malice and sloth. But I suspect that intellectual insecurity is really what is driving the anti-GG, anti-Chomsky slant. Anti-Chomskyans do not have the courage of their stated interests. So, when you are done reading DGLG, spend a second mourning the sad plight of the non-Chomskyan. Only by being anti can they be at all. Sad really. And I would be greatly sympathetic regarding this insecurity were they not sullying the intellectual landscape in trying to convince themselves of the value of their research.



[1] An analogy: the aim of astronomy is not to describe the motion of the planets but to describe the forces that make the planets move as they do. Of course, an excellent first step towards the latter is a decent description of the former. But the two goals are not identical. Moreover, as Newton discovered, describing the motion of non-planets here on earth is also a useful step in exposing the forces at work out there in the heavens.
[2] Though, IMO, looking for common detectable features across individual Gs is not as useful as generally supposed.
[3] Perceptible to the linguist that is, not the LAD as the evidence will be of the negative variety; fixed subject violations lead to unacceptability.
[4] My reading of Greenberg is that his project was to identify invariant properties of languages. From what I’ve seen (which is limited) my impression is that typologists are pretty skeptical that many of these exist. If this is right, then it seems as regards “surfacy” language properties, invariants are pretty hard to come by. This would be no surprise to a Chomskyan GGer.
[5] Which is not to say that this might not be a very interesting question to investigate, and GGers did so. See Berwick and Niyogi’s early work on this in a kind of Neo-Darwinian setting.
[6] I want to emphasize the broad brushstroke nature here. Like any interesting view, Chomsky’s has several interacting sub-parts and is based on several assumptions. It is entirely possible (probable?) that he may be right in some ways and wrong in others (as is true of most everyone’s views). The goal of a critic is to isolate how someone is wrong, and that means getting into details. What specific assumption is false? What particular inference would we like to challenge? Quoting chapter and verse forces one to zero in on the problem. Citing LGB as the source does not. So, not only is this lazy, but it lets the critic off the hook, allowing him/her to avoid explaining in detail how someone is wrong. As criticism is valuable in allowing us to isolate troublesome assumptions, this kind of lazy citation promotes obfuscation. So what part of Chomsky’s assumptions is wrong? Well, you know, the LGB part. Really? Come on.
[7] I personally would not conclude this. But it provides a reasonable motive for the otherwise inexplicable regular incapacity of Chomsky’s critics to get his views right.
[8] It is interesting that Chomsky does not feel equally threatened by work different from his own. He just gets on with it. Yes, he responds to critics. But what he mainly does is define the project and get on with it. It would be nice if this were the norm.
[9] To be honest, I do not know what ‘cultural’ means. I am assuming it means to contrast with biological, memes and all that. In effect, fixed via something like learning rather than fixed by biological make-up.