Friday, September 23, 2016

Chomsky was wrong

An article rolled across my inbox a few weeks ago entitled Chomsky was Wrong. Here's a sequence of six short bits on this article.

Yes, Chomsky said this

When I read the first sentence that new research disproved Chomsky's claim that English is easy - that it turns out English is a hard language - I thought it was a parody. To start with, Chomsky said no such thing. But, indeed, it's about a real paper - even if that paper is about English orthography. In SPE (Sound Pattern of English), Chomsky and Halle claimed that English orthography was "near-optimal": that it reflected pretty closely the lexical representations of English words, except where the pronunciation was not predictable.

That's not what you expect to be reading about when you read about Chomsky. For one thing, there's a reason that there's a rumour Chomsky didn't even write any of SPE. The rumour is pretty clearly false, but Chomsky never worked in phonology again, and he certainly didn't write anything close to all of SPE. For another thing, after all the attacks on "Chomsky's Universal Grammar," it's jarring to read a rebuttal of a specific claim.

But here it is. Let's give both credit and blame where credit is due. Chomsky's name's on the book, so he's responsible for what's in it. If it's wrong, then fine. Chomsky was wrong.

The paper in question is sound

The paper in question is by Garrett Nicolai and Greg Kondrak of the University of Alberta, and it's from the 2015 NAACL conference (North American Chapter of the Association for Computational Linguistics), linked here.

Nicolai and Kondrak have a simple argument. Any spelling system that's isomorphic to the lexical representations of words should also be "morphologically consistent." That is, the spelling of any given morpheme should be the same across different words. Of course: because multi-morphemic words, at least according to SPE, are built by combining the lexical representations of their component morphemes. Regardless of what those lexical representations are, any spelling that reflects them perfectly will have perfect morphological consistency. English spelling, it turns out, doesn't have this property.
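To make "morphologically consistent" concrete, here is a toy sketch in Python. It is not Nicolai and Kondrak's actual measure (for that, see the paper); the function name, the morpheme labels, and the way I've segmented the words are all my own assumptions for illustration. It simply asks, for each morpheme, what fraction of its occurrences use that morpheme's most common spelling.

```python
from collections import Counter, defaultdict


def morphological_consistency(lexicon):
    """Toy consistency score (an illustrative assumption, not the paper's metric).

    lexicon maps each word to a list of (morpheme_label, spelling_in_this_word)
    pairs. The score is the share of morpheme occurrences that are spelled
    with that morpheme's most frequent spelling.
    """
    spellings = defaultdict(list)
    for segments in lexicon.values():
        for morpheme, spelled in segments:
            spellings[morpheme].append(spelled)

    consistent = 0
    total = 0
    for forms in spellings.values():
        # Count how many occurrences share the most common spelling.
        consistent += Counter(forms).most_common(1)[0][1]
        total += len(forms)
    return consistent / total


# Hypothetical segmentations of a few English words in traditional spelling.
traditional = {
    "deceive": [("DECEIVE", "deceive")],
    "deception": [("DECEIVE", "decept"), ("ION", "ion")],
    "voice": [("VOICE", "voice")],
    "voicing": [("VOICE", "voic"), ("ING", "ing")],
    "panic": [("PANIC", "panic")],
    "panicking": [("PANIC", "panick"), ("ING", "ing")],
}

# A made-up spelling that just concatenates one fixed spelling per morpheme.
concatenative = {
    "voice": [("VOICE", "voice")],
    "voiceing": [("VOICE", "voice"), ("ING", "ing")],
    "panic": [("PANIC", "panic")],
    "panicing": [("PANIC", "panic"), ("ING", "ing")],
}

print(morphological_consistency(traditional))
print(morphological_consistency(concatenative))
```

In the toy data, the purely concatenative spelling scores a perfect 1.0 by construction, while the traditional spellings lose points every time a morpheme changes shape. The price of the concatenative version is visible too: "panicing" no longer tells you how the c is pronounced.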

As Chomsky and Halle observed, though, there's a reason that this perfect transparency might not hold for a real spelling system: the spelling system might also want to tell you something about the way the word is actually pronounced. In words like deception, pronounced with [p] but morphologically related to deceive, which is pronounced with [v], you can have a morphologically consistent spelling in which the dece- morpheme has a p in deception, or in which it has a v in deceive (or neither), but you can't have both and still be morphologically consistent. And yet, morphological consistency can make the pronunciation pretty opaque. And so Nicolai and Kondrak have a way to evaluate whether what's driving English spelling's lack of morphological consistency is perhaps a dose of reader-saving surface-pronunciation transparency. It's not.

The paper is nice because it gets around the obvious difficulty in responding to arguments from linguists, which is that they are usually tightly bound to one specific analysis. Here the authors have found a way to legitimately skip this step (reflecting the underlying forms - thus at least being morphologically consistent - except when really necessary to recover the surface pronunciation). It's a nice approach, too. The argument rests on the constructibility of a pseudo-orthography for English which maximizes morphological consistency except when it obscures the pronunciation - exactly what you would expect from a "near-optimal" spelling system - a system that turns out to have much higher levels of both morphological consistency and surface-pronunciation transparency than traditional English orthography. I review some of the details of the paper - which isn't my main quarry - at the bottom for the interested.

Hanlon's razor

Sometimes you see scientific work obviously distorted in the press and you say, I wonder how that happened - I wonder what happened between the interview and the final article that got this piece so off base. No need to wonder here.

A piece about this research (a piece which was actually fine and informative) appeared on the University of Alberta news site (Google cached version) about a year after the paper was published. Presumably, the university PR department came knocking on doors looking for interesting research. The university news piece was then noticed by CBC Edmonton, who did an interview with Greg Kondrak on the morning show and wrote up a digested version online. The author of this digested version evidently decided to spice it up with some flippant jokes and put "Chomsky was wrong" in the headline with a big picture of Chomsky, because people have heard of Chomsky.

The journalist didn't know too much about the topic, clearly. In an earlier version Noam Chomsky was "Norm Chomsky," Morris Halle was "Morris Hale," and the photo caption under Chomsky was missing the important context - about how the original claim was, in the end, an insignificant one - and so one could read, below Chomsky's face, the highschool-newspaper-worthy "This is the face of a man who was wrong." And, predictably, "English spelling" is conflated with "English", leading to the absurd claim that "English is 40 times harder than Spanish."

The awkward qualification that now appears in the figure caption ("This is the face of a man who - in a small segment from a book published in 1968 - was wrong") bears the mark of some angry linguists with pitchforks complaining to the CBC. Personally, I don't know about the pitchforks. Once I realized that the paper was legit, I wasn't able to muster raising a hackle about the CBC article. It doesn't appear to be grinding an axe, just a bad piece of writing. If it weren't for the fact that, in many other quarters, the walls echo with popular press misinformation about generative linguistics which is both damaging and wrong, this article wouldn't even be a remote cause for concern.

It is possible to talk about Chomsky being wrong without trying to sell me the Brooklyn Bridge

The Nicolai and Kondrak paper, and the comments they gave to the university news site, show that you can write something that refutes Chomsky clearly, and in an accurate, informed, and mature way. The content of the paper demonstrates that they know what they're doing, and have thought carefully about what Chomsky and Halle were actually saying. In the discussion, nothing is exaggerated, and no one is claiming to be the winner or exaggerating their position as the great unlocker of things.

Contrast this with Ibbotson and Tomasello's Scientific American article. Discussed by Jeff in a recent three-part series on the blog, it purports to disprove Chomsky. It should be possible to write a popular piece summarizing your research program without fleecing the reader, but they don't. When the topic is whether Chomsky is right or wrong, fleece abounds.

Let me just take three egregious instances where the Scientific American article fails simple fact checking:
  • "Recently ... cognitive scientists and linguists have abandoned Chomsky’s 'universal grammar' theory in droves"
    • (i) abandoned - as in, previously believed it but now don't - (ii) in droves - as in, there are many who abandoned it all together - and (iii) recently. I agree that there are many people who reject Chomsky, and (i) is certainly attested over the last 60 years, but (ii) and (iii), or any conjunction of them, are totally unfounded. It feels like disdain for the idea that one should even have to be saying factually correct things - a Trump-level falsehood.
  • "The new version of the theory, called principles and parameters, replaced a single universal grammar for all the world’s languages with ..."
    • There was never any claim to a single grammar for all languages.
  • "The main response of universal grammarians to such findings [about children getting inversion right in questions with some wh-words but not others] is that children have the competence with grammar but that other factors can impede their performance and thus both hide the true nature of their grammar"
    • I know it's commonplace in describing science wars to make assertions about what your opponent said that are pulled out of nowhere, but that doesn't make it right. I so doubt that the record, if there is one, would show this to be the "main response" to the claim they're referring to, that I'm willing to call this out as just false. Because this response sounds like a possible response to some other claim. It just doesn't fit here. This statement sounds made up.

This article was definitely written by scientists. It contains some perfectly accurate heady thoughts about desirable properties of a scientific theory, a difficult little intellectual maze on the significance of the sentence Him a presidential candidate!?, and it takes its examples not out of noodling a-priori reasoning but actually out of concrete research papers. In principle, scientists are careful and stick to saying things that are justified. And yet, when trying to make the sale, the scientist feels no compunction about just making up convenient facts.

Chomsky and Halle's claim sucks

The statement was overblown to begin with. C&H are really asking for this to be torn down. Morphological-consistency-except-where-predictable is violated in the very examples C&H use to demonstrate the supposed near optimality, such as divine (divinE), related in the same breath to divinity (divin- NO e -ity), which is laxed under the predictable trisyllabic laxing rule.

But the claim can, I think, further, be said to "suck" in a deeper way in the sense that it

  1. is stated, in not all but many instances as it's raised throughout the book, as if the lexical forms given in SPE were known to be correct, not as if they were a hypothesis being put forward
  2. is backed up by spurious and easily defeasible claims, convenient for C&H if they were true - but not true.
Some examples of (2) are the claim on page 49 that "the fundamental principle of orthography is that phonetic variation is not indicated where it is predictable by general rule" - says who? - and the whopper in the footnote on page 184 (which also contains examples of (1)):

Notice, incidentally, how well the problem of representing the sound pattern of English is solved in this case by conventional orthography [NB: by putting silent e at the end of words where the last syllable is, according to the SPE analysis, [+tense], but leaving it off when it's [-tense], while, consistent with the SPE proposal, leaving the vowel symbol the same, in spite of the radical surface differences induced by the vowel shift that applies to [+tense] vowels]. Corresponding to our device of capitalization of a graphic symbol [to mark +tense vowels], conventional orthography places the symbol e after the single consonant following this symbol ([e] being the only vowel which does not appear in final position phonetically ...). In this case, as in other cases, English orthography turns out to be [!] rather close to an optimal system for spelling English. In other words, it turns out to be rather close to the true [!] phonological representation, given the nonlinguistic constraints that must be met by a spelling system, namely, that it utilize a unidimensional linear representation instead of the linguistically appropriate feature representation and that it limit itself essentially to the letters of the Latin alphabet.

Take a minute to think about the last statement. There are many writing systems that use two dimensions, including any writing system that uses the Latin alphabet with diacritics. In most cases, diacritics are used to signify something phonetically similar to a given sound, and, often, the same diacritic is used consistently to mark the same property across multiple segments, much like a phonological feature. Outside the realm of diacritics, Korean writing uses its additional dimension to mark several pretty uncontroversial phonological features. As far as being limited to letters of the Latin alphabet goes - let's assume this means for English, and not really for "spelling systems"  - just as with diacritics, new letters have been invented as variants of old ones, throughout history. My guess is that this has happened fairly often. And, after all - if you really felt you had to insert an existing letter to featurally modify an old one, presumably, you would stick the extra letter next to the one you were modifying, not as a silent letter at the end of the syllable. 

As for making it sound like the theory was proven fact, maybe it's not so surprising. Chomsky, in my reading, seems to hold across time pretty consistently to the implicit rhetorical/epistemological line that it's rarely worth introducing the qualification "if assumption X is correct." Presumably, all perceived "fact" is ultimately provisional anyway - so who could possibly be so foolish as to take any statement claiming to be fact at face value? I don't really know if Chomsky was even the one responsible for leaving out all the instances of "if we're correct, that is" throughout SPE. But it wouldn't surprise me. But Chomsky isn't alone in this - Halle indulges in the same, and, perhaps partly as a result, the simplified logic of 1960s generative phonology - supposing, on the basis of patterns observed in a dictionary, that one has all the evidence one needs about the generalizations of native speakers - is still standard, drawing far too little criticism in phonology. Hedging too much in scientific writing is an ugly disease, but there is such a thing as hedging too little.

In the current environment, where we're being bombarded with high profile bluster about how wrong generative linguistics is, it's worth taking a lesson from SPE in what not to do. Chomsky likes to argue and he's good at it. Which means you never really lose too much when you see him valuing rhetoric over accuracy. He's fun and interesting to read, and the logic of whatever he's saying is worth pursuing further, even if what he's saying is wrong. And you know he knows. But if an experimental paper rolled across my desk to review and it talked about its conclusions in the same way SPE does, only the blood, of the blood, sweat and tears that would go into the writing of my review, would be metaphorical.

Make no mistake - if it comes to a vote between a guy who's playing complicated intellectual games with me and a simple huckster, I won't vote for the huckster. But I won't be very happy. Every cognitive scientist, I was once cautioned, is one part scientist and one part snake oil salesman.

Nicolai and Kondrak is a good paper, and, notably, despite being a pretty good refutation of a claim of Chomsky's, it's a perfectly normal paper, in which the Chomsky and Halle claim is treated as a normal claim - no need for bluster. And the CBC piece about it is a lesson. If you really desire your scientific contribution to be coloured by falsehood and overstatement, you're perfectly safe. You have no need to worry, and there's no need to do it yourself. All you have to do is send it to a journalist.

Some more details of this paper

Here is a graph from Nicolai and Kondrak's paper of the aforementioned measures of morphological consistency ("morphemic optimality") and surface-pronunciation transparency ("orthographic perplexity") - closer to the origin is better on both axes:

The blue x sitting on the y axis, which has 1 for orthographic perplexity (but a relatively paltry 93.9 for morphemic optimality), is simply the IPA transcription of the pronunciation. The blue + sitting on the x axis, which has 100 for morphemic optimality (but a poor 2.51 for orthographic perplexity), is what you would obtain if you simply picked one spelling for each morpheme, and concatenated them as the spelling of morphologically complex words. (The measure of surface-pronunciation transparency is obviously sensitive to how you decide to spell each morpheme, but for the moment that's unimportant.)

Importantly, the blue diamond is standard English orthography ("traditional orthography", or T.O.), sitting at a sub-optimal 96.1 morphemic optimality and 2.32 orthographic perplexity (for comparison, SR and SS, two proposed spelling reforms, are given). On the other hand, Alg, the orange square, is a constructed pseudo-orthography that keeps one spelling for each morpheme except where the pronunciation isn't predictable, in which case as few surface details as possible are inserted, which leads to a much better orthographic perplexity of 1.33, while maintaining a morphemic optimality of 98.9. This shows that there's no obvious excuse for the lack of morphological consistency.
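The comparison in the last two paragraphs can be checked with a few lines of Python. This is just a sketch over the four numbers quoted above; the labels ("IPA", "Concatenative", and so on) and the dominates helper are my own names, not anything from the paper. One system "dominates" another here if it is at least as good on both measures (higher morphemic optimality, lower orthographic perplexity) and strictly better on at least one.

```python
# (morphemic optimality, orthographic perplexity), as quoted in the text above.
systems = {
    "IPA": (93.9, 1.00),
    "Concatenative": (100.0, 2.51),
    "T.O.": (96.1, 2.32),
    "Alg": (98.9, 1.33),
}


def dominates(a, b):
    """True if a is no worse than b on both measures and better on at least one.

    Higher optimality is better; lower perplexity is better.
    """
    (opt_a, ppl_a), (opt_b, ppl_b) = a, b
    no_worse = opt_a >= opt_b and ppl_a <= ppl_b
    strictly_better = opt_a > opt_b or ppl_a < ppl_b
    return no_worse and strictly_better


# Which systems beat traditional orthography on both counts at once?
better_than_to = [
    name
    for name, scores in systems.items()
    if name != "T.O." and dominates(scores, systems["T.O."])
]
print(better_than_to)
```

The IPA transcription and the purely concatenative spelling each trade one measure off against the other relative to T.O.; only Alg is better on both at once, which is the sense in which traditional orthography has "no obvious excuse."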

What keeps English orthography from being optimal? If you apply some well-known English spelling rules it's easy to see. If in your calculation of morphological consistency you ignore the removal of the final silent e in voice etc., which disappears in voicing, the spelling of panic (and other words that can be followed by -ing to violate the consistency of the pronunciation of -ci- as [s]) as panick instead, and the replacement of the y in industry and other similar words with i in industrial, along with a few other obvious changes, then English orthography pops up to 98.9 percent morphemic optimality, the same level as Alg.

Those spelling rules should attract the attention of anyone who's read SPE, as they're tangentially related to the vowel shift rule, the velar softening rule, and the final yod, but the fact is in all these cases the spelling, with respect to the SPE analysis of these words, reflects the lexical representation in the base form and the surface pronunciation in the derived form. Well, sure, which means that these alternations are presumably at least throwing the orthography a bone as far as its pronunciation transparency goes, but you can do way, way, way better. That's the point. English orthography may be better than it could be if it were maximally morphologically consistent, but it doesn't seem to be optimal.

For details of the measures you can have a look at the paper.

Wednesday, September 21, 2016

Two readables

The "science is dying" meme is big nowadays. Stories about the replicability crisis are now a staple of everyday journalism and everything from bad incentives to rampant scientific dishonesty is cited as a cause of the decline of science. I have been (and remain) quite unmoved by this for several reasons.

First, I have no idea if this is much worse than before. Did science before the modern era faithfully replicate its findings, and have things gotten worse since? Maybe a replication rate of 25% (the usual horror story number) is better than it used to be. How do we know what a good rate ought to be? Maybe replicating 25% of experiments is amazingly good. We need a base rate, and, so far, I have not seen one provided. And until I do see one, I cannot know whether we are in crisis mode or not. But I am wary, especially of decline-from-a-golden-age stories. I know we no longer live in an age of giants (nobody ever lives in a golden age of giants). The question is whether 50 years from now we will discover that we actually had lived in such a golden age. You know, when the dust has settled and we can see things more clearly.

Second, I think that part of the frustration with our current science comes from having treated anything with numbers and "experiments" as science. The idea seems to be that one can do idea-free investigations. Experiments are good or not on their own regardless of the (proto)theory they are tacitly or explicitly based on. IMO, what makes the "real" sciences experimentally stable is not only their superior techniques, but the excellent theory that they bring to the investigative table. This body of knowledge serves as a prophylactic against misinterpretation. Remember, never trust a fact until it has been verified by a decent theory! And, yes, the converse also holds. But the converse is taken as definitional of science while the role theory plays in regulating experimental inquiry is, IMO, regularly under-appreciated.

So, I am skeptical. This said, there is one very big source of misinformation out there, especially in domains where knowledge translates into big money (and power). We see this in the global warming debates. We saw it on research into tobacco and cancer. Indeed, there are whole public relations outfits whose main activity is to spread doubt and misinformation dressed up as science. And recently we have been treated to a remarkable example of this. Here are two interesting pieces (here, here) on how the sugar industry shaped nutrition science quite explicitly and directly for their own benefit. These cases leave little to the imagination as regards science disrupting mechanisms. And they occurred a while ago, one might be tempted to say in the golden age.

As funding for research becomes more and more privatized this kind of baleful influence on inquiry is sure to increase. People want to get what they are paying for, and research that impinges on corporate income is going to be in the firing line. If what the articles say is correct, the agnotology industry is very very powerful.

A second interesting piece for those interested in the Sapir-Whorf hypothesis. I have been persuaded by people like Lila that there is no real basis for the hypothesis, i.e. that one's particular language has, at best, a mild influence on the way that one perceives the world. Economists however are unconvinced. Here is a recent piece arguing that the gender structure of a language's pronoun system has effects on how women succeed sociopolitically. Here is the conclusion:

First, linguistic differences can be used to uncover new evidence such as that concerning the formation and persistence of gender norms. Second, as the observed association between gender in language and gender inequality has been remarkably constant over the course of the 20th century, language can play a critical role as a cultural marker, teaching us about the origins and persistence of gender roles. Finally, the epidemiological approach also offers the possibility to disentangle the impact of language from the impact of country of origin factors. Our preliminary evidence suggests that while the lion’s share of gender norms can be attributed to other cultural and environmental influences, yet a direct role language should not be ignored.
Evaluating this is beyond my pay grade, but it is interesting and directly relevant to the Sapir-Whorf hypothesis. True? Dunno. But not uninteresting.

Monday, September 19, 2016

Brain mechanisms and minimalism

I just read a very interesting shortish paper by Dehaene and associates (Dehaene, Meyniel, Wacongne, Wang and Pallier (DMWWP)) that appeared in Neuron. I did not find an open access link, but you can use this one if you are university affiliated. I recommend it highly, not the least reason being that Neuron is a very fancy journal and GG gets very good press there. There is a rumor running around that Cog Neuro types have dismissed the findings of GG as of little interest or consequence to brain research. DMWWP puts paid to this and notes, quite rightly, that the problem lies less with GG than with the current state of brain science. This is a decidedly Gallistel inspired theme (i.e. the cog part of cog-neuro is in many domains (e.g. language) healthier and more compelling than the neuro part and it is time for the neuro types to pay attention and try to find mechanisms adequate for dealing with the well grounded cog stuff that has been discovered rather than think it must be false because the inadequate and primitive neuro models (i.e. neural net/connectionist) don’t have ways of dealing with it) and the more places it gets said the greater the likelihood that CN types will pay attention. So, this is a very good piece for the likes of us (or at least me).

The goal of the paper is to get Cog-Neuro Science (CNS) people to start taking the integration of the behavioral, computational and neural as CNS’s central concern. Here is the abstract:

A sequence of images, sounds, or words can be stored at several levels of detail, from specific items and their timing to abstract structure. We propose a taxonomy of five distinct cerebral mechanisms for sequence coding: transitions and timing knowledge, chunking, ordinal knowledge, algebraic patterns, and nested tree structures. In each case, we review the available experimental paradigms and list the behavioral and neural signatures of the systems involved. Tree structures require a specific recursive neural code, as yet unidentified by electrophysiology, possibly unique to humans, and which may explain the singularity of human language and cognition.

I found the paper interesting in at least three ways.

First, it focuses on mechanisms, not phenomena. So, the paper identifies five kinds of basic operations that reasonably underlie a variety of mental phenomena and takes the aim of CNS to be to (i) find where in the brain these operations are executed, (ii) provide descriptions of circuits/computational operations that could execute such operations and (iii) investigate how these circuits might be/are neurally realized.

Second, it shows how phenomena can be and have been used to probe the structure of these mechanisms. This is very well done for the first three kinds of mechanisms: (i) approximate timing of one item relative to the preceding one, (ii) chunking items into larger units, and (iii) the ordinal ranking of items. Things get more speculative (in a good way, I might add) for the more “abstract” operations: the coding of “algebraic” patterns and nested generated structures.

Third, it gives you a good sense of the kinds of things that CNS types want from linguistics and why minimalism is such a good fit for these desires.

Let me say a word about each.

The review of the literature on coding time relations is a useful pedagogical case. DMWWP reviews the kind of evidence used to show that organisms “maintain internal representations of elapsed time” (3). It then looks for “a characteristic signature” of this representation and the “killer” data that supports the representational claim. It then reviews the various brain locations that respond to these signature properties and reviews the kind of circuit that could code this kind of representation, arguing that “predictive coding” (i.e. circuits that “form an internal model of input sequences”) is the right one in that it alone accommodates the basic behavioral facts (4) (basically mismatch negativity effects without an overt mismatch). Next, it discusses a specific “spiking neuron model” of predictive coding (4) that “requires a neurophysiological mechanism of ‘time stamp’ neurons that are tuned to specific temporal intervals,” which have, in fact, been found in various parts of the brain. So, in this case we get the full Monty: a task that implicates signature properties of the mechanism, that demands certain kinds of computational circuits, realized by specific neuronal models, realized in neurons of a particular kind, found in different parts of the brain. It is not quite the Barn Owl (see here), but it is very very good.

DMWWP does this more or less again for chunking, though in this case “the precise neural mechanisms of chunk formulation remain unknown” (6). And then again for ordinal representations. Here there are models for how this kind of information might be neurally coded in terms of “conjunctive cells jointly sensitive to ordinal information and stimulus identity” (8). These kinds of conjunctive neurons seem to be all over the place, with potential application, DMWWP suggests, as neuronal mechanisms for thematic saturation.

The last two kinds of mechanisms, those that would be required to represent algebraic patterns and hierarchical tree-like structures, are behaviorally very well-established but currently pose very serious challenges on the neuro side. DMWWP observes that humans, even very young ones, demonstrate amazing facility in tracking such patterns. Monkeys also appear able to exploit similar abstract structures, though DMWWP suggests that their algebraic representations are not quite like ours (9). DMWWP further correctly notes that these sorts of patterns and the neural mechanisms underlying them are of “great interest” as “language, music and mathematics” are replete with such. So, it is clear that humans can deploy algebraic patterns which “abstract away from the specific identity and timing of the sequence patterns and to grasp their underlying pattern,” and maybe other animals can too. However, to date there is “no accepted neural network mechanism” to accomplish this and it looks like “all current neural network models seem too limited to account for abstract rule-extraction abilities” (9). So, the problem for CNS is that it is absolutely clear that human (and maybe monkey) brains have algebraic competence though it is completely unclear how to model this in wetware. Now, that is the right way to put matters!

This last reiterates conclusions that Gallistel and Marcus have made in great detail elsewhere. Algebraic knowledge requires the capacity to distinguish variables from values of variables. This is easy to do in standard computer architectures but is not at all trivial in connectionist/neural net frameworks (as Gallistel has argued at length (e.g. see here)). Indeed, one of Gallistel’s main arguments with such neural architectures is their inability to distinguish variables from their values, and to store them separately and call them as needed. Neural nets don’t do this well (e.g. they cannot store a value and later retrieve it), and that is the problem because we do and we do it a lot and easily. DMWWP basically endorses this position.
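For readers who want the variable/value distinction made concrete, here is a minimal sketch in Python of how trivially a standard computer architecture handles it. The function names and the made-up syllables are mine, purely for illustration; nothing here is taken from DMWWP, Gallistel or Marcus. The first function abstracts a sequence to its algebraic pattern by binding each new item to a fresh variable; the second stores values in variables and retrieves them later to generate a new sequence from the same pattern.

```python
def algebraic_pattern(sequence):
    """Abstract away from item identity: bind each distinct item to a variable.

    Returns the pattern (e.g. "ABA") and the item -> variable bindings.
    This toy version supports at most 26 distinct items (variables A-Z).
    """
    bindings = {}
    pattern = []
    for item in sequence:
        if item not in bindings:
            # First occurrence: allocate the next unused variable name.
            bindings[item] = chr(ord("A") + len(bindings))
        pattern.append(bindings[item])
    return "".join(pattern), bindings


def instantiate(pattern, values):
    """Fill a pattern with stored values, retrieving each one by variable name."""
    return [values[variable] for variable in pattern]


# Two sequences that share no items still share one pattern.
print(algebraic_pattern(["ga", "ti", "ga"]))
print(algebraic_pattern(["wo", "fe", "wo"]))

# Store new values in the variables, then call them up as needed.
print(instantiate("ABB", {"A": "wo", "B": "fe"}))
```

The whole thing rests on a lookup table that can be written to and read from at will, which is exactly the read/write memory that Gallistel argues the standard neural net architectures lack.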

The last mechanism required is one sufficient to code the dependencies in a nested tree.[1] One of the nice things about DMWWP is that it recognizes that linguistics has demonstrated that the brain codes for these kinds of data structures. This is obvious to us, but the position is not common in the CNS community and the fact that DMWWP is making this case in Neuron is a big deal. As in the case of algebraic patterns, there are no good models of how these kinds of (unbounded) hierarchical dependencies might be neurally coded. The DMWWP conclusion? The CNS community should start working on the problem. To repeat, this is very different from the standard CNS reaction to these facts, which is to dismiss the linguistic data because there are no known mechanisms for dealing with it.

Before ending I want to make a couple of observations.

First, this kind of approach, looking for basic computational mechanisms that are implicated in a variety of behaviors, fits well with the aims of the minimalist program (MP). How so? Well, IMO, MP has two immediate theoretical goals: to show that the standard kinds of dependencies characteristic of linguistic competence are all different manifestations of the same underlying mechanism (e.g. are all instances of Merge). Were it possible to unify the various modules (binding, movement, control, selection, case, theta, etc) as different faces of the same Merge relation and were we able to find the neural “merge” circuit then we would have found the neural basis for linguistic competence. So if all grammatical relations are really just ones built out of merges, then CNSers of language could look for these and thereby discover the neural basis for syntax. In this sense, MP is the kind of theory that CNSers of language should hope is correct. Find one circuit and you’ve solved the basic problem. DMWWP clearly has bought into this hope.

Second, it suggests what GGers with cognitive ambitions should be looking for theoretically. We should be trying to extract basic operations from our grammatical analyses as these will be what CNSers will be interested in trying to find. In other words, the interesting result from a CNS perspective is not a specification of how a complicated set of interactions work, but isolating the core mechanisms that are doing the interacting. And this implies, I believe, trying to unify the various kinds of operations and modules and entities we find (e.g. in a theory like GB) to a very small number of core operations (in the best case just one). DMWWP’s program aims at this level of grain, as does MP and that is why they look like a good fit.

Third, as any MPer knows, FL is not just Merge. There are other operations. It is useful to consider how we might analyze linguistic phenomena that are Merge recalcitrant in these terms. Feature checking and algebraic structures seem made for each other. Maybe memory limitations could undergird something like phases (see DMWWP discussion of a Marcus suggestion on p. 11 that something like phases chunk large trees into “overlapping but incompletely bound subtrees”). At any rate, getting comfortable with the kinds of mental mechanisms extant in other parts of cognition and perception might help linguists focus on the central MP question: what basic operations are linguistically proprietary? One answer is: those operations required in addition to those that other animals have (e.g. time interval determination, ordinal sequencing, chunking, etc.).

This is a good paper, especially so because of where it appears (a very leading brain journal) and because it treats linguistic work as obviously relevant to the CNS of language. The project is basically Marr’s, and unlike so much CNS work, it does not try to shoehorn cognition (including language) into some predetermined conception of neural mechanism which effectively pretends that what we have discovered over the last 60 years does not exist.

[1] DMWWP notes that the real problem is dependencies in an unbounded nested tree. It is not merely the hierarchy, but the unboundedness (i.e. recursion) as well.

Tuesday, September 13, 2016

The Generative Death March, part 3. Whose death is it anyway?

I almost had a brain hemorrhage when I read this paragraph in the Scientific American piece that announced the death of generative linguistics:

“As with the retreat from the cross-linguistic data and the tool-kit argument, the idea of performance masking competence is also pretty much unfalsifiable. Retreats to this type of claim are common in declining scientific paradigms that lack a strong empirical base—consider, for instance, Freudian psychology and Marxist interpretations of history.”

Pretty strong stuff. Fortunately, I was able to stave off my stroke when I realized that this claim, i.e., that performance can’t mask competence, is possibly the most baseless of all of IT’s assertions about generative grammar.

Consider the phenomenon of agreement attraction:

(1) The key to the cabinets is/#are on the table

The phenomenon is that people occasionally produce “are” and not “is” in sentences like these (around 8% of the time in experimental production tasks, according to Kay Bock) and they even fail to notice the oddness of “are” in speeded acceptability judgment tasks. Why does this happen? Well, Matt Wagers, Ellen Lau and Colin Phillips have argued that (at least in comprehension) this has something to do with the way parts of sentences are stored and reaccessed in working memory during sentence comprehension. That is, using an independently understood model of working memory and applying it to sentence comprehension these authors explained the kinds of agreement errors that English speakers do and do not notice. So, performance masks competence in some cases. 

Is it possible to falsify claims like this one? Well, sure. You would do so by showing that the independently understood performance system didn’t impact whatever aspect of the grammar you were investigating. Let’s consider, for example, the case of island-violations. Some authors (e.g., Kluender, Sag, etc) have argued that sentences like those in (2) are unacceptable not because of grammatical features but because of properties of working memory.

(2) a.  * What do you wonder whether John bought __?
b.  * Who did the reporter that interviewed __ win the Pulitzer Prize?

So, to falsify this claim about performance masking competence Sprouse, Wagers and Phillips (2012) conducted an experiment to ask whether various measures of variability in working memory predicted the degree of perceived ungrammaticality in such cases. They found no relation between working memory and perceived ungrammaticality, contrary to the predictions of this performance theory. They therefore concluded that performance did not mask competence in this case. Pretty simple falsification, right?

Now, in all fairness to IT, when they said that claims of performance masking competence were unfalsifiable, they were talking about children. That is, they claim that it is impossible for performance factors to be responsible for the errors that children make during grammatical development, or at least that claims that such factors are responsible for errors are unfalsifiable. Why children should be subject to different methodological standards than adults is a complete mystery to me, but let’s see if there is any merit to their claims.

Let’s get some facts about children’s performance systems on the ground. First, children are like adults in that they build syntactic representations incrementally. This is true in children ranging from 2- to 10-years old (Altmann and Kamide 1999, Lew-Williams & Fernald 2007, Mani & Huettig 2012, Swingley, Pinto & Fernald 1999; Fernald, Thorpe & Marchman 2010). Second, along with this incrementality children display a kind of syntactic persistence, what John Trueswell dubbed “kindergarten path effects”. Children show difficulty in revising their initial parse on the basis of information arriving after a parsing decision has been made. This syntactic persistence has been shown by many different research groups (Felser, Marinis & Clahsen 2003, Kidd & Bavin 2005, Snedeker & Trueswell 2004, Choi & Trueswell 2010, Rabagliati, Pylkkanen & Marcus 2013).  

These facts allow us to make predictions about the kinds of errors children will make. For example, Omaki et al (2014) examined English- and Japanese-learning 4-year-olds’ interpretations of sentences like (3).

(3) Where did Lizzie tell someone that she was going to catch butterflies?

These sentences have a global ambiguity in that the wh-phrase could be associated with the matrix or embedded verb. Now, if children are incremental parsers and if they have difficulty revising their initial parsing decisions, then we predict that English children should show a very strong bias for the matrix interpretation, since that interpretation would be the first one an incremental parser would access. And, we predict that Japanese children would show a very strong bias for the embedded interpretation, since the order of verbs would be reversed in that language. Indeed, that is precisely what Omaki et al found, suggesting that independently understood properties of the performance systems could explain children’s behavior. Clearly this hypothesis is falsifiable because the data could have come out differently.

A similar argument for incremental interpretation plus revision difficulties has also been deployed to explain children’s performance with scopally ambiguous sentences. Musolino, Crain and Thornton (2000) observed that children, unlike adults, are very strongly biased to interpret ambiguous sentences like (4) with surface scope:

(4) Every horse didn’t jump over the fence
a. All of the horses failed to jump (= surface scope)
b. Not every horse jumped (= inverse scope)

Musolino & Lidz (2006), Gualmini (2008) and Viau, Lidz & Musolino (2010) argued that this bias was not a reflection of children’s grammars being more restricted than adults’ but that other factors interfered in accessing the inverse scope interpretation. And they showed how manipulating those extragrammatical factors could move children’s interpretations around. Moreover, Conroy (2008) argued that a major contributor to children’s scope rigidity came from the facts that (a) the surface scope is constructed first, incrementally, and (b) children have difficulty revising initial interpretations. Support for this view comes from several adult on-line parsing studies demonstrating that children’s only interpretation corresponds to adults’ initial interpretation.  

Again, these ideas are easily falsifiable. It could have been that children were entirely unable to access the inverse scope interpretation and it could have been that other performance factors explained children’s pattern of interpretations. Indeed, the more we understand about performance systems, the better we are able to apportion explanatory force between the developing grammar and the developing parser (see Omaki & Lidz 2015 for review).

So, what IT must have meant was that imprecise hypotheses about performance systems are unfalsifiable. But this is not a complaint about the competence-performance distinction. It is a complaint about using poorly defined explanatory predicates and underdeveloped theories in place of precise theories of grammar, processing and learning. Indeed, we might turn the question of imprecision and unfalsifiability back on IT. What are the precise mechanisms by which intuition and analogy lead to specific grammatical features and why don’t these mechanisms lead to other grammatical features that happen not to be the correct ones? I’m not holding my breath waiting for an answer to that one.

Summing up our three-day march, we can now evaluate IT’s central claims.

1) Intuition and analogy making can replace computational theories of grammar and learning. 
Diagnosis: False. We have seen no explicit theory that links these “general purpose” cognitive skills to the kind of grammatical knowledge that has been uncovered by generative linguistics. Claims to the contrary are wild exaggerations at best.

2) Generative linguists have given up on confronting the linking problem.
Diagnosis: False. This problem remains at the center of an active community of generative acquisitionists. Claims to the contrary reflect more about IT’s ability to keep up with the literature than about the actual state of the field.

3) Explanations of children’s errors in terms of performance factors are unfalsifiable and reflect the last gasps of a dying paradigm.
Diagnosis: False. The theory of performance in children has undergone an explosion of activity in the past 15 years and this theory allows us to better partition children’s errors into those caused by grammar and those caused by other interacting systems.

IT has scored a trifecta in the domain of baseless assertions. 

Who’s leading the death march of declining scientific paradigms, again?

Monday, September 12, 2016

The Generative Death March, part 2.

I’m sitting here in my rocking chair, half dozing (it’s hard for me to stay awake these days) and I come across this passage from the Scientific American piece by Ibbotson and Tomasello (henceforth IT):

“And so the linking problem—which should be the central problem in applying universal grammar to language learning—has never been solved or even seriously confronted.”

Now I’m awake. To their credit, IT correctly identifies the central problem for generative approaches to language acquisition. The problem is this: if the innate structures that shape the ways languages can and cannot vary are highly abstract, then it stands to reason that it is hard to identify them in the sentences that serve as the input to language learners. Sentences are merely the products of the abstract recursive function that defines them, so how can one use the products to identify the function? As Steve Pinker noted in 1989 “syntactic representations are odorless, colorless and tasteless.” Abstractness comes with a cost and so we are obliged to say how the concrete relates to the abstract in a way that is transparent to learners.

And IT correctly notes that Pinker, in his beautifully argued 1984 book Language Learnability and Language Development, proposed one kind of solution to this problem. Pinker’s proposal was based on the idea that there are systematic correspondences between syntactic representations and semantic representations. So, if learners could identify the meaning of an expression from the context of its use, then they could use these correspondences to infer the syntactic representations. But, of course, such inferences would only be possible if the syntax-semantics correspondences were antecedently known. So, for example, if a learner knew innately that objects were labeled by Noun Phrases, then hearing an expression (e.g., “the cat”) used to label an object (CAT) would license the inference that that expression was a Noun Phrase. The learner could then try to determine which part of that expression was the determiner and which part the noun. Moreover, having identified the formal properties of NPs, certain other inferences would be licensed for free. For example, it is possible to extract a wh-phrase out of the sentential complement of a verb, but not out of the sentential complement of a noun:

(1) a. Who did you [VP claim [S that Bill saw __]]?
b.   * Who did you make [NP the claim [S that Bill saw __]]?

Again, if human children knew this property of extraction rules innately, then there would be no need to “figure out” (i.e., by general rules of categorization, analogy, etc) that such extractions were impossible. Instead, it would follow simply from identifying the formal properties that identified an expression as an NP, which would be possible given the innate correspondences between semantics and syntax. This is what I would call a very good idea. 

Now, IT seems to think that Pinker’s project is widely considered to have failed [1]. I’m not sure that is the case. It certainly took some bruises when Lila Gleitman and colleagues showed that in many cases, even adults can’t tell from a context what other people are likely to be talking about. And without that semantic seed, even a learner armed with Pinker’s innate correspondence rules wouldn’t be able to grow a grammar. But then again, maybe there are a few “epiphany contexts” where learners do know what the sentence is about and they use these to break into the grammar, as Lila Gleitman and John Trueswell have suggested in more recent work. But the correctness of Pinker’s proposals is not my main concern here. Rather, what concerns me is the 2nd part of the quotation above, the part that says the linking problem has not been seriously confronted since Pinker’s alleged failure [2]. That’s just plain false.

Indeed, the problem has been addressed quite widely and with a variety of experimental and computational tools and across diverse languages. For example, Anne Christophe and her colleagues have demonstrated that infants are sensitive to the regular correlations between prosodic structure and syntactic structure and can use those correlations to build an initial parse that supports word recognition and syntactic categorization. Jean-Remy Hochmann, Ansgar Endress and Jacques Mehler demonstrated that infants use relative frequency as a cue to whether a novel word is likely to be a function word or a content word. William Snyder has demonstrated that children can use frequent constructions like verb-particle constructions as a cue to setting an abstract parameter that controls the syntax of a wide range of complex predicate constructions that may be harder to detect in the environment. Charles Yang has demonstrated that the frequency of unambiguous evidence in favor of a particular grammatical analysis predicts the age of acquisition of constructions exhibiting that analysis; and he built a computational model that predicts that effect. Elisa Sneed showed that children can use information structural cues to identify a novel determiner as definite or indefinite and in turn use that information to unlock the grammar of genericity. Misha Becker has argued that the relative frequency of animate and inanimate subjects provides a cue to whether a novel verb taking an infinitival complement is treated as a raising or control predicate, despite their identical surface word orders. In my work with Josh Viau, I showed that the relative frequency of animate and inanimate indirect objects provides a cue to whether a given ditransitive construction treats the goal as asymmetrically c-commanding the theme or vice versa, overcoming highly variable surface cues both within and across languages. 
Janet Fodor and William Sakas have built a large scale computational simulation of the parameter setting problem in order to illustrate how parameters could be set, making important predictions for how they are set. I could go on [3].

None of this work establishes the innateness of any piece of the correspondences. Rather it shows that it is possible to use the correlations across domains of grammar in order to make inferences on the basis of observable phenomena in one domain to the abstract representations of another.  The Linking Problem is not solved, but there is a large number of very smart people working hard to chip away at it. 

The work I am referring to is all easily accessible to all members of the field, having been published in the major journals of linguistics and cognitive science. I have sometimes been told, by exponents of the Usage Based approach and their empiricist cousins, that this literature is too technical, that, “you have to know so much to understand it.” But abbreviation and argot are inevitable in any science, and a responsible critic will simply have to tackle it. What we have in IT is an irresponsible cop out from those too lazy to get out of their armchairs.

I think it’s time for my nap. Wake me up when something interesting happens.


[1] IT also thinks that something about the phenomenon of ergativity sank Pinker’s ship, but since Pinker spent considerable time in both his 1984 and 1989 books discussing that phenomenon, I think these concerns may be overstated.

[2] You can sign me up to fail like Pinker in a heartbeat.

[3] A reasonable review of some of this literature, if I do say so myself, can be found in Lidz and Gagliardi (2015) How Nature Meets Nurture: Statistical Learning and Universal Grammar. Annual Review of Linguistics 1. The new Oxford Handbook of Developmental Linguistics (edited by Lidz, Snyder and Pater) is also full of interesting probes into the linking problem and other important concerns.

Sunday, September 11, 2016

Universals: a consideration of Everett's full argument

I have consistently criticized Everett’s Piraha-based argument against Chomsky’s conception of Universal Grammar (UG) by noting that the conclusions only follow if one understands ‘universal’ in Greenberg rather than Chomsky terms (e.g. see here).  I have recently discovered that this is correct as far as it goes, but it does not go far enough. I have just read this Everett post, which indicates that my diagnosis was too hasty. There is a second part to the argument and the form is actually one of a dilemma: either you understand Chomsky’s claims about recursion as a design feature of UG in Greenbergian terms OR Chomsky’s position is effectively unfalsifiable (aka: vacuous). That’s the full argument.  I (very) critically discuss it in what follows. The conclusion is that not only does it fail to understand the logic of a Chomsky Universal (CU) conception of universals, it also presupposes a rather shallow Empiricist conception of science, one in which theoretical postulates are only legitimate if directly reflected in surface diagnostics. Thus, Everett’s argument gains traction only if one mistakes CUs for Greenberg Universals (GUs), misunderstands what kind of evidence is relevant for testing CUs and/or tacitly assumes that only GUs are theoretically legit conceptions in the context of linguistic research. In short, the argument still fails, even more completely than I thought.

Let’s give ourselves a little running room by reviewing some basic material. GUs are very different from CUs. How so?

GUs concern the surface distributional properties of the linguistic objects (LO) that are the outputs of Gs. They largely focus on the string properties of these LOs.  Thus, one looks for GUs by, for example, looking at surface distributions cross linguistically. For example, one looks to see if languages are consistent in their directionality parameters (If ‘OP’ then ‘OV’). Or if there are patterns in the order in nominals of demonstratives, modifiers and numerals wrt the heads they modify (e.g. see here for some discussion).

CUs specify properties of the Faculty of Language (FL). FL is the name given to the mental machinery (whatever its fine structure) that outputs a grammar for L (GL) given Primary Linguistic Data from language L (PLDL). FL has two kinds of design features: the linguistically proprietary ones (which we now call UG principles) and the domain general ones, which are part of FL but not specific to it. GGers investigate the properties of FL by, first, investigating the properties of language particular Gs and second, via the Poverty of the Stimulus argument (POS). POS aims to fix the properties of FL by seeing what is needed to fill the gap between information provided about the structure of Gs in the PLD and the actual properties that Gs have. FL has whatever structure is required to get the Language Acquisition Device (LAD) from PLDL to GL for any L. Why any L? Because any kid can acquire any G when confronted with the appropriate PLD.

Now on the face of it, GUs and CUs are very different kinds of things. GUs refer to the surface properties of G outputs. CUs refer to the properties of FL, which outputs Gs. CUs are ontologically more basic than GUs[1] but GUs are less abstract than CUs and hence epistemologically more available.

Despite the difference between GUs and CUs, GGers sometimes use string properties of the outputs of Gs to infer properties of the Gs that generate these LOs. So, for example, in Syntactic Structures Chomsky argues that human Gs are not restricted to simple finite state grammars because of the existence of sentences that allow non-local dependencies of the sort seen in sentences of the form ‘If S1 then S2’. Examples like this are diagnostic of the fact that the recursive Gs native speakers can acquire must be more powerful than simple FSGs and therefore that FL cannot be limited to Gs with just FSG rules.[2]  Nonetheless, though GUs might be useful in telling you something about CUs, the two universals are conceptually very different, and only confusion arises when they are run together.
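For readers who like to see the point mechanically, here is a minimal Python sketch of why nested ‘If S1 then S2’ dependencies outrun a simple finite state device. It is a toy simplification of the Syntactic Structures argument, not a reconstruction of it: the token names and the function `well_nested` are my own illustrative assumptions. The recognizer needs a counter that can grow without bound, which is exactly what a machine with a fixed number of states lacks.

```python
def well_nested(tokens):
    """Return True if every 'if' is closed by a later matching 'then'.

    Toy assumption: a sentence is a list of tokens in which 'if' opens a
    dependency, 'then' closes the most recent open one, and anything else
    (e.g. 'S') is ignored.
    """
    depth = 0  # number of 'if's still waiting for their 'then'
    for tok in tokens:
        if tok == "if":
            depth += 1
        elif tok == "then":
            if depth == 0:
                return False  # a 'then' with no 'if' to close
            depth -= 1
    return depth == 0  # no 'if' may be left dangling


# 'if' S1 'then' S2, where S1 is itself an if-then sentence
print(well_nested("if if S then S then S".split()))
print(well_nested("if if S then S".split()))
```

Since `depth` can be arbitrarily large, no fixed, finite set of states can track it for every input, which is the sense in which such sentences witness something more powerful than an FSG.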

All of this is old hat, and I am sorry for boring you. However, it is worth being clear about this when considering the hot topic of the week, recursion, and what it means in the context of GUs and CUs. Let’s recall Everett’s dilemma. Here is part 1:

1.     Chomsky claims that Merge is “a component of the faculty of language,” (i.e. that it is a Universal).[3]
2.     But if it is a universal then it should be part of the G of every language.
3.     Piraha does not contain Merge.
4.     Therefore Chomsky is wrong that Merge is a Universal.

This argument has quite a few weak spots. Let’s review them.

First, as regards the premises (1) and (2), the argument requires assuming that if something is part of FL, a CU, then it appears in every G that is a product of FL. For unless we assume this, it cannot be that the absence of Merge in Piraha is inconsistent with the conclusion that it is part of FL. But, the assumption that Merge is a CU does not imply that it is a GU. It simply implies that FL can construct Gs that embody (recursive) Merge.[4] Recall, CUs describe the capacities of the LAD not its Gish outputs. FL can have the capacity to construct Merge containing Gs even if it can also construct Gs that aren’t Merge containing Gs. Having the capacity to do something does not entail that the capacity is always (or even ever) used. This is why a claim like Everett’s that argues from (3), the absence of Merge in the G of Piraha, does not argue against Merge as part of FL.

Second, what is the evidence that Merge is not part of Piraha’s G? Everett points to the absence of “recursive structures” in Piraha LOs (2). What are recursive structures? I am not sure, but I can hazard a guess. But before I do so, let me note that recursion is not properly a predicate of structures but of rules. It refers to rules that can take their outputs as inputs. The recursive nature of Merge can be seen from the inductive definition in (5):

5.   a. If a is a lexical item then a is a Syntactic Object (SO)
b. If a is an SO and b is an SO then Merge(a,b) is an SO

With (5) we can build bigger and bigger SOs, the recursive “trick” residing in the inductive step (5b). So rules can be recursive. Structures, however, not so much. Why? Well they don’t get bigger and bigger. They are what they are. However, GGers standardly illustrate the fact of recursion in an L by pointing to certain kinds of structures and these kinds have come to be fairly faithful diagnostics of a recursive operation underlying the illustrative structures. Here are two examples both of which have a phrase of type A embedded in another one of type A.

6.     S within an S: e.g. John thinks that Bill left in which the sentence Bill left is contained within the larger sentence John thinks that Bill left.
7.     A nominal within a nominal: e.g.  John saw a picture of a picture where the nominal a picture is contained within the larger nominal a picture of a picture.

Everett follows convention and assumes that structures of this sort are diagnostic of recursive rules. We might call them reliable witnesses (RW) for recursive rules. Thus (6)/(7) are RWs for the claim that the rule for S/nominal “expansion” can apply repeatedly (without bound) to their outputs.
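The inductive definition in (5) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: lexical items are modeled as strings, Merge as ordered pairing (rather than set formation), and the names `is_so`, `merge` and `depth` are hypothetical helpers. The point it shows is that the recursion lives in the rule, since the output of `merge` is a legitimate input to `merge`, while any particular structure just has whatever fixed depth it has.

```python
def is_so(x):
    """(5a)/(5b): a lexical item is an SO; so is any pair of SOs."""
    if isinstance(x, str):
        return True  # base case (5a): lexical items, modeled as strings
    return isinstance(x, tuple) and len(x) == 2 and all(is_so(p) for p in x)


def merge(a, b):
    """Inductive step (5b): combine two SOs into a new SO."""
    if not (is_so(a) and is_so(b)):
        raise ValueError("merge applies only to syntactic objects")
    return (a, b)


def depth(so):
    """How many applications of merge sit on the longest path of an SO."""
    return 0 if isinstance(so, str) else 1 + max(depth(p) for p in so)


# Example (6): a clause inside a clause, built by feeding merge its own output
embedded = merge("Bill", "left")
clause = merge("John", merge("thinks", merge("that", embedded)))
print(clause, depth(clause))
```

Nothing in `merge` limits how often it applies; the finished object `clause` is simply a finite thing of a certain depth.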

Let’s say this is right. It does not imply that the absence of RWs implies the absence of recursive rules. As it is often said: the absence of evidence is not evidence of absence. Merge may be applying even though we can find no RWs diagnostic of this fact in Piraha.

Moreover, Everett’s post notes this. As it says: “…the superficial appearance of lacking recursion does not mean that the language could not be derived from a recursive process like Merge. And this is correct” (2-3). Yes it is. Merge is sufficient to generate the structures of Piraha. So, given this, how can we know that Piraha does not employ a Merge like operation?

So far as I can tell, the argument that it doesn’t is based on the assumption that unless one has RWs for some property X one cannot assume that X is a characteristic of G. So absent visible “recursive structures” we cannot assume that Merge obtains in the G that generates these structures. Why?  Because Merge is capable of generating unboundedly big (long and deep) structures, and we have no RWs indicating that the rule is being recursively applied. But, and this I really don’t get, the fact that Merge could be used to generate “recursive structures” does not imply that in any given G it must so apply. So how exactly does the absence of RWs for recursive rule application in Piraha (note, I am here tentatively conceding that Everett’s factual claims are right, though they are likely wrong) show that Merge is not part of a Piraha G? Maybe Piraha Gs can generate unboundedly large phrases but then apply some filters to the outputs to limit what surfaces overtly (this in fact appears to be Everett’s analysis).[5] In this sort of scenario, the Merge rule is recursive and can generate unboundedly large SOs but the interfaces (to use minimalist jargon) filter these out, preventing the generated structures from converging. On this scenario, Piraha Gs are like English or Brazilian Portuguese or … Gs, but for the filters.

Now, I am not saying that this is correct. I really don’t know and I leave the relevant discussions to those that do.[6] But, it seems reasonable and if this is indeed what the right G analysis for Piraha is, then it too contains a recursive rule (aka, Merge) though because of the filters it does not generate RWs (i.e. the whole Piraha G does not output “recursive structures”).
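To make the logic of this scenario concrete, here is a toy Python sketch of a “Merge plus filters” G. Everything in it is an assumption made for illustration: the two-item lexicon, the depth-based `surface_filter` and the cutoff `max_depth=1` are hypothetical stand-ins, not a claim about what the Piraha filters actually are. What it shows is that a G can contain a fully recursive rule and still output nothing that would count as an RW for it.

```python
from itertools import product


def merge(a, b):
    """Recursive rule: any two SOs (strings or pairs) form a new SO."""
    return (a, b)


def depth(so):
    """Depth of embedding of an SO; lexical items have depth 0."""
    return 0 if isinstance(so, str) else 1 + max(depth(p) for p in so)


def generate(lexicon, rounds):
    """Apply merge to everything built so far, `rounds` times over."""
    sos = set(lexicon)
    for _ in range(rounds):
        sos |= {merge(a, b) for a, b in product(sos, repeat=2)}
    return sos


def surface_filter(sos, max_depth=1):
    """Hypothetical interface filter: only shallow SOs converge."""
    return {so for so in sos if depth(so) <= max_depth}


generated = generate({"a", "b"}, rounds=2)  # merge has applied to its own output
surfaced = surface_filter(generated)        # but only flat structures surface
print(max(depth(so) for so in generated), max(depth(so) for so in surfaced))
```

An observer who only ever sees `surfaced` finds no embedding at all, yet the rule that built those objects is the same recursive `merge` that, unfiltered, keeps going.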

Everett’s post rejects this kind of retort. Why? Because such “universals cannot be seen, except by the appropriate theoretician” (4). In other words, they are not surface visible, in contrast to GUs, which are. So, the claim is that unless you have a RW for a rule/operation/process you cannot postulate that rule/operation/process exists within that G. So, absence of positive evidence for (recursive) Merge within Piraha is evidence against (recursive) Merge being part of Piraha G. That’s the argument. The question is why anyone should accept this principle.

More exactly, why in the case of studying Gs should we assume that absence of evidence is evidence of absence, rather than, for example, evidence that more than Merge is involved in yielding the surface patterns attested? The latter is what we do in any other domain of inquiry. The fact that planes fly does not mean that we throw out gravity. The fact that balls stop rolling does not mean that we dump inertia. The fact that there are complex living systems does not mean that entropy doesn’t exist. So why should the absence of RWs for recursive Merge in Piraha imply that Piraha does not contain Merge as an operation?

In fact, there is a good argument that it does. It is that many many other Gs have  RWs for recursive Merge (a point that Everett accepts). So, why not assume that Piraha Gs do too? This is surely the simplest conclusion (viz. that Piraha Gs are just like other Gs fundamentally) if it is possible to make this assumption and still “capture” the data that Everett notes.  The argument must be that this kind of reply is somehow illicit. What could license the conclusion that it is?

I can only think of one: that universals just are summaries of surface patterns. If so, then the absence of surface patterns that are RWs for recursion in a given G means that there are no recursive rules in that G, for there is nothing to “summarize.” All G generalizations must be surface “true.” The assumption is that it is scientifically illicit to postulate some operation/principle/process whose surface effects are “hidden” by other processes. The problem, then, with a theory according to which Merge applies in Piraha but its effects are blocked, so that there are no RW-like “recursive structures” to reliably diagnose that it is there, is that such an assumption is unscientific!

Note that this methodological principle applied in the real sciences would be considered laughable. Most of 19th century astronomy was dedicated to showing that gravitation regulates planetary motion despite the fact that planets do not appear to move in accord with the inverse square law. The assumption was made that some other mass was present and it was responsible for the deviant appearances. That’s how we discovered Neptune (see here for a great discussion). So unless one is a methodological dualist, there is little reason to accept Everett’s presupposed methodological principle.

It is worth noting that an Empiricist is likely to endorse this kind of methodological principle and be attracted to a Greenberg conception of universals. If universals are just summaries of surface patterns then absent the pattern there is no universal at play.

Importantly, adopting this principle runs against all of modern GG, not just the most recent minimalist bit. Scratch any linguist and s/he will note that you can learn a lot about language A by studying language B. In particular, modern comparative linguistics within GG assumes this as a basic operating principle. It is based on the idea that Gs are largely similar and that what is hard to see within the G of language A might be pretty easy to observe in that of language B. This, for example, is why we often conclude that Irish complementizer agreement tells us something about how WH movement operates in English, despite there being very little (no?) overt evidence for C to C movement in English. Everett’s arguments presuppose that all of this reasoning is fallacious. His is not merely an argument against Merge, but a broadside on virtually all of the cross-linguistic work within GG for the last 30+ years.

Thankfully, the argument is really bad. It either rests on a confusion between GUs and CUs or rests on bogus (dualist) methodological principles. Before ending, however, one more point.

Everett likes to say that a CU conception of universals is unfalsifiable. In particular, he claims that a CU view of universals robs these universals of any “predictive power” (5). But this too is false. Let’s go back to Piraha.

Say you take recursion to be a property of FL. What would you conclude if you ran into speakers of a language without RWs for that universal? You would conclude that they could learn a language where that universal has clear overt RWs. So, assume (again only for the sake of discussion) that you find a Piraha speaker sans recursive G. Assuming that recursion is part of FL, you predict that such speakers could acquire Gs that are clearly recursive. In other words, you would predict that Piraha kids would acquire, to take an example at random, Brazilian Portuguese just like non-Piraha kids do. And, as we know, they do! So, taking recursion to be a property of FL makes a prediction about the kinds of Gs LADs can/do acquire. And these predictions seem to be correct. So, postulating CUs does have empirical consequences and it does make predictions. It just does not make predictions about whether CUs will be surface visible in every L (i.e. provide RWs in every L), and there is no good reason that they should.

Everett complains in this post that people reject his arguments because they confuse GUs and CUs and that this is incorrect (i.e. they don’t make this confusion). However, it is clear that there is lots of other confusion, lots of methodological dualism, and lots of failure to recognize the kinds of “predictions” a CU-based understanding of universals does make. Both prongs of the dilemma that the argument against CUs rests on collapse on pretty cursory inspection. There is no there there.

Let me end with one more observation, one I have made before. That recursion is part of FL is not reasonably debatable. What kind of recursion there is and how it operates are very debatable. The fact is not debatable because it is easy to see its effects all around you. It’s what Chomsky called linguistic productivity (LP). LP, as Chomsky has noted repeatedly, requires that linguistic competence involve knowledge of a G with recursive rules. Moreover, that any child can acquire any language implies that every child comes equipped to the language acquisition task with the capacity to acquire a recursive G. This means that the capacity to acquire a recursive G (i.e. to have an operation like Merge) must be part of every human FL. This is a near truism and, as Chomsky (and many others, including moi) have endlessly repeated, it is not really contestable. But there is a lot that is contestable. What kind of rules/operations do Gs contain (e.g. FSGs, PSGs, TGs, MPs?)? Are these rules/operations linguistically proprietary (i.e. part of UG or not?)? How do Gs interact with other cognitive systems, etc.? These are all very hard and interesting empirical questions which are and should be vigorously debated (and believe me, they are). The real problem with Everett’s criticism is that it has wasted a lot of time by confusing the trivial issues with the substantive ones. That’s the real problem with the Piraha “debate.” It’s been a complete waste of time.

[1] By this I mean that whereas you are contingently a speaker of English and English is contingently SVO, it is biologically necessary that you are equipped with an FL. So, Norbert is only accidentally a speaker of English (and so has a G_English) and it is only contingently the case that English is SVO (it could have been SOV, as it once was). But it is biologically necessary that I have an FL. In this sense it is more basic.
[2] Actually, they are diagnostic on the assumption that they depict one instance of an unbounded number of sentences of the same type. If one only allows finite substitutions in the S positions then a more modest FSG can do the required work.
[3] Quote from the 2002 Science paper with Fitch and Hauser.
[4] I henceforth drop the bracketed modifier.
[5] Thanks to Alec Marantz for bringing this to my attention.
[6] This point has already been made by Nevins, Pesetsky and Rodrigues in their excellent paper.