Friday, September 23, 2016

Chomsky was wrong

An article rolled across my inbox a few weeks ago entitled Chomsky was Wrong. Here's a sequence of six short bits on this article.

Yes, Chomsky said this

When I read the first sentence that new research disproved Chomsky's claim that English is easy - that it turns out English is a hard language - I thought it was a parody. To start with, Chomsky said no such thing. But, indeed, it's about a real paper - even if that paper is about English orthography. In SPE (Sound Pattern of English), Chomsky and Halle claimed that English orthography was "near-optimal": that it reflected pretty closely the lexical representations of English words, except where the pronunciation was not predictable.

That's not what you expect to be reading about when you read about Chomsky. For one thing, there's a reason that there's a rumour Chomsky didn't even write any of SPE. The rumour is pretty clearly false, but Chomsky never worked in phonology again, and he certainly didn't write anything close to all of SPE. For another thing, after all the attacks on "Chomsky's Universal Grammar," it's jarring to read a rebuttal of a specific claim.

But here it is. Let's give both credit and blame where credit is due. Chomsky's name's on the book, so he's responsible for what's in it. If it's wrong, then fine. Chomsky was wrong.

The paper in question is sound

The paper in question is by Garrett Nicolai and Greg Kondrak of the University of Alberta, and it's from the 2015 NAACL (North American Chapter of the Association for Computational Linguistics), linked here.

Nicolai and Kondrak have a simple argument. Any spelling system that's isomorphic to the lexical representations of words should also be "morphologically consistent." That is, the spelling of any given morpheme should be the same across different words. Of course: because multi-morphemic words, at least according to SPE, are built by combining the lexical representations of their component morphemes. Regardless of what those lexical representations are, any spelling that reflects them perfectly will have perfect morphological consistency. English spelling, it turns out, doesn't have this property.
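To make "morphologically consistent" concrete, here is a minimal sketch in Python. It is not Nicolai and Kondrak's actual measure: the toy lexicon, the morpheme labels, the segmentations, and the function names are all assumptions made up for illustration. It just checks whether each morpheme is always spelled the same way wherever it occurs.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> list of (morpheme label, how that
# morpheme is spelled inside this word). The segmentations are
# illustrative assumptions, not data from the paper.
LEXICON = {
    "walk":      [("WALK", "walk")],
    "walking":   [("WALK", "walk"), ("ING", "ing")],
    "voice":     [("VOICE", "voice")],
    "voicing":   [("VOICE", "voic"), ("ING", "ing")],
    "panic":     [("PANIC", "panic")],
    "panicking": [("PANIC", "panick"), ("ING", "ing")],
}


def morpheme_spellings(lexicon):
    """Collect the set of distinct spellings each morpheme receives."""
    spellings = defaultdict(set)
    for segments in lexicon.values():
        for morpheme, spelling in segments:
            spellings[morpheme].add(spelling)
    return dict(spellings)


def consistent_fraction(lexicon):
    """Fraction of morphemes that are spelled identically everywhere."""
    spellings = morpheme_spellings(lexicon)
    consistent = sum(1 for forms in spellings.values() if len(forms) == 1)
    return consistent / len(spellings)


if __name__ == "__main__":
    for morpheme, forms in sorted(morpheme_spellings(LEXICON).items()):
        print(morpheme, sorted(forms))
    print(consistent_fraction(LEXICON))
```

In this toy lexicon WALK and ING each get a single spelling, while VOICE (voice/voic) and PANIC (panic/panick) do not, so only half of the morphemes come out consistent. A spelling system that was truly isomorphic to the lexical representations would score 1.0 on a check like this.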

As Chomsky and Halle observed, though, there's a reason that this perfect transparency might not hold for a real spelling system: the spelling system might also want to tell you something about the way the word is actually pronounced. In words like deception, pronounced with [p] but morphologically related to deceive, which is pronounced with [v], you can have a morphologically consistent spelling in which the dece- morpheme has a p in deceive, or in which it has a v in deception (or neither), but you can't spell p in deception and v in deceive and still be morphologically consistent. And yet, morphological consistency can make the pronunciation pretty opaque. And so Nicolai and Kondrak have a way to evaluate whether what's driving English spelling's lack of morphological consistency is perhaps a dose of reader-saving surface-pronunciation transparency. It's not.

The paper is nice because it gets around the obvious difficulty in responding to arguments from linguists, which is that they are usually tightly bound to one specific analysis. Here the authors have found a way to legitimately skip this step (reflecting the underlying forms - thus at least being morphologically consistent - except when really necessary to recover the surface pronunciation). It's a nice approach, too. The argument rests on the constructibility of a pseudo-orthography for English which maximizes morphological consistency except when it obscures the pronunciation - exactly what you would expect from a "near-optimal" spelling system - a system that turns out to have much higher levels of both morphological consistency and surface-pronunciation transparency than traditional English orthography. I review some of the details of the paper - which isn't my main quarry - at the bottom for the interested.

Hanlon's razor

Sometimes you see scientific work obviously distorted in the press and you say, I wonder how that happened - I wonder what happened between the interview and the final article that got this piece so off base. No need to wonder here.

A piece about this research (a piece which was actually fine and informative) appeared on the University of Alberta news site (Google cached version) about a year after the paper was published. Presumably, the university PR department came knocking on doors looking for interesting research. The university news piece was then noticed by CBC Edmonton, who did an interview with Greg Kondrak on the morning show and wrote up a digested version online. The author of this digested version evidently decided to spice it up with some flippant jokes and put "Chomsky was wrong" in the headline with a big picture of Chomsky, because people have heard of Chomsky.

The journalist didn't know too much about the topic, clearly. In an earlier version Noam Chomsky was "Norm Chomsky," Morris Halle was "Morris Hale," and the photo caption under Chomsky was missing the important context - about how the original claim was, in the end, an insignificant one - and so one could read, below Chomsky's face, the highschool-newspaper-worthy "This is the face of a man who was wrong." And, predictably, "English spelling" is conflated with "English", leading to the absurd claim that "English is 40 times harder than Spanish."

The awkward qualification that now appears in the figure caption ("This is the face of a man who - in a small segment from a book published in 1968 - was wrong") bears the mark of some angry linguists with pitchforks complaining to the CBC. Personally, I don't know about the pitchforks. Once I realized that the paper was legit, I wasn't able to muster raising a hackle about the CBC article. It doesn't appear to be grinding an axe, just a bad piece of writing. If it weren't for the fact that, in many other quarters, the walls echo with popular press misinformation about generative linguistics which is both damaging and wrong, this article wouldn't even be a remote cause for concern.

It is possible to talk about Chomsky being wrong without trying to sell me the Brooklyn Bridge

The Nicolai and Kondrak paper, and the comments they gave to the university news site, show that you can write something that refutes Chomsky clearly, and in an accurate, informed, and mature way. The content of the paper demonstrates that they know what they're doing, and have thought carefully about what Chomsky and Halle were actually saying. In the discussion, nothing is exaggerated, and no one is claiming to be the winner or exaggerating their position as the great unlocker of things.

Contrast this with Ibbotson and Tomasello's Scientific American article. Discussed by Jeff in a recent three-part series on the blog, it purports to disprove Chomsky. It should be possible to write a popular piece summarizing your research program without fleecing the reader, but they don't. When the topic is whether Chomsky is right or wrong, fleece abounds.

Let me just take three egregious failures of simple fact checking in the Scientific American article:
  • "Recently ... cognitive scientists and linguists have abandoned Chomsky’s 'universal grammar' theory in droves"
    • (i) abandoned - as in, previously believed it but now don't - (ii) in droves - as in, there are many who abandoned all together - and (iii) recently. I agree that there are many people who reject Chomsky, and (i) is certainly attested over the last 60 years, but (ii) and (iii), or any conjunction of them, is totally unfounded. It feels like disdain for the idea that one should even have to be saying factually correct things - a Trump-level falsehood.
  • "The new version of the theory, called principles and parameters, replaced a single universal grammar for all the world’s languages with ..."
    • There was never any claim to a single grammar for all languages.
  • "The main response of universal grammarians to such findings [about children getting inversion right in questions with some wh-words but not others] is that children have the competence with grammar but that other factors can impede their performance and thus both hide the true nature of their grammar"
    • I know it's commonplace in describing science wars to make assertions about what your opponent said that are pulled out of nowhere, but that doesn't make it right. I so doubt that the record, if there is one, would show this to be the "main response" to the claim they're referring to, that I'm willing to call this out as just false. Because this response sounds like a possible response to some other claim. It just doesn't fit here. This statement sounds made up.

This article was definitely written by scientists. It contains some perfectly accurate heady thoughts about desirable properties of a scientific theory, a difficult little intellectual maze on the significance of the sentence Him a presidential candidate!?, and it takes its examples not out of noodling a-priori reasoning but actually out of concrete research papers. In principle, scientists are careful and stick to saying things that are justified. And yet, when trying to make the sale, the scientist feels no compunction about just making up convenient facts.

Chomsky and Halle's claim sucks

The statement was overblown to begin with. C&H are really asking for this to be torn down. Morphological-consistency-except-where-predictable is violated in the very examples C&H use to demonstrate the supposed near optimality, such as divine (divinE), related in the same breath to divinity (divin- NO e -ity), which is laxed under the predictable trisyllabic laxing rule.

But the claim can, I think, further, be said to "suck" in a deeper way in the sense that it

  1. is stated, in not all but many instances as it's raised throughout the book, as if the lexical forms given in SPE were known to be correct, not as if they were a hypothesis being put forward
  2. is backed up by spurious and easily defeasible claims, convenient for C&H if they were true - but not true.
Some examples of (2) are the claim on page 49 that "the fundamental principle of orthography is that phonetic variation is not indicated where it is predictable by general rule" - says who? - and the whopper in the footnote on page 184 (which also contains examples of (1)):

Notice, incidentally, how well the problem of representing the sound pattern of English is solved in this case by conventional orthography [NB: by putting silent e at the end of words where the last syllable is, according to the SPE analysis, [+tense], but leaving it off when it's [-tense], while, consistent with the SPE proposal, leaving the vowel symbol the same, in spite of the radical surface differences induced by the vowel shift that applies to [+tense] vowels]. Corresponding to our device of capitalization of a graphic symbol [to mark +tense vowels], conventional orthography places the symbol e after the single consonant following this symbol ([e] being the only vowel which does not appear in final position phonetically ...). In this case, as in other cases, English orthography turns out to be [!] rather close to an optimal system for spelling English. In other words, it turns out to be rather close to the true [!] phonological representation, given the nonlinguistic constraints that must be met by a spelling system, namely, that it utilize a unidimensional linear representation instead of the linguistically appropriate feature representation and that it limit itself essentially to the letters of the Latin alphabet.

Take a minute to think about the last statement. There are many writing systems that use two dimensions, including any writing system that uses the Latin alphabet with diacritics. In most cases, diacritics are used to signify something phonetically similar to a given sound, and, often, the same diacritic is used consistently to mark the same property across multiple segments, much like a phonological feature. Outside the realm of diacritics, Korean writing uses its additional dimension to mark several pretty uncontroversial phonological features. As far as being limited to letters of the Latin alphabet goes - let's assume this means for English, and not really for "spelling systems"  - just as with diacritics, new letters have been invented as variants of old ones, throughout history. My guess is that this has happened fairly often. And, after all - if you really felt you had to insert an existing letter to featurally modify an old one, presumably, you would stick the extra letter next to the one you were modifying, not as a silent letter at the end of the syllable. 

As for making it sound like the theory was proven fact, maybe it's not so surprising. Chomsky, in my reading, seems to hold across time pretty consistently to the implicit rhetorical/epistemological line that it's rarely worth introducing the qualification "if assumption X is correct." Presumably, all perceived "fact" is ultimately provisional anyway - so who could possibly be so foolish as to take any statement claiming to be fact at face value? I don't really know if Chomsky was even the one responsible for leaving out all the instances of "if we're correct, that is" throughout SPE. But it wouldn't surprise me. But Chomsky isn't alone in this - Halle indulges in the same, and, perhaps partly as a result, the simplified logic of 1960s generative phonology to suppose on the basis of patterns observed in a dictionary that one has all the evidence one needs about the generalizations of native speakers is still standard, drawing far too little criticism in phonology. Hedging too much in scientific writing is an ugly disease, but there is such a thing as hedging too little.

In the current environment, where we're being bombarded with high profile bluster about how wrong generative linguistics is, it's worth taking a lesson from SPE in what not to do. Chomsky likes to argue and he's good at it. Which means you never really lose too much when you see him valuing rhetoric over accuracy. He's fun and interesting to read, and the logic of whatever he's saying is worth pursuing further, even if what he's saying is wrong. And you know he knows. But if an experimental paper rolled across my desk to review and it talked about its conclusions in the same way SPE does, only the blood, of the blood, sweat and tears that would go into the writing of my review, would be metaphorical.

Make no mistake - if it comes to a vote between a guy who's playing complicated intellectual games with me and a simple huckster, I won't vote for the huckster. But I won't be very happy. Every cognitive scientist, I was once cautioned, is one part scientist and one part snake oil salesman.

Nicolai and Kondrak is a good paper, and, notably, despite being a pretty good refutation of a claim of Chomsky's, it's a perfectly normal paper, in which the Chomsky and Halle claim is treated as a normal claim - no need for bluster. And the CBC piece about it is a lesson. If you really desire your scientific contribution to be coloured by falsehood and overstatement, you're perfectly safe. You have no need to worry, and there's no need to do it yourself. All you have to do is send it to a journalist.

Some more details of this paper

Here is a graph from Nicolai and Kondrak's paper of the aforementioned measures of morphological consistency ("morphemic optimality") and surface-pronunciation transparency ("orthographic perplexity") - closer to the origin is better on both axes:

The blue x sitting on the y axis, which has 1 for orthographic perplexity (but a relatively paltry 93.9 for morphemic optimality), is simply the IPA transcription of the pronunciation. The blue + sitting on the x axis, which has 100 for morphemic optimality (but a poor 2.51 for orthographic perplexity), is what you would obtain if you simply picked one spelling for each morpheme, and concatenated them as the spelling of morphologically complex words. (The measure of surface-pronunciation transparency is obviously sensitive to how you decide to spell each morpheme, but for the moment that's unimportant.)

Importantly, the blue diamond is standard English orthography ("traditional orthography", or T.O.), sitting at a sub-optimal 96.1 morphemic optimality and 2.32 orthographic perplexity (for comparison, SR and SS, two proposed spelling reforms, are given). On the other hand, Alg, the orange square, is a constructed pseudo-orthography that keeps one spelling for each morpheme except where the pronunciation isn't predictable, in which case as few surface details as possible are inserted, which leads to a much better orthographic perplexity of 1.33, while maintaining a morphemic optimality of 98.9. This shows that there's no obvious excuse for the lack of morphological consistency.

What keeps English orthography from being optimal? If you apply some well-known English spelling rules it's easy to see. If in your calculation of morphological consistency you ignore the removal of the final silent e in voice etc., which disappears in voicing; the spelling of panic (and other words that can be followed by -ing to violate the consistency of the pronunciation of -ci- as [s]) as panick instead; and the replacement of the y in industry and other similar words with i in industrial, along with a few other obvious changes, then English orthography pops up to 98.9 percent morphemic optimality, the same level as Alg.
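As a rough illustration of what "ignoring" those rules amounts to, here is a small Python sketch. It is a simplification under stated assumptions, not the procedure from the paper: the three rules are reduced to crude string checks, and the word pairs and function names are hypothetical choices of mine. Each pair gives a base spelling and the way the same stem is spelled inside a derived word.

```python
def licensed_spellings(base):
    """Spellings of `base` that the three simplified rules would allow
    in front of a suffix (a crude approximation, for illustration only)."""
    allowed = {base}
    if base.endswith("e"):
        allowed.add(base[:-1])        # silent e dropped: voice -> voic(ing)
    if base.endswith("c"):
        allowed.add(base + "k")       # k inserted: panic -> panick(ing)
    if base.endswith("y"):
        allowed.add(base[:-1] + "i")  # y becomes i: industry -> industri(al)
    return allowed


# Hypothetical (base spelling, stem spelling inside a derived word) pairs.
PAIRS = [
    ("voice", "voic"),         # voicing
    ("panic", "panick"),       # panicking
    ("industry", "industri"),  # industrial
    ("deceive", "decept"),     # deception: not covered by any of the rules
]


def count_consistent(pairs, relaxed):
    """Count pairs whose stem spelling matches the base, either exactly
    or, if `relaxed` is true, up to the three rules above."""
    total = 0
    for base, in_derived in pairs:
        if relaxed:
            ok = in_derived in licensed_spellings(base)
        else:
            ok = in_derived == base
        total += int(ok)
    return total


if __name__ == "__main__":
    print(count_consistent(PAIRS, relaxed=False))
    print(count_consistent(PAIRS, relaxed=True))
```

Counted strictly, none of the four stems is spelled consistently; once the three rules are forgiven, three of the four are, and only the deceive/deception kind of alternation is left over. That is the flavour of the jump in morphemic optimality described above, though the real numbers come from the paper's own measure and data.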

Those spelling rules should attract the attention of anyone who's read SPE, as they're tangentially related to the vowel shift rule, the velar softening rule, and the final yod, but the fact is in all these cases the spelling, with respect to the SPE analysis of these words, reflects the lexical representation in the base form and the surface pronunciation in the derived form. Well, sure, which means that these alternations are presumably at least throwing the orthography a bone as far as its pronunciation transparency goes, but you can do way, way, way better. That's the point. English orthography may be better than it could be if it were maximally morphologically consistent, but it doesn't seem to be optimal.

For details of the measures you can have a look at the paper.


  1. On Chomsky not writing much of the Chomsky & Halle paper: do linguists use the physicist convention* or the biologist convention** or something else entirely? Being a biologist, I found myself assuming the biologist convention, but I don't actually know that.

    * Authors in alphabetical order.
    ** The first author did the biggest share of the work, and so on in decreasing order to the last author, who did the least work but may have provided the lab, the funding, and/or the basic idea. Often the first author is a previously unknown PhD student of the last author.

    1. It's a book, not a paper. It is hard to tell from the book how much of it is Chomsky and how much of it is Halle. While argumentation from phonology figures prominently in the early works of Chomsky (especially more philosophically-oriented stuff about the goals of linguistic theory, often in polemics against the structuralists) and Chomsky's master's thesis "The Morphophonemics of Modern Hebrew" was eventually published as a book, I don't actually remember a single-authored Chomsky paper entirely on phonology (maybe I have missed something here -- someone else could correct me on this).

      Linguists usually follow the "authors in alphabetical order" convention (though there are exceptions, like the Wexler and Culicover book), though in the more "cog sci"-oriented subfields (psycho, neuro), it's the other convention that seems to be used nowadays.

    2. He published a phonology paper in Language in 1967 ("Some General Properties of Phonological Rules"), which was a sort of trailer version of SPE (presenting the theory, not the analysis). As far as single-author papers, that's all I know of.

    3. Yeah, just found it. I would guess that Chomsky would have had a lot to do with the formalization of phonology (his papers on formal grammars are from 59-63), and SPE was "forthcoming" for a long time.