Friday, October 5, 2012

Three psychologists walk into a bar…

Linguists should applaud recent efforts by the Royal Society to inject humor into research on language. Clearly, the Royals believe that linguists have taken their work far too seriously, and in a welcome reversal of a previous policy of benign neglect, they have published a near-pitch-perfect parody (“How hierarchical is language use?”) by the trio of Frank, Bod and Christiansen (hereafter FBC). These authors breezily describe a research program aimed at unseating a fixed point in research on natural language that has been virtually uncontested for the last several hundred years: that sentences and phrases have both linear and hierarchical dimensions. Like all good parodists, they are undaunted by the obvious; in this case, the fact that anyone who has ever examined language has concluded that words combine into phrases that combine into sentences. No, this dynamic trio, in the grand tradition of J. Swift and W. Allen, suggests that all we need are bi-grams and tri-grams. Using the powerful methods of neuroscience, computational linguistics and behavioral psychology, they propose that it's all just one damn word after another, no hierarchy needed. Moreover, FBC do this without ever letting down their parodic cloak. Indeed, the paper is so successfully crafted (it has such a vivid sense of seriousness) that, as a public service, I believe it is necessary to affix to it the right warning label just in case those with arthritic funny bones are misled, take this exquisite lampoon too seriously, and thereby wander into an intellectual desert.

The paper, like all good parody, has a central rhetorical thread, woven from three strands. The first is that linguistic hierarchy poses a problem for evolution due to its biological sui genericity. The second, as FBC note, is that linguistic use cares about sentential sequence. The third combines the two to conclude that because linguistic hierarchy is evolutionarily problematic (in contrast to sequence information) it couldn't possibly have evolved (i.e. arisen in humans), and so linguistic systems can't really have any. Conclusion: linguistic use is sensitive exclusively to sequence information, as it must be. The satire requires that you inadvertently slip to the conclusion that if language use exploits linear properties of sentences and phrases then hierarchy is dispensable for all linguistic analysis (after all: if you can hop on one leg, who needs two legs?). Occam and his razor are invoked to make sure that after you slip to their desired conclusion you don't jump back up incredulous. It's all neatly done and very amusing.

Let's pull the conceit apart to better admire its artistry. FBC know that linguists from the thirteenth-century grammarians, through Bloomfield and Harris in the mid-1950s, to Chomsky today have all taken it as obvious that natural language grammars are hierarchically organized, labeled brackets (or parse trees) being the instrument of choice to display this. The reader who is in on their send-up knows why. There are centuries of data in its favor. Here's a taste.

Such bracketing allows one to distinguish the two readings of 'old men and women,' the ambiguity of 'I photographed a woman with a camera,' and the three readings of 'I saw the girl sitting on the stoop.' The several readings can be easily coaxed from these sentences, even if some jump to the ear faster than others. So, if one's interest is in accounting for how the same string of words can carry several readings, labeled brackets (or, equivalently, parse trees) are very, very handy. And not only for this. Once one takes even a modestly serious look at language (something no satirist should do, as too much seriousness wrong-foots the parodic muse), one finds an inexhaustible number of intra-sentential relations that seem to supervene on hierarchical rather than linear properties of phrases and sentences. This even has a name: grammatical rules are structure-dependent. Here's a short (and not exhaustive) list of phenomena that advert to hierarchical structure: Aux fronting, WH-question formation, topicalization, focus movement, VP fronting, VP ellipsis, reflexive binding, pronoun obviation, passivization, raising, negative concord, sluicing, parasitic gap licensing, donkey anaphora, island effects, etc., etc., etc. In fact, it is almost impossible to find a syntactic phenomenon that fails to exploit hierarchical relations.

Knowing all of this, what does a good satirist do? Ignore and misrepresent. So, for example, FBC find evidence from neuroscience and psychology that there are linear effects in language use. Yup, those neuro guys with their big expensive fMRIs are finally able to show that Broca's area lights up (I hope the pictures of Broca's area are in purple, as red, blue and green are so last year) when both unimpaired and aphasic humans parse a sentence left to right. Even more astounding, Broca's area also lights up in processing music! Wow, call the Nobel committee. This is a breathtaking discovery. Who would have thought that order left/right a difference might understanding in a sentence make!! Thank goodness for “repetitive transcranial magnetic stimulation” techniques, for without these the relevance of linear order to parsing sentences would have surely remained hidden from view. You gotta love this. Said with a straight pen, this is very funny stuff.

But wait, there's more. FBC (no doubt channeling Anthony Trollope, who made a similar observation in his 1883 autobiography) note that linear order can affect the recognition of agreement dependencies. So we say and hear approvingly 'The coat with the ripped cuffs were hanging in the closet' rather than 'was hanging in the closet' when the linearly nearest nominal is plural rather than singular; an interesting and curious effect that implicates linear proximity in assessing agreement. This demonstrates that those linguists who natter on about hierarchy being relevant for coding agreement effects are just obtuse. Of course, the budding satirist should take note here and learn how to use data to misdirect. Quietly kick under a nearby rug the fact that 'The doors in this compound were/*is closed' does not pattern like the master example, or that speakers asked to assess the acceptability of the first sentence above with was in place of were rate it no worse than the version with were, or that for many sentences proximity is irrelevant (e.g. 'The book that the boys liked was/*were long'). No good parody lingers over complications: if linear order rules in one example then it does so everywhere, every time. Disagree and Occam (Sweeney Todd style) will gut you with his razor. Anyone with aspirations to satire has to love the FBC senseis' demonstrated artistry.

Consider one last wonderful maneuver. Chomsky fathered one of the canonical arguments for structure dependence, based on aux inversion in Yes/No (Y/N) questions. The argument is as follows. Consider how Y/N questions are formed in English. Here are some examples:
(1)       a. Can John come
            b. Will Mary sing
            c. Is Frank kissing Sue
Using (1a-c) as “data,” what kind of rule would one form? Easy: take the aux (i.e. the “helping verb”) and move it to the front. Question: which aux? Easy to answer in the examples in (1), as there is but one. So let's consider more complex forms. For example, what is the correct form of the Y/N question taking (2a) as an answer? It is clearly (2b), not (2c):
(2)       a. John is saying that Mat can swim
            b. Is John saying that Mat can swim
            c. *Can John is saying that Mat swim
Ok, it seems that the rule needs to specify which helping verb to take when there is more than one. Here is the first of two possible answers: take the one linearly closest to the front, i.e. the leftmost one. That works in correctly singling out (2b) from (2c). But, and there is always a 'but,' isn't there, what of yet more complex forms? What do we do in (3a-c)?
(3)       a. The fact that John is sleeping should surprise Mary
            b. The man who Bill is talking to will surprise Mary
            c. That John was asleep all day might irritate Mary
Here, if we move the leftmost helping verb, the one linearly closest to the front, we get rather delightful word salad:
(4)       a. *Is the fact that John sleeping should surprise Mary
            b. *Is the man who Bill talking to will surprise Mary
            c. *Was that John asleep all day might irritate Mary
This indicates that the rule cannot be framed in terms of moving the linearly leftmost helping verb. So what's the right restriction? Well, it seems that we need to move the “highest” one, the one next to the subject. The subject in (3a) is the fact that John is sleeping, in (3b) it is the man who Bill is talking to, and in (3c) that John was asleep all day. So the right auxiliaries to move to form the unimpeachable (5a-c) are should, will and might, respectively:
(5)       a. Should the fact that John is sleeping surprise Mary
            b. Will the man who Bill is talking to surprise Mary
            c. Might that John was asleep all day irritate Mary
Note that to get this right we invoke hierarchical notions: we need to treat several words (in fact, the linear size of the subject is unbounded) as a single unit, i.e. the “subject.” Not surprisingly, the same rule works in the other cases as well.
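The contrast between (4) and (5) can be made mechanical. Below is a toy Python sketch (my own illustration, not anything FBC or Chomsky propose; the auxiliary list and the sentences are hand-picked). The linear rule scrambles (3b) into the word salad of (4b), while the structural rule yields (5b). Tellingly, the structural rule has to be handed the subject/predicate split as input, which is exactly the hierarchical information the linear rule does without.

```python
# Toy illustration: compare the linear rule ("front the leftmost aux")
# with the structural rule ("front the aux of the main clause, i.e. the
# first aux after the whole subject phrase").
AUX = {"is", "was", "can", "will", "should", "might"}

def linear_question(words):
    """Linear rule: front the linearly first auxiliary in the string."""
    i = next(k for k, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def structural_question(subject, predicate):
    """Structural rule: front the first auxiliary after the subject phrase.
    The subject/predicate split is supplied by hand -- precisely the
    hierarchical information the linear rule lacks."""
    i = next(k for k, w in enumerate(predicate) if w in AUX)
    return [predicate[i]] + subject + predicate[:i] + predicate[i + 1:]

subj = "the man who Bill is talking to".split()
pred = "will surprise Mary".split()

print(" ".join(linear_question(subj + pred)))
# -> is the man who Bill talking to will surprise Mary   (word salad, cf. (4b))
print(" ".join(structural_question(subj, pred)))
# -> will the man who Bill is talking to surprise Mary   (cf. (5b))
```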

This short survey of some facts is, of course, not the whole story. However, it gives one a good flavor for the kind of problem that has convinced grammarians that hierarchical structure matters. And that humans are predisposed to exploit such structure in forming the rules of grammar.  And that this predisposition is built into the powers that humans bring to the task of acquiring and using language.  The reasoning is simple: the data that will tell the child to choose the hierarchical specification over the linear one is only available in examples like (4) and (5) and given that such sentences are virtually absent in the data available to the child learning the grammar it cannot be that the predisposition towards the hierarchical condition is data driven. 

Champions of the linear have analyzed this simple example to death (as Jerry Fodor once observed: this should teach Chomsky never to give a simple, uncomplicated illustrative example!). They have worked mightily to construct algorithms using bi- and tri-grams that can distinguish the relevant good and bad cases, all the while eschewing hierarchical structure. FBC note this and get rid of all problems of hierarchy with the wave of a reference or two. Moreover, being first-rate parodists, they keep hidden the fact that all these algorithms have a common problem. Were the facts opposite to those cited, the relevant algorithms could “learn” these as well. So, it is only an accident that there is no natural language that has a rule analogous to the one that generates (4). For FBC such a natural language would be no less humanly accessible than the ones we happen to find. We even know what this linguistic “gap” would look like ((4) good, (5) bad). Thus the attested linguistic holes on this view are purely accidental. It is of course perfectly conceivable that the absence of well-formed structures like (4) is an accidental gap and that children could learn to form Y/N questions in this way. Accidents do happen. And some people can be sold famous bridges. Such niceties, however, would clog up a good parody, and so FBC wisely put them aside.
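The reversibility point can be made concrete with a minimal sketch (my own; raw bigram counts over invented toy corpora, nothing like the actual models FBC cite). The learner has no preference of its own: feed it the attested pattern and it "learns" that; feed it the unattested (4)-style pattern and it "learns" that just as happily.

```python
from collections import Counter

def train(corpus):
    """Count bigrams over a list of sentences."""
    counts = Counter()
    for s in corpus:
        w = s.split()
        counts.update(zip(w, w[1:]))
    return counts

def bigram_score(sentence, counts):
    """Score a string by summing the training counts of its bigrams."""
    w = sentence.split()
    return sum(counts[(a, b)] for a, b in zip(w, w[1:]))

# Two invented toy "languages": the attested aux-fronting pattern
# and its unattested (4)-style inverse.
attested = ["will the boy who is sleeping wake up"] * 5
inverted = ["is the boy who sleeping will wake up"] * 5

m_attested = train(attested)
m_inverted = train(inverted)

# Whichever pattern the model is trained on is the pattern it prefers:
print(bigram_score("will the boy who is sleeping wake up", m_attested))  # 35
print(bigram_score("will the boy who is sleeping wake up", m_inverted))  # 15
```

The symmetry is the point: nothing in the mechanism predicts that only the (5)-style pattern shows up in natural languages.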

There are other titivations that FBC artfully use to ornament their piece, to make it seem like they are really serious about the overall proposal. Good satire demands detail and a straight face.  However, I don’t want to ruin your pleasure in finding these bonbons for yourself.  What I would like to do is end with an appreciation of what I take the real tour de force of their masterpiece to be. As noted at the outset, FBC wrap the whole discussion in a delightful Darwinian wrapping. It seems that natural selection cannot digest the kinds of hierarchy found in grammars. Thus, linear relations it is or you are a creationist!! Like any good parody, this is more suggested than stated. But the evolutionary angle adds a welcome frisson to the discussion. So, not only is this paper really funny, but it smacks of the monumental. Just think, hierarchy:God:religious fanatic, linearity:Darwin:hard scientist.  Onto the ramparts! It’s time for some culture war.

Let me end with another round of kudos. This paper is a must read. Until I got through it I thought that the art of the academic lampoon was dead. FBC have proved me wrong. There are levels of silliness, stupidity and obtuseness left to plumb. Thanks to FBC and the Royal Society for demonstrating that parody and satire are still possible.


  1. I'm sympathetic to your motives, but I don't think mockery is the right way to deal with this kind of research, so I tried to address some of their claims in a more direct manner over at


  2. Mockery! I was taking their contribution as seriously as it deserves to be. It really was very funny and very well done. Surely you don't think they were being serious, do you?

  3. I do think they were being serious. I even think they have a couple of valid points, although the paper also has some serious flaws. I tried to address two of those flaws as explicitly as I could - that their parallel streams may turn out to be nothing but hierarchical structure by another name once you allow for the kind of diacritics needed to make sure the switches happen at the right points, and that their argument from evolutionary continuity rests on the unspoken and quite possibly flawed assumption that non-linguistic processes are universally non-hierarchical in nature.

  4. Aside from what's already been said, what I find bizarre is how in one breath FBC say that hierarchy is disallowed on evolutionary grounds while in another breath say that everyone is equipped to use hierarchy for linguistic analysis. The difference? For FBC, hierarchical analysis, assigning a structural description to a sentence (and its parts), is a case of special scrutiny by the organism.

    Obvious questions are begged. A sample of them:
    1 - why and where does hierarchical cognition come from?
    2 - how does a person acquire it (for competence and performance)?
    3 - why is it present in linguistic cognition?
    4 - why should this ability be available on demand?

    The basic question is just reformulated in this paper, plus lots of hand waving.

    1. You ask some interesting questions. Answering them in any depth would of course require much more space than FBC had in their short paper. So maybe not quite fair slamming them for merely making SUGGESTIONS that could be an alternative to the Minimalist doctrine.

      Now maybe you can direct me to a source where minimalists answer these questions. It would seem that for 'expressing my own thought to myself' none of this stuff is needed either, and simple Merge alone [the only thing for which Chomsky "explains" how it evolved] won't get us even close to the brain structures that ARE required. I have been asking for more than a year now what BIOlinguists HAVE discovered in terms of domain specific brain structures - so far just big silence...

    2. This comment has been removed by the author.

    3. The problems with FBC go far beyond any fight with Minimalism. The most fundamental support for the hierarchical nature of language is that in many languages, utterances fit a pattern of repetition of NP-like phrasal units interspersed with other stuff, such as, for English:

      (P NP) NP (aux)* V (NP) (NP) (P NP)*

      where the NPs themselves have a range of structural possibilities that recur in every position where the NPs are allowed, e.g. something like

      (Det) (Adj)* N (P NP)

      And while there might be more than one possible theory of the relatively simple manifestations of this that appear in certain genres such as sports news, any theory of the phenomena (including tendencies, which statistics tells us are really just as important as absolute rules) has to explain how there also appear far more complex manifestations of NP-within-NP, including the sometimes rather spectacular center-embeddings found in German and Greek.
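      [The recursion in that schema can be sketched in a few lines; what follows is my own toy recognizer, with an invented mini-lexicon and the (P NP)* tail simplified to a single optional (P NP):]

```python
# Toy recognizer for roughly NP -> (Det) (Adj)* N (P NP), with an
# invented mini-lexicon; the (P NP) tail is where NP recurs inside NP.
DET = {"the", "a"}
ADJ = {"old", "ripped"}
N = {"man", "coat", "cuffs", "closet"}
P = {"with", "in", "of"}

def parse_np(words, i=0):
    """Return the index just past an NP starting at position i, or None."""
    if i < len(words) and words[i] in DET:
        i += 1
    while i < len(words) and words[i] in ADJ:
        i += 1
    if i >= len(words) or words[i] not in N:
        return None
    i += 1
    if i < len(words) and words[i] in P:   # optional (P NP): the recursive step
        j = parse_np(words, i + 1)
        if j is not None:
            return j
    return i

s = "the coat with the ripped cuffs".split()
print(parse_np(s) == len(s))  # prints True: an NP properly containing an NP
```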

      There are consistent and sensible ways of taking the position that sequencing constraints matter more than some current syntactic theories acknowledge; LFG with its use of phrase-structure rules has maintained this for decades, and, more recently, Jackendoff and Culicover argue for it in their 'Simpler Syntax' framework, proposing very flat phrase-structure rules (cf Culicover (2009) _Natural Language Syntax_ ch 4).

      FBC otoh seem to be contradicting both themselves and reality as linguists have mostly seen it, claiming on the one hand that a facility to manage hierarchical structure can't have evolved, but on the other sort of admitting that it sometimes seems to be there anyway, just not used very much, and producing no worked-out story about the phenomena that have led most linguists to think that it exists (I see zero chance that the frame-hopping story could be filled out in a workable way so as to be anything but yet another account of hierarchical structure, of which conventional PS rules and Minimalist Merge/Move are simply two inhabitants of a stable that contains a number of others).

      That the profession doesn't seem to have managed a coherent and non-humorous response to FBC seems to me to be a Bad Sign, & my theory of what it is a bad sign of is that we have not paid enough attention to what Chomsky called 'descriptive adequacy', in the sense of capturing the significant generalizations, which we can now reformulate as getting the prior over grammars more or less consistent with what we can observe (such as similar patterning of referring expressions in multiple positions, including inside of each other). Rather, we have been somewhat excessively mesmerized by the pursuit of rather fragile supposed explanations. ("We must move beyond mere explanation to actual description" - Bruce Hayes (2000), 'Phonetically grounded phonology'.)

    4. Avery, I did not mean to suggest that FBC wrote a flawless paper. I think it is not even particularly good. But that is no reason to trash it in the way Norbert did. Identify the problems and DEMONSTRATE [not merely assert] how minimalism [=Norbert's framework] solves them. And please remember that for biolinguists it is not merely enough to provide a story that is formally plausible [capturing significant generalizations by getting the prior over grammars +/- consistent with observed phenomena]. They also need to show [or at a minimum suggest] how their proposal can be implemented in a human brain [constrained by what is known about human brains]. Maybe the inability to address the biology questions I keep asking has something to do with the fact that "the profession doesn't seem to have managed a coherent and non-humorous response to FBC"? And to be honest that inability "seems to me to be a Bad Sign" as well.

  5. @Christina:

    "You ask some interesting questions. Answering them in any depth would of course require much more space than FBC had in their short paper. So maybe not quite fair slamming them for merely making SUGGESTIONS that could be an alternative to the Minimalist doctrine."

    I think that if we were in the 1950's perhaps making shot-in-the-dark suggestions might be excusable, but we're not - and this level of murkiness can't be chalked up to our collective ignorance. The authors assert that they offer no alternative mechanism, nor are they trying to articulate a proposal that would be explicit enough for evaluation. This paper seems more polemical than scholarly. Moreover, every time I read over classic transcripts of debates in the 50's-70's (ish) era about linguistic inquiry outside of technical narrow work I notice these same debates recurring. And every time the same excuse is used: "well, we may have no evidence - but I'm just offering a suggestion." Take a look at the Royaumont debate for example: the parallel is uncanny (and sad).

    "Now maybe you can direct me to a source where minimalists answer these questions. It would seem for 'expressing my own thought to myself' none of this stuff is needed either and simple Merge alone [the only thing for which Chomsky "explains" how it evolved won't get us even close to the brain structures that ARE required. I have been asking for more than a year now what BIOlinguists HAVE discovered in terms of domain specific brain structures - so far just big silence..."

    I think the following conference transcript reports on some interesting investigations on these questions within the framework: {}.

    I could suggest more works that I've found interesting.

    I do think the issue here isn't that we have a set of questions posed, and a variety of camps trying to work them out:

    the issue, from my observation, seems to be that we have a set of questions, and a sector of the community that is trying to answer them; meanwhile another sector is trying to come up with a framework in which these questions don't arise at all.

    I haven't seen any convincing evidence that the question of, say, structure-dependent rules or their acquisition can be dumped by appeal to general learning mechanisms, which is what these alternative proposals boil down to.

    1. Thank you for your kind literature recommendation. You may be interested in this review of "Of Minds and Language"
      After you've read that you may understand why I have slight doubts about the relevance of the biology presented there. So if you could suggest one of the 'more' I'd appreciate it. Preferably with fewer digressions into the genetically fixed behaviour of bees or nematodes. To paraphrase what you say: it was already known in the 1950s that studying these organisms will reveal nothing about human language. In fact, even Rene Descartes knew that back in the 17th century.
      After you've read that you may understand why I have slight doubts about the relevance of the biology presented there. So if you could suggest one of the 'more' I'd appreciate it. Preferably with fewer digressions into the genetically fixed behaviour of bees or nematodes. To paraphrase what you say; it was already known in the 1950s that studying these organisms will reveal nothing about human language. In fact even Rene Descartes knew that back in the 17th century.