Tuesday, September 13, 2016

The Generative Death March, part 3. Whose death is it anyway?

I almost had a brain hemorrhage when I read this paragraph in the Scientific American piece that announced the death of generative linguistics:

“As with the retreat from the cross-linguistic data and the tool-kit argument, the idea of performance masking competence is also pretty much unfalsifiable. Retreats to this type of claim are common in declining scientific paradigms that lack a strong empirical base—consider, for instance, Freudian psychology and Marxist interpretations of history.”

Pretty strong stuff. Fortunately, I was able to stave off my stroke when I realized that this claim, i.e., that performance can’t mask competence, is possibly the most baseless of all of IT’s assertions about generative grammar.

Consider the phenomenon of agreement attraction:

(1) The key to the cabinets is/#are on the table

The phenomenon is that people occasionally produce “are” and not “is” in sentences like these (around 8% of the time in experimental production tasks, according to Kay Bock) and they even fail to notice the oddness of “are” in speeded acceptability judgment tasks. Why does this happen? Well, Matt Wagers, Ellen Lau and Colin Phillips have argued that (at least in comprehension) this has something to do with the way parts of sentences are stored and reaccessed in working memory during sentence comprehension. That is, using an independently understood model of working memory and applying it to sentence comprehension, these authors explained the kinds of agreement errors that English speakers do and do not notice. So, performance masks competence in some cases.

Is it possible to falsify claims like this one? Well, sure. You would do so by showing that the independently understood performance system didn’t impact whatever aspect of the grammar you were investigating. Let’s consider, for example, the case of island violations. Some authors (e.g., Kluender, Sag, etc.) have argued that sentences like those in (2) are unacceptable not because of grammatical features but because of properties of working memory.

(2) a.  * What do you wonder whether John bought __?
    b.  * Who did the reporter that interviewed __ win the Pulitzer Prize?

So, to falsify this claim about performance masking competence, Sprouse, Wagers and Phillips (2012) conducted an experiment to ask whether various measures of variability in working memory predicted the degree of perceived ungrammaticality in such cases. They found no relation between working memory and perceived ungrammaticality, contrary to the predictions of this performance theory. They therefore concluded that performance did not mask competence in this case. Pretty simple falsification, right?
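To make the logic of that kind of test concrete, here is a minimal sketch in Python. It is not a reproduction of the Sprouse, Wagers and Phillips analysis: the function names, the condition labels and all of the numbers are made up for illustration, and I am assuming that the island effect is scored per participant as a differences-in-differences measure over a structure-by-gap-position design. The memory-based account predicts that participants with higher memory scores show smaller island effects, so memory should explain a sizable share of the variance in effect size.

```python
from math import sqrt


def island_effect(ratings):
    """Differences-in-differences score for one (hypothetical) participant.

    `ratings` maps (structure, gap_position) to a mean acceptability rating.
    The score is the extra drop in acceptability for an embedded gap inside
    an island, over and above the drop for an embedded gap in a non-island.
    """
    d_island = ratings[("island", "matrix")] - ratings[("island", "embedded")]
    d_control = ratings[("non_island", "matrix")] - ratings[("non_island", "embedded")]
    return d_island - d_control


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def variance_explained(memory_scores, effects):
    """Share of variance in island-effect size accounted for by memory (r squared)."""
    return pearson_r(memory_scores, effects) ** 2


# Invented toy data: one memory score and one island-effect score per participant.
memory_scores = [2.0, 3.0, 4.0, 5.0]
effects = [1.4, 1.1, 1.5, 1.2]
print(variance_explained(memory_scores, effects))
```

If the printed value is close to zero, memory is telling us next to nothing about the size of the island effect, which is the outcome the memory-based account says should not happen. Whether the memory measure itself is a good one is a separate question that a sketch like this cannot settle.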

Now, in all fairness to IT, when they said that claims of performance masking competence were unfalsifiable, they were talking about children. That is, they claim that it is impossible for performance factors to be responsible for the errors that children make during grammatical development, or at least that claims that such factors are responsible for errors are unfalsifiable. Why children should be subject to different methodological standards than adults is a complete mystery to me, but let’s see if there is any merit to their claims.

Let’s get some facts about children’s performance systems on the ground. First, children are like adults in that they build syntactic representations incrementally. This is true in children ranging from 2 to 10 years old (Altmann & Kamide 1999, Lew-Williams & Fernald 2007, Mani & Huettig 2012, Swingley, Pinto & Fernald 1999, Fernald, Thorpe & Marchman 2010). Second, along with this incrementality, children display a kind of syntactic persistence, what John Trueswell dubbed “kindergarten path effects”. Children show difficulty in revising their initial parse on the basis of information arriving after a parsing decision has been made. This syntactic persistence has been shown by many different research groups (Felser, Marinis & Clahsen 2003, Kidd & Bavin 2005, Snedeker & Trueswell 2004, Choi & Trueswell 2010, Rabagliati, Pylkkänen & Marcus 2013).

These facts allow us to make predictions about the kinds of errors children will make. For example, Omaki et al. (2014) examined English- and Japanese-learning 4-year-olds’ interpretations of sentences like (3).

(3) Where did Lizzie tell someone that she was going to catch butterflies?

These sentences have a global ambiguity in that the wh-phrase could be associated with the matrix or embedded verb. Now, if children are incremental parsers and if they have difficulty revising their initial parsing decisions, then we predict that English children should show a very strong bias for the matrix interpretation, since that interpretation would be the first one an incremental parser would access. And we predict that Japanese children would show a very strong bias for the embedded interpretation, since the order of verbs would be reversed in that language. Indeed, that is precisely what Omaki et al. found, suggesting that independently understood properties of the performance systems could explain children’s behavior. Clearly this hypothesis is falsifiable because the data could have come out differently.

A similar argument for incremental interpretation plus revision difficulties has also been deployed to explain children’s performance with scopally ambiguous sentences. Musolino, Crain and Thornton (2000) observed that children, unlike adults, are very strongly biased to interpret ambiguous sentences like (4) with surface scope:

(4) Every horse didn’t jump over the fence
a. All of the horses failed to jump (= surface scope)
b. Not every horse jumped (= inverse scope)

Musolino & Lidz (2006), Gualmini (2008) and Viau, Lidz & Musolino (2010) argued that this bias was not a reflection of children’s grammars being more restricted than adults’ but that other factors interfered in accessing the inverse scope interpretation. And they showed how manipulating those extragrammatical factors could move children’s interpretations around. Moreover, Conroy (2008) argued that a major contributor to children’s scope rigidity came from the facts that (a) the surface scope is constructed first, incrementally, and (b) children have difficulty revising initial interpretations. Support for this view comes from several adult on-line parsing studies demonstrating that children’s only interpretation corresponds to adults’ initial interpretation.  

Again, these ideas are easily falsifiable. It could have been that children were entirely unable to access the inverse scope interpretation and it could have been that other performance factors explained children’s pattern of interpretations. Indeed, the more we understand about performance systems, the better we are able to apportion explanatory force between the developing grammar and the developing parser (see Omaki & Lidz 2015 for review).

So, what IT must have meant was that imprecise hypotheses about performance systems are unfalsifiable. But this is not a complaint about the competence-performance distinction. It is a complaint about using poorly defined explanatory predicates and underdeveloped theories in place of precise theories of grammar, processing and learning. Indeed, we might turn the question of imprecision and unfalsifiability back on IT. What are the precise mechanisms by which intuition and analogy lead to specific grammatical features and why don’t these mechanisms lead to other grammatical features that happen not to be the correct ones? I’m not holding my breath waiting for an answer to that one.

Summing up our three-day march, we can now evaluate IT’s central claims.

1) Intuition and analogy making can replace computational theories of grammar and learning. 
Diagnosis: False. We have seen no explicit theory that links these “general purpose” cognitive skills to the kind of grammatical knowledge that has been uncovered by generative linguistics. Claims to the contrary are wild exaggerations at best.

2) Generative linguists have given up on confronting the linking problem.
Diagnosis: False. This problem remains at the center of an active community of generative acquisitionists. Claims to the contrary say more about IT’s ability to keep up with the literature than about the actual state of the field.

3) Explanations of children’s errors in terms of performance factors are unfalsifiable and reflect the last gasps of a dying paradigm.
Diagnosis: False. The theory of performance in children has undergone an explosion of activity in the past 15 years and this theory allows us to better partition children’s errors into those caused by grammar and those caused by other interacting systems.

IT has scored a trifecta in the domain of baseless assertions. 

Who’s leading the death march of declining scientific paradigms, again?


  1. This comment has been removed by the author.

  2. Funny thing about these "retreats" that Ibbotson and Tomasello talk about – as someone pointed out to me, your average generative linguist seems to have a much better grasp of what 'ergative' means than these authors do (based on their use of it in the piece). I wonder how that ends up happening when generativists are so busy retreating from cross-linguistic data.

    (I know this post is dedicated to one of the other alleged "retreats" but I couldn't resist pointing this out.)

  3. Hi Jeff,

    I think you're underestimating how difficult it is to separate performance & competence accounts. To the extent that they really are difficult if not impossible to tease apart, then the spirit of IT is accurate on this issue.

    You point to the Sprouse et al. study as a "pretty simple" demonstration of this idea. The problem, as Ivan Sag, Laura Staum Casasanto and I pointed out, is that they didn't show their measures of memory were predictive of anything. Imagine someone said they’d decided that cranial capacity was a good measure of memory, then tried to see if cranial capacity was predictive of the magnitude of island effects, and lo and behold, it wasn’t. Well, then it would be at least premature, if not outright unscientific, to conclude that working memory is not important for island effects, because cranial capacity hasn’t been linked to anything. Sprouse et al. used a memory measure, yes, but didn't show that it had any predictive value for any type of sentence acceptability contrast, and especially not for sentences differing in difficulty. The larger point here is: what is the empirical diagnostic that rigorously & consistently shows a difference between examples assumed to be defined by performance factors, however defined, and competence factors?

    By extension, while the Wagers et al. stuff is a fine demonstration of how off-the-shelf processing models can account for agreement mismatch effects, the conclusion that performance masks competence is not the only possible interpretation. For instance, what if these sorts of "performance" profiles drive changes in agreement systems, or whatever other dimension of the grammar you prefer? That is, if the processing exigencies of a language cause patterns to be noisy, maybe this is exactly the sort of thing that contributes to syntactic change? Or, for someone who thinks there's no performance-competence split, one could look at the agreement facts as a demonstration that there's a strong bias to have subjects and verbs agree in English, but that this bias is intimately & inextricably interwoven with the ability to detect the signal amidst the noise of the input.

    Whatever the case, and regardless of what else IT have to say in their article, there is no agreed upon way of deciphering what's due to performance and what's due to competence. If you're aware of how you can show that an acceptability contrast or developmental profile is purely driven by competence or purely driven by performance, then that would do the job.

    1. There are no general solutions to any of these problems. Rather, people can act in good faith and build explicit theories to account for phenomena and then a productive discussion can happen. Of course it is difficult to tell whether a given effect is due to competence or performance. That certainly doesn't mean that the distinction is invalid. It means that it's hard to answer hard questions. Nobody ever said science is easy. But it's impossible if people are unwilling to consider relevant evidence or build maximally explicit models.

    2. Well, yes. "We cannot draw a line between light and darkness, yet day and night are, upon the whole, tolerably distinct." (Edmund Burke). But when it is not a question of drawing a line, but of interpenetration of meanings throughout the domain of discourse, one begins to wonder whether the distinction really has a difference attached.

      English grammarians have spent centuries trying to sort out prepositions, subordinating conjunctions, and (some) adverbs: Huddleston & Pullum's view that there is only one lexical class, whose members may take an NP, a clause, or nothing as their objects in a lexically-specific way, makes for a great simplification.

  4. If the c/p distinction can’t be clearly applied across the board or even if it is unclear how to apply it in general, then that is regrettable, non-ideal, but such a situation doesn’t invalidate the distinction, as Jeff says, if one has a sound general argument for it and can apply it in some cases. So, and this thought goes back to Aspects chp. 1, section 1, we know that a grammar that produces some finite set of structures that explain a finite set of acceptable quotidian sentences will also produce infinitely many structures that, counterfactually, would explain the acceptability or ambiguity, say, of sentences with 15 billion clauses that are not performable, as it were, for humans (analogous remarks hold for garden paths, inter alia). So, some c/p distinction is going to be in play as soon as one wants a grammar to be simple (having no stipulated finite bound) and counterfactually robust, going beyond whatever just happens to obtain. So, far from being incoherent, some c/p distinction is nigh-on necessary, which supports the kind of methodology Jeff espouses. Or so it seems to me.

  5. I really don't understand the fuss about the c/p distinction. If Russian athletes in Rio, RAM upgrades, binoculars, abacuses, tools, the million-dollar man, string theories, and Beethoven can do it, why can't generative grammars? It's as if David Marr was just someone's long-forgotten uncle.

  6. I just find it hard to believe that a squishy, perversely complicated neural network patched together over evolutionary time like the one that produces human language should run on a small set of logically precise rules that can be brought into human-readable form.

    Not to mention that our knowledge of both human languages and the brain and mind seems much too limited right now to tease out those rules from all the other factors of influence even if we accept that they do exist. We're not done collecting stamps yet.

    I mean, we have trouble figuring out and describing in human-readable form how exactly a virtual neural network conditioned to identify digits or pictures of cats does what it does. We end up saying things like "well, this set of transformations seems to detect rounded edges near the top...this one does...whiskers, maybe?". And that's 1. orders of magnitude simpler, 2. We can turn bits of it off to see what happens, and 3. WE BUILT IT and know exactly how it's structured and what its design principles are.

    Any UG "rules" that exist are going to be fuzzy squishy unreliable tendencies strongly shaped by reinforcement and susceptible to social factors. Those are also a simple (if less satisfying) explanation for errors like in example (1): The squishy neural network takes in/conceives of the whole phrase at once, the verb is right next to a plural form, so sometimes a wire gets crossed and a plural is produced because it's "thinking about" plurals at the time. Just like you might occasionally mistake a shadow for a person (and will be more likely to do so if you've been primed to look for persons) or get the wrong cutlery from the cupboard. It's not a case of "performance masking competence", it's just that competence isn't perfect because the network isn't perfect and doesn't run on perfect rules. In that particular case we might say it has a 92% success rate of correctly (i.e. according to our desired output) choosing singular over plural in that particular construction.

    As of right now we frankly have no way of telling whether attempts at UG correspond to any sort of underlying structure or are just an alternative way of describing the output.
    Basically, I do believe that there is some sort of "universal grammar", and Chomskyan inquiries into it are very interesting. It's just that their conclusions as to what this grammar should specifically look like are very much in the spirit of postwar scientific hubris and its overly simplistic conception of human cognition.

    1. @Mortimer: There are some implicit assumptions in what you're saying, which I think should be brought out front and center and discussed:

      assumption 1: If a set of phenomena X is squishy (or more precisely, looks squishy from our current scientific vantage point), and we hypothesize that X is generated from the interaction of a set of principles Y with a set of complex real-world factors Z, then it follows that Y must be squishy, too.

      assumption 2: If our current understanding of the brain (say, neural networks) is unable to mesh with contemporary theories of linguistics (say, minimalist syntax), the onus is on the latter to change.

      As others have written here before, I see no reason to believe that either of these assumptions is valid. With regard to (1), one need look no further than Gleitman's work on, e.g., odd numbers, to see that this is false. (But if one insists, one can also look at our squishy-looking physical universe which seems, nevertheless, to be underlain by quanta.) With regard to (2), looking at the brief scientific history that is currently at our disposal, there is no reason to believe this is correct, certainly not as a matter of general principle.

      You may disagree, but I think it is helpful to bring (1-2) out to the surface, to at least make it clear what it is that the disagreement is really about.

    2. @Mortimer: That is a very strange argument. Linguists believe that there are very robust regularities within and across speech communities. If you are rejecting this, you must at least say something explaining why it seems to be the case.

      If you are not rejecting this, what exactly is your point? That we do not know how such generalizations are encoded in the brain? No one on this board would argue with that. That you expect the generalizations to be 'squishier'? Again, they seem not to be, and you need to say at least something in this regard (are they really, but we're not looking at them in the right way?).

      The problem with neural networks, as you point out, is that (except perhaps for Bengio and Hinton) they are black boxes. If you have a performant neural network account of some phenomenon, you are in no better a position to understand that phenomenon than you were before. To be sure, you are in a great position to do stuff and make money, but that is not the goal of linguistics.

      There is also very interesting work on compiling 'rules' into networks (see e.g. Smolensky and beim Graben). Beim Graben has been interested in the possibility of phase transitions, which introduce (basically) transitions between configurations which were not present in the rule-based presentation.

    3. @Mortimer, I find it hard to believe that air is a gas. Indeed, I find it hard to believe that air is even a thing. But a little bit of science reveals that it's there. Same deal for grammar.

    4. Oh hey, I wasn't actually expecting to get any replies, seeing as I came so late to the discussion (I got here via a recent Pharyngula post) :)
      Since the system doesn't seem to allow individual replies, I'll try to respond to people in order -- though I reserve the right to talk to one person at a time if I feel I'm being dogpiled later. I apologise in advance for getting different people's arguments mixed up in my head.

      @Omer Preminger:

      Your quantum physics analogy is instructive, I think. The underlying non-squishy quanta of human language are the same as the underlying non-squishy quanta of everything else -- namely, quantum physics (which is an incomplete model of course, but we'll assume that Universal String Theory or whatever will also turn out to be non-squishy).
      But you will notice that quantum physics is no use in describing a game of football, nor did we discover it by observing football games. Chomskyans are not attempting to describe language in terms of quantum physics, they are attempting to find higher-level rules, while ignoring the messiness of something as complex as a brain at this scale. My contention is that there's no reason to think such tidy higher-level rules exist, and that this assumption goes counter to our experience with other, similarly complex areas of human cognition. Related fields like psychology seem to understand this and have changed their methods accordingly. Meanwhile, Chomskyans handwave away this complexity by claiming that "performance masks competence" etc., rather than going with the face evidence which suggests that "competence" isn't an absolute (and therefore can't be controlled by clockwork-tidy logical rules).

    5. @Greg Kobele:

      I also believe that there are "very robust regularities within and across speech communities". This is unsurprising, seeing as humans are all basically the same and view the world in broadly the same way. I also don't reject the idea of biologically inherited language-forming systems, so some of those regularities being especially robust or even universal is similarly unsurprising.

      "That you expect the generalizations to be 'squishier'? Again, they seem not to be"

      The generalisations only seem non-squishy if you handwave away the unreliability in actual performance and the disagreement in judgement between native speakers. The number of actual, 100% universal linguistic universals we have discovered can be counted on hands and feet, and even some of those may be accidents of history (would our human sound inventory include clicks if the Khoisan languages had gone extinct 2000 years ago?).

      The problem of understanding and describing black boxes is exactly why I contend that linguistics should focus on what it can do given our current understanding of the human mind and our best ability to come up with "rules" for it -- statistically, rather than in absolutes. The ultimate products of Chomskyan endeavours are neither immediately useful except in doing more Chomskyanism, nor do I see any reason to believe that they correspond any better to the underlying structure -- i.e. contribute to our actual knowledge of the structure of human language -- than traditional grammars do. !!!Which is not to say that I think traditional grammars are an accurate representation of the underlying system!!! I think that for all intents and purposes, the system is a black box and will stay that way for a long time, which means we should be focusing on doing useful things with its output and not saddling ourselves with an overly rigorous rule system at this time, which is liable to bias our observations.

      "To be sure, you are in a great position to do stuff and make money, but that is not the goal of linguistics." That's exactly the problem! We currently lack data, technology and scientific skills in many interconnected disciplines to accomplish that "goal" of linguistics. We can't do it. My complaint with Chomskyans isn't that they have the wrong goal (I find the idea of a universal grammar immensely appealing, and as I said previously, I do believe that something of the sort exists), but that they are desperately premature and hence wasting their efforts. They are Greek philosophers trying to come up with quantum physics by watching a ball game.

      @Jeff Lidz:

      If you think I am making an argument from incredulity, I respectfully suggest you read my post again.

    6. @ Mortimer

      I hate self-referencing, but I understand your impressions on this topic and I'd suggest you take a look at my "Brains & Syntax" posts from early September. There I propose a rough theory that reconciles generative syntax with what appear to be reliable generalizations about the nature of language use.

      I think that generative syntacticians have the right goals and have made massive progress in understanding the nature of the faculty of language, and that if we work a bit to propose the right linking theory, then we can incorporate the successes of non-generative approaches and more progress can potentially be made.

    7. @William Matchin

      That is my hope, too. Chomskyans' independent rediscovery of Case and the like makes me cautiously optimistic that they will eventually rejoin the rest of us in expanding our beautiful stamp collection.

      Thank you for the pointer -- I am currently working my way backwards through this blog one breakfast at a time, and I'll be reaching your posts soon :)

    8. @Mortimer: Thank you for your thoughtful reply. I do not believe, nor do I think linguists in general believe, that in any particular instance the concrete generalization being made is 'right'. What I believe is happening is that we are getting a better handle on the nature/kind/forms of generalizations that seem relevant. I think that this is of fundamental importance.

      For example, near consensus is that the kinds of constructions manifested in natural language are mildly context sensitive. This allows us to dig down into the nature of things that can 1) describe only such patterns, 2) use (parse/generate) such patterns, and 3) induce such patterns from data. We need to understand this because whatever we are actually doing when we learn language, the kinds of generalizations we end up with fall into this very restricted set. By recognizing this, it gives us principled guidance into an otherwise even more horribly underdetermined problem.

      Having a good understanding of the kinds of mechanisms which are capable of expressing the relevant kinds of generalizations allows for principled approaches to engineering tasks. I think that Kevin Knight at the ISI is a fantastic example of this, as he is using a kind of graph model for machine translation that corresponds to exactly this class of patterns.

      I would be interested to know if you felt that there were a case in which some insight into the nature of a problem was gained by training a neural net to perform well according to some metric on that problem. I personally feel that neural nets are the wrong level for understanding; while there were certainly monsters who could read assembly as Neo did the code for the matrix, a high level language makes everything easier to comprehend. This is what is motivating my question.

    9. @Mortimer. There’s obviously quite a lot in your comments that deserves a response, but since others have already responded to the main points, I just wanted to focus on the following:

      The ultimate products of Chomskyan endeavors are neither immediately useful except in doing more Chomskyanism, nor do I see any reason to believe that they correspond any better to the underlying structure -- i.e. contribute to our actual knowledge of the structure of human language -- than traditional grammars do.

      Research in generative syntax has uncovered lots of phenomena which went completely unnoticed by traditional grammarians. Many of these are highly informative as to the structure of human language. Antecedent-contained deletion would be one example. Without a formal(ish) theory of ellipsis and quantifier scope, there is nothing at all noteworthy or puzzling about a sentence such as “John read every book that Mary did”. But as soon as you start to construct such a theory, you realize that the answers to fundamental questions about syntax, semantics and the syntax/semantics interface hang on the correct analysis of these sentences.

      What do you think about this? Do you think that linguists should stop investigating ACD and stop trying to figure out the rules and principles that underlie it? Or do you think that they should keep doing this but somehow make everything “squishier”? Or, let’s take your statement that we should “be focusing on doing useful things with [the black box]’s output and not saddling ourselves with an overly rigorous rule system, which is liable to bias our observations.” How might we actually follow this advice in the case at hand? Is the system of rules assumed in typical analyses of ACD too rigorous? If so, how might we fix this problem?

    10. "I find it hard to believe that air is a gas. Indeed, I find it hard to believe that air is even a thing. But a little bit of science reveals that it's there. Same deal for grammar."

      This is a stunningly disingenuous and stupid response.

  7. On her sewing/knotting blog, a BA student of mine from about 10 years ago wrote a scathing response to the IT paper: "Scientific American says Universal Grammar is dead: a response"

    As the title suggests, she not only takes on IT, but also questions the judgment of the Scientific American editors. It is an easy read and a good thing to give to your non-linguist friends who ask you about the death of UG--send them to Allison who describes herself thus:

    "I sew my own clothes. I knit my own sweaters. I throw pots. I spin fibre into yarns and dye them with plant based dyes. I weave, on occasion. I tat too, when the mood strikes."

    And she understands generative linguistics.

    1. Her article is trash.

      "The SA article claims that recent research has discredited Noam Chomsky’s theory of Universal Grammar and that scientists are happily throwing it away."

      No it doesn't.

  8. This comment has been removed by the author.

  9. There's another point, which is that even if there proves to be no UG, interpreting UG as a single, fixed grammar-writing notation that is useful for explaining how languages can be learned (by making some generalizations easier to acquire than others), and clearly better for that purpose than a wide range of alternatives, there are still very large numbers of extremely precise and non-variable regularities that can be described with grammatical rules. Mixed up with stuff that seems to be mushier, but the regularities of particular languages are there, and the sharpness of many of them has been evident to many people for a long time before Chomsky, such as Sapir in his 1922 book 'Language'. So any neural architecture has to deal with this, however unexpected it seems.

  10. "this claim, i.e., that performance can’t mask competence"

    This is stunningly incompetent. A claim that P is unfalsifiable is not a claim that P is false.

  11. "They therefore concluded that performance did not mask competence in this case. Pretty simple falsification, right?"

    No ... this is a laughably bad misunderstanding of falsifiability. A falsifiable theory is one that *could* be false, and *if false* could be shown to be false. But if you actually succeed in falsifying your theory then your theory *is false*. e.g., the Theory of Evolution is falsifiable, and *could have been false, and shown false, in some other possible world*, but it can't be shown false in *this* world, because it isn't false. You have essentially said that your linguistic theory has been shown false in this world. (But you're wrong about that.)