Faculty of Language: The Generative Death March, part 3. Whose death is it anyway?

Tuesday, September 13, 2016

The Generative Death March, part 3. Whose death is it anyway?

I almost had a brain hemhorrage when I read this paragraph in the Scientific American piece that announced the death of generative linguistics:

“As with the retreat from the cross-linguistic data and the tool-kit argument, the idea of performance masking competence is also pretty much unfalsifiable. Retreats to this type of claim are common in declining scientific paradigms that lack a strong empirical base—consider, for instance, Freudian psychology and Marxist interpretations of history.”

Pretty strong stuff. Fortunately, I was able to stave off my stroke when I realized that this claim, i.e., that performance can’t mask competence, is possibly the most baseless of all of ITs assertions about generative grammar.

Consider the phenomenon of agreement attraction:

(1) The key to the cabinets is/#are on the table

The phenomenon is that people occasionally produce “are” and not “is” in sentences like these (around 8% of the time in experimental production tasks, according to Kay Bock) and they even fail to notice the oddness of “are” in speeded acceptability judgment tasks. Why does this happen? Well, Matt Wagers, Ellen Lau and Colin Phillips have argued that (at least in comprehension) this has something to do with the way parts of sentences are stored and reaccessed in working memory during sentence comprehension. That is, using an independently understood model of working memory and applying it to sentence comprehension these authors explained the kinds of agreement errors that English speakers do and do not notice. So, performance masks competence in some cases.

Is it possible to falsify claims like this one? Well, sure. You would do so by showing that the independently understood performance system didn’t impact whatever aspect of the grammar you were investigating. Let’s consider, for example, the case of island-violations. Some authors (e.g., Kluender, Sag, etc) have argued that sentences like those in (2) are unacceptable not because of grammatical features but because of properties of working memory.

(2) a. * What do you wonder whether John bought __?

b. * Who did the reporter that interviewed __ win the Pulitzer Prize

So, to falsify this claim about performance masking competence Sprouse, Wagers and Phillips (2012) conducted an experiment to ask whether various measures of variability in working memory predicted the degree of perceived ungrammaticality in such cases. They found no relation between working memory and perceived ungrammaticality, contrary to the predictions of this performance theory. They therefore concluded that performance did not mask competence in this case. Pretty simple falsification, right?

Now, in all fairness to IT, when they said that claims of performance masking competence were unfalsifiable, they were talking about children. That is, they claim that it is impossible for performance factors to be responsible for the errors that children make during grammatical development, or at least that claims that such factors are responsible for errors are unfalsifiable. Why children should be subject to different methodological standards than adults is a complete mystery to me, but let’s see if there is any merit to their claims.

Let’s get some facts about children’s performance systems on the ground. First, children are like adults in that they build syntactic representations incrementally. This is true in children ranging from 2- to 10-years old (Altmann and Kamide 1999, Lew-Williams & Fernald 2007, Mani & Huettig 2012, Swingley, Pinto & Fernald 1999; Fernald, Thorpe & Marchman 2010). Second, along with this incrementality children display a kind of syntactic persistence, what John Trueswell dubbed “kindergarten path effects”. Children show difficulty in revising their initial parse on the basis of information arriving after a parsing decision has been made. This syntactic persistence has been shown by many different research groups (Felser, Marinis & Clahsen 2003, Kidd & Bavin 2005, Snedeker & Trueswell 2004, Choi & Trueswell 2010, Rabagliati, Pylkkanen & Marcus 2013).

These facts allow us to make predictions about the kinds of errors children will make. For example, Omaki et al (2014) examined English- and Japanese-learning 4-year-olds’ interpretations of sentences like (3).

(3) Where did Lizzie tell someone that she was going to catch butterflies?

These sentences have a global ambiguity in that the wh-phrase could be associated with the matrix or embedded verb. Now, if children are incremental parsers and if they have difficulty revising their initial parsing decisions, then we predict that English children should show a very strong bias for the matrix interpretation, since that interpretation would be the first one an incremental parser would access. And, we predict that Japanese children would show a very strong bias for the embedded interpretation, since the order of verbs would be reversed in that language. Indeed, that is precisely what Omaki et al found, suggesting that independently understood properties of the performance systems could explain children’s behavior. Clearly this hypothesis is falsifiable because the data could have come out differently.

A similar argument for incremental interpretation plus revision difficulties has also been deployed to explain children’s performance with scopally ambiguous sentences. Musolino, Crain and Thornton (2000) observed that children, unlike adults, are very strongly biased to interpret ambiguous sentences like (4) with surface scope:

(4) Every horse didn’t jump over the fence

a. All of the horses failed to jump (= surface scope)

b. Not every horse jumped (= inverse scope)

Musolino & Lidz (2006), Gualmini (2008) and Viau, Lidz & Musolino (2010) argued that this bias was not a reflection of children’s grammars being more restricted than adults’ but that other factors interfered in accessing the inverse scope interpretation. And they showed how manipulating those extragrammatical factors could move children’s interpretations around. Moreover, Conroy (2008) argued that a major contributor to children’s scope rigidity came from the facts that (a) the surface scope is constructed first, incrementally, and (b) children have difficulty revising initial interpretations. Support for this view comes from several adult on-line parsing studies demonstrating that children’s only interpretation corresponds to adults’ initial interpretation.

Again, these ideas are easily falsifiable. It could have been that children were entirely unable to access the inverse scope interpretation and it could have been that other performance factors explained children’s pattern of interpretations. Indeed, the more we understand about performance systems, the better we are able to apportion explanatory force between the developing grammar and the developing parser (see Omaki & Lidz 2015 for review).

So, what IT must have meant was that imprecise hypotheses about performance systems are unfalsifiable. But this is not a complaint about the competence-performance distinction. It is a complaint about using poorly defined explanatory predicates and underdeveloped theories in place of precise theories of grammar, processing and learning. Indeed, we might turn the question of imprecision and unfalsifiability back on IT. What are the precise mechanisms by which intuition and analogy lead to specific grammatical features and why don’t these mechanisms lead to other grammatical features that happen not to be the correct ones? I’m not holding my breath waiting for an answer to that one.

Summing up our three-day march, we can now evaluate IT’s central claims.

1) Intuition and analogy making can replace computational theories of grammar and learning.

Diagnosis: False. We have seen no explicit theory that links these “general purpose” cognitive skills to the kind of grammatical knowledge that has been uncovered by generative linguistics. Claims to the contrary are wild exaggerations at best.

2) Generative linguists have given up on confronting the linking problem.

Diagnosis: False. This problem remains at the center of an active community of generative acquisitionists. Claims to the contrary reflect more about ITs ability to keep up with the literature than with the actual state of the field.

3) Explanations of children’s errors in terms of performance factors are unfalsifiable and reflect the last gasps of a dying paradigm.

Diagnosis: False. The theory of performance in children has undergone an explosion of activity in the past 15 years and this theory allows us to better partition children’s errors into those caused by grammar and those caused by other interacting systems.

IT has scored a trifecta in the domain of baseless assertions.

Who’s leading the death march of declining scientific paradigms, again?

24 comments:

OmerSeptember 14, 2016 at 1:52 AM
This comment has been removed by the author.
ReplyDelete
Replies
OmerSeptember 14, 2016 at 2:11 AM
Funny thing about these "retreats" that Ibbotson and Tomasello talk about – as someone pointed out to me, your average generative linguist seems to have a much better grasp of what 'ergative' means than these authors do (based on their use of it in the piece). I wonder how that ends up happening when generativists are so busy retreating from cross-linguistic data.

(I know this post is dedicated to one of the other alleged "retreats" but I couldn't resist pointing this out.)

ReplyDelete
Replies
AnonymousSeptember 14, 2016 at 7:46 AM
Hi Jeff,

I think you're underestimating how difficult it is to separate performance & competence accounts. To the extent that they really are difficult if not impossible to tease apart, then the spirit of IT is accurate on this issue.

You point to the Sprouse et al as a "pretty simple" demonstration of this idea. The problem, as Ivan Sag, Laura Staum Casasanto and I pointed out, is that they didn't show their measures of memory were predictive of anything. Imagine someone said they’d decided that cranial capacity was a good measure of memory, then tried to see if cranial capacity was predictive of the magnitude of island effects, and lo and behold, it wasn’t. Well, then it would be at least premature, if not outright unscientific, to conclude that working memory is not important for island effects — because cranial capacity hasn’t been linked to anything. Sprouse et al used a memory measure, yes, but didn't show that it had any predictive value for any type of sentence acceptability contrast, and especially not for sentences differing in difficulty. The larger point here is: what is the empirical diagnostic that rigorously & consistently shows a difference between examples assumed to be defined by performance factors, however defined, and competence factors?

By extension, while the Wagers et al stuff is a fine demonstration of how off-the-shelf processing models can account for agreement mismatch effects, the conclusion that performance masks competence is not the only possible interpretation. For instance, what if these sorts of "performance" profiles drive changes in agreement systems, or whatever other dimension of the grammar you prefer? That is, if the processing exigencies of a language cause patterns to be noisy, maybe this is exactly the sort of thing that contributes to syntactic change? Or, for someone who thinks there's no performance-competence split, one could look at the agreement facts as a demonstration that there's a strong bias to have subjects and verbs agree in English, but that this bias is intimately & inextricably interwoven with the ability to detect the signal amidst the noise of the input.

Whatever the case, and regardless of what else IT have to say in their article, there is no agreed upon way of deciphering what's due to performance and what's due to competence. If you're aware of how you can show that an acceptability contrast or developmental profile is purely driven by competence or purely driven by performance, then that would do the job.
ReplyDelete
Replies
UnknownSeptember 14, 2016 at 11:32 AM
If the c/p distinction can’t be clearly applied across the board or even if it is unclear how to apply it in general, then that is regrettable, non-ideal, but such a situation doesn’t invalidate the distinction, as Jeff says, if one has a sound general argument for it and can apply it in some cases. So, and this thought goes back to Aspects chp. 1, section 1, we know that a grammar that produces some finite set of structures that explain a finite set of acceptable quotidian sentences will also produce infinitely many structures that, counterfactually, would explain the acceptability or ambiguity, say, of sentences with 15 billion clauses that are not performable, as it were, for humans (analogous remarks hold for garden paths, inter alia). So, some c/p distinction is going to be play as soon as one wants a grammar to be simple (having no stipulated finite bound) and counterfactually robust, going beyond whatever just happens to obtain. So, far from being incoherent, some c/p distinction is nigh-on necessary, which supports the kind of methodology Jeff espouses. Or so it sees to me.
ReplyDelete
Replies
mohinishSeptember 14, 2016 at 1:37 PM
I really don't understand the fuss about the c/p distinction. If Russian athletes in Rio, RAM upgrades, binoculars, abacuses, tools, the million-dollar man, string theories, and Beethoven can do it, why can't generative grammars? It's as if David Marr was just someone's long-forgotten uncle.
ReplyDelete
Replies
UnknownOctober 3, 2016 at 7:40 AM
I just find it hard to believe that a squishy, perversely complicated neural network patched together over evolutionary time like the one that produces human language should run on a small set of logically precise rules that can be brought into human-readable form.

Not to mention that our knowledge of both human languages and the brain and mind seems much too limited right now to tease out those rules from all the other factors of influence even if we accept that they do exist. We're not done collecting stamps yet.

I mean, we have trouble figuring out and describing in human-readable form how exactly a virtual neural network conditioned to identify digits or pictures of cats does what it does. We end up saying things like "well, this set of transformations seems to detect rounded edges near the top...this one does...whiskers, maybe?". And that's 1. orders of magnitude simpler, 2. We can turn bits of it off to see what happens, and 3. WE BUILT IT and know exactly how it's structured and what its design principles are.

Any UG "rules" that exist are going to be fuzzy squishy unreliable tendencies strongly shaped by reinforcement and susceptible to social factors. Those are also a simple (if less satisfying) explanation for errors like in example (1): The squishy neural network takes in/conceives of the whole phrase at once, the verb is right next to a plural form, so sometimes a wire gets crossed and a plural is produced because it's "thinking about" plurals at the time. Just like you might occasionally mistake a shadow for a person (and will be more likely to do so if you've been primed to look for persons) or get the wrong cutlery from the cupboard. It's not a case of "performance masking competence", it's just that competence isn't perfect because the network isn't perfect and doesn't run on perfect rules. In that particular case we might say it has a 92% success rate of correctly (i.e. according to our desired output) choosing singular over plural in that particular construction.

As of right now we frankly have no way of telling whether attempts at UG correspond to any sort of underlying structure or are just an alternative way of describing the output.
Basically, I do believe that there is some sort of "universal grammar", and Chomsykan inquiries into it are very interesting. It's just that their conclusions as to what this grammar should specifically look like are very much in the spirit of postwar scientific hubris and its overly simplistic conception of human cognition.
ReplyDelete
Replies
Charles ReissOctober 5, 2016 at 1:52 PM
On her sewing/knotting blog, a BA student of mine from about 10 years ago wrote a scathing response to the IT paper: "Scientific American says Universal Grammar is dead: a response" http://woolandpotato.com/2016/10/05/scientific-american-says-universal-grammar-is-dead-a-response/

As the title suggests, she not only takes on IT, but also questions the judgment of the Scientific American editors. It is an easy read and a good thing to give to your non-linguist friends who ask you about the death of UG--send them to Allison who describes herself thus:

"I sew my own clothes. I knit my own sweaters. I throw pots. I spin fibre into yarns and dye them with plant based dyes. I weave, on occasion. I tat too, when the mood strikes."

And she understands generative linguistics.

ReplyDelete
Replies
AveryAndrewsOctober 7, 2016 at 1:11 AM
This comment has been removed by the author.
ReplyDelete
Replies
AveryAndrewsOctober 7, 2016 at 1:18 AM
There's another point, which is that even if there proves to be no UG, interpreting UG as a single, fixed grammar-writing notation that is useful for explaining how languages can be learned (by making some generalizations easier to acquire than others), and clearly better for that purpose than a wide range of alternatives, there are still very large numbers of extremely precise and non-variable regularities that can be described with grammatical rules. Mixed up with stuff that seems to be mushier, but the regularities of particular languages are there, and the sharpness of many of them has been evident to many people for a long time before Chomsky, such as Sapir in his 1922 book 'Language'. So any neural architecture has to deal with this, however unexpected it seems.
ReplyDelete
Replies
jqbMay 25, 2024 at 2:40 AM
"this claim, i.e., that performance can’t mask competence"

This is stunningly incompetent. A claim that P is unfalsifiable is not a claim that P is false.
ReplyDelete
Replies
jqbMay 25, 2024 at 3:13 AM
"They therefore concluded that performance did not mask competence in this case. Pretty simple falsification, right?"

No ... this is a laughably bad misunderstanding of falsifiability. A falsifiable theory is one that *could* be false, and *if false* could be shown to be false. But if you actually succeed in falsifying your theory then your theory *is false*. e.g., the Theory of Evolution is falsifiable, and *could have been false, and shown false, in some other possible world*, but it can't be shown false in *this* world, because it isn't false. You have essentially said that your linguistic theory has been shown false in this world. (But you're wrong about that.)
ReplyDelete
Replies

Add comment

Faculty of Language

Comments

Tuesday, September 13, 2016

The Generative Death March, part 3. Whose death is it anyway?

24 comments:

Contributors