Tuesday, January 15, 2019

Movement, islands and the ECP

Some papers reset the research agenda. This one by Lu and Yoshida (L&Y), I believe, is one of those (here is a slide conveying the basic point. The paper is under submission at LI and I assume it will be accepted and rapidly published (if not, this will tell us more about LI than it will about the quality of this paper)). The topic is the island status of Wh-in-situ (WIS) constructions in Chinese. The finding is that judgment studies of the Sprouse experimental syntax (ES) variety provide evidence for two stunning conclusions: (i) that WISs respect islands and (ii) that there is no evidence for an argument/adjunct distinction wrt WISs. Both data points are theoretically pregnant and this post will largely concentrate on drawing out some of the implications. Many of these are mentioned in the paper (yup, I have a draft), so are not original with me. Let’s start.

L&Y is motivated by the premise that ES provides a useful tool for refining linguistic judgments. The idea, as Sprouse has convincingly argued, is that grammatical complexity should induce a super-additivity effect in well-constructed judgment experiments (see, e.g. here and here for discussion and here for a nice review of the methodology). Importantly, super-additivity profiles arise in cases where less involved rating studies find nothing indicating un-grammaticality.
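For readers who have not seen the super-additivity logic spelled out, here is a minimal sketch of the arithmetic. It assumes the 2×2 factorial design standard in this literature (structure: island vs. non-island, crossed with dependency length: short vs. long) and the usual differences-in-differences (DD) score. The ratings are invented for illustration and the names `ratings` and `dd_score` are hypothetical; nothing here comes from L&Y's data.

```python
from statistics import mean

# Hypothetical z-scored acceptability ratings, invented purely for illustration.
# Keys are (structure, dependency length).
ratings = {
    ("non-island", "short"): [0.9, 1.0, 1.1],
    ("non-island", "long"): [0.6, 0.7, 0.8],
    ("island", "short"): [0.7, 0.8, 0.9],
    ("island", "long"): [-0.9, -0.8, -0.7],
}


def dd_score(ratings):
    """Differences-in-differences score for a 2x2 island design.

    A score near zero means the costs of 'island structure' and 'long
    dependency' simply add up. A clearly positive score means the
    island/long condition is worse than the sum of the two costs,
    i.e. a super-additive profile.
    """
    m = {condition: mean(values) for condition, values in ratings.items()}
    # Cost of island structure when the dependency is long.
    d_long = m[("non-island", "long")] - m[("island", "long")]
    # Cost of island structure when the dependency is short.
    d_short = m[("non-island", "short")] - m[("island", "short")]
    return d_long - d_short


print(dd_score(ratings))
```

With these made-up numbers the score comes out at roughly 1.3, well above zero, which is the super-additive pattern. Actual ES studies test the interaction statistically (typically with mixed-effects models) rather than eyeballing a difference of means.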

Before pressing on, let’s make an important and obvious point: all GGers distinguish (or should distinguish) acceptability from grammaticality. Acceptability is a probe for grammaticality. Acceptability is an observable property of utterances. Grammaticality is an abstract property of I-linguistic mental representations. Grammaticality is inferred from acceptability under the right conditions (given the right controls as realized by the appropriate minimal pairs). All of this is old hat, but still a very stylish and durable hat.

Happily, for most of what we have done in GG, acceptability closely tracks grammaticality, but we also know that the two notions can and do diverge (see here for some discussion). ES is particularly useful for cases where this happens and the simple judgment elicitation procedure (e.g. ask a native speaker) indicates all is well. Diogo Almeida has dubbed cases of the latter “subliminal.” One of ES’s important contributions to syntax has been the discovery of such subliminal effects (SE), SEs being cases where ES procedures reveal super-additivity effects while more regular elicitation suggests grammaticality. So, for example, we now have many examples where standard elicitation has indicated that a certain dependency in a certain language shows no island sensitivity (i.e. the sentences are judged (highly) acceptable) while ES techniques indicate sensitivity to these same island effects (i.e. the relevant data display super-additivity effects).

We also find the converse: standard techniques indicate a profound difference in acceptability, while ES techniques show nothing at all.[1] All in all then, ES has provided a useful additional kind of data, one that is often more sensitive to G structure than the quick and dirty (and largely accurate and hence very useful) standard judgment techniques, which sometimes fail to track it.

So, back to the main point: L&Y is an ES study of WISs in Chinese and it has two important findings: that all WISs in Chinese exhibit relative clause island effects (henceforth RCI) (i.e. they all display the super-additivity profile) and that there is no ES evidence that long “why” movement from an RCI is appreciably worse than long “why” movement absent an RCI (i.e. these cases when contrasted do not show a super-additivity profile). The first result argues that WISs are island sensitive and the second argues that there is no additional ECP effect distinguishing WISs like who/what from WISs like why. If correct, this is very big news, and, IMO, very welcome news. Let me say why.
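Before getting to why, the logic of the second finding can be made concrete with a small sketch. The assumption here (mine, for illustration) is that the design crosses wh-type (who/what vs. why) with the island 2×2, so that an ECP-style account predicts a larger island penalty for "why" than for "who/what". All numbers and the names `condition_means`, `dd` and `extra_adjunct_penalty` are hypothetical, not taken from L&Y.

```python
# Hypothetical condition means (z-scored ratings), invented for illustration only.
# Keys are (wh-type, structure, dependency length).
condition_means = {
    ("who/what", "non-island", "short"): 1.0,
    ("who/what", "non-island", "long"): 0.7,
    ("who/what", "island", "short"): 0.8,
    ("who/what", "island", "long"): -0.8,
    ("why", "non-island", "short"): 0.8,
    ("why", "non-island", "long"): 0.5,
    ("why", "island", "short"): 0.6,
    ("why", "island", "long"): -1.0,
}


def dd(means, wh_type):
    """Island differences-in-differences score for one wh-type."""
    d_long = means[(wh_type, "non-island", "long")] - means[(wh_type, "island", "long")]
    d_short = means[(wh_type, "non-island", "short")] - means[(wh_type, "island", "short")]
    return d_long - d_short


def extra_adjunct_penalty(means):
    """How much bigger the island effect is for 'why' than for 'who/what'.

    An ECP-style account expects this to be clearly positive; a value
    near zero means both wh-types pay the same island penalty.
    """
    return dd(means, "why") - dd(means, "who/what")


print(dd(condition_means, "who/what"), dd(condition_means, "why"))
print(extra_adjunct_penalty(condition_means))
```

In this toy data "why" is rated lower across the board, yet both wh-types show the same island penalty, so the extra penalty is essentially zero. That is the shape of result that would count against an additional ECP effect, though a real analysis would test the interaction statistically rather than compare raw means.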

First, as L&Y emphasizes, this result rules out most of the standard approaches to WIS constructions. In particular the result rules out two kinds of theories: (i) accounts that distinguish between overt movement vs covert movement (e.g. Huang’s) and treat island effects as effectively reflexes of overt movement (say, via a chain condition at SS) and (ii) theories that postulate two different kinds of operations (Movement vs Binding) to license WISs with movement subject to islands and binding exempt from them (as in, say, a Rizzi-Cinque approach to ECP effects). Both such kinds of theories will have problems with the apparent fact that WISs induce super-additivity effects.

It is worth noting, furthermore, that the sensitivity of WISs to islands is not the only example of apparent non-movement generated structures being island compliant. The same holds wrt resumptive pronoun (RP) constructions. These also appear to respect islands despite the absence of the main hallmark of movement (i.e. a gap in the “movement” site).[2] Both this RP data and now the WIS data point to the same conclusion: that island effects are not PF effects.[3] From my reading of the literature, this is the most popular current approach to islands and it has some terrifically interesting evidence in support (in particular the fact that some ellipsis (i.e. sluicing) obviates island violations). However, if L&Y are right, then we may have to rethink this assumption (see note 3 however).

Indeed, I would go further (and here it is NH speaking rather than L&Y). There have long been two general approaches to islands. 

First, we have Chomsky’s view of subjacency elaborated in ‘On wh movement’ that treats islands as reflecting bounds on the computational procedure. Island effects reflect the subjacency condition (aka PIC), which bounds the domain of computation (an idea motivated by the reasonable assumption that bounding a domain of computation makes doing computations more tractable).[4]

The second approach can be traced back to Ross’s thesis (islands restrict chopping rules) but has been developed as part of the linearization industry spurred by Kayne’s seminal work and mooted most explicitly by Uriagereka.[5]

The L&Y results argue pretty strongly, IMO, for Chomsky’s original conception precisely because they appear to hold whether or not the construction involves an obvious phonetic gap (gaps being problematic as they undo linearizations). If this is so, then it argues against linearization based approaches to the problem (leaving, of course, a very big question: what to do about sluicing).[6]

We can go further still. The L&Y results also argue for a Merge only syntax. Here is what I mean. IMO, the central empirical thesis of the Minimalist Program (MP) is the Merge Hypothesis (MH). MH is the claim that the only specifically linguistic operation of FL is Merge. This entails that all G dependencies are merge mediated. The strong version of the thesis excludes operations like long distance Agree, which spans the same domains as I-merge but is a different operation. Note that it is natural to suppose that I-merge is movement and Agree is some kind of binding or feature sharing. At any rate, the classical conceptions gain empirical benefit from the “observation” that WISs do not display island effects. Why? Because, we might say, they are licensed by Agree not by I-merge and only the latter (being the MP analogue of movement) is subject to subjacency (or its current analogue). But as L&Y indicates, this is precisely the wrong conclusion. WISs are subject to islands. A merge only syntax insists that all A’-dependencies are formed in the same way, via I-Merge, as this is the only way to establish any non-local grammatical dependency. So if WISs are G licensed, then they must be G licensed via I-merge and so will form a natural class with overt Wh movement. And this is what L&Y find. Both show super-additivity effects across islands. Thus, L&Y’s findings are what we should expect from a merge only syntax and they caution against larding this best of MP theories with Agree/Probe-Goal titivations.[7]

We can milk a second important conclusion from L&Y. It solves a giant problem for MP. Which problem? The problem of unifying subjacency with the ECP. I have suggested elsewhere (see here) that the argument/adjunct asymmetries at the heart of the ECP are very MP problematic. This is so for a variety of reasons. The three that move me most are the fact that the ECP is a trace licensing condition and MP eschews traces, the huge theoretical redundancy between ECP and subjacency, and the “ugliness” of the basic technical machinery required to allow the ECP to track the argument/adjunct asymmetry. One of the nice implications of L&Y is that we need not worry about the problems that the ECP generates for MP because the theoretical apparatus is based on a mistaken description of the data. If L&Y is right, then there is no argument/adjunct asymmetry. Poof, the MP problem disappears and with it the ad-hoc theoretically unmotivated (within MP) technical apparatus required to track it.  

Of course, this overstates matters. It behooves us to go over the ECP data more carefully and see how to resolve the difference in acceptability that the standard literature identified. Why, after all, if all Whs are created equal, do long distance adjuncts resist movement more fiercely than do long distance arguments?[8] L&Y offer a suggestion (no spoiler from me, read the squib when it comes out). But whatever the right answer is, it does not rest on making an invidious grammatical distinction between the two kinds of dependencies. And this is just what MP needs in order to start distinguishing ECP effects from the ECP theoretical apparatus in GB.

Let me hit this a bit harder. The ugliest parts of theories of A’-dependency within GB arise in response to argument/adjunct asymmetry effects. The technical machinery in Barriers (built on Lasnik and Saito foundations), though successful empirically (IMO, Lasnik and Saito’s theory was considerably more empirically effective than Barriers), had little of the virtual conceptual necessity MPers pine for. Nor were alternative theories (Generalized Binding, Connectedness) much prettier. Nonetheless, we put up with that stuff and developed it theoretically because it appeared to be empirically called for. The right aesthetic conclusion should have been (and actually was) that it was too contrived to be correct. L&Y provides courage for our aesthetic convictions. We should have judged these theories as suspect because ugly, though we would have been empirically premature in drawing that conclusion. Given L&Y, the facts are not what we took them to be, despite the very different acceptability profiles the standard judgments reflect.

There is a moral here, and you can all guess what it is but I cannot resist making it explicit anyhow. L&Y provide evidence for a methodological precept that we fail to respect enough: facts can change, no less than theory can. Or to put this another way: just as we can make theoretical wrong turns that we come to revise, we can adopt empirical generalizations that turn out to be misleading. The standard view is that data is hard and theory is fluffy and when the two clash it is better to revise the theory than rethink the data. L&Y provides a case where this is reversed. And I say hooray!

Let me make one more point and I will end this overly long post (long, and yet, filled with endlessly many loose ends). L&Y exemplifies something that I think is important. It is an empirical paper whose purpose is to directly test a core theoretical assumption. This is not something that we generally see within syntax. Most papers are not out to test theoretical assumptions. Most papers use theory to explore more data. Theory might be tested but it is generally a by-product of better descriptive coverage. L&Y works differently. It starts from the theory and constructs an empirical intervention to probe it. Moreover, the question is quite precise and the assumptions required to answer it are clear within the confines of the project. This has all the look and smell of an honest to god experiment, a process whereby we query the theory using a relatively well-understood probe. Both empirical methods of exploration are worthwhile, but they are different and it is only relatively recently, I think, that we are seeing examples of the second experimental kind gaining traction.

Curiously (perhaps), a feature of this second kind of paper is that the paper is short. L&Y is a squib. Empirical explorations in linguistics often read like novellas. L&Y is very definitely a very very short story. I would like to suggest that experimental papers like L&Y reflect the scientific health of linguistics. It is now possible to ask a sharp question, and give a sharp answer. We need more of these kinds of short pointed experimental forays into the data starting from well-formulated theoretical starting points.

That’s it. I have gone on far too long. L&Y is terrific. If correct, it is very important. I personally hope the results stand up. It would go a long way to cleaning up a particularly untidy part of syntactic theory and thereby further vindicating the promise of MP, indeed a particularly strong version of MP, one that endorses a Merge only conception of grammar.

[1] ES techniques have, for example, suggested that adjunct island effects might not be of a piece with other islands as they (often) fail to display super-additivity effects.
[2] To be slightly more careful, and squinting at the ES results wrt resumptives, the following is more accurate: resumptives uniformly ameliorate fixed subject constraint violations but not “mere” subjacency violations. Thus, resumptives inside islands seem to show the same super-additivity profiles as their moved counterparts.
[3] Perhaps a more felicitous way of putting matters is that RPs and WISs are also products of I-merge and so expected to be subject to islands. One reason for treating islands as PF effects is to capture the distinction between these cases and more conventional examples of overt movement. If, however, they pattern the same then the motivation for treating islands as PF effects weakens. This said, I am pretty sure it is possible to model these cases as formed via movement/I-merge by, for example, treating them as cases of remnant movement, where the moved Wh or Q morpheme starts as part of a doubled structure including the RP or WIS.
[4] See Chomsky’s ‘On wh movement’ for discussion. See here for more prose.
[5] Versions of this original idea were developed by many people including Hornstein, Lasnik and Uriagereka, and Fox and Pesetsky. The approach centers on the idea that the problem with movement is that it reorders elements and so can come into conflict with the ordering algorithm. In this sense, gaps are a big deal and are what distinguishes movement from other kinds of long distance dependencies like binding.
[6] Or again, it argues for treating the operation (e.g. movement) as in need of constraint rather than the output of the operation (a gap, or new linear order). What plausibly unifies cases of “overt” WH (as in English), “covert” WH (as in Chinese) and resumptive WH (as in Hebrew) is that they all involve relating an A’ element to a non-local syntactic position that can be arbitrarily far away. It’s the span that seems to matter, not what sits at the tail of the chain.
[7] RP constructions must also be formed via I-merge and so too all forms of binding. This requirement fits well with the observation that RPs obey islands. Binding, especially pronominal binding, is likely to be more problematic. As they say at this point in a journal paper in reply to referee 2: these are topics for further research.
[8] As I have noted in other places, this description of the ECP contrast is not quite correct, as we have known since Rizzi’s work on minimality. The distinction seems less a matter of argument vs adjunct than object centered vs non object centered quantification. But this is a topic for another time.


  1. A few things:

    1. You say, in the paragraph starting with "So, back to the main point," the following: "there is no ES evidence that long 'why' movement from an RCI is appreciably worse than long 'why' movement absent an RCI." I don't have the manuscript, but looking at the poster you linked to, this doesn't seem quite right. If I understood correctly, what they have found is that there is no interaction of islandhood with the type of wh-phrase. That is, long 'why'-movement out of RCI is harder than long 'why'-movement absent an RCI, but in just the same way as (or rather, without a statistically significant difference relative to) 'who'/'what'-movement.

    2. What you say about resumptive pronouns only extends to "obligatory" resumptive pronouns, of the Lebanese Arabic type. "Optional" resumptive pronouns (e.g. in Hebrew) are not island sensitive, and if I'm not mistaken, Meltzer-Asscher has verified this experimentally. I'm using scare-quotes around "obligatory" and "optional" because the picture is slightly more complicated (in interesting ways), as Sichel has shown in the last few years.

    3. There are some noteworthy problems with the Chomskyan view whereby islands bound computations to make computations more tractable. First, whatever bounding buys you in chopping up a long A-bar dependency into, say, CP-sized (a.k.a. wh-Island-sized) chunks, an isomorphic derivation involving successive-cyclic A-movement can be built, that has just as many steps and just as much structure, and no bounding nodes whatsoever (on standard assumptions, at least). In light of this, the appeal to bounding computations seems awfully hand-wavy to me. Second, even restricting ourselves to A-bar dependencies, the Minimal-Compliance effects uncovered by Richards in languages like Bulgarian suggest that the picture cannot be so simple. The computation cannot just forget about structure, even if that structure is inside an island, on pain of having to recover it later on. Of course, here too one could imagine a world where we check more carefully and the data turn out to be different than we thought, and, as you know, such checking is afoot :-)

  2. This comment has been removed by the author.

  3. >The L&Y results argue pretty strongly, IMO, for Chomsky’s original conception.
    Could you please explain how? I thought that there is either
    a) original Subjacency Condition which explicitly bans movement (and there is no movement in Chinese in this case) or
    b) its variant in Barriers which bans such a chain, but in Merge-only view there is no dependency between the wh-word and some other head like C (e.g. via Agree), hence no chains to ban
    The slide says the findings support covert LF-movement but isn't that a deprecated idea today?

