Wednesday, December 4, 2013

Dreams of a unified theory; a great big juicy problem (Yay!!)

The intricacies of A’-syntax are one of the glories of GB.[1]  The unification of Ross’s islands in terms of subjacency and the discovery of ECP dependencies (especially the adjunct/argument distinction), coupled with wide ranging investigations of these effects in a large variety of different kinds of languages, marked a high point in Generative Grammar. This all changed with the Minimalist (M) “Revolution” (yes; these are scare quotes). Thereafter, Island and ECP effects mostly fell from the hot topics list (compare post-M work with that done in the 80s and early 90s, where it seemed that every other paper/book was about A’-dependencies and their island/ECP restrictions). Moreover, though early M was chock full of discussions of Superiority, an A’-effect, it was mainly theoretically interesting for the light that it threw on Minimality and Shortest Move/Attract rather than how it bore on Islands or the ECP. Indeed, from where I sit, the bulk of the interesting work within M has been on A rather than A’ dependencies.[2]

Moreover, whereas there has been interesting research aiming to unify various grammatical modules, subjacency and the ECP have resisted theoretical integration, at least interesting versions thereof. It is possible, indeed easy, to translate bounding theory or barriers into phase terminology.[3] However, there is nothing particularly insightful gained in doing this. It is also possible to unify Islands with Minimality given the right use of features placed in appropriate edge positions, but IMO little has been gained to date in so proceeding. So Island and ECP effects, once the pride of theoretical syntax, have become a backwater, and a slightly embarrassing one, for three related reasons.

First, though it is pretty easy to translate Subjacency (viz. bounding theory) in phase terms, this translation simply duplicates the peccadillos of the earlier approaches (e.g. we stipulated bounding nodes, we now stipulate (strong) phases; we stipulated escape hatches (C yes, D no), we now stipulate phase edges (both which phases have any to use and how many they have)).

Second, ad hoc as this is, it’s good compared to the problems the ECP throws up. For example, the ECP is conceptually a trace licensing requirement. Where does this leave us when we replace traces with copies, as M does? Do copies need licensing? Why, if they are simply different occurrences of a single expression? Moreover, how do we code the difference between adjuncts and arguments? What makes the former so restricted when compared to the latter?

Last, the obvious redundancy between Subjacency and the ECP raises serious M questions. Both involve the same island-like configurations, yet they are entirely different licensing conditions. Talk of redundancy! One of Subjacency or the ECP is bad enough, but both? Argh!!

So, A’-syntax raises M issues and a natural hope is to dispose of these problems by placing them in someone else’s trash bin. And there have been several attempts to do just this, e.g. Kluender & Kutas, Sag & Hofmeister, Hawkins, among others. The idea has been to treat island effects as a reflection of processing complexity, the latter arising when parsers try to relate elements outside an island (fillers) to positions (gaps) within an island.  It is well known that filler/gap dependencies impose a memory/storage cost, as the process of relating a filler to a gap requires keeping the filler “live” until it’s discharged in the appropriate position. Interestingly, there is independent psycho-ling evidence that the cost of keeping elements active can depend on the details of the parse quite independently of whether islands are involved (e.g. beginnings of finite clauses induce load, as does the parsing of definites).[4] Island effects, on this view, are just the sum total of these island-independent processing costs. In effect, Islands are just structures where these other costly, independently manifested requirements converge. If true, this idea could, with some work, let M off the island hook.[5] Wouldn’t that be nice?

It would be, but I personally doubt that this strategy will work out.  The main problem is that it seems very hard to explain the unacceptability profiles of island effects in processing terms. A recent volume (of which I am co-editor, though Jon Sprouse did all the really heavy lifting and deserves all the credit, Experimental Syntax and Island Effects) reviews the basic issues. The main take-home message is that when considered in detail, the relevant cited complexity inducers (e.g. definiteness) do not eliminate the structural contributions of islands to the perceived acceptability, though they can modulate it (viz. the super-additive effects of islands remain even if the severity of the unacceptability can be manipulated). Many of the papers in the volume address these issues in detail (see especially those by Jon Sprouse, Matt Wagers, and Colin Phillips). The book also contains good representatives of the processing “complexity” alternative, and the interested reader is encouraged to take a look at the papers (WARNING: being a co-editor forbids me, in good conscience, from advocating purchase, but I believe that many would consider this book a perfect holiday gift even for those with no interest in the relevant intellectual issues, e.g. it’s really heavy and would make a perfect paperweight or door stopper).

A nice companion piece to the papers in the above volume that I have recently read seconds the conclusion that Island Effects have a structural source.  The paper (here) is by Yoshida, Kazanina, Pablos and Sturt (YKPS) and it explores the problem in a very clever way. Here’s a quick review.

YKPS starts from the assumption that if the problem is one of the processing complexities of islands, then any dependency into an island that is computed online (as filler/gap dependencies are) should show island-like properties even if these dependencies are not products of movement. They identify forward cataphora (e.g. his1 managers revealed that [island the studio that notified Jeffrey Stewart1 about the new film] selected a novel for the script) as one such dependency. YKPS shows that the indicated referential dependency is calculated online just as filler/gap dependencies are (both are very greedy in fixing the dependency). However, in contrast to movement dependencies, pronoun resolution in forward cataphora does not exhibit island effects. The argument is easy to follow and the conclusion strikes me as pretty solid, but read it and judge for yourself. What I liked about it is that it is a classic example of a typical linguistic argument form: YKPS identifies a dog that doesn’t bark. If parsing complexity is the relevant variable, then it needs to explain both why some dependencies exhibit island effects and, just as importantly, why some do not. In other words, negative data counts! The absence of island effects is as much a datum as its presence is, though it is often ignored.[6] As YKPS puts it:

Complexity accounts, which attribute island effects to the effect of processing complexity of the online dependency formation process, need to explain why the same complexity does not affect (my emphasis, NH) the formation of cataphoric dependencies. (17)

So, it seems to me that islands are here to stay, even if their presence in UG embarrasses minimalists.

Three points and I end. First, the argument that YKPS presents is another nice example of how psycho-techniques can be used to advance syntactic ends.  How so? Well, it is critical to YKPS’s point that forward cataphora involves the same kind of processing strategies (active filler) as do regular filler/gap dependencies that one finds in movement despite the dependencies being entirely different grammatically. This is what makes it possible to compare the two kinds of processes and conclude from their different behavior wrt islands that structural effects cannot be reduced to parsing complexity (a prima facie very reasonable hypothesis and one that might even be nice were it true!).[7] 

Second, the complexity theory of islands pertains to Subjacency Effects. The far harder problem, as I mentioned earlier, involves ECP effects. Indeed, were Subjacency Effects reduced to complexity effects, the presence of ECP effects in the very same configurations would become even more puzzling, at least to me. At any rate, both problems remain, and await a decent M analysis.

Third, let me end with some personal intellectual history. I taught a course on the old GB A’ material with Howard Lasnik this semester (a great experience, thx Howard) and have become pretty convinced that finding a way to simply recapitulate ECP and Island effects in M terms is by no means trivial.  To see this, I invite you to simply try to translate the GB theories into an M-acceptable idiom. Even this is pretty hard to do, and a simple translation still leaves one short of an M-acceptable account. Conclusion? This is still a really juicy research topic for the unificationally inclined, i.e. a great Minimalist research topic.

[1] I take GB to be the logical culmination of work that first developed as the Extended Standard Theory. Moreover, I here, again, take GB to be one of several kissing cousins, such as GPSG, LFG, HPSG.
[2] This is a bird’s eye evaluation and there are notable exceptions to this coarse generalization. Here is one very conspicuous exception: how ellipsis obviates island effects. Lasnik and Merchant have turned this into a small very productive industry. The main theoretical effect has been to make us reconsider what makes an island islandy. The ellipsis effects have revived an interpretation that has some roots in Ross, that it is not the illicit dependency that matters but the phonological realization thereof that counts.  Islands, on this view, are PF rather than syntactic effects. At any rate, this is really interesting stuff which has led us to understand Island Effects in new ways.
[3] At least if one allows D to be a phase, something that some (e.g. Chomsky) have only grudgingly accepted.
[4] Rick Lewis has some nice models of this based on empirical work by Gibson.
[5] Of course, more work needs doing. For example, one needs to explain why ellipsis obviates these processing effects (see note 2).
[6] Note 4 indicates another bit of negative data that needs explanation on the complexity account. One might think, for example, that having to infer structure would add to complexity and thus increase the unacceptability of island violations, contrary to what we in fact find.
[7] Very reasonable indeed as witnessed by Chomsky’s extensive efforts to argue against the supposition that island effects are simple complexity effects in On Wh Movement.


  1. I am actually very fond of deriving island constraints from processing effects. Not so much the proposals that are currently out there, but the basic idea.

    The Specifier Island Constraint, for example, does lower parsing complexity because you never have to guess whether a mover came from within a specifier or a complement. From this perspective it also makes sense that there is nothing like the Complement Island Constraint, because sometimes you do not have any specifier at all so things must have come from inside the complement. Basically, if you want to cut down on the number of decision points, ignoring specifiers can work without exception, ignoring complements cannot.
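The decision-point reasoning above can be sketched as a toy enumeration. This is purely my own illustration, under invented assumptions: trees are encoded as dictionaries with a `spec` (specifier) and `comp` (complement) daughter, strings stand for candidate extraction sites, and none of this corresponds to any actual parsing model.

```python
# Toy sketch of the decision-point argument (invented encoding, not an
# actual parsing model): each non-terminal node has a 'spec' and a 'comp'
# daughter; strings are candidate extraction sites.

def gap_candidates(node, ignore):
    """Collect potential gap sites, pruning the branch named in
    `ignore` ('spec' or 'comp')."""
    if node is None:
        return []
    if isinstance(node, str):
        return [node]
    sites = []
    if ignore != 'spec':
        sites += gap_candidates(node['spec'], ignore)
    if ignore != 'comp':
        sites += gap_candidates(node['comp'], ignore)
    return sites

# A structure whose head has both a specifier and a complement...
full = {'spec': 'subject', 'comp': {'spec': 'object', 'comp': 'embedded'}}
# ...and one with no specifier at all.
no_spec = {'spec': None, 'comp': 'embedded'}

# Ignoring specifiers (a Specifier Island) still leaves a candidate
# site in every structure, so the pruning works without exception:
assert gap_candidates(full, 'spec') == ['embedded']
assert gap_candidates(no_spec, 'spec') == ['embedded']

# Ignoring complements sometimes leaves no candidate at all, so a
# "Complement Island Constraint" would wrongly rule out the only option:
assert gap_candidates(no_spec, 'comp') == []
```

The asymmetry falls out of the enumeration: pruning the specifier branch never empties the candidate set, pruning the complement branch sometimes does.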

    Another curious point is that the processing accounts of islands assume that the constraints are not part of the grammar (or maybe I've read a very skewed sampling so far). This does not follow. Processing considerations could just as well nudge learners toward grammars with island constraints that reduce processing complexity.

    1. I like some of these too. My favorite, surprise surprise, is the Berwick and Weinberg proposal that subjacency relates to efficient parsing with an LR(k) parser. We also know (Colin Phillips's work shows this, some of which is reprised in the volume mentioned above) that island effects have on-line processing effects in that filled-gap effects do NOT occur within islands unless grammatically licensed via parasitic gaps. This shows a tight connection between parsing and grammar. I like these results, though I am not sure that this indicates that grammatical constraints are DERIVED from parsing constraints; still, there is an important functional relation here that is interesting.

      Those agnostic about islands being in the grammar take a more extreme (and so more interesting) view: that wrt islands there is no grammar specific to structure at all. Or more exactly, what one finds in islands is what one finds more generally in parsing non-islands, and Island effects (including specifiers, I would assume) result from the confluence of effects piling up at island boundaries. This is a non-structural view of islands in that there is nothing particularly special about these structures except that many things pile up there. This is what Sprouse and others argue against, and their argument, if correct, has grammatical implications, for it implies that islands are structural conditions, sensitive to structure that island configurations express but other sentences don't. I suspect that you know all of this, but I thought it would be nice to take advantage of your comment to reiterate it all.

    2. To be clear, there is indeed much evidence that real-time comprehension processes respect island constraints. But that certainly does not motivate a reductionist account of islands. Nor does it provide any evidence for the notion that things like subjacency constraints are in some way a consequence of efficient parsing needs. In just the same way, if I demonstrate that comprehenders are very sensitive to subject-verb agreement, you would not (I hope) conclude that subject-verb agreement is either an epiphenomenon of parsing or motivated by efficient parsing. It bears emphasizing, because I've been surprised at how often people assume that if we have some findings about X in language processing then we must be pursuing a reductionist account of X. The relevant work is summarized in the Sprouse/Hornstein volume in a section that is specifically dedicated to findings that do NOT bear on the reductionist question.

      The distinction that Norbert and Thomas are making is one that we have referred to as "reductionist" vs. "grounded" accounts of islands. They are very different claims, though they are sometimes mistaken for one another. Reductionist accounts of islands claim that the effects are really epiphenomena. These are quite testable accounts. Grounded accounts claim that the effects are things-that-it-would-be-nice-for-a-grammar-to-have. Those accounts are very hard indeed to test.

    3. if I demonstrate that comprehenders are very sensitive to subject-verb agreement, you would not (I hope) conclude that subject-verb agreement is either an epiphenomenon of parsing or motivated by efficient parsing.

      Indeed I wouldn't, but not for the reason you are suggesting. For the argument that I sketched above, the crucial issue isn't processing sensitivity but parser performance. Those are two very different things (if you buy into my idiosyncratic terminology). Processing sensitivity is an empirical question and hence subject to experimental investigation. Parser performance is a theoretical issue: given a parsing model, how does this model's performance scale with respect to grammar size, the choice of constraints and movement types, etc. This is the realm of proofs and theorems. So if you can prove that some island constraints improve parsing performance, you've got a nice argument why we find island constraints even if they're not part of UG.

      Of course the argument hinges on some assumptions that are subject to empirical investigation: 1) Is your parsing model a good representation of the parser(s) humans use? 2) Is your measure of performance relevant for the problems the human parser has to solve, and does it correlate with perceived sentence difficulty? Still, those aren't problems with the validity of the argument as such but rather with the applicability of the models and definitions being employed.

      [A quick side remark for illustration of 2): Berwick and Weinberg have pointed out some real issues with the CS notion of parsing performance, which gives string length a lot more weight than grammar size --- if you're interested in parsing programs with thousands of lines of code, that is indeed the right perspective to take, but not so much for natural language sentences that hardly ever have more than 30 words]

      Anyways, you guys have gotten me really interested in the book, so I'll get myself a copy as an early Xmas gift.

    4. I agree, Thomas. Thanks. I was responding to Norbert's (partially retracted) suggestion that some of our experiments had shown something that I do not think that they show.

    5. OIC. Without proper threading it's sometimes easy to misconstrue comments --- it would be really nice if Blogger would finally lift the arbitrary restriction to two levels of embedding. But it's good to know that you agree; it suggests that my view is not completely bonkers ;)

  2. There seems to be a fairly clear split between island effects and the ECP in terms of how easy it is to ground these in parsing considerations. As Thomas and Norbert have pointed out, there are lots of "minimal search" type reasons why parsing might be aided by limits on possible dependency paths. On the other hand, the adjunct/argument asymmetry seems less amenable to an explanation in these terms. This is particularly so because it appears to be independent of the distinction between A'-movements which leave an argument structure gap and those which don't (which is the distinction you might expect the parser to be interested in). This is what I take the following paradigm to show. Sentence (1) is ambiguous, depending on whether how many books binds a variable in the object position or (I guess) a variable within the object position. The ambiguity disappears in (2), presumably because the ECP views extraction of how many books as argument extraction in (a) and adjunct extraction in (b). The embedded question blocks adjunct extraction but not argument extraction. From a parsing point of view all of these cases should look like argument extraction, since read is missing an object.

    (1) How many books did you say that John read?
    a. For how many books x did you say that John read x?
    b. For which n did you say that John read n books?

    (2) How many books did you ask John why he read?
    a. ?For how many books x did you ask John why he read x?
    b. *For which n did you ask John why he read n books?

    1. That's an interesting point, Alex. I'm fond of that paradigm, but hadn't thought about it in the context of this particular debate before. The one reservation that I'd have about your argument is whether the real-time interpretive mechanism really does treat both types of extraction as equivalent. If its goal is purely syntactic ("find me a gap"), then your argument is quite right. But if the interpretive difference is adopted from the start of the sentence, then the argument is less straightforward.

    2. It pays to consider another version of the same effect. In sentences like (1) it is possible to get a pair/list answer (i.e. John a pizza, Sue a pie, Sam a salami):
      (1) What did you say that everyone brought?
      But this is not possible once there is a wh-island. In this case the only available reading is a singleton, e.g. a dessert:
      (2) What did you ask if everyone brought?
      So it seems that the "dependent" reading is blocked into wh-islands while the "wide scope" reading is permitted.
      A question: what evidence do we have that filler/gap dependencies look for semantically specific gaps of this variety, as you suggested to Alex?

    3. Interesting. I'm not sure that we have any evidence on the grain size of the interpretations that people attribute to wh-phrases upon first encounter. Your paradigm is interestingly different. In Alex's paradigm, one could claim that there's an interpretive ambiguity that is already apparent at the "how many" phrase, and so a comprehender could make a commitment right away. But in your examples that seems less plausible, as the interpretive difference only becomes relevant once the universal quantifier is reached.