Thursday, May 9, 2013

Phases: some questions

One of the nice things about conferences is that you get to bump into people you haven’t seen for a while. This past weekend, we celebrated our annual UMD Mayfest (it was on prediction in ling sensitive psycho tasks) and, true to form, one of the highlights of the get-together was that I was able to talk to Masaya Yoshida (a syntax and psycho dual threat at Northwestern) about islands, subjacency, phases and the argument-adjunct movement asymmetry.  At any rate, as we talked, we started to compare Phase Theory with earlier approaches to strict cyclicity (SC) and it again struck me how unsure I am that the newfangled technology has added to our stock of knowledge.  And, rather than spending hours upon hours trying to figure this out solo, I thought that I would exploit the power of crowds and ask what the average syntactician in the street thinks phases have taught us above and beyond standard GB wisdom.  In other words, let’s consider this a WWGBS (what would GB say) moment (here) and ask what phase-wise thinking has added to the discussion.  To set the stage, let me outline how I understand the central features of phase theory and also put some jaundiced cards on the table, repeating comments already made by others. Here goes.

Phases are intended to model the fact that grammars are SC. The most impressive empirical reflex of this is successive cyclic A’-movement.  The most interesting theoretical consequence is that SC grammatical operations bound the domain of computation, thereby reducing computational complexity.  Within GB these two factors are the province of bounding theory, aka Subjacency Theory (ST). The classical ST comes in two parts: (i) a principle that restricts grammatical commerce (at least movement) to adjacent domains (viz. there can be at most one bounding node (BN) between the launch site and target of movement) and (ii) a metric for “measuring” domain size (viz. the unit of measure is the BN and these are DP, CP, (vP), and maybe TP and PP).[1] Fix the bounding nodes within a given G and one gets locality domains that undergird SC. Empirically, A’-movement applies strictly cyclically because it must, given the combination of assumptions (i) and (ii) above.

Now, given this and a few other assumptions, it is also possible to model island effects in a unified way.  The extra assumptions are: (iii) some BNs have “escape hatches” through which a moving element can move from one cyclic domain to another (viz. CP but crucially not DP) (iv) escape hatches can accommodate varying numbers of commuters (i.e. the number of exits can vary; English is thought to have just one, while multiple WH fronting languages have many). If we add a further assumption - (v) DP and CP (and vP) are universally BNs but Gs can also select TP and PP as BNs – the theory allows for some typological variation.[2] (i)-(v) constitute the classical Subjacency theory. Btw, the reconstruction above is historically misleading in one important way.  SC was seen to be a consequence of the way in which island effects were unified. It’s not that SC was modeled first and then assumptions added to get islands, rather the reverse; the primary aim was to unify island effects and a singular consequence of this effort was SC. Indeed, it can be argued (in fact I would so argue) that the most interesting empirical support for the classical theory was the discovery of SC movement.
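
Assumptions (i) through (iv) are mechanical enough to sketch in code. What follows is a minimal illustration, not a piece of the theory. The function name `can_extract`, the encoding of a structure as a bottom-up list of node labels, and the convention that leaving a specifier counts as crossing that node are all assumptions made for the sketch. The bounding-node sets are parameters. The English-like {DP, TP} and Italian-like {DP, CP} sets used in the usage note follow the familiar Rizzi-style parametrization and are illustrative choices, not a claim about how (v) should be fixed.

```python
from typing import AbstractSet, Sequence


def can_extract(
    spine: Sequence[str],
    bounding: AbstractSet[str],
    hatches: AbstractSet[str] = frozenset({"CP"}),
    filled: AbstractSet[int] = frozenset(),
) -> bool:
    """Return True if an element can move from its base position to the
    specifier of the topmost node, one subjacent step at a time.

    spine    -- labels of the nodes dominating the base position, listed
                bottom-up; the last entry is the clause hosting the final
                landing site (hypothetical encoding).
    bounding -- labels that count as bounding nodes, assumption (ii).
    hatches  -- labels whose specifier can serve as an intermediate
                landing site, assumption (iii).
    filled   -- indices of hatches already occupied, a crude stand-in
                for assumption (iv) with one exit per hatch.
    """
    top = len(spine) - 1
    base = -1  # position below every node in the spine
    seen = {base}
    frontier = [base]
    while frontier:
        pos = frontier.pop()
        if pos == top:
            return True
        for target in range(pos + 1, top + 1):
            is_landing_site = target == top or (
                spine[target] in hatches and target not in filled
            )
            if not is_landing_site or target in seen:
                continue
            # Nodes crossed: everything left behind on the way up,
            # including the node whose specifier is being vacated.
            crossed = spine[max(pos, 0):target]
            # Assumption (i): at most one bounding node per step.
            if sum(label in bounding for label in crossed) <= 1:
                seen.add(target)
                frontier.append(target)
    return False
```

On this encoding, `can_extract(["TP", "CP", "TP", "CP"], {"DP", "TP"})` succeeds by stopping in the embedded Spec,CP, while the complex-NP spine `["TP", "CP", "DP", "TP", "CP"]` fails because DP offers no hatch. Marking the embedded hatch as occupied (`filled={1}`) blocks the English-like setting but not the Italian-like one, which is the wh-island contrast the parametrization was meant to capture.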

One of the hot debates when I was a grad student was whether long distance movement dependencies were actually SC. Kayne and Pollock and Torrego provided (at the time surprising) evidence that they were, based on SC inversion operations in French and Spanish.  Chung supplied Comp agreement evidence from Chamorro to the same effect.  This, added to the unification of islands, made ST the jewel in the GB crown, both theoretically and empirically. Given my general rule of thumb that GB is largely empirically accurate, I take it as relatively uncontroversial that any empirically adequate theory of FL must explain why Gs are SC.

As noted in a previous post (here), ST developed and expanded.  But let’s leave history behind and jump to the present. Phase Theory (PT) is the latest model for SC. How does it compare with ST?  From where I sit, PT looks almost isomorphic to it, or at least a version that extends to cover island effects does.  A PT of this ilk has CP, vP and DP as phases.[3] It incorporates the Phase Impenetrability Condition (PIC) that requires that interacting expressions be in (at most) adjacent phases.[4] Distance is measured from one phase edge to the next (i.e. complements to phase heads are grammatically opaque, edges are not). This differs from ST in that the cyclic boundary is the phase/BN head rather than the MaxP of the Phase/BN head, but this is a small difference technically. PT also assumes “escape hatches” in the sense that movement to a phase edge moves an expression from inside one phase into the next higher phase domain and, as in ST, different phases have different available edges suitable for “escape.”  If we assume that Cs have different numbers of available phase edges and we assume that D has no such available edges at all then we get a theory effectively identical to the ST.  In effect, we traded phase edges for escape hatches and the PIC for (i).[5]

There are a few novelties in PT, but so far as I can tell they are innovations compatible with ST. The two most distinctive innovations regard the nature of derivations and multiple spell out (MSO). Let me briefly discuss each, in reverse order.

MSO is a revival of ideas that go back to Ross, but with a twist.  Uriagereka was the first to suggest that derivations progressively make opaque parts of the derivation by spelling them out (viz. spell out (SO) entails grammatical inaccessibility, at least to movement operations).  This is not new.  ST had the same effect, as SC progressively makes earlier parts of the derivation inaccessible to later parts.  PT, however, makes earlier parts of the derivation inaccessible by disappearing the relevant structure.  It’s gone, sent to the interfaces and hence no longer part of the computation.  This can be effected in various ways, but the standard interpretations of MSO (due to Chomsky and quite a bit different from Uriagereka’s) have coupled SO with linearization conditions in some way (Uriagereka does this as do Fox and Pesetsky, in a different way). This has the empirical benefit of allowing deletion to obviate islands. How? Deletion removes the burden of PF linearization and if what makes an island an island are the burdens of linearization (Uriagereka) or frozen linearizations (Fox and Pesetsky) then as deletion obviates the necessity of linearization, island effects should disappear, as they appear to do (Ross was the first to note this (surprise, surprise) and Merchant and Lasnik have elaborated his basic insight for the last decade!). At any rate, interesting though this is (and it is very interesting IMO), it is not incompatible with ST. Why? Because ST never said what made an island an island or, more accurately, what made earlier cyclic material unavailable to later parts of the computation (i.e. it had no real theory of inaccessibility, just a picture), and it is compatible with ST that it is PF concerns that render earlier structure opaque. So, though PT incorporates MSO, it is something that could have been added to ST and so is not an intrinsic feature of PT accounts.
In other words, MSO does not follow from other parts of PT any more than it does from ST. It is an add-on; a very interesting one, but an add-on nonetheless.[6]

Note, btw, that MSO accounts, just like STs, require a specification of when SO occurs. It occurs cyclically (i.e. either at the end of a relevant phase, or when the next phase head is accessed) and this is how PT models SC.

The second innovation is that phases are taken to be the units of computation.  In Derivation by Phase, for example, operations are complex and non-markovian within the phase.  This is what I take Chomsky to mean when he says that operations in a phase apply “all at once.” Many apply simultaneously (hence not one “line” at a time) and they have no order of application. I confess to not fully understanding what this means. It appears to require a “generate and filter” view of derivations (e.g. intervention effects are filters rather than conditions on rule application).  It is also the case that SO is a complex checking operation where features are inspected and vetted before being sent for interpretation.  At any rate, the phase is a very busy place: multiple operations apply all at once; expressions E and I merged, features checked and shipped.

This is a novel conception of the derivation, but again, is not inherent in the punctate nature of PT.[7] Thus, PT has various independent parts, one of which is isomorphic to traditional ST and other parts that are logically independent of one another and the ST similar part. That which explains SC is the same as what we find in ST and is independent of the other moving parts. Moreover, the parts of PT isomorphic to ST seem no better motivated (and no worse) than the analogous features in ST: e.g. why the BNs are just these has no worse answer within ST than the question why the phase heads are just those.

That’s how I see PT.  I have probably skipped some key features. But here are some crowd-directed questions: What are the parade cases empirically grounding PT? In other words, what’s the PT analogue of affix hopping? What beautiful results/insights would we lose if we just gave PT up? Without ST we lose an account of island effects and SC. Without PT we lose…? Moreover, are these advantages intrinsic to minimalism or could they have already been achieved in more or less the same form within GB? In other words, is PT an empirical/theoretical advance or just a rebranding of earlier GB technology/concepts (not that there is anything intrinsically wrong with this, btw)?  So, fellow minimalists, enlighten me. Show me the inner logic, the “virtual conceptual necessity” of the PT system as well as its empirical virtues. Show me in what ways we have advanced beyond our earlier GB bumblings and stumblings. Inquiring minimalist minds (or at least one) want to know.

[1] This “history” compacts about a decade of research and is somewhat anachronistic.  The actual history is quite a bit more complicated (thanks Howard).
[2] Actually, if one adds vP as a BN then Rizzi-like differences between Italian and English cannot be accommodated. Why? Because, once one moves into an escape hatch, movement is thereafter escape hatch to escape hatch, as Rizzi noted for Italian. The option of moving via CP is only available for the first move. Thereafter, if CP is a BN movement must be CP to CP. If vP is added as a BN then it is the first available BN and whether one moves through it or not, all CP positions must be occupied. If this is too much “inside baseball” for you, don’t sweat it. Just the nostalgic reminiscences of a senior citizen.
[3] vP is an addition from Barriers versions of ST, though how it is incorporated into PT is a bit different from how vP acted in ST accounts.
[4] There are two versions of the PIC, one that restricts grammatical commerce to expressions in the same phase and a looser one that allows expressions in adjacent phases to interact. The latter is what is currently assumed (for pretty meager empirical reasons IMO – Nominative object agreement in quirky subject transitive sentences in Icelandic, I think).
[5] As is well known, Chomsky has been reluctant to extend phase status to D. However, if this is not done then PT cannot account for island effects at all and this removes one of the more interesting effects of cyclicity. There have been some allusions to the possibility that islands are not cyclicity effects, indeed not even grammatical effects.  However, I personally find the latter suggestion most implausible (see the forthcoming collection on this edited by Jon Sprouse and yours truly: out sometime in the fall). As for the former, well, if islands are grammatical effects (and like I said, the evidence seems to me overwhelming) then if PT does not extend to cover these then it is less empirically viable than ST.  This does not mean that it is wrong to divorce the two, but it does burden the revisionist with a pretty big theoretical note payable.
[6] MSO is effectively a theory of the PIC. Curiously, from what I gather, current versions of PT have begun mitigating the view that SO removes structure by sending it to the interfaces. The problem is that such early shipping makes linearization problematic.  It also necessitates processes by which spelled out material is “reassembled” so that the interfaces can work their interpretive magic (think binding which is across interfaces, or clausal intonation, which is also defined over the entire sentence, not just a phase).
[7] Nor is the assumption that lexical access is SC (i.e. the numeration is accessed in phase sized chunks). This is roughly motivated by (IMO weak) conceptual reasons concerning SC arrays reducing computational complexity and by empirical facts about Merge over Move (btw: does anyone except me still think that Merge over Move regulates derivations?).


  1. From note 6: The problem is that such early shipping makes linearization problematic. It also necessitates processes by which spelled out material is “reassembled” so that the interfaces can work their interpretive magic (think binding which is across interfaces, or clausal intonation, which is also defined over the entire sentence, not just a phase).

    This point has always worried/confused me. Just take linearization to keep things concrete: When we're deriving 'Who did John meet yesterday', if 'meet yesterday' is "sent to the interfaces" at the end of the vP phase and 'who did John' is "sent to the interfaces" at the end of the CP phase, what determines that we get the desired word order instead of 'meet yesterday who did John'? I think Boeckx and Grohmann call this "the recombination problem".

    I'm a fan of thinking of spellout along the lines of what Uriagereka's original MSO paper supposed, which seems to avoid this problem: nothing "disappears" or is "sent away", instead a chunk of structure just gets flattened into an unstructured word-like chunk, which can participate in the next phase up in the same way as a simple lexical word does.

    So, I'd like to add a question: why think of spellout as "disappearing" or "sending away", rather than "becoming unstructured"?

  2. Or just becoming unavailable? Being unstructured is one way, but is this the only/best way?

    1. True, being unstructured per se is not central to my question. The main point is that Juan's theory stops short of saying that the spelled-out stuff is "completely gone", and therefore doesn't run into the recombination problem.

      Having said that, becoming unstructured does seem to be a pretty well-suited metaphor for this common kind of "becoming unavailable" that both phases and subjacency implement. Once we get to, say, the end of an embedded CP phase or bounding domain, we want the CP as a whole to still be present, to serve as the complement of the embedding verb (e.g. 'wonder'). Where for "serve as the complement" you can read any of the usual ways you might like to think of this: satisfying subcategorization, being linearized to its immediate right, serving as its semantic argument, whatever. For these purposes, the CP domain "as a whole" is available. It's the internal structure that's unavailable.

      So certainly there may be alternatives to saying that a spelled-out chunk "becomes unstructured". But also "becoming unavailable" doesn't quite do justice to the details, since only the insides of the spelled-out domain become unavailable.

  3. There seems to be a bit of a tension in phase theory between (i) motivating phases in terms of computational complexity and (ii) dealing with non-cyclic cross-phasal interpretative dependencies. On the strongest conception of phases, a completed phase is Spelled Out, interpreted at both interfaces, and never needs to be looked at again. This seems very efficient, but it's hard to reconcile with (e.g.) cross-phasal variable binding. If phases are just linearization domains, the existence of cross-phasal interpretative dependencies is no longer problematic. But then, it's not obvious why linearizing a structure chunk-by-chunk should be more efficient than linearizing it all in one go. Any reasonable linearization algorithm can be implemented by traversing the tree using bounded memory. In contrast, it seems plausible that the difficulty of interpreting a structure might grow very quickly as it gets bigger, so that interpreting chunk-by-chunk would actually be more efficient.

    1. Any chance that we can tie this with Chomsky's suspicion that it's phonological expression that's really costly? I don't see how, but it would address your linearization remarks (note: address, not resolve).

  4. In the classical ST account, the insides also became unavailable, but this was because they were "too far away" rather than because they were no longer in a manipulable format. The idea of SO being the source of opacity revolves around getting some handle on why island effects are obviated under ellipsis. This is a neat result and it is unclear why this should hold on the classical theory. However, I think that the main problem with classical ST wrt this obviation of islandhood under ellipsis is that it was conceptualized as a PFish condition on rule application, rather than as an output condition. If the latter perspective is taken then whether the format changes or not, the empirics remain the same. What SO might do is provide a rationale for this PFish effect. However, I think that this is more metaphorical than tightly deductive. At any rate, tying island amnesties to absence of linearization has a nice feel to it, even if the details, imo, are a bit hazy.

    1. It is a neat result, but I agree that the details are hazy - and there are many many cases where island repair effects are mysteriously missing. The *-trace theory for island escaping XPs is unsatisfying, and makes the wrong predictions in many cases.

      There is a strain of work that takes island repair under ellipsis to be illusory - such approaches also make the right predictions (in many cases if not all) regarding when such effects should be missing (e.g. contrastive clausal ellipsis, VP ellipsis, and certain left branch extractions).

      If these approaches can be maintained, this would make one result of PT less "neat" - that is, there'd be one less empirical/conceptual argument in favor.

  5. Could you elaborate a little bit? This is interesting. What's the work that argues it is illusory? I know the VP ellipsis stuff which shows that one gets island effects, but what are the results that show that the apparent obviation of island effects under ellipsis is an illusion?

    Btw, I agree that the *-trace stuff is very unsatisfying. I've always taken it to be a stand in until something better came along. I actually like some version of Max Elide properly understood as a kind of A over A effect with deaccenting. What I mean is that you can elide or deaccent, but you cannot deaccent some and elide some.

    1. This comment has been removed by the author.

    2. I'm replacing with a shorter version of that comment.

      Other kinds of lack of repair:
      - Multiple remnant sluicing doesn't fix violations of the right roof constraint (Howard Lasnik has some manuscripts online discussing these cases)
      - Left branch extraction

      Work arguing that repair is illusory
      - Merchant (2001) (discussion of relative clause islands)
      - Fukaya (2007) (PhD thesis, good evidence from the interpretation of RC island ameliorating sluices here)
      - Szczegielniak (2008) for sluicing in Polish
      - Barros (2012) (CLS 48 proceedings)
      - Barros, Elliott and Thoms (2013) (CLS 49 handout available on my website)

      The Max Elide idea is interesting for the VPE cases, but I think it can be controlled for by blocking de-accenting lower in the structure:

      (1) SAM speaks GREEK, but I don't know which language SALLY does.
      (2) *SAM hired someone who speaks GREEK, but I don't know which language SALLY did (hire someone who speaks).

      (1) isn't perfect, but it is way better than (2). Merchant (2008) discusses cases like these w.r.t. Max Elide and, I believe, concludes it couldn't be behind the lack of repair effects in VPE cases.

  6. So the Kayardild challenge to phases, as I see it, is that the case-marking on members of things that ought to be inside phases is often conditioned by things that ought to be outside them, without any worked out account that I have noticed of what kind of escape hatch is required (including in Norvin's recent paper that someone either here or on fb directed me towards). So for example (12-62) from Evans' 1995 grammar:

    ki-lda kurri-j, ngijuwa
    2-pl-NOM see-ACT 1sgSUBJ:COBL

    murruku-rrka kala-thurrk
    woomera-MLOC:COBL cut-IMMED:COBL

    'You see/saw that I am/was cutting a woomera'

    (rats, gloss alignment trashed by blogger)
    'ACT' is the 'Actual' tense-mood; ':COBL' is the 'complementizing oblique', which is stuck with some morphological fusion onto all members of subordinate clauses that don't share their subjects with the main clause (and various other kinds of clauses). MLOC is the 'modal locative' case that the object of a realis verb would show.

    Can't find a 'cut a woomera' example of this kind, but here's an MLOC marked object for 'cut a boomerang':

    (7-7) ngada kala-tharri wangalk-i
    1sgNOM cut-NEG.ACT boomerang-MLOC

    Surely most of us could come up with a new kind of escape hatch for this if we had to hand in a paper about it the day after tomorrow, but it would be nice to have something more considered.

    I think it also challenges the bottom-up conception of current minimalism: equipping words with, say, sequences of features that can be merged and interpreted later in the derivation seems a tad more perverse than suggesting that morphology has some capacity to access aspects of the previous production and planning of the constituent(s) that the word is inside of.

  7. Part of the bad impression that Minimalism seems to make on so many people might be caused by the fact that it superficially looks as if sentences in many languages have to be constructed in close to the reverse order in which they are produced, whereas, it seems to me that part of the promise of phases would be to eliminate this feature by isolating the processing over the higher/earlier parts of the sentence from the lower/later ones (to stuff transmitted through the escape hatches, whatever they turn out to be).

    Perhaps some kind of unification of MP with Categorial Grammar could achieve this, since the latter is heavily local, and inherently order-independent (with a lot of substantial math not very far in the background), but underpowered for dealing with syntactic details, as far as I can make out.

    1. Not sure I get this, Avery. Do you mean bottom up being the problem? Not sure how phases per se solve this problem. But I might be misunderstanding.

  8. I see bottom-upness as at least a presentational problem, and possibly a real one, that could be defused by showing how the theory doesn't mandate any particular order of computation. Phases might be able to help by limiting the interactions between different levels.

    The idea that Kayardild speakers, for example, literally have to compose their subordinate clause fully, spell it out into a buffer somewhere, produce the main clause and then finally speak their subordinate clause (Norvin Richards' and David Pesetsky's idea for how to manage Tangkic languages by delaying spellout), strikes most people as completely absurd; just saying that the theoretical order of operations isn't the real one was OK 40 years ago but is certainly not OK now.

    Thinking in the mode of categorial grammar or LFG's glue semantics, you could however envision a system whereby the contents of what a finite clause would have in its escape hatches (including, for Kayardild, complementizing oblique case under many circumstances) was posited as an assumption, the main clause computation done, then a subordinate clause with the appropriate properties produced to discharge the assumption.

    Without a phase-like idea, you couldn't do something like this, due to no clear statement about what you have to know about the subordinate clause to produce the main one.

    Or so it seems to me. Of course if I'd managed to work out how to implement something like this with a decent amount of syntactic detail, it would have appeared on lingbuzz rather than here.

  9. I agree that phases, if they work, can limit interaction between domains to a select number of elements. But this was also true of earlier subjacency accounts, no? What do phases add?

    As for the order of operations: so far as I know, bottom up grammars are easy enough to put into Left-Right parsers (e.g. left corner parsers) and so the fact that grammars generate bottom up does not mean that they must be so used. All the bottom up stuff does is make evident what the constituents are. It does not say that sentences are produced or parsed this way. However, I agree that anything that breaks long computations into smaller independent ones would be welcome. Phases do this, as do barriers and bounding nodes. That made the latter two interesting and it holds for phases as well.