Monday, March 10, 2014

From Derivations to Outputs: Transfer in Minimalist Grammars

Things have been a little quiet on the Minimalist grammar front around here recently (February and March are paper writing season for me). But fortunately Greg Kobele whipped up an amazing intro to the MG conception of Transfer that you should all check out. It starts out with the very basics of Minimalist derivations, which I also discussed in two earlier posts. But then it moves to some really juicy material, namely how derivations can be mapped to surface strings and semantic interpretations without having to construct some kind of phrase structure tree first.

Under a Minimalist conception, syntax builds (possibly multi-dominant) phrase structure trees, and at certain points during the structure-building process chunks are processed by the operations Transfer-PF and Transfer-LF and sent off to the interfaces. As Chris Collins lamented in his guest post, the technical details of these operations have never been fully spelled out in the literature. It is usually assumed that they apply every time a phase is completed, but the mechanisms for determining which occurrences of an LI should be pronounced or how quantifier scope should be determined are left open. These are not trivial issues. As a matter of fact, a whopping 9 pages of Collins and Stabler are dedicated to getting the system off the ground and figuring out what pieces are indispensable for it to work.

The MG perspective Greg presents is a lot simpler on a technical level while actually doing more work.
  • First of all, it operates directly on derivation trees, with no reference to phrase structure trees (as Greg will be quick to point out, though, we could just as well state it over multidominant phrase structure trees if we wanted to because they are so similar to derivation trees). So this is yet another instance where derived trees turn out to be redundant: they serve no function that can't be taken care of by derivation trees.
  • Second, MG Transfer is more local than its Minimalist counterpart because the former applies at every step of the derivation but the latter does not. This addresses a problem that as far as I know was first raised by Michael Brody: If Transfer applies to entire CPs, then it operates on very elaborate tree representations. So Minimalism is both derivational and representational, which isn't all that minimal. The MG perspective shows that Transfer doesn't have to wait for full CPs to be assembled; it can apply at every point in the derivation. Minimalism without representations is possible after all.
  • Third, the procedure Greg describes is precise, yet general. It handles standard phrasal movement just as easily as head movement, roll-up movement or remnant movement. And it gives an exact translation from derivations to logical formulas, including a simple treatment for scope ambiguities.
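To make the first two points a bit more concrete, here is a minimal Python sketch of how a string can be computed at every Merge and Move step without ever building a phrase structure tree. It is a toy encoding in the spirit of Stabler-style MGs, not the procedure from Greg's paper: the function names (`lex`, `merge`, `move`, `check_smc`), the tuple representation, and the three-word lexicon are all assumptions made for illustration, and head movement, roll-up movement, and semantics are left out.

```python
# Hypothetical toy encoding of a Stabler-style Minimalist grammar.
# An expression is (head, movers); head is (string, features, is_lexical),
# and each mover is (string, remaining_features).

def lex(string, *features):
    """Build a lexical item with no moving subparts."""
    return ((string, tuple(features), True), ())

def join(*parts):
    """Concatenate strings, skipping empty (silent) ones."""
    return " ".join(p for p in parts if p)

def check_smc(movers):
    """Shortest Move Constraint: no two movers may share their first feature."""
    firsts = [features[0] for _, features in movers]
    if len(firsts) != len(set(firsts)):
        raise ValueError("SMC violation")
    return movers

def merge(a, b):
    """Check the selector =x on a against the category x on b."""
    (s, fs, a_is_lexical), a_movers = a
    (t, gs, _), b_movers = b
    if fs[0] != "=" + gs[0]:
        raise ValueError("selector and category do not match")
    fs, gs = fs[1:], gs[1:]
    movers = a_movers + b_movers
    if gs:
        # b still has licensee features, so it will move later:
        # keep its string apart instead of pronouncing it here.
        return ((s, fs, False), check_smc(movers + ((t, gs),)))
    # b is pronounced in place: complement to the right of a lexical head,
    # specifier to the left of a derived one.
    string = join(s, t) if a_is_lexical else join(t, s)
    return ((string, fs, False), check_smc(movers))

def move(a):
    """Check the licensor +x on the head against the unique mover with -x."""
    (s, fs, _), movers = a
    if not fs[0].startswith("+"):
        raise ValueError("head has no licensor feature to check")
    wanted = "-" + fs[0][1:]
    matching = [m for m in movers if m[1][0] == wanted]
    if len(matching) != 1:
        raise ValueError("expected exactly one matching mover")
    t, gs = matching[0]
    rest = tuple(m for m in movers if m is not matching[0])
    if gs[1:]:
        # The mover has further features to check, so it keeps moving.
        return ((s, fs[1:], False), check_smc(rest + ((t, gs[1:]),)))
    # Final landing site: the mover is pronounced to the left of the head.
    return ((join(t, s), fs[1:], False), rest)

# Illustrative lexicon (an assumption, not taken from the paper).
mary = lex("Mary", "d", "-k")
left = lex("left", "=d", "v")
tense = lex("", "=v", "+k", "t")

# The derivation tree move(merge(tense, merge(left, mary))), evaluated bottom-up.
(sentence, category, _), pending = move(merge(tense, merge(left, mary)))
print(sentence, category, pending)
```

Each call returns only a string, the unchecked features, and the list of pending movers, so the "output" of every derivational step is available immediately. In this toy run the subject is held back as a mover until its -k feature is checked, and the final expression is the string "Mary left" with category t and no movers left over.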
I could still go on for a while, but I suggest that you check out Greg's paper instead. It is a short yet approachable read with plenty of trees and examples. And it might give you some food for thought even if you are deeply attached to the standard version of Transfer (whatever that might be, exactly).


  1. Transfer applies to entire CPs for (alleged) empirical reasons, not because there's any theoretical difficulty in having a smaller transfer domain. So the question would be whether defining transfer in the way Greg proposes does whatever empirical work phase theory is supposed to do. Of course, there is no reason why Greg's proposal should be expected to do that, but if it doesn't, it's not really fair to compare the two proposals in terms of simplicity.

    1. I think the empirical argument is very weak in this case, and your use of "alleged" seems to indicate that you share this feeling.

      As far as I can remember the majority of reasons for CP and vP being phases in Chomsky (2001) were conceptual in nature, not empirical (and subarrays in Chomsky 2000 were introduced to fix Merge-over-Move).

      The empirical arguments that have been put forward since then are variants of the idea that material inside phases becomes inaccessible, which doesn't really fly because 1) without any restrictions on OCC features, anything can be moved into the edge to remain accessible, and 2) Collins and Stabler show that if Transfer applies only at CP and vP, then the material contained within those phases must remain accessible for Transfer.
      There are also the arguments for successive cyclic movement, of course, but phases are just the most recent way of bringing that about.

      I'm also unaware of any work that shows that phases must not be smaller, while there are several proposals that every phrase is a phase (e.g. Gereon Mueller's 2010 paper in LI), which deal with this small locality domain just fine.

      Finally, the MG system can be converted into one where Transfer is delayed until vP and CP; it just doesn't make a difference.

      So I think the standard priorities are actually reversed in this case: if you have a very simple system that works just fine, then you have to make a good case why a more complicated one should be used instead. A carefully constructed argument along these lines would be very interesting.

    2. I agree. The key question, then, is how much empirical bite standard phase theories have.

    3. It's not clear to me what the empirical question is. Or rather, there are multiple possible empirical questions. My approach to transfer is committed to the SMC, or something like it. However, the idea that the interpreted objects are tuples/Cooper stores is easily extended to the entire formalism minus the SMC; this means, however, that we have to use free variables, which I would really like to avoid. But still, this allows for interpretation to proceed at every derivational step. This is empirical question 1: SMC or no SMC? (And it can be rephrased as: variable-free or variable-full?)
      Empirical question 2 is something like: can we do (full) interpretation at every derivational step? Here it's not so clear what would be an objection to this. Some things that I've thought of over the years are ellipsis, and quantifier scope (especially in inverse linking constructions). I've shown however that these can be dealt with in the usual way without maintaining syntactic structures.
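For readers who have not met Cooper storage, the following Python sketch shows the textbook idea of getting scope ambiguity from a store rather than from syntactic structure. It is not the variable-free system described in this comment: it uses named variables purely for readability, the store is handed over ready-made instead of being built step by step, and the names `retrieve_all` and `CONNECTIVE` as well as the formula notation are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical connectives for the two quantifiers used in the example.
CONNECTIVE = {"every": "->", "some": "&"}

def retrieve_all(core, store):
    """Return one formula per order in which the stored quantifiers are retrieved."""
    readings = []
    for order in permutations(store):
        formula = core
        for quantifier, restrictor, var in order:
            # Each retrieval wraps the formula built so far, so the quantifier
            # retrieved last takes widest scope.
            connective = CONNECTIVE[quantifier]
            formula = f"{quantifier} {var}.({restrictor}({var}) {connective} {formula})"
        readings.append(formula)
    return readings

# "every boy read some book": the core meaning plus a store of pending quantifiers.
core = "read(x, y)"
store = (("every", "boy", "x"), ("some", "book", "y"))

readings = retrieve_all(core, store)
for reading in readings:
    print(reading)
```

The interpreted object is just the pair of a core formula and a store, so nothing here needs access to a tree. Retrieving the two quantifiers in the two possible orders yields the surface-scope and the inverse-scope reading.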

      I don't think `standard phase theories' have anything to do with either of these questions, at least when they are construed as theories of locality; their empirical bite, if any, lies in a sort of demarcation of islands/movement-step-delimitation. This sort of information is finite and is easily encoded into features (as Thomas described in his first-ish post). (See also Ed Stabler's representation of relativized minimality.) Whether or not you think this is insightful, it shows that this kind of consideration is not relevant for empirical questions.

    4. @Greg My point was just that some of the complexity of the Collins & Stabler model derives from its attempt to encode certain features of phase theory. So, if these features do some empirical work, that may justify the additional complexity. If they don't then it doesn't.

    5. (Also, I don't agree with the claim that it is not an empirical question whether we need just the SMC or the SMC plus some additional domain-based constraints, but this probably relates to differing philosophical background assumptions which we've discussed before.)

    6. "Also, I don't agree with the claim that it is not an empirical question whether we need just the SMC or the SMC plus some additional domain-based constraints"
      I didn't intend to claim otherwise. (Empirical question 1 was whether or not the SMC could be maintained at all.)

    7. @Greg I was referring to your second paragraph. I took you to be saying that since adding phases to standard MGs doesn't increase their generative power, it would be difficult/impossible to find empirical evidence for phases.

    8. @Alex (sorry for dropping the ball here): I was going for a strong generative capacity argument. This is a little shady, because there's no formally precise theory of phases to serve as the baseline.

  2. Three thoughts on this. First, is it not the case that if something has more than one selector feature in a row (i.e. =D =D for a ditransitive verb), it will be unable to assign case or position so as to discriminate the two argument roles? Whereas, something like =D +acc =D +dat would work (although people seem to assume that the case-assigning head needs to be different from the theta-assigner).

    Which leads on to 2: what if only functional categories, not lexical ones, were able to select or assign case at all? So a ditransitive would be of semantic type e->e->e->t, with syntactic features v, getting its arguments both supplied semantically and case-checked by the functional projections. An advantage would be that if only the FPs can specify the ordering possibilities for the arguments, then we get an explanation for why open lexical categories don't seem to do this on an idiosyncratic basis, i.e. 'eat' taking its patient in preverbal position, 'drink' in postverbal.

    Finally, continuing to go in this way, there is then no conceptual reason for the usual 'introduce, move' two-step for supplying arguments; it would presumably be a fairly trivial adjustment to let functional heads both select and check a case for something, perhaps with a notation such as [=D+acc]. If this has bad empirical consequences, it might be interesting to know what they are. On the good side, simultaneous introduction and checking as the default would explain why complete nondiscrimination of arguments is relatively rare (although not completely unheard of).

    1. Hi Avery,
      (i) If DPs need case (are of the form D -k), then your proposed ditransitive verb will run afoul of the SMC. If you do not assume the SMC, then you are correct.
      (ii) This is the common idea in GB (AgrO, AgrS, AgrP, etc), and still prominent in many post-GB analyses. Certainly, it's the obvious thing to do with the SMC. Furthermore, if you do `morphology in the syntax', it is easier to deal with valence decreases (like in the passive) by saying that that now gone argument (=d feature) was never there in the first place. Pushed to the logical conclusion (i.e. given that you can say `Killing is bad'), the object should be introduced outside of the verb phrase, and the verb itself given an atomic category.
      (iii) You could do that (I like the notation =[D -k] better; it reminds me of categorial grammar; if you place a bound on the depth of embedding this is equivalent to normal MGs), but the benefit you mention is not applicable with the SMC.

    2. I don't see how the SMC will apply to the first version of ditransitives. Suppose the lexicon is:

      sýndi::=d =d vd (showed)
      Hoskuldi::d -dat
      börnin::d -acc (the children)
      e:: =vd+dat vt
      e:: =vt+acc v

      Then we can produce v-phrases with the dative and accusative d's appearing in the correct order (assuming appropriate linearization), & I don't see how the SMC applies, because the triggering features are different. The problem is the apparent absence of a way to get them supplied in the correct order to the verb itself, and therefore pick up appropriate theta roles, which, in this case, are not predictable from the referents.

      Likewise I don't see how the SMC applies to the =d -dat =d -acc idea.

      There are surely a lot of ways in which these systems might be set up, but the greater formal clarity of MGs makes me think that the chances of identifying the advantages and disadvantages of the possibilities might be greater than with the less formalized approaches.

    3. Ah yes, I misunderstood. If you have different case features (instead of a monolithic `-k') then there is no SMC problem. I myself favor the monolithic approach (which then needs augmenting with something like finite feature unification, which can be implemented in many ways, among them splitting the -k into -nom, -dat, -acc, etc); this allows me to say that the selfsame DP is the object of an active and the subject of a passive. In general, as I alluded to in my last comment, valence decreasing operations are hard to implement as such (they would seem to require getting rid of features without checking them). The way people have hit upon to deal with them is by denying that they exist; the feature that is missing is missing because it was never there. This leads directly to a decompositional style of analysis, which I think of as one of the hallmarks of the TG/GB/MP program.

      I'm not sure I understand your problem ("the apparent absence of a way to get them supplied in the correct order to the verb itself") -- if the verb has type (e^n t), the $i^{th}$ syntactic argument will receive the theta role of the $i^{th}$ semantic argument. Splitting things apart syntactically could lead to problems, if one adopted a neo-Davidsonian approach; but even there it is not insurmountable.

      "There are surely a lot of ways in which these systems might be set up, but the greater formal clarity of MGs makes me think that the chances of identifying the advantages and disadvantages of the possibilities might be greater than with the less formalized approaches."
      You're preaching to the choir!

    4. The exact problem that I'm having is for example that if the first/innermost e of sýndi is Patient and the second/outermost Recipient (following the grain of the usual semantic role hierarchies), we could merge either the dative or the accusative into either position, and get the same observable result after the two Moves, creating unwanted ambiguity. It doesn't matter whether we interpret the 'case' features as morphological case, or 'abstract Case', not very distinct from the Grammatical Relations/Functions of RG or LFG. I don't grasp how the subdivision of k would work, but not allowing open class items to merge with any arguments at all sidesteps this strictly theoretical issue, I think.

      There does appear to be an interesting cognitive divide between people who find standard informal GB/MP sufficiently clear to work with, and those who don't (me and you in the latter group). Part of my current interest here is to better understand what the real differences between MP and LFG might actually be, if there are any. Note for example that the derived structure trees with their multiattachments sort of in some ways resemble the structural proposal that Chris Manning & I made in our 1999 book, with a 'master tree' where attributes belonging to different classes spread across levels according to various patterns (tho that architecture proved not to generalize to more constructions in the way that I/we hoped that it would).

    5. Ah, I see now. Yes, =d =d v means that any two d's can be merged there, and if they have different case features, the type of the resulting syntactic expression will be a v with a -dat and a -acc, irrespective of which d hosted the -dat and which the -acc.
      The problem isn't solved by requiring just that open class items should not take arguments; we could easily have a closed class item of the form =V =d =d v, where the same problem would arise. What needs to happen is that the case features of the one must be checked before the other is introduced. In German, which I have thought about more, I would say that all case checking is covert, and that the various surface positions of DPs are a result of additional movement features.

      This is an interesting goal! I suspect that you mean `MP as she is practiced' and `LFG as she is practiced' rather than `the formal framework of MG' and `the formal framework of LFG'; this latter question we understand a (tiny) bit better, of course, although my impression (is this right?) is that LFG is only rendered context-sensitive by virtue of an offline parsability constraint.

      Another impression of mine is that TG/GB/MP is unique in that it promotes decompositional styles of analysis, and in particular ones with a phrasal approach to constructions. This is what I personally find the most interesting about MP. If this is correct (that MP is unique in this regard), I'm not sure how to compare LFG/MP as they are practiced. Do you have any ideas?

    6. You could have such things as closed class items, and possibly they exist in some languages. In Crow (Apsaroke) for example, if my recollections from a bit of field work in the 70s aren't too misleading, there is no way to distinguish grammatically between 'I am better than you' and 'you are better than me':

      bii lii iihchiisee-k (I II be.better.than-DECL)
      lii wii iihchiisee-k

      Both are ambiguous(!!). The point of the [=d -acc] notation being preferred to a lineup was to make this kind of nondiscrimination merely disfavored rather than actually impossible.

      The two frameworks equivalent in some sense as practiced, definitely. A more careful formulation is that there might be something which practitioners of each would accept as a variant of their framework. This is partly triggered by the observations that a) MP/GB seems to have essentially come over to the LFG treatment of subject-raising and clitic doubling, with multiattachment (and the discovery of backward raising takes care of what I thought was a big problem for LFG). Of course a really clear account of empirical non-equivalence would be just as interesting; what we have now seems to me just to be a fog.

      LFG doesn't promote lexical decomposition, but it could be included with 'glue semantics', since lexical items can introduce multiple meaning constructors. How much and what kind of decomposition there really is is a rather live issue, however. I don't find the ones I've looked at even remotely convincing, but that might be my fault rather than that of the analyses.

      One big formal problem for LFG is that the formal framework literally says nothing at all about the nature of language from the point of view of weak generative capacity, since offline parseability is extraneous; this is presumably because there is no limit on the amount or nature of f-structure that can be accumulated inside a phrase or phrases and then used for grammaticality checking outside of it. I think this problem must be fixable because this kind of power just isn't used in actual LFG analysis, but nobody seems to have fixed it (there is however some math work I found on restricted versions of LFG that are better behaved, but I have no idea ATM how they would fare for actual descriptive work).
