Sunday, January 26, 2014

Two More Things about Minimalist Grammars

My intro post on Minimalist grammars garnered a lot more attention than I expected. People asked some damn fine questions, some of which I could not fully address in a single comment. So let's take a closer look at two technical issues before we move on to derivation trees next week (I swear, Norbert, we'll get to them soon).

Domain of Merge

In an attempt to keep things intuitive and focused on examples, I reduced the definition of Merge and Move to, literally, your old buddies Merge and Move. Succinct, but not particularly insightful (also, it's kind of presumptuous on my part to assume that you're on good footing with Merge and Move). For Move it seems to have worked just fine, but there was some confusion about when, where and how Merge is allowed to apply. So let's be a wee bit more precise this time, ideally using some kind of fancy-schmancy mathematical notation.

First, we write t[x] to denote a tree t whose head has x as its first unchecked feature, e.g. t[D+] for likes::D+ D+ V- or the beautiful fellow below:
Second, if we already have some t[x] and write t instead, then this means that the feature x on the head of t has been checked off. Then we define Merge(t[x+],u[x-]) = [Label(t) t u] if t consists of a single lexical item, and [Label(t) u t] otherwise. So what information is packed into this lovely statement?
  1. Merge takes two trees as arguments.
  2. The heads of these two trees have matching features of opposite polarity.
  3. Merge combines these two trees into one big tree.
  4. The root node of the new tree receives some suitably defined label that picks out the head of t as the head of the whole new tree.
  5. If the selecting head (which carries the positive polarity feature) has not selected anything else yet, it is linearized to the left of the selectee. Otherwise it appears to the right. This expresses the ordering difference between specifiers and complements.
Points 4 and 5 deviate from current Minimalist thinking, but let's ignore them for now. They will turn out to be irrelevant anyways once we move on to derivation trees. What is actually important is that when combining two trees, Merge has access only to the features on the respective heads. Hence a lexical item farther down in a tree can no longer trigger any Merge operations. This rules out countercyclic Merge.1

Consider this example by Avery, where likes::D+ D+ V- is merged with Mary::D- top-.

Can we then continue to merge this tree with, say, Fred::top+? The answer is no, because the head of the tree isn't Mary but likes, whose first unchecked feature is D+ rather than top-.
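For those who prefer to poke at definitions by running them, here is a minimal Python sketch of Merge as defined above. It is a toy under stated assumptions, not a reference implementation: the names Leaf, Node, feats, first_unchecked and yield_of are made up for illustration, features are (name, polarity) pairs, and checking a feature is modeled by removing it from the head's list in place.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Feature = Tuple[str, str]  # (name, polarity), e.g. ("D", "+")


def feats(spec: str) -> List[Feature]:
    """Parse a string like 'D+ D+ V-' into [('D', '+'), ('D', '+'), ('V', '-')]."""
    return [(f[:-1], f[-1]) for f in spec.split()]


@dataclass
class Leaf:
    phon: str
    features: List[Feature]  # unchecked features, leftmost first


@dataclass
class Node:
    head: Leaf  # stands in for Label(t): the lexical head of the tree
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def head(t: Tree) -> Leaf:
    return t if isinstance(t, Leaf) else t.head


def first_unchecked(t: Tree) -> Optional[Feature]:
    """The x in t[x]: the first unchecked feature on the head of t."""
    fs = head(t).features
    return fs[0] if fs else None


def merge(t: Tree, u: Tree) -> Node:
    """Merge(t[x+], u[x-]); only the two heads are ever inspected."""
    ft, fu = first_unchecked(t), first_unchecked(u)
    if ft is None or fu is None or ft[1] != "+" or fu[1] != "-" or ft[0] != fu[0]:
        raise ValueError(f"Merge undefined for head features {ft} and {fu}")
    head(t).features.pop(0)  # check x+ (in-place update, a simplification)
    head(u).features.pop(0)  # check x-
    if isinstance(t, Leaf):
        return Node(head(t), t, u)  # first selection: selectee to the right
    return Node(head(t), u, t)      # later selections: selectee to the left


def yield_of(t: Tree) -> str:
    """Read the leaves off from left to right."""
    if isinstance(t, Leaf):
        return t.phon
    return f"{yield_of(t.left)} {yield_of(t.right)}"


likes = Leaf("likes", feats("D+ D+ V-"))
mary = Leaf("Mary", feats("D- top-"))
fred = Leaf("Fred", feats("top+"))

vp = merge(likes, mary)
print(yield_of(vp), first_unchecked(vp))

try:
    merge(fred, vp)  # the head of vp is likes, whose first unchecked feature is D+
except ValueError as err:
    print("blocked:", err)
```

The top- feature on Mary is still sitting there unchecked, but merge never looks at it because Mary is no longer the head of anything that could be passed in as an argument.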


Are All Features the Same?

Another thing I wasn't crystal clear about is the nature of features. You might have noticed in my examples that all the features checked by Merge have uppercase names (D,V,T,C), while those triggering Move are lowercase (top, nom). So is this just a happy accident, or is there actually a difference between Merge and Move features? Well, in the summary I explicitly claim that each feature is either a Merge feature or a Move feature. But the truth of the matter is, it doesn't matter. At all.

The standard definition of Minimalist grammars indeed posits a bifurcation between Merge and Move features. So every feature has two binary parameters: i) the type of operation it can trigger, and ii) its polarity. The standard feature notation in the literature succinctly represents these parameter values:

           +      -
Merge     =f      f
Move      +f     -f

These features also have specific names: f is a category feature, =f a selector feature, +f a licensor feature, and -f a licensee feature. While the notation is fairly simple, it is yet another hurdle for MG newbies, so I decided to go with the simpler + and - notation instead.
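If you want to translate between the two notations mechanically, something like the following Python sketch would do. The function names parse_standard and to_simple are hypothetical, and the sketch assumes that feature names themselves never start with =, + or -.

```python
from typing import Tuple


def parse_standard(feature: str) -> Tuple[str, str, str]:
    """Map a feature in standard MG notation to (name, operation, polarity)."""
    if feature[:1] in ("=", "+", "-"):
        prefix, name = feature[0], feature[1:]
    else:
        prefix, name = "", feature
    if not name:
        raise ValueError(f"feature without a name: {feature!r}")
    if prefix == "=":
        return (name, "Merge", "+")  # selector feature
    if prefix == "+":
        return (name, "Move", "+")   # licensor feature
    if prefix == "-":
        return (name, "Move", "-")   # licensee feature
    return (name, "Merge", "-")      # category feature


def to_simple(spec: str) -> str:
    """Rewrite a whole feature string in the + and - notation of these posts."""
    out = []
    for feature in spec.split():
        name, _operation, polarity = parse_standard(feature)
        out.append(name + polarity)
    return " ".join(out)


print(to_simple("=D =D V"))  # likes in standard notation
print(to_simple("D -top"))   # Mary in standard notation
```

Note that the translation throws away the Merge/Move information; in my examples it survives only through the uppercase versus lowercase naming habit.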

There's also a theoretical reason for doing away with the split between Merge and Move features: no such thing is entertained in the syntactic literature. The EPP, for example, is commonly analyzed as a D feature on the T-head that needs to be checked.2 If some DP has already moved to Spec,TP to get its case feature checked, then it can also check the D feature to avoid troubles with the EPP police. If T has no case feature that could attract a DP, then an expletive is merged instead. Therefore a single feature can be checked by either Merge or Move.

Abandoning the Merge/Move dichotomy does not affect the expressive power of MGs, and here's why. It should be clear that MGs without this distinction are at most as powerful as standard MGs, since the latter have more fine-grained control over the structure-building process. But they aren't weaker either. Suppose that you have a lexical item with the feature specification f+ g+ h+ i- j- k-.3
  • The first feature, f+, can only be checked by Merge because Merge is the only way to enter the derivation and you cannot attract a mover until you have entered the derivation.
  • The features g+ and h+ we're less sure about; maybe something will be merged, maybe there is a licit mover somewhere else in the tree that can be attracted.
  • However, i- is definitely a Merge feature. If it were a Move feature, it could only be checked by moving to some higher position. But until the lexical item has been selected, there is no such higher position that could be targeted. Being selected involves the checking of a negative feature, and i- is the first such feature. Hence it must be a Merge feature.
  • Since i- is a negative Merge feature, it will not be the head of the tree created by Merge. Consequently, all features following it can only be checked by Move. This means that both j- and k- must be Move features.
As you can see, only the features between the first positive polarity feature and the first negative polarity feature are ambiguous. But this ambiguity hinges on the assumption that the negative counterparts (g- and h- in our example) are also ambiguous between Merge and Move. This ambiguity we can avoid by requiring that if a feature is the first negative feature on some lexical item, then there is no other lexical item where it occurs after the first negative feature. In other words, Merge features and Move features use different names. This eliminates the ambiguity for non-first positive polarity features, so that the grammar behaves exactly like one with an explicit split between Merge and Move features.
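The positional reasoning above is easy to mechanize. Here is a rough Python sketch: classify labels each feature of a single lexical item by position alone, and resolve uses the naming requirement to settle the remaining ambiguous cases. The function names and the toy lexicon (including the silent head called eps) are assumptions made purely for illustration.

```python
from typing import Dict, List, Set, Tuple


def split(feature: str) -> Tuple[str, str]:
    """'top-' -> ('top', '-')"""
    return feature[:-1], feature[-1]


def classify(features: List[str]) -> List[str]:
    """Label each feature as Merge, Move or ambiguous, by position alone."""
    labels = []
    seen_negative = False
    for i, feature in enumerate(features):
        _, polarity = split(feature)
        if polarity == "+":
            if seen_negative:
                # such an item cannot occur in a well-formed tree (footnote 3)
                raise ValueError("positive feature after a negative one")
            labels.append("Merge" if i == 0 else "ambiguous")
        else:
            # first negative feature: being selected; later ones: moving
            labels.append("Move" if seen_negative else "Merge")
            seen_negative = True
    return labels


def negative_names(lexicon: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Names used as first negative features vs. as later negative features."""
    first, later = set(), set()
    for features in lexicon.values():
        negatives = [split(f)[0] for f in features if split(f)[1] == "-"]
        first.update(negatives[:1])
        later.update(negatives[1:])
    return first, later


def resolve(lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Settle ambiguous positive features via the naming requirement."""
    first, later = negative_names(lexicon)
    if first & later:
        raise ValueError(f"names used for both Merge and Move: {first & later}")
    resolved = {}
    for item, features in lexicon.items():
        labels = classify(features)
        for i, feature in enumerate(features):
            name = split(feature)[0]
            if labels[i] == "ambiguous" and name in first:
                labels[i] = "Merge"
            elif labels[i] == "ambiguous" and name in later:
                labels[i] = "Move"
        resolved[item] = labels
    return resolved


print(classify(["f+", "g+", "h+", "i-", "j-", "k-"]))

toy_lexicon = {
    "likes": ["D+", "D+", "V-"],
    "Mary": ["D-", "top-"],
    "eps": ["V+", "top+", "C-"],  # hypothetical silent head attracting a topic
}
print(resolve(toy_lexicon))
```

In the toy lexicon, D, V and C only ever show up as first negative features and top only as a later one, so every positive feature ends up with a unique label even though nothing in the notation marks the difference.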

What Lies Ahead

Hopefully you've all got a decent grasp of the basic MG machinery now; or at the very least, things aren't less clear now. As the semester is winding up again, I've got a busy week ahead of me, but I'll try to crank out at least one post about a topic that is particularly dear to my heart: derivation trees.

Many recent findings suggest that syntax isn't about phrase structure trees but rather derivation trees. Derivation trees satisfy a number of desirable properties such as Chomsky's extension condition, they address issues like the labeling algorithm and the symmetry of Merge, they are linearly unordered, offer an intuitive definition of c-command, provide a novel perspective on the T-model, establish a close connection between Minimalism and Aspects-style Transformational grammar, and are easier to parse. Oh, and whatever phrase structure trees do, derivation trees can do too. What's not to like?

  1. Countercyclic Merge aka Late Merge can be added to MGs, of course, but how this is done and how it affects the formalism is a fairly technical topic.
  2. Btw, what is the current thinking about the EPP? This used to be a fairly big topic in the literature, but I haven't heard much about it since Epstein & Seely (2006) Derivations in Minimalism.
  3. Here's an easy exercise for the theoretically inclined among you: explain why a lexical item cannot occur in a well-formed tree unless all its positive polarity features precede all its negative polarity features.


  1. But you could merge Fred with Mary counter cyclically? Mary still has an active feature opposite in polarity to Fred's (+/-Top). So could these two merge?

    Re EPP: there are a few facts, Howard Lasnik is the locus of the discussion of these, where something like an EPP feature has utility. BTW, Castillo, Drury and Grohmann were the first, I believe, to take the tack that Epstein and Seely adopted for eliminating the EPP as a primitive. At any rate, the real issue with the EPP is that since the introduction of Agree in addition to Merge, EPP is the name we give for those cases of Agree that require phonological displacement. At this point, it's everywhere! And, oddly, precisely for this reason it becomes uninteresting. This, I believe, is the state of play.

    1. But you could merge Fred with Mary counter cyclically?
      No, because Merge only takes entire trees as arguments, not their subtrees. You give it two trees, and if the heads of these trees have the right feature configuration, it combines them into a bigger tree. So Merge never operates on any elements that are buried deep within the tree; that's the job of Move.

      EPP is the name we give for those cases of Agree that require phonological displacement.
      Yes, I remember now, that was the switch from EPP-features to OCC-features in Beyond Explanatory Adequacy, right? But how exactly does this solve the empirical puzzle that Spec,TP must be filled, while Spec,CP, for instance, can remain empty? I just took a quick glance at On Phases, and the last paragraph outlines how this could be tied to feature inheritance from the C phase head, but then why don't N or V show similar behavior (assuming that D and v are phase heads)?

    2. Thanks, that helps. So, analyzing Move as a species of Merge (I-merge) is not in the cards for this formalism, I assume. In effect, this codes Chomsky's original proposal that operations only involve the "tops" of trees. Good.

      It doesn't solve the original EPP puzzle at all. So far as I know, there is no way to eliminate the necessity for something like an EPP feature, be it a pure EPP or a feature of a feature (e.g. strong case feature) as in the early MP papers (and, if I get it, what Kobele does in his thesis). The question that Epstein and Seely and Castillo et al. raised was whether one could have pure EPP features and this revolves around whether DPs that raise successively must stop in Spec T of a non-finite clause. If yes, then it looks like EPP holds even without another feature there for it to "strengthen." The data relevant to this are very subtle, roughly what to do with sentences like "John seems to Mary to appear to herself/her to be intelligent" discussed first by Danny Fox. IF, and I stress the conditional here, you find a contrast between the above with 'Mary' anteceding 'her' but not 'herself' then there is a binding theoretical explanation possible if 'John' had to transit via the intermediate Spec T for EPP reasons. But, at least for me, the judgment is unmakable, and so the argument form, though clever beyond belief, is not dispositive. Too bad.

  2. So, 1-4 look a lot like HPSG subcat features plus the head feature constraint (may be muddling the technicalities of different versions here): one daughter has a CAT feature V and a subcat feature ; it combines with a tree with CAT feature D to produce a new tree with CAT feature V and subcat feature . Or function application in simple type theory. It is interesting how the same basic ideas keep popping up in superficially different 'theoretical' contexts.

    1. @Avery: Ed Stabler & Christian Retoré had a paper where they investigated this to some extent. The upshot: this is a formalization of the apparent fact that natural languages are resource sensitive. I'm not sure why your `theoretical' is in scare quotes. Certainly HPSG is (pace Norbert) a quite different theory from MG, which is different (but closer) to TAG which is quite close to CCG, all of which are different from the Lambek Calculus (if this is what you mean by simple type theory).

  3. How does one deal with adjuncts in MG? What features would be needed to allow Merge to work on them?

    1. @karthik: There are many different ways. One can adopt a categorial grammar-like solution, assigning to adjuncts the type =x x, and thereby reducing adjunction to merger. Or one could introduce a new feature type, which checks only itself (asymmetric feature checking); this is the approach of Frey & Gärtner. Tim Hunter has yet another proposal. Thomas Graf has a bird's-eye view of all of this, which I find very attractive. I would characterize his approach as concentrating more on how adjuncts behave at the level of the derivation, and noting that there are many ways of interpreting derivations so as to achieve this behaviour.

    2. Meaghan Fowlie at UCLA had an interesting paper at MOL last year on adjunction in MGs, "Order and Optionality".

    3. Yes, I briefly mentioned Meaghan's work in one of my first posts. The idea is that Merge does not require exact feature checking; if you can select something of category X, then you can also select categories that are, in a certain sense, subsumed by X. This allows you to capture the optionality and iterability of adjuncts while still enforcing certain linear orders.

      I like Meaghan's approach for two reasons:

      1) It takes us away from the MG picture of the category system as a flat unstructured object where all members are on equal footing. Something similar seems to hold for the categories inferred by some of Alex's recent learning algorithms, so there might be interesting connections there.

      2) As you might remember, Merge gives MGs the full power of MSO. But this result hinges on Merge requiring perfect feature matching. Meaghan's work explores hierarchical matching instead, and depending on what the algebraic properties of these hierarchies are, unwanted MSO constraints may no longer be expressible via Merge.

    4. @Thomas: I don't quite understand; isn't the trivial reflexive order a possible ordering on features? In that case MF's approach properly subsumes the standard approach.

    5. @Greg: Without any restriction on what orders you allow the two are completely equivalent. But it seems to me that if you look at the major lexical categories people agree on (N, V, A), they all can be adjoined to. So you could require that your grammar must obey a certain kind of "selection algebra" such that you can never require an exact match like f^+ and f^- because a well-formed grammar must include other LIs with different categories g^- or h^- that can also be selected via f^+. Then you can no longer express certain constraints. What this selection algebra looks like is a purely empirical question.