Monday, June 5, 2017

The wildly successful minimalist program II

In an earlier post (here) I mentioned that I was asked to write about the Minimalist Program (MP) for a volume aimed at comparing various linguistic “frameworks.” I am personally skeptical about “framework differences.” I find them to be more notational variants of common themes (indeed, when pressed, I have been known to complain that H/GPSG, LFG, RG, are all “dialects” of GB) than actual conceptually divergent perspectives. The main reason is that most linguistic theory lacks any real depth and most of the frameworks have the wherewithal to mimic one another’s (ahem) deep insights. So, where others see ideological divergence, I tend to see slight differences in accent.

That said, I have a further gripe about this kind of volume. Even were I to recognize that these different frameworks empirically competed (or should compete) I do not think that MP should be included in the race. MP is not an alternative to GB (or its various dialects) but presupposes that the results of GB are (largely) empirically correct. MP builds on GB results (and its dialects) and aims to conserve these results. Thus, it is not intended (or should not be intended) as a wholesale replacement. For an MPer, GB is wrong the way that Newtonian Gravitation is wrong when compared to General Relativity: the latter theory derives the former as a limiting case. It does not reject the former as misguided, rather it treats it as descriptive rather than fundamental. Indeed, an important part of the argument in favor of General Relativity is that it can derive Newton as a special case. If it could not, that would be an excellent argument that it was fundamentally flawed.

And the MP point? From the MP perspective, as M Antony might have put matters: MPers come to (largely) praise (and incorporate) GB not to bury it. The whole point of MP is to show that the “laws” that GB discovered can be understood in more fundamental MP terms. IMO, it has been less appreciated than it should be how much MP has succeeded in making good on this ambition. So, in this post (and another I will put up later on) I will try to show how far MP has come in making good on its ambitions.

A caveat: if you find GB (and its cousins) to be hopelessly wrongheaded, then you will find a theory that derives its results also hopelessly wrongheaded. If you are one of these, then MP will have, at most, aesthetic interest. If you, like me, take GB to be more or less correct, then MP’s aesthetic virtues will combine with GB’s empirical panache to create a very powerful intellectual rush. It will even create the impression that MP is very much on the right track.

The Merge Hypothesis: Explaining some core features of FL/UG

Here is a list of some characteristic features of FL/UG and its GLs:

(1)  a.  Hierarchical recursion
     b.  Displacement (aka, movement)
     c.  Gs generate natural formats for semantic interpretation
     d.  Reconstruction effects
     e.  Movement targets c-commanding positions
     f.  No lowering rules
     g.  Strict cyclicity
     h.  G rules are structure dependent
     i.  Antecedents c-command their anaphors
     j.  Anaphors never c-command their antecedents (i.e. Principle C effects and Strong Cross Over Effects)
     k.  XPs move, X’s don’t, X0s might
     l.  Control targets subjects of “defective” (i.e. tns or agreement deficiency) clauses
     m.  Control respects the Principle of Minimal Distance
     n.  Case and agreement are X0-YP dependencies
     o.  Reflexivization and Pronominalization are in complementary distribution
     p.  Selection/subcategorization are very local head-head relations
     q.  Gs treat arguments and adjuncts differently, with the former less “constrained” than the latter

Note, I am not saying that this exhausts the properties of FL/UG, nor am I saying that all LINGers agree that all of these accurately describe FL/UG.[1] What I am saying is that (1a-q) identify empirically robust(ish) properties of FL/UG and the generative procedures its GLs allow. Put another way, I am claiming (i) that certain facts about human GLs (e.g. that they have hierarchical recursion and movement and binding under c-command and display principle C effects and obligatory control effects, etc.) are empirically well-grounded and (ii) that it is appropriate to ask why FL/UG allows for GLs with these properties and not others. If you buy this, then welcome to the Minimalist Program (MP).

I would go further; not only are the assumptions in (i) reasonable and the question in (ii) appropriate, MP has provided some answers to the question in (ii). One well-known approach to (1a-h), the Merge Hypothesis (MH), unifies all these properties, deriving them from the core generative mechanism Merge. Or more particularly, MH postulates that FL/UG contains a very simple operation (aka, Merge) that suffices to generate unbounded hierarchical structures (1a) and that these Merge generated hierarchical structures will also have the seven additional properties (1b-h). Let’s examine the features of this simple operation and see how it manages to derive these eight properties.

Unbounded hierarchy implies a recursive procedure.[2] MH explains this by postulating a simple operation (“Merge”) that generates the requisite unbounded hierarchical structures. Merge consists of a very simple recursive specification of Syntactic Objects (SO) coupled with the assumption that complex SOs are sets.

(2)  a. If a is a lexical item then a is an SO[3]
     b. If a is an SO and b is an SO then Merge(a,b) is an SO

(3)  For a, b, SOs, Merge(a,b) → {a,b}

The inductive step (2b) allows Merge to apply to its own outputs and thus licenses unboundedly “deep” SOs with sets contained within sets contained within sets… The Merge Hypothesis is that the “simplest” conception of this combinatoric operation (the minimum required to generate unbounded hierarchically organized objects) suffices to explain why FL/UG has many of the other properties listed in (1).

In what way is Merge the “simplest” specification of unbounded hierarchy? The operation has three key features: (i) it directly and uniquely targets hierarchy (i.e. the basic complex objects are sets (which are unordered), not strings), (ii) it in no way changes the atomic objects combined in combining them (Inclusiveness), and (iii) it in no way changes the complex objects combined in combining them (Extension). Inclusiveness and Extension together constitute the “No Tampering Condition” (NTC). Thus, Merge recursively builds hierarchy (and only hierarchy) without “tampering” with the inputs in any way save combining them in a very simple way (i.e. just hierarchy no linear information).[4] The key theoretical observation is that if FL/UG has Merge as its primary generative mechanism,[5] then it delivers GLs with properties (1a-h). And if this is right, it provides a proof of concept that it is not premature to ask why FL/UG is structured as it is. In other words, this would be a very nice result given the stated aims of MP. Let’s see how Merge so conceived derives (1a-h).

It should be clear that Gs with Merge can generate unbounded hierarchical dependencies. Given a lexicon containing a finite list of atoms a,b,g,d,… we can, using the definitions in (2) and (3) form structures like (4) (try it!).

            (4)       a. {a, {b, {g, d}}}
                        b.  {{a, b}, {g, d}}
                        c.  {{{a, b}, g}, d}

And given the recursive nature of the operation, we can keep on going ad libitum. So Merge suffices to generate an unbounded number of hierarchically organized syntactic objects.
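For readers who want to "try it" mechanically, here is a minimal sketch of (2) and (3) in Python. It rests on modelling assumptions that are mine, not part of MH: lexical items are plain strings (using the letters a, b, g, d as in (4)), complex SOs are Python frozensets, and the helper name depth is purely illustrative.

```python
# Assumption: lexical items are strings, complex SOs are frozensets.
# frozenset is used because it is unordered and can be nested inside other sets.

def merge(a, b):
    """Sketch of (3): Merge(a, b) -> {a, b}, an unordered set of two SOs."""
    return frozenset({a, b})

# A tiny illustrative lexicon of atoms, as in the text.
a, b, g, d = "a", "b", "g", "d"

# The three structures in (4), built by applying merge to its own outputs.
so_4a = merge(a, merge(b, merge(g, d)))   # {a, {b, {g, d}}}
so_4b = merge(merge(a, b), merge(g, d))   # {{a, b}, {g, d}}
so_4c = merge(merge(merge(a, b), g), d)   # {{{a, b}, g}, d}

def depth(so):
    """Hypothetical helper: levels of set embedding (0 for an atom)."""
    if isinstance(so, frozenset):
        return 1 + max(depth(member) for member in so)
    return 0

print(depth(so_4a), depth(so_4b), depth(so_4c))
```

Nothing in merge limits how often it can reapply to its own output, which is all that the "unbounded" part of the claim requires in this toy setting.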

Merge can also generate structures that model displacement (i.e. movement dependencies). Movement rules code the fact that a single expression can enjoy multiple relations within a structure (e.g. it can be both a complement of a predicate and the subject of a sentence).[6] Merge allows for the derivation of structures that have this property. And this is a very good thing given that we know (due to over 60 years of work in Generative Grammar) that displacement is a key feature of human GLs.

Here’s how Merge does this. Given a structure like (5a) consider how (2) and (3) yield the movement structure (5b). Observe that in (5b), b occurs twice. This can be understood as coding a movement dependency, b being both sister of the SO a and sister of the derived SO {g, {l, {a, b}}}. The derivation is in (6).

(5)  a. {g, {l, {a, b}}}
     b. {b, {g, {l, {a, b}}}}

(6) The SO {g, {l, {a, b}}} and the SO b (within {g, {l, {a, b}}}) merge to form {b, {g, {l, {a, b}}}}

Note that this derivation assumes that once an SO always an SO. Thus, Merging an SO a to form part of a complex SO b that contains a does not change (tamper with) a’s status as an SO. Because complex SOs are composed of SOs, Merge can target a subpart of an SO for further Merging. Thus, NTC allows Merge to generate structures with the properties of movement; structures where an SO is a member of two different “sets.”
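The derivation in (6) can be sketched in a few lines of Python. The encoding is an assumption for illustration only: strings stand in for lexical items, frozensets for complex SOs, and the helpers terms and is_term_of are hypothetical names, not standard terminology from the literature.

```python
# Assumption: lexical items are strings, complex SOs are frozensets.

def merge(x, y):
    """Merge(x, y) -> {x, y}; the inputs themselves are left untouched."""
    return frozenset({x, y})

def terms(so):
    """Yield so itself and every SO contained in it, at any depth."""
    yield so
    if isinstance(so, frozenset):
        for member in so:
            yield from terms(member)

def is_term_of(x, so):
    """True if x occurs somewhere inside so (or is so itself)."""
    return any(x == t for t in terms(so))

a, b, g, l = "a", "b", "g", "l"

so_5a = merge(g, merge(l, merge(a, b)))   # (5a): {g, {l, {a, b}}}

# b is still an SO inside (5a), so it is a legitimate input to Merge.
assert is_term_of(b, so_5a)

so_5b = merge(b, so_5a)                   # (5b): {b, {g, {l, {a, b}}}}

# b now occurs twice: as sister of a and as sister of the whole of (5a).
occurrences_of_b = sum(1 for t in terms(so_5b) if t == b)
print(occurrences_of_b)
```

Note that the same merge function is used for both steps; the only difference between the "phrase building" and the "movement" application is whether one input happens to be contained in the other.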

Let me emphasize an important point: the key feature that allows Merge to generate movement dependencies (viz. the “once an SO always an SO” assumption) follows from the assumption that Merge does nothing more than take SOs and form them into a unit. It otherwise leaves the combined objects alone. Thus, if some expression is an SO before being merged with another SO then it will retain this property after being Merged given that Merge in no way changes the expressions but for combining them. NTC (specifically the Inclusiveness and Extension Conditions) leaves all properties of the combining expressions intact. So, if a has some property before being combined with b (e.g. being an SO), it will have this property after it is combined with b. As being an SO is a property of an expression, Merging it will not change this, and so Merge can legitimately combine a subpart of an SO with its container.

Before pressing on, a comment: unifying movement and phrase building is an MP innovation. Earlier theories of grammar (and early minimalist theories) treated phrasal dependencies and movement dependencies as the products of entirely different kinds of rules (e.g. phrase structure rules vs transformations/Merge vs Copy+Merge). Merge unifies these two kinds of dependencies and treats them as different outputs of a single operation. As such, the fact that FL yields Gs that contain both unbounded hierarchy and displacement operations is unsurprising. Hierarchy and displacement are flip sides of the same combinatoric coin. Thus, if Merge is the core combinatoric operation FL makes available, then MH explains why FL/UG constructs GLs that have both (1a) and (1b) as characteristic features.

Let’s continue. As should be clear, Merge generated structures like those in (5) and (6) also provide all we need to code the two basic types of semantic dependencies: predicate-argument structures (i.e. thematic dependency) and scope structure. Let me be a bit clearer. The two basic applications of Merge are those that take two separate SOs and combine them and those that take two SOs with one contained in the other and combine them. The former, E-Merge, is fit for the representation of predicate-argument (aka, thematic) structure. The latter, I-Merge, provides an adequate grammatical format for representing operator/variable (i.e. scope) dependencies. There is ample evidence that Gs code for these two kinds of semantic information in simple constructions like Wh-questions. Thus, it is an argument in its favor that Merge as defined in (2) and (3) provides a syntactic format for both. An argument saturates the predicate it E-merges with and scopes over the SO it I-merges with. If this is correct, then Merge provides structure appropriate to explain (1c).

And also (1d). A standard account of Reconstruction Effects (RE) involves allowing a moved expression to function as if it still occupied the position from which it moved. This as-if is redeemed theoretically if the movement site contains a copy of the moved expression. Why does a displaced expression semantically comport itself as if it is in its base position? Because a copy of the moved expression is in the base position. Or, to put this another way, a copy theory of movement would go a long way towards providing the technical wherewithal to account for the possibility of REs. But Merge based accounts of movement like the one above embody a copy theory of movement. Look at (5b): b is in two positions in virtue of being I-merged with its container. Thus, b is a member of the lowest set and the highest. Reconstruction amounts to choosing which “copy” to interpret semantically and phonetically.[7] Reducing movement to I-merge explains why movement should allow REs.[8]

Furthermore, having this option follows from a key assumption concerning Merge. Recall that it eschews tampering. In other words, if movement is a species of Merge then no-tampering requires coding movement with “copies.” To see this, contrast how movement is treated in GB.

Within GB, if a moves from its base position to some higher position a trace is left in the launch site. Thus, a GB version of (5b) would look something like (7):

            (7) {b1, {g, {l, {a, t1}}}}

Two features are noteworthy: (i) in place of a copy in the launch site we find a trace and (ii) that trace is co-indexed with the moved expression b.[9] These features are built into the GB understanding of a movement rule. Understood from a Merge perspective, this GB conception is doubly suspect for it violates the Inclusiveness Condition clause of the NTC twice over. It replaces a copy with a trace and it adds indices to the derived structure. Empirically, it also mystifies REs. Traces have no contents. That’s what makes them traces (see note 9). Why are they then able to act as if they did have them? To accommodate such effects GB adds a layer of theory specific to REs (e.g. it invokes reconstruction rules to undo the effects of movement understood in trace theoretic terms). Having copies in place of traces simplifies matters and explains how REs are possible. Furthermore, if movement is a species of Merge (i.e. I-merge) then SOs like (7) are not generable at all as they violate NTC. More specifically, the only kosher way to code movement and obey the NTC is via copies. So, the only way to code movement given a simple conception of syntactic combination like Merge (i.e. one that embodies no tampering) results in a copy theory of movement that serves to rationalize REs without a theoretically bespoke theory of reconstruction. Not bad![10]
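One rough way to picture the Inclusiveness point is to compare the atoms found in the copy structure (5b) with those found in the trace structure (7). The sketch below is only an illustration under my own assumptions: strings model lexical items, frozensets model complex SOs, the strings "b1" and "t1" stand in for the indexed expression and the trace, and the helpers S and atoms are hypothetical.

```python
# Assumption: lexical items are strings, complex SOs are frozensets.

def S(*members):
    """Shorthand for writing a set-like SO literally."""
    return frozenset(members)

def atoms(so):
    """Collect every atom that occurs anywhere inside so."""
    if isinstance(so, frozenset):
        return set().union(*(atoms(member) for member in so))
    return {so}

lexicon = {"a", "b", "g", "l"}   # the atoms the derivation started with

copy_5b = S("b", S("g", S("l", S("a", "b"))))      # (5b), with a copy of b
trace_7 = S("b1", S("g", S("l", S("a", "t1"))))    # (7), with index and trace

# The copy structure contains nothing beyond the original atoms.
print(atoms(copy_5b) <= lexicon)

# The trace structure contains two objects that were never in the lexicon.
print(sorted(atoms(trace_7) - lexicon))
```

On this toy encoding the trace and the index show up as material that the derivation had to add, which is the sense in which (7) departs from Inclusiveness twice over.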

So Merge delivers properties (1a-d), and the gifts just keep on coming. It also serves up (1e,f,g) as consequences. This time let’s look at the Extension Condition (EC) codicil to the NTC. EC requires that inputs to Merge be preserved in the outputs to Merge (any other result would “change” one of the inputs). Thus, if an SO is input to the operation it will be a unit/set in the output as well because Merge does no more than create linguistic units from the inputs. Thus, whatever is a constituent in the input appears as a constituent with the same properties in the output. This implies (i) that all I-merge is to a c-commanding position, (ii) that lowering rules cannot exist, and (iii) that derivations are strictly cyclic.[11] The conditions that movements be always upwards to c-commanding positions and strictly cyclic thus follow trivially from this simple specification of Merge (i.e. Merge with NTC understood as embodying EC).

An illustration will help clarify this. NTC prohibits deriving structure (8b) from (8a). Here we Merge g with a. The output of this instance of Merge obliterates the fact that {a,b} had been a unit/constituent in (8a), the input to Merge. EC prohibits this. It effectively restricts I-Merge to the root. So restricted, (8b) is not a licit instance of I-Merge (note that {a,b} is not a unit in the output). Nor is (8c) (note that {{a,b},{g,d}} is not a unit in the output). Nor is a derivation that violates the strict cycle (as in (8d)). Only (8e) is a grammatically licit Merge derivation, for here all the inputs to the derivation (i.e. g and {{a,b},{g,d}}) are also units in the output of the derivation (i.e. the inputs have been preserved (remain unchanged) in the output). Yes, a new relation has been added, but no previous ones have been destroyed (i.e. the derivation is info-preserving (viz. monotonic); repeat the slogan: once an SO always an SO). In deriving (8b-c) one of the inputs (viz. {{a,b},{g,d}}) is no longer a unit in the output and so NTC/EC has been violated.

(8)  a. {{a,b},{g,d}}
     b. {{{g,a},b},{g,d}}
     c. {{g,{a,b}},{g,d}}
     d. {{a,b},{d,{g,d}}}
     e. {g,{{a,b},{g,d}}}

In sum, if movement is I-merge subject to NTC then all movement will necessarily be to c-commanding positions, upwards, and strictly cyclic.
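The contrast among the structures in (8) can be checked mechanically. What follows is a sketch under my own assumptions rather than a formal statement of EC: strings model lexical items, frozensets model complex SOs, and the hypothetical helper preserves asks only whether an input SO survives as a unit somewhere in the output. For the strict-cycle violator I assume the balanced bracketing {{a,b},{d,{g,d}}}.

```python
# Assumption: lexical items are strings, complex SOs are frozensets.

def S(*members):
    """Shorthand for writing a set-like SO literally."""
    return frozenset(members)

def terms(so):
    """Yield so itself and every SO contained in it, at any depth."""
    yield so
    if isinstance(so, frozenset):
        for member in so:
            yield from terms(member)

def preserves(input_so, output_so):
    """True if input_so is still a unit (a term) of output_so."""
    return any(input_so == t for t in terms(output_so))

a, b, g, d = "a", "b", "g", "d"

root = S(S(a, b), S(g, d))                 # (8a), the input to I-Merge
bad_b = S(S(S(g, a), b), S(g, d))          # (8b): g merged with a
bad_c = S(S(g, S(a, b)), S(g, d))          # (8c): g merged with {a, b}
bad_d = S(S(a, b), S(d, S(g, d)))          # counter-cyclic case, bracketing assumed
good_e = S(g, root)                        # (8e): g merged at the root

for name, out in [("8b", bad_b), ("8c", bad_c), ("8d", bad_d), ("8e", good_e)]:
    print(name, preserves(root, out))

# In (8b) even the smaller unit {a, b} has been destroyed.
print(preserves(S(a, b), bad_b))
```

Only the root-extending output keeps (8a) intact as a unit, which is the toy counterpart of restricting I-Merge to the root.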

It is worth noting that these three features are not particularly recondite properties of FL/UG and find a place in most GG accounts of movement. This makes their seamless derivation within a Merge based account particularly interesting.

Last, we can derive the fact that the rules of grammar are structure dependent ((1h) above), an oft-noted feature of syntactic operations.[12] Why should this be so? Well, if Merge is the sole syntactic operation, then non-structure dependent operations are very hard (impossible?) to state. Why? Because the products of Merge are sets and sets impose no linear requirements on their elements. If we understand a derivation to be a mapping of phrase markers into phrase markers and we understand phrase markers to effectively be sets (i.e. to only specify hierarchical relations) then it is no surprise that rules that leverage linear left-right properties of a string cannot be exploited. They don’t exist, for phrase markers eschew this sort of information and thus operations that exploit left/right (i.e. string based) information cannot be defined. So, why are rules of G structure dependent? Because this is the only structural information that Merge based Gs represent. So, if the basic combinatoric operation that FL/UG allows is Merge, then FL/UG’s restriction to structure dependent operations is unsurprising.
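A small sketch may make the "sets carry no linear information" point tangible. The encoding is an assumption for illustration: frozensets stand in for Merge outputs, tuples stand in for string-like (ordered) representations, and has_linear_position is a hypothetical helper.

```python
# Assumption: Merge outputs are frozensets; tuples model ordered strings.

def merge(x, y):
    """Merge(x, y) -> {x, y}, an unordered set."""
    return frozenset({x, y})

def has_linear_position(obj):
    """True if obj lets us ask for its 'first' element by position."""
    try:
        obj[0]
        return True
    except TypeError:
        return False

# The order of the inputs leaves no trace in the output of Merge.
print(merge("a", "b") == merge("b", "a"))

# An ordered representation does distinguish the two arrangements.
print(("a", "b") == ("b", "a"))

# A rule like "take the leftmost element" can be stated over the tuple
# but has nothing to refer to in the set.
print(has_linear_position(("a", "b")), has_linear_position(merge("a", "b")))
```

In this toy setting a rule that mentions "first" or "leftmost" simply cannot be written against the set representation; only membership and containment are available.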

This is a good place to pause for a temporary summary: Research in GG over the last 60 years has uncovered several plausible design features of FL/UG. (1a-h) summarizes some uncontroversial examples. All of these properties of FL/UG can be unified if we assume that Merge as outlined in (2) and (3) is the basic combination operation that FL/UG affords. Put simply, the Merge Hypothesis has (1a-h) as consequences.

Let me say the same thing more tendentiously. All agree that a basic feature of FL/UG is that it allows for Gs with unbounded hierarchy. A very simple inductive procedure sufficient for specifying this property (1a) also entails many other features of FL/UG (1b-h). What makes this specification simple is that it directly targets hierarchy and requires that the computation be strongly monotonic (embody the NTC). Thus we can explain the fact that FL/UG has these properties by assuming that it embodies a very simple (arguably, the simplest) version of a procedure that any empirically adequate theory of FL/UG would have to embody. Or, given that FL/UG allows for unbounded hierarchical recursion (a non-controversial fact given the fact of Linguistic Productivity), the simplest (or at least, very simple) version of the requisite procedure brings in its train displacement, an adequate format for semantic interpretation, Reconstruction Effects, movement rules that target c-commanding positions, eschew lowering and are strictly cyclic, and G operations that are structure dependent. Thus, if the Merge Hypothesis is true (i.e. if FL/UG has Merge as the basic syntactic operation), it explains why FL/UG has this bushel of properties. In other words, the Merge Hypothesis provides a plausible first step in answering the basic MP question: why does FL/UG have the properties it has?

Moreover, it is morally certain that something like Merge will be part of any theory of FL/UG precisely because it is so very simple. It is always possible to add bells and whistles to the G rules FL/UG makes available. But any theory hoping to be empirically adequate will contain at least this much structure. After all, what do (2) and (3) specify? They specify a recursive procedure for building hierarchical structures that does nothing but build such structures. Given the fact of Linguistic Productivity and Linguistic Promiscuity any theory of FL/UG will contain at least this much. If it does not contain much more than this much, then (1a-h) result. Not bad.

[1] For example, fans of Dependent Case Theory will reject (1n).
[2] Recall that LP implies recursion, and linguistics has discovered ample evidence that GLs can generate structures of arbitrary depth.
[3] The term lexical item denotes the atoms that are not themselves products of Merge. These roughly correspond to the notion morpheme or word, though these notions are themselves terms of art and it is possible that the naïve notions only roughly correspond to the technical ones. Every theory of syntax postulates the existence of such atoms. Thus, what is debatable is not their existence but their features.
[4] In my opinion, this line of argument does not require that Merge be the “simplest” possible operation. It suffices that it be natural and simple. The conception of Merge in (2) and (3) meets this threshold.
[5] In the best of all worlds, the sole generative procedure.
[6] A phrase marker is just a list of relations that the combined atoms enjoy. Derivations that map phrase markers into phrase markers allow an expression to enjoy different relations coded in the various relations it enjoys in the varying phrase markers.
[7] Copy is simply a descriptive term here. A more technically accurate variant is “occurrence.” b occurs twice in (5b). The logic, however, does not change.
[8] A full theory of REs would articulate the principles behind choosing which copies to interpret. See Sportiche (forthcoming) for an interesting substantive proposal.
[9] Traces within GB are indexed contentless categories: [1 ec]. 
[10] We could go further: Merge based theory cannot have objects like traces. Traces live on the distinction between Phrase Structure Rules and lexical insertion operations. They are effectively the phrase structure scaffolding without the lexical insertion. But Merge makes no distinction between structure building and lexical insertion (i.e. between slots and contents). As such, if traces exist, they must be lexical primitives rather than syntactically derived formatives. This would be a very weird conception of traces, inconsistent with the GB rendering in note 9. The same, incidentally, goes for PRO, which we will talk about later on. The upshot: not only would traces violate no tampering, they are indefinable given the “bare phrase structure” nature of movement understood as I-merge.
[11] The first conjunct only holds if there is no inter-arboreal/sidewards movement. For now, let’s assume this to be correct.
[12] For a recent review and defense of the claim see Berwick et al.


  1. Hi,

    "I personally find them to be more notational variants of common themes (indeed, when pressed, I have been known to complain that H/GPSG, LFG, RG, are all “dialects” of GB) than actual conceptually divergent perspectives."

    There is this old joke: "Your theory is a notational variant of mine and it is wrong." You may check textbooks on LFG and HPSG (eg Bresnan's) to read about movement paradoxes that do not arise or can be solved in LFG/HPSG. The theory about extraction in GPSG and HPSG gets Across the Board extraction right without any further stipulation. And so on. There are differences.

    "The main reason is that most linguistic theory lacks any real depth and most of the frameworks have the wherewithal to mimic one another’s (ahem) deep insights."

    This is just plainly wrong. There are deep implemented analyses of several languages both in LFG and HPSG. You may read about the CoreGram project here.

    It has large scale grammars for German, Danish, Persian, Maltese and smaller ones for Yiddish, Hindi, French, English. All grammars come with a morphology component that is part of the linguistic theory and with semantics (underspecified semantics, Minimal Recursion Semantics). No such thing exists for GB not to talk about MP. You may read up in my grammar theory text book. It lists implementations for all the alternative frameworks. And it also discusses formalization of GB/MP theories. There are papers called "The GB blues" by authors who gave up frustrated since it was impossible to come up with implementations since core notions were not worked out. The most prominent implementations were those by Stabler, but they are toy grammars compared to what is available elsewhere. Please read the sections about complexity and interaction of phenomena in the CoreGram paper or in my GT textbook. As Abney said: the more you cover the more complex it gets.

    You may download a virtual machine and play with the grammars:

    They exist, they are consistent, they work and they share a common core. All the goals that Generative Grammar always had are reached in this project. Please drop the statement that all theories are shallow. It is not true.

    "So, where others see ideological divergence, I tend to see slight differences in accent."

    I agree about move and merge.

    They were part of HPSG right from the beginning, but research style, rhetoric and the overall architecture are quite different. Kayne-style theories would never be entertained in any of the alternative theories.



  2. A side note on the following comment:

    >You may check textbooks on LFG and HPSG (eg Bresnan's) to read about movement paradoxes that do not arise or can be solved in LFG/HPSG. 

    Movement paradoxes aren’t paradoxes in any vaguely modern variety of transformational grammar. The idea of a movement paradox sort of makes sense with reference to varieties of TG where (i) there are no surface filters and (ii) apparent optionality is accounted for via optional transformations. It is then indeed puzzling that you can say “That languages are learnable is captured by this theory” but not “This theory captures that languages are learnable” (given that the only way to account for the availability of both structures with other passive verbs is to posit an optional transformation). However, once you have surface filters and/or featural triggers for movement, there is nothing paradoxical about such cases at all. For example, you could add a “* captures that” filter, or have the verb ‘capture’ refuse to select a CP complement lacking whatever feature it is that triggers the movement.

    1. Looks like "epicycles" to me. In any case the difference remains. You have a transformational theory that needs filters and you have other theories that do not. And what about Haider's cases where you have "[Einen Hund füttern, der Hunger hat,] wird man wohl müssen." but not "weil man wohl einen Hund füttern, der Hunger hat, müssen wird"

  3. It's important to be clear on what the criticism actually is. You seem to be conceding that there is no paradox. So rather than making vague allusions to epicycles, you need to explain exactly how the data that usually come under the heading of "movement paradoxes" pose a problem for any modern variety of TG. This has not been done in the existing literature, so it would be a useful contribution if you could take the time to spell it out.

    As for needing filters, you'll notice that I mentioned a way of handling these cases that doesn't make use of filters. But anyway, every framework has some kind of technical device that the others lack (this is what makes the frameworks different!), so I fail to see how this is a big deal. You might just as well point out that the transformational theory needs transformations while the others do not.

    You'll have to elaborate on the significance of the German examples. What makes them more difficult?

  4. Yes, you are right. The theories are different. This was my main point. So, they are not just notational variants of each other.

    The point about the German data is that there is fronted stuff that cannot originate from one point since it would be ungrammatical there (The relative clause has to follow all verbs, it cannot go into the middle).

  5. The fact that the theories use different technical devices does not show that they aren't notational variants.

    Given your description of the German data, it does not seem any more problematic than the English examples that I referred to. What exactly makes it different and more difficult to deal with?
