In two earlier posts (here and here) I outlined an argument that revolves around the following point: MP forces a concentration on properties of FL/UG and forces a LING (vs LANG) conception of the study of language. Prior to MP, LING and LANG interpretations of the study of language could happily co-exist, each readily and happily interpreting the results of the other’s perspective without threatening one’s favored point of view. I then reviewed how it is that the Merge Hypothesis (MH) advances the goal of explaining some of the basic features of FL/UG by showing them to be by-products of a simple, fundamental recursive procedure. In my view, this result is breathtaking and, given the assumption that earlier GG was empirically roughly correct, it provides an excellent model of how to realize MP ambitions, thereby demonstrating that the big question that MP poses for itself can be fruitfully addressed and answered.
In this post, I want to push this line of argument further, much further. As the old saying goes, you can never know how far you can go until you go a wee bit further. The idea is to entertain an Extended Merge Hypothesis (EMH) which takes non-local dependencies to “live on” Merge generated structures (aka, chains). The aim is to unify all non-local dependencies and treat them as chain dependencies. If doable, this will have the effect of unifying several otherwise independent modules of the grammar and showing their properties to effectively reduce to chain properties. As chain properties are just Merge properties, this can be understood as a first step towards reducing all the laws of GB to products of Merge.
The EMH is much less well grounded than the MH, IMO. It includes a hobbyhorse of mine as a special case: the movement theory of construal relations (including both control and binding). I sketch how this might look. Do I believe it? Well, who cares really? That’s a question about personal psychology. Do I think that this is the right way for an MPer to proceed? Yup. If the goal is to explain the properties of FL/UG in the most efficient manner possible, then some extension of MH is the obvious way to go. And, IMO, one should never do the less obvious when the obvious is staring you in the face. So, it is worth doing. Here goes.
The Extended Merge Hypothesis: Explaining more features of FL/UG
There are proposals in the MP literature that push the MH line of argument harder still. Here’s what I mean. MH unifies structure building and movement and in the process explains many central properties of FL/UG. It accomplishes this by reducing phrase building and movement to instances of Merge (i.e. E and I-Merge respectively). We can push this reductive/unificational approach more aggressively by reducing other kinds of dependencies to instances of E or I-Merge. More specifically, if we take seriously the MP proposal that Merge is the unique fundamental combinatoric operation that FL/UG affords, then the strongest minimalist hypothesis is that every grammatical dependency must be mediated by some instance of Merge. Call this the “Extended Merge Hypothesis” (EMH). In what follows, I would like to review some of the MP literature to see what properties EMH might enjoy. My aim is to suggest that this radical unification has properties that nicely track some fundamental features of FL/UG. If this is correct, then it suggests that relentlessly expanding the reach of Merge beyond phrase structure and movement to include construal and case/agreement dependencies has interesting potential payoffs for those interested in MP questions. Once again, it will pay to begin with GB as a jumping off point.
GB is a strongly modular theory in the sense that it describes FL/UG as containing many different kinds of operations and principles. Thus, in GB, we distinguish construal rules like Binding and Control, from movement rules like Wh-movement and Raising. We classify case relations as different from movement dependencies and both from theta forming dependencies. The primitives are different and, more importantly, the operations and principles that condition them are different. The internal modularity of GB style FL/UGs complicates them. This is, theoretically speaking, unfortunate, especially in light of the fact that the different dependencies that the modules specify share many properties. That they do so is something an MP account (indeed, any account) would like to explain. EMH proposes that the different dependencies that GB’s various modules specify are only apparently distinct. In reality, they are all different instances of chains formed by I-merge. To put this slightly differently, all of the non-local dependencies GB specifies “live on” chains formed by I-Merge. Let’s consider some examples.
(i) Case Theory
Chomsky (1993) re-analyzes case dependencies as movement mediated. The argument has two steps.
The first is a critical observation: the GB theory of case is contrived in that it relies on a very convoluted and unnatural notion of government. Furthermore, the contrived nature of the government account reflects a core assumption: accusative case on the internal argument of a transitive verb (sisterhood to a case assigning head) reflects the core configuration for case licensing. Extending sisterhood so that it covers what we see in nominative case and in ECM requires “generalizing” sisterhood to government, with the resulting government configuration itself covering three very distinct looking configurations (see (9)). (9a) is the configuration for accusative case, (9b) for nominative and (9c) for ECM. It is possible to define a technical notion that treats these three configurations as all instances of a common relation (viz. government), but the resulting definition is very baroque. The obtuseness of the resulting definition argues against treating (9a) as the core case precisely because the resulting unified theory rests on a gerrymandered (and hence theoretically unsatisfying) conception of government.
(9) a. [ V nominal]
b. [ Nominal [ T0-finite…
c. [V [ Nominal [ T0-non-finite…
The second step in the argument is positive: case theory can be considerably streamlined and unified if we take the core instance of case assignment to be exemplified by what we find with nominatives. If this is the core case configuration then case is effectively a spec-head relation between a case marked nominal and a head that licenses it. Generalizing nominative case configurations to cover simple accusative objects and ECM subjects requires treating case as a product of movement (as with nominatives). Thus, simplifying the case module rests on analyzing case dependencies as products of I-merge (i.e. as non-local relations between a case assigning head and a nominal that has (A-)moved to the specifier of this head). The canonical case configuration is (10), with h0 being a case assigning head and the nominal being the head of an I-merge generated A-chain.
(10) [Nominal [ h0…
There is some interesting empirical evidence for this view. First, it predicts that we should find a correlation between case and movability. More specifically, if some position resists movement, then this should have an impact on case. And this seems to be correct. Consider the paradigm in (11):
(11) a. John believes him to be tall
b. *John believes him is tall
c. John was believed t to be tall
d. *John was believed t is tall
Just as A-movement/raising from the subject position of a finite clause is prohibited, so too is accusative case. Why? Because accusative case requires A-movement of him in (11b) to the case head that sits above believe and this kind of movement is prohibited, as (11d) illustrates.
There is a second prediction. Case should condition binding. On the current proposal, movement feeds case. As movement broadens an expression’s scope, it thereby increases its binding potential. This second prediction is particularly interesting for it ties together two features of the grammar that GB approaches to accusative case keep separate. Here’s what I mean.
With regard to nominative case assignment, it is well-known that movement “for” case (e.g. raising to subject) can expand an expression’s binding potential. John can bind himself after raising to subject in (12b) but not without moving to the matrix Spec T (12a). Thus some instances of movement for case reasons can feed binding.
(12) a. *It seems to himself1 [(that) John1 is happy]
b. John1 seems to himself1 [ t1 to be happy]
However, the standard GB analysis of case for accusatives has the V assign case to the nominal object in its base position. Thus, whereas case to nominative subjects is a Spec-head relation, case to canonical objects is under sisterhood. If, however, we unify nominative and accusative case and assimilate the latter to what we find with nominatives, then movement will mediate accusative case too. If this involves movement to some position above the external argument’s base position (recall, we are assuming PISH) then accusative case is being assigned in a configuration something like (13). Different epochs of MP have treated this VP external Spec position differently, but the technical details don’t really matter. What does matter is that accusative case is not assigned in the nominal’s base position, but rather in a higher position at the edge of the VP complex. So conceived, accusative case, like nominative, is expected to expand a nominal’s binding domain.
(13) [ Nominal1 [V external argument [V V…
There is evidence supporting this correlation between case value and scope domain. Here is some comparative data that illustrates the point. (14) shows that it is possible for an embedded ECM subject to bind an anaphor in a matrix adjunct, whereas a nominative embedded subject cannot.
(14) a. The lawyer proved [the men1 to be guilty] during each other1’s trials
b. The lawyer proved [the men1 were guilty] during each other1’s trials
(14a) has a sensible reading in which the during phrase modifies the matrix predicate proved. This reading is unavailable in (14b), the only reading being the silly one in which the during phrase modifies were guilty. This is expected if licensing accusative case on the ECM subject requires moving it to the edge of the higher matrix VP (as in (13)). In contrast, licensing nominative leaves the men in the embedded spec T and hence leaves the matrix during phrase outside its c-command domain prohibiting binding of the reciprocal. Thus we see that case value and scope domain co-vary as the MP story leads us to expect.
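For readers who like to see the geometry spelled out, here is a minimal sketch of the c-command reasoning behind (14). It is an illustration under loudly stated assumptions, not an analysis: the bracketings are drastically simplified, trees are unlabeled nested tuples, the during phrase is assumed to attach inside the matrix VP complex below the landing site in (13), and the names `dominates`, `c_commands`, `ecm` and `finite` are mine.

```python
ADJUNCT = "during each other's trials"

def dominates(node, target):
    """True if `target` is `node` itself or occurs somewhere inside it."""
    if node == target:
        return True
    if isinstance(node, tuple):
        return any(dominates(child, target) for child in node)
    return False

def c_commands(tree, a, b):
    """True if some occurrence of `a` in `tree` has a sister that dominates `b`."""
    if not isinstance(tree, tuple):
        return False
    for i, child in enumerate(tree):
        if child == a:
            sisters = tree[:i] + tree[i + 1:]
            if any(dominates(s, b) for s in sisters):
                return True
        if c_commands(child, a, b):
            return True
    return False

# (14a), assuming the ECM subject has moved to the edge of the matrix VP
# complex as in (13); "t" marks the lower copy. The matrix subject is omitted.
ecm = ("the men", (("proved", ("t", "to be guilty")), ADJUNCT))

# (14b), assuming the nominative subject stays in the embedded Spec T.
finite = (("proved", ("the men", "were guilty")), ADJUNCT)

print(c_commands(ecm, "the men", ADJUNCT))     # True
print(c_commands(finite, "the men", ADJUNCT))  # False
```

On these assumptions the moved ECM subject has the matrix adjunct in its c-command domain while the nominative subject does not, which is just the contrast between (14a) and (14b).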
In sum, unifying case under I-merge rather than government leads to a nicer looking theory and makes novel predictions concerning the interaction of case and binding.
(ii) Control
Consider next (obligatory) complement control, as exemplified in (15):
(15) a. John1 hopes [PRO1 to go to grad school]
b. John persuaded Mary1 [PRO1 to go to grad school]
Here are two salient properties of these constructions: (i) PRO is restricted to non-finite subject positions and (ii) PRO requires a local c-commanding antecedent. There are GB proposals to account for the first property in terms of binding theory (the so-called “PRO theorem”) but by the early 1990s, its theoretical inadequacies became apparent and PRO’s distribution was thereafter restricted to the subject of non-finite clauses by stipulation. As regards selecting the appropriate antecedent, this has remained the province of a bespoke control module with antecedent selection traced to stipulated properties of the embedding predicate (i.e. the controller is a lexical property of hope and persuade). I believe that it is fair to say that both parts of GB control theory contain a fair bit of ad hocery.
Here’s where MP comes to the rescue. A unified, more principled account is available by treating construal relations as “living” on chains (in the case of control, A-chains) generated by I-merge. On this view, the actual structure of the sentences in (15) is provided in (16) with the controller being the head of an A-chain with links/copies in multiple theta positions (annotated below).
(16) a. [ John [ T [Johnθ [ hopes [ John [to [ Johnθ [ go to grad school]]]]]]]]
b. [John T [ John [persuade [Maryθ [ Mary to [ Maryθ [go to grad school]]]]]]]
The unification provides a straightforward account for both facts above: where PRO is found and what its antecedent must be. PROs distribute like links in A-chains. Antecedents for PRO are heads of the chains that contain them. Thus, PRO can appear in positions from which A-movement is licit. Antecedents will be the heads of such licit chains. Observe that this implies that PRO has all the properties of a GB A-trace. Thus it will be part of a chain with proximate links, these links will c-command one another and will be local in the way that links in A-chains are local. In other words, a movement theory of control derives the features of control constructions noted above.
We can go further: if we assume that Merge is the only way to establish grammatical dependencies, then control configurations must have such properties. If PRO is a “trace” then of course it requires an antecedent. If it is a trace, then of course the antecedent must c-command it. If it is an A-trace, then of course the antecedent must be local. And if it is an A-trace then we can reduce the fact that it (typically) appears in the subject position of non-finite clauses to the fact that A-movement is also so restricted:
(17) a. John seems t to like Mary
b. *John seems t will like Mary
c. John expects PRO to like Mary
d. *John expects PRO will like Mary
In sum, if we reduce control dependencies to A-chain dependencies and treat control structures as generated via I-merge it is possible to derive some of its core distributional and interpretive properties. Indeed, I would go further, much further.
First, at this moment, only this approach to control offers a possible explanation for properties of control constructions. All other approaches code the relevant data; they do not, and cannot, explain them. And there is a principled reason for this. All other theories on the market treat PRO as a primitive lexical element, rather than the residue of grammatical operations, and hand pack the properties of control constructions into the feature specifications of this primitive lexical element. The analysis amounts to showing that checking these features correlates with tracking the relevant properties. The source of the features, however, is grammatically exogenous and arbitrary. The features posited are exactly those that the facts require, thereby allowing for other features were the facts different. And this robs these accounts of any explanatory potential. From a minimalist perspective, one from which the question of interest is why FL/UG has the properties it appears to have and not others, this treatment of control is nugatory.
Second, the movement approach to control has a very interesting empirical consequence in the context of standard MP theories. Recall that the copy theory is a consequence of Merge based accounts of movement (see here section 1). If control is the product of I-merge then control chains, like other A-chains, have copies as links. If so, part of any G will be procedures for phonetically “deleting” all but one of the copies/occurrences. So the reason that PRO is phonetically null is that copies in A-chains are generally phonetically null. Importantly, whatever the process that “deletes” copies/occurrences will apply uniformly to “A-traces” and to PRO as these are the same kinds of things.
There is well-known evidence that this is correct. Consider contraction effects like those in (18). Wanna contraction is licensed in (18b) across an A-trace and in (18a) across a PRO, but not in (18c) across an A’-trace. This supports the claim that PRO is the residue of A-movement.
(18) a. They want PRO to kiss Mary → They wanna kiss Mary
b. They used t to live in the attic → They usta live in the attic
c. Who do they want t to vanish from the party → *Who do they wanna vanish from the party
The I-Merge analysis of control also predicts a possibility that PRO based accounts cannot tolerate. Consider: an I-Merge based account of displacement needs a theory of copy/occurrence pronunciation to account for the fact that most copies/occurrences in many languages are phonetically null. So as part of any I-Merge theory of displacement we need a theory of copy deletion. A particularly simple one allows higher copies as well as lower ones to delete, at least in principle. This opens up the following possibility: there are control configurations in which “PRO” c-commands its antecedent. Thus, the movement theory of control in conjunction with standard assumptions concerning deletion in the copy theory of movement allows for the possibility of control constructions which apparently violate Principle C. And these appear to exist. It is possible to find languages in which the controllee c-commands its controller. In other words, configurations like (19b) with the standard control interpretation are perfectly fine and have the interpretation of control sentences like (19a). Both kinds of sentences are derivable given assumptions about I-merge and copy deletion. They derive from the common underlying (19c) with either the bottom copy removed (19a) or the top (19b). On this view, the classical control configuration is simply a limiting case of a more general set of possibilities that, but for phonetic expression, have the same underlying properties.
(19) a. DP1 V [PRO1 VP]
b. PRO1 V [DP1 VP]
c. DP1 V [DP1 VP]
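The logic of (19) can be put as a toy procedure. What follows is a minimal sketch, assuming only that a chain is an ordered list of copies of one DP and that pronunciation is a choice of which copies to keep. The names `Copy` and `spell_out`, and the "__" notation for a silent copy, are my own illustrative inventions and not anyone's proposed theory of copy deletion.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    """One occurrence of a DP in an I-merge generated chain."""
    dp: str        # the lexical material, e.g. "DP"
    position: str  # informal label for where this occurrence sits

def spell_out(chain, keep):
    """Pronounce only the copies whose index is in `keep`.

    `chain` is ordered from the highest copy to the lowest.
    Unpronounced copies are shown as "__" purely for display.
    """
    return [c.dp if i in keep else "__" for i, c in enumerate(chain)]

# (19c): one DP with two occurrences, the higher one listed first.
chain = [Copy("DP", "matrix position"), Copy("DP", "embedded subject")]

forward = spell_out(chain, keep={0})          # (19a): lower copy silent
backward = spell_out(chain, keep={1})         # (19b): higher copy silent
copy_control = spell_out(chain, keep={0, 1})  # both copies pronounced

print(forward)       # ['DP', '__']
print(backward)      # ['__', 'DP']
print(copy_control)  # ['DP', 'DP']
```

The point of the sketch is only that the chain is constant across the three outputs. What varies is pronunciation, which is why the classical configuration in (19a) looks like one option among several rather than the defining case.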
This kind of data argues against classical PRO based accounts (decisively so, in my opinion), while being straightforwardly compatible with movement approaches to control based on I-merge.
One last point and I will move on. Given standard MP assumptions, something like the movement theory of control is a virtual inevitability once one dispenses with PRO. MP theories have dispensed with D-structure as a level of representation, and with this the prohibition against a DP moving into multiple theta positions. Thus, there is nothing to prevent DPs from forming control chains via I-merge given the barest MP assumptions. In this sense, control as movement is predicted as an MP inevitability. It is possible to block this implication, but only by invoking additional ad hoc assumptions. Not only is control as movement compatible with MP, it is what we will find unless we try to specifically avoid it. That we find it argues for the reduction of control to I-merge.
(iii) Principle A effects
The same logic reduces principle A effects to I-merge. It’s been a staple of grammatical theory since LGB that A-traces have many of the signature properties of reflexives, as illustrated by the following paradigm:
(20) a. *John seems [t is intelligent]
b. *John believes [himself is intelligent]
c. John seems [t to be intelligent]
d. John believes [himself to be intelligent]
e. *John seems it was told t that Sue is intelligent
f. *John wants Mary to tell himself that Sue is intelligent
LGB accounted for this common pattern by categorizing A-traces as anaphors subject to principle A. Thus, for example, in LGB-land the reason that A-movement is always upwards, local and to a c-commanding position is that otherwise the traces left by movement are unbound and violate principle A. What’s important for current purposes is to observe that LGB unifies A-anaphoric binding and movement. The current proposal that all grammatical dependencies are mediated by Merge has the LGB unification as a special case if we assume that A-anaphors are simply the surface reflex of an underlying A-chain. In other words, the data in (20) follow directly if reflexives “live on” A-chains. Given standard assumptions concerning I-merge this could be theoretically accommodated if “copies” can convert to reflexives in certain configurations (as in (21)).
(21) [John believes [John (→ himself) to be intelligent]]
Like cases of control, reflexives are simply the morphological residue of I-merge generated occurrences/copies. Put another way, reflexives are the inessential morphological detritus of an underlying process of reflexivization, the latter simply being the formation of an A-chain involving multiple theta links under I-Merge.
If correct, this makes a prediction: reflexives per se are inessential for reflexivization. There is evidence in favor of this assumption. There are languages in which copies can stand in place of reflexive morphemes in reflexive constructions. Thus, structures like (22a) have reflexive interpretations, as witnessed by the fact that they license sloppy identity under ellipsis (22b).
(22) a. John saw John (=John saw himself)
b. John saw John and Mary too (=and Mary saw Mary)
Note that given standard assumptions regarding GB binding theory examples like (22) violate principle C.
We also have apparent violations of principle B where pronouns locally c-command and antecede another pronoun (structure in (23)):
(23) Pronoun likes pronoun and Mike too (= and Mike likes Mike )
These puzzles disappear when these are seen as the surface manifestations of reflexivization chains under I-merge. The names and pronouns in object positions in (22) and (23) are just pronounced occurrences/copies. There is a strict identity condition on the copies in copy reflexive constructions, again something that an I-merge view of these constructions would lead one to expect. Interestingly, we find similar copies possible in “control” structures:
(24) a. Mike wants Mike to eat
b. The priest persuaded Mike Mike to go to school
This is to be expected if indeed both Reflexive and Control constructions are mediated by I-merge as proposed here.
Let me sum up. Section 1 showed that we can gain explanatory leverage on several interesting features of FL/UG if we assume that Merge is the fundamental operation for combining lexical atoms into larger hierarchical structures. In this section I argued that one can get leverage on other fundamental properties if we assume that all grammatical dependencies are mediated by Merge. This implies that non-local dependencies are products of I-merge. This section presented evidence that case dependencies, control and reflexivization “live on” A-chains formed by I-merge. I have shown that this proposal covers much of the conventional data in a straightforward way and that it is compatible with data that goes against the conventional grain (e.g. backwards control, apparent violations of principle B and C binding effects). Moreover, all of this follows from two very simple assumptions: (i) that Merge is the basic combinatoric operation FL/UG makes available and (ii) that all grammatical dependencies are mediated via Merge. We have seen that the second assumption underwrites I-merge analyses of case, control and reflexivization, which in turn explain some of the key features of case, control and reflexivization (e.g. that case impacts scope, that PRO occupies the subject position of non-finite clauses and requires a local c-commanding antecedent, and that languages that appear to violate conditions B and C are not really doing so). Thus, the Merge hypothesis so extended resolves some apparent paradoxes, accounts for some novel data, covers the standard well trodden empirical ground and (and this is the key part) explains why it is that the FL/UG properties GB identified hold in these cases. The Extended Merge Hypothesis (EMH) explains why these constructions have the general properties they do by reducing them to reflexes of the A-chains they live on generated by I-merge. If this is on the right track, then the EMH goes some way towards answering the question that MP has set for itself.
 But first a warning: many MPers would agree with the gist of what I outlined in section 1. What follows is considerably more (indeed, much more) controversial. I don’t think that this is a problem, but it is a fact. I will not have the space (or, to be honest, the interest) to defend the line of argument that follows. I have written about this elsewhere and tried to argue that, for example, large parts of the rules of construal can be usefully reduced to I-merge. Many have disagreed. For my point here, this may not be that important. My aim here is to see how far this line of argument can go. Showing that it is also the best way to go is less important than showing that it is a plausible way to proceed.
 This MP project clearly gains inspiration from the unification of islands under Subjacency, still, in my opinion, one of the great leaps forward in syntactic understanding.
 Defining government so that it could do all required of it in GB was a lively activity in the 80s and 90s.
 At least if we adopt the Predicate Internal Subject Hypothesis which assumes that subjects of finite clauses move to Spec T from some lower predicate internal base position in which the nominal’s theta role is determined. For discussion see Hornstein et al. 2005.
 This abstracts away from the issue of assignment versus checking, a distinction I will ignore in what follows.
 If we assume that structures are labeled and that labels are heads then (10) has the structure in (10’) and we can say that the nominal merges with h0 in virtue of merging with a labeled projection of h. I personally believe that this is the right view, however, this is not the place to go into these matters.
(10’) [h Nominal [h h0…
 That case and movement should correlate is implicit in GB accounts as well. Movement in raising and passive constructions is “for” case. If movement is impossible, the case filter will be violated. However, the logic of the GB account based on government is that movement “for” case was the special case. The core case licensing configuration did not require it. Chomsky’s 1993 insight is that if one takes the movement fed licensing examples as indicative of the underlying configuration, a more unified theory of case licensing becomes possible. Later MP approaches to case returned to the earlier GB conception, but, in my view, at a significant cost. Later theory added to Merge an additional G operation, AGREE. AGREE is a long distance operation between a probe and a c-commanded goal. It is possible to unify case licensing configurations using AGREE. However, one loses the correlation between movement and scope unless further assumptions are pressed into service.
Why the shift from the earlier account? I am not sure. So far as I can tell, the first reason was Chomsky’s unhappiness with Spec-X0 relations (Chomsky took these to be suspect in a way that head-complement relations are not (I have no idea why)), which became more suspect still in a label free syntax. If labels are not syntactically active, then there isn’t a local relation between a moved nominal and a case licensing head in a Spec-head configuration. So, if you don’t like labels, you won’t like unifying case under Spec-head. Or, to put this more positively (I am after all a pretty positive fellow), if you are ok with labels (I love them) then you will find obvious attractions in the Spec-head theory.
 As is also the case for AGREE based conceptions, see previous note.
 This reprises the analysis in Lasnik and Saito 1991, which is in turn based on data from Postal. For a more elaborate discussion with further binding data see Hornstein et al. 2005, pp. 133ff.
 Analogous data for the internal argument obtain as well:
(i) John criticized the men during each other’s trials
I leave unpacking the derivations as an exercise.
 Usually via a dedicated diacritic feature (e.g. null case) but sometimes even less elegantly.
 I say “typically” for A-movement is not always so restricted and it appears that in these Gs neither is control. See Boeckx et al. chapter 4 for discussion.
 Again, space prohibits developing the argument in full detail. The interested reader should consult the Boeckx et al. presentation.
 My own view is that this is probably a reflex of case theory. See Haddad and Potsdam for a proposal along these lines.
 We will soon see that in some languages many copies can be retained, but let’s put this aside for the moment.
 As Haddad and Potsdam note, there are actually four possibilities: the higher copy is retained, the lower, either the higher or lower, or both. Haddad and Potsdam provide evidence that all four possibilities are in fact realized, a fact that provides further support for treating Control as living on I-Merge generated A-chains plus some deletion process for copies.
 For discussion, see Boeckx et al. and the review in Haddad and Potsdam.
 Observe, for example, that control is still a chain relation linking two theta positions, the embedded one being the subject of a non-finite clause.
 There are many other properties of control constructions that an I-Merge account explains (e.g. the Principle of Minimal Distance). For the curious, this is reviewed in Boeckx et al.
 This partly resurrects the old Lees-Klima theory of reflexivization, but without many of the problems. For discussion see Lidz and Idsardi 1998 and Hornstein 2001.
 See Boeckx et al. 2008 and references therein for discussion.
 This proposal also predicts that backwards reflexive constructions should be possible, and indeed, Polinsky and Potsdam (2002) argue that these exist in Tsez.