Thursday, September 28, 2017

Physics envy and the dream of an interpretable theory

I have long believed that physics envy is an excellent foundation for linguistic inquiry (see here). Why? Because physics is the paradigmatic science. Hence, if it is ok to do something there, it’s ok to do it anywhere else in the sciences (e.g. in the cog-neuro (CN) sciences, linguistics included), and if a suggested methodological precept fails for physics, then others (including CNers) have every right to treat it with disdain. Here’s a useful prophylactic against methodological sadists: try your methodological dicta out on physics before you encumber the rest of us with them. Down with methodological dualism!

However, my envy goes further: I have often looked to (popular) discussions of hot topics in physical theory to fuel my own speculations. And recently I ran across a stimulating, suggestive piece about how some are trying to rebuild quantum theory from the ground up using simple physical principles (QTFSPP) (here). The discussion is interesting for me in that it leads to a plausible suggestion for how to enrich minimalist practice. Let me elaborate.

The consensus opinion among physicists is that nobody really understands quantum mechanics (QM). Feynman is alleged to have said that anyone who claims to understand it, doesn’t. And though he appears not to have said exactly this (see here section 9), it's a widely shared sentiment. Nonetheless, QM (or the Standard Theory) is, apparently, the most empirically successful theory ever devised. So, we have a theory that works yet we have no real clarity as to why it works. Some (IMO, rightly) find this a challenge. In response they have decided to reconstitute QM on new foundations. Interestingly, what is described are efforts to recapture the main effects of QM within theories with more natural starting points/axioms. The aim, in other words, is reminiscent of the Minimalist Program (MP): construct theories that have the characteristic signature properties of QM but are grounded in more interpretable axioms. What’s this mean? First let’s take a peek at a couple of examples from the article and then return to MP.

A prominent contrast within physics is between QM and Relativity. The latter (the piece mentions special relativity) is based on two fundamental principles that are easy to understand and from which all the weird and wonderful effects of relativity follow. The two principles are: (1) the speed of light is constant and (2) the laws of physics are the same for two observers moving at constant speed relative to one another (or, no frame of reference is privileged when it comes to doing physics). Grant these two principles and the rest follows. As QTFSPP puts it: “Not only are the axioms simple, but we can see at once what they mean in physical terms” (my emphasis, NH) (5).

Standard theories of QM fail to be physically perspicuous, and the aim of reconstructionists is to remedy this by finding principles to ground QM that are as natural and physically transparent as those that Einstein found for special relativity. The proposals are fascinating. Here are a couple:

One theorist, Lucien Hardy, proposed focusing on “the probabilities that relate the possible states of a system with the chance of observing each state in a measurement” (6). The proposal consists of a set of probabilistic rules about “how systems can carry information and how they can be combined and interconverted” (7). The claim was that “the simplest possible theory to describe such systems is quantum mechanics, with all its characteristic phenomena such as wavelike interference and entanglement…” (8). Can any MPer fail to reverberate to the phrase “the simplest possible theory”? At any rate, on this approach, QM is fundamentally probabilistic, and the way probabilities mediate the conversion between states of the system is taken as the basis of the theory. I cannot say that I understand what this entails, but I think I get the general idea and how, if this were to work, it would serve to explain why QM has some of the odd properties it does.

Another reconstruction takes three basic principles to generate a theory of QM. Here’s QTFSPP quoting a physicist named Jacques Pienaar: “Loosely speaking, their principles state that information should be localized in space and time, that systems should be able to encode information about each other, and that every process should be in principle reversible, so that information is conserved.” Apparently, these assumptions, suitably formalized, lead to theories with “all the familiar quantum behaviors, such as superposition and entanglement.” Pienaar identifies what makes these axioms reasonable/interpretable: “They all pertain directly to the elements of human experience, namely what real experimenters ought to be able to do with systems in their laboratories…” So, specifying conditions on what experimenters can do in their labs leads to systems of data that look QMish. Again, the principles, if correct, rationalize the standard QM effects that we see. Good.

QTFSPP goes over other attempts to ground QM in interpretable axioms. Frankly, I can only follow this, if at all, impressionistically as the details are all quite above my capacities. However, I like the idea. I like the idea of looking for basic axioms that are interpretable (i.e. whose (physical) meaning we can immediately grasp) not merely compact. I want my starting points to make sense too. I want axioms that make sense computationally, whose meaning I can immediately grasp in computational terms. Why? Because, I think that our best theories have what Steven Weinberg described as a kind of inevitability and they have this in virtue of having interpretable foundations. Here’s a quote (see here and links provided there):

…there are explanations and explanations. We should not be satisfied with a theory that explains the Standard Model in terms of something complicated and arbitrary…To qualify as an explanation, a fundamental theory has to be simple - not necessarily a few short equations, but equations that are based on a simple physical principle…And the theory has to be compelling - it has to give us the feeling that it could scarcely be different from what it is.

Sensible interpretable axioms are the source of this compulsion. We want first principles that meet the Wheeler T-shirt criterion (after John Wheeler): they make sense and are simple enough to be stated “in one simple sentence that the non-sophisticate could understand” (or, more likely, a few simple sentences). So, with this in mind, what about fundamental starting points for MP accounts? What might these look like?

Well, first, they will not look like the principles of GB. IMO, these principles (more or less) “work,” but they are just too complicated and complex to be fundamental. That’s why GB lacks Weinberg’s inevitability. In fact, it takes little imagination to see how GB could “be different.” The central problem with GB principles is that they are ad hoc and have the shape they do precisely because the data happens to have the shape it does. Put differently, were the facts different we could rejigger the principles so that they would come to mirror those facts and not be in any other way the worse off for that. In this regard, GB shares the problem QTFSPP identifies with current QM: “It’s a complex framework, but it’s also an ad hoc patchwork, lacking any obvious physical interpretation or justification” (5).

So, GB can’t be fundamental because it is too much of a hodgepodge. But, as I noted, it works pretty well (IMO, very well actually, though no doubt others would disagree). This is precisely what makes the MP project to develop a simple natural theory with a specified kind of output (viz. a theory with the properties that GB describes) worthwhile.

Ok, given this kind of GB reconstruction project, what kinds of starting points would fit?  I am about to go out on a limb here (fortunately, the fall, when it happens, will not be from a great height!) and suggest a few that I find congenial.

First, the fundamental principle of grammar (FPG)[1]: There is no grammatical action at a distance. What this means is that for two expressions A and B to grammatically interact, they must form a unit. You can see where this is going, I bet: for A and B to G interact, they must Merge.[2]

Second, Merge is the simplest possible operation that unitizes expressions. One way of thinking of this is that all Merge does is make A and B, which are heretofore separate, into a unit. Negatively, this implies that it in no way changes A and B in making them a unit, and does nothing more than make them a unit (e.g. negatively, it imposes no order on A and B as this would be doing more than unitizing them). One can represent this formally as saying that Merge takes A,B and forms the set {A,B}, but this is not because Merge is a set forming operation, but because sets are the kinds of objects that do nothing more than unitize the objects that form the set. They don’t order the elements or change them in any way. Treating Merge (A,B) as creating leaves of a Calder Mobile would have the same effect and so we can say that Merge forms C-mobiles just as well as we can say that it forms sets. At any rate, it is plausible that Merge so conceived is indeed as simple a unitizing operation as can be imagined.
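A toy sketch may make the point concrete (the representation is my illustration, not a proposal from the post): sets, here Python frozensets, are a convenient stand-in for Merge's output precisely because they impose no order and leave their members untouched.

```python
def merge(a, b):
    """Unitize a and b: the output is just {a, b}, nothing more."""
    return frozenset([a, b])

unit = merge("the", "dog")

# No order is imposed: Merge(A,B) and Merge(B,A) yield the same unit
assert merge("the", "dog") == merge("dog", "the")
# The inputs are unchanged; they are simply members of the new unit
assert "the" in unit and "dog" in unit
```

Any representation with these two properties (a Calder mobile included) would do just as well; the set notation carries no further commitment.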

Third, Merge is closed in the domain of its application (i.e. its domain and range are the same). Note that this implies that the outputs of Merge must be analogous to lexical atoms in some sense given the ineluctable assumption that all Merges begin with lexical atoms. The problem is that unitized lexical atoms (the “set”-like outputs of Merge) are not themselves lexical atoms and so unless we say something more, Merge is not closed. So, how to close it? By mapping the Merged unit back to one of the elements Merged in composing it. So if we map {A,B} back to A or to B we will have closed the operation in the domain of the primitive atoms. Note that by doing this, we will, in effect, have formed an equivalence class of expressions with the modulus being the lexical atoms. Note, that this, in effect, gives us labels (oh nooooo!), or labeled units (aka, constituents) and endorses an endocentric view of labels. Indeed, closing Merge via labeling in effect creates equivalence classes of expressions centered on the lexical atoms (and more abstract classes if the atoms themselves form higher order classes). Interestingly (at least to me) so closing Merge allows for labeled objects of unbounded hierarchical complexity.[3]
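Closure via labeling can also be sketched in toy form (again my illustration, with invented names): each merged unit is mapped back to one of its parts, so every output is label-equivalent to some lexical atom and can feed further Merges without limit.

```python
ATOMS = {"the", "dog", "saw"}   # a toy lexicon

def label_of(x):
    # An atom is its own label; a merged unit carries the label
    # assigned to it when it was built.
    return x if isinstance(x, str) else x[0]

def merge(a, b, head):
    """Unitize a and b, and close the operation by labeling the unit
    with the label of one of its parts (the 'head')."""
    assert head in (a, b)
    return (label_of(head), frozenset([a, b]))

# Unbounded hierarchy: each output is in the equivalence class of an
# atom, so it is a licit input to further Merges.
dp = merge("the", "dog", head="dog")   # a unit in the class of 'dog'
vp = merge("saw", dp, head="saw")      # a unit in the class of 'saw'
assert label_of(dp) in ATOMS and label_of(vp) in ATOMS
```

The choice of `head` is what makes labeling endocentric: the unit never gets a label from outside its own parts.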

These three principles seem computationally natural. The first imposes a kind of strict locality condition on G interactions. E and I merge adhere to it (and do so strictly given labels). Merge is a simple, very simple, combination operation and closure is a nice natural property for formal systems of (arbitrarily complex) “equations” to have. That they combine to yield unbounded hierarchically structured objects of the right kind (I’ve discussed this before, see here and here) is good as this is what we have been aiming for. Are the principles natural and simple? I think so (at least from a kind of natural computation point of view), but I would say that, wouldn’t I? At any rate, here’s a stab at what interpretable axioms might look like. I doubt that they are unique, but I don’t really care if they aren’t. The goal is to add interpretability to the demands we make on theory, not to insist that there is only one way to understand things.

Nor do we have to stop here. Other simple computational principles include things like the following: (i) shorter dependencies are preferred to longer dependencies (minimality?), (ii) bounded computation is preferred to unbounded computation (phases?), (iii) All features are created equal (the way you discharge/check one is the way you discharge/check all). The idea is then to see how much you get starting from these simple and transparent and computationally natural first principles. If one could derive GBish FLs from this then it would, IMO, go some way towards providing a sense that the way FL is constructed and its myriad apparent complexities are not complexities at all but the unfolding of a simple system adhering to natural computational strictures (snowflakes anyone?). That, at least, is the dream.

I will end here. I am still in the middle of pleasant reverie, having mesmerized myself by this picture. I doubt that others will be as enthralled, but that is not the real point. I think that looking for general interpretable principles on which to found grammatical theory makes sense and that it should be part of any theoretical project. I think that trying to derive the “laws” of GB is the right kind of empirical target. Physics envy prompts this kind of search. Another good reason, IMO, to cultivate it.



[1] I could have said, the central dogma of syntax, but refrained. I have used FPG in talks to great (and hilarious) effect.
[2] Note, that this has the pleasant effect of making AGREE (and probe-goal architectures in general) illicit G operations. Good!
[3] This is not the place to go into this, but the analogy to clock arithmetic is useful. Here too, via the notion of equivalence classes, it is possible to extend operations defined for some finite base of expressions (1-12) to any number. I would love to be able to say that this is the only feasible way of closing a finite domain, but I doubt that this is so. The other suspects, however, are clearly linguistically untenable (e.g. mapping any unit to a constant, mapping any unit randomly to some other atom). Maybe there is a nice principle (statable in one simple sentence) that would rule these out.
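The clock-arithmetic analogy in this footnote is just modular arithmetic, easily sketched:

```python
def clock(n):
    """Map any integer to its representative in the finite base (mod 12)."""
    return n % 12

# 10 o'clock plus 5 hours is 3 o'clock: the sum leaves the finite base,
# but its equivalence class maps it back in.
assert clock(10 + 5) == 3
# 27, 15 and 3 all belong to the same equivalence class mod 12.
assert clock(27) == clock(15) == 3
```

Operations defined on the base thus extend to the whole infinite domain, just as labeling extends lexical classes to arbitrarily complex merged units.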

Thursday, September 21, 2017

Chemical computing

One of the potentially most far reaching and revolutionary ideas in cog-neuro is the suggestion that brain computations are largely intra-neuronal rather than inter-neuronal. Randy Gallistel has been pushing this idea for the last several years, and more and more evidence is piling up for its feasibility, some of which I have reviewed/mentioned in FoL. Here is another brick in that wall. The authors show how to build "synthetic gene circuits" made up entirely of RNA that form biological "ribocomputing devices." This is a step to "develop biological circuit design strategies that will enable cellular function to be programmed with the same ease that we program electronic computers." It is obvious to me that should this succeed, it will provide a proof of concept that Gallistel has been on exactly the right track. If correct, it will turn out that brains compute roughly like machines do, and if this happens all the hocus pocus surrounding neural nets and their deep unfathomable properties will, poof, disappear, recalled as just another wrong turn in the history of science. Personally, I cannot wait.

Wednesday, September 20, 2017

The wildly successful minimalist program III

Here is the last part of the paper on minimalism I have written for a volume comparing various approaches. As I mentioned before, I am skeptical that there are various approaches and that, if there are, minimalism is one of them. I am including the third part mainly because I was too lazy to write anything else this week and because those who care about these issues might find it useful (maybe).

In two earlier posts (here and here) I outlined an argument that revolves around the following point: MP forces a concentration on properties of FL/UG and forces a LING (vs LANG) conception of the study of language. Prior to MP, LING and LANG interpretations of the study of language could happily co-exist, each readily and happily interpreting the results of the other’s perspective without threatening one’s favored point of view.  I then reviewed how it is that the Merge Hypothesis (MH) advances the goal of explaining some of the basic features of FL/UG by showing them to be by-products of a simple, fundamental recursive procedure. In my view, this result is breathtaking and, given the assumption that earlier GG was empirically roughly correct, it provides an excellent model of how to realize MP ambitions, thereby demonstrating that the big question that MP poses for itself can be fruitfully addressed and answered.

In this post, I want to push this line of argument further, much further. As the old saying goes, you can never know how far you can go until you go a wee bit further. The idea is to entertain an Extended Merge Hypothesis (EMH) which takes non local dependencies to “live on” Merge generated structures (aka, chains). The aim is to unify all non-local dependencies and treat them as chain dependencies. If doable, this will have the effect of unifying several otherwise independent modules of the grammar and showing their properties to effectively reduce to chain properties. As chain properties are just Merge properties, this can be understood as a first step towards reducing all the laws of GB to products of Merge.

The EMH is much less well grounded than the MH, IMO. It includes a hobbyhorse of mine as a special case: the movement theory of construal relations (including both control and binding). I sketch how this might look. Do I believe it? Well, who cares really?  That’s a question about personal psychology. Do I think that this is the right way for an MPer to proceed? Yup. If the goal is to explain the properties of FL/UG in the most efficient manner possible, then some extension of MH is the obvious way to go. And, IMO, one should never do the less obvious when the obvious is staring you in the face. So, it is worth doing. Here goes.

The Extended Merge Hypothesis: Explaining more features of FL/UG

There are proposals in the MP literature that push the MH line of argument harder still. Here’s what I mean. MH unifies structure building and movement and in the process explains many central properties of FL/UG. It accomplishes this by reducing phrase building and movement to instances of Merge (i.e. E and I-Merge respectively). We can push this reductive/unificational approach more aggressively by reducing other kinds of dependencies to instances of E or I-Merge. More specifically, if we take seriously the MP proposal that Merge is the unique fundamental combinatoric operation that FL/UG affords, then the strongest minimalist hypothesis is that every grammatical dependency must be mediated by some instance of Merge. Call this the “Extended Merge Hypothesis” (EMH). In what follows, I would like to review some of the MP literature to see what properties EMH might enjoy. My aim is to suggest that this radical unification has properties that nicely track some fundamental features of FL/UG. If this is correct, then it suggests that relentlessly expanding the reach of Merge beyond phrase structure and movement to include construal and case/agreement dependencies has interesting potential payoffs for those interested in MP questions. Once again, it will pay to begin with GB as a jumping off point.[1]

GB is a strongly modular theory in the sense that it describes FL/UG as containing many different kinds of operations and principles. Thus, in GB, we distinguish construal rules like Binding and Control, from movement rules like Wh-movement and Raising. We classify case relations as different from movement dependencies and both from theta forming dependencies. The primitives are different and, more importantly, the operations and principles that condition them are different. The internal modularity of GB style FL/UGs complicates them. This is, theoretically speaking, unfortunate, especially in light of the fact that the different dependencies that the modules specify share many properties in common. That they do so is something an MP account (indeed, any account) would like to explain. EMH proposes that all the different dependencies in GB’s various modules are actually only apparently distinct. In reality, they are all different instances of chains formed by I-merge.[2] To put this slightly differently, all of the non-local dependencies GB specifies “live on” chains formed by I-Merge. Let’s consider some examples.

(i)             Case Theory

Chomsky (1993) re-analyzes case dependencies as movement mediated. The argument is in two steps.

The first is a critical observation: the GB theory of case is contrived in that it relies on a very convoluted and unnatural notion of government. Furthermore, the contrived nature of the government account reflects a core assumption: accusative case on the internal argument of a transitive verb (sisterhood to a case assigning head) reflects the core configuration for case licensing. Extending sisterhood so that it covers what we see in nominative case and in ECM requires “generalizing” sisterhood to government, with the resulting government configuration itself covering three very distinct looking configurations (see (9)).  (9a) is the configuration for accusative case, (9b) for nominative and (9c) for ECM. It is possible to define a technical notion that treats these three configurations as all instances of a common relation (viz. government), but the resulting definition is very baroque.[3] The obtuseness of the resulting definition argues against treating (9a) as the core case precisely because the resulting unified theory rests on a gerrymandered (and hence theoretically unsatisfying) conception of government.

            (9)       a. [ V nominal]           
                        b. [ Nominal [ T0-finite…
                        c. [V [ Nominal [ T0-non-finite…
                       
The second step in the argument is positive: case theory can be considerably streamlined and unified if we take the core instance of case assignment to be exemplified by what we find with nominatives. If this is the core case configuration, then case is effectively a spec-head relation between a case marked nominal and a head that licenses it. Generalizing nominative case configurations to cover simple accusative objects and ECM subjects requires treating case as a product of movement (as with nominatives).[4] Thus, simplifying the case module rests on analyzing case dependencies as products of I-merge (i.e. as non-local relations between a case assigning head and a nominal that has (A-)moved to the specifier of this head).[5] The canonical case configuration is (10), with h0 being a case assigning head and the nominal being the head of an I-merge generated A-chain.[6]

            (10)     [Nominal [ h0

There is some interesting empirical evidence for this view. First, it predicts that we should find a correlation between case and movability. More specifically, if some position resists movement, then this should have an impact on case.  And this seems to be correct. Consider the paradigm in (11):

(11)     a. John believes him to be tall
b. *John believes him is tall
c. John was believed t to be tall
d. *John was believed t is tall

Just as A-movement/raising from the subject position of a finite clause is prohibited, so too is accusative case. Why? Because accusative case requires A-movement of him in (11b) to the case head that sits above believe and this kind of movement is prohibited, as (11d) illustrates.[7]

There is a second prediction. Case should condition binding. On the current proposal, movement feeds case. As movement broadens an expression’s scope, it thereby increases its binding potential. This second prediction is particularly interesting for it ties together two features of the grammar that GB approaches to accusative case keep separate. Here’s what I mean.

With regard to nominative case assignment, it is well-known that movement “for” case (e.g. raising to subject) can expand an expression’s binding potential. John can bind himself after raising to subject in (12b) but not without moving to the matrix Spec T (12a). Thus some instances of movement for case reasons can feed binding.

            (12)     a. *It seems to himself1 [(that) John1 is happy]
                        b. John1 seems to himself1 [ t1 to be happy]

However, the standard GB analysis of case for accusatives has the V assign case to the nominal object in its base position.[8] Thus, whereas case to nominative subjects is a Spec-head relation, case to canonical objects is under sisterhood. If, however, we unify nominative and accusative case and assimilate the latter to what we find with nominatives, then movement will mediate accusative case too. If this involves movement to some position above the external argument’s base position (recall, we are assuming PISH) then accusative case is being assigned in a configuration something like (13). Different epochs of MP have treated this VP external Spec position differently, but the technical details don’t really matter. What does matter is that accusative case is not assigned in the nominal’s base position, but rather in a higher position at the edge of the VP complex. So conceived, accusative case, like nominative, is expected to expand a nominal’s binding domain.

            (13)     [ Nominal1 [V external argument [V V…

There is evidence supporting this correlation between case value and scope domain.[9] Here is some comparative data that illustrates the point. (14) shows that it is possible for an embedded ECM subject to bind an anaphor in a matrix adjunct, whereas a nominative embedded subject cannot.

(14) a. The lawyer proved [the men1 to be guilty] during each other1’s trials
  b. The lawyer proved [the men1 were guilty] during each other1’s trials

(14a) has a sensible reading in which the during phrase modifies the matrix predicate proved. This reading is unavailable in (14b), the only reading being the silly one in which the during phrase modifies were guilty. This is expected if licensing accusative case on the ECM subject requires moving it to the edge of the higher matrix VP (as in (13)). In contrast, licensing nominative leaves the men in the embedded spec T and hence leaves the matrix during phrase outside its c-command domain prohibiting binding of the reciprocal. Thus we see that case value and scope domain co-vary as the MP story leads us to expect.[10]

In sum, unifying case under I-merge rather than government leads to a nicer looking theory and makes novel predictions concerning the interaction of case and binding.

(ii)           Control

Consider next (obligatory) complement control, as exemplified in (15):

            (15)     a. John1 hopes [PRO1 to go to grad school]
                        b. John persuaded Mary1 [PRO1 to go to grad school]

Here are two salient properties of these constructions: (i) PRO is restricted to non-finite subject positions and (ii) PRO requires a local c-commanding antecedent. There are GB proposals to account for the first property in terms of binding theory (the so-called “PRO theorem”), but by the early 1990s its theoretical inadequacies became apparent, and PRO was thereafter restricted to the subjects of non-finite clauses by stipulation.[11] As regards selecting the appropriate antecedent, this has remained the province of a bespoke control module, with antecedent selection traced to stipulated properties of the embedding predicate (i.e. the controller is a lexical property of hope and persuade). I believe that it is fair to say that both parts of GB control theory contain a fair bit of ad hocery.

Here’s where MP comes to the rescue. A unified more principled account is available by treating construal relations as “living” on chains (in the case of control, A-chains) generated by I-merge. On this view, the actual structure of the sentences in (15) is provided in (16) with the controller being the head of an A-chain with links/copies in multiple theta positions (annotated below).

            (16)     a. [ John [ T [JohnQ [ hopes [ John [to [ JohnQ [ go to grad school]]]]]]]]
b. [John T [ John [persuade [MaryQ [ Mary to [ MaryQ [go to grad school]]]]]]]

The unification provides a straightforward account for both facts above: where PRO is found and what its antecedent must be. PROs distribute like links in A-chains. Antecedents for PRO are heads of the chains that contain them. Thus, PRO can appear in positions from which A-movement is licit. Antecedents will be the heads of such licit chains. Observe that this implies that PRO has all the properties of a GB A-trace. Thus it will be part of a chain with proximate links, these links will c-command one another and will be local in the way that links in A-chains are local. In other words, a movement theory of control derives the features of control constructions noted above.

We can go further: if we assume that Merge is the only way to establish grammatical dependencies, then control configurations must have such properties. If PRO is a “trace” then of course it requires an antecedent. If it is a trace, then of course the antecedent must c-command it. If it is an A-trace, then of course the antecedent must be local. And if it is an A-trace then we can reduce the fact that it (typically)[12] appears in the subject position of non-finite clauses to the fact that A-movement is also so restricted:

(17)     a. John seems t to like Mary
b. *John seems t will like Mary
c. John expects PRO to like Mary
d. *John expects PRO will like Mary


In sum, if we reduce control dependencies to A-chain dependencies and treat control structures as generated via I-merge it is possible to derive some of its core distributional and interpretive properties.[13] Indeed, I would go further, much further.

First, at this moment, only this approach to control offers a possible explanation for the properties of control constructions. All other approaches code the relevant data; they do not, and cannot, explain them. And there is a principled reason for this. All other theories on the market treat PRO as a primitive lexical element, rather than the residue of grammatical operations, and hand-pack the properties of control constructions into the feature specifications of this primitive lexical element. The analysis amounts to showing that checking these features correlates with tracking the relevant properties. The source of the features, however, is grammatically exogenous and arbitrary. The features posited are exactly those that the facts require, thereby allowing for other features were the facts different. And this robs these accounts of any explanatory potential. From a minimalist perspective, one from which the question of interest is why FL/UG has the properties it appears to have and not others, this treatment of control is nugatory.

Second, the movement approach to control has a very interesting empirical consequence in the context of standard MP theories.  Recall that the copy theory is a consequence of Merge based accounts of movement (see here section 1). If control is the product of I-merge then control chains, like other A-chains, have copies as links. If so, part of any G will be procedures for phonetically “deleting” all but one of the copies/occurrences. So the reason that PRO is phonetically null is that copies in A-chains are generally phonetically null.[14] Importantly, whatever the process that “deletes” copies/occurrences will apply uniformly to “A-traces” and to PRO as these are the same kinds of things.

There is well-known evidence that this is correct. Consider contraction effects like those in (18). Contraction is licensed in (18b) across an A-trace and in (18a) across a PRO, but not in (18c) across an A’-trace. This supports the claim that PRO is the residue of A-movement.

(18) a. They want PRO to kiss Mary → They wanna kiss Mary
b. They used t to live in the attic → They usta live in the attic
c. Who do they want t to vanish from the party → *Who do they wanna vanish from the party.

The I-Merge analysis of control also predicts a possibility that PRO based accounts cannot tolerate. An I-Merge based account of displacement needs a theory of copy/occurrence pronunciation to account for the fact that most copies/occurrences in many languages are phonetically null. So, as part of any I-Merge theory of displacement, we need a theory of copy deletion. A particularly simple one allows higher copies as well as lower ones to delete, at least in principle.[15] This opens up the following possibility: there are control configurations in which “PRO” c-commands its antecedent.[16] Thus, the movement theory of control in conjunction with standard assumptions concerning deletion in the copy theory of movement allows for the possibility of control constructions which apparently violate Principle C. And these appear to exist. It is possible to find languages in which the controllee c-commands its controller.[17] In other words, configurations like (19b) with the standard control interpretation are perfectly fine and have the interpretation of control sentences like (19a). Both kinds of sentences are derivable given assumptions about I-merge and copy deletion. They derive from the common underlying (19c), with either the bottom copy removed (19a) or the top (19b). On this view, the classical control configuration is simply a limiting case of a more general set of possibilities that, but for phonetic expression, have the same underlying properties.[18]

(19)     a. DP1 V [PRO1 VP]
b. PRO1 V [DP1 VP]
c. DP1 V [DP1 VP]

This kind of data argues against classical PRO based accounts (decisively so, in my opinion), while being straightforwardly compatible with movement approaches to control based on I-merge.

One last point and I will move on. Given standard MP assumptions, something like the movement theory of control is a virtual inevitability once one dispenses with PRO. MP theories have dispensed with D-structure as a level of representation, and with this the prohibition against a DP moving into multiple theta positions. Thus, there is nothing to prevent DPs from forming control chains via I-merge given the barest MP assumptions. In this sense, control as movement is predicted as an MP inevitability. It is possible to block this implication, but only by invoking additional ad hoc assumptions. Not only is control as movement compatible with MP, it is what we will find unless we try to specifically avoid it. That we find it, argues for the reduction of control to I-merge.[19]

(iii)          Principle A effects

The same logic reduces principle A effects to I-merge. It’s been a staple of grammatical theory since LGB that A-traces have many of the signature properties of reflexives, as illustrated by the following paradigm:

(20) a. *John seems [t is intelligent]
b. *John believes [himself is intelligent]
c. John seems [to be intelligent]
d. John believes [himself to be intelligent]
e. *John seems it was told t that Sue is intelligent
f. *John wants Mary to tell himself that Sue is intelligent

LGB accounted for this common pattern by categorizing A-traces as anaphors subject to principle A. Thus, for example, in LGB-land the reason that A-movement is always upwards, local and to a c-commanding position is that otherwise the traces left by movement are unbound and violate principle A. What’s important for current purposes is to observe that LGB unifies A-anaphoric binding and movement. The current proposal that all grammatical dependencies are mediated by Merge has the LGB unification as a special case if we assume that A-anaphors are simply the surface reflex of an underlying A-chain. In other words, the data in (20) follow directly if reflexives “live on” A-chains. Given standard assumptions concerning I-merge, this could be theoretically accommodated if “copies” can convert to reflexives in certain configurations (as in (21)).[20]

(21) [John believes [John (→ himself) to be intelligent]]

Like cases of control, reflexives are simply the morphological residue of I-merge generated occurrences/copies. Put another way, reflexives are the inessential morphological detritus of an underlying process of reflexivization, the latter simply being the formation of an A-chain involving multiple theta links under I-Merge.

If correct, this makes a prediction: reflexives per se are inessential for reflexivization. There is evidence in favor of this prediction. There are languages in which copies can stand in place of reflexive morphemes in reflexive constructions. Thus, structures like (22a) have reflexive interpretations, as witnessed by the fact that they license sloppy identity under ellipsis (22b).[21]

(22)     a. John saw John (=John saw himself)
b. John saw John and Mary too (= and Mary saw Mary)

Note that, given standard assumptions regarding GB binding theory, examples like (22) violate principle C.

We also have apparent violations of principle B, where a pronoun locally c-commands and antecedes another pronoun (structure in (23)):

(23) Pronoun likes pronoun and Mike too (= and Mike likes Mike)

These puzzles disappear once these sentences are seen as the surface manifestations of reflexivization chains under I-merge. The names and pronouns in object position in (22) and (23) are just pronounced occurrences/copies. There is a strict identity condition on the copies in copy reflexive constructions, again something that an I-merge view of these constructions would lead one to expect. Interestingly, we find similar copies possible in “control” structures:

(24) a. Mike wants Mike to eat
 b. The priest persuaded Mike Mike to go to school

This is to be expected if indeed both reflexive and control constructions are mediated by I-merge, as proposed here.[22]

(iv)          Conclusion

Let me sum up. Section 1 showed that we can gain explanatory leverage on several interesting features of FL/UG if we assume that Merge is the fundamental operation for combining lexical atoms into larger hierarchical structures. In this section I argued that one can get leverage on other fundamental properties if we assume that all grammatical dependencies are mediated by Merge. This implies that non-local dependencies are products of I-merge. This section presented evidence that case dependencies, control and reflexivization “live on” A-chains formed by I-merge. I have shown that this proposal covers much of the conventional data in a straightforward way and that it is compatible with data that goes against the conventional grain (e.g. backwards control, apparent violations of principle B and C binding effects). Moreover, all of this follows from two very simple assumptions: (i) that Merge is the basic combinatoric operation FL/UG makes available and (ii) that all grammatical dependencies are mediated via Merge. We have seen that the second assumption underwrites I-merge analyses of case, control and reflexivization, which in turn explain some of the key features of case, control and reflexivization (e.g. case impacts scope, PRO occupies the subject position of non-finite clauses and requires a local c-commanding antecedent, and languages that appear to violate conditions B and C are not really doing so). Thus, the Merge hypothesis so extended resolves some apparent paradoxes, accounts for some novel data, covers the standard well-trodden empirical ground and (and this is the key part) explains why the FL/UG properties GB identified hold in these cases. The Extended Merge Hypothesis (EMH) explains why these constructions have the general properties they do by reducing them to reflexes of the I-merge generated A-chains they live on. If this is on the right track, then the EMH goes some way towards answering the question that MP has set for itself.




[1] But first a warning: many MPers would agree with the gist of what I outlined in section 1. What follows is considerably more (indeed, much more) controversial. I don’t think that this is a problem, but it is a fact. I will not have the space (or, to be honest, the interest) to defend the line of argument that follows. I have written about this elsewhere and tried to argue that, for example, large parts of the rules of construal can be usefully reduced to I-merge. Many have disagreed. For my point here, this may not be that important. My aim here is to see how far this line of argument can go; showing that it is also the best way to go is less important than showing that it is a plausible way to proceed.
[2] This MP project clearly gains inspiration from the unification of islands under Subjacency, still, in my opinion, one of the great leaps forward in syntactic understanding.
[3] Defining government so that it could do all required of it in GB was a lively activity in the 80s and 90s.
[4] At least if we adopt the Predicate Internal Subject Hypothesis, which assumes that subjects of finite clauses move to Spec T from some lower predicate internal base position in which the nominal’s theta role is determined. For discussion see Hornstein et al. 2005.
[5] This abstracts away from the issue of assignment versus checking, a distinction I will ignore in what follows.
[6] If we assume that structures are labeled and that labels are heads, then (10) has the structure in (10’) and we can say that the nominal merges with h0 in virtue of merging with a labeled projection of h. I personally believe that this is the right view; however, this is not the place to go into these matters.
            (10’)    [h Nominal [h h0]]
[7] That case and movement should correlate is implicit in GB accounts as well. Movement in raising and passive constructions is “for” case. If movement is impossible, the case filter will be violated. However, the logic of the GB account based on government is that movement “for” case was the special case. The core case licensing configuration did not require it. Chomsky’s 1993 insight was that if one takes the movement-fed licensing examples as indicative of the underlying configuration, a more unified theory of case licensing is possible. Later MP approaches to case returned to the earlier GB conception, but, in my view, at a significant cost. Later theory added to Merge an additional G operation, AGREE. AGREE is a long distance operation between a probe and a c-commanded goal. It is possible to unify case licensing configurations using AGREE. However, one loses the correlation between movement and scope unless further assumptions are pressed into service.
Why the shift from the earlier account? I am not sure. So far as I can tell, the first reason was Chomsky’s unhappiness with Spec-X0 relations (Chomsky took these to be suspect in a way that head-complement relations are not (I have no idea why)), and they became even more suspect in a label-free syntax. If labels are not syntactically active, then there isn’t a local relation between a moved nominal and a case licensing head in a Spec-head configuration. So, if you don’t like labels, you won’t like unifying case under Spec-head. Or, to put this more positively (I am, after all, a pretty positive fellow), if you are ok with labels (I love them) then you will find obvious attractions in the Spec-head theory.
[8] As is also the case for AGREE based conceptions, see previous note.
[9] This reprises the analysis in Lasnik and Saito 1991, which is in turn based on data from Postal. For a more elaborate discussion with further binding data see Hornstein et al. 2005, pp. 133ff.
[10] Analogous data for the internal argument obtain as well:
(i)             John criticized the men during each other’s trials
I leave unpacking the derivations as an exercise.
[11] Usually via a dedicated diacritic feature (e.g. null case) but sometimes even less elegantly.
[12] I say “typically” for A-movement is not always so restricted, and it appears that in these Gs neither is control. See Boeckx et al., chapter 4, for discussion.
[13] Again, space prohibits developing the argument in full detail. The interested reader should consult the Boeckx et al. presentation.
[14] My own view is that this is probably a reflex of case theory. See Haddad and Potsdam for a proposal along these lines.
[15] We will soon see that in some languages many copies can be retained, but let’s put this aside for the moment.
[16] As Haddad and Potsdam note, there are actually four possibilities: the higher copy is retained, the lower, either, or both. Haddad and Potsdam provide evidence that all four possibilities are in fact realized, a fact that provides further support for treating control as living on I-Merge generated A-chains, plus some deletion process for copies.
[17] For discussion, see Boeckx et al. and the review in Haddad and Potsdam.
[18] Observe, for example, that control is still a chain relation linking two theta positions, the embedded one being the subject of a non-finite clause.
[19] There are many other properties of control constructions that an I-Merge account explains (e.g. the Principle of Minimal Distance). For the curious, this is reviewed in Boeckx et al.
[20] This partly resurrects the old Lees-Klima theory of reflexivization, but without many of the problems. For discussion see Lidz and Idsardi 1998 and Hornstein 2001.
[21] See Boeckx et al. 2008 and references therein for discussion.
[22] This proposal also predicts that backwards reflexive constructions should be possible, and indeed, Polinsky and Potsdam (2002) argue that these exist in Tsez.