
Showing posts with label Constructions. Show all posts

Friday, December 2, 2016

What's a minimalist analysis?

The proliferation of handbooks on linguistics identifies a gap in the field. There are so many now that there is an obvious need for a handbook of handbooks consisting of papers that are summaries of the various handbook summaries. And once we take this first tiny recursive step, as you all know, sky’s the limit.

You may be wondering why this thought crossed my mind. Well, it’s because I’ve been reading some handbook papers recently and many of those that take a historical trajectory through the material often have a penultimate section (before a rousing summary conclusion) with the latest minimalist take on the relevant subject matter. So, we go through the Standard Theory version of X, the Extended Standard Theory version, the GB version and finally an early minimalist and late minimalist version of X. This has naturally led me to think about the following question: what makes an analysis minimalist? When is an analysis minimalist and when not? And why should one care?

Before starting let me immediately caveat this. Being true is the greatest virtue an analysis can have. And being minimalist does not imply that an analysis is true. So not being minimalist is not in itself necessarily a criticism of any given proposal. Or at least not a decisive one. However, it is, IMO, a legit question to ask of a given proposal whether and how it is minimalist. Why? Well because I believe that Darwin’s Problem (and the simplicity metrics it favors) is well-posed (albeit fuzzy in places) and therefore that proposals dressed in assumptions that successfully address it gain empirical credibility. So, being minimalist is a virtue and suggestive of truth, even if not its guarantor.[1]

Perhaps I should add that I don’t think that anything guarantees truth in the empirical sciences and that I also tend to think that truth is the kind of virtue that one only gains slantwise. What I mean by this is that it is the kind of goal one attains indirectly rather than head on. True accounts are ones that economically cover reasonable data in interesting ways, shed light on fundamental questions and open up new avenues for further research.[2] If a story does all of that pretty well then we conclude it is true (or well on its way to it). In this way truth is to theory what happiness is to life plans. If you aim for it directly, you are unlikely to get it. Sort of like trying to fall asleep. As insomniacs will tell you, that doesn’t work.

That out of the way, what are the signs of a minimalist analysis (MA)? We can identify various grades of minimalist commitment.

The shallowest is technological minimalism. On this conception an MA is minimalist because it expresses its findings in terms of ‘I-merge’ rather than ‘move,’ ‘phases’ rather than ‘bounding nodes’/‘barriers,’ or ‘Agree’ rather than ‘binding.’ There is nothing wrong with this. But depending on the details there need not be much that is distinctively minimalist here. So, for example, there are versions of phase theory (so far as I can tell, most versions) that are isomorphic to previous GB theories of subjacency, modulo the addition of v as a bounding node (though see Barriers). The second version of the PIC (i.e. where Spell Out is delayed to the next phase) is virtually identical to 1-subjacency and the number of available phase edges is identical to the specification of “escape hatches.”

Similarly for many Agree based theories of anaphora and/or control. In place of local coindexing we express the identical dependency in terms of Agree in probe/goal configurations (antecedents as probes, anaphors as goals)[3] subject to some conception of locality. There are differences, of course, but largely the analyses inter-translate and the novel nomenclature serves to mask the proposed account's continuity with prior analyses. In other words, what makes such analyses minimalist is less a grounding in basic features of the minimalist program than a technical isomorphism between current and earlier technology. Or, to put this another way, when successful, such stories tell us that our earlier GB accounts were no less minimalist than our contemporary ones. Or, to put this yet another way, our current understanding is no less adequate than our earlier understanding (i.e. we've lost nothing by going minimalist). This is nice to know, but given that we thought that GB left Darwin's Problem (DP) relatively intact (this being the main original motivation for going Minimalist, i.e. beyond explanatory adequacy), analyses that are effectively the same as earlier GB analyses likely leave DP in the same opaque state. Does this mean that translating earlier proposals into current idiom is useless? No. But such translations often make a modest contribution to the program as a whole given the suppleness of current technology.

There is a second more interesting kind of MA. It starts from one of the main research projects that minimalism motivates. Let’s call this “reductive” or “unificational minimalism” (UM). Here’s what I mean.

The minimalist program (MP) starts from the observation that FL is a fairly recent cognitive novelty and thus what is linguistically proprietary is likely to be quite meager. This suggests that most of FL is cognitively or computationally general, with only a small linguistically specific residue. This in turn suggests a research program given a GB backdrop (see here for discussion). Take the GB theory of FL/UG to provide a decent effective theory (i.e. descriptively pretty good but not fundamental) and try to find a more fundamental one that has these GB principles as consequences.[4] This conception provides a two-pronged research program: (i) eliminate the internal modularity of GB (i.e. show that the various GB modules are all instances of the same principles and operations (see here)) and (ii) show that, of the operations and principles that are required to effect the unification in (i), all save one are cognitively and/or computationally generic. If we can successfully realize this research project then we have a potential answer to DP: FL arose with the adventitious addition of the linguistically proprietary operation/principle to the cognitive/computational apparatus the species antecedently had.

Those are the main contours of the research program. UM concentrates on (i) and aims to reduce the different principles and operations within FL to the absolute minimum. It does this by proposing to unify domains that appear disparate on the surface and by reducing G options to an absolute minimum.[5] A reasonable heuristic for this kind of MA is the idea that Gs never do things in more than one way (e.g. there are not two ways (viz. via matching or raising) to form relative clauses). This is not to deny that different surface patterns obtain, only that they are not the products of distinct operations.

Let me put this another way: UM takes the GB disavowal of constructions to the limit. GB eschewed constructions in that it eliminated rules like Relativization and Topicalization, seeing both as instances of movement. However, it did not fully eliminate constructions for it proposed very different basic operations for (apparently) different kinds of dependencies. Thus, GB distinguishes movement from construal and binding from control and case assignment from theta checking. In fact, each of the modules is defined in terms of proprietary primitives, operations and constraints. This is to treat the modules as constructions. One way of understanding UM is that it is radically anti-constructivist and recognizes that all G dependencies are effected in the same way. There is, grammatically speaking, only ever one road to Rome.

Some of the central results of MP are of this ilk. So, for example, Chomsky's conception of Merge unifies phrase structure theory and movement theory. The theory of case assignment in the Black Book unifies case theory and movement theory (case assignment being just a specific reflex of movement) in much the way that move alpha unifies question formation, relativization, topicalization etc. The movement theory of control and binding unifies both modules with movement. The overall picture then is one in which binding, structure building, case licensing, movement, and control "reduce" to a single computational basis. There aren't movement rules versus phrase structure rules versus binding rules versus control rules versus case assignment rules. Rather these are all different reflexes of a single Merge effected dependency with different features being licensed via the same operation. It is the logic of On wh movement writ large.

There are other examples of the same "less is more" logic: the elimination of D-structure and S-structure in the Black Book, Sportiche's recent proposals to unify promotion and matching analyses of relativization, unifying reconstruction and movement via the copy theory of movement (in turn based on a set theoretic conception of Merge), Nunes's theory of parasitic gaps, and Sportiche's proposed elimination of late merger, to name five. All of these are MAs in the specific sense that they aim to show that rich empirical coverage is compatible with a reduced inventory of basic operations and principles and that the architecture of FL as envisioned in GB can be simplified and unified, thereby advancing the idea that a (one!) small change to the cognitive economy of our ancestors could have led to the emergence of an FL like the one that we have good (GB) evidence to think is ours. Thus, MAs of the UM variety clearly provide potential answers to the core minimalist DP question and hence deserve their 'minimalist' modifier.

The minimalist ambitions can be greater still. MAs have two related yet distinct goals. The first is to show that svelter Gs do no worse than the more complex ones that they replace (or at least don't do much worse).[6] The second is to show that they do better. Chomsky contrasted these in chapter three of the Black Book and provided examples illustrating how doing more with less might be possible. I would like to mention a few by way of illustration, after a brief running start.

Chomsky made two methodological observations. First, if a svelter account does (nearly) as well empirically as a grosser one then it "wins" given MP desiderata. We noted why this was so above regarding DP, but really nobody considers Chomsky's scoring controversial given that it is a lead-footed application of Ockham. Fewer assumptions are always better than more for the simple reason that for a given empirical payoff K an explanation based on N assumptions leaves each assumption with greater empirical justification than one based on N+1 assumptions. Of course, things are hardly ever this clean, but often they are clean enough and the principle is not really contestable.[7]

However, Chomsky's point extends this reasoning beyond simple assumption counting. For MP it's not only the number of assumptions that matters but their pedigree. Here's what I mean. Let's distinguish FL from UG. Let 'FL' designate whatever allows the LAD to acquire a particular GL based on PLDL. Let 'UG' designate those features of FL that are linguistically proprietary (i.e. not reflexes of more generic cognitive or computational operations). An MA aims to reduce the UG part of FL. In the best case, it contains a single linguistically specific novelty.[8] So, it is not just a matter of counting assumptions. Rather what matters is counting UG (i.e. linguistically proprietary) assumptions. We prefer those FLs with minimal UGs and minimal language specific assumptions.

An example of this is Chomsky’s arguments against D-structure and S-structure as internal levels. Chomsky does not deny that Gs interface with interpretive interfaces, rather he objects to treating these as having linguistically special properties.[9] Of course, Gs interface with sound and meaning. That’s obvious (i.e. “conceptually necessary”). But this assumption does not imply that there need be anything linguistically special about the G levels that do the interfacing beyond the fact that they must be readable by these interfaces. So, any assumption that goes beyond this (e.g. the theta criterion) needs defending because it requires encumbering FL with UG strictures that specify the extras required. 

All of this is old hat, and, IMO, perfectly straightforward and reasonable. But it points to another kind of MA: one that does not reduce the number of assumptions required for a particular analysis, but that reapportions the assumptions between UGish ones and generic cognitive-computational ones. Again, Chomsky’s discussions in chapter 3 of the Black Book provide nice examples of this kind of reasoning, as does the computational motivation for phases and Spell Out.

Let me add one more (and this will involve some self referentiality). One argument against PRO based conceptions of (obligatory) control is that they require a linguistically "special" account of the properties of PRO. After all, to get the trains to run on time PRO must be packed with features that force it to obey the G constraints it is subject to (PRO needs to be locally minimally bound, occurs largely in non-finite subject positions, and has very distinctive interpretive properties). In other words, PRO is a G internal formative with special G sensitive features (often of the possibly unspecified phi-variety) that force it into G relations. Thus, it is MP problematic.[10] Thus a proposal that eschews PRO is prima facie an MA story of control, for it dispenses with the requirement that there exist a G internal formative with linguistically specific requirements.[11] I would like to add, precisely because I have had skin in this game, that this does not imply that PRO-less accounts of control are correct or even superior to PRO based conceptions. No! But it does mean that eschewing PRO has minimalist advantages over accounts that adopt PRO, as they minimize the UG aspects of FL when it comes to control.

Ok, enough self-promotion. Back to the main point. The point is not merely to count assumptions but to minimize UGish ones. In this sense, MAs aim to satisfy Darwin more than Ockham. A good MA minimizes UG assumptions and does (about) as well empirically as more UG encumbered alternatives. A good sign that a paper is providing an MA of this sort is manifest concern to minimize the UG nature of the principles assumed.

Let's now turn to (and end with) the last, most ambitious MA: one that not merely does (almost) as well as more UG encumbered accounts, but does better. How can one do better? Recall that we should expect MAs to be more empirically brittle than less minimalist alternatives given that MP assumptions generally restrict an account's descriptive apparatus.[12] So, how can a svelter account do better? It does so by having more explanatory oomph (see here). Here's what I mean.

Again, the Black Book provides some examples.[13] Recall Chomsky’s discussion of examples like (1) with structures like (2):

(1)  John wonders how many pictures of himself Frank took
(2)  John wonders [[how many pictures of himself] Frank took [how many pictures of himself]]

The observation is that (1) has an idiomatic reading just in case Frank is the antecedent of the reflexive.[14] This can be explained if we assume that there is no D-structure level or S-structure level. Without these, binding and idiom interpretation must be defined over the G level that is input to the CI interface. In other words, idiom interpretation and binding are computed over the same representation and we thus expect that the requirements of each will affect the possibilities of the other.

More concretely, getting the idiomatic reading of take pictures requires using the lower copy of the wh phrase. Getting John as a potential antecedent of the reflexive requires using the higher copy. If we assume that only a single copy can be retained in the mapping to CI, this implies that if take pictures of himself is understood idiomatically, Frank is the only available local antecedent of the reflexive. The prediction relies on the assumption that idiom interpretation and binding exploit the same representation. Thus, by eliminating D-structure, the theory can no longer make D-structure the locus of idiom interpretation, and by eliminating S-structure, it cannot make S-structure the locus of binding. Thus by eliminating both levels the proposal predicts a correlation between idiomaticity and reflexive antecedence.

It is important to note that a GBish theory where idioms are licensed at D-structure and reflexives are licensed at S-structure (or later) is compatible with Chomsky’s reported data, but does not predict it. The relevant data can be tracked in a theory with the two internal levels. What is missing is the prediction that they must swing together. In other words, the MP story explains what the non-MP story must stipulate. Hence, the explanatory oomph. One gets more explanation with less G internal apparatus.

There are other examples of this kind of reasoning, but not that many.  One of the reasons I have always liked Nunes’ theory of parasitic gaps is that it explains why they are licensed only in overt syntax. One of the reasons that I like the Movement Theory of Control is that it explains why one finds (OC) PRO in the subject position of non-finite clauses. No stipulations necessary, no ad hoc assumptions concerning flavors of case, no simple (but honest) stipulations restricting PRO to such positions. These are minimalist in a strong sense.

Let's end here. I have tried to identify three kinds of MAs. What makes proposals minimalist is that they either answer or serve as steps towards answering the big minimalist question: why do we have the FL we have? How did FL arise in the species? That's the question of interest. It's not the only question of interest, but it is an important one. Precisely because the question is interesting, it is worth identifying whether and in what respects a given proposal might be minimalist. Wouldn't it be nice if papers in minimalist syntax regularly identified their minimalist assumptions so that we could not only appreciate their empirical virtuosity, but could also evaluate their contributions to the programmatic goals?


[1] If pressed (even slightly) I might go further and admit that being minimalist is a necessary condition of being true. This follows if you agree that the minimalist characterization of DP in the domain of language is roughly accurate. If so, then true proposals will be minimalist for only such proposals will be compatible with the facts concerning the emergence of FL. That’s what I would argue, if pressed.
[2] And if this is so, then the way one arrives at truth in linguistics will plausibly go hand in hand with providing answers to fundamental problems like DP. Thus, proposals that are minimalist may thereby have a leg up on truth. But, again, I wouldn't say this unless pressed.
[3] The Agree dependency established here is accompanied by a specific rule of interpretation whereby agreement signals co-valuation of some sort. This, btw, is not a trivial extra.
[4] This parallels the logic of On wh movement wrt islands and bounding theory. See here for discussion.
[5] Sportiche (here) describes this as eliminating extrinsic theoretical “enrichments” (i.e. theoretical additions motivated entirely by empirical demands).
[6] Note a priori one expects simpler proposals to be empirically less agile than more complex ones and to therefore cover less data. Thus, if a cut down account gets roughly the same coverage this is a big win for the more modest proposal.
[7] Indeed, it is often hard to individuate assumptions, especially given different theoretical starting points. However (IMO surprisingly), this is often doable in practice so I won’t dwell on it here.
[8] I personally don't believe that it can contain less, for that would make the fact that nothing does language like humans do a complete mystery. This fact strongly implies (IMO) that there is something UGishly special about FL. MP reasoning implies that this UG part is very small, though not null. I assume this here.
[9] That’s how I understand the proposal to eliminate G internal levels.
[10] It is worth noting that this is why PRO in earlier theories was not a lexical formative at all, but the residue of the operation of the grammar. This is discussed in the last chapter here if you are interested in the details.
[11] One more observation: this holds even if the proposed properties of PRO are universal, i.e. part of UG. The problem is not variability but linguistic specificity.
[12] Observe that empirical brittleness is the flip side of theoretical tightness. We want empirically brittle theories.
[13] The distinction between these two kinds of MAs is not original with me but clearly traces to the discussion in the Black Book.
[14] I report the argument. I confess that I do not personally get the judgments described. However, this does not matter for purposes of illustration of the logic.

Monday, June 20, 2016

Classical case theory

So what’s classical (viz. GB) Case Theory (CCT) a theory of? Hint: it’s not primarily about overt morphological case, though given some ancillary assumptions, it can be (and has been) extended to cover standard instances of morphological case in some languages. Nonetheless, as originally proposed by Jean Roger Vergnaud (JRV), it has nothing whatsoever to do with overt case. Rather, it is a theory of (some of) the filters proposed in “Filters and Control” by Chomsky and Lasnik (F&C).

What do the F&C filters do? They track the distribution of overt nominal expressions. (Overt) D/NPs are licit in some configurations and not in others. For example, they shun the subject positions of non-finite clauses (modulo ECM), they don't like being complement to Ns or As, nor complements to passivized verbs. JRV's proposal, outlined in his famous letter to Chomsky and Lasnik, is that it is possible to simplify the F&C theory if we reanalyze the key filters as case effects; specifically, if we assume that nominals need case and that certain heads assign case to nominals in their immediate vicinity. Note that JRV understood the kind of case he was proposing to be quite abstract. It was certainly not something evident from the surface morphology of a language. How do I know? Because F&C filters, and hence JRV's CCT, were used to explain the distribution of all nominals in English and French, and these two languages display very little overt morphology on most nominals. Thus, if CCT was to supplant filters (which was the intent) then the case at issue had to be abstract. The upshot: CCT always trucked in abstract case.

So what about morphologically overt case? Well, CCT can accommodate it if we add the assumption that abstract case, which applies universally to all nominals in all Gs to regulate their distribution, is morphologically expressed in some Gs (a standard GG maneuver). Do this and abstract case can serve as the basis of a theory of overt morphological case. But, and this is critical, the assumption that the mapping from abstract to concrete case can be phonetically pretty transparent is not a central feature of the CCT.[1]

I rehearse this history because it strikes me that lots of discussion of case nowadays thinks that CCT is a theory of the distribution of morphological case marking on nominals. Thus, it is generally assumed that a key component of CCT assigns nominative case to nominals in finite subject positions and accusative to those in object slots etc. From early on, many observed that this simple morphological mapping paradigm is hardly universal. This has led many to conclude that CCT must be wrong. However, this only follows if this is what CCT was a theory of, which, I noted above, it was not.

Moreover, and this is quite interesting actually, so far as I can tell the new case theorists (the ones that reject the CCT) have little to say about the topic that CCT or F&C's filters tried to address. Thus, for example, Marantz's theory of dependent case (aimed at explaining the morphology) is weak on the distribution of overt nominals. This suggests that CCT and the newer Morphological Case Theory (MCT) are in complementary distribution: what the former takes as its subject matter and what the latter takes as its subject matter fail to overlap. Thus, at least in principle, there is room for both accounts; both a theory of abstract case (CCT) and a theory of morphological case (MCT). The best theory, of course, would be one in which both types of case are accommodated in a single theory (this is what the extension of the CCT to morphology hoped to achieve). However, were these two different, though partially related, systems this would be an acceptable result for many purposes.[2]

Let's return to the F&C filters and the CCT for a moment. What theoretically motivated them? We know what domain of data they concerned themselves with (the distribution of overt nominals).[3] But why have any filters at all?

F&C was part of the larger theoretical project of simplifying transformations. In fact, it was part of the move from construction based G rules to rules like move alpha (MA). Pre-MA, transformations were morpheme sensitive and construction specific. We had rules like relative clause formation and passive and question formation. These rules applied to factored strings which met the rules' structural descriptions (SDs). The rules applied to these strings to execute structural changes (SCs). The rules applied cyclically, could be optional or obligatory, and could be ordered wrt one another (see here for some toy illustrations). The theoretical simplification of the transformational component was the main theoretical research project from the mid 1970s to the early-mid 1980s. The simplification amounted to factoring out the construction specificity of earlier rules, thereby isolating the fundamental displacement (aka movement) property. MA is the result. It is the classical movement transformations shorn of their specificity. In technical terms, MA is a transformation without specified SDs or SCs. It is a very very simple operation and was a big step towards the merge based conception of structure and movement that many adopt today.

How were filters and CCT part of this theoretical program? Simplifying transformations by eliminating SDs and SCs makes it impossible to treat transformations as obligatory. What would it mean to say that a rule like MA is obligatory? Obliged to do what exactly?  So adopting MA means having optional movement transformations. But optional movement of anything anywhere (which is what MA allows) means wildly overgenerating. To regulate this overgeneration without SDs and SCs requires something like filters. Those in F&C regulated the distribution of nominals in the context of a theory in which MA could freely move them around (or not!). Filters make sure that these vacate the wrong places and end up in the right ones. You don’t move for case strictly speaking. Rather the G allows free movement (it’s not for anything as there are no SDs that can enforce movement) but penalizes structures that have nominals in the wrong places. In effect, we move the power of SDs and SCs from the movement rules themselves and put them into the filters. F&C (and CCT which rationalized them) outline one type of filter, Rizzi’s criterial conditions provide another variety. Theoretically, the cost of simplifying the rules is adding the filters.[4]  

So, we moved from complex to simple rules at the price of Gs with filters of various sorts. Why was this a step forward? Two reasons.

First, MA lies behind Chomsky’s unification of Ross’s Islands via Subjacency Theory (ST) (and, IMO, is a crucial step in the development of trace theory and the ECP). Let me elaborate. Once we reduce movement to its essentials, as MA does, then it is natural to investigate the properties of movement as such, properties like island sensitivity (a.o.). Thus, ‘On Wh Movement’ (OWM) demonstrates that MA as such respects islands. Or, to put this another way, ST is not construction specific. It applies to all movement dependencies regardless of the specific features being related. Or, MA serves to define what a movement dependency is and ST regulates this operation regardless of the interpretive ends the operation serves, be it focus or topic, or questions, or relativization or clefts or… If MA is involved, islands are respected. Or, ST is a property of MA per se, not the specific constructions MA can be “part” of.[5]

Second, factoring out MA from movement transformations and replacing SDs/SCs with filters focuses attention on the question of where these filters come from. Are they universal (part of FL/UG) or language specific? One of the nice features of CCT was that it had the feel of a (potential) FL/UG principle. CCT Case was abstract. The relations were local (government). Gs as diverse as those found in English, French, Icelandic and Chinese looked like they respected these principles (more or less). Moreover, were CCT right, it did not look easily learnable given that it was empirically motivated by negative data. So, simplifying the rules of G led to the discovery of plausible universal features of FL/UG. Or, more cautiously, it led to an interesting research program: looking for plausible universal filters on simple rules of derivation.[6]

What should we make of all of this today in a more minimalist setting? Well, so far as I can tell, the data that motivated the F&C filters and the CCT, as well as the theoretical motivation of simplifying G operations, are still with us. If this is so, then some residue of the CCT reflects properties of FL/UG. And this generates a minimalist question: Is CCT linguistically proprietary? Why Case features at all? How, if at all, is abstract case related to (abstract?) agreement? What, if anything, relates CCT and MTC? How is case discharged in a model without the government relation? How is case related to other G operations? Etc. You know the drill.[7] IMO, we have made some progress on some of these questions (e.g. treating case as a by-product of Merge/Agree) and no progress on others (e.g. why there is case at all).[8] However, I believe research has been hindered, in part, by forgetting what CCT was a theory of and why it was such a big step forward.

Before ending, let me mention one more property of abstract case. In minimalist settings abstract case freezes movement. Or, more correctly, in some theories case marking a nominal makes it ineligible for further movement. This “principle” is a reinvention of the old GB observation that well formed chains have one case (marked on the head of the chain) and one theta role (marked on the foot). If this is on the right track (which it might not be) the relevant case here is abstract. So, for example, a quirky subject in a finite subject position in a language like Icelandic can no more raise than can a nominative marked subject. If we take the quirky case marked subject to be abstractly case marked in the same way as the nominative is, then this follows smoothly. Wrt abstract case (i.e. ignoring the morphology) both structures are the same. To repeat, so far as I know, this application of abstract case was not a feature of CCT.

To end: I am regularly told that CCT is dead, and maybe it is. But the arguments generally brought forward in obituary seem to me to be at right angles to what CCT intended to explain. What might be true is that extensions of CCT to include morphological case need re-thinking. But the original motivation seems intact and, from what I can tell, something like CCT is the only theory around to account for these classical data.[9] And this is important. For if this is right, then minimalists need to do some hard thinking in order to integrate the CCT into a more friendly setting.


[1] Nor, as I recall, did people think that it was likely to be true. It was understood pretty early on that inherent/quirky case (I actually still don’t understand the difference, btw) does not transparently reflect the abstract case assigned. Indeed, the recognized difference between structural case and inherent case signaled early on that whatever abstract case was morphologically, it was not something easily read off the surface.
[2] Indeed, Distributed Morphology might be the form that such a hybrid theory might take.
[3] Actually, there was a debate about whether only overt nominals were relevant. Lasnik had a great argument suggesting that A’-traces also need case marking. Here is the relevant data point: *The man1 (who/that) it was believed t1 to be smart. Why is this relative clause unacceptable even if we don’t pronounce the complementizer? Answer: the A’-trace needs case. This, to my knowledge, is the only datum against the idea that case exclusively regulates the distribution of overt nominal expressions. Let me know if there are others out there.
[4] Well, if you care about overgeneration. If you don’t, then you can do without filters or CCT.
[5] Whether this is an inherent property of movement rather than, say, overt movement, was widely investigated in the 1980s. As you all know, Huang argued that ST is better viewed as an SS filter rather than part of the definition of MA.
[6] I should add, that IMO, this project was tremendously successful and paved the way for the Minimalist Program.
[7] The two most recent posts (here and here) discuss some of these issues.
[8] Curiously, the idea that case and agreement are effectively the same thing was not part of CCT. This proposal is a minimalist one. Its theoretical motivation is twofold: first, to try to reduce case and agreement to a common “mystery,” one being better than two. Second, because if case is a feature of nominals then probes are not the sole locus of uninterpretable features. Case is the quintessential uninterpretable feature. CCT understood it to be a property of nominals. This sits uncomfortably with a probe/goal theory in which all uninterpretable features are located in probes (e.g. phase heads). One way to get around this problem is to treat case as a by-product of the “real” agreement operation initiated by the probe.
            From what I gather, the idea that case reduces to agreement is currently considered untenable. This does not bother me in the least given my general unhappiness with probe/goal theories. But this is a topic for another discussion.
[9] Reducing nominal distribution to syntactic selection is not a theory as the relevant features are almost always diacritical.

Monday, August 12, 2013

Explaining Camels


There are certain kinds of explanations which, when available, are particularly satisfying.  What makes them such is that they not only explain the facts in front of you, but do so in ways that make the facts inevitable.  How do they do this? Well, one way is by rendering the non-extant alternatives not merely false, but inconceivable.  A joke/riddle that I like to tell my students displays the quality I have in mind.

One physicist/mathematician asks another: why are there 1-humped camels and 2-humped camels, but no N-humped camels, N>2? Answer: Because camels are convex or concave; no other models available.

I love this answer. It’s perfect. How so? By shifting the relevant predicates (from natural numbers to simple curves) the range of possible camels is reduced to two, both of which are attested!  Concave and convex exhaust the space of options.  And once you think of things in this way, it is clear why 1 and 2 are the only possible values for N.

Let me put this another way: one gets a truly satisfying explanation if one can embed it in concepts that obviate further why questions. How are they obviated? By exhausting the range of possibilities. Why are there no 3-humped camels? Because 3-humped camels are neither concave nor convex and these are the only shapes camels can come in.

The joke/riddle has a second useful attribute. It displays what I take to be a central aim of theoretical research: to redescribe the conceivable alternatives in such a way as to restrict the range of available alternatives to what one actually sees. The aim of theory is not merely to cover the data, but to explain why the data fall in the restricted range they do, and this requires carefully observing what doesn’t happen (what Paul Pietroski calls ‘negative’ facts (e.g. here)).

So, do we have any of these kinds of explanations within syntax? I think we do, or at least there have been attempts to provide such. Let me illustrate.

One current example is Chomsky’s proposed account for why grammatical operations are structure dependent. This is in Problems of Projection (which I would link to, but it is behind a Lingua paywall with an exorbitant price so I suggest that you just get a copy from someplace else). Here’s what we want to explain: given that rules that move T-to-C (as in Y/N questions in English) target the “highest” Ts and not the linearly closest Ts (i.e. leftmost), why must they target these Ts (i.e. why can’t they target the linearly most proximate Ts)?

The answer that Chomsky gives is that grammatical operations cannot use notions like linear proximity, because linguistic objects are not linearly specified until Spell Out, i.e. the final rule of the syntax. So why can’t grammatical operations be linearly dependent (i.e. non-structure dependent)? Because the syntactic manipulanda (i.e. phrase markers) contain no linear (i.e. left-right order) information. Thus, if grammatical rules manipulate phrase markers, and these don’t contain linear information, then there is no way to state linearly dependent rules over these objects.  In other words, such rules don’t appear to exist because they can’t be stated over the objects that grammatical operations manipulate.

Or, put positively: why are all syntactic rules structure dependent? Because that’s the only way they can be. In other words, once the impossible options are eliminated all that’s left coincides with what we find.  In this way, the actual is explained via the possible and explanatory oomph is attained. Indeed, I suspect (believe!) that the best way to explain anything is by showing how the plausible alternatives are actually conceptually impossible when thought about in the right way.
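The shape of this argument can be made concrete with a toy sketch (mine, not anything in Chomsky's paper): if a phrase marker stores its daughters in an unordered collection, a rule targeting the structurally highest T is perfectly stateable, while a rule targeting the leftmost T cannot even be formulated, since the representation contains no "leftmost."

```python
# Toy illustration (my sketch, not a formalization from Problems of
# Projection): a phrase marker whose daughters form an unordered set,
# so the object carries hierarchy but no left-to-right order.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = frozenset(children)  # unordered: no "leftmost" exists

def highest(node, label, depth=0):
    """Return the shallowest node bearing `label`, plus its depth.

    'Highest' is definable because depth is in the representation.
    A 'leftmost' analogue is not even stateable here: frozensets
    have no order to be leftmost in.
    """
    best, best_depth = ((node, depth) if node.label == label
                        else (None, float("inf")))
    for child in node.children:
        found, d = highest(child, label, depth + 1)
        if d < best_depth:
            best, best_depth = found, d
    return best, best_depth
```

On this picture a learner equipped only with such objects could not acquire a linearly dependent rule even in principle, which is the shape of the explanation: the alternative is not dispreferred but unformulable.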

Here’s another minimalist example. It comes from current conceptions of Spell Out and how they’ve been used to account for phase impenetrability (i.e. the prohibition against forming dependencies across a phase head).  Here’s the question: why are phases impenetrable? Answer: because Spell Out “sends” phase head complements to the interfaces thereby removing their contents from the purview of the syntax/computational system. In effect, dependencies across phase heads are not possible because complements of phase heads (and hence their contents) are not syntactically “there” to be related.

Note the similarity to the first account: just as linear information is not there to be exploited and hence only structure dependent operations are stateable, so too phase complement information is not there and so it cannot be exploited. In both cases, the “reason” the condition holds is that it cannot fail to hold. There really is only one option when properly conceptualized.
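Again purely as my own toy illustration (no claim about anyone's actual formalization): if Spell Out literally replaces a phase head's complement with an opaque token, later operations cannot probe into it, and impenetrability follows from the representation rather than from a stipulated constraint.

```python
# Toy sketch: trees as (label, [children]); "v" is treated as the only
# phase head just to keep the example small (C would work the same way).

PHASE_HEADS = {"v"}

def spell_out(node):
    """Recursively transfer phase-head complements: the complement
    subtree is replaced by an opaque token, removing its contents
    from the syntactic workspace."""
    label, children = node
    children = [spell_out(c) for c in children]
    if label in PHASE_HEADS and children:
        children[-1] = ("<transferred>", [])  # complement gone from the syntax
    return (label, children)

def find(node, target):
    """Can a later operation 'see' a node with this label?"""
    label, children = node
    return label == target or any(find(c, target) for c in children)

# Before Spell Out the DP inside VP is visible; after, it simply is not
# there to be probed, so no ban on reaching it need be stated.
tree = ("C", [("T", [("v", [("V", [("DP", [])])])])])
```

On this conception the PIC is not a filter on derivations but a consequence of what the derivation has to work with, exactly parallel to the linear-order case.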

Here’s another example from an earlier era: one of the most interesting arguments in favor of dispensing with constructions as grammatical primitives came from considering a conundrum relating to examples like (1c).

(1)  a. John is likely to have kissed Mary
b. John was seen/believed by Mary
c. John is seen/believed to have kissed Mary

The puzzle is the following in the context of a construction-based conception of grammatical operations (i.e. a view of FL in which the basic operations are construction-based rules like Passive, Raising, etc.).[1] In (1a), John moves from the post-verbal position to the subject position via Raising. In (1b) the operation that moves John to the top is Passivization. Question: what is the rule that moves John in (1c)?  Is it Passive or Raising?  There is no determinate answer.

Eliminating constructions gives a simple answer to the otherwise unanswerable question: it’s neither, as these kinds of rules don’t exist. ‘Move alpha’ is the sole transformation and it applies in producing both Raising and Passive constructions. Of course, if this is the only (movement) transformation, then the question “is it Raising or Passive?” dissolves. It’s move alpha and only move alpha. As the earlier question (Raising or Passive?) had no good answer, a conception of grammar on which the question dissolves has its charms.

One last example, this one from an undergrad research thesis by Noah Smith (of CMU fame; yes he was once a joint ling/CS student). He wrote it when Jason Merchant’s work on ellipsis was first emerging and he asked the following question: Given that Merchant has shown that ellipsis is deletion and not interpretation, why can’t it be the latter?  He rightly (in my view) surmised that this could not be a data-driven fact, as the relevant data for determining this was very subtle. For Jason it amounted to some case and preposition-stranding correlations in sluiced and non-sluiced constructions. On the reasonable assumption that these fall outside the PLD, the fact that ellipsis was deletion could not have been a data-driven outcome. Say this is correct. Noah asked why it had to be correct; why ellipsis had to be deletion and could not be interpretation a la Edwin Williams (i.e. ellipsis amounted to filling in the contents of null phrase markers with null terminals at LF).[2] Noah’s answer? Bare Phrase Structure (BPS). BPS replaces the earlier combo of phrase structure + lexical insertion rules. This has the effect of eliminating the distinction between the content and position of a lexical item. As such, he argued, the structures that the interpretive theory of ellipsis presupposed (phrases with no lexical terminals) were conceptually unavailable, and this leaves the deletion analysis as the only viable option. So why is ellipsis deletion rather than interpretation? Because the interpretive theories required structures that BPS rendered impossible. I confess to always liking this story.

One caveat before concluding: I am not here proposing that the proposed explanations above are correct. I have some questions regarding the Spell Out explanation of the PIC for example and there are empirical challenges to Merchant’s evidence in favor of deletion analyses of ellipsis. However, the kinds of proposals mentioned above are interesting and important for, if correct, they explain (rather than describe) what we find. And explanation is (or should be) what scientific inquiry aims for.

To end: one of the aims of theoretical work is to find ways of framing questions in such a way that all and only the conceptually possible answers are actualized.  This requires finding a vocabulary that not only accommodates/describes the actual, but renders the non-attested impossible, i.e. unstateable. This makes the consideration of systematic absences (viz. negative data) central to the theoretician’s task. Explanation lies with the dogs that don’t bark, the things that though logically possible, don’t occur. Theories explain what happens in terms of what can happen. This means keeping one’s eyes firmly focused on what we don’t find, the actual simply being the residue once the impossible has been pared away.


[1] This is not the place to go into this, but note that rejecting constructions as grammatical primitives does not imply that constructions might not be derived objects of possible psycholinguistic interest.  I discuss this a bit (here) in the last chapter.
[2] There was an interesting and animated debate about ellipsis between Edwin Williams and Ivan Sag, the former defending an interpretive conception (trees with null terminals filled in at LF) vs. the latter’s deletion analysis (similar to Merchant’s contemporary approach).