
Monday, May 30, 2016

Crucial experiments and killer data

In the real sciences, theoretical debate often comes to an end (or at least changes direction sharply) when a crucial experiment (CE) settles it. How do CEs do this? They uncover decisive data (aka “killer data” (KD)) that, if accurate, show that one possible live approach to a problem is empirically deeply flawed.[1] These experiments and their attendant KD become part of the core ideology and serve to eliminate initially plausible explanations from the class of empirically admissible ones.[2]

Here are some illustrative examples of CEs: the Michelson-Morley experiment (which did in the ether and ushered in special relativity (here)), the Rutherford gold foil experiment that ushered in the modern theory of the atom (here), the recent LIGO experiment that established the reality of gravitational waves (here), the Franklin x-ray diffraction images that established the helical structure of DNA (here), the Aspect and Kwiat experiments that signaled the end of hidden variable theories (here) and (one from Wootton) Galileo’s discovery of the phases of Venus, which ended the Ptolemaic geocentric universe. All of these are deservedly famous for ending one era of theoretical speculation and initiating another. In the real sciences, there are many of these, and they are one excellent indicator that a domain of inquiry has passed from intelligent speculation (often lavishly empirically titivated) to real science. Why? Because only relatively well-developed domains of inquiry are sufficiently structured to allow an experiment to be crucial. To put this another way: crucial experiments must leave little wiggle room, and this demands both a broad, well-developed empirical basis and a relatively tight theoretical setting. Thus, if a domain has such, it signals its scientific bona fides.

In what follows, I’d like to offer some KDs in syntax, phenomena that, IMO, rightly terminated (or should, if they are accurate) some perfectly plausible lines of investigation. The list is not meant to be exhaustive, nor is it intended to be uncontroversial.[3] I welcome dissent and additions. I offer five examples.

First, and most famously, polar questions and structure dependence. The argument and its effects are well known (see here for one elaborate discussion). But to quickly review, we have an observation about how polar questions are formed in English (Gs “move” an auxiliary to the front of the clause). Any auxiliary? Nope, the one “closest” to the front. How is proximity measured? Well, not linearly. How do we know? Because of (i) the unacceptability of sentences like (1) (which should be well formed if distance were measured linearly) and (ii) the acceptability of those like (2) (which is expected if distance is measured hierarchically).

1.     *Can eagles that fly should swim?
2.     Should eagles that can fly swim?

The conclusion is clear: if polar questions are formed by movement, then the relevant movement rule ignores linear proximity in choosing the right auxiliary to move.[4] Note, as explained in the above linked-to post, the result is a negative one. The KD here establishes that G rules forsake linear information. It does not specify the kind of hierarchical information it is sensitive to. Still, the classical argument puts to rest the idea that Gs manipulate phrase markers in terms of their string properties.[5]
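To make the contrast concrete, here is a minimal sketch, entirely my own toy construction (the tree encoding, labels, and function names are illustrative assumptions, not anyone’s actual grammar fragment), of the two candidate rules applied to the string underlying (1)-(2):

```python
# Toy trees as nested tuples: (label, child, ...). The string underlying
# (1)-(2) is "eagles that can fly should swim".
tree = ("TP",
        ("DP", "eagles", ("CP", "that", ("TP", "can", "fly"))),  # subject with relative clause
        ("T'", "should", "swim"))                                # matrix aux + verb

AUX = {"can", "should"}

def leaves(t):
    """The terminal string of a tree, left to right."""
    return [t] if isinstance(t, str) else [w for k in t[1:] for w in leaves(k)]

def front_linear(t):
    """Linearly stated rule: front the FIRST auxiliary in the string."""
    words = leaves(t)
    aux = next(w for w in words if w in AUX)
    words.remove(aux)
    return [aux] + words

def front_structural(t):
    """Structure-dependent rule: front the MATRIX auxiliary, the
    hierarchically closest one, never looking inside the subject."""
    _, subject, predicate = t
    aux = predicate[1]
    return [aux] + leaves(subject) + [w for w in leaves(predicate) if w != aux]

print(" ".join(front_linear(tree)))      # "can eagles that fly should swim"  = (1), bad
print(" ".join(front_structural(tree)))  # "should eagles that can fly swim"  = (2), good
```

However the encoding is varied, the point survives: the linear rule cannot avoid reaching into the relative clause, while the structure-dependent rule never sees inside the subject.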

The second example concerns reflexivization (R). Is it an operation that targets predicates and reduces their adicities by linking their arguments, or is it a syntactic operation that relates nominal expressions? The former treats R as ranging over predicates and their co-arguments. The latter treats R as an operation that syntactically pairs nominal expressions regardless of their argument status. The KD against the predicate-centered approach is found in ECM constructions, where non-co-arguments can be R related.

3.     Mary expects herself to win
4.     John believes himself to be untrustworthy
5.     Mary wants herself to be elected president

In (3)-(5) the reflexive is anteceded by a non-co-argument. So, ‘John’ is an argument of the higher predicate in (4), and ‘himself’ is an argument of the lower predicate ‘be untrustworthy’ but not the higher predicate ‘believe.’ Assuming that reflexives in mono-clauses and those in examples like (3)-(5) are licensed by the same rule, this provides KD that R is not an argument changing (i.e. adicity-lowering)[6] operation but a rule defined over syntactic configurations that relates nominals.[7]
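To see how (3) pulls the two views apart, here is a minimal sketch; the toy argument structure and the c-command checker are my own illustrative assumptions:

```python
# "Mary expects herself to win": 'herself' is a thematic argument of 'win',
# while 'Mary' is an argument of 'expect' -- they are not co-arguments.
coargs = {"expect": ["Mary"], "win": ["herself"]}

def predicate_view_ok(antecedent, reflexive):
    """R as adicity reduction: both relata must be arguments of ONE predicate."""
    return any(antecedent in args and reflexive in args for args in coargs.values())

# Toy constituency: [TP Mary [VP expects [TP herself to-win]]]
tree = ("TP", "Mary", ("VP", "expects", ("TP", "herself", "to-win")))

def contains(t, x):
    return t == x if isinstance(t, str) else any(contains(k, x) for k in t[1:])

def c_commands(t, a, b):
    """True if some occurrence of a has a sister constituent containing b."""
    if isinstance(t, str):
        return False
    kids = list(t[1:])
    for i, k in enumerate(kids):
        if k == a and any(contains(s, b) for s in kids[:i] + kids[i+1:]):
            return True
    return any(c_commands(k, a, b) for k in kids)

print(predicate_view_ok("Mary", "herself"))  # False: wrongly rules (3) out
print(c_commands(tree, "Mary", "herself"))   # True: the configurational view licenses (3)
```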

Here's a third, more recondite, example that actually had the consequence of eliminating one conception of empty categories (EC). In Some Concepts and Consequences of the Theory of Government and Binding (C&C) Chomsky proposed a functional interpretation of ECs.

A brief advertisement before proceeding: C&C is a really great book whose only vice is that its core idea is empirically untenable. Aside from this, it is a classic and still well worth reading.

At any rate, C&C is a sustained investigation of parasitic gap (PG) phenomena and it proposes that there is no categorial difference among the various flavors of ECs (A-trace vs A’-trace vs PRO). Rather there is only one EC, and the different flavors reflect relational properties of the syntactic environment the EC is situated in. This allows for the possibility that an EC can start out its life as a PRO and end its life as an A’-trace without any rule directly applying to it. Rather, if something else moves and binds the PRO, the EC that started out as a PRO will be interpreted as an A- or A’-trace depending on what position the element it is related to occupies (the EC is an A-trace if A-bound and an A’-trace if A’-bound). This forms the core of the C&C analysis of PGs, and it has the nice property of largely deriving the properties of PGs from more general assumptions about binding theory combined with this functional interpretation of ECs. To repeat, it is a very nice story. IMO, conceptually, it is far better than the Barriers account in terms of chain formation and null operators, which came after C&C. Why? Because the Barriers account is largely a series of stipulations on chain formation posited to “capture” the observed output. C&C provides a principled theory but is wrong; Barriers provides an account that covers the data but is unprincipled.
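In my own toy terms (the labels are illustrative, not C&C’s notation), the functional interpretation amounts to something like this sketch:

```python
def ec_flavor(local_binder_position):
    """An EC has no intrinsic type: classify it by whatever locally binds it."""
    if local_binder_position is None:
        return "PRO"       # locally unbound
    if local_binder_position == "A":
        return "A-trace"   # locally A-bound (e.g. by a raised subject)
    return "A'-trace"      # locally A'-bound (e.g. by a wh-phrase)

# A parasitic gap starts life as PRO inside the adjunct; once the wh-phrase
# licensing the real gap binds it, it is reclassified as an A'-trace. Nothing
# ever moves into or out of the adjunct, so no island effects are predicted
# for the parasitic gap itself -- the prediction Kayne's data refute below.
print(ec_flavor(None))  # PRO
print(ec_flavor("A'"))  # A'-trace
```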

How was C&C wrong? Kayne provided the relevant KD.[8] He showed that PGs, the ECs inside the adjuncts, are themselves subject to island effects. Thus, though one can relate a PG inside an adjunct (which is an island) to an argument outside the adjunct, the gap inside the island is subject to standard island effects. So the EC inside the adjunct cannot itself be inside another island. Here’s one example:

6.     Which book did you review before admitting that Bill said that Sheila had read
7.     *Which book did you review before finding someone that read

The functional definition of ECs implies that ECs that are PGs should not be subject to island effects as they are not formed by movement. This proved to be incorrect and the approach died.  Killed by Kayne’s KD.

A fourth case: P-stranding and case connectedness effects in ellipsis killed the interpretive theory of ellipsis and argued for the deletion account. Once upon a time, the favored account of ellipsis was interpretive.[9] Gs generated phrase markers without lexical terminals. Ellipsis was effectively what one got with lexical insertion delayed to LF. It was subject to various kinds of parallelism restrictions, with the non-elided antecedent serving to provide the relevant terminals for insertion into the elided PM (i.e. the one without terminals), the insertion subject to recoverability and to the requirement that terminals be inserted into positions parallel to those in the non-elided antecedent. Figuratively, the LF of the antecedent was copied into the PM of the elided dependent.

As is well known by now, Jason Merchant provided KD against this position, elaborating earlier (ignored?) arguments by Ross. The KD came in two forms. First, elided structures respect the same case marking conventions apparent in non-elision constructions. Second, preposition stranding is permitted in ellipsis just in case it is allowed in cases of movement without elision. In other words, it appears that, but for the phonology, elided phrases exhibit the same dependencies apparent in non-elided derivations. The natural conclusion is that elision is derived by deleting structure that is first generated in the standard way. So, the parallelism in case and P-stranding profiles of elided and non-elided structures implies that they share a common syntactic derivational core.[10] This is just what the interpretive theory denies and the deletion theory endorses. Hence the deletion theory has a natural account of the observed syntactic parallelism that Merchant/Ross noted. And indeed, from what I can tell, the common wisdom today is that ellipsis is effectively a deletion phenomenon.
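The logic of the P-stranding diagnostic can be put schematically as follows; a hedged sketch, with the language facts simplified from the literature:

```python
# Deletion theory: a sluice is built by ordinary wh-movement and then left
# unpronounced, so its P-stranding options just ARE the movement options.
p_stranding_under_movement = {"English": True, "Greek": False}

def p_stranding_under_sluicing(language):
    """Predicted stranding profile of ellipsis under the deletion theory."""
    return p_stranding_under_movement[language]

# English: "Anna spoke to someone, but I don't know who."   (P stranded: OK)
# Greek: the preposition must be pied-piped in the sluice, exactly as in
# overt wh-movement. An interpretive theory, with no movement inside the
# ellipsis site, has no obvious reason to expect this covariation.
assert p_stranding_under_sluicing("English") is True
assert p_stranding_under_sluicing("Greek") is False
```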

It is worth observing, perhaps, that this conclusion also has a kind of minimalist backing. Bare Phrase Structure (BPS) makes the interpretive theory hard to state. Why? Because the interpretive theory relies on a distinction between structure building and lexical insertion, and BPS does not recognize this distinction. Thus, given BPS, it is unclear how to generate structures without terminals. But as the interpretive theory relies on doing just this, it would seem to be a grammatically impossible analysis in a BPS framework. So, not only is the deletion theory of ellipsis the one we want empirically, it also appears to be the one that conforms to minimalist assumptions.

Note that the virtue of KD is that it does not rely on theoretical validation to be effective. Whether deletion theories are more minimalistically acceptable than interpretive theories is an interesting issue. But whether they are or aren’t does not affect the dispositive nature of the KD wrt the proposals it adjudicates. This is one of the nice features of CEs and KD: they stand relatively independent of particular theories and hence provide a strong empirical check on theory construction. That’s why we like them.

Fifth, and now I am going to be much more controversial: inverse control and the PRO based theory of control. Polinsky and Potsdam (2002) present cases of control in which “PRO” c-commands its antecedent. This, strictly speaking, should be impossible, for such binding violates principle C. However, the sentences are licit with a control interpretation. Other examples of inverse control have since been argued to exist in various other languages. If inverse control exists, it is a KD for any PRO based conception of control. As all but the movement theory of control (MTC) are PRO based conceptions of control, if inverse control obtains then the MTC is the only theory left standing. Moreover, as Polinsky and Potsdam have argued since, the existence of inverse control makes perfect sense in the context of a copy theory of movement if one allows top copies to be PF deleted. Indeed, as argued here, the MTC is what one expects in the context of a theory that eschews D-structure and adopts the least encumbered theory of merge. But all of this is irrelevant as regards the KD status of inverse control. Whether or not the MTC is right (which, of course, it is) inverse control effects present KD against PRO based accounts of control given standard assumptions about principle C.
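Schematically, and in my own toy rendering rather than Polinsky and Potsdam’s formalism, the copy-theoretic point is just this:

```python
# Movement leaves a full copy in each position; PF decides which copy to
# silence. Forward control silences the lower copy; inverse (backward)
# control silences the higher one, so the pronounced "controller" sits in
# the embedded clause, c-commanded by its silent matrix copy -- one chain,
# no PRO, and hence no principle C configuration at all.
chain = ["John[matrix]", "John[embedded]"]  # copies created by movement

def pronounce(chain, silence="lower"):
    """Return the copies that survive to PF."""
    return chain[:-1] if silence == "lower" else chain[1:]

print(pronounce(chain, silence="lower"))   # forward control: ['John[matrix]']
print(pronounce(chain, silence="higher"))  # inverse control: ['John[embedded]']
```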

That’s it. Five examples. I am sure there are more. Send in your favorite. These are very useful to have on hand, for they are part of what makes a research program progressive. CEs and KDs mark the intellectual progress of a discipline. They establish boundary conditions on adequate further theorizing. I am no great fan of empirics. The data does not do much for me. But I am an avid consumer of CEs and KDs. They are, in their own interesting ways, tributes to how far we’ve come in our understanding and so should be cherished.



[1] Note the modifier ‘deeply.’ Here’s an interesting question that I have no clean answer for: what makes one flaw deep and another a mere flesh wound? One mark of a deep flaw is that it butts up against a bedrock principle of the theory under investigation. So, for example, Galileo’s discovery was hard to reconcile with the Ptolemaic system unless one assumed that the phases of Venus were unlike any of the other phases observed at the time. There was no set of calculations consistent with those most generally in use that could deliver the observed effects. Similarly for the Michelson-Morley data: reconciling them with prior theory required fundamental changes to other basic assumptions. Most data are not like this. They can be reconciled by adding further (possibly ad hoc) assumptions or massaging some principles in new ways. But butting up against a fundamental principle is not that common. That’s why CEs and KD are interesting and worth looking for.
[2] The term “killer data” is found in a great new book on the rise of modern science by David Wootton (here). He argues that the existence of KD is a crucial ingredient in the emergence of modern science. It’s a really terrific book for those of you interested in these kinds of issues. The basic argument is that there really was a distinction in kind between what came after the scientific revolution and its precursors. The chapter on how perspective in painting fueled the realistic interpretation of abstract geometry as applied to the real world is worth the price of the book all by itself.
[3] In this, my list fails to have one property that Wootton highlighted. KDs, as a matter of historical fact, are widely accepted, and pretty quickly too. Not all my candidate KDs have been as successful (too bad), hence the bracketed qualifying modal.
[4] Please note the conditional: the KD shows that transformations are not linearly sensitive. This presupposes that Y/N questions are transformationally derived. Syntactic Structures argued for a transformational analysis of Aux fronting. A good analysis of the reasons for this is provided in Lasnik’s excellent book (here). What is important to note is that data can become KD only given a set of background assumptions. This is not a weakness.
[5] This raises another question that Chomsky has usefully pressed: why don’t G operations exploit the string properties of phrase markers? His answer is that PMs don’t have string properties, as they are sets and sets impose no linear order on their elements.
[6] Note: that R relates nominals does not imply that it cannot have the semantic reflex of lowering the adicity of a predicate. So, R applies to John hugged himself to relate the reflexive and John. This might reduce the adicity of hug from 2-place to 1-place. But this is an effect of the rule, not a condition of the rule. The rule couldn’t care less whether the relata are co-arguments.
[7] There are some theories that obscure this conclusion by distinguishing between semantic and syntactic predicates. Such theories acknowledge the point made here in their terminology. R is not an adicity changing operation, though in some cases it might have the effect of changing predicate adicity (see note 6).
This, btw, is one of my favorite KDs. Why? Because it makes sense in a minimalist setting. Say R is a rule of G. Then it cannot be an adicity changing operation, for this would be a clear violation of Inclusiveness (which, recall, requires preserving the integrity of the atoms in the course of a derivation, and nothing violates the integrity of a lexical item more than changing its argument structure). Thus, in a minimalist setting, the first view of R seems ruled out.
We can, as usual, go further. We can provide a deeper explanation for this instance of Inclusiveness and propose that adicity changing rules cannot be stated given the right conception of syntactic atoms (this parallels how thinking of Merge as outputting sets makes impossible rules that exploit linear dependencies among the atoms (see note 5)). How might we do this? By assuming that predicates have at most one argument (i.e. they are 1-place predicates). This is to effectively endorse a strong neo-Davidsonian conception of predicates in which all predicates are 1-place predicates of events and all “arguments” are syntactic dependents (see e.g. Pietroski here for discussion). If this is correct, then there can be no adicity changing operations grammatically identifying co-arguments of a predicate, as predicates have no co-arguments. Ergo, the configurational R is the only kind of rule a G can have.
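To make the strong neo-Davidsonian picture concrete, John hugged himself comes out as something like the following (my rendering of the standard event semantics): the verb is a 1-place predicate of events, the nominals enter via thematic relations, and so there simply are no co-arguments for an adicity changing R to collapse.

```latex
% "John hugged himself", strong neo-Davidsonian decomposition:
\exists e\,[\mathrm{hug}(e) \land \mathrm{Agent}(e,\mathrm{John}) \land \mathrm{Theme}(e,\mathrm{John})]
```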
[8] If memory serves, I think that he showed this in his Connectedness book.
[9] Edwin Williams developed this theory. Ivan Sag argued for a deletion theory. Empirically the two were hard to pull apart. However, in the context of GB, Williams argued that the interpretive theory was more natural. I think he had a point.
[10] For what it is worth, I have always found the P-stranding facts to be the more compelling. The reason is that all agree that at LF P-stranding is required. Thus the LF of To whom did you speak? involves abstracting over an individual, not a PP type. In other words, the right LF involves reconstructing the P and abstracting over the DP complement; something like (i), not (ii):
(i)             Who1 [you spoke to x1]
(ii)           [To whom]1 [you spoke x1]
An answer to the question given something like (i) is ‘Fred.’ An answer to (ii) could be ‘about Harry.’ It is clear that at LF we want a structure like (i) and not (ii). Thus, at LF the right structure in every language necessarily involves P-stranding, even if the language disallows P-stranding syntactically. This is KD for theories that license ellipsis at LF via interpretation rather than via movement plus deletion.