In the real sciences, theoretical debate often comes to an
end (or at least changes direction sharply) when a crucial experiment (CE) settles
it. How do CEs do this? They uncover decisive data (aka “killer data” (KD))
that, if accurate, show that one live approach to a problem is empirically
deeply flawed.[1]
These experiments and their attendant KD become part of the core ideology and
serve to eliminate initially plausible explanations from the class of empirically
admissible ones.[2]
Here are some illustrative examples of CEs: the
Michelson-Morley experiment (which did in the ether and ushered in special
relativity (here)),
the Rutherford gold foil experiment that ushered in the modern theory of the
atom (here), the
recent LIGO experiment that established the reality of gravitational waves (here), the Franklin
X-ray diffraction pix that established the helical structure of DNA (here), the Aspect and Kwiat
experiments that signaled the end of hidden variable theories (here), and (one
from Wootton) Galileo’s discovery of the phases of Venus, which ended the
Ptolemaic geocentric universe. All of these are deservedly famous for ending
one era of theoretical speculation and initiating another. In the real sciences,
there are many of these and they are one excellent indicator that a domain of
inquiry has passed from intelligent speculation (often lavishly empirically
titivated) to real science. Why? Because only relatively well-developed domains
of inquiry are sufficiently structured to allow an experiment to be crucial. To
put this another way: crucial experiments must tightly control for wiggle room,
and this demands both a broad, well-developed empirical basis and a relatively
tight theoretical setting. Thus, if a domain has these, it signals its
scientific bona fides.
In what follows, I’d like to offer some KDs in syntax,
phenomena that, IMO, rightly terminated (or should, if they are accurate) some
perfectly plausible lines of investigation. The list is not meant to be
exhaustive, nor is it intended to be uncontroversial.[3] I
welcome dissent and additions. I offer five examples.
First, and most famously, polar questions and structure
dependence. The argument and the effect are well known (see here
for one elaborate discussion). But to quickly review, we have an observation
about how polar questions are formed in English (Gs “move” an auxiliary to the
front of the clause). Any auxiliary? Nope, the one “closest” to the front. How
is proximity measured? Well, not
linearly. How do we know? Because of (i) the unacceptability of sentences like
(1) (which should be well formed if distance were measured linearly) and (ii)
the acceptability of those like (2) (which should be acceptable if distance is
measured hierarchically).
1. *Can eagles that fly should swim?
2. Should eagles that can fly swim?
The conclusion is clear: if
polar questions are formed by movement, then
the relevant movement rule ignores linear proximity in choosing the right
auxiliary to move.[4]
Note, as explained in the above linked-to post, the result is a negative one.
The KD here establishes that G rules forsake linear information. It does not
specify the kind of hierarchical information it is sensitive to. Still, the
classical argument puts to rest the idea that Gs manipulate phrase markers in
terms of their string properties.[5]
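To make the contrast concrete, here is a minimal sketch in Python. The tree encoding, labels, and both rules are my own illustrative assumptions, not any standard formalism; the point is only that the linear rule picks the wrong auxiliary while a structure-sensitive rule picks the right one.

```python
# A toy contrast (encoding mine): pick the auxiliary to front from
# "Eagles that can fly should swim".

# Nested-tuple phrase marker:
# [S [NP eagles [RC that can fly]] [AUX should] [VP swim]]
sentence = ("S",
            ("NP", "eagles", ("RC", "that", ("AUX", "can"), "fly")),
            ("AUX", "should"),
            ("VP", "swim"))

AUXES = {"can", "should"}

def linear_first_aux(words):
    """The linear rule: front the first auxiliary in the word string."""
    return next(w for w in words if w in AUXES)

def structurally_highest_aux(tree):
    """The structure-dependent rule: breadth-first search finds the matrix
    AUX before any auxiliary buried inside a relative clause."""
    queue = [tree]
    while queue:
        node = queue.pop(0)
        if isinstance(node, tuple):
            if node[0] == "AUX":
                return node[1]
            queue.extend(node[1:])

words = ["eagles", "that", "can", "fly", "should", "swim"]
print(linear_first_aux(words))             # 'can'    -> derives the bad (1)
print(structurally_highest_aux(sentence))  # 'should' -> derives the good (2)
```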
The second example concerns reflexivization (R). Is it an
operation that targets predicates and reduces their adicities by linking their
arguments, or is it a syntactic operation that relates nominal expressions? The
former treats R as ranging over predicates and their co-arguments. The latter
treats R as an operation that syntactically pairs nominal expressions regardless
of their argument status. The KD against
the predicate-centered approach is found in ECM constructions, where
non-co-arguments can be R related.
3. Mary expects herself to win
4. John believes himself to be untrustworthy
5. Mary wants herself to be elected president
In (3)-(5) the reflexive is anteceded by a non-co-argument.
So, ‘John’ is an argument of the higher predicate in (4), and ‘himself’ is an
argument of the lower predicate ‘be untrustworthy’ but not the higher predicate
‘believe.’ Assuming that reflexives in mono-clauses and those in examples like
(3)-(5) are licensed by the same rule, these examples provide KD that R is not an argument
changing (i.e. adicity lowering)[6]
operation but a rule defined over syntactic configurations that relates
nominals.[7]
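For concreteness, here is a toy sketch of the two conceptions of R applied to (3). The encoding is mine and deliberately crude (locality conditions are suppressed), but it shows why the ECM cases favor the configurational rule.

```python
# Toy versions of the two conceptions of R (encoding mine), applied to
# "Mary expects herself to win": 'Mary' is a theta-argument of 'expect',
# while 'herself' is a theta-argument only of the embedded 'win' (ECM).

coargs = {"expect": {"Mary"}, "win": {"herself"}}
c_command = {("Mary", "herself")}  # matrix subject c-commands ECM subject

def predicate_centered_R(antecedent, reflexive):
    """License R only if both relata are co-arguments of one predicate."""
    return any({antecedent, reflexive} <= args for args in coargs.values())

def configurational_R(antecedent, reflexive):
    """License R if the antecedent c-commands the reflexive
    (locality suppressed in this toy)."""
    return (antecedent, reflexive) in c_command

print(predicate_centered_R("Mary", "herself"))  # False: wrongly rules (3) out
print(configurational_R("Mary", "herself"))     # True:  correctly rules (3) in
```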
Here's a third, more recondite example that actually had the
consequence of eliminating one conception of empty categories (EC). In Concepts and Consequences (C&C)
Chomsky proposed a functional interpretation of ECs.
A brief advertisement before proceeding: C&C is a really
great book whose only vice is that its core idea is empirically untenable.
Aside from this, it is a classic and still well worth reading.
At any rate, C&C is a sustained investigation of
parasitic gap (PG) phenomena, and it proposes that there is no categorial
difference among the various flavors of ECs (A-trace vs A’-trace vs PRO). Rather, there
is only one EC, and the different flavors reflect relational properties of the
syntactic environment the EC is situated in. This allows for the possibility
that an EC can start out its life as a PRO and end its life as an A’-trace
without any rule directly applying to it. Rather, if something else moves and binds
the PRO, the EC that started out as a PRO will be interpreted as an A- or
A’-trace depending on what position the element it is related to occupies (the
EC is an A-trace if A-bound and an A’-trace if A’-bound). This forms the core of
the C&C analysis of PGs, and it has
the nice property of largely deriving the properties of PGs from more general
assumptions about binding theory combined with this functional interpretation
of ECs. To repeat, it is a very nice story. IMO, conceptually, it is far better than the later Barriers account, couched in terms of chain formation and null operators. Why? Because the Barriers account is largely a series of
stipulations on chain formation posited to “capture” the observed output. C&C provides a principled theory
but is wrong; Barriers provides an
account that covers the data but is unprincipled.
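A toy rendering of the functional interpretation may help fix ideas. The classifier below is my own simplification, not C&C's actual definitions: the point is just that an EC's flavor is read off its binder rather than being an intrinsic property.

```python
# A toy rendering (labels mine) of the functional interpretation of ECs:
# an EC has no intrinsic type; its flavor is read off its closest binder.

def ec_flavor(binder_position):
    """A-bound -> A-trace, A'-bound -> A'-trace, unbound -> PRO."""
    if binder_position == "A":
        return "A-trace"
    if binder_position == "A'":
        return "A'-trace"
    return "PRO"

# An EC can start life unbound (a 'PRO') and end up an A'-trace once a
# wh-phrase moves and binds it, with no rule applying to the EC itself:
print(ec_flavor(None))  # 'PRO'
print(ec_flavor("A'"))  # "A'-trace"
```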
How was C&C wrong?
Kayne provided the relevant KD.[8] He
showed that PGs, the ECs inside the adjuncts, are themselves subject to island
effects. Thus, though one can relate a PG inside an adjunct (which is an island)
to an argument outside the adjunct, the gap inside
the island is subject to standard island effects. So the EC inside the
adjunct cannot itself be inside another island. Here’s one example:
6. Which book did you review before admitting that Bill said that Sheila had read
7. *Which book did you review before finding someone that read
The functional definition of ECs implies that ECs that are PGs
should not be subject to island effects as they are not formed by movement.
This proved to be incorrect and the approach died. Killed by Kayne’s KD.
A fourth case: P-stranding and case connectedness effects in
ellipsis killed the interpretive theory of ellipsis and argued for the deletion
account. Once upon a time, the favored account of ellipsis was interpretive.[9] Gs
generated phrase markers without lexical terminals. Ellipsis was effectively
what one got with lexical insertion delayed to LF. It was subject to various
kinds of parallelism restrictions, with the non-elided antecedent serving to
provide the relevant terminals for insertion into the elided PM (i.e. the one
without terminals), the insertion being subject to recoverability and to the
requirement that it target positions parallel to those in the non-elided antecedent.
Figuratively, the LF of the antecedent was copied into the PM of the elided
dependent.
As is well-known by now, Jason Merchant provided KD against
this position, elaborating earlier (ignored?) arguments by Ross. The KD came in
two forms. First, that elided structures respect the same case marking
conventions apparent in non-elision constructions. Second, that preposition
stranding is permitted in ellipsis just in case it is allowed in cases of
movement without elision. In other words, it appears that but for the
phonology, elided phrases exhibit the same dependencies apparent in non-elided
derivations. The natural conclusion is that elision is derived by deleting
structure that is first generated in the standard way. So, the parallelism in
case and P-stranding profiles of elided and non-elided structures implies that
they share a common syntactic derivational core.[10]
This is just what the interpretive theory denies and the deletion theory endorses.
Hence the deletion theory has a natural account for the observed syntactic
parallelism that Merchant/Ross noted. And indeed, from what I can tell, the
common wisdom today is that ellipsis is effectively a deletion phenomenon.
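As a rough illustration of the P-stranding half of the argument, here is a sketch of the prediction the deletion theory makes. The helper names are mine and the language facts are reduced to a single toggle; this is a cartoon of Merchant's generalization, not his actual formulation.

```python
# A sketch (names mine) of the deletion theory's prediction: a bare DP
# sluice remnant is possible just in case the language strands
# prepositions under overt movement.

allows_p_stranding = {"English": True, "Greek": False}

def sluice_remnant_ok(language, remnant_is_bare_dp):
    """On the deletion theory, the ellipsis site has ordinary syntax, so a
    bare DP remnant requires overtly licensed P-stranding."""
    if remnant_is_bare_dp:
        return allows_p_stranding[language]
    return True  # a pied-piped PP remnant is fine either way

# 'Anna spoke with someone, but I don't know (with) who':
print(sluice_remnant_ok("English", remnant_is_bare_dp=True))  # True
print(sluice_remnant_ok("Greek", remnant_is_bare_dp=True))    # False: P required
print(sluice_remnant_ok("Greek", remnant_is_bare_dp=False))   # True
```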
It is worth observing, perhaps, that this conclusion also
has a kind of minimalist backing. Bare Phrase Structure (BPS) makes the
interpretive theory hard to state. Why? Because the interpretive theory relies
on a distinction between structure building and lexical insertion, and BPS does
not recognize this distinction. Thus, given BPS, it is unclear how to generate
structures without terminals. But as the interpretive theory relies on doing
just this, it would seem to be a grammatically impossible analysis in a BPS
framework. So, not only is the deletion theory of ellipsis the one we want
empirically, it also appears to be the one that conforms to minimalist
assumptions.
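To see why, consider a toy BPS fragment (my own illustration, not an official formulation): the only structure-builder is Merge over lexical items and prior Merge outputs, so a terminal-less phrase marker is simply underivable.

```python
# A toy BPS fragment (mine): every syntactic object bottoms out in
# lexical terminals, so there is nothing for later 'lexical insertion'
# to fill in.

LEXICON = {"John", "left"}

def merge(a, b):
    for x in (a, b):
        assert x in LEXICON or isinstance(x, frozenset), "no empty nodes"
    return frozenset([a, b])

clause = merge("John", "left")  # fine: built directly from terminals
# merge(None, None)             # AssertionError: no terminal-less structure
```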
Note that the virtue of KD is that it does not rely on
theoretical validation to be effective. Whether deletion theories are more
minimalistically acceptable than interpretive theories is an interesting issue.
But whether they are or aren’t does not affect the dispositive nature of the
KD wrt the proposals it adjudicates. This is one of the nice features of CEs
and KD: they stand relatively independent of particular theory and hence
provide a strong empirical check on theory construction. That’s why we like
them.
Fifth, and now I am going to be much more controversial:
inverse control and the PRO based theory of control. Polinsky and Potsdam
(2002) present cases of control in which “PRO” c-commands its antecedent. This,
strictly speaking, should be impossible, for such binding violates principle C.
However, the sentences are licit with a control interpretation. Other examples of
inverse control have since been argued to exist in various other languages. If inverse control exists, it is a KD
for any PRO based conception of control. As all but the movement theory of
control (MTC) are PRO based
conceptions of control, if inverse control obtains then the MTC is the only
theory left standing. Moreover, as Polinsky and Potsdam have since argued, that
inverse control exists makes perfect sense in the context of a copy theory of
movement if one allows top copies to be PF deleted. Indeed, as argued here,
the MTC is what one expects in the context of a theory that eschews D-structure
and adopts the least encumbered theory of merge. But all of this is irrelevant
as regards the KD status of inverse control. Whether or not the MTC is right
(which, of course, it is), inverse control effects present KD against PRO based
accounts of control given standard assumptions about principle C.
That’s it. Five examples. I am sure there are more. Send in
your favorite. These are very useful to have on hand for they are part of what
makes a research program progressive. CEs and KDs mark the intellectual
progress of a discipline. They establish boundary conditions on adequate
further theorizing. I am no great fan of empirics. The data does not do much
for me. But I am an avid consumer of CEs and KDs. They are, in their own
interesting ways, tributes to how far we’ve come in our understanding and so
should be cherished.
[1]
Note the modifier ‘deeply.’ Here’s an interesting question that I have no clean
answer for: what makes one flaw deep and another a mere flesh wound? One mark
of a deep flaw is that it butts up against a bedrock principle of the theory
under investigation. So, for example, Galileo’s discovery was hard to reconcile
with the Ptolemaic system unless one assumed that the phases of Venus were
unlike any other phases seen at the time. There was no set of
calculations consistent with those most generally in use that could get you
the observed effects. Similarly for the Michelson-Morley data. Reconciling
the theory with these observations required fundamental changes to other
basic assumptions. Most data are not
like this. They can be reconciled by adding further (possibly ad hoc)
assumptions or massaging some principles in new ways. But butting up against a
fundamental principle is not that common. That’s why CEs and KD are interesting
and worth looking for.
[2]
The term “killer data” is found in a great new book on the rise of modern
science by David Wootton (here).
He argues that the existence of KD is a crucial ingredient in the emergence of
modern science. It’s a really terrific book for those of you interested in
these kinds of issues. The basic argument is that there really was a
distinction in kind between what came after the scientific revolution and its
precursors. The chapter on how perspective in painting fueled the realistic
interpretation of abstract geometry as applied to the real world is worth the
price of the book all by itself.
[3]
In this, my list fails to have one property that Wootton highlighted. KDs as a
matter of historical fact are widely accepted and pretty quickly too. Not all
my candidate KDs have been as successful (tant pis), hence the bracketed
qualifying modal.
[4]
Please note the conditional: the KD shows that transformations are not linearly sensitive. This presupposes that
Y/N questions are transformationally derived. Syntactic Structures argued for a transformational analysis of Aux
fronting. A good analysis of the reasons for this is provided in Lasnik’s
excellent book (here). What is important to note is that data can
become KD only given a set of background assumptions. This is not a weakness.
[5]
This raises another question that Chomsky has usefully pressed: why don’t G operations exploit the
string properties of phrase markers? His answer is that PMs don’t have string
properties as they are sets and sets impose no linear order on their elements.
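A two-line illustration of the point (toy code, mine): if Merge outputs sets, linear order is simply not represented in the object built, so no rule could refer to it.

```python
# Merge as set formation leaves no linear order for a rule to exploit:
# combining in either order yields the very same object.
def merge(a, b):
    return frozenset([a, b])

print(merge("can", merge("eagles", "fly")) ==
      merge(merge("fly", "eagles"), "can"))  # True: order is not represented
```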
[6]
Note: that R relates nominals does not imply that it cannot have the semantic
reflex of lowering the adicity of a predicate. So, R applies to John hugged himself to relate the
reflexive and John. This might reduce
the adicity of hug from 2-place to
1-place. But this is an effect of the rule, not a condition of the rule. The
rule couldn’t care less whether the relata are co-arguments.
[7]
There are some theories that obscure this conclusion by distinguishing between
semantic and syntactic predicates. Such theories acknowledge the point made
here in their terminology. R is not an adicity changing operation, though in
some cases it might have the effect of changing predicate adicity (see note 6).
This, btw, is one of my
favorite KDs. Why? Because it makes sense in a minimalist setting. Say R is a
rule of G. Then it cannot be an adicity changing operation,
for this would be a clear violation of Inclusiveness (which, recall, requires
preserving the integrity of the atoms in the course of a derivation and nothing
violates the integrity of a lexical item more than changing its argument
structure). Thus, in a minimalist setting, the first view of R seems ruled out.
We can, as usual, go further.
We can provide a deeper explanation for this instance of Inclusiveness and
propose that adicity changing rules cannot be stated given the right
conception of syntactic atoms (this parallels how thinking of Merge as
outputting sets thereby makes impossible rules that exploit linear dependencies
among the atoms (see note 5)). How might we do this? By assuming that
predicates have at most one argument (i.e. they are 1-place predicates). This
is to effectively endorse a strong neo-Davidsonian conception of predicates in
which all predicates are 1-place predicates of events and all “arguments” are
syntactic dependents (see e.g. Pietroski here
for discussion). If this is correct, then there can be no adicity changing
operations grammatically identifying co-arguments of a predicate, as predicates
have no co-arguments. Ergo, R is the only kind of rule a G can have.
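On this conception, John hugged himself would receive something like the following standard neo-Davidsonian logical form (the thematic labels are the usual illustrative ones):

```latex
\exists e\, [\, \mathrm{hugging}(e) \wedge \mathrm{Agent}(e, \mathrm{John}) \wedge \mathrm{Patient}(e, \mathrm{John}) \,]
```

Here hug is a 1-place predicate of events and “John” enters twice via thematic relations; there is no 2-place hug(John, John) whose adicity an operation could lower.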
[8]
If memory serves, he showed this in his Connectedness book.
[9]
Edwin Williams developed this theory. Ivan Sag argued for a deletion theory.
Empirically, the two were hard to pull apart. However, in the context of GB, Williams
argued that the interpretive theory was more natural. I think he had a point.
[10]
For what it is worth, I have always found the P-stranding facts to be the more
compelling. The reason is that all agree that at LF P-stranding is required. Thus the LF of To whom did you speak? involves abstracting over an individual, not
a PP type. In other words, the right LF involves reconstructing the P and
abstracting over the DP complement; something like (i), not (ii):
(i) Who1 [you speak to x1]
(ii) [To who]1 [you speak x1]
An answer to the question given something like (i) is
‘Fred.’ An answer to (ii) could be ‘about Harry.’ It is clear that at LF we
want structure like (i) and not (ii). Thus, at LF the right structure in every language necessarily involves
P-stranding, even if the language disallows P-stranding syntactically. This is
KD for theories that license ellipsis at LF via interpretation rather than via
movement plus deletion.
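A hedged typed rendering of the contrast (the notation is mine, not the post’s):

```latex
\textrm{(i)}\;\; \lambda x\,.\ \mathrm{speak}(\mathrm{you}, \mathrm{to}\ x)
\qquad
\textrm{(ii)}\;\; \lambda P\,.\ \mathrm{speak}(\mathrm{you}, P)
```

In (i) the abstract is over individuals, so an individual-denoting answer like ‘Fred’ saturates it; in (ii) the abstract is over PP-type meanings, so any PP, ‘about Harry’ included, could.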