Faculty of Language: Crucial experiments and killer data

Monday, May 30, 2016

Crucial experiments and killer data

In the real sciences, theoretical debate often comes to an end (or at least severely changes direction) when a crucial experiment (CE) ends it. How do CEs do this? They uncover decisive data (aka “killer data” (KD)) that, if accurate, show that one possible live approach to a problem is empirically deeply flawed.[1] These experiments and their attendant KD become part of the core ideology and serve to eliminate initially plausible explanations from the class of empirically admissible ones.[2]

Here are some illustrative examples of CE: the Michelson-Morley experiment (which did in the ether and ushered in special relativity (here)), the Rutherford Gold Foil experiment that ushered in the modern theory of the atom (here), the recent LIGO experiment that established the reality of gravitational waves (here), the Franklin X-ray diffraction pix that established the helical structure of DNA (here), the Aspect and Kwiat experiments that signaled the end of hidden variable theories (here) and (one from Wootton) Galileo’s discovery of the phases of Venus which ended the Ptolemaic geocentric universe. All of these are deservedly famous for ending one era of theoretical speculation and initiating another. In the real sciences, there are many of these and they are one excellent indicator that a domain of inquiry has passed from intelligent speculation (often lavishly empirically titivated) to real science. Why? Because only relatively well-developed domains of inquiry are sufficiently structured to allow an experiment to be crucial. To put this another way: crucial experiments must tightly control for wiggle room, and this demands both a broad, well-developed empirical basis and a relatively tight theoretical setting. Thus, if a domain has such, it signals its scientific bona fides.

In what follows, I’d like to offer some KDs in syntax, phenomena that, IMO, rightly terminated (or should, if they are accurate) some perfectly plausible lines of investigation. The list is not meant to be exhaustive, nor is it intended to be uncontroversial.[3] I welcome dissent and additions. I offer five examples.

First, and most famously, polar questions and structure dependence. The argument and effect are well known (see here for one elaborate discussion). But to quickly review, we have an observation about how polar questions are formed in English (Gs “move” an auxiliary to the front of the clause). Any auxiliary? Nope, the one “closest” to the front. How is proximity measured? Well, not linearly. How do we know? Because of (i) the unacceptability of sentences like (1) (which should be well formed if distance were measured linearly) and (ii) the acceptability of those like (2) (which should be acceptable if distance is measured hierarchically).

1. *Can eagles that ~~can~~ fly should swim?

2. Should eagles that can fly ~~should~~ swim?

The conclusion is clear: if polar questions are formed by movement, then the relevant movement rule ignores linear proximity in choosing the right auxiliary to move.[4] Note, as explained in the above linked-to post, the result is a negative one. The KD here establishes that G rules forsake linear information. It does not specify the kind of hierarchical information it is sensitive to. Still, the classical argument puts to rest the idea that Gs manipulate phrase markers in terms of their string properties.[5]
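The contrast between (1) and (2) can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than part of the argument: the nested-tuple tree for "eagles that can fly should swim", the labels, and the use of embedding depth as a stand-in for hierarchical proximity (the KD itself only shows that linear order is the wrong measure; it does not say which hierarchical measure is right).

```python
# Toy phrase marker: (label, child, child, ...); leaves are plain strings.
# The labels and bracketing are hypothetical simplifications.
TREE = ("S",
        ("NP", "eagles", ("RC", "that", ("Aux", "can"), ("VP", "fly"))),
        ("Aux", "should"),
        ("VP", "swim"))

def flatten(node, depth=0):
    """Return (word, depth, is_aux) triples in left-to-right order."""
    label, *children = node
    out = []
    for child in children:
        if isinstance(child, str):
            out.append((child, depth, label == "Aux"))
        else:
            out.extend(flatten(child, depth + 1))
    return out

def linear_pick(tokens, aux_indices):
    """Linear rule: choose the leftmost auxiliary in the string."""
    return min(aux_indices)

def hierarchical_pick(tokens, aux_indices):
    """Hierarchical rule (one possible version): choose the least embedded auxiliary."""
    return min(aux_indices, key=lambda i: tokens[i][1])

def form_question(tree, pick):
    """Front the auxiliary selected by `pick` and return the resulting string."""
    tokens = flatten(tree)
    aux_indices = [i for i, (_, _, is_aux) in enumerate(tokens) if is_aux]
    words = [word for word, _, _ in tokens]
    fronted = words.pop(pick(tokens, aux_indices))
    return " ".join([fronted.capitalize()] + words) + "?"

print(form_question(TREE, linear_pick))        # the unacceptable pattern in (1)
print(form_question(TREE, hierarchical_pick))  # the acceptable pattern in (2)
```

The linear rule fronts "can" because it comes first in the string; the depth-based rule fronts "should" because it is the auxiliary of the matrix clause. Only the second matches the judgments.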

The second example concerns reflexivization (R). Is it an operation that targets predicates and reduces their addicities by linking their arguments or is it a syntactic operation that relates nominal expressions? The former treats R as ranging over predicates and their co-arguments. The latter treats R as an operation that syntactically pairs nominal expressions regardless of their argument status. The KD against the predicate centered approach is found in ECM constructions where non co-arguments can be R related.

3. Mary expects herself to win

4. John believes himself to be untrustworthy

5. Mary wants herself to be elected president

In (3)-(5) the reflexive is anteceded by a non-co-argument. So, ‘John’ is an argument of the higher predicate in (4), and ‘himself’ is an argument of the lower predicate ‘be untrustworthy’ but not the higher predicate ‘believe.’ Assuming that reflexives in mono-clauses and those in examples like (3)-(5) are licensed by the same rule, this provides KD that R is not an argument changing (i.e. addicity lowering)[6] operation but a rule defined over syntactic configurations that relates nominals.[7]

Here's a third more recondite example that actually had the consequence of eliminating one conception of empty categories (EC). In Concepts and Consequences (C&C) Chomsky proposed a functional interpretation of ECs.

A brief advertisement before proceeding: C&C is a really great book whose only vice is that its core idea is empirically untenable. Aside from this, it is a classic and still well worth reading.

At any rate, C&C is a sustained investigation of parasitic gap (PG) phenomena and it proposes that there is no categorical difference among the various flavors of traces (A vs A’ vs PRO). Rather there is only one EC and the different flavors reflect relational properties of the syntactic environment the EC is situated in. This allows for the possibility that an EC can start out its life as a PRO and end its life as an A’-trace without any rule directly applying to it. Rather, if something else moves and binds the PRO, the EC that started out as a PRO will be interpreted as an A or A’-trace depending on what position the element it is related to occupies (the EC is an A-trace if A-bound and an A’-trace if A’-bound). This forms the core of the C&C analysis of PGs, and it has the nice property of largely deriving the properties of PGs from more general assumptions about binding theory combined with this functional interpretation of ECs. To repeat, it is a very nice story. IMO, conceptually, it is far better than the Barriers account in terms of chain formation and 0-operators which came after C&C. Why? Because the Barriers account is largely a series of stipulations on chain formation posited to “capture” the observed output. C&C provides a principled theory but is wrong and Barriers provides an account that covers the data but is unprincipled.
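The functional idea can be pictured with a deliberately crude sketch: the EC carries no type of its own, and its "flavor" is computed from what binds it. The class, field, and function names below are hypothetical, and treating an EC with no moved binder as PRO is a simplifying assumption made for illustration, not a rendering of the full C&C definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EmptyCategory:
    # Position of a moved element that binds this EC, if any:
    # "A", "A-bar", or None. The EC stores no intrinsic type.
    binder_position: Optional[str] = None

def interpret(ec: EmptyCategory) -> str:
    """Read the EC's flavor off its syntactic environment."""
    if ec.binder_position is None:
        return "PRO"          # simplifying assumption: no moved binder
    if ec.binder_position == "A":
        return "A-trace"
    if ec.binder_position == "A-bar":
        return "A'-trace"
    raise ValueError(f"unknown binder position: {ec.binder_position!r}")

ec = EmptyCategory()
print(interpret(ec))          # starts life as PRO

# Something else moves to an A'-position and binds the EC.
# No rule applies to the EC itself; only its environment changes.
ec.binder_position = "A-bar"
print(interpret(ec))          # now read as an A'-trace
```

The point of the sketch is that nothing moves the EC itself. On this picture it is unclear why such an EC should show island effects, which is what makes the data in the next paragraph decisive.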

How was C&C wrong? Kayne provided the relevant KD.[8] He showed that PGs, the ECs inside the adjuncts, are themselves subject to island effects. Thus, though one can relate a PG inside an adjunct (which is an island) to an argument outside the adjunct, the gap inside the island is subject to standard island effects. So the EC inside the adjunct cannot itself be inside another island. Here’s one example:

6. Which book did you review before admitting that Bill said that Sheila had read

7. *Which book did you review before finding someone that read

The functional definition of ECs implies that ECs that are PGs should not be subject to island effects as they are not formed by movement. This proved to be incorrect and the approach died. Killed by Kayne’s KD.

A fourth case: P-stranding and case connectedness effects in ellipsis killed the interpretive theory of ellipsis and argued for the deletion account. Once upon a time, the favored account of ellipsis was interpretive.[9] Gs generated phrase markers without lexical terminals. Ellipsis was effectively what one got with lexical insertion delayed to LF. It was subject to various kinds of parallelism restrictions, with the non-elided antecedent serving to provide the relevant terminals for insertion into the elided PM (i.e. the one without terminals). The insertion was subject to recoverability and to the requirement that it target positions parallel to those in the non-elided antecedent. Figuratively, the LF of the antecedent was copied into the PM of the elided dependent.

As is well-known by now, Jason Merchant provided KD against this position, elaborating earlier (ignored?) arguments by Ross. The KD came in two forms. First, that elided structures respect the same case marking conventions apparent in non-elision constructions. Second, that preposition stranding is permitted in ellipsis just in case it is allowed in cases of movement without elision. In other words, it appears that but for the phonology, elided phrases exhibit the same dependencies apparent in non-elided derivations. The natural conclusion is that elision is derived by deleting structure that is first generated in the standard way. So, the parallelism in case and P-stranding profiles of elided and non-elided structures implies that they share a common syntactic derivational core.[10] This is just what the interpretive theory denies and the deletion theory endorses. Hence the deletion theory has a natural account for the observed syntactic parallelism that Merchant/Ross noted. And indeed, from what I can tell, the common wisdom today is that ellipsis is effectively a deletion phenomenon.

It is worth observing, perhaps, that this conclusion also has a kind of minimalist backing. Bare Phrase Structure (BPS) makes the interpretive theory hard to state. Why? Because the interpretive theory relies on a distinction between structure building and lexical insertion, and BPS does not recognize this distinction. Thus, given BPS, it is unclear how to generate structures without terminals. But as the interpretive theory relies on doing just this, it would seem to be a grammatically impossible analysis in a BPS framework. So, not only is the deletion theory of ellipsis the one we want empirically, it also appears to be the one that conforms to minimalist assumptions.

Note that the virtue of KD is that it does not rely on theoretical validation to be effective. Whether deletion theories are more minimalistically acceptable than interpretive theories is an interesting issue. But whether they are or aren’t does not affect the dispositive nature of KD wrt the proposals it adjudicates. This is one of the nice features of CEs and KD: they stand relatively independent of particular theory and hence provide a strong empirical check on theory construction. That’s why we like them.

Fifth, and now I am going to be much more controversial: inverse control and the PRO based theory of control. Polinsky and Potsdam (2002) presents cases of control in which “PRO” c-commands its antecedent. This, strictly speaking, should be impossible for such binding violates principle C. However, the sentences are licit with a control interpretation. Other examples of inverse control have since been argued to exist in various other languages. If inverse control exists, it is a KD for any PRO based conception of control. As all but the movement theory of control (MTC) are PRO based conceptions of control, if inverse control obtains then the MTC is the only theory left standing. Moreover, as Polinsky and Potsdam have argued since, that inverse control exists makes perfect sense in the context of a copy theory of movement if one allows top copies to be PF deleted. Indeed, as argued here the MTC is what one expects in the context of a theory that eschews D-structure and adopts the least encumbered theory of merge. But all of this is irrelevant as regards the KD status of inverse control. Whether or not the MTC is right (which it, of course, is) inverse control effects present KD against PRO based accounts of control given standard assumptions about principle C.

That’s it. Five examples. I am sure there are more. Send in your favorite. These are very useful to have on hand for they are part of what makes a research program progressive. CEs and KDs mark the intellectual progress of a discipline. They establish boundary conditions on adequate further theorizing. I am no great fan of empirics. The data does not do much for me. But I am an avid consumer of CEs and KDs. They are, in their own interesting ways, tributes to how far we’ve come in our understanding and so should be cherished.

[1] Note the modifier ‘deeply.’ Here’s an interesting question that I have no clean answer for: what makes one flaw deep and another a mere flesh wound? One mark of a deep flaw is that it butts up against a bedrock principle of the theory under investigation. So, for example, Galileo’s discovery was hard to reconcile with the Ptolemaic system unless one assumed that the phases of Venus were unlike any other of the phases seen at the time. There was no set of calculations that could get you the observed effects that were consistent with those most generally in use. Similarly for the Michelson-Morley data. To reconcile these with the observations required fundamental changes to other basic assumptions. Most data are not like this. They can be reconciled by adding further (possibly ad hoc) assumptions or massaging some principles in new ways. But butting up against a fundamental principle is not that common. That’s why CEs and KD are interesting and worth looking for.

[2] The term “killer data” is found in a great new book on the rise of modern science by David Wootton (here). He argues that the existence of KD is a crucial ingredient in the emergence of modern science. It’s a really terrific book for those of you interested in these kinds of issues. The basic argument is that there really was a distinction in kind between what came after the scientific revolution and its precursors. The chapter on how perspective in painting fueled the realistic interpretation of abstract geometry as applied to the real world is worth the price of the book all by itself.

[3] In this, my list fails to have one property that Wootton highlighted. KDs as a matter of historical fact are widely accepted and pretty quickly too. Not all my candidate KDs have been as successful (tant pis), hence the bracketed qualifying modal.

[4] Please note the conditional: the KD shows that transformations are not linearly sensitive. This presupposes that Y/N questions are transformationally derived. Syntactic Structures argued for a transformational analysis of Aux fronting. A good analysis of the reasons for this is provided in Lasnik’s excellent book (here). What is important to note is that data can become KD only given a set of background assumptions. This is not a weakness.

[5] This raises another question that Chomsky has usefully pressed: why don’t G operations exploit the string properties of phrase markers? His answer is that PMs don’t have string properties as they are sets and sets impose no linear order on their elements.

[6] Note: that R relates nominals does not imply that it cannot have the semantic reflex of lowering the addicity of a predicate. So, R applies to John hugged himself to relate the reflexive and John. This might reduce the addicity of hug from 2-place to 1-place. But this is an effect of the rule, not a condition of the rule. The rule could care less whether the relata are co-arguments.

[7] There are some theories that obscure this conclusion by distinguishing between semantic and syntactic predicates. Such theories acknowledge the point made here in their terminology. R is not an addicity changing operation, though in some cases it might have the effects of changing predicate addicity (see note 6).

This, btw, is one of my favorite KDs. Why? Because it makes sense in a minimalist setting. Say R is a rule of G. Then given Inclusiveness it cannot be an addicity changing operation for this would be a clear violation of Inclusiveness (which, recall, requires preserving the integrity of the atoms in the course of a derivation and nothing violates the integrity of a lexical item more than changing its argument structure). Thus, in a minimalist setting, the first view of R seems ruled out.

We can, as usual, go further. We can provide a deeper explanation for this instance of Inclusiveness and propose that addicity changing rules cannot be stated given the right conception of syntactic atoms (this is parallel to how thinking of Merge as outputting sets thereby makes impossible rules that exploit linear dependencies among the atoms (see note 5)). How might we do this? By assuming that predicates have at most one argument (i.e. they are 1-place predicates). This is to effectively endorse a strong neo-Davidsonian conception of predicates in which all predicates are 1-place predicates of events and all “arguments” are syntactic dependents (see e.g. Pietroski here for discussion). If this is correct, then there can be no addicity changing operations grammatically identifying co-arguments of a predicate, as predicates have no co-arguments. Ergo, R is the only kind of rule a G can have.

[8] If memory serves, I think that he showed this in his Connectedness book.

[9] Edwin Williams developed this theory. Ivan Sag argued for a deletion theory. Empirically the two were hard to pull apart. However in the context of GB, Williams argued that the interpretive theory was more natural. I think he had a point.

[10] For what it is worth, I have always found the P-stranding facts to be the more compelling. The reason is that all agree that at LF P-stranding is required. Thus the LF of To whom did you speak? involves abstracting over an individual, not a PP type. In other words, the right LF involves reconstructing the P and abstracting over the DP complement; something like (i), not (ii):

(i) Who₁ [you talk to x₁]

(ii) [To who]₁ [you talk x₁]

An answer to the question given something like (i) is ‘Fred.’ An answer to (ii) could be ‘about Harry.’ It is clear that at LF we want structure like (i) and not (ii). Thus, at LF the right structure in every language necessarily involves P-stranding, even if the language disallows P-stranding syntactically. This is KD for theories that license ellipsis at LF via interpretation rather than via movement plus deletion.

21 comments:

Omer, May 30, 2016 at 8:19 AM
Ooh, I like this game. Here are some of my favorites:

• raising-to-ERG in Basque (Artiagoitia 2001, in Herschensohn et al. (eds.); Rezac, Albizu & Etxepare 2014, NLLT): KD for the inherent case theory of ergative case

• multiple heads, in different clauses, all agreeing with the same argument in all available phi features (Polinsky & Potsdam 2001, NLLT): KD for the Activity Condition

• ABS case in Basque in structures where agreement has been visibly disrupted (Etxepare 2006, ASJU (the International Journal of Basque Linguistics and Philology)): KD for the idea that structural case is assigned by agreement

• A-movement of inherent-case marked noun phrases in Icelandic (Zaenen, Maling, & Thráinsson 1985, NLLT; among many others): KD for the idea that raising in Germanic is case-driven [[[NB: Like the Ross/Merchant stuff cited in the body of Norbert's post, this is a good example of how certain ideas -- in this instance, raising being case-driven -- continued to pervade the field long after their empirical expiration date had passed.]]]

• sensitivity of the Person Case Constraint (PCC) to the relative hierarchical organization of the ABS and DAT arguments in 2-place unaccusatives in Basque (Albizu 1997, ASJU; Rezac 2008, NLLT): KD for the idea that the PCC is a morphological filter

• the distribution of ACC in Sakha (Baker & Vinokurova 2010, NLLT): KD for the idea that accusative case is generally assigned by a head (e.g. v)

Alex Clark, May 30, 2016 at 9:05 AM
Huybregts and Shieber's work on Dutch and Swiss German that was the KD for GPSG, and the idea that languages were even weakly context-free.

(And Piraha surely is KD for something but I am not sure what ...)

Anonymous, May 30, 2016 at 11:41 PM
Tangkic languages and the idea that agreement depends on local spec-head relationships (alive in Baker 2008, apparently finally abandoned now).

Unknown, May 31, 2016 at 3:55 AM
I'm not sure about number 2, even after reading footnotes 6 and 7. What about versions of categorial grammar in which "expects __ to win" is a possible constituent, which gets its arity reduced by "herself"?

davidadger, May 31, 2016 at 5:16 AM
I've also always been fond of the C&C view as opposed to the horror that is barriers. I'm not sure Kayne's data is KD for the idea of a functional definition of ECs though (as opposed to the particular execution of that idea that Chomsky offers in C&C). You can imagine, for example, that movement and islands are not so tightly linked (if, say, movement has to be preceded by Agree, and Agree is constrained by locality, as Gillian and I said in our 2005 LI paper), and that might allow you to say that PGC's involve a Comp-->EC relation with no movement, where the nature of the tail of the constructed Agree-chain is determined by the value that Comp gives it plus its locally determined case properties. I quite like that idea as it allows you to reduce the non-overtness of PGCs to a general capacity for licensed pronouns to be non-overt.

Unknown, June 2, 2016 at 1:13 PM
I have some KD for the MTC:

https://www.linguistics.ruhr-uni-bochum.de/~kiss/publications/stc.pdf

https://www.uni-salzburg.at/fileadmin/multimedia/Linguistik/Control_does_not_involve_movement-revised_01.pdf

I find this whole discussion a bit hypocritical because advocates of the MTC like Hornstein have been ignoring the crucial data for years.

Unknown, June 2, 2016 at 3:25 PM
Dear Norbert,

sorry, I wasn’t aware that you knew this Hornstein guy. ;-)

I apologize for the harsh tone of my comment. I mean it. It is way too late for me to write commentaries. It is only that (in addition to me being physically tired) I’m a bit tired of this whole discussion about MTC and backward control, which has been going on for a while. As a syntactician working on German, I still find the arguments put forward by the MTC camp not very convincing, because they obviously make the wrong predictions for German (see e.g. the paper by Haider I linked). I'm fully aware that you weren’t writing about the MTC in general in your post, but let me rephrase what was going on in my mind:

Do you consider the empirical evidence from German (but also from other languages) put forward by Haider and Kiss against the MTC as KD? After all, the paper by Tibor Kiss has been around for a while, and I think you - I mean Hornstein - even quotes it in his 2010 book.

What would be the CE for the MTC?

These are not rhetorical questions, i.e. I'd be really interested in your answer.

Best,

Oliver

Alex Drummond, June 3, 2016 at 4:27 AM
@Oliver. I'm wary of starting an MTC debate here, but the Haider paper is a little frustrating in that it doesn't reference a lot of the MTC literature that's attempted to deal with some of the problems it raises. For example, with regard to section 3, non-subject-oriented OC into adjuncts is discussed on p. 98 of Move!. And surely at least some of the discussion of passives in section 5.2 of the Boeckx, Hornstein and Nunes monograph has got to be relevant to section 2, but the discussion there is never mentioned.

Gary, June 4, 2016 at 2:41 AM
Doesn't this discussion indicate that syntax is for the most part too young a science for KD/CE? I mean I can imagine the same kind of to and fro emerging on case 4 (theories of ellipsis) if, say, Mark Steedman started chipping in (eg deletion advocates like myself have argued that case matching needs to be stipulated as a condition on remnants and not silent structure, and that stipulation can be bought by DI theories; p-stranding could prob be treated similarly, given that the theory of p-stranding is still up for grabs). Very local claims like the ones Omer cited above are probably good cases of KD, although even with those I feel like there might be someone lurking with a bunch of null operators/projections which they can argue for the existence of if cornered.

Alan, June 7, 2016 at 8:41 PM
KD for the claim that the CSC is not a constraint on movement: in languages with productive resumptive pronouns, a RP can save any island except a CSC island. I think the initial observation goes back to Grant Goodall using data from Carol Georgopoulos' 1985 paper "Variables in Palauan Syntax".

Norbert, June 15, 2016 at 7:36 AM
Good points. I agree with your first point completely. As you know, since Larson and Reinhart there has been an effort to do away with linear effect in G. I believe that the field has largely concluded that this effort has been successful. I have no opinion on this myself, but it does seem true that the effects of linearity are mainly at the edges (but this is a value judgment). That said, your point is right on.

Your second point is also well put. IMO, the English data are pretty conclusive wrt "real" reflexivity. The ECM cases don't look like logophors or emphatic reflexives. If these are indeed real reflexives then the co-argument theory seems to me dead. That does not imply that things are entirely clear even given this. I agree that there are various kinds of anaphoric expressions (and maybe, many kinds of anaphoric relations), but thinking that local reflexives are local in virtue of being co-arguments seems to me a non-starter.