Friday, September 28, 2018

Linguistic experiments

How often do we test our theories and basic concepts in linguistics? I don’t know for sure, but my hunch is that it is not that often. Let me explain.

One of the big ideas in the empirical sciences is the notion of the crucial experiment (or “experimentum crucis” (EC) for those of you who prefer “ceteris paribus” to “all things being equal”; psst, I am one of those, so it is ‘EC’ from now on) (see here). What is an EC? Wikipedia says the following:

In the sciences, an experimentum crucis (English: crucial experiment or critical experiment) is an experiment capable of decisively determining whether or not a particular hypothesis or theory is superior to all other hypotheses or theories whose acceptance is currently widespread in the scientific community. In particular, such an experiment must typically be able to produce a result that rules out all other hypotheses or theories if true, thereby demonstrating that under the conditions of the experiment (i.e., under the same external circumstances and for the same "input variables" within the experiment), those hypotheses and theories are proven false but the experimenter's hypothesis is not ruled out.

The most famous experiments in the sciences (e.g. Michelson-Morley on Special Relativity, Eddington’s on General Relativity, Aspect on Bell’s inequality) are ECs, including those that were likely never conducted (e.g. Galileo’s dropping things from the tower). What makes them critical is that they are able to isolate a central feature of a theory or a basic concept for test in a local environment where it is possible to control for the possible factors. We all know (or we all should know) that it is very hard to test an interesting theoretical claim directly.[1] As the quote above notes, the test critically relies on carefully specifying the “conditions of the experiment” so as to be able to isolate the principle of interest enough for an up or down experimental test.

What happens in such an experiment? Well, we set up ancillary assumptions that are well grounded enough to allow the experiment to focus on the relevant feature up for test. In particular, if the ancillary assumptions are sufficiently well grounded in the experimental situation then the proposition up for test will be the link in the deductive structure of the set up that is most exposed by the test. 

Ancillary assumptions are themselves empirical and hence contestable. That is why ECs are so tough to dream up: to be effective these ancillary assumptions must, in the context of the experimental set up, be stronger than the theoretical item they are being used to test. If they are weaker than the proposition to be tested then the EC cannot decisively test that proposition. Why? Well, the ancillary assumption(s) will be weaker links in the chain of experimental reasoning and an experimental result can always be correctly causally attributed to the weaker ancillary assumptions. This will spare the theoretical principle or concept of interest from direct exposure to the test. However, and this is the important thing, it is possible in a given context to marshal enough useful ancillary assumptions that are better grounded in that context than the proposition to be tested. And when this is possible the conditions for an EC are born.
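
For those who like to see the skeleton, here is one rough way to write down the logic just described. The notation is mine and purely illustrative: H stands for the principle up for test, A_1 through A_n for the ancillary assumptions, and O for the observation the package predicts. Treat it as a sketch of the reasoning, not a formal account of confirmation.

```latex
% Illustrative notation only (not from any cited source); requires amsmath.
% H = principle under test, A_1..A_n = ancillary assumptions, O = predicted observation.
\begin{align*}
\text{Set up:}\quad  & (H \land A_1 \land \dots \land A_n) \rightarrow O \\
\text{Result:}\quad  & \lnot O \\
\text{Hence:}\quad   & \lnot H \lor \lnot A_1 \lor \dots \lor \lnot A_n \\
\text{EC case:}\quad & \text{each } A_i \text{ is better grounded than } H
                       \text{ in this context, so } \lnot H \text{ is the most plausible conclusion}
\end{align*}
```

The third line is why weak ancillary assumptions sink an EC: a failed prediction only tells us that something in the conjunction is false, and the blame lands on H only when every A_i is the stronger link.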

As I noted, I am not sure that we linguists do much ECing. Yes, we argue for and against hypotheses and marshal data to those ends, but it is rare that we set things up to manufacture a stable EC. Here is what I mean.

A large part of linguistic work aims less to test a hypothesis than to apply it (and thereby to possibly (note: this is a possibility, not a necessity) refine it). For example, say I decide to work on a certain construction C in a certain language L. Say C has some focus properties, namely when the expression appears in a designated position distinct from its “base” position it bears a focus interpretation. I then analyze the mechanisms underlying this positioning. I use movement theory to triangulate on the kind of operation that might be involved. I test this assumption by seeing if it meets the strictures of Subjacency Theory (allows unbounded dependencies yet obeys islands) and if it does, I conclude it is movement. I then proceed to describe some of the finer points of the construction given that it is an A’-movement operation. This might force a refinement of the notion of movement, or island, or phase to capture all the data, but the empirical procedure presupposes that the theory we entered the investigation with is on the right track though possibly in need of refinement within the grammar of L. The empirical investigation’s primary interest is in describing C in L and in service of this it will refine/revise/repurpose (some) principles of FL/UG.

This sort of work, no matter how creative and interesting, is unlikely to lead to an EC of the principles of FL/UG precisely because of its exploratory nature. The principles are more robust than the ancillary assumptions we will make to fit the facts. And if this is so, we cannot use the description to evaluate the basic principles. Quite the contrary. So, this kind of work, which I believe describes a fair chunk of what gets done, will not generally serve EC ends.

There is a second impediment to ECs in linguistics. More often than not the principles are too gauzy to be pinned down for direct test. Take for example the notion of “identity” or “recoverability.” Both are key concepts in the study of ellipsis, but, so far as I can tell, we are not quite sure how to specify them. Or maybe a more accurate claim would be that we have many, many specifications. Is it exact syntactic identity? Or identity as non-distinctness? Or propositional (semantic) identity? Identity of what object at what level? We all know that something like identity is critical, but it has proven to be very hard to specify exactly what notion is relevant. And of course, because of this, it is hard to generate ECs to test these notions. Let me repeat: the hallmark of a good EC is its deductive tightness. In the experimental situation the experimental premises are tight enough and grounded enough to focus attention on the principle/concept of interest. Good ECs are very tight deductive packages. So constructing effective ones is hard and this is why, I believe, there are not many ECs in linguistics.

But this is not always so, IMO. Here are some example ECs that have convinced me.

First: It is pretty clear that we cannot treat case as a byproduct of agreement. What’s the EC?[2] Well, one that I like involves the Anaphor Agreement Effect (AAE). Woolford (refining Rizzi) observed that reflexives cannot sit in positions where they would have to value agreement features on a head. The absence of nominative reflexives in languages like English illustrates this. The problem with them is not that they are nominatively case marked, but that they must value the un-valued phi features of T0 and they cannot do this. So, AAE becomes an excellent phi-feature detector and it can be put to use in an EC: if case is a byproduct of phi-feature valuation then we should never find reflexives in (structurally) case marked positions. This is a direct consequence of the AAE. But we do regularly find reflexives in non-nominative positions, hence it must be possible to assign case without first valuing phi-features. Conclusion: case assignment need not piggyback on phi-feature valuation.

Note the role that the AAE plays in this argument. It is a relatively simple and robust principle. Moreover, it is one that we would like to preserve as it explains a real puzzling fact about nominative reflexives: they don’t robustly exist! And where we do find them, they don’t come from T0s with apparent phi-features, and where we find other case assigning heads that do have unvalued phi-features we don’t find reflexives. So, all in all, the AAE looks like a fairly decent generalization and is one that we would like to keep. This makes it an excellent part of a deductive package aimed at testing the idea that case is parasitic on agreement, as we can lever its retention into a probe of some idea we want to explore. If AAE is correct (main assumption), then if case is parasitic on agreement we shouldn’t see reflexives in case positions that require valuing phi features on a nearby head. If case is not parasitic on phi valuation then we will. The experimental verdict is that we do find reflexives in the relevant domains and the hypothesis that case and phi-feature valuation are two sides of the same coin sinks. A nice tight deductive package. An EC with a very useful result.

Second: Here’s a more controversial EC, but one that I still think is pretty dispositive. Inverse control provides a critical test for PRO based theories of control. Here’s the deductive package: PRO is an anaphoric dependent of its controller. Anaphoric dependents can never c-command their antecedents as this would violate principle C. Principle C is a very robust characteristic of binding configurations. So, a direct consequence of PRO based accounts of control is the absence of inverse control configurations, configurations in which “PRO” c-commands its antecedent.

This consequence has been repeatedly tested since Polinsky and Potsdam first mooted the possibility in Tsez and it appears that inverse control does indeed exist. But regardless of whether you are moved by the data, the logic is completely ECish and unless there is something wrong with the design (which I strongly doubt) it settles the issue of whether Control is a DP-PRO dependency. It cannot be. Inverse control settles the matter. This has the nice consequence that PRO does not exist. Most linguists resist this conclusion but, IMO, that is because they have not fully taken on board the logic of ECs.

Here’s a third and last example: are island effects complexity effects or structural effects? In other words, are island effects the reflections of some generic problem that islands present cognition with or something specific to the structural properties of islands? The former view would agree that island effects exist but hold that they are due to, for example, the short term memory overload that the parsing of islands induces.

The two positions are both coherent and, truth be told, for theoretical reasons, I would rather that the complexity story were the right one. It would just make my life so much easier to be able to say that island effects were not part of my theoretical minimalist remit. I could then ignore them because they are not really reflections of the structure of FL/UG and so I would not have to try and explain them! Boy would that be wonderful! But much as I would love this conclusion, I cannot in good scientific conscience adopt it for Sprouse and colleagues have done ECs showing that it is very, very likely wrong. I refer you to the Experimental Syntax volume Sprouse and I edited for discussion and details (see here).

The gist of the argument is that were islands reflexes of things like memory limitations then we should be able to move island acceptability judgments around by manipulating the short term memory variable. And we can do this. Humans come in strong vs weak short term memory capacities. We even have measures of these. Were island effects reflections of such memory capacity, then island effects would differentially affect these two groups. They don’t so it’s not. Again the EC comes in a tight little deductive box and the experiment (IMO) decisively settles the matter. Island effects, despite my fondest wishes, really do reflect something about the structural linguistic properties of islands. Damn!

So, we have ECs in linguistics and I would like to see many more. Let me end by saying why.  I have three reasons.

First, it would generate empirical work directly aimed at theoretically interesting issues. The current empirical investigative instrument is the analysis, usually of some construction or paradigm. It starts with an empirical paradigm or construction in some L and it aims at a description and explanation for that paradigm’s properties. This is a fine way to proceed and it has served us well. This way of proceeding is particularly apposite when we are theory poor for it relies on the integrity of the paradigm to get itself going and reaches for the theory in service of a better description and possible explanation. And, as I said, there is nothing wrong with this. However, though it confronts theory, it does so obliquely rather than directly. Or so it looks to me.

To see this, contrast this with the kind of empirical work we see more often in the rest of the sciences. Here empirical work is experimental. Experiments are designed to test the core features of the theory. This requires, first, identifying and refining the key features of the leading ideas, massaging them, explicating them and investigating their empirical consequences. Once done, experiments aim to find ways of making these consequences empirically visible. Experiments, in other words, require a lot of logical scaffolding. They are not exploratory but directed towards specific questions, questions generated by the theories they are intended to test. Maybe a slogan would help here: linguistics has lots of exploratory work, some theoretical work but only a smidgen of experimental work. We could do with some more.

Second, experiments would tighten up the level of argument. I mentioned that ECs come as tight deductive packages. The assumptions, both what is being tested and the ancillary hypotheses, must be specified for an EC to succeed. This is less the case for exploratory work. Here we need to string together principles and facts in a serviceable way to cover the empirical domain. This is different from building an airtight box to contain it and prod it and test it. So, I think that a little more experimental thinking would serve to tighten things up.

Third, the main value of ECs is that they eliminate theoretical possibilities and so allow us to more narrowly focus theory construction. For example, if case is not parasitic on agreement then this suggests different theories of case than ones where they must swing together. Similarly, if PRO does not exist, then theories that rely on PRO are off on the wrong track, no matter how descriptively useful they might be. The role of experiments, in the best of all possible worlds, is to discard attractive but incorrect theory. This is what empirical work is for, to dispose. Now, we do not (and never will) live in the best of all possible scientific worlds. But this does not mean that getting a good bead on the empirical standing of our basic concepts experimentally is not useful.

Let me finish by adding one more thing. Our friends in psycho-ling do experiments all the time. Their culture is organized around this procedure. That’s why I have found going to their lab meetings so interesting. I think that theories in Ling are far better grounded and articulated than theories in psycho-ling (that is my personal opinion) but their approach often seems more direct and reasonable. If you have not been in the habit of sitting in on their lab meetings, I would recommend doing so. There is a lot to recommend the logic of experimentation that is part of their regular empirical practice.


[1] Part of the problem with languists’ talking about Chomsky’s linguistic conception of universals is that they do not appreciate that simply looking at surface forms is unlikely to bear much on the claim being made. Grammars are not directly observable. Languists take this to imply that Chomskyan universals are not testable. But this is not so. They are not trivially testable, which is a whole different matter. Nothing interesting is trivially testable. It requires all sorts of ancillary hypotheses to set the stage for isolating the relevant principle of interest. And this takes lots of work.
[2] This is based on discussions with Omer. Thx.
