
Friday, March 30, 2018

Three varieties of theoretical research

As FoLers know, I do not believe that linguists (or at least syntacticians) highly prize theoretical work. Just the opposite, in fact. This is why, IMO, the field has tolerated (rather than embraced) the minimalist project (MP) and why so many professionals believe MP to have largely been a failure despite what (again IMO) are its evident overall successes. Having argued this at length before, I will not do so again here. Rather, I would like to report on an interesting paper that I have just re-read that tries to elucidate three distinct kinds of theoretical work. The paper is an old one (published in 2000). It's called "Thinking about Mechanisms" and the authors, three philosophers, are Peter Machamer, Lindley Darden and Carl Craver (MDC). Here is a link. The paper concentrates on elucidating the notion of a mechanism and argues that it is the key explanatory notion within neurobiology and molecular genetics. The discussion is interesting and I recommend it. In what follows, I would like to pick out some points at random that MDC makes and relate them to linguistic theorizing. This, I hope, will encourage others to look more kindly on theoretical work.

MDC defines the notion of a mechanism as follows:

Mechanisms are entities and activities organized such that they are productive of regular changes from start or set-up to finish or termination conditions… To give a description of a mechanism for a phenomenon is to explain that phenomenon, i.e. to explain how it was produced. (3)

So, mechanisms are theoretical constructs whose features (the "entities," their "properties," and the "activities" they partake in) explain how phenomena of interest arise. Mechanisms thus produce phenomena (in biology, in real time) in virtue of the properties of their parts and the activities they engender.[1]

MDC divides a mechanistic description into three parts: (i) Set-up Conditions, (ii) Termination Conditions and (iii) Intermediate Activities.

The first, set-up conditions, are “idealized descriptions” of the beginning of the mechanism. Termination conditions are “idealized states or parameters describing a privileged endpoint.” The intermediate steps provide an account of how one gets from the initial set-up to the termination conditions, which describe the phenomenon of interest. (11-12)

This should all sound vaguely familiar. To me it sounds very much like what linguists do in providing a grammatical derivation of a sentence of interest. We start with an initial structure (e.g. a D(eep) S(tructure) representation) and explain some feature of a sentence (e.g. why the syntactic subject is interpreted as a thematic object) by showing how various operations (i.e. transformations) lead from the initial to the termination state. Doing this explains why the sentence of interest has the properties to be explained. Indeed, it has them in virtue of being the endpoint of the licit derivation provided.

Note too that both mechanisms and generative procedures (GPs) focus on idealized situations. GPs describe the linguistic competence of an ideal speaker-hearer or the FL of an idealized LAD. So too with biological mechanisms. They describe idealized hearts or kidneys or electrical conduction at a synapse. Actual instances are not identical to these, though they function in the same ways (it is hoped). No two hearts are the same, yet every idealized heart is identical to any other.

This said, derivations are not actually mechanisms in MDC's sense, for they do not operate in real time (unlike the ones biologists are typically describing (e.g. synaptic transmission or protein synthesis)). However, GPs are the "mechanisms" of interest within GG, for it is (at least in large part) in virtue of the properties of GPs that we explain why native speakers judge the linguistic objects in their native languages as they do and why Gs have the properties they have. Furthermore, as in biology, the aim of linguistics is to elucidate the basic properties of GPs and try to explain why they have the properties they have and not others. So, GPs in linguistics are analogous to mechanisms in other parts of biology. Phenomena are interesting exactly to the degree that they serve to shine light on the fine structure of mechanisms in biology. Ditto with GPs in linguistics.

MDC notes that a decent way to write a history of biology is to trace out the history of its mechanisms. I cannot say whether this is so for the rest of biology, but as regards linguistics, there are many worse ways of tracing the history of modern GG than by outlining how the notion of a GP has evolved over the last 60 years. There is a reasonable argument to be made (and I have tried to make it (see here and the following four posts)) that the core understanding of GPs has become simpler and more general in this period, and that the Minimalist Program is a conservative extension of prior work describing the core properties of a human linguistic GP. Not surprisingly, this has analogues in the other kinds of biological theorizing MDC discusses.

So, the core explanatory construct in biology according to MDC is the mechanism. As MDC puts it: "…a mechanistic explanation…renders a phenomenon intelligible…Intelligibility arises not from an explanation's correctness, but rather from an elucidative relation between the explanans (the set-up conditions and the intermediate entities and activities) and the explanandum (the termination condition or the phenomenon to be explained)" (21).

MDC is at pains to point out that this "elucidative relation" holds regardless of the accuracy of the description. So explanatory potential is independent of truth, and what theorizing aims at are theories with such potential. Explanatory potential relies on elucidating how something could work, not how it does. The gap between possibility and actuality is critical for the theoretical enterprise. It's what allows it a certain degree of autonomy.

For such autonomy to be possible it is critical to appreciate that explanatory potential (what I have elsewhere called “oomph”) is not reducible to regularity of behavior. Again MDC (21-22):

We should not be tempted to follow Hume and later logical empiricists into thinking that the intelligibility of activities (or mechanisms) is reducible to their regularity. Descriptions of mechanisms render the end stage intelligible by showing how it is produced by bottom out entities and activities. To explain is not merely to redescribe one regularity as a series of several. Rather, explanation involves revealing the productive relation. It is the unwinding, bonding, and breaking that explain protein synthesis; it is the binding, bending, and opening that explain the activity of Na+ channels. It is not the regularities that explain the activities but the activities that sustain the regularities.

In other words, mechanisms are not (statistical) summaries of what something regularly does. Regularities/summaries do not (and cannot) explain, and as mechanisms aim to explain they must be more than such summaries, no matter how regular. Mechanisms outline how a phenomenon has (or could have) arisen, and this requires specifying the structures and principles that mechanisms deploy to "generate" the phenomenon of interest.[2]

Importantly, it is the relative independence of explanatory potential from truth that allows theory to have an independent existence. MDC suggests three different grades of theoretical involvement, summed up in three related but different questions: How possibly? How plausibly? How actually? Let me elaborate.

Explanations are hard. They are hard precisely because they must go beyond recapitulating the phenomenon of interest. Finding the right concepts and putting them together in the right way can be demanding. Here is an example of what I mean (see here for an earlier discussion).

The Minimalist Program (MP) has largely ignored ECP effects of the argument/adjunct asymmetry variety. Why so? I would contend it is because it is quite unclear how to understand these effects in MP terms. In this respect ECP effects contrast with island effects. There are MP-compatible versions of the latter, largely recapitulating earlier versions of Subjacency Theory. IMO, such accounts are not particularly elegant, nor particularly insightful. However, it is possible to pretty directly trade bounding nodes for phases, escape hatches for phase edges and Subjacency Principles for Phase Impenetrability Conditions in a largely one-for-one swap, and thereby end up with a theory no worse than the older GB stories but cast in an acceptable MP idiom. This does not constitute a great theoretical leap forward (and so, if this is correct, for these phenomena thinking a la MP does not deepen our understanding), but at least it is clear how island effects could hold within an MP-style conception of G. They reduce to Subjacency Effects, albeit with all the parts suitably renamed. In other words, the theoretical and conceptual resources of MP are adequate to recapitulate (if not much illuminate) the theoretical and conceptual resources of earlier GB.

This is not so for ECP effects. Why not? Well for several reasons, but the two big ones are that the ECP is a trace licensing condition and the technology behind it appears to run afoul of inclusiveness. Let’s discuss each point in turn.

The big idea behind the ECP is that traces are grammatically toxic unless tamed. They can be tamed by being marked (+gamma-marked) by a local antecedent in the course of the derivation. The distinction between arguments and adjuncts arises from the assumption that argument A'-chains can be reduced, eliminating any -gamma-marked traces they contain and so not cancelling the derivation at LF (recall, -gamma-marked expressions kill a derivation). So, traces are toxic, +gamma-marking tames them, and deletion acts differently for adjuncts and arguments, which is why the former are more restricted than the latter. This, plus a kind of uniformity principle on chains (not a great or intuitive principle IMO, but maybe this is just me) which invidiously distinguishes adjunct from argument chains,[3] yields the desired empirical payoff.

Given the complexity of the ECP data, this is an achievement. Whether it constitutes much of an explanation is something people can disagree about. However, whatever its value, it runs afoul of what appear to be basic MP assumptions. For example, MP eschews traces, hence there is little conceptual place for a module of the grammar whose job it is to license them. Second, MP derivations reject adding little diacritics to expressions in the course of a derivation. If indices are technicalia non grata, what are we to make of +/-gamma marks? Last, MP derivations are taken to be monotonic (No Tampering), and hence frown upon operations that delete information on the "LF" side of a derivation. But deleting -gamma-marked traces is what "explains" the argument/adjunct difference. So, the standard GB story doesn't really fit with basic MP assumptions, and this makes it fruitful to ask how ECP effects could possibly be modeled in MP-style accounts. And this is a job for theorists: to come up with a story that could fit, to find the right combination of MP-compatible concepts that would yield roughly the right empirical outcomes.[4] The theoretical challenge, then, in the first instance, is to show how ECP effects could possibly fit into an MP setting, given the elimination of traces and a commitment to derivational monotonicity.

There are additional why questions out there begging for how-possibly scenarios: e.g. Why case? Why are phrase markers organized so that theta domains are within case/agreement domains that are within A' (information structure) domains? Why are reflexivization and pronominalization in complementary distribution? Why are selection and subcategorization so local? I could go on and on. These why questions are hard not because we have tons of possible explanatory options but cannot figure out which one to run with, but because we have few candidate theories to run with at all. And that is a theoretical challenge, not just an empirical one. It's in situations like these that how-possibly becomes a pressing and interesting issue. Sadly, it is also something that many working syntacticians barely attend to.

MDC notes a second level of theoretical involvement: how plausible is a certain possible story? Clearly, to ask this requires having a how-possibly scenario or two sketched out. It is tempting to think that plausibility is largely a matter of empirical coverage. But I would like to suggest otherwise. IMO, plausibility is evaluated along two dimensions: how well the novel theory covers the older (gross) empirical terrain and how many novel lines of inquiry it prompts. A theory is plausible to the degree that it largely conserves the results and empirical coverage of prior theory (what one might call the "stylized facts") and to the degree that it successfully explains things that earlier theory left stipulative.

Clearly, plausibility is more demanding than possibility. Plausible theories not only explain, but have verisimilitude (we think that they have a decent chance of being correct). What are the marks of a plausible account? Well, they cover roughly the same empirical territory as the theory they are replacing and they explain what earlier theory stipulated. Here are a couple of examples.

I believe that movement theories of binding and control are plausible precisely because they are able to explain why Obligatory Control (OC) and reflexivization have many of the properties they do. For example, we typically find neither in the subject position of finite clauses (e.g. John expects PRO to/*will win, John expects him(he)self to/*will win). Why not? Well, if the movement theory is right, then they are parts of A-chains and so should pattern like what we find in analogous raising constructions (e.g. John was expected t to/*would win), and they do. So the movement theory derives what is largely stipulated in earlier accounts and exposes as systematic the relations that earlier theory treated as coincidental (that finite subject positions don't allow PRO, reflexives or A-traces). Does this make such accounts true? Nope. But it does enhance their plausibility. Thus, being able to unify these disparate phenomena and provide principled explanations for the distribution of OC PRO and for the relative paucity of nominative reflexives enhances their claims on truth.

Note that here plausibility hinges on (1) accepting that prior accounts are roughly descriptively accurate (i.e. doing what decent science always does: building on past work and insight) and (2) explaining their stipulated features in a principled way. When a story has these two features it moves from possible to plausible. Of course, demonstrating plausibility is not trivial, and what some consider plausibility enhancing others will find wanting. But that is as it should be. The point is not that theorizing is dispositive (nothing is) but that it strives for goals different from empirical coverage (and this is not intended to disparage the latter).

Let me put this another way. When one has a possible explanation in hand, it is time to start looking for evidence in its favor. In other words, rather than looking for ways to reject the account, one looks for reasons to accept it as a serious one. Trying to falsify (i.e. rigorously test) a proposal has its place, but so does looking for support. However, trying to falsify a merely possible theory is premature. What one should test are the plausible ones, and that means finding ways to elevate the possible to a higher epistemological plane: the territory of the plausible. That's what how-plausibly theory aims to do: find the fit between something that is possible and what has come before, and show that the new possible story is a fecund extension of the old. It is an extension in that it covers much of the same territory. It is fecund in that it improves on what came before. This kind of theorizing is also hard to pull off, but like how-possibly theory, it relies heavily on theoretical imagination.

Which brings us to how-actually investigations. This is where theory and data really meet and where something bearing a family resemblance to falsification comes into play. Say we have a plausible theory; the next step is to tease out ways of testing its central assumptions. This, no doubt, sounds obvious. But obvious as it sounds, it is not standard practice. Much of what goes on in my little area of linguistics fails to test central postulates and largely concentrates on seeing how to fit current theoretical conceptions to available data (e.g. how to apply a Probe-Goal account to some configuration of agreement/case data). There is nothing wrong with this, of course. But it is not quite "testing" the theory in the sense of isolating its central premises/concepts and seeing how they fly. Let me give you an example.

I personally know of very few critical tests driven by thoughtful theorizing. But I do know of one: the Aoun/Choueiri account of reconstruction effects (RE). The reasoning is as follows: if REs are reflections of the copy theory of movement (as every good Minimalist believes), then where there is no movement, there should be no reconstruction (notice movement is a necessary, not a sufficient, condition for RE). There is no movement from islands, therefore there should be no RE within islands. Aoun/Choueiri then goes on to argue that resumption in Lebanese Arabic is a movement dependency (Demirdache argued this first, I believe) and shows that whereas REs are available when an antecedent binds a resumptive outside an island, they fail systematically to arise with resumptives inside islands. This argues for two central conclusions: (i) that REs are indeed parasitic on movement and (ii) that resumption is a movement dependency. This vindicates the copy theory, and with it a central precept of MP.

For now, forget about whether Aoun/Choueiri is right about the facts.[5] The important point here is the logic. The test is interesting because it very clearly implicates key features of current theory: the copy theory of movement, islands as restrictions on movement and REs as piggybacking on copies. These are three central features, and the argument, if correct, tests them. And this is interesting precisely because they are central ideas in any MP-style account. Moreover, it is very clear how the premises bear on the testable conclusion.[6] They can be laid out (that's where theory comes in, BTW, in laying out the premises and showing how together they have certain testable consequences) and a prediction squeezed from them. Moreover, the premises, as noted, are theoretically robust. The Copy Theory of Movement is a core feature of MP architectures, and island-based locality is a central part of any reasonable GG theory of syntax. Hence, if these came apart it would indicate something seriously amiss with how we conceptualize the fundamentals of FL/UG. And that is what makes the Aoun/Choueiri argument impressive.

Like I said, I personally know of only a couple of cases like this. What makes it useful here is that it illustrates how to successfully do how-actually theory (i.e. it is a paradigm case of how-actually theoretical practice): find consequences of core conceptions and use them to test the core ideas. We all know that most of what we believe today is likely wrong, at least in detail. Knowing this, however, does not mean that we cannot test the core features of our accounts. But this requires determining what is central (which requires theoretical evaluation and judicious imagination) and figuring out how to tease consequences from it (which requires analytical acumen). In testing a proposal to see how-actually it fares, we need to lead from theory to data, and this means thinking theoretically by respecting the deductive structure that makes a theory the theory it is.

How possibly, how plausibly, how actually: three grades of theoretical involvement. All are useful. All require attention to the deductive structure of the core ideas that constitute theory. All start with these ideas and move outwards towards the phenomena that, correctly used, can help us refine and improve them. Right now, theoretical work is largely absent from the discipline, at least of the how-possibly and how-plausibly variety. Even the how-actually kind is far less common than commonly supposed.


[1] MDC distinguishes "substantivalists" and "process ontologists" wrt their different understandings of mechanism. The difference appears to reside in whether mechanisms comprise both "entities" and "activities" or whether activities alone suffice. MDC takes it as obvious that reducing entities to activities is hopeless ("As far as we know, there are no activities in …biology… that are not activities of entities" (5)). I mention this because it is redolent of the current discussion on FoL (Idsardi and Raimy discussing Hale and Reiss) concerning substance-free phonology. It is curious that the same kind of discussion takes place in a very different venue, and so it is worth taking a look at how it plays out in that domain to gain leverage on the one in ours.
[2] As an old friend (Louise Antony) once remarked: in answer to a question like “why did this book drop to the floor when I let go?” it is not helpful to answer that “it always drops whenever anyone lets go.”
[3] I am sure it is not news that the distinction is not accurately described in terms of arguments and adjuncts. But for the record, the absence of pair-list readings of WHs extracted from weak islands seems to show the same acceptability profile as adjuncts even though the WH moves from the complement position. The difference seems to be less argument/adjunct and more individual-variable vs. higher-level-variable interpretation.
[4] FWIW, I think that this is where a minimality-style explanation of the Rizzi-Cinque variety might be a better fit than the Lasnik-Saito/Barriers approach. But even this story needs some detailed reworking.
[5] There is some evidence from Jordanian Arabic contradicting it, though I am not sure whether I believe it yet. Of course, what I believe, plus the full fare of about $2, will get you a Metro ride in DC.
[6] I often find it surprising how few papers of a purported theoretical nature actually set out their premises clearly and deduce the conclusion of interest. More often, we use theory like putty, smearing it on our favorite empirical findings to see if, copiously applied, it can hold the data together. Though this method can yield interesting results, it is not theory driven and generally fails to address an identifiable theoretical question.
 

4 comments:

  1. I agree with the diagnosis but not with the prescribed solution. You say, "[These] why questions are hard not because we have tons of possible explanatory options but cannot figure out which one to run with, but because we have few candidate theories to run with at all."

    Let's look at the scenario where the number of candidate explanations currently on the table is zero. (I actually think both scenarios (too many candidate explanations, or none at all) do occur, but let's indeed concentrate on the 'none' scenario.) Unless we have some prior knowledge about the space of imaginable explanations – and I think it's safe to say that we don't; such is the nature of imagination – what exactly guarantees that by engaging in the activity of imagining, we will reach the right one in some bounded number of attempts? Yes, yes, I know that there are never any guarantees in science, but that's not quite what I mean here. I mean that even if we successfully move the number of candidate explanations from zero to one, or two, or ..., we are as likely to miss the mark as we are to hit it. (Again, pending someone coming up with a theory of probability distributions over things that we might imagine.) And, crucially, as I will sketch below, there is an alternative – which is why I don't think my position is quite as vulnerable to the "there are no guarantees in science" retort.

    The alternative is data-guided theorizing. Don't have a clue how to model ECP effects given current theoretical assumptions? Look for new kinds of relevant data. And, obviously, the 'relevant' bit should be in 96pt font or something. Just throwing data at a problem is useless if one seeks to explain, we agree on that. But specific empirical domains have a way of shining a light on particular mechanisms. One of the reasons I think traditional Case Theory is such an embarrassment is because it was initially developed on the basis of English and French. That's like studying the visual system using only subjects who have forks stuck in both eyes. You might learn interesting things! But it's probably not a sound strategy to look only there. On the other hand, you look at Icelandic and Basque and, voila, the truth just unfolds before you. Okay, that's a grave disservice to the intellectual depth it took to do the work that was done by Zaenen, Maling & Thrainsson; Yip, Maling & Jackendoff; Marantz; Bittner & Hale; and others. But my point is that certain empirical domains have a way of guiding you to the answer.

    So, obviously, just immersing yourself in data for data's sake is pointless if one's point is to find explanations. But fishing around for the kind of data that happens to wear the explanatory contours of the underlying phenomena on its sleeve is better, in my book, than just relying on our imaginations. Why? Because I have no clue about the structure of the space of imaginable possibilities, whereas I have some ideas (enlarge the font for 'some' in your mind's eye, as well) about where to look for interesting data...

    Replies
    1. I kind of agree, in part. I take what you are saying to be that when you know nothing, there is no obvious strategy for how to push ahead, and that going from data to theory is no worse than the reverse. I agree that this is possible. The cases that I considered, I thought, looked a little different from this: these are cases where the problem is pretty clear (e.g. how to handle ECP-style data in an MP setting). It is not that we don't have the data; we do. The argument/adjunct asymmetries I took to be real and important. The problem is how to transfer whatever explanatory oomph existed in the GB-style analysis to the MP one. Here I thought that looking for how-possibly stories made sense independently of looking for more data. At any rate, I would be happy with a draw here. I would be happy if you agreed that there is a virtue to a how-possibly account independently of how much FURTHER data it invoked. Sometimes the problem is figuring out how to conceptually cover the terrain, not figuring out what terrain we want covered.

    2. @Omer I'm a little confused by your comment.

      First, your prescription for problem cases is to "[l]ook for new kinds of relevant data" rather than look at explanatory options, but aren't those explanatory options precisely what determines what kinds of data are relevant?

      Second, are you suggesting that data from Icelandic and Basque has provided an explanation for c/Case? If so, what is the explanation?

    3. @Dan: Since the ECP-related question (posed in the body of the post) was "how to model ECP effects," I was taking the case-related question to be "how to model case." And on that front, as far as I can tell, data from Icelandic and Basque have blown the traditional GB/MP model out of the water.

      Of course, when you say "an explanation for [case]," you may be referring to something else – something along the lines of, "why is there even such a thing as case in the grammar?" First, note that this is not the level of question Norbert was asking re:ECP effects. He states (repeatedly) that the question under discussion is how to model ECP effects. And for good reason. It's very difficult to ask "why is x so" questions without knowing with some precision what x actually is. As I have pointed out elsewhere on FoL, the last few decades have seen a lot of effort expended on "why is x so" questions regarding x's that turned out to be bogus in the first place. (My favorite example, of course, is the interpretable/uninterpretable feature distinction.)

      As for my prescription, you are right that the search for data is fundamentally, inexorably theory-guided (as Darwin astutely observed quite a while ago: https://michaelshermer.com/2001/04/darwins-dictum/). My point is that the search for theory is also (or, should be at least) fundamentally, inexorably data-guided. Thus, instead of a strict ordering between the two (theories first, data second, or vice versa), the two should inform one another continuously.
