Monday, February 24, 2014

DTC redux

Syntacticians have effectively used just one kind of probe to investigate the structure of FL, viz. acceptability judgments. These come in two varieties: (i) simple “sounds good/sounds bad” ratings, with possible gradations of each (effectively a 6ish-point scale: ok, ?, ??, ?*, *, **), and (ii) “sounds good/sounds bad under this interpretation” ratings (again with possible gradations). This rather crude empirical instrument has proven to be very effective, as the non-trivial nature of our theoretical accounts indicates.[1] Nowadays, this method has been partially systematized under the name “experimental syntax.” But, IMO, with a few conspicuous exceptions, these more refined rating methods have largely confirmed what we already knew. In short, the precision has been useful, but not revolutionary.[2]

In the early heady days of Generative Grammar (GG), there was an attempt to find other ways of probing grammatical structure. Psychologists (following the lead that Chomsky and Miller (1963) (C&M) suggested) took grammatical models and tried to correlate them with measures involving things like parsing complexity or rate of acquisition. The idea was a simple and appealing one: more complex grammatical structures should be more difficult to use than less complex ones, and so measures involving language use (e.g. how long it takes to parse/learn something) might tell us something about grammatical structure. C&M contains the simplest version of this suggestion, the now infamous Derivational Theory of Complexity (DTC). The idea was that there was a transparent (i.e. at least a homomorphic) relation between the rules required to generate a sentence and the rules used to parse it, and so parsing complexity could be used to probe grammatical structure.
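Just to make the DTC's logic concrete, here is a toy sketch in Python (the sentences, step counts, and function names are all invented for illustration, not drawn from C&M): if derivational complexity maps transparently onto parsing operations, then ranking sentences by the number of operations in their derivations should predict their relative processing difficulty.

```python
# Toy illustration of DTC-style reasoning (hypothetical sentences and step counts).
# Assumption: each operation in a sentence's derivation corresponds to at least one
# parsing operation, so derivational length predicts relative processing cost.

DERIVATION_STEPS = {
    "The dog chased the cat": 4,          # hypothetical: base clause only
    "The cat was chased by the dog": 6,   # hypothetical: adds passive-related steps
    "Was the cat chased by the dog?": 7,  # hypothetical: adds question formation
}

def dtc_predicted_ranking(step_counts):
    """Rank sentences from least to most costly to process, assuming
    processing cost is monotonic in the number of derivational steps."""
    return sorted(step_counts, key=step_counts.get)

if __name__ == "__main__":
    for rank, sentence in enumerate(dtc_predicted_ranking(DERIVATION_STEPS), start=1):
        print(f"{rank}. {sentence} ({DERIVATION_STEPS[sentence]} steps)")
```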

Though appealing, this simple picture can (and many believed did) go wrong in very many ways (see Berwick and Weinberg 1983 (BW) here for a discussion of several).[3] Most simply, even if it is correct that there is a tight relation between the competence grammar and the one used for parsing (which there need not be, though in practice there often is, e.g. the Marcus Parser), the effects of this algorithmic complexity need not show up in the usual temporal measures of complexity, e.g. how long it takes to parse a sentence. One important reason for this is that parsers need not apply their operations serially, and so the supposition that every algorithmic step takes one time step is just one reasonable assumption among many. So, even if there is a strong transparency between competence Gs and the Gs parsers actually deploy, no straightforward measurable time prediction follows.
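To see why the temporal prediction can fail even under strong transparency, consider a toy cost model (my own hypothetical illustration, not BW's): if the parser can apply independent operations in parallel, measured time tracks the depth of the longest chain of dependent operations rather than the total operation count, so a derivation with more operations can nonetheless finish sooner.

```python
# Hypothetical cost models: under a serial model, time = total number of operations;
# under a parallel model, time = depth of the longest chain of dependent operations.
# A derivation with MORE operations can therefore take LESS measured time.

def serial_time(deps):
    """One time step per operation, applied one after another."""
    return len(deps)

def parallel_time(deps):
    """Operations fire as soon as their prerequisites finish;
    time = length of the longest prerequisite chain."""
    memo = {}
    def depth(op):
        if op not in memo:
            memo[op] = 1 + max((depth(d) for d in deps[op]), default=0)
        return memo[op]
    return max(depth(op) for op in deps)

# Derivation A: 5 operations in a strict sequence (each depends on the previous one).
A = {1: [], 2: [1], 3: [2], 4: [3], 5: [4]}
# Derivation B: 8 operations, but most of them independent of one another.
B = {1: [], 2: [], 3: [], 4: [1, 2], 5: [3], 6: [], 7: [6], 8: [4, 5, 7]}

print(serial_time(A), serial_time(B))      # 5 vs 8: B has more operations
print(parallel_time(A), parallel_time(B))  # 5 vs 3: yet B finishes sooner in parallel
```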

This said, there remains something very appealing about DTC reasoning (after all, it’s always nice to have different kinds of data converging on the same conclusion, i.e. Whewell’s consilience), and though the DTC need not hold, it is worth looking for places where the reasoning succeeds. In other words, though the failure of DTC-style reasoning need not in and of itself imply defects in the competence theory used, a successful DTC-style argument can tell us a lot about FL. And because there are many ways for a DTC-style explanation to fail and only a few ways for it to succeed, successful stories, if they exist, can shed interesting light on the basic structure of FL.

I mention this for two reasons. First, I have been reading some reviews of the early DTC literature and have come to believe that its demonstrated empirical “failures” were likely oversold. And second, it seems that the simplicity of MP grammars has made it attractive to go back and look for more cases of DTC phenomena. Let me elaborate on each point a bit.

First, the apparent demise of the DTC. Chapter 5 of Colin Phillips’ thesis (here) reviews the classical arguments against the DTC. Fodor, Bever and Garrett (in their 1974 text) served as the three horsemen of the DTC apocalypse. They interred the DTC by arguing that the evidence for it was inconclusive. There was also some experimental evidence against it (BW note the particular importance of Slobin (1966)). Colin’s review goes a very long way toward challenging this pessimistic conclusion. He sums up his in-depth review as follows (p. 266):

…the received view that the initially corroborating experimental evidence for the DTC was subsequently discredited is far from an accurate summary of what happened. It is true that some of the experiments required reinterpretation, but this never amounted to a serious challenge to the DTC, and sometimes even lent stronger support to the DTC than the original authors claimed.

In sum, Colin’s review strongly implies that linguists should not have abandoned the DTC so quickly.[4] Why, after all, give up on an interesting hypothesis just because of a few counter-examples, especially ones that, when considered carefully, seem rather weak? In retrospect, it looks like the abandonment of the strong hypothesis was less a matter of reasonable retreat in the face of overwhelming evidence than a decision that disciplines occasionally make to leave one another alone for self-interested reasons. With the demise of the DTC, linguists could assure themselves that they could stick to their investigative methods and didn’t have to learn much psychology, and psychologists could concentrate on their experimental methods and stay happily ignorant of any linguistics. The DTC directly threatened this comfortable “live and let live” world, and perhaps this is why its demise was so quickly embraced by all sides.

This state of comfortable isolation is now under threat, happily.  This is so for several reasons. First, some kind of DTC reasoning is really the only game in town in cog-neuro. Here’s Alec Marantz’s take:

...the “derivational theory of complexity” … is just the name for a standard methodology (perhaps the dominant methodology) in cognitive neuroscience (431).

Alec rightly concludes that, given the standard view within GG that what linguists describe are real mental structures, there is no choice but to accept some version of the DTC as the null hypothesis. Why? Because, ceteris paribus:

…the more complex a representation - the longer and more complex the linguistic computations necessary to generate the representation - the longer it should take for a subject to perform any task involving the representation and the more activity should be observed in the subject’s brain in areas associated with creating or accessing the representation or performing the task (439).

This conclusion strikes me as both obviously true and salutary, with one caveat. As BW have shown us, the ceteris paribus clause can in practice be quite important. Thus, the common indicators of complexity (e.g. time measures) may be only indirectly related to algorithmic complexity. This said, GG is (or should be) committed to the view that algorithmic complexity reflects generative complexity and that we should be able to find behavioral or neural correlates of this (e.g. Dehaene’s work (discussed here), in which BOLD responses were seen to track phrasal complexity in pretty much a linear fashion, or Forster’s work, mentioned in note 4, finding temporal correlates).

Alec (439) makes an additional, IMO correct and important, observation. Minimalism in particular, “in denying multiple routes to linguistic representations,” is committed to some kind of DTC thinking.[5] Furthermore, by emphasizing the centrality of interface conditions to the investigation of FL, Minimalism has embraced the idea that how linguistic knowledge is used should reveal a great deal about what it is. In fact, as I’ve argued elsewhere, this is how I would like to understand the “strong minimalist thesis” (SMT), at least in part. I have suggested that we interpret the SMT as committed to a strong “transparency hypothesis” (TH) (in the sense of Berwick & Weinberg), a proposal that can only be systematically elaborated by investigating how linguistic knowledge is used.

Happily, IMO, paradigm examples of how to exploit “use” and TH to probe the representational format of FL are now emerging. I’ve already discussed how Pietroski, Hunter, Lidz and Halberda’s work relates to the SMT (e.g. here and here). But there is other stuff too of obvious relevance: e.g. BW’s early work on parsing and Subjacency (aka Phase Theory) and Colin’s work on how islands are evident in incremental sentence processing. This work is the tip of an increasingly impressive iceberg. For example, there is analogous work showing that parsing exploits binding restrictions incrementally during processing (e.g. by Dillon, Sturt, Kush).

This latter work is interesting for two reasons. It validates results that syntacticians have independently arrived at using other methods (which, to re-emphasize, is always worth doing on methodological grounds). And, perhaps even more importantly, it has started raising serious questions for syntactic and semantic theory proper. This is not the place to discuss this in detail (I’m planning another post dedicated to this point), but it is worth noting that, given certain reasonable assumptions about what memory is like in humans and how it functions in (among other areas) incremental parsing, the results on the online processing of binding noted above suggest that binding is not stated in terms of c-command but in terms of some other notion that mimics its effects.

Let me say a touch more about the argument form, as it is both subtle and interesting. It has the following structure: (i) we have evidence of c-command effects in the domain of incremental binding; (ii) we have evidence that the kind of memory we use in parsing cannot easily code a c-command restriction; thus (iii) what the parsing grammar (G) employs is not c-command per se but another notion compatible with this sort of memory architecture (e.g. clausemate or phasemate). But, (iv) if we adopt a strong SMT/TH (as we should), (iii) implies that c-command is absent from the competence G as well as the parsing G. In short, the TH interpretation of SMT in this context argues in favor of a revamped version of Binding Theory in which FL eschews c-command as a basic relation. The interest of this kind of argument should be evident, but let me spell it out. We S-types are starting to face the very interesting prospect that figuring out how grammatical information is used at the interfaces will help us choose among alternative competence theories by placing interface constraints on the admissible primitives. In other words, here we see a non-trivial consequence of Bare Output Conditions on the shape of the grammar. Yessss!!!
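To make the memory point slightly more concrete, here is a schematic toy contrast (my own illustration; the feature names and structures are hypothetical, not the actual models of Dillon, Sturt or Kush): content-addressable retrieval checks flat feature cues like “same clause/phase,” whereas checking c-command requires consulting the tree configuration, which this kind of memory does not directly encode.

```python
# Toy contrast (all names and features hypothetical): cue-based memory retrieval
# matches items against flat feature cues; c-command is a configurational relation
# that can only be checked against the tree itself.

def cue_based_retrieve(cues, memory):
    """Return stored items whose features match every retrieval cue.
    Nothing here consults phrase-structure configuration."""
    return [item for item in memory if all(item.get(k) == v for k, v in cues.items())]

def dominates(children, a, b):
    """a dominates b in a tree encoded as a node -> children map."""
    return any(c == b or dominates(children, c, b) for c in children.get(a, []))

def c_commands(children, parent, a, b):
    """a c-commands b iff some sister of a is, or dominates, b.
    Note that this needs parent and sibling information, i.e. the tree."""
    sisters = [s for s in children[parent[a]] if s != a]
    return any(s == b or dominates(children, s, b) for s in sisters)

# Memory as a flat list of feature bundles (hypothetically tagged with the
# clause/phase in which each item was encoded during incremental parsing).
memory = [
    {"word": "John", "clause": 1, "animate": True},
    {"word": "Mary", "clause": 2, "animate": True},
]

# A clausemate/phasemate restriction is easy to check with cues alone:
print(cue_based_retrieve({"clause": 2, "animate": True}, memory))  # -> Mary's entry only

# A c-command restriction, by contrast, requires access to the tree:
children = {"TP": ["DP_John", "VP"], "VP": ["V", "DP_Mary"]}
parent = {"DP_John": "TP", "VP": "TP", "V": "VP", "DP_Mary": "VP"}
print(c_commands(children, parent, "DP_John", "DP_Mary"))  # True, but only given the tree
```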

We live in exciting times. The SMT (in the guise of TH) conceptually moves DTC-like considerations to the center of theory evaluation. Additionally, we now have some useful parade cases in which this kind of reasoning has been insightfully deployed (and which, thereby, provide templates for further mimicking). If so, we should expect that these kinds of considerations and methods will soon become part of every good syntactician’s armamentarium.




[1] The fact that such crude data can be used so effectively is itself quite remarkable. This speaks to the robustness of the system being studied, for such weak signals should not otherwise be expected to be so useful.
[2] Which is not to say that these more careful methods don’t have their place. There are some cases where being more careful has proven useful. I think that Jon Sprouse has given the most careful thought to these questions. Here is an example of some work where I think the extra care has proven to be useful.
[3] I have not been able to find a public version of the paper.
[4] BW note that Forster provided evidence in favor of the DTC even as Fodor et al. were in the process of burying it. By changing the experimental task a little (viz. using an RSVP presentation of the relevant materials), Forster effectively found temporal measures of psychological complexity that tracked the grammatical complexity the DTC identified.
[5] I believe that what Alec intends here is that in a theory where the only real operation is merge, complexity is easy to measure, and there are pretty clear predictions about how this should impact algorithms that use this information. It is worth noting that the heyday of the DTC was in a world where complexity was largely a matter of how many transformations applied to derive a surface form. We have returned to that world again, though with a vastly simpler transformational component.
