Syntacticians have effectively used just one kind of probe
to investigate the structure of FL, viz. acceptability judgments. These come in
two varieties: (i) simple “sounds good/sounds bad” ratings, with possible
gradations of each (effectively a 6ish-point scale: ok, ?, ??, ?*, *, **), and
(ii) “sounds good/sounds bad under this interpretation” ratings (again with
possible gradations). This rather crude empirical instrument has proven to be
very effective, as the non-trivial nature of our theoretical accounts indicates.[1]
Nowadays, this method has been partially systematized under the name
“experimental syntax.” But, IMO, with a few conspicuous exceptions, these more refined rating methods have largely confirmed what we knew before. In short, the precision has been useful, but not revolutionary.[2]
In the early heady days of Generative Grammar (GG), there
was an attempt to find other ways of probing grammatical structure.
Psychologists (following the lead that Chomsky and Miller (1963) (C&M)
suggested) took grammatical models and tried to correlate them with measures
involving things like parsing complexity or rate of acquisition. The idea was a
simple and appealing one: more complex grammatical structures should be more
difficult to use than less complex
ones, and so measures involving language use (e.g. how long it takes to
parse/learn something) might tell us something about grammatical structure.
C&M contains the simplest version of this suggestion, the now infamous
Derivational Theory of Complexity (DTC). The idea was that there was a
transparent (i.e. at least a homomorphic) relation between the rules required
to generate a sentence and the rules used to parse it, and so parsing complexity could be used to probe grammatical structure.
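To make the logic concrete, here is a minimal sketch (mine, not C&M’s, and with purely hypothetical numbers) of the prediction a strictly transparent, serial DTC makes: if each grammatical operation in the derivation corresponds to one parsing operation, and each parsing operation costs a fixed amount of time, then latency should grow linearly with derivational complexity.

```python
# A minimal sketch of the core DTC prediction (a toy model, not C&M's):
# assume a transparent, one-to-one mapping between the grammatical operations
# that derive a sentence and the parser's operations, and assume each parsing
# operation adds a fixed, serial time cost. All numbers are hypothetical.

def predicted_parse_time(num_grammatical_ops, cost_per_op=50.0, baseline=300.0):
    """Predicted parsing latency (ms) under a strictly transparent, serial DTC."""
    return baseline + cost_per_op * num_grammatical_ops

# Hypothetical derivations: on a classical transformational analysis, a passive
# involves one more transformation than its active counterpart.
active_ops, passive_ops = 3, 4
print(predicted_parse_time(active_ops))   # 450.0
print(predicted_parse_time(passive_ops))  # 500.0 -> passives predicted slower
```

This is, in essence, the sort of prediction the early experiments probed: derivations with more transformations (e.g. passives relative to actives, on the classical analysis) should be measurably harder to process.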
Though appealing, this simple picture can (and many believed
did) go wrong in very many ways (see Berwick and Weinberg 1983 (BW) here
for a discussion of several).[3]
Most simply, even if it is correct that there is a tight relation between the competence grammar and the one used for parsing (which there need not be, though in practice there often is, e.g. the Marcus Parser), the effects of this algorithmic complexity need not show up in the usual temporal measures of complexity, e.g. how long it takes to parse a sentence. One important reason for this is that parsers need not apply their operations serially, and so the supposition that every algorithmic step takes one time step is just one reasonable assumption among many. So, even if there is a strong transparency between competence Gs and the Gs parsers actually deploy, no straightforward measurable time prediction follows.
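A toy contrast makes the point (the numbers are purely illustrative): if the parser can run independent operations in parallel, predicted latency tracks the depth of the longest chain of dependent operations rather than the total operation count, so two parses of equal grammatical complexity can yield different time predictions.

```python
# Purely illustrative: the same six unit-cost operations, executed serially vs.
# arranged into three independent chains that can run in parallel. Grammatical
# complexity (operation count) is identical; predicted latency is not.

def serial_time(op_costs):
    """Each operation takes its own time step, one after another."""
    return sum(op_costs)

def parallel_time(chains):
    """Independent chains run side by side; latency = the longest chain."""
    return max(sum(chain) for chain in chains)

ops = [1, 1, 1, 1, 1, 1]            # six operations, serial execution
chains = [[1, 1], [1, 1], [1, 1]]   # the same six operations, three parallel chains

print(serial_time(ops))       # 6
print(parallel_time(chains))  # 2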
This said, there remains something very appealing about DTC reasoning (after all, it’s always nice to have different kinds of data converging on the same conclusion, i.e. Whewell’s consilience), and though the DTC need not be true, it might be worth looking for places where the reasoning succeeds. In other words, though the failure of DTC style reasoning need not in and of itself imply defects in the competence theory used, a successful DTC style argument can tell us a lot about FL. And because there are many ways for a DTC style explanation to fail and only a few ways that it can succeed, successful stories, if they exist, can shed interesting light on the basic structure of FL.
I mention this for two reasons. First, I have been reading
some reviews of the early DTC literature and have come to believe that its
demonstrated empirical “failures” were likely oversold. And second, it seems
that the simplicity of MP grammars has made it attractive to go back and look
for more cases of DTC phenomena. Let me elaborate on each point a bit.
First, the apparent demise of the DTC. Chapter 5 of Colin
Phillips’ thesis (here)
reviews the classical arguments against the DTC. Fodor, Bever and Garrett (in their 1974 text)
served as the three horsemen of the DTC apocalypse. They interred the DTC by
arguing that the evidence for it was inconclusive. There was also some
experimental evidence against it (BW note the particular importance of Slobin
(1966)). Colin’s review goes a very long way in challenging this pessimistic
conclusion. He sums up his in-depth review as follows (p. 266):
…the received view that the
initially corroborating experimental evidence for the DTC was subsequently discredited
is far from an accurate summary of what happened. It is true that some of the
experiments required reinterpretation, but this never amounted to a serious
challenge to the DTC, and sometimes even lent stronger support to the DTC than
the original authors claimed.
In sum, Colin’s review strongly implies that linguists
should not have abandoned the DTC so quickly.[4]
Why, after all, give up on an interesting hypothesis, just because of a few
counter-examples, especially ones that, when considered carefully, seem on the
weak side? In retrospect, it looks like the abandonment of the strong
hypothesis was less a matter of reasonable retreat in the face of overwhelming
evidence than a decision that disciplines occasionally make to leave one
another alone for self-interested reasons. With the demise of the DTC,
linguists could assure themselves that they could stick to their investigative
methods and didn’t have to learn much psychology, and psychologists could
concentrate on their experimental methods and stay happily ignorant of any
linguistics. The DTC directly threatened this comfortable “live and let live”
world and perhaps this is why its demise was so quickly embraced
by all sides.
This state of comfortable isolation is now under threat,
happily. This is so for several reasons.
First, some kind of DTC reasoning is really the only game in town in cog-neuro.
Here’s
Alec Marantz’s take:
...the “derivational theory of
complexity” … is just the name for a standard methodology (perhaps the dominant
methodology) in cognitive neuroscience (431).
Alec rightly concludes that, given the standard view within GG
that what linguists describe are real mental structures, there is no choice but
to accept some version of the DTC as the null hypothesis. Why? Because, ceteris paribus:
…the more complex a representation-
the longer and more complex the linguistic computations necessary to generate
the representation- the longer it should take for a subject to perform any task
involving the representation and the more activity should be observed in the
subject’s brain in areas associated with creating or accessing the
representation or performing the task (439).
This conclusion strikes me as both obviously true and
salutary, with one caveat. As BW have shown us, the ceteris paribus clause can in practice be quite important. Thus, the common indicators of complexity
(e.g. time measures) may be only indirectly related to algorithmic complexity.
This said, GG is (or should be) committed to the view that algorithmic
complexity reflects generative complexity and that we should be able to find
behavioral or neural correlates of this (e.g. Dehaene’s work (discussed here)
in which BOLD responses were seen to track phrasal complexity in pretty much a
linear fashion, or Forster’s work finding temporal correlates mentioned in note
4).
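Schematically, the kind of test this invites is a regression of some dependent measure on a per-item complexity count, asking whether the relation is roughly linear with a positive slope. The sketch below is mine, not Dehaene’s analysis pipeline, and the arrays are placeholders rather than real data.

```python
# A schematic DTC-style test (not any published pipeline): regress a dependent
# measure (say, a per-sentence BOLD estimate) on a complexity count (say, the
# number of phrases built) and inspect the slope. Placeholder data only.
import numpy as np

def complexity_slope(complexity_counts, responses):
    """Least-squares slope and intercept of response as a function of complexity."""
    slope, intercept = np.polyfit(complexity_counts, responses, deg=1)
    return slope, intercept

phrase_counts = np.array([1, 2, 3, 4, 5, 6])             # hypothetical counts
bold_like = np.array([0.2, 0.35, 0.55, 0.7, 0.9, 1.05])  # hypothetical measures
print(complexity_slope(phrase_counts, bold_like))         # positive, roughly linear
```

A monotone, roughly linear relation is what a DTC-friendly account leads us to expect; its absence, for the ceteris paribus reasons above, is harder to interpret.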
Alec (439) makes an additional, IMO correct and important,
observation. Minimalism in particular, “in denying multiple routes to
linguistic representations,” is committed to some kind of DTC thinking.[5]
Furthermore, by emphasizing the centrality of interface conditions to the
investigation of FL, Minimalism has embraced the idea that how linguistic
knowledge is used should reveal a great deal about what it is. In fact, as I’ve
argued elsewhere, this is how I would like to understand the “strong minimalist thesis” (SMT), at least in part. I have suggested that we interpret the SMT as committed to a strong “transparency hypothesis” (TH) (in the sense of Berwick & Weinberg), a proposal that can only be systematically elaborated by examining how linguistic knowledge is used.
Happily, IMO, paradigm examples of how to exploit “use” and
TH to probe the representational format of FL are now emerging. I’ve already
discussed how Pietroski, Hunter, Lidz and Halberda’s work relates to the SMT
(e.g. here
and here).
But there is other stuff too of obvious relevance: e.g. BW’s early work on
parsing and Subjacency (aka Phase Theory) and Colin’s work on how islands are
evident in incremental sentence processing. This work is the tip of an
increasingly impressive iceberg. For example, there is analogous work showing
that parsing exploits binding restrictions incrementally during processing (e.g. by Dillon, Sturt, and Kush).
This latter work is interesting for two reasons. It validates
results that syntacticians have independently arrived at using other methods
(which, to re-emphasize, is always worth doing on methodological grounds). And,
perhaps even more importantly, it has started raising serious questions for
syntactic and semantic theory proper. This is not the place to discuss this in
detail (I’m planning another post dedicated to this point), but it is worth
noting that, given certain reasonable assumptions about what memory is like in humans and how it functions in, among other areas, incremental parsing, the results on the online processing of binding noted above suggest that binding is not stated in terms of c-command but in terms of some other notion that mimics its effects.
Let me say a touch more about the argument form, as it is
both subtle and interesting. It has the following structure: (i) we have
evidence of c-command effects in the domain of incremental binding, (ii) we
have evidence that the kind of memory we use in parsing cannot easily code a
c-command restriction, thus (iii) what the parsing Grammar (G) employs is not
c-command per se but another notion
compatible with this sort of memory architecture (e.g. clausemate or
phasemate). But, (iv) if we adopt a strong SMT/TH (as we should), (iii) implies
that c-command is absent from the competence G as well as the parsing G. In
short, the TH interpretation of SMT in this context argues in favor of a
revamped version of Binding Theory in which FL eschews c-command as a basic
relation. The interest of this kind of argument should be evident, but let me
spell it out. We S-types are starting to face the very interesting prospect
that figuring out how grammatical information is used at the interfaces will help us choose among alternative competence theories by placing interface
constraints on the admissible primitives. In other words, here we see a
non-trivial consequence of Bare Output Conditions on the shape of the grammar.
Yessss!!!
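For readers who like the argument in (i)-(iv) spelled out even more concretely, here is a hedged sketch (not anyone’s implemented parser) of why a content-addressable, cue-based memory can state a clausemate/phasemate-style restriction but cannot directly check c-command: retrieval works by matching feature bundles, and clause or phase membership is easy to encode as a feature, whereas c-command is a relation over tree positions that such bundles do not record.

```python
# A hedged sketch of the reasoning in (i)-(iv), not anyone's implemented parser.
# A content-addressable memory stores items as feature bundles and retrieves
# them by cue matching. Clause (or phase) membership is easy to state as a cue;
# c-command is a relation over tree positions, which these bundles do not
# record, so it cannot be checked by cue matching alone.

memory = [
    {"word": "John",  "category": "NP", "clause": 1, "subject": True},
    {"word": "Mary",  "category": "NP", "clause": 2, "subject": True},
    {"word": "story", "category": "NP", "clause": 2, "subject": False},
]

def retrieve(items, cues):
    """Return the items whose features match every retrieval cue."""
    return [item for item in items
            if all(item.get(feature) == value for feature, value in cues.items())]

# A clausemate-style antecedent search for a reflexive in clause 2 is just a cue set:
print(retrieve(memory, {"category": "NP", "clause": 2, "subject": True}))

# A genuine c-command check would need the tree positions of both items,
# information that has no home in the feature bundles above.
```

If incremental binding nonetheless shows the signature of c-command, as in (i), the natural conclusion is (iii): the restriction the parser actually enforces is stated over features like these and merely mimics c-command.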
We live in exciting times. The SMT (in the guise of TH)
conceptually moves DTC-like considerations to the center of theory evaluation.
Additionally, we now have some useful parade cases in which this kind of reasoning
has been insightfully deployed (and which, thereby, provide templates for
further mimicking). If so, we should expect that these kinds of considerations
and methods will soon become part of every good syntactician’s armamentarium.
[1]
The fact that such crude data can be used so effectively is itself quite remarkable.
This speaks to the robustness of the system being studied, for such weak signals would not otherwise be expected to be so useful.
[2]
Which is not to say that these more careful methods don’t have their place.
There are some cases where being more careful has proven useful. I think that
Jon Sprouse has given the most careful thought to these questions. Here
is an example of some work where I think that the extra care has proven to be
useful.
[3]
I have not been able to find a public version of the paper.
[4]
BW note that Forster provided evidence in favor of the DTC even as Fodor et al. were in the process of burying it. Forster effectively found temporal
measures of psychological complexity that tracked the grammatical complexity
the DTC identified by switching the experimental task a little (viz. he used an
RSVP presentation of the relevant data).
[5]
I believe that what Alec intends here is that in a theory where the only real operation is merge, complexity is easy to measure and there are pretty
clear predictions of how this should impact algorithms that use this
information. It is worth noting that the heyday of the DTC was in a world where
complexity was largely a matter of how many transformations applied to derive a
surface form. We have returned to that world again, though with a vastly
simpler transformational component.