Much as I enjoy the cut and thrust of debate about the discoveries of Generative Grammar and their significance for understanding FL, I am ready to change the topic, at least for a while. Before, doing so, let me urge those of you who have not been following the comment threads of my last two posts to dip into them. IMO, they are both (mostly) entertaining and actually insightful. I may be over-concluding here, but it looks to me that we have reached a kind of consensus in which almost everyone (there is one conspicuous exception, and I am sure regular readers can guess who this is) concurs that GG has made many serious empirical discoveries (what I dubbed "effects") that call for theoretical explanation. With this consensus in hand, let’s
return to the SMT.
In two previous posts (here
and here),
I outlined a version of the SMT that had several nice properties (or at least I
thought them nice). First, empirically it linked work on syntax directly to
work in psycho and vice versa, with results in each carrying clear(ish)
consequences for work in the other. The SMT mediated this cross fertilization
by endorsing a strong version of the transparency thesis wherein the
performance systems used the principles, operations and representations of the
competence systems to do what they do (and, this is important, do so well). I
offered some illustrations of this two-way commerce and touted its virtues.
The second nice property of this version of the SMT is that
it promises to deliver on the idea that grammars are evaluable wrt
computational efficiency. Minimalists love to say that some property enhances
or detracts from computational efficiency or adds or reduces computational
complexity and our computational colleagues never tire from calling them/us out
on this.[1]
The SMT provides a credible sense in which grammars might be computationally
efficient (CE). Grammars, operations, representations etc. are CE just in case
transparently embedding them within performance systems allows these performance
systems to be efficient. What’s ‘efficient’ mean? Parsers that parse fast are
efficient. If these parsers are fast (i.e. efficient) (in part) because they embed grammars with certain
specifiable properties then we can say that these grammars are CE. Ditto with acquisition.
The SMT conjectures that we are efficient at acquiring our native Gs (in part) because UG has the properties it does.
Thus, Gs and UGs are efficient to the degree that they explain why we are so
good at doing (performing) what we do linguistically. Thus, given the SMT, Gs
and UGs can be vicariously CE (VCE).
I have been arguing that minimalists should endorse VCE as what they intend when
claiming computational virtues for their minimalist proposals.
Say you buy this much. It would be useful to have a couple
of paradigm examples of how to make the argument linking the properties of Gs
and UG to CE in performance systems. Indeed, wouldn’t it be nice to have stories
that take us from properties like, say, Extension, Cyclicity, and C-command to fast
parsing and easy learnability. Fortunately, such illustrative examples exist.
Let me bring a nice compact, short, and easily readable one to your attention.
Berwick and Wexler (B&W) (here)
provide a paradigm case of what I think we need. This paper was written in 1987
(the 80s really were a golden age for this sort of stuff, as Charles has noted),
and sad to say, the wisdom it contains seems to have been almost entirely lost.
In what follows I give a short précis of the B&W argument, highlighting
what I take to be those features of the approach that it would behoove us to
“rediscover” and use. It’s a perfect example of SMT reasoning.
B&W focuses on showing that c-command (CC) is “a
linguistically-motivated restriction [that] can also be justified on computational
grounds” (48)). How do B&W show this?
There are two prongs to the argument. First, B&W show how and under what conditions
CC would enhance antecedence retrieval. The main result is that for trees that
are as deep as they are wide the computational savings are a reduction in
search time from N to log N (N = number of terminals) (48) when CC is
transparently embedded in a Marcus Parser (M-Par).[2]
B&W then show that CC follows from a particular property
of M-Pars, which B&W dub “constituency completeness” (55). What is this? It
is the assumption that “the “interior” of any phrase attached to a node on the
active node stack…[is] opaque to further access” (55). More specifically, “once
a phrase has been completely built” it is “attached as a single, opaque object
to its proper dominating phrase…Crucially, this means that the … [node] now acts as a single, opaque unit…[that
will not be]…accessible to syntactic
analysis.” As B&W note, “this restriction is simply the core notion of
c-command once again” (50).
Constituent Completeness has an analogue within current
syntax. It is effectively the Extension Condition (EC). EC states that once a
constituent is built it cannot be further reconfigured (i.e tampered with).
Furthermore, as several have noted, there is a tight connection between EC and
CC, at least for some class of dependencies.[3]
It is interesting to see these connections foreshadowed in B&W. Note that
the B&W discussion in the current theoretical context lends credence to the
idea that EC promotes CE via its relation to CC and M-Pars rapid parsing.
B&W observe that Constituent Completeness (aka, EC) has
another pleasant consequence. It’s pivotal in making M-Pars fast. How so? First
M-Pars is a species of Bounded Context Parsers (BCP). BCPs (and hence M-Pars)
are fast because they move forward by “examining strictly literal contexts around the current locus of parsing” (55). Thus, parsing decisions can only consult “the
local environment of the parse.” To implement this, such local environments
must be represented in the very same code that linguists use in describing
syntactic objects:
…[a] decision will be made by consulting
he local environment of the parse – the S and VP nodes, with the attached verb,
and the three input buffer items. Further these items are recorded exactly as
written by the linguist – as the nodes S, NP, VP, V and so forth. No
“additional” coding is carried out. It this literal use of the parse tree
context that distinguishes bounded context parsing…
Thus, Constituent Completeness (viz. EC) “effectively limits
the left-hand parsing context that is available…[and] is a necessary
requirement for such a parser to work” (55).
In other words, something very like EC contributes to making
BCPs/M-Pars fast. Additionally, Constituent Completeness and the transparency
assumption together motivate the Berwick and Weinberg proposal that something
very like bounded cyclic derivations are necessary for efficient parsing given
the relation between bounded left contexts and fast parsing. Every grammatical
theory since the mid 80s has included some way of representing bounded cycles
(i.e. either phases + PIC or Barriers+Subjacency or Bounding nodes + subjacency). Indeed, as you all
know, Berwick and Weinberg argued that Subjacency was sufficient to provide
bounded left contexts of the kind their parser required to operate quickly. In sum, the B&W paper shows how EC (in
the guise of Constituent Completeness) and something like bounded domains of
computation (phases/subjacent domain) in the context of a M-Parser together can
conspire to yield fast parsing. If so, this supports the view that something
like EC and phases are computationally efficient. Wow!!
B&W doesn’t stop here. It goes on to speculate about the
relation between fast parsing and easy learnability. Wexler and Culicover
showed that grammars that have the BDE property (bounded degree of error) can
be learned on the basis of degree 2 data.[4]
It is possible that BDE and BCP are closely related. Berwick (here)
showed that one can derive BDE from BCP and that both properties rely on
something like EC and bounded domains of computation (which, to repeat, something
like phase/subjacency theory would provide). B&W suggest that BDE might in
turn imply BCE, which, if correct, would further support the idea that notions
like EC and phases are CE. Indeed, should it prove possible to prove that Gs
are easily learned iff they are quickly parsed and that both quick learning and
speedy parsing leverage specific properties of G and UG like EC, phases/subjacency,
CC etc. then we will have taken a big step in vindicating the SMT.
Let me end with two last comments on B&W.
First, one of the key features of the paper is the proposal
to take M-Pars/BCPs as proxy models for efficient parsing and to then study
what enables them to be so good. What
B&W finds is that part of what makes them good are the data structures they
use with the particular properties they encode. Thus, the choice to study M-Pars/BCPs
is the choice to study parsers (and maybe BDE learners) “grounded in particular
linguistic theories” (58). As B&W
note, this approach is quite unlike what one finds in formal learning theory or
“the general results obtained from the parsability of formal languages” (58).
B&W starts from a consideration of a narrower class of languages that are
“already known to be linguistically relevant” (59). The aim, as B&W sees it,
is to evaluate the impact on computations that the data structures we know to
be operative in natural language Gs and UG have. As B&W puts it, what the
paper develops is a model for studying “the interactions between data
structures and algorithms…[as a way] to develop more computationally based
linguistic theory” (51). In other words, it offers an interesting and concrete
way of understanding the current minimalist interest in CE.
Second, as B&W stresses again and again, this is not the
“last word” on the topic (53). But, to my eyes it is a very good first word. In
contrast to many computational approaches to parsing and learning, it takes
linguistic theory seriously and considers its implications for performance. Let
me quote B&W:
…what we want to illustrate here is
not the final result but the method of study. We can assess the relative
strengths of parsability and learnability in this case, but only because we
have advanced specific models for each. These characterizations are still quite
specific, being grounded in particular linguistic theories. The results are
therefore quite unlike the formal learning theories of Gold (1967), or more
recently, of Osherson, Stob and Weinstein (1982) nor are they like the general
results obtained from the analysis of the parsability of formal languages.
Rather, they hold of a narrower class of languages that are already known to be
linguistically relevant. In this respect, what the theories lose in terms of
invariance over changes in linguistic theories, they gain in terms of
specificity.[5]
(58-9)
Thus, it offers a concrete way of motivating actually
proposed principles of FL/UG on computational grounds. In other words, it
offers a concrete way of exploring the SMT. Not bad for a 1987 paper that has
long been ignored. It’s time to go back to the future.
[1]
I still fondly recall a day in Potsdam several years ago when Greg Kobele
suggested in the question period after a talk I gave that any minimalist claims
to computational efficiency are unfounded (actually, he implied worse, BS being
what sprang to my mind). At any rate, Greg was right to push. The SMT posts are
an attempt to respond.
[2]
For trees that are not “perfectly balanced” (i.e. not as deep as wide) the
computational savings decline until they go to zero in simple left branching
sentences.
[3]
Epstein has noted this connection. Hornstein 2009 discusses it ad nauseum and it forms the basis of his
argument that all relations mediated by CC should be reduced to movement. This
includes pronominal binding. This unification of binding with movement is still
very controversial (i.e. only Kayne, Sandiway Fong, me and a couple of other
crazies think it possible) and cannot be considered as even nearly settled.
This said, the connections to B&W are intriguing.
[4]
Like real time parsing, I suspect degree 2 is too lax a standard. We likely
want something stricter, something along the lines of Lightfoot’s degree 0+
(main clauses plue a little bit).
[5]
This is important. Most formal work on language quite deliberately abstract
away from what linguists would consider the core phenomenon of interest: the
structure of Gs and UG. They results are general because they are not G/UG
dependent. But this is precisely what makes them the wrong way for
investigating G/UG properties and for exploring the SMT.
There is a counter argument, that by going specific
one is committing hostages to the caprice of theory. There are days where I
would sympathize with this. However, as I’ve not yet gone tired of repeating,
the rate of real theoretical change
within GG is far slower than generally believed. For example, as I noted in the
main body of the post, modern theory is often closely related to old proposals
(e.g. phases/subjacency or CC/EC). This means that theory change is not quite
as radical as often advertised and so the effects of going specific not nearly
as baleful as often feared. This said, I
need not be so categorical. General results are not to be pooh-poohed just
because they are general. However, as regards the SMT, to the degree that
formal results are not based in grammatically specific concepts, to that degree
they will not be useful for SMT purposes. So, if you are interested in the SMT,
B&W is the right way to go.
An aside: the branch of CS that that B&W take to
be of relevance to their discussion is compiler theory, a branch of CS which
gets its hand dirty with the nitty gritty details.