Much as I enjoy the cut and thrust of debate about the discoveries of Generative Grammar and their significance for understanding FL, I am ready to change the topic, at least for a while. Before, doing so, let me urge those of you who have not been following the comment threads of my last two posts to dip into them. IMO, they are both (mostly) entertaining and actually insightful. I may be over-concluding here, but it looks to me that we have reached a kind of consensus in which almost everyone (there is one conspicuous exception, and I am sure regular readers can guess who this is) concurs that GG has made many serious empirical discoveries (what I dubbed "effects") that call for theoretical explanation. With this consensus in hand, let’s return to the SMT.
In two previous posts (here and here), I outlined a version of the SMT that had several nice properties (or at least I thought them nice). First, empirically it linked work on syntax directly to work in psycho and vice versa, with results in each carrying clear(ish) consequences for work in the other. The SMT mediated this cross fertilization by endorsing a strong version of the transparency thesis wherein the performance systems used the principles, operations and representations of the competence systems to do what they do (and, this is important, do so well). I offered some illustrations of this two-way commerce and touted its virtues.
The second nice property of this version of the SMT is that it promises to deliver on the idea that grammars are evaluable wrt computational efficiency. Minimalists love to say that some property enhances or detracts from computational efficiency or adds or reduces computational complexity and our computational colleagues never tire from calling them/us out on this. The SMT provides a credible sense in which grammars might be computationally efficient (CE). Grammars, operations, representations etc. are CE just in case transparently embedding them within performance systems allows these performance systems to be efficient. What’s ‘efficient’ mean? Parsers that parse fast are efficient. If these parsers are fast (i.e. efficient) (in part) because they embed grammars with certain specifiable properties then we can say that these grammars are CE. Ditto with acquisition. The SMT conjectures that we are efficient at acquiring our native Gs (in part) because UG has the properties it does. Thus, Gs and UGs are efficient to the degree that they explain why we are so good at doing (performing) what we do linguistically. Thus, given the SMT, Gs and UGs can be vicariously CE (VCE). I have been arguing that minimalists should endorse VCE as what they intend when claiming computational virtues for their minimalist proposals.
Say you buy this much. It would be useful to have a couple of paradigm examples of how to make the argument linking the properties of Gs and UG to CE in performance systems. Indeed, wouldn’t it be nice to have stories that take us from properties like, say, Extension, Cyclicity, and C-command to fast parsing and easy learnability. Fortunately, such illustrative examples exist. Let me bring a nice compact, short, and easily readable one to your attention. Berwick and Wexler (B&W) (here) provide a paradigm case of what I think we need. This paper was written in 1987 (the 80s really were a golden age for this sort of stuff, as Charles has noted), and sad to say, the wisdom it contains seems to have been almost entirely lost. In what follows I give a short précis of the B&W argument, highlighting what I take to be those features of the approach that it would behoove us to “rediscover” and use. It’s a perfect example of SMT reasoning.
B&W focuses on showing that c-command (CC) is “a linguistically-motivated restriction [that] can also be justified on computational grounds” (48)). How do B&W show this? There are two prongs to the argument. First, B&W show how and under what conditions CC would enhance antecedence retrieval. The main result is that for trees that are as deep as they are wide the computational savings are a reduction in search time from N to log N (N = number of terminals) (48) when CC is transparently embedded in a Marcus Parser (M-Par).
B&W then show that CC follows from a particular property of M-Pars, which B&W dub “constituency completeness” (55). What is this? It is the assumption that “the “interior” of any phrase attached to a node on the active node stack…[is] opaque to further access” (55). More specifically, “once a phrase has been completely built” it is “attached as a single, opaque object to its proper dominating phrase…Crucially, this means that the … [node] now acts as a single, opaque unit…[that will not be]…accessible to syntactic analysis.” As B&W note, “this restriction is simply the core notion of c-command once again” (50).
Constituent Completeness has an analogue within current syntax. It is effectively the Extension Condition (EC). EC states that once a constituent is built it cannot be further reconfigured (i.e tampered with). Furthermore, as several have noted, there is a tight connection between EC and CC, at least for some class of dependencies. It is interesting to see these connections foreshadowed in B&W. Note that the B&W discussion in the current theoretical context lends credence to the idea that EC promotes CE via its relation to CC and M-Pars rapid parsing.
B&W observe that Constituent Completeness (aka, EC) has another pleasant consequence. It’s pivotal in making M-Pars fast. How so? First M-Pars is a species of Bounded Context Parsers (BCP). BCPs (and hence M-Pars) are fast because they move forward by “examining strictly literal contexts around the current locus of parsing” (55). Thus, parsing decisions can only consult “the local environment of the parse.” To implement this, such local environments must be represented in the very same code that linguists use in describing syntactic objects:
…[a] decision will be made by consulting he local environment of the parse – the S and VP nodes, with the attached verb, and the three input buffer items. Further these items are recorded exactly as written by the linguist – as the nodes S, NP, VP, V and so forth. No “additional” coding is carried out. It this literal use of the parse tree context that distinguishes bounded context parsing…
Thus, Constituent Completeness (viz. EC) “effectively limits the left-hand parsing context that is available…[and] is a necessary requirement for such a parser to work” (55).
In other words, something very like EC contributes to making BCPs/M-Pars fast. Additionally, Constituent Completeness and the transparency assumption together motivate the Berwick and Weinberg proposal that something very like bounded cyclic derivations are necessary for efficient parsing given the relation between bounded left contexts and fast parsing. Every grammatical theory since the mid 80s has included some way of representing bounded cycles (i.e. either phases + PIC or Barriers+Subjacency or Bounding nodes + subjacency). Indeed, as you all know, Berwick and Weinberg argued that Subjacency was sufficient to provide bounded left contexts of the kind their parser required to operate quickly. In sum, the B&W paper shows how EC (in the guise of Constituent Completeness) and something like bounded domains of computation (phases/subjacent domain) in the context of a M-Parser together can conspire to yield fast parsing. If so, this supports the view that something like EC and phases are computationally efficient. Wow!!
B&W doesn’t stop here. It goes on to speculate about the relation between fast parsing and easy learnability. Wexler and Culicover showed that grammars that have the BDE property (bounded degree of error) can be learned on the basis of degree 2 data. It is possible that BDE and BCP are closely related. Berwick (here) showed that one can derive BDE from BCP and that both properties rely on something like EC and bounded domains of computation (which, to repeat, something like phase/subjacency theory would provide). B&W suggest that BDE might in turn imply BCE, which, if correct, would further support the idea that notions like EC and phases are CE. Indeed, should it prove possible to prove that Gs are easily learned iff they are quickly parsed and that both quick learning and speedy parsing leverage specific properties of G and UG like EC, phases/subjacency, CC etc. then we will have taken a big step in vindicating the SMT.
Let me end with two last comments on B&W.
First, one of the key features of the paper is the proposal to take M-Pars/BCPs as proxy models for efficient parsing and to then study what enables them to be so good. What B&W finds is that part of what makes them good are the data structures they use with the particular properties they encode. Thus, the choice to study M-Pars/BCPs is the choice to study parsers (and maybe BDE learners) “grounded in particular linguistic theories” (58). As B&W note, this approach is quite unlike what one finds in formal learning theory or “the general results obtained from the parsability of formal languages” (58). B&W starts from a consideration of a narrower class of languages that are “already known to be linguistically relevant” (59). The aim, as B&W sees it, is to evaluate the impact on computations that the data structures we know to be operative in natural language Gs and UG have. As B&W puts it, what the paper develops is a model for studying “the interactions between data structures and algorithms…[as a way] to develop more computationally based linguistic theory” (51). In other words, it offers an interesting and concrete way of understanding the current minimalist interest in CE.
Second, as B&W stresses again and again, this is not the “last word” on the topic (53). But, to my eyes it is a very good first word. In contrast to many computational approaches to parsing and learning, it takes linguistic theory seriously and considers its implications for performance. Let me quote B&W:
…what we want to illustrate here is not the final result but the method of study. We can assess the relative strengths of parsability and learnability in this case, but only because we have advanced specific models for each. These characterizations are still quite specific, being grounded in particular linguistic theories. The results are therefore quite unlike the formal learning theories of Gold (1967), or more recently, of Osherson, Stob and Weinstein (1982) nor are they like the general results obtained from the analysis of the parsability of formal languages. Rather, they hold of a narrower class of languages that are already known to be linguistically relevant. In this respect, what the theories lose in terms of invariance over changes in linguistic theories, they gain in terms of specificity. (58-9)
Thus, it offers a concrete way of motivating actually proposed principles of FL/UG on computational grounds. In other words, it offers a concrete way of exploring the SMT. Not bad for a 1987 paper that has long been ignored. It’s time to go back to the future.
 I still fondly recall a day in Potsdam several years ago when Greg Kobele suggested in the question period after a talk I gave that any minimalist claims to computational efficiency are unfounded (actually, he implied worse, BS being what sprang to my mind). At any rate, Greg was right to push. The SMT posts are an attempt to respond.
 For trees that are not “perfectly balanced” (i.e. not as deep as wide) the computational savings decline until they go to zero in simple left branching sentences.
 Epstein has noted this connection. Hornstein 2009 discusses it ad nauseum and it forms the basis of his argument that all relations mediated by CC should be reduced to movement. This includes pronominal binding. This unification of binding with movement is still very controversial (i.e. only Kayne, Sandiway Fong, me and a couple of other crazies think it possible) and cannot be considered as even nearly settled. This said, the connections to B&W are intriguing.
 Like real time parsing, I suspect degree 2 is too lax a standard. We likely want something stricter, something along the lines of Lightfoot’s degree 0+ (main clauses plue a little bit).
 This is important. Most formal work on language quite deliberately abstract away from what linguists would consider the core phenomenon of interest: the structure of Gs and UG. They results are general because they are not G/UG dependent. But this is precisely what makes them the wrong way for investigating G/UG properties and for exploring the SMT.
There is a counter argument, that by going specific one is committing hostages to the caprice of theory. There are days where I would sympathize with this. However, as I’ve not yet gone tired of repeating, the rate of real theoretical change within GG is far slower than generally believed. For example, as I noted in the main body of the post, modern theory is often closely related to old proposals (e.g. phases/subjacency or CC/EC). This means that theory change is not quite as radical as often advertised and so the effects of going specific not nearly as baleful as often feared. This said, I need not be so categorical. General results are not to be pooh-poohed just because they are general. However, as regards the SMT, to the degree that formal results are not based in grammatically specific concepts, to that degree they will not be useful for SMT purposes. So, if you are interested in the SMT, B&W is the right way to go.
An aside: the branch of CS that that B&W take to be of relevance to their discussion is compiler theory, a branch of CS which gets its hand dirty with the nitty gritty details.