Sunday, May 18, 2014


Much as I enjoy the cut and thrust of debate about the discoveries of Generative Grammar and their significance for understanding FL, I am ready to change the topic, at least for a while. Before doing so, let me urge those of you who have not been following the comment threads of my last two posts to dip into them. IMO, they are both (mostly) entertaining and actually insightful. I may be over-concluding here, but it looks to me as if we have reached a kind of consensus in which almost everyone (there is one conspicuous exception, and I am sure regular readers can guess who this is) concurs that GG has made many serious empirical discoveries (what I dubbed "effects") that call for theoretical explanation. With this consensus in hand, let’s return to the SMT.

In two previous posts (here and here), I outlined a version of the SMT that had several nice properties (or at least I thought them nice). First, empirically it linked work on syntax directly to work in psycholinguistics and vice versa, with results in each carrying clear(ish) consequences for work in the other. The SMT mediated this cross-fertilization by endorsing a strong version of the transparency thesis, wherein the performance systems use the principles, operations and representations of the competence systems to do what they do (and, this is important, to do it well). I offered some illustrations of this two-way commerce and touted its virtues.

The second nice property of this version of the SMT is that it promises to deliver on the idea that grammars are evaluable wrt computational efficiency. Minimalists love to say that some property enhances or detracts from computational efficiency, or adds or reduces computational complexity, and our computational colleagues never tire of calling them/us out on this.[1] The SMT provides a credible sense in which grammars might be computationally efficient (CE). Grammars, operations, representations, etc. are CE just in case transparently embedding them within performance systems allows those performance systems to be efficient. What does ‘efficient’ mean? Parsers that parse fast are efficient. If these parsers are fast (i.e. efficient) (in part) because they embed grammars with certain specifiable properties, then we can say that these grammars are CE. Ditto with acquisition. The SMT conjectures that we are efficient at acquiring our native Gs (in part) because UG has the properties it does. Thus, Gs and UGs are efficient to the degree that they explain why we are so good at doing (performing) what we do linguistically. Thus, given the SMT, Gs and UGs can be vicariously CE (VCE). I have been arguing that minimalists should endorse VCE as what they intend when claiming computational virtues for their minimalist proposals.

Say you buy this much. It would be useful to have a couple of paradigm examples of how to make the argument linking the properties of Gs and UG to CE in performance systems. Indeed, wouldn’t it be nice to have stories that take us from properties like, say, Extension, Cyclicity, and C-command to fast parsing and easy learnability? Fortunately, such illustrative examples exist. Let me bring a compact and easily readable one to your attention. Berwick and Wexler (B&W) (here) provide a paradigm case of what I think we need. This paper was written in 1987 (the 80s really were a golden age for this sort of stuff, as Charles has noted), and, sad to say, the wisdom it contains seems to have been almost entirely lost. In what follows I give a short précis of the B&W argument, highlighting what I take to be those features of the approach that it would behoove us to “rediscover” and use. It’s a perfect example of SMT reasoning.

B&W focuses on showing that c-command (CC) is “a linguistically-motivated restriction [that] can also be justified on computational grounds” (48). How do B&W show this? There are two prongs to the argument. First, B&W show how and under what conditions CC would enhance antecedent retrieval. The main result is that for trees that are as deep as they are wide, the computational savings amount to a reduction in search time from N to log N (N = number of terminals) (48) when CC is transparently embedded in a Marcus Parser (M-Par).[2]
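The arithmetic behind this saving is easy to see in a toy sketch (my own illustration, not B&W's implementation): in a binary-branching tree, the nodes c-commanding a leaf are exactly the siblings of the nodes on the leaf-to-root path, so in a balanced tree over N terminals an antecedent search inspects about log2(N) candidate positions rather than all N.

```python
# Toy illustration of c-command-bounded antecedent search. In a
# binary-branching tree, a leaf's c-commanders are the siblings of its
# ancestors, i.e. one candidate per level of the leaf-to-root path.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []
        self.parent = None
        for c in self.children:
            c.parent = self

def c_commanders(node):
    """Collect the nodes that c-command `node`: for each ancestor,
    gather its other children (the siblings along the path up)."""
    out = []
    while node.parent is not None:
        for sib in node.parent.children:
            if sib is not node:
                out.append(sib)
        node = node.parent
    return out

# Build a balanced tree over 8 terminals: the search path has log2(8) = 3
# levels, so only 3 candidate subtrees are visible, vs. 8 terminals total.
leaves = [Node(w) for w in "t1 t2 t3 t4 t5 t6 t7 t8".split()]
level = leaves
while len(level) > 1:
    level = [Node("X", [level[i], level[i + 1]])
             for i in range(0, len(level), 2)]

cands = c_commanders(leaves[0])
print(len(cands))  # 3
```

In a purely left-branching tree the leaf-to-root path grows with N, which is the degenerate case noted in footnote [2] where the savings go to zero.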

B&W then show that CC follows from a particular property of M-Pars, which B&W dub “constituency completeness” (55). What is this? It is the assumption that “the “interior” of any phrase attached to a node on the active node stack…[is] opaque to further access” (55). More specifically, “once a phrase has been completely built” it is “attached as a single, opaque object to its proper dominating phrase…Crucially, this means that the … [node] now acts as a single, opaque unit…[that will not be]…accessible to syntactic analysis.” As B&W note, “this restriction is simply the core notion of c-command once again” (50).

Constituent Completeness has an analogue within current syntax. It is effectively the Extension Condition (EC). EC states that once a constituent is built it cannot be further reconfigured (i.e., tampered with). Furthermore, as several have noted, there is a tight connection between EC and CC, at least for some class of dependencies.[3] It is interesting to see these connections foreshadowed in B&W. Note that, read in the current theoretical context, the B&W discussion lends credence to the idea that EC promotes CE via its relation to CC and to M-Pars’ rapid parsing.

B&W observe that Constituent Completeness (aka EC) has another pleasant consequence: it is pivotal in making M-Pars fast. How so? First, M-Pars are a species of Bounded Context Parser (BCP). BCPs (and hence M-Pars) are fast because they move forward by “examining strictly literal contexts around the current locus of parsing” (55). Thus, parsing decisions can only consult “the local environment of the parse.” To implement this, such local environments must be represented in the very same code that linguists use in describing syntactic objects:

…[a] decision will be made by consulting the local environment of the parse – the S and VP nodes, with the attached verb, and the three input buffer items. Further, these items are recorded exactly as written by the linguist – as the nodes S, NP, VP, V and so forth. No “additional” coding is carried out. It is this literal use of the parse tree context that distinguishes bounded context parsing…

Thus, Constituent Completeness (viz. EC) “effectively limits the left-hand parsing context that is available…[and] is a necessary requirement for such a parser to work” (55).
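To make the "bounded literal context" idea concrete, here is a toy shift-reduce sketch (my own illustration with an invented three-rule grammar; it is not the Marcus parser itself). Decisions consult only the top two stack cells, and every reduced phrase goes back on the stack as a single opaque unit, mimicking Constituent Completeness:

```python
# Toy bounded-context shift-reduce parser (illustrative, not M-Par).
# Reduce decisions see only the labels of the top two stack cells; a real
# Marcus parser would also consult a three-cell input buffer. Once a phrase
# is reduced, it is pushed back as one opaque unit whose interior is never
# reopened -- the Constituent Completeness / EC idea.

RULES = {("D", "N"): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}

def parse(tags):
    stack, buf = [], list(tags)
    while buf or len(stack) > 1:
        # Bounded literal left context: only the top two labels are visible.
        top2 = tuple(label for label, _ in stack[-2:])
        if top2 in RULES:
            right = stack.pop()
            left = stack.pop()
            # The new phrase is a single opaque object on the stack.
            stack.append((RULES[top2], [left, right]))
        elif buf:
            tag = buf.pop(0)
            stack.append((tag, tag))  # shift a terminal
        else:
            raise ValueError("no parse with this bounded context")
    return stack[0]

tree = parse(["D", "N", "V", "D", "N"])  # e.g. "the dog saw the cat"
print(tree[0])  # S
```

Because each decision inspects a fixed-size window, the per-step cost is constant, which is one way to see why bounding the accessible left context matters for speed.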

In other words, something very like EC contributes to making BCPs/M-Pars fast. Additionally, Constituent Completeness and the transparency assumption together motivate the Berwick and Weinberg proposal that something very like bounded cyclic derivations is necessary for efficient parsing, given the relation between bounded left contexts and fast parsing. Every grammatical theory since the mid 80s has included some way of representing bounded cycles (i.e. phases + PIC, Barriers + Subjacency, or bounding nodes + Subjacency). Indeed, as you all know, Berwick and Weinberg argued that Subjacency was sufficient to provide bounded left contexts of the kind their parser required to operate quickly. In sum, the B&W paper shows how EC (in the guise of Constituent Completeness) and something like bounded domains of computation (phases/subjacent domains), in the context of an M-Par, can conspire to yield fast parsing. If so, this supports the view that something like EC and phases are computationally efficient. Wow!!

B&W doesn’t stop here. It goes on to speculate about the relation between fast parsing and easy learnability. Wexler and Culicover showed that grammars that have the BDE property (bounded degree of error) can be learned on the basis of degree 2 data.[4] It is possible that BDE and BCP are closely related. Berwick (here) showed that one can derive BDE from BCP and that both properties rely on something like EC and bounded domains of computation (which, to repeat, something like phase/subjacency theory would provide). B&W suggest that BDE might in turn imply BCP, which, if correct, would further support the idea that notions like EC and phases are CE. Indeed, should it prove possible to show that Gs are easily learned iff they are quickly parsed, and that both quick learning and speedy parsing leverage specific properties of G and UG like EC, phases/subjacency, CC, etc., then we will have taken a big step in vindicating the SMT.

Let me end with two last comments on B&W.

First, one of the key features of the paper is the proposal to take M-Pars/BCPs as proxy models for efficient parsing and then to study what enables them to be so good. What B&W finds is that part of what makes them good is the data structures they use, with the particular properties these encode. Thus, the choice to study M-Pars/BCPs is the choice to study parsers (and maybe BDE learners) “grounded in particular linguistic theories” (58). As B&W note, this approach is quite unlike what one finds in formal learning theory or “the general results obtained from the parsability of formal languages” (58). B&W starts from a consideration of a narrower class of languages that are “already known to be linguistically relevant” (59). The aim, as B&W sees it, is to evaluate the computational impact of the data structures we know to be operative in natural language Gs and UG. As B&W puts it, what the paper develops is a model for studying “the interactions between data structures and algorithms…[as a way] to develop more computationally based linguistic theory” (51). In other words, it offers an interesting and concrete way of understanding the current minimalist interest in CE.

Second, as B&W stresses again and again, this is not the “last word” on the topic (53). But to my eyes, it is a very good first word. In contrast to many computational approaches to parsing and learning, it takes linguistic theory seriously and considers its implications for performance. Let me quote B&W:

…what we want to illustrate here is not the final result but the method of study. We can assess the relative strengths of parsability and learnability in this case, but only because we have advanced specific models for each. These characterizations are still quite specific, being grounded in particular linguistic theories. The results are therefore quite unlike the formal learning theories of Gold (1967), or more recently, of Osherson, Stob and Weinstein (1982) nor are they like the general results obtained from the analysis of the parsability of formal languages. Rather, they hold of a narrower class of languages that are already known to be linguistically relevant. In this respect, what the theories lose in terms of invariance over changes in linguistic theories, they gain in terms of specificity.[5] (58-9)

Thus, it offers a concrete way of motivating actually proposed principles of FL/UG on computational grounds. In other words, it offers a concrete way of exploring the SMT. Not bad for a 1987 paper that has long been ignored. It’s time to go back to the future.

[1] I still fondly recall a day in Potsdam several years ago when Greg Kobele suggested in the question period after a talk I gave that any minimalist claims to computational efficiency are unfounded (actually, he implied worse, BS being what sprang to my mind). At any rate, Greg was right to push. The SMT posts are an attempt to respond.
[2] For trees that are not “perfectly balanced” (i.e. not as deep as wide) the computational savings decline until they go to zero in simple left branching sentences.
[3] Epstein has noted this connection. Hornstein 2009 discusses it ad nauseum and it forms the basis of his argument that all relations mediated by CC should be reduced to movement. This includes pronominal binding. This unification of binding with movement is still very controversial (i.e. only Kayne, Sandiway Fong, me and a couple of other crazies think it possible) and cannot be considered as even nearly settled. This said, the connections to B&W are intriguing.
[4] Like real time parsing, I suspect degree 2 is too lax a standard. We likely want something stricter, something along the lines of Lightfoot’s degree 0+ (main clauses plus a little bit).
[5] This is important. Most formal work on language quite deliberately abstracts away from what linguists would consider the core phenomenon of interest: the structure of Gs and UG. Their results are general because they are not G/UG dependent. But this is precisely what makes them the wrong tools for investigating G/UG properties and for exploring the SMT.

There is a counterargument: that by going specific one is giving hostages to the caprice of theory. There are days when I sympathize with this. However, as I’ve not yet grown tired of repeating, the rate of real theoretical change within GG is far slower than generally believed. For example, as I noted in the main body of the post, modern theory is often closely related to old proposals (e.g. phases/subjacency or CC/EC). This means that theory change is not quite as radical as often advertised, and so the effects of going specific are not nearly as baleful as often feared. This said, I need not be so categorical. General results are not to be pooh-poohed just because they are general. However, as regards the SMT, to the degree that formal results are not based in grammatically specific concepts, to that degree they will not be useful for SMT purposes. So, if you are interested in the SMT, B&W is the right way to go.

An aside: the branch of CS that B&W take to be of relevance to their discussion is compiler theory, a branch of CS which gets its hands dirty with the nitty-gritty details.


  1. OK - so I'm a 100% novice to this discussion, so feel free to swat me down at a moment's notice. I would prefer a gentle clarification, of course.

    It seems to me that Chomsky continually discusses how language seems computationally optimized for the conceptual-intentional system, prioritized over the externalization system (copies are deleted, etc.). As such, are we really to expect the SMT to hold when assessing computational efficiency in parsing?

  2. To amplify my point, here are some quotes from the Minimalist Program, Ch. 2 "Some notes on Economy of Derivation and Representation":

    "In these respects, language design appears to be problematic from a parsing-theoretic perspective, though elegant regarded in isolation from considerations of use."

    "Note that one cannot easily motivate the conditions on economy of representation in terms of processing considerations, since they hold at LF, and only derivatively at S-Structure."

    In other words, perhaps parsing is good because of UG, but it would be even better if UG were different (e.g., if it didn't delete copies, etc.), and it's these computationally efficient properties of UG that tend to make life difficult for communication: parsing and production.

    1. Your comments are right on target, so I will refrain from even trying to swat. So let me explain what I am doing. I have never truly understood what Chomsky had in mind. I don't understand how efficiency and complexity concerns can be addressed independently of resource issues: time, space, whatever. It is also true that Chomsky has been known to say things like: the SMT addresses what the grammar must be like in order to be usable at all. So, I let my imagination fly and decided that there was an interesting interpretation of the SMT that I could make sense of and that fit with some of Chomsky's remarks. Moreover, the program I tried to outline also seemed to have interesting empirical potential, and I tried to limn some relevant avenues for pursuing it.

      Now what does that leave us with? Well, the following: some features of the computational system look like they can be understood in fairly traditional ways if one thinks something along the lines of vicarious computational efficiency, or so I tried to suggest. Other features, such as economy of representation, less so (though the latter has not been that central in the actual minimalist program).

      There are also some remarks that Chomsky has made that I don't understand at all. For example, it is not clear that language design is problematic for use. After all, we do parse very quickly. That we garden-path every now and then seems to say nothing about how well designed the system is. Even optimally designed systems may not do everything well. It is also not clear that parsing is the only use we have. After all, there is also production and acquisition, two other important uses that the system of knowledge is put to. An optimally designed FL would worry about these as well. Note that making things harder for the comprehender may make things easier for the producer, so…

      At any rate, I am really not sure how much of this Chomsky would buy. I am pretty sure that it is a defensible program and an interesting one. So, if this is not what Chomsky meant, then it would be nice to have someone explain what he does mean and what practical implications it has for research. IMO, the more positive interpretations that lead to interesting work there are out there, the better. Thx for your remarks.

    2. So, your reply here sounds quite a bit like the evolutionary arguments that Fodor has argued against, along with Chomsky to a lesser degree. Fodor has mentioned that people fawn over the supposed 'exquisite fit' between an organism and its environment, e.g., that a bird's wings are just perfectly suited to flying. He mentions that such conclusions aren't justified, because the organism chooses its niche based on the tools at hand. In other words, this is an illusion - organisms are what they are, and they figure out the best solutions given the materials they have - but this doesn't mean the materials were designed well for these solutions.

      For an analogy, a person with crutches may appear to be very optimal at locomotion when considered apart from people without crutches, e.g., swinging forward the crutch just in time to catch weight, moving the body in synchrony to move forward fluidly, whatever, but this is an illusion - the person has simply made the best with what's available. Likewise for language - in communicative settings, we've figured out how best to communicate with the tools we have at hand, avoiding syntactic ambiguity, not center-embedding beyond 1 or 2, repeating phrases as necessary, supplementing with all the available resources (e.g., gestures).

      When Chomsky says FL is not well designed for use, I assume he means communicative use. For internal thought, however, it may be optimal, and internal thought is also reliant on resources, time, etc., perhaps not the same resources as overt production/parsing/acquisition. I may be misunderstanding him here, perhaps.

    3. I am not fawning over the beautiful fit, nor is the version of the SMT I am suggesting. Why? Because I don't know if there is such a fit. It's a thesis, not a claim. However, I do agree that were it correct, i.e. were Berwick and Wexler's conjecture correct that part of what allows for languages to be parsed quickly and learned efficiently is that they have the kinds of representations and conditions on operations we think they do, then this would be very interesting. A question could then be posed: why the fit? We are not there yet, however. First we need to see if the SMT has legs.

      I am not sure that we understand anything right now about "internal use" if it does not amount to "speaking to oneself." If so, it's no different from the regular production/parsing stuff. If it means something else, it would be nice to know what, we could then see if the products of the grammar fit these well or not.

      So: I have nothing against other versions of the SMT. It's just that right now I am not sure what the others are. I think that the one I outlined is both interesting and currently generating interesting research that is worth investigating. It even seems to have some hints of truth. That, right now, is good enough for me.

    4. If it's productive then it speaks for itself. A professor of mine while discussing signal detection theory intimated much the same ideas - that the SDT models make some simplifying assumptions that may be empirically false but have generated a lot of useful research in psychophysics. For instance, SDT makes the assumption that perceptual evidence is acquired on a continuous function - that two equally-unhearable tones may have different values for perceptual evidence, but the intuition is that you get nothing for two very quiet tones, which is accounted for in threshold models of detection. However, the SDT model allows one to readily measure other properties of interest such as sources of noise, thresholds of detection, channel bandwidth, etc.

      But we don't necessarily need just optimal, efficiency, etc. to indicate the transparency between the grammar and the processing system, right? Don't the situations where processing fails also critically motivate grammar proposals? For instance, doesn't syntactic ambiguity/garden-pathing indicate the veracity of a particular proposal as well as the efficiency of parsing due to C-command? In other words, doesn't any empirical evidence that shows the usage of grammar in production/parsing/acquisition, whether efficient, deficient, or just weird, motivate grammar proposals?

    5. As far as the SMT in language is concerned, I think it's just a very difficult notion to capture and investigate. But the analogies that Chomsky makes are empirically investigable - optimal foraging in bees, optimal neural wiring, etc. There is some work in the motor control literature suggesting that people use optimal trajectory selection to minimize cost. But as far as parsing and production are concerned, my impression is that investigating optimality in these domains would indicate optimality of the externalization systems, and not optimality of the grammar itself, which is your goal, if I interpret you correctly.

    6. @William:
      doesn't any empirical evidence that shows the usage of grammar in production/parsing/acquisition, whether efficient, deficient, or just weird, motivate grammar proposals?
      Unfortunately, no. At least, not without auxiliary assumptions (`linking hypotheses') which link the grammar to behaviour. (The standard such assumption is implicit, and is that, ceteris paribus, grammaticality is acceptability.) Among linguists, there are currently no linking assumptions which are viewed as established enough to be able to successfully argue against an otherwise reasonable linguistic proposal. Thomas Graf's comment (on the previous post) on the methods of syntax sheds light on how this situation could arise -- syntax has been based on off-line (distributional) data, and not on (on-line) behavioural data.

      I think of Norbert's SMT as a reformulation of the strong competence hypothesis popularized by Kaplan & Bresnan (I'd be interested in hearing Norbert's take on this). This hypothesis says that the form of the grammar is related in a particular way to the parser, which more narrowly circumscribes the kind of linking theories that an analysis is compatible with. This is a particularly seductive hypothesis for linguists (although Chomsky has time and again cautioned against assuming it unreflectingly). It is an extremely strong version of the more moderate levels interpretation of cognitive systems.

      As for expecting it to hold in parsing, here Norbert and I seem to have different fundamental intuitions regarding methodology. I look at the infinite space of parsing procedures which more or less do what the grammar says, and recoil in existential terror. For me, I think that starting out with the assumption that we only need to consider parsers which actually do what the grammar says they should do makes the existential leap less scary. (It also makes clear what explanatory role the grammar has.) Norbert seems to have mastered his fear.

      Less glibly, I actually think that Norbert's position is ultimately right [my version of Norbert's position: the grammar is an approximate description of the human sentence processing mechanism; i.e. Norbert is a Smolensky-style connectionist at heart, and is treating the grammar as a rough description of the underlying dynamics of the neural net], but I think that my scaredy-cat methodology is better at this stage.

  3. So, at risk of exposing my lack of expertise in these matters, what I mean to say is that I understand that grammars are based on off-line data. Norbert seems to me to be advocating lessening that restriction to see if something like the SMT holds, that if you peer into real-time processing you see that it's good because of the grammar, but then I want to point out that there are plenty of cases where processing is bad, presumably because of the grammar. So if one is peering into processing only to see if processing is optimal or whatever WRT the grammar, then I feel that the answer is clearly "no", as Chomsky has routinely pointed out. If one is peering into processing to merely see when it reflects the grammar as a potentially fecund research tactic, then the good and the bad are both in play.

    This has always been my sense about informal observations related to linguistic proposals - if somebody misinterprets what I say because of syntactic ambiguity, or struggles to assign possessive case in a conjunction like "her and my's book", or uses resumptive pronouns, whatever, then I recall the legitimacy of grammatical theory.

    1. @William:
      there ARE many places where things go poorly, as is to be expected with any system that imposes constraints on some activity. And yes, we can learn a lot about the system by seeing where it breaks down. So far as I can tell, this is no different from what we see on the grammar side of things. Chomsky does not conclude that the G system is not well designed because there are some things we cannot say though the interpretation would be fine were we able to say it. (So, we cannot scope 'every' over 'someone' in "Someone told me that everyone was ready," though the thought it expresses is perfectly coherent. Nor can we easily ask "which book did you meet several people who gave *(it) to Sam?") Does this imply that G in Chomsky's sense is not optimally designed? Note that the problem is not just expressing this: in either case the thought can be entertained, but not via the Gs we have. The same holds for the use systems.

      I suggest that we replace the search for 'optimality' with the search for 'pretty damn good.' The systems of use are very good at what they do. We can ask why, just as they do with, say, dead reckoning in ants. Is a polar coordinate system better for the ant than a Cartesian one? It might seem so, given certain other assumptions. So too we can ask what advantages the particular Gs we have give to use systems like the parser/producer or learner. In the best case, it helps with all 3, with some possible tradeoffs given the multiple uses at hand (the optimal producer may make parsing harder, etc.).

      @Greg: I am not sure that I believe that the G is just an abstract description of the parser. I have no dog in that fight. I do think that the properties are tightly tied together if the SMT is right. But then it is also tied tightly to the producer, the learner and the visual system. Is the grammar in all of these? Maybe. right now I don't really care. I find the distinction between what we know and how we use what we know a useful one. I also find the question of whether what we know makes certain kinds of uses very easy also a good question to investigate. I leave it for my descendants 1000 years from now to decide the ontological question of whether Gs only live in use systems or have an independent existence. Right now, this question is too far away from what I can do with it.

    2. I think it might be useful to back up a bit. Here's my interpretation of the SMT in terms of a well-defined notion of optimality:

      A system is optimal if it minimizes a cost function in attaining a particular goal.

      I derive this notion from work in optimal motor control, which I think is quite worth checking out in this context.

      In this sense, the grammar is 'optimal' in that it attains the goal of interpretability at the interfaces by minimizing a cost function, the cost function being defined in some vague but, I believe, interpretable way - such as LF operations being less costly than overt operations, shortest move, etc. By this definition of optimality, I think we cannot expect parsing and production to behave optimally, because these are multi-component systems; e.g., parsing comprises auditory processing, lexical access, working memory, executive control, lots of stuff. Let's assume each of these systems is optimal in its own right - if so, then the combined effect of optimizing each system separately is unlikely to produce global optimality, because it is presumably a modular organization, although this expectation may be false (and quite interesting if false!).

      An appropriate analogy might be to more 'wet' biological systems - take the immune or cardiovascular systems. These are composite systems, glommed together over evolutionary time to fulfill a function, but you'd hardly expect them to be optimal globally. However, if you zoom in on one component, then you might expect optimality: for instance, the dispersal of veins and arteries, or the action of a particular type of cell, etc.

      At any rate, I think the key interpretation of 'optimal' is this cost function that is minimized. We could do this for the performance systems of language, but we need to define the cost function. The informal notion of being good at what it does gets away from what I perceive to be the critical notion of optimal, in the sense of nematode wiring diagrams, optimal foraging, and optimal motor control.
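      The modularity point above can be given a toy numeric form (my own example, with made-up costs, not anything from the optimal-control literature): even when each component minimizes its own cost function, the composed system need not minimize the joint cost, because of interactions between components.

```python
# Toy illustration: two modules each minimize their private cost, but an
# interaction term makes the greedy, module-by-module choice worse than
# the jointly optimal one. All numbers here are invented for illustration.

import itertools

settings = [0, 1]
cost_a = {0: 1, 1: 2}        # module A's private cost per setting
cost_b = {0: 2, 1: 1}        # module B's private cost per setting
interaction = {(0, 0): 0, (0, 1): 5, (1, 0): 0, (1, 1): 0}

def joint_cost(a, b):
    return cost_a[a] + cost_b[b] + interaction[(a, b)]

# Modular optimization: each module minimizes its own cost in isolation.
a_greedy = min(settings, key=lambda a: cost_a[a])   # picks 0
b_greedy = min(settings, key=lambda b: cost_b[b])   # picks 1
greedy_total = joint_cost(a_greedy, b_greedy)

# Global optimization: minimize the joint cost directly.
best = min(itertools.product(settings, settings),
           key=lambda ab: joint_cost(*ab))
global_total = joint_cost(*best)

print(greedy_total, global_total)  # 7 3
```

      So "each module is optimal" and "the whole system is optimal" can come apart whenever the cost functions interact, which is the sense in which composite systems like parsing need not inherit the optimality of their parts.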

    3. And, by the way, I think the reason that Chomsky raised the SMT is to motivate computational efficiency as a law of nature. I think this is quite important, because computational optimality is really a problem if you don't get it for free as a law of nature - e.g., how does a derivation "know" that an operation will be more costly than if it waits until LF? Same problem for almost every domain of cognition, as far as I can tell. In motor control, when you reach for a cup, how do you know that that trajectory is the optimal one? People try to get around this by such things as analysis-by-synthesis - you figure out the right solution by just trying out all of the possible solutions. But I have a feeling that this isn't right. If computational optimality is a law of nature, then everything falls into place. But this is quite speculative on my part.

    4. @ William:
      I think you are right here. This may have been C's motivation. My problem is that I don't understand what it means. Optimal wrt what? Effability? Unlikely. What then? If one has unbounded resources, is there anything wrong with maximal search? What's the virtue of bounded cyclicity if we have infinite memory that works at lightning speed?

      To repeat, I suspect that 'optimal' is not helpful here. Optimal might be a little like QED, something that we say at the end of our good stories. C occasionally talks as if it is an empirical problem to define 'optimal.' We are looking for those problems for which this is the optimal solution. This is fine, but it is a postscript to research rather than a boundary condition for it. For me, that makes targeting 'optimality' rather uninteresting, as there is no target to aim at until you've hit it.

      That said, let 100 flowers bloom here. What I would like is some discussion of what any particular person is saying when a claim to efficiency or optimality is proffered. Optimal wrt what? The nice thing about the Berwick-Wexler view is that I have some idea of what is intended and how one might go about investigating matters. Other uses of the term leave me puzzled. I concede that this might be my problem rather than yours or C's, however.

    5. Some kind of optimization (most likely some cheap trick rather than a sophisticated process) seems to be inevitable in performance. What comes to my mind (not surprisingly in the context of the SMT) is the eager parser in long-distance dependencies.

  4. @ Norbert:

    I think you still have issues with the notion "good". So, let's say the B&W parser is fast because of certain syntactic conditions that are missing from some other theory. My point is that the notion of "good" in this context is only useful in how it matches how people actually parse. If it doesn't match, then a grammar that results in slower or worse parsing would actually be a better grammar.

    I suppose I just don't see the point of the SMT if it isn't to pursue the notion of optimality. Optimality is an extremely interesting phenomenon that is present in mathematics, physics, chemistry, biology, and many domains of cognition. We just had a colloquium with Zygmunt Pizlo, who discussed this notion in visual perception; his work is worth checking out.

    My point is that if the conservation laws in physics, etc. have transparent analogies in mental computation, that is extremely interesting, and could in fact have ramifications for how computations are physically instantiated in the brain. It turns out that electrical circuits minimize a cost function for dispersal of heat, for instance (Pizlo informed me of this this afternoon). Sure, the SMT is difficult to evaluate, but its goals are important. By changing the notion of the SMT, you lose this goal. Which is fine, but IMHO it's not the SMT.

  5. @William
    Yes, it could be a better grammar even if slow and if it makes learning hard and production difficult. It could. But what's interesting is that many of the principles we are in fact postulating for Gs via FL also lend themselves to fast parsing and easy learning. Now, this may just be a fluke. But maybe not. One of the properties of our linguistic capacity is that we parse fast and pretty accurately. We do make mistakes, but we are pretty good overall. Similarly, Gs allow us to think/say many things. But there are many things that they forbid us from thinking/saying despite their being perfectly coherent thoughts (e.g. "He thinks that everyone left" cannot mean that everyone is such that he thinks that he left). So there are constraints.

    As for optimality, optimal wrt what? My impression is that after the fact we can always find a well functioning system optimizing something. But that, being ex post, is not methodologically useful. So, my question to the optimizers is to suggest things that might be optimized that we can use and test.

    You may be right that this is not the SMT, and so you might not be interested in this. I disagree: were it the case that principles we have postulated for G-internal reasons also function to allow for fast parsing/easy learning etc., then this would be a pretty good sense in which these are computationally efficient notions. My aim is to link computational efficiency (a staple of MP analyses) to some empirically evaluable conception. This is what B&W do. Without work of this kind, I know of no way to anchor these claims. Moreover, the work is interesting independently of MP concerns. Linking them together, then, seems like a win all around. If you don't want to call this the SMT, then call it something else. However, I do think there are resonances with what Chomsky has intended, though maybe not as deep as I have been imagining.