Bill Idsardi sent me this link to 4 recent lectures by Chomsky. I have not looked at them myself, but I suspect that some of you may find them interesting.
They are great. I hope there will be a discussion about them; there are several technical aspects of them which I do not understand completely.
To give an example, Chomsky insists that it is misleading to draw trees, and he sticks to bracketed structures instead. The reason seems to be that some nodes in the trees can remain unlabeled, but it is completely unclear to me why that cannot be represented in a tree structure. So I think I might be missing something. As a matter of fact, there are various other things which I do not get about the labeling function (such as: why is it there at all?). The unfortunate thing about these videos is that there are very few questions asked by the students, and there is very little discussion. But the main message is brilliant, in my view.
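To make my puzzle concrete, here is a minimal sketch (my own toy illustration, nothing taken from the lectures) of a tree node whose label is simply optional; on this representation an unlabeled constituent is just a node with an empty label field, which is why I do not see what bracketed structures buy that trees cannot express.

class Node:
    """A constituent; label=None marks an unlabeled node."""
    def __init__(self, label=None, children=()):
        self.label = label
        self.children = list(children)

    def brackets(self):
        # Render labeled nodes as [X ...] and unlabeled ones as {...}
        if not self.children:
            return self.label
        inner = " ".join(child.brackets() for child in self.children)
        return "{%s}" % inner if self.label is None else "[%s %s]" % (self.label, inner)

# {read, {the, book}} with the inner constituent unlabeled, the outer one labeled VP
vp = Node("VP", [Node("read"), Node(None, [Node("the"), Node("book")])])
print(vp.brackets())  # [VP read {the book}]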
I am delighted to see that Norbert has now dedicated a separate post to the papers I had linked to on June 3rd [http://facultyoflanguage.blogspot.de/2014/05/the-gg-game-plato-darwin-and-pos.html]. And hopefully there will also soon be a link to the sold-out talk at Olomouc [Czech Republic] on June 5th.
Like Marc van Oostendorp, I did not completely understand some of Chomsky's remarks and am looking forward to the educational discussion about to ensue.
So, I am 32 minutes into lecture 2, and there is discussion of the SMT, so here come my comments on this again. Chomsky discussed the SMT as guiding the choice of the simplest combinatorial operation, i.e., Merge, over some alternative (say, a set of phrase-structure rules). The rule set is more complicated because there are more rules - more rules being more complex than fewer rules. So this is how I interpret the SMT: more rules are less optimal than fewer, longer distances are less optimal than shorter ones, a positive number of uninterpretable features is less optimal than zero uninterpretable features, etc.
However, a grammar with phrase structure rules could very well be more efficient for parsing and production than Merge, right? I suppose I don't know off the top of my head what would be better for communication, but it's far from clear. But the SMT provides a clear guide in the sense of some other notion of "optimal" or "minimal": fewer rules are better than more rules.
So this is just my discomfort in couching the SMT in externalization terms - it would belie the goals of the Minimalist Program, IMO.
It strikes me that on this view the SMT is less a thesis than a principle of theory evaluation akin to Occam/Okam/Okham's (how is this spelled?) razor. Why? Well, were it a thesis it would not be in terms of "better/worse" but "best." And indeed Chomsky talks like this at other times: "OPTIMAL realization of interface conditions." Now, the interpretation you note cannot be so framed, or it could be but then we know it is false. Why? Because on these criteria the optimal theory would have NO uninterpretable features and hence NO movement. Or, if there were movement that was not feature driven, the unit size of the locality domain would be very much smaller than a phase, as phases are, for example, larger than minimal phrases.

There are yet more issues: is it fair to conclude from this version that longer derivations are preferred to shorter ones? After all, if moves MUST be short then derivations MUST be long (as Chomsky pointed out long ago). Indeed, the longest moves yield the shortest derivations. So why pick the smallest domain as a metric rather than the length of the derivation (here's a suggestion from externalization: memory limitations, but we don't want to consider these, right?)? Another question: what do we mean by "more rules"? Is a logical system that uses the Sheffer stroke more optimal than one that uses ¬ and ∨? Don't we want to leave this kind of issue to our neuro-anatomy friends (you?) rather than decide what circuit units there are?

Or consider symmetric merge operations. We think that the interface, if stated in terms of arguments/predicates or events and their participants, is asymmetric: if a is an argument of P then P is not an argument of a, etc. So wouldn't a combinatoric system that fed this system be optimal if it too exploited an asymmetric basic predicate? But then doesn't that make ordered pairs superior to unstructured sets? I could go on giving cases where the apparent dicta of the SMT lead nowhere, but you probably see the point.
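Just to make the Sheffer-stroke worry concrete, here is a toy sketch (my own, not anything anyone has proposed): a single primitive does suffice, but every derived connective gets longer, so "fewer primitives" and "simpler system" come apart immediately.

def nand(p, q):   # the lone primitive: the Sheffer stroke p | q
    return not (p and q)

def neg(p):       # not p      =  p | p
    return nand(p, p)

def conj(p, q):   # p and q    =  (p | q) | (p | q)
    return nand(nand(p, q), nand(p, q))

def disj(p, q):   # p or q     =  (p | p) | (q | q)
    return nand(nand(p, p), nand(q, q))

# Sanity check over every truth-value combination
for p in (True, False):
    assert neg(p) == (not p)
    for q in (True, False):
        assert conj(p, q) == (p and q)
        assert disj(p, q) == (p or q)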
So, I don't want to give up on the SMT. But I want a version that is a real "thesis" not an ad hoc list of methodological suggestions that generally amount to "do the best you can given the data." If that be the SMT then there is nothing new or interesting about it. The fight is always then what's simplest in the circumstances and here we have lots of candidates always. That means that the SMT has virtually no bite and is then, IMO, of dubious interest. Or, more accurately, it is either clearly false when made mildly precise (e.g. a theory without long distance dependencies would be better than one with them as the shortest dependencies would be the only ones we get (note btw this would be true even if Merge gave us movement "for free")) or it is close to vacuous given that we can get a useful principle to match whatever we find. This does not strike me as a good way to go.
RE: Occam. Wikipedia suggests a way out of the dilemma through use of Latin: lex parsimoniae. And my mother always told me that taking Latin in high school would be a waste of time.
As always, my novicehood threatens to betray me here. I think there is a clear interpretation of the SMT given an appropriate definition of optimal within the correctly characterized constraints.
I don't think the optimal theory requires no uninterpretable features, because we are assuming that the computational system is an optimal link between pre-existing systems (certainly the interfaces, and I assume the lexicon). We assume the uninterpretable features are already there, right? So the system just has to deal with them in the most optimal way it can. These assumptions have to be in place, as far as I can tell.
I'm not quite sure about the notion of long and short movement and phases - my expertise may be failing me here. But my understanding is that the phase is just the unit at which structures are evaluated - this is the relevant domain, and the entire derivation really doesn't exist for the syntax, so you can't evaluate at this level. But I'm guessing here, because I really know nothing about phases.
There are similar notions for symmetric Merge - there is a different desideratum here. The operation that is added has to be as simple as possible because of Darwin's problem, right? So we assume the addition of an unordered Merge operation. Assuming this operation, then the question is does it behave optimally in hooking up with the demands of the interfaces.
So I feel like there are just certain assumptions that need to be made regarding the lexicon and the interfaces, and what kind of operation we get. Then, the question is: (now) does it behave optimally? It's like, assuming a prisoner has to escape from a prison, why do they use a spoon instead of a jackhammer? Because they don't get the jackhammer - the only question is, do they dig the shortest tunnel they could dig?
Thx for the spelling help. 'Occam' it is from now on. I hesitate to disagree with what you say because I think that some version of this is true. I guess where I'm feeling lost is how to understand the SMT programmatically. What does it mean to realize the thesis, to show that it is true? Now my problem is that there is no problem telling an optimizing story after the fact, whatever we discover. Ex ante, however, things are more difficult. So, as I noted, there is a tradeoff between the number of operations one applies and the unit size of the computational domain. The smaller the domain, the more applications of the operation one needs to, e.g., get from one's initial position to one's ultimate landing site. The longer the step permitted, i.e. the larger the domain, the fewer the required applications of a particular operation. This tradeoff does not seem to have a conceptual resolution. Similarly, how simple Merge is doesn't have one. If the question is interface fit, then a Merge that is asymmetric fits better with the predicates at the CI (and likely the AP, though as Chomsky does, let's put it aside) interface. Why? Because the predicates required for CIing are not symmetric, so having a symmetric predicate makes things more complex. You reply: well, we are stuck with Darwin's Problem. But then part of what is complex or not depends on what was there to begin with: do we have any combinatoric operations that, tweaked the right way, would serve, or do we need Merge full blown and that's the best option? Don't know, but Chomsky talks as if this is not the right way to think of these issues. Why not?
So, yes, things are complicated. It's optimality relative to x,y,z. Fine, what are x,y,z? Until we know what we are optimizing with respect to we don't really have a working notion and so no "thesis." And that's why I've shied away from this interpretation of the SMT.
Fair enough, but I think it will be difficult to have these parameters set in advance. I think part of the SMT is also reconsidering these parameters as you go, e.g., whether certain features are weak or strong.
A side note on Merge. I think Merge has to meet two conditions: (1) the empirical condition of combinatorics and (2) the evolutionary condition of recent, sudden, and unchanged emergence. So the simplicity of Merge should not be assessed by how it plays with the interfaces, aside from the fact that it meets the empirical adequacy of combinatorics. I think what Chomsky goes for is some kind of mathematical notion of simplicity here, which would eventually have to be cashed out in genetic and neuronal notions (as you mention above) rather than the other notions of simplicity, e.g., economy conditions.
"Chomsky goes for is some kind of mathematical notion of simplicity here"
Yes. If there is a hunch here, it's that this conceptual simplicity will match some relevant evo notion. Right now, I'm skeptical of this move. But Chomsky's hunches have worked out before, so...
Yes, I am skeptical too. But from what I can tell it seems to work surprisingly well for something that is supposedly on the wrong track. I don't know when we'll know enough about genes and neurons to give a good answer.
I hesitate to make my first chime-in on the blog be tinged with self-promotion, but: I have argued (here, here) that narrow syntax may indeed *not* have anything resembling uninterpretable features (at least, not in the sense of "features that will lead to a crash if not checked"). Certainly, such features don't seem to be involved in what we pre-theoretically refer to as agreement.
What is the relevance of this? Well, Norbert says above that a view of syntax as an "optimal realization of interface conditions" would entail a syntax without uninterpretable features. I think there is good reason to think that this is exactly the syntax we observe. Now: both agreement and movement (or, if Norbert is right, the single operation that underlies them both) are certainly feature *driven* – i.e., they occur only when certain features are present in the derivation – and what I have said here sheds no particular light on why those features would exist in narrow syntax. But I do think it is interesting to note that, if I am right, there are no features in narrow syntax that the interfaces cannot deal with (i.e., the kind that would cause a *crash* were they to arrive at the interfaces untouched). For those who wonder how syntax could be "optimal" while also containing crash-inducing features, this would perhaps be a modest step forward.
I agree that this is interesting. However, as you note, all current stories rely on features to drive operations. Is this conceptually better than operations applying freely and being filtered out by Bare Output Conditions alone? Not obviously. Is this better than theories where there are no such features at all and so no movement to speak of? Not clear, at least to me. How much do we know about the interface at all such that movement is required? Chomsky suggests that we know that there is duality of interpretation. But is this distinction something that CI objects/structures had pre-linguistically? Were they there with this duality BEFORE FL arose? I really cannot say. IMO, we know so little about the CI interface that worrying about the fit between it and products of FL hardly restrains speculation at all. And that's too bad.
You ask: Is this conceptually better than operations applying freely and being filtered out by Bare Output Conditions alone?
It may or may not be *conceptually* better, but it is *empirically* better (in particular, Bare Output Conditions can't adequately handle agreement; that is the crux of the work I cited above). So, this is the (a? my?) hope: that some of these issues that are murky when addressed on the conceptual playing field can be resolved on the empirical one.
Yes, but the SMT, if it is a thesis at all, is conceptually driven. One way of reading your results is that the SMT is just wrong. Question: is there a version that may be right and has some teeth?
I think there is, and it has to do with the "transparent use" version of the SMT that you, Norbert, have talked about on the blog before. (Though, in fairness, one could reasonably accuse me here of the very same post-hoc fitting of the notion optimal that was alluded to above...)
The Bare Output Conditions model requires (potentially massive) overgeneration of possible derivations, followed by filtration of those derivations whose outcomes do not meet the relevant conditions. If the grammar is supposed to be put to use by realtime systems of production and comprehension, then I think it would be preferable to have what Frampton & Gutmann have called a crash-proof grammar: one that determines, for every derivational state, a finite (possibly singleton) set of possible next steps, each of which is guaranteed to lead to a well-formed outcome.
So if production and comprehension work as efficiently as they do because of properties of the grammar, crash-proofness strikes me as an excellent candidate property to consider. And the Bare Output Conditions grammar lacks it.
(NB: I have phrased this in computationally naive terms. This is partially due to a lack of expertise on my part, but also because I am not convinced that classic computational complexity theory is the only game in town when it comes to evaluating the efficiency of a grammar. Suppose we built a parsing algorithm underpinned by a crash-proof grammar, and it performed at an average-case complexity of O(f(n)); but that someone clever figured out that you can swap in a Bare Output Condition grammar underneath, and rule out whole classes of non-convergent derivations very efficiently so that you still maintained an average-case complexity of O(f(n)). Would this mean that the apparent computational advantage of crash-proofness has dissolved? Not necessarily. When it comes to implementing a realtime algorithm using finite neural machinery, constants may very well matter. And crash-proof grammars keep the number of possible derivations you need to entertain to a minimum.)
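To make the computationally naive picture above a bit more concrete, here is a toy sketch (entirely my own; the function names are hypothetical stand-ins, not anyone's actual grammar formalism) of the structural difference between the two architectures: one freely builds every derivation and filters at the end, the other only ever offers next steps that are guaranteed to converge.

def generate_and_filter(state, extend, converges, depth):
    """Free generation: build every derivation to the given depth, then filter
    out those whose outcomes do not meet the interface conditions."""
    if depth == 0:
        return [state] if converges(state) else []
    results = []
    for nxt in extend(state):          # every possible next step, licit or not
        results += generate_and_filter(nxt, extend, converges, depth - 1)
    return results                     # most branches are discarded at the end

def crash_proof(state, next_steps, depth):
    """Crash-proof generation: at each state, only a small pre-vetted set of
    next steps is available, each guaranteed to lead to a convergent outcome."""
    if depth == 0:
        return [state]
    results = []
    for nxt in next_steps(state):      # small set; no branch ever crashes
        results += crash_proof(nxt, next_steps, depth - 1)
    return results                     # every derivation built is usable

The worst-case bookkeeping can look the same in big-O terms, but the first version entertains every dead-end branch before discarding it, and that is exactly where the constants come in.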
Norbert wrote: However, as you note, all current stories rely on features to drive operations. Is this conceptually better than operations applying freely and being filtered out by Bare Output Conditions alone? Not obviously. Is this better than theories where there are no such features at all and so no movement to speak of? Not clear, at least to me.
It's not true that "all current stories" rely on formal features as triggers of syntactic computation; see e.g. work by Reinhart, Fox, Moro and others. It's quite amazing, in my view, that the featural-triggers hypothesis is still so popular, given that (again, in my view) it has had virtually no explanatory success in any domain of syntax. As Chomsky, Fanselow and others have pointed out repeatedly, in almost all cases featural triggers are totally ad hoc, without independent motivation: saying that XP undergoes some operation O because XP has an O-feature is nothing but a restatement of the fact. And even in the one case where there is some initial plausibility for a featural trigger for displacement (wh-movement), there is interesting work suggesting that features aren't needed.
It's particularly surprising to me that so many "Chomskyans," like Norbert, are still adhering to the feature-driven world view, given that Chomsky himself has clearly been moving away from this at least since the 2004 paper, advocating a free-Merge model instead (although I admit he's been a bit ambivalent about this). The motivation is clear: what's done at the interfaces doesn't need to be redundantly replicated in syntax, and so interface-based explanations, at least in principle, shift the explanatory burden away from UG. And while most people have been happy to continue working with featural descriptions rather than more principled notions, there is some very interesting work, as I indicated above (and also Omer's work), that shows that this alternative can be fruitfully pursued.
Note also that the featural-trigger hypothesis has serious conceptual problems: the principle of Full Interpretation has, to my knowledge, never really been justified conceptually; to mark valued uFs as "deleted" you need to introduce a diacritic (masked by its "strike-through" notation, but still a diacritic), hence violate Inclusiveness; you need to somehow assign non-intrinsic features to XPs in the course of the derivation (or in the numeration, but that too is without independent motivation); etc.
I find lots to agree with here. Feature driven theories have, IMO, all the problems that you note (BTW, I am not a fan of these, nor have I ever been, but that's secondary). However, I think we judge the current state of the art differently. From what I can see, feature driven I-merge is the name of the game, at least in the bulk of work that travels under the Minimalist banner. There is lots of probing and goaling and none of this makes sense, at least to me, in the absence of features. From what I can gather, Chomsky still likes this way of putting things, which maybe is what you mean by "ambivalent." So, let's agree, the fewer features the better and better still BOCs that determine which generable objects survive and get used.
Here's where I think we might differ: to date I see almost nothing that tells us anything about BOCs. You note the principle of full interpretation is unprincipled. Well, can you think of any useful BOC at all if even that one is gone? (You mention Fox and Reinhart: but if look-ahead bothers you (and it does me) then comparing derivations wrt interpretations is anything but efficient). We know so little about features of the CI interface that invoking it as an explanatory construct is just as bad as (maybe worse than) a general reach for features. At least with features we might have some morphological guidance.
So: couldn't agree more re feature abuse and its general non-explanatory nature. But, though the underlying conception outlined at the start of MP which distributes explanation between BOCs and Efficient computation is a nice picture, the basic results, from where I sit, have come from considering the computational properties of grammars NOT how they map to the interfaces, CI especially. This is WHY people reach for features: there is nothing else productive on offer.
Norbert: sorry for the delay, and also sorry for mistakenly placing you in the featural-triggers camp; at least in the 2005 textbook you seemed to me to be subscribing to this general world view, but perhaps this was in part for pedagogical reasons and/or doesn't reflect your current views.
You say, There is lots of probing and goaling and none of this makes sense, at least to me, in the absence of features. From what I can gather, Chomsky still likes this way of putting things, which maybe is what you mean by "ambivalent."
That is indeed what I meant, but I think in Chomsky's case the main reason why there's still lots of probing and goaling is that he sticks to classical Case Theory. Drop that and the role of features is diminished significantly. And note his use of features in the lectures you posted: they determine to some extent where some element can or cannot end up, but they don't really trigger anything. His discussion of the "halting problem" (*Which dog do you wonder John likes t?) was most explicit on this: instead of adopting Rizzi's solution he argues that you can permit "illicit" movement in such a case, since it won't yield a coherent interpretation. The features are implicated in that they're relevant to labeling, but they don't trigger or block anything. I'm not saying that's the correct or ultimate explanation, but the general spirit seems to me to point in the right direction. And note that if something like this is correct you *want* the syntax to be able to carry out the (eventually) illicit operation, for otherwise you'd be replicating in the syntax what's independently done at the interface.
You say, Here's where I think we might differ: to date I see almost nothing that tells us anything about BOCs. You note the principle of full interpretation is unprincipled. Well, can you think of any useful BOC at all if even that one is gone? (You mention Fox and Reinhart: but if look-ahead bothers you (and it does me) then comparing derivations wrt interpretations is anything but efficient). We know so little about features of the CI interface that invoking it as an explanatory construct is just as bad as (maybe worse than) a general reach for features.
I don't think this is a fair criticism: there is indeed very little work trying to pin down the precise nature of interface conditions, but I think the reason for this is precisely that most people in the field have been and continue to be too content with rather shallow "explanations" in terms of unmotivated features (sometimes supplemented with a caveat that the features are really "shorthand" for something else, but this just begs the question). Speaking from personal experience, I have been urged more than once now by reviewers to patch up some open issue in an analysis with an invented, arbitrary feature rather than simply leaving it open. The descriptive urge is strong, and I think also drives the whole cartographic movement, which continues to be much more popular than the alternative approaches I adumbrated.
I agree with you that the Fox-Reinhart take on QR has issues concerning look-ahead (if I remember correctly, Fox explicitly admits and acknowledges this). But this is not necessarily true of interface-based explanations per se. Take for instance the Moro-Chomsky approach to {XP,YP} structures requiring movement to be linearizable/labelable; this requires no look-ahead, but it does mean that there will be failing derivations (those in which no symmetry-breaking movement applies, at whatever level this is detected). I don't know how plausible this is overall, but even Chomsky's sketch of an explanation of successive-cyclic movement in these terms strikes me as more principled than any account I've seen that relies on featural triggers for intermediate movement steps. There are also theories of scrambling that don't assume "scrambling features" and the like, but instead free movements whose information-structural effects are determined by mapping rules (Neeleman & van de Koot). I understand that in Tromso-style nanosyntax, certain movements serve to derive lexicalizable subtrees; but those movements are necessarily blind to such needs arising "later on," so they end up being obligatory but crucially without any look-ahead (Chomsky has a really good discussion of why optional operations should not be misunderstood as applying teleologically in "Derivation by phase"). All you need to accept is that there are failing derivations, which -- it seems to me -- is entirely unproblematic, once misunderstandings concerning competence/performance and acceptability/grammaticality that we've addressed here are cleared up.
So I agree that we know little about interface conditions, but I fail to see how that discredits the approach. I also don't see what "basic results" have emerged from the feature-driven framework, unless by that you mean (undoubtedly important) empirical generalizations. I have yet to see a case where that framework provides a genuine explanation for a real problem, rather than just a puzzle disguised as an answer.
I am not saying it discredits it. I am saying that there is no real approach there yet. There will be one when we know something about CI and what demands it makes on derivations.
Btw, I use 'filter' in a loose sense. It explains some data we are interested in without doing so by restricting generation.
Last point: just to touch base on Chomsky's current view: my problem with it is that it makes an ad hoc assumption about labels/features and CI requirements. I do not see why agreement is required for interface reasons. Do we really need to know that a sentence is a question in addition to knowing that a certain operator is WH? What of agreement? Is this required for interpretation? Moreover, this approach to successive cyclicity, at least to me, has all the virtues and vices of Boskovic's story. So until I hear of some independent evidence for Chomsky's interface condition that forces agreement on pain of CI uninterpretability (if that's the real problem; I'm never sure here, as Chomsky doesn't say what goes awry if the agreement fails), I am not going to be more impressed with this sort of story than one that just keeps adopting new features. Indeed, I don't see how unmotivated interface conditions are any better or worse (or different) than unmotivated features. That's my problem. It's not conceptual at all.
Indeed, I don't see how unmotivated interface conditions are any better or worse (or different) than unmotivated features. That's my problem.
No disagreement here. My feeling though is that stories based on featural triggers for Merge are rarely insightful, and I find it hard to see how they could be. (I'm not talking about theories of agreement or other operations where features are uncontroversially involved, I'm referring only to "triggered Merge" models that take operations like structure-building, deletion etc. to be contingent on formal features.) By contrast, the little bit of work there is that explores alternative notions of "trigger" (in terms of interface effects) looks much more promising to me, although naturally lots of problems remain.
But I think the choice between "triggered Merge" theories and interface-based theories is not just a matter of taste. At least implicitly, "triggered Merge" approaches often rest on the assumption that the goal of the theory is to model a "crash-proof" system, an idea that I believe rests on the mistaken equation of grammaticality and "acceptability." In interface-based theories considerations of acceptability take a back seat, shifting the focus of investigation to the question of what consequences syntactic operations have when it comes to interpretation and externalization of the resulting structure. So while I agree with you that either approach must be evaluated a posteriori based on its merits, I think that the two approaches differ, a priori, in terms of what they take the theory to be a theory of. (As always, the truth may well lie somewhere in between. There's an interesting paper by Biberauer & Richards that proposes a model in which obligatory operations are triggered by features whereas others apply freely, the latter licensed indirectly by their effect at the interfaces.)
As for successive-cyclic movement, I was merely referring to the general idea that you're always free to move to the edge, but if you don't, you're stuck; what Chomsky adds is that you can't stay in a non-final edge, since that will mean that the higher predicate's complement is an unidentifiable {XP,YP} structure. Requiring labels for purposes of ensuring locality of selection is the one place where they make some sense to me, intuitively at least. I'm curious, what "virtues and vices of Boskovic's story" do you have in mind?
@Dennis: B's theory requires movement until the features of wh are discharged. Then it can move no more. Chomsky's theory is that wh moves until criterial agreement occurs and then there is no more movement. This seems to be the very same idea, one feature based, one BOC based. The feature story is unmotivated. But so far as I can tell, so is the BOC account, as it relies on a very strong (and unmotivated) assumption about clause typing being required for CI interpretation. Moreover, the freezing of the wh after criteria have been checked is based on considerations having to do with interpretability that I frankly do not understand. So, where the two theories make claims, they seem to be more or less the same claim. Both theories, of course, have problems coping with all the variation we find in wh movement in cases of multiple interrogation. The variation is very hard to model given either assumption.
So, is one better than the other? Right now, they are pretty interchangeable. This said, I agree that WERE one able to find defensible, non-trivial BOCs, that would be very nice. But then were one able to find non-trivial, defensible features, that would be nice too. At the level of abstraction we are discussing things, I think the biggest problem with both views is how little the assumptions made have any independent plausibility.
Norbert: Thanks for clarifying, your comparison of B's and C's stories is helpful. Seems to me that C's story relies on the premise that you want the interrogative clause to be locally selectable, hence you need to identify it by labeling. I agree this is a strong assumption, but one that strikes me as prima facie way more plausible than distributing vacuous intermediate movement triggers. But I guess this is where it comes down to theoretical intuitions, and the account would need to be worked out much more.
However, for the sake of the argument let's assume C's and B's theories have the same "empirical coverage." Then don't you think that C's story is still preferable, since it implies no enrichment of UG? The assumption is that clause-typing/selection are conditions imposed by C-I, so the syntax need not know anything about them. But B's model needs a syntax that is sensitive to and constrained by trigger features, deviating from simplest (= free) Merge. This is where I see the general conceptual advantage of interface-based explanations, although of course at the end of the day you want them in turn to be grounded in theories of the interfacing systems. And I think it would be premature to dismiss such approaches just based on the fact that we do not yet have those theories in place.
Something that also strikes me as relevant to this whole issue was mentioned by Omer: The Bare Output Conditions model requires (potentially massive) overgeneration of possible derivations, followed by filtration of those derivations whose outcomes do not meet the relevant conditions.
If you follow Chomsky and drop the idea that there is a significant notion of "well-formed formula" for natural language, then the term "overgeneration" has no real meaning. There's nothing incoherent about the idea that the grammar itself licenses all kinds of expressions along all dimensions of "acceptability," deviance, usefulness, etc. In fact, we know that acceptability and grammaticality cannot (and should not) be directly correlated, although many people seem to be assuming essentially this -- a misunderstanding, I think, rooted in analogies to formal-language theory in early GG (note that in FLT, there are no interface systems/conditions).
So once you drop that, "overgeneration" and conversely "crash-proof (grammar)" all become pretty much meaningless notions, or at least I don't see what they would mean, unless you either mistakenly equate acceptability (or something like that) with grammaticality or else give up on GG's fundamental tenet that competence can/must be meaningfully studied in total abstraction from real-time processes. Otherwise there's no problem whatsoever with having a grammar that yields all kinds of expressions, only a subset of which is usable by interfacing systems -- in fact, I think, it's the most desirable outcome, since it would allow you to get away with a maximally simple core-computation system, in the extreme.
If "sticking close to the SMT" means anything for actual linguistic research, this seems to me to be the guideline: try to show that Merge applies freely, and ascribe as much as the complexity we find beyond this (i.e., basically all the complexity) to the independently given interface systems. And again, I think the only reason this perspective is often frowned upon is the mistaken idea that acceptability = grammaticality, and therefore an efficient grammar should be "crash-proof." As I said before, Chomsky has made this point repeatedly, but it seems that it hasn't really had an impact.
The idea of severing grammaticality from acceptability has indeed not had an impact – at least, not on me – and I think for good reason. It's not because I think the idea of severing the two is, in principle, incoherent (on the contrary; see below). It's because I think that at the moment, it is a methodological dead-end. I assume as a methodological heuristic that acceptability = grammaticality, and frankly I have no idea how to do linguistics (or at least, syntax) without this assumption.
Things would be different if we had good independent evidence (by which I mean, independent of language) for what these "interface systems" are like and what their effects were on acceptability. But absent that, severing acceptability from grammaticality is just like saying "oh, sentence X is acceptable/unacceptable because of processing" without a theory of processing; it's a methodological trash-bin for things that one's theory of grammar cannot explain.
So I think, Dennis, that the two of us agree in principle that grammaticality and acceptability are not the same thing. Where I differ is that I think there is nothing incoherent about practicing linguistics by heuristically equating the two; and, moreover, I think it's methodologically unsound to sever the two without an explicit theory of acceptability – or, at least, an explicit procedure to identify when something is an "acceptability" effect vs. a "grammaticality" effect.
Until Chomsky or anyone else comes up with such a theory or procedure, I will continue my methodologically-motivated idealization of "acceptability = grammaticality" (just one of many idealizations that we, just like any other scientists, routinely make). And since Chomsky's current proposal does not come with such a theory or procedure (from what I can tell), I will continue to assert that his model of grammar is unsuited for realtime use – or at least, not as well-suited as a crash-proof grammar would be.
I have lots of sympathy with Omer's methodological point. One thing I believe that we should guard against is throwing out 60 years of work on FL in order to advance minimalist notions. For me, if MP doesn't account for the kinds of generalizations we have found over 60 years of work (it need not be ALL, but a good chunk of these), then so much the worse for MP. These are the empirical benchmarks of success, at least for me.
Last point: I agree that your version of the SMT is a prevalent one. And that's my problem with it. It's not a thesis at all, not even an inchoate one, until one specifies what the CI interface is (what's in it, e.g. how many cognitive modules does FL interact with?), what properties it has (what properties do these modules have?), and how grammatical objects get mapped to these. Only when this is done do we have a thesis rather than a feel-good slogan. And that's why I like other versions of the SMT more: they provide broad programs for investigation, ones that I can imagine being fruitfully pursued. They may be wrong, in the end, but they point in clearish research directions, unlike many of the versions of the SMT that I am familiar with. So a question: why are parsers, the visual system, and learners NOT considered part of the interfaces that FL interacts with? Why should we not consider how FL fits with these? Or, put another way: what interface modules "count" as relevant to the SMT and what not?
Last point: can you point to examples where the version of the SMT you outline is displayed? I don't mean mentioned in passing, but cases where, GIVEN some plausible property of CI (one with some independent support, e.g. not Full Interpretation, in light of your earlier remarks), that property serves to interestingly filter G-ish products? It would be good to have a couple close to hand to massage intuitions. If you want to elaborate on this in a bigger setting, e.g. a post or two, feel free to send me some stuff and I will put it up. It would be a very useful service.
Omer, I agree with you to an extent that assuming a correlation between acceptability and grammaticality has been and continues to be an attractive heuristic premise. But note that we're placing a bet here, really, as it is in no way a priori given that there is *any* correlation between the two. We just hope that there is, for otherwise it's less clear what our empirical evidence would be (but see below). But I think we agree that the two notions are logically distinct.
However, I don't think that severing the two methodologically is contingent on having a theory of acceptability. That's a bit like asking for a theory of E-language, I think, in the sense that neither term is meant to denote a real theoretical category. Note that our notion of "acceptability" is entirely informal (and this is true whether or not we let people rate sentences on scales); all it means is that people find certain kinds of sentences funny, and evidently bazillions of cognitive factors enter into those judgments. By contrast, grammaticality (in the FLT sense) is a strictly theoretical/technical notion and well-defined as such.
I think it's in part for this reason that we shouldn't be misled into thinking that our theory is about acceptability. Again, pretending that it is has been a somewhat useful heuristic, but it is at points like the one we're discussing here that it can become seriously misleading, in my view. So I don't think "modeling (un)acceptability" is what we want our theory to do; rather, we want it to be a true theory of possible sound-meaning pairings, including those that are "deviant" or "unacceptable" -- the question of why some of those pairings strike us as deviant, register-dependent, word salad, etc. is a separate, secondary one. (This, I believe, is what Chomsky means when he says that we *want* the system to be such that it generates "deviant" expressions.) As I said, I think the field's obsession with acceptability has been useful to some extent, but it can become seriously misleading when this is taken to be anything but a heuristic simplification based on a non-trivial bet. (I've commented on this blog before that it is also misleading when acceptability is the basis of putative "empirical generalizations," for instance concerning islands. All we know in this and other cases is that sentences of certain kinds strike people as odd -- which is presumably a meaningful fact, but one that in and of itself tells us nothing about whether or not it has any relation to the theory of I-language.)
As for your last paragraph: I frankly don't understand what it means for a competence model to be "unsuited for realtime use," given that the system is by definition not one that operates in real time. And this logical point aside, we know that grammar cares very little about real-time use; just think of the standard examples of multiple center embedding, etc. So I wouldn't want to build any argument on this kind of reasoning.
Indeed, grammar may not care about realtime use; but realtime use seems to care about the grammar. (See the work previously cited by Norbert on this blog, e.g. work showing that parsing respects c-command for antecedent resolution; that it respects islands for filled gap effects; etc.)
So here is a choice point: either there is a second system, G', that is usable in realtime and mimics (to a non-trivial degree) the effects of the grammar G; or the two are one and the same. One option is to maintain that G and G' are distinct, in which case the burden is to explain why their effects are so darn similar. The other option is to accept that G can be used in realtime, in which case the price is that we can no longer use the refrain you invoked to free ourselves entirely of considerations of realtime computation. I choose option two.
@Dennis "So I don't think "modeling (un)acceptability" is what we want our theory to do; rather, we want it to be a true theory of possible sound-meaning pairings, including those that are "deviant" or "unacceptable" -- the question of why some of those pairings strike us as deviant, register-dependent, word salad, etc. is a separate, secondary one."
I think that this is right and important: our theories are not ABOUT acceptability or even possible sound-meaning pairs. Our theories are ABOUT the structure of FL. That said, looking at relative acceptability of pairs has proven to be an excellent probe into FL. Why? Well one reason is that we believe that FL generates an infinite number of such pairs and so looking at their properties is a reasonable thing to do. Do we have a guarantee that this will work well into the future? No we do not. But that it has proven very effective till now suggests that it is a very fruitful method of investigating FL, somewhat surprisingly so given what a crude probe it really is.
Why do I say that this has been successful? Well, as in all things methodological, the proof of the pudding is in the eating. I judge our discoveries to date to be very non-trivial, and this suggests that the tools used to make these discoveries have a lot going for them. Do I think that these tools are the last word? No, I suspect that these methods need supplementation at times and that there are some questions for which these methods will prove nugatory. But that's the way of all things scientific.
I suspect that you agree with this. As you say: "I agree with you to an extent that assuming a correlation between acceptability and grammaticality has been and continues to be an attractive heuristic premise. But note that we're placing a bet here, really, as it is in no way a priori given that there is *any* correlation between the two."
I agree entirely, with the caveat that this is a bet with a track record. And what else do we really have to go on? When Chomsky, for example, notes that reducing redundancy is a good strategy in theorizing he notes that this has worked before, so too looking for simple theories etc. I agree with him about this. But there is no guarantee that this should be so. Similarly with acceptability judgements (under interpretations). No guarantee, but one hell of a track record.
Omer: On a sociological note, I am not sure generative syntacticians use the heuristic of acceptability = grammaticality. There is clearly a whole lot of experience and intuition (reasonably so, IMO) used in selecting what is relevant data to try and explain through competence mechanisms.
For example, acceptability judgements are fine-grained and are surely modulated by sentence complexity. But people aren't trying to explain this through competence mechanisms/principles in generative syntax. Instead it is more usual to try to pass it off to a less-than-explicit memory-based explanation. Another set of popular examples is that of "Escher" sentences and agreement attraction errors. In these cases, the intuition is the opposite: they are deemed acceptable by many, and again a lot of people (reasonably so, again) try to outsource these problems to less-than-specific performance factors. I don't see anything wrong with this methodologically. But, of course, I would say we should be keeping track of all these things in case later we find additional generalizations worth making. We don't want to exclude too much and make our competence theories almost trivial.
This of course raises the obvious question that worries you. So, how do you go about separating the two?
"I think it's methodologically unsound to sever the two without an explicit theory of acceptability – or, at least, an explicit procedure to identify when something is an "acceptability" effect vs. a "grammaticality" effect."
And the only honest answer that I can think of is that there is no clear-cut separation. Researchers necessarily have to use their noodle, their experience and intuitions in trying to slice the pie, and the only possible evidence that the pie has been cut the right way is the utility of that particular way of slicing the pie for future research. I think asking for an explicit procedure to separate acceptability from grammaticality is a bit naive, and will, IMO, result in stifling scientific innovation.
In fact, my bet is everyone, including you, applies similar intuition-based criteria in separating grammaticality from acceptability. At best, your pie slices might be different from others.
@Confused: I'm not sure I see the distinction you're making.
Let's take agreement attraction as an example. I think most people working on this phenomenon have at least some proposal in mind for when (i.e., under which structural and/or linear conditions) it takes place. That kind of proposal, even if it is just a set of working assumptions, qualifies as far as I'm concerned as a "procedure to identify when something is an 'acceptability' effect vs. a 'grammaticality' effect" (in the domain of agreement). While I don't know who is behind the Confused Academic moniker, I have a sneaking suspicion that you would agree that this is categorically different from discounting data points as "unacceptable but grammatical" or "ungrammatical but acceptable" on an ad hoc basis.
(Similarly, I could imagine someone saying that an "Escher sentence" is one where the acceptability rating shows higher-than-usual sensitivity to the amount of time the subject has to judge the sentence. I don't know if this particular criterion would work, but it's the kind of thing I have in mind.)
As for the biographical component, regarding my own work: yes and no. I've worked on some less-studied languages lately, at least in part because there's more to be discovered there even while looking at only monoclausal or biclausal sentences – which I feel are easier for speakers to judge. Now you could say that this is just another way of slicing the acceptability-grammaticality pie: "short > long" / "monoclausal > biclausal > ..." / something like that. This wouldn't be wrong. But if a quadruple-embedded example in Kaqchikel falsified my favorite theory of Kaqchikel agreement, I'm not sure I would feel more comfortable saying "that's an acceptability effect!" than I would just saying "hmm, that is something that my current theory does not explain."
@Omer: I think this points out exactly what I was trying to say.
"But if a quadruple-embedded example in Kaqchikel falsified my favorite theory of Kaqchikel agreement, I'm not sure I would feel more comfortable saying "that's an acceptability effect!" than I would just saying "hmm, that is something that my current theory does not explain"
But right here one sees that there is no obvious reason to think that the quadruple embedding is a grammaticality issue either. Yet if acceptability = grammaticality were really the heuristic, and one had to treat everything as an issue of grammaticality absent an explicit justification of the partition, then declining to do so here would be a rather weird thing to do. As you rightly note (IMO), the very idea of cutting the pie into short and long betrays a subtle hope that the short effects might extend to the longer ones, which in turn suggests a certain vague but useful understanding of how grammaticality is divorced from performance mechanisms.
The same is true of agreement attraction errors. The very fact that these have been termed as such suggests a certain partitioning of the pie. Note, the correct analyses of these errors might indeed involve grammatical constructs, but the fact that a lot of productive work on agreement has happened while systematically ignoring these as "errors" suggests a certain rightness to that way of thinking, namely, the somewhat arbitrary partitioning of the relevant data into grammatical vs. acceptable in the name of progress.
Omer, you wrote: So here is a choice point: either there is a second system, G', that is usable in realtime and mimics (to a non-trivial degree) the effects of the grammar G; or the two are one and the same. One option is to maintain that G and G' are distinct, in which case the burden is to explain why their effects are so darn similar. The other option is to accept that G can be used in realtime, in which case the price is that we can no longer use the refrain you invoked to free ourselves entirely of considerations of realtime computation. I choose option two.
If option two just means acknowledging that when engaging in (linguistic) behavior we're putting to use our (linguistic) knowledge/competence, then I agree; the two are trivially related in this sense. But the systems must be fundamentally different, as a matter of logic, if we want to maintain that the grammar is a mental representation of the speaker's linguistic knowledge, whereas other processes operate in real time to algorithmically assign meanings to sounds based on what the grammar identifies as licit sound-meaning pairs. I'm not sure what you mean when you say that production/comprehension "mimics (to a non-trivial degree) the effects of the grammar" or that "their effects are so darn similar;" the crucial point to me is that no such comparison can be more than informal, given that "operations" in the grammar have no procedural dimension (just like steps in a proof; I think Chomsky has used this analogy), whereas production/comprehension systems are necessarily procedural. So if option two means likening real-time processes to purely logical operations (steps in a derivation or whatever), then I don't see how this could possibly be stated coherently, without conflating logically distinct dimensions. And consequently, there's no burden attached to distinguishing the systems, since it's a matter of necessity.
The Atlantic had an interesting interview with Chomsky a while ago (it's online), where at some point he says that I-language "has no algorithm," it's just abstract computation, so it's incoherent to impute to it a procedural/algorithmic interpretation. Interestingly, a recent paper by Sprouse & Lau ("Syntax and the Brain") explicitly denies this right at the outset, stating that the processor is simply I-language at the algorithmic level. So Sprouse & Lau in effect view I-language as an input-output system whereas Chomsky takes it to be a system of competence (accessed by systems of use but distinct from them), and this seems to be just our disagreement here.
Sorry, Norbert, I missed your earlier comment. You wrote, Last point: I agree that your version of the SMT is a prevalent one. And that's my problem with it. It's not a thesis at all, not even an inchoate one, until one specifies what the CI interface is (what's in it, e.g. how many cognitive modules does FL interact with?), what properties it has (what properties do these modules have?), and how grammatical objects get mapped to these. Only when this is done do we have a thesis rather than a feel-good slogan. And that's why I like other versions of the SMT more: they provide broad programs for investigation, ones that I can imagine being fruitfully pursued. They may be wrong, in the end, but they point in clearish research directions, unlike many of the versions of the SMT that I am familiar with. So a question: why are parsers, the visual system, and learners NOT considered part of the interfaces that FL interacts with? Why should we not consider how FL fits with these? Or, put another way: what interface modules "count" as relevant to the SMT and what not?
You may interpret this as dodging the question, but here I'm with Chomsky: we have to find out what the interfacing systems are and what constraints they impose as we proceed. But it's not like we have no idea what their effects are: we have things like Binding Theory, Theta Theory, etc. after all. And I'm with you that we should take the basic generalizations on which these theories rest seriously and to be too good to be entirely false (although our optimism is orthogonal to the issue). The task, as I see it, is to refine and restate, in a more principled fashion, these putative "modules of grammar" in terms of interface requirements, kind of like what Chomsky & Lasnik tried to do for Binding Theory in their 1993 paper. How is this a less clear or coherent research guideline than the more traditional one that seeks to capture the complexity in terms of syntax-internal constraints, as you imply?
Incidentally, I don't think this is the prevalent interpretation of SMT at all, at least not in practice. There's very little work, as far as I'm aware at least, that actually tries to do this.
@Dennis: Yes, you've zeroed in on it. I indeed reject Chomsky's views on this matter (e.g. what you quoted/paraphrased from the Atlantic). This used to be motivated, for me, for the usual scientific-method reasons (i.e., if realtime systems and the "competence grammar" actually do share some (or all) of their subsystems, you'd never discover it if you started out with the assumption that they're separate, e.g., "it's incoherent to impute to [the grammar] a procedural/algorithmic interpretation").
But now, as I said before, there is positive evidence accruing that the realtime procedures respect islands, c-command, etc. (ask some of the other UMD folks commenting on this blog; they know far more about it than I do). And if Chomsky's ontological choices make this comparison necessarily informal, then in my view, that's simply another strike against those ontological choices.
Lastly, I think the notion of "a system of competence (accessed by systems of use but distinct from them)" is incoherent if you think the systems of use can operate in realtime but the system of competence cannot (if the operation of SoU involves a step where SoC is accessed, and SoU operates in realtime, then at least that part of SoC that is accessed by SoU must also be able to operate in realtime).
@Dennis: Hmm. If we are talking about bets here, we might be putting our money in different places. I personally think that we will get a lot more from considering how the computational system works than from thinking about how the interfaces interpret. I am also not particularly impressed with Chomsky's treatment of binding or theta theory. The latter, to the degree it says anything about CI, amounts to a restatement of the principle of Full Interpretation, which, as I recall, you're not impressed with. As for binding theory, virtually none of its properties follow from anything Chomsky has said. Why the locality and hierarchy conditions? He says that this is natural, but come on, really? Nowadays he ties locality and hierarchy to probe-goal dependencies between antecedents and the heads they agree with, which in turn probe anaphors. But this doesn't explain anything. It just restates the facts. I could go on, but the real point is that we need concrete examples that do real work before deciding on how reasonable this approach is. I'm waiting for affix hopping! Till I get this, I'll stick to my view that the idea is coherent but so underspecified as to be currently little more than a poetic hint of something that MIGHT be worthwhile if anyone figures out how to make it concrete.
Dennis, To Omer's statement that "The Bare Output Conditions model requires (potentially massive) overgeneration of possible derivations, followed by filtration of those derivations whose outcomes do not meet the relevant conditions", you responded that "If you follow Chomsky and drop the idea that there is a significant notion of "well-formed formula" for natural language, then the term "overgeneration" has no real meaning" and that ""overgeneration" and conversely "crash-proof (grammar)" all become pretty much meaningless notions". I wonder if you could elaborate a bit on this. I'm not sure I understand how "overgeneration" becomes meaningless. Brandon
Brandon (sorry for the delay): well, in formal languages where you stipulate the syntax, you have well-formed formulae (anything that abides by the syntactic rules you stipulated) and everything else (those formulae that don't). So that's a straightforward [+/-grammatical] distinction: there's a set of expressions the grammar generates, those are [+grammatical], and anything else is ungrammatical, i.e. not generated. Chomsky denies that there is such a distinction for natural language, or at least a meaningful one, since expressions can be more or less acceptable along various dimensions, acceptable in some contexts/registers but unacceptable in others, etc. So for natural language there's no notion of well-formed formula, or at least this is not what we're probing when we ask people for acceptability judgments (where a host of other factors come into play besides grammar). But the notion of "overgeneration" presupposes precisely that there is a notion of well-formed formula -- if you generate formulae that aren't in the language (= set of sentences), you "overgenerate." But in linguistics people typically, and mistakenly, apply the term to analyses that predict/imply the generation of "deviant" forms. This is at best a very informal notion of "overgeneration," and not one that is defined technically, *unless* (and that's the fallacy) you equate grammaticality and acceptability. Same for crash-proof grammars: as far as I can see, the goal of these is to generate "only what's acceptable," as though this were equivalent to the notion of well-formed formula in formal language theory. Even if this were a coherent goal (which I don't think it is, since acceptability is not a defined technical notion), it would be empirically dubious, since, as Chomsky has emphasized, we want the grammar to generate all kinds of deviant expressions which have perfectly coherent interpretations, and may even be used deliberately in certain contexts.
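To make the formal-language point concrete, here is a minimal sketch (a toy example of my own; the two grammars are invented and carry no linguistic claims): with a stipulated syntax, grammaticality is just set membership, so "overgeneration" is well defined only relative to a target set.

```python
# Toy grammars over the alphabet {a, b}; no linguistic claims intended.

def in_L_G1(s: str) -> bool:
    """L(G1) = { a^n b^n : n >= 1 } -- take this to be the 'intended' language."""
    n = len(s) // 2
    return len(s) >= 2 and len(s) % 2 == 0 and s == "a" * n + "b" * n

def in_L_G2(s: str) -> bool:
    """L(G2) = { a^m b^n : m, n >= 1 } -- a proper superset of L(G1)."""
    m = 0
    while m < len(s) and s[m] == "a":
        m += 1
    return 0 < m < len(s) and all(c == "b" for c in s[m:])

for s in ["ab", "aabb", "aab", "ba"]:
    print(s, in_L_G1(s), in_L_G2(s))
# "aab" is [-grammatical] for G1 but [+grammatical] for G2, so relative to
# L(G1) we can say G2 "overgenerates." Without a defined target set -- the
# natural-language situation described above -- that comparison has no anchor.
```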
@Dennis Aren't you conflating acceptability with grammaticality here? Your words, relevant part between *s: "So that's a straightforward [+/-grammatical] distinction: there's a set of expressions the grammar generates, those are [+grammatical], and anything else is ungrammatical, i.e. not generated. Chomsky denies that there is such a distinction for natural language, or at least a meaningful one, *since expressions can be more or less acceptable along various dimensions*, acceptable in some contexts/registers but unacceptable in others, etc. So for natural language there's no notion of well-formed formula…"
This looks like it assumes that because utterances vary in acceptability, a grammar must eschew a notion of grammaticality. But does this follow? We know that one can get continuous effects from discrete systems (think genes and heights). So the mere fact that acceptability is gradient does not imply that grammaticality is too.
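Here is a toy sketch of the discrete-to-gradient point (the factors and weights are entirely made up, just for illustration): a binary grammaticality value combined with continuous extra-grammatical factors yields a gradient acceptability score.

```python
import random

def acceptability(grammatical: bool, plausibility: float, parse_cost: float) -> float:
    """A discrete 0/1 grammaticality contribution plus continuous factors
    (all invented here) produces a continuous score, clipped to [0, 1]."""
    base = 1.0 if grammatical else 0.2
    score = base * plausibility - 0.3 * parse_cost + random.gauss(0, 0.05)
    return max(0.0, min(1.0, score))

# Two grammatical sentences can differ a lot in judged acceptability ...
print(acceptability(True, plausibility=0.95, parse_cost=0.1))   # near ceiling
print(acceptability(True, plausibility=0.40, parse_cost=0.9))   # much lower
# ... and an ungrammatical one need not be at floor.
print(acceptability(False, plausibility=0.90, parse_cost=0.1))
```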
There are several factors you mention. Registers: but of course we know that people have multiple grammars, so we can say it is grammatical in one but not another. You also mention other factors: but this does not mean that a sentence may not be +/-grammatical, just that grammaticality is one factor in gauging acceptability. And we have tended to think that it is a non-negligible factor so that acceptability was a pretty good probe into grammaticality. And given the success of this method, this seems like a pretty good assumption, though there are interesting cases to argue over.
What I think you highlight, which is interesting, is that Chomsky has made a stand against thinking that all forms of unacceptability implicate the grammar. He did this before, of course (think "colorless green…"). But he seems to want to expand this yet more now. I am not really sure where he wants to draw the line and I can imagine that there is no principled line to draw. It's an empirical matter as they say. However, his current work clearly requires that some structures are ungrammatical and some are not. He believes that some unacceptability is not due to ungrammaticality but due to something else, e.g. a gibberish interpretation at the interface. So far this is the colorless-green strategy. Do you think that there is more?
BTW, I never understood why Chomsky thought we needed gradient GRAMMARS. Can you explain why? I can see that the combo of Gs and other things yield gradient judgments. But why gradient grammars?
@Omer: thanks, it seems like we've pinned down our disagreement (or, I guess I should say, the point where our intuitions diverge).
You say, "But now, as I said before, there is positive evidence accruing that the realtime procedures respect islands, c-command, etc. (ask some of the other UMD folks commenting on this blog; they know far more about it than I do). And if Chomsky's ontological choices make this comparison necessarily informal, then in my view, that's simply another strike against those ontological choices."
That's one way of looking at it. I should certainly read (more of) the stuff you mention, but when it comes to matters of logic (like the knowledge/competence vs. use/performance distinction) I fail to see how any empirical evidence could bear on it in principle. I mean, if you accept that there's one system, the grammar, that is a purely logical-derivational system and another bunch of systems, those involved in production, that operate in real time, then what does it even mean to say that those systems share certain properties?
Also, Lastly, I think the notion of "a system of competence (accessed by systems of use but distinct from them)" is incoherent if you think the systems of use can operate in realtime but the system of competence cannot (if the operation of SoU involves a step where SoC is accessed, and SoU operates in realtime, then at least that part of SoC that is accessed by SoU must also be able to operate in realtime).
I don't see what's incoherent about the idea that production/perception systems access systems of knowledge. If I play a game and I have its logical structure/rules internalized, I will access that knowledge when I play the game. But that doesn't mean that the logical structure and my actions in playing the game are somehow equivalent -- they're quite distinct, but my behavior isn't random because I can make use of the knowledge I have. And this certainly doesn't imply that the logical structure is somehow instantiated "in real time" (except in the sense of "being there in my head in that moment").
@Dennis: You write, "I should certainly read (more of) the stuff you mention, but when it comes to matters of logic (like the knowledge/competence vs. use/performance distinction) I fail to see how any empirical evidence could bear on it in principle. I mean, if you accept that there's one system, the grammar, that is a purely logical-derivational system and another bunch of systems, those involved in production, that operate in real time, then what does it even mean to say that those systems share certain properties?"
Again, (a portion of) this logic is exactly what I'm finding fault with. The logic is not a given – it is part of the linguist's hypothesis structure. If this logic puts us in a position where there are certain robust facts about the world (those things you call "informal" similarities) that we will never be in a position to explain, then the logic is the wrong one to pursue (as scientists; it might still be an interesting thought experiment from a philosophical point of view).
"If I play a game and I have its logical structure/rules internalized, I will access that knowledge when I play the game. But that doesn't mean that the logical structure and my actions in playing the game are somehow equivalent -- they're quite distinct, but my behavior isn't random because I can make use of the knowledge I have."
That last sentence is what I'm after: if you can make use of the knowledge you have while playing the game, then there must be a model of this knowledge that is implementable in realtime. That doesn't mean that the only way to represent this knowledge is using a realtime implementation; but if someone makes a proposal for the content of that knowledge which is fundamentally at odds with realtime implementation, then we know that that proposal is wrong – since, out in the world, people are "making use of that knowledge" (your words) in realtime.
Thanks for your attempt to clarify. Unfortunately, I'm more confused now than ever. Actually, I don't know if I don't understand or if I just disagree. When Chomsky proposed the notion of bare output conditions being the conditions that language must meet to be useable at all, didn't he mean to claim that the narrow syntax is free to generate whatever it will (this is the sense in which I understand there to be no well-formed formula: in the narrow syntax), and that those generated objects which meet BOC's form grammatical sentences, while those which do not meet BOC's form ungrammatical strings? In other words, I understand the "overgenerated" strings to be those which are generated in the narrow syntax but do not meet BOC's. If this is not the case, then I suppose I'm not sure what the importance of BOC's is anymore. (Just to be clear: when I talk about BOC's, I'm not talking about semantic interpretation; whether an object yields gibberish or some semantically coherent expression can only be evaluated once the object has undergone interpretation in the semantic component proper; but an object must obey BOC's to gain access to the semantic component in the first place.)
Also, I repeat a question of Norbert's: If acceptability and grammaticality are distinct notions, why should the observation of gradience in acceptability judgements lead us to postulate gradient grammars?
As always, I appreciate any insight you can provide.
Dennis said that in [Formal Language Theory], there are no interface systems/conditions. Well, there are.
There's an entire subfield of formal language theory that's concerned with the generative capacity of logically defined constraints. For instance, a string language is definable via constraints stated in monadic second-order logic iff it is recognized by a finite-state string automaton iff it is generated by a regular string grammar iff its Myhill-Nerode partition has finite index.
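For concreteness, here is a small sketch (a toy example of my own, far simpler than the MSO case) showing one and the same regular constraint stated declaratively and as a finite-state recognizer; both carve out the same string set.

```python
from itertools import product

def satisfies_constraint(s: str) -> bool:
    """Declarative statement: no position i such that s[i] == s[i+1] == 'b'."""
    return all(not (s[i] == "b" and s[i + 1] == "b") for i in range(len(s) - 1))

def accepted_by_fsa(s: str) -> bool:
    """Equivalent two-state recognizer; the state records whether the last symbol was 'b'."""
    just_saw_b = False            # both states are accepting
    for c in s:
        if c == "b":
            if just_saw_b:
                return False      # a second 'b' in a row: reject
            just_saw_b = True
        else:
            just_saw_b = False
    return True

# The two formulations agree on every string over {a, b} up to length 6.
assert all(satisfies_constraint("".join(w)) == accepted_by_fsa("".join(w))
           for n in range(7) for w in product("ab", repeat=n))
```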
What FLT shows is that we can switch between all these perspectives as we see fit: features or constraints, grammars or recognizers, unbounded dependencies or local subcategorization. Some are more succinct, some are easier to implement, some highlight connections that are easy to miss otherwise (the logic perspective is very useful for comparing phonology and syntax). But they all have their own unique advantages.
That's why I'm having a hard time making sense of discussions like this (this = features VS constraints, not grammaticality VS acceptability). It doesn't seem to be a discussion of how one's research interests should inform what kind of technical devices one uses. Instead the sentiment is apparently that feature-based perspectives are fundamentally broken and a switch to constraints would easily solve that.
@Thomas: The issue as I see it is not features versus constraints but which features and which constraints, implement as you will. The original MP conceit was that movement was forced, driven by feature checking (as in MP formalizations). However, there is another idea: movement is "free" and outputs are "acceptable" if they gain interpretation at the CI interface. Doing this means "fitting" with the system of meanings that live at CI. There is a problem with each of these views: re features, they are too cheap and hence lack explanatory power. Re Bare Output Conditions (CI conditions) we know virtually nothing about them. This also reduces their explanatory efficacy.
So, the problem is not whether to code what we know in terms of features or in terms of conditions, but an admission that we don't know enough to make use of these notions particularly enlightening. And for that there is no formal fix, so far as I can tell.
Thomas: Dennis said that in [Formal Language Theory], there are no interface systems/conditions. Well, there are. There's an entire subfield of formal language theory that's concerned with the generative capacity of logically defined constraints.
Yes, but that's different from interface conditions imposed by biological systems embedding the I-language. There is no reason to assume, as far as I can see, that the latter correspond to logical constraints on formal systems. I remember Chomsky talking about vacuous quantification, which seems to me to be a good illustration of this. It's really hard to assign an interpretation to something like "Who did John kiss Mary?," which in logical/formal terms is unproblematic. But C-I (or whatever you want to call it) doesn't permit it. Or take thematic roles, imposed on interpretation by C-I in a way that has little to do with logical constraints, but presumably with the format of "events" defined by C-I. And so on.
Norbert: movement is "free" and outputs are "acceptable" if they gain interpretation at the CI interface.
I think this is a misleading way of putting it. Rather, expressions have whatever interpretation they have at the interfaces, including deviant and nonsensical interpretations, or perhaps no coherent interpretation in the extreme. This is conceptually quite different from a "filtering" system implementing some notion of "acceptability" in terms of "reject" or "admit."
I replied to this above. Filter here is used very non-theoretically; it's whatever it is that explains our data without doing so on the generation side of the grammar. Btw, I've never really understood what the interpretive options are given that Chomsky has been loath to specify what the interpretive system does. But this may not be a fair criticism, as nobody has a good idea about this. However, if I want to say that something converges with a gibberish interpretation rather than not being interpretable at all, it would be nice to have some canonical examples of how this distinction is meant to be taken.
Last point, how do you understand the idea that the grammar is the optimal realization of interface conditions if the latter does not in some sense restrict the range of the former?
@Dennis: "Yes, but that's different from interface conditions imposed by biological systems embedding the I-language." My point is a technical one: Let's assume, as you do, that there's a set C of constraints that hold at the interfaces. As long as these constraints fall within a certain complexity class, they can be automatically translated into syntactic constraints over derivations, which in turn can be encoded in terms of feature dependencies.
This class of feature-reducible constraints includes all the examples you give above. The only constraints in the literature that fall outside are those that invoke identity of meaning, but even there it's not clear-cut (Scope Economy, for instance, is fine if we do not care about actual change of meaning but just about whether the meaning of the sentence could in principle be altered by QR, which is what Fox actually argues for). So overall features and constraints can do the same work, but they do it in different ways and thus one of the two might be more suitable for certain tasks.
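Here is a rough sketch of that interchangeability point (my own toy construction, not the actual translation procedure from that literature): the same licensing requirement can be imposed as a filter over finished derivations or checked as a feature dependency, and the two classify derivations identically.

```python
from typing import FrozenSet, List, Tuple

LexItem = Tuple[str, FrozenSet[str]]          # (label, features)
Derivation = List[LexItem]                    # flattened record of what was merged

def interface_filter(d: Derivation) -> bool:
    """Constraint view: inspect the finished object -- any [wh] element
    requires an interrogative C somewhere in the structure."""
    has_wh = any("wh" in feats for _, feats in d)
    has_q_c = any(label == "C" and "Q" in feats for label, feats in d)
    return (not has_wh) or has_q_c

def feature_licensing(d: Derivation) -> bool:
    """Feature view: every [wh] must be checked by a [Q] probe on C."""
    wh_count = sum(1 for _, feats in d if "wh" in feats)
    q_probes = sum(1 for label, feats in d if label == "C" and "Q" in feats)
    return wh_count == 0 or q_probes >= 1

d_good = [("C", frozenset({"Q"})), ("T", frozenset()), ("what", frozenset({"wh"}))]
d_bad  = [("C", frozenset()),      ("T", frozenset()), ("what", frozenset({"wh"}))]

# The two checks sort derivations identically; they differ only in where the
# bookkeeping lives (an output filter vs. derivation-internal features).
assert interface_filter(d_good) and feature_licensing(d_good)
assert not interface_filter(d_bad) and not feature_licensing(d_bad)
```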
At any rate Norbert already provided the money quote: the problem is not whether to code what we know in terms of features or in terms of conditions, but an admission that we don't know enough to make use of these notions particularly enlightening. I find that admission very refreshing, but I have the impression that a fair share of syntactic work nonetheless wrestles with such matters of notation with the agenda of proving one superior.
@Thomas: I think I see what you mean, but in the case of I-language we're dealing, by assumption, with different systems interacting. So it does make a difference *where* you locate the complexity (in the grammar or in the interfacing systems), esp. if you take evolutionary considerations into account. That's not just a matter of notation, although it may be from a purely formal point of view.
So I'm not denying that you could restate everything in terms of features without increasing complexity, but it would still mean putting this stuff into UG rather than into other places that are hopefully in some meaningful sense "independently given." So while I agree with Norbert's assessment of how little we know, I think it's clear which route you want to go *if* you subscribe to the general idea of cutting down UG.
Norbert: Last point, how do you understand the idea that the grammar is the optimal realization of interface conditions if the latter does not in some sense restrict the range of the former?
I take "optimal realization" to mean something like the most minimal system that can satisfy interface conditions while being totally blind to them, i.e. generate expressions that end up useable. It may generate all kinds of unusable expressions, but it plainly has to generate those that are usable as well. And free Merge operating over the lexicon will give you an infinity of propositional expressions, but it will need to be supplemented with theories of interface mappings (at least PF) and, eventually, the outside systems accessing the resulting representations.
I think the issue is a methodological one: without a restrictive theory of the interfaces, nearly *anything* (well, except for Merge itself) can be relegated to the interfaces. One can then claim victory (i.e., that a very minimal UG has been achieved), but this move will have taught us very little – I dare say, nothing – about the human capacity for language. We have in effect stuck a "PF" or "LF" sticker on phenomena that still have no explanation.
The price is steep (again, methodologically speaking): what "PF" looks like to modern-day syntacticians makes no sense to any morpho-phonologists to whom I have posed the question; similarly for what many syntacticians take to be "LF" requirements. (This is why I was careful, earlier, to say that agreement cannot be enforced using Bare Output Conditions – if you allow LF to impose the condition "if there is a [plural]-bearing DP within the c-command domain of, and in the same phase as, T, then T must have agreed with that DP", then it certainly can be enforced "at the interfaces.")
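For what it's worth, the quoted condition really can be written down as an output filter; the sketch below (my own toy encoding, with invented data structures) just shows how mechanically it restates the agreement rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    features: set = field(default_factory=set)
    agreed_with: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def descendants(n: Node):
    for c in n.children:
        yield c
        yield from descendants(c)

def lf_condition(tp: Node, phase: Node) -> bool:
    """'If there is a [plural]-bearing DP within the c-command domain of, and
    in the same phase as, T, then T must have agreed with that DP' -- stated
    as a filter over the finished structure."""
    t = next(c for c in tp.children if c.label == "T")
    complement = next(c for c in tp.children if c.label != "T")   # T's sister
    in_phase = {id(x) for x in descendants(phase)} | {id(phase)}
    for dp in [complement, *descendants(complement)]:
        if dp.label == "DP" and "plural" in dp.features and id(dp) in in_phase:
            if t.agreed_with is not dp:
                return False
    return True

dp = Node("DP", {"plural"})
t = Node("T")
tp = Node("TP", children=[t, Node("vP", children=[Node("v"), Node("VP", children=[dp])])])
print(lf_condition(tp, phase=tp))    # False: T has not agreed with the plural DP
t.agreed_with = dp
print(lf_condition(tp, phase=tp))    # True: the "interface condition" is Agree, restated
```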
I would say that in practice, these 'relegations' to PF/LF often do more to impede research than to foster it.
No disagreement here, and of course I didn't mean to suggest that putting "PF" and "LF" stickers on phenomena is per se an explanation. But I think there are some plausible analyses that go this way, which is encouraging. Just a whole lot of work left to do.
So, I don't want to give up on the SMT. But I want a version that is a real "thesis," not an ad hoc list of methodological suggestions that generally amount to "do the best you can given the data." If that be the SMT then there is nothing new or interesting about it. The fight is then always about what's simplest in the circumstances, and here we always have lots of candidates. That means that the SMT has virtually no bite and is then, IMO, of dubious interest. Or, more accurately, it is either clearly false when made mildly precise (e.g. a theory without long distance dependencies would be better than one with them, as the shortest dependencies would be the only ones we get (note btw this would be true even if Merge gave us movement "for free")) or it is close to vacuous given that we can get a useful principle to match whatever we find. This does not strike me as a good way to go.
@ Norbert
RE: Occam. Wikipedia suggests a way out of the dilemma through use of Latin: lex parsimoniae. And my mother always told me that taking Latin in high school would be a waste of time.
As always, my novicehood threatens to betray me here. I think there is a clear interpretation of the SMT given an appropriate definition of optimal within the correctly characterized constraints.
I don't think the optimal theory requires no uninterpretable features, because we are assuming that the computational system is an optimal link between pre-existing systems (certainly the interfaces, and I assume the lexicon). We assume the uninterpretable features are already there, right? So the system just has to deal with it in the most optimal way it can. These assumptions have to be in place, as far as I can tell.
I'm not quite sure about the notion of long and short movement and phases - my expertise may be failing me here. But my understanding is that the phase is just the unit at which structures are evaluated - this is the relevant domain, and the entire derivation really doesn't exist for the syntax, so you can't evaluate at this level. But I'm guessing here, because I really know nothing about phases.
There are similar notions for symmetric Merge - there is a different desideratum here. The operation that is added has to be as simple as possible because of Darwin's problem, right? So we assume the addition of an unordered Merge operation. Assuming this operation, then the question is does it behave optimally in hooking up with the demands of the interfaces.
So I feel like there are just certain assumptions that need to be made regarding the lexicon and the interfaces, and what kind of operation we get. Then, the question is: (now) does it behave optimally? It's like, assuming a prisoner has to escape from a prison, why do they use a spoon instead of a jackhammer? Because they don't get the jackhammer - the only question is, do they dig the shortest tunnel they could dig?
Thx for the spelling help. 'Occam' it is from now on.
I hesitate to disagree with what you say because I think that some version of this is true. I guess where I'm feeling lost is how to understand the SMT programmatically. What does it mean to realize the thesis, show that it is true? Now my problem is that there is no problem telling an optimizing story after the fact whatever we discover. Ex ante, however, things are more difficult. So, as I noted, there is a tradeoff between the number of operations one applies and the unit size of the computational domain. The smaller the domain the more applications of the operation one needs to, e.g., get from one's initial position to one's ultimate landing site. The longer the step permitted, i.e. the larger the domain, the fewer the required applications of a particular operation. This tradeoff does not seem to have a conceptual resolution. Similarly, how simple Merge is doesn't have one. If the question is interface fit, then a merge that is asymmetric fits better with the predicates at the CI (and likely the AP, though as Chomsky does, let's put it aside) interface. Why? Because the predicates required for CIing are not symmetric, so having a symmetric predicate makes things more complex. You reply: well, we are stuck with Darwin's Problem. But then part of what is complex or not depends on what was there to begin with: do we have any combinatoric operations that, tweaked the right way, would serve, or do we need Merge full blown and that's the best option? Don't know, but Chomsky talks as if this is not the right way to think of these issues. Why not?
So, yes, things are complicated. It's optimality relative to x,y,z. Fine, what are x,y,z? Until we know what we are optimizing with respect to we don't really have a working notion and so no "thesis." And that's why I've shied away from this interpretation of the SMT.
Fair enough, but I think it will be difficult to have these parameters set in advance. I think part of the SMT is also reconsidering these parameters as you go, e.g., whether certain features are weak or strong.
A side note on Merge. I think Merge has to meet two conditions: (1) the empirical condition of combinatorics and (2) the evolutionary condition of recent, sudden, and unchanged emergence. So the simplicity of Merge should not be assessed by how it plays with the interfaces, aside from the fact that it meets empirical adequacy of combinatorics. I think what Chomsky goes for is some kind of mathematical notion of simplicity here, which would eventually have to be cashed out in genetic and neuronal notions (as you mention above) rather than the other notions of simplicity, e.g., economy conditions.
"Chomsky goes for is some kind of mathematical notion of simplicity here"
Yes. If there is a hunch here, it's that this conceptual simplicity will match some relevant evo notion. Right now, I'm skeptical of this move. But Chomsky's hunches have worked out before, so...
Yes, I am skeptical too. But from what I can tell it seems to work surprisingly well for an idea that is supposedly on the wrong track. I don't know when we'll know enough about genes and neurons to give a good answer.
I hesitate to make my first chime-in on the blog be tinged with self-promotion, but: I have argued (here, here) that narrow syntax may indeed *not* have anything resembling uninterpretable features (at least, not in the sense of "features that will lead to a crash if not checked"). Certainly, such features don't seem to be involved in what we pre-theoretically refer to as agreement.
What is the relevance of this? Well, Norbert says above that a view of syntax as an "optimal realization of interface conditions" would entail a syntax without uninterpretable features. I think there is good reason to think that this is exactly the syntax we observe. Now: both agreement and movement (or, if Norbert is right, the single operation that underlies them both) are certainly feature *driven* – i.e., they occur only when certain features are present in the derivation – and what I have said here sheds no particular light on why those features would exist in narrow syntax. But I do think it is interesting to note that, if I am right, there are no features in narrow syntax that the interfaces cannot deal with (i.e., the kind that would cause a *crash* were they to arrive at the interfaces untouched). For those who wonder how syntax could be "optimal" while also containing crash-inducing features, this would perhaps be a modest step forward.
I agree that this is interesting. However, as you note all current stories rely on features to drive operations. Is this conceptually better than operations free applying and being filtered out by Bare Output Conditions alone? Not obviously. Is this better than theories where there are no such features at all and so no movement to speak of? Not clear, at least to me. How much do we know about the interface at all such that movement is required? Chomsky suggests that we know that there is duality of interpretation. But is this distinction something that CI objects/structures had pre-linguistically? Were they there with this duality BEFORE FL arose? I really cannot say. IMO, we know so little about the CI interface that worrying about the fit between it and products of FL hardly restrains speculation at all. And that's too bad.
You ask: Is this conceptually better than operations free applying and being filtered out by Bare Output Conditions alone?
It may or may not be *conceptually* better, but it is *empirically* better (in particular, Bare Output Conditions can't adequately handle agreement; that is the crux of the work I cited above). So, this is the (a? my?) hope: that some of these issues that are murky when addressed on the conceptual playing field can be resolved on the empirical one.
Yes, but the SMT, if a thesis at all, is conceptually driven. One way of reading your results is that the SMT is just wrong. Question: is there a version that may be right and has some teeth?
I think there is, and it has to do with the "transparent use" version of the SMT that you, Norbert, have talked about on the blog before. (Though, in fairness, one could reasonably accuse me here of the very same post-hoc fitting of the notion optimal that was alluded to above...)
The Bare Output Conditions model requires (potentially massive) overgeneration of possible derivations, followed by filtration of those derivations whose outcomes do not meet the relevant conditions. If the grammar is supposed to be put to use by realtime systems of production and comprehension, then I think it would be preferable to have what Frampton & Gutmann have called a crash-proof grammar: one that determines, for every derivational state, a finite (possibly singleton) set of possible next steps, each of which is guaranteed to lead to a well-formed outcome.
So if production and comprehension work as efficiently as they do because of properties of the grammar, crash-proofness strikes me as an excellent candidate property to consider. And the Bare Output Conditions grammar lacks it.
(NB: I have phrased this in computationally naive terms. This is partially due to a lack of expertise on my part, but also because I am not convinced that classic computational complexity theory is the only game in town when it comes to evaluating the efficiency of a grammar. Suppose we built a parsing algorithm underpinned by a crash-proof grammar, and it performed at an average-case complexity of O(f(n)); but that someone clever figured out that you can swap in a Bare Output Condition grammar underneath, and rule out whole classes of non-convergent derivations very efficiently so that you still maintained an average-case complexity of O(f(n)). Would this mean that the apparent computational advantage of crash-proofness has dissolved? Not necessarily. When it comes to implementing a realtime algorithm using finite neural machinery, constants may very well matter. And crash-proof grammars keep the number of possible derivations you need to entertain to a minimum.)
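To illustrate the contrast in equally naive terms, here is a toy sketch (my own, not Frampton & Gutmann's actual system): a generate-and-filter regime enumerates every derivation and then discards the non-convergent ones, while a crash-proof-style regime only ever offers next steps that can still lead to convergence, so far fewer candidates are entertained.

```python
from itertools import product

STEPS = ("merge_subj", "merge_obj", "agree", "move")
LENGTH = 4

def converges(derivation):
    """Toy convergence condition: 'agree' must occur, and before the first 'move'."""
    return ("agree" in derivation and "move" in derivation
            and derivation.index("agree") < derivation.index("move"))

# Generate-and-filter: build every length-4 sequence of steps, then filter.
candidates = list(product(STEPS, repeat=LENGTH))
survivors = [d for d in candidates if converges(d)]

# "Crash-proof" flavour: at each state, offer only steps that can still lead to
# convergence (computed here by brute-force lookahead, purely for illustration).
def licit_next(prefix):
    remaining = LENGTH - len(prefix) - 1
    return [s for s in STEPS
            if any(converges(prefix + (s,) + rest)
                   for rest in product(STEPS, repeat=remaining))]

print(len(candidates), len(survivors))   # 256 candidates; only a fraction converge
print(licit_next(()))                    # 'move' is never a licit first step here
```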
Norbert wrote: However, as you note all current stories rely on features to drive operations. Is this conceptually better than operations free applying and being filtered out by Bare Output Conditions alone? Not obviously. Is this better than theories where there are no such features at all and so no movement to speak of? Not clear, at least to me.
It's not true that "all current stories" rely on formal features as triggers of syntactic computation; see e.g. work by Reinhart, Fox, Moro and others. It's quite amazing, in my view, that the featural-triggers hypothesis is still so popular, given that (again, in my view) it has had virtually no explanatory success in any domain of syntax. As Chomsky, Fanselow and others have pointed out repeatedly, in almost all cases featural triggers are totally ad hoc, without independent motivation: saying that XP undergoes some operation O because XP has an O-feature is nothing but a restatement of the fact. And even in the one case where there is some initial plausibility for a featural trigger for displacement (wh-movement), there is interesting work suggesting that features aren't needed.
It's particularly surprising to me that so many "Chomskyans," like Norbert, are still adhering to the feature-driven world view, given that Chomsky himself has clearly been moving away from this at least since the 2004 paper, advocating a free-Merge model instead (although I admit he's been a bit ambivalent about this). The motivation is clear: what's done at the interfaces doesn't need to be redundantly replicated in syntax, and so interface-based explanations, at least in principle, shift the explanatory burden away from UG. And while most people have been happy to continue working with featural descriptions rather than more principled notions, there is some very interesting work, as I indicated above (and also Omer's work), that shows that this alternative can be fruitfully pursued.
Note also that the featural-trigger hypothesis has serious conceptual problems: the principle of Full Interpretation has, to my knowledge, never really been justified conceptually; to mark valued uFs as "deleted" you need to introduce a diacritic (masked by its "strike-through" notation, but still a diacritic), hence violate Inclusiveness; you need to somehow assign non-intrinsic features to XPs in the course of the derivation (or in the numeration, but that too is without independent motivation); etc.
I find lots to agree with here. Feature driven theories have, IMO, all the problems that you note (BTW, I am not a fan of these, nor have I ever been, but that's secondary). However, I think we judge the current state of the art differently. From what I can see, feature driven I-merge is the name of the game, at least in the bulk of work that travels under the Minimalist banner. There is lots of probing and goaling and none of this makes sense, at least to me, in the absence of features. From what I can gather, Chomsky still likes this way of putting things, which maybe is what you mean by "ambivalent." So, let's agree, the fewer features the better and better still BOCs that determine which generable objects survive and get used.
Here's where I think we might differ: to date I see almost nothing that tells us anything about BOCs. You note the principle of full interpretation is unprincipled. Well, can you think of any useful BOC at all if even that one is gone? (You mention Fox and Reinhart: but if look-ahead bothers you (and it does me) then comparing derivations wrt interpretations is anything but efficient). We know so little about features of the CI interface that invoking it as an explanatory construct is just as bad (maybe worse) than a general reach to features. At least with features we might have some morphological guidance.
So: couldn't agree more re feature abuse and its general non-explanatory nature. But, though the underlying conception outlined at the start of MP which distributes explanation between BOCs and Efficient computation is a nice picture, the basic results, from where I sit, have come from considering the computational properties of grammars NOT how they map to the interfaces, CI especially. This is WHY people reach for features: there is nothing else productive on offer.
Norbert, I'll respond later. Haven't missed your reply.
Norbert: sorry for the delay, and also sorry for mistakenly placing you in the featural-triggers camp; at least in the 2005 textbook you seemed to me to be subscribing to this general world view, but perhaps this was in part for pedagogical reasons and/or doesn't reflect your current views.
You say, There is lots of probing and goaling and none of this makes sense, at least to me, in the absence of features. From what I can gather, Chomsky still likes this way of putting things, which maybe is what you mean by "ambivalent."
That is indeed what I meant, but I think in Chomsky's case the main reason why there's still lots of probing and goaling is that he sticks to classical Case Theory. Drop that and the role of features is diminished significantly. And note his use of features in the lectures you posted: they determine to some extent where some element can or cannot end up, but they don't really trigger anything. His discussion of the "halting problem" (*Which dog do you wonder John likes t?) was most explicit on this: instead of adopting Rizzi's solution he argues that you can permit "illicit" movement in such a case, since it won't yield a coherent interpretation. The features are implicated in that they're relevant to labeling, but they don't trigger or block anything. I'm not saying that's the correct or ultimate explanation, but the general spirit seems to me to point in the right direction. And note that if something like this is correct you *want* the syntax to be able to carry out the (eventually) illicit operation, for otherwise you'd be replicating in the syntax what's independently done at the interface.
You say, "Here's where I think we might differ: to date I see almost nothing that tells us anything about BOCs. You note the principle of full interpretation is unprincipled. Well, can you think of any useful BOC at all if even that one is gone? (You mention Fox and Reinhart: but if look-ahead bothers you (and it does me) then comparing derivations wrt interpretations is anything but efficient). We know so little about features of the CI interface that invoking it as an explanatory construct is just as bad (maybe worse) than a general reach to features."
I don't think this is a fair criticism: there is indeed very little work trying to pin down the precise nature of interface conditions, but I think the reason for this is precisely that most people in the field have been and continue to be too content with rather shallow "explanations" in terms of unmotivated features (sometimes supplemented with a caveat that the features are really "shorthand" for something else, but this just begs the question). Speaking from personal experience, I have been urged more than once now by reviewers to patch up some open issue in an analysis with an invented, arbitrary feature rather than simply leaving it open. The descriptive urge is strong, and I think also drives the whole cartographic movement, which continues to be much more popular than the alternative approaches I adumbrated.
I agree with you that the Fox-Reinhart take on QR has issues concerning look-ahead (if I remember correctly, Fox explicitly admits and acknowledges this). But this is not necessarily true of interface-based explanations per se. Take for instance the Moro-Chomsky approach to {XP,YP} structures requiring movement to be linearizable/labelable; this requires no look-ahead, but it does mean that there will be failing derivations (those in which no symmetry-breaking movement applies, at whatever level this is detected). I don't know how plausible this is overall, but even Chomsky's sketch of an explanation of successive-cyclic movement in these terms strikes me as more principled than any account I've seen that relies on featural triggers for intermediate movement steps. There are also theories of scrambling that don't assume "scrambling features" and the like, but instead free movements whose information-structural effects are determined by mapping rules (Neeleman & van de Koot). I understand that in Tromso-style nanosyntax, certain movements serve to derive lexicalizable subtrees; but those movements are necessarily blind to such needs arising "later on," so they end up being obligatory but crucially without any look-ahead (Chomsky has a really good discussion of why optional operations should not be misunderstood as applying teleologically in "Derivation by phase"). All you need to accept is that there are failing derivations, which -- it seems to me -- is entirely unproblematic, once misunderstandings concerning competence/performance and acceptability/grammaticality that we've addressed here are cleared up.
So I agree that we know little about interface conditions, but I fail to see how that discredits the approach. I also don't see what "basic results" have emerged from the feature-driven framework, unless by that you mean (undoubtedly important) empirical generalizations. I have yet to see a case where that framework provides a genuine explanation for a real problem, rather than just a puzzle disguised as an answer.
I am not saying it discredits it. I am saying that there is no real approach there yet. There will be one when we know something about CI and what demands it makes on derivations.
Btw, I use 'filter' in a loose sense. It explains some data we are interested in without doing so by restricting generation.
Last point: just to touch base on Chomsky's current view: my problem with it is that it makes an ad hoc assumption about labels/features and CI requirements. I do not see why agreement is required for interface reasons. Do we really need to know that a sentence is a question in addition to knowing that a certain operator is WH? What of agreement? Is this required for interpretation? Moreover, this approach to successive cyclicity, at least to me, has all the virtues and vices of Boskovic's story. So until I hear of some independent evidence for Chomsky's interface condition that forces agreement on pain of CI uninterpretability (if that's the real problem; I'm never sure here, as Chomsky doesn't say what goes awry if the agreement fails), I am not going to be more impressed with this sort of story than one that just keeps adopting new features. Indeed, I don't see how unmotivated interface conditions are any better or worse (or different) than unmotivated features. That's my problem. It's not conceptual at all.
"Indeed, I don't see how unmotivated interface conditions are any better or worse (or different) than unmotivated features. That's my problem."
No disagreement here. My feeling though is that stories based on featural triggers for Merge are rarely insightful, and I find it hard to see how they could be. (I'm not talking about theories of agreement or other operations where features are uncontroversially involved, I'm referring only to "triggered Merge" models that take operations like structure-building, deletion etc. to be contingent on formal features.) By contrast, the little bit of work there is that explores alternative notions of "trigger" (in terms of interface effects) looks much more promising to me, although naturally lots of problems remain.
But I think the choice between "triggered Merge" theories and interface-based theories is not just a matter of taste. At least implicitly, "triggered Merge" approaches often rest on the assumption that the goal of the theory is to model a "crash-proof" system, an idea that I believe rests on the mistaken equation of grammaticality and "acceptability." In interface-based theories considerations of acceptability take a back seat, shifting the focus of investigation to the question of what consequences syntactic operations have when it comes to interpretation and externalization of the resulting structure. So while I agree with you that either approach must be evaluated a posteriori based on its merits, I think that the two approaches differ, a priori, in terms of what they take the theory to be a theory of. (As always, the truth may well lie somewhere in between. There's an interesting paper by Biberauer & Richards that proposes a model in which obligatory operations are triggered by features whereas others apply freely, the latter licensed indirectly by their effect at the interfaces.)
As for successive-cyclic movement, I was merely referring to the general idea that you're always free to move to the edge, but if you don't, you're stuck; what Chomsky adds is that you can't stay in a non-final edge, since that will mean that the higher predicate's complement is an unidentifiable {XP,YP} structure. Requiring labels for purposes of ensuring locality of selection is the one place where they make some sense to me, intuitively at least. I'm curious, what "virtues and vices of Boskovic's story" do you have in mind?
@Dennis
B's theory requires movement until features of wh are discharged. Then it can move no more. Chomsky's theory is that Wh moves until criterial agreement occurs and then there is no more movement. This seems to be the very same idea, one feature-based, one BOC-based. The feature story is unmotivated. But so far as I can tell so is the BOC account, as it relies on a very strong (and unmotivated) assumption about clause typing being required for CI interpretation. Moreover, the freezing of the wh after criteria have been checked is based on considerations having to do with interpretability that I frankly do not understand. So, where the two theories make claims, they seem to be more or less the same claim. Both theories, of course, have problems coping with all the variation we find in wh movement in cases of multiple interrogation. The variation is very hard to model given either assumption.
So, is one better than the other? Right now, they are pretty interchangeable. This said, I agree that WERE one able to find defensible, non-trivial BOCs, that would be very nice. But then were one able to find non-trivial, defensible features, that would be too. At the level of abstraction we are discussing things, I think the biggest problem with both views is how little the assumptions made have any independent plausibility.
Norbert: Thanks for clarifying, your comparison of B's and C's stories is helpful. Seems to me that C's story relies on the premise that you want the interrogative clause to be locally selectable, hence you need to identify it by labeling. I agree this is a strong assumption, but one that strikes me as prima facie way more plausible than distributing vacuous intermediate movement triggers. But I guess this is where it comes down to theoretical intuitions, and the account would need to be worked out much more.
However, for the sake of the argument let's assume C's and B's theories have the same "empirical coverage." Then don't you think that C's story is still preferable, since it implies no enrichment of UG? The assumption is that clause-typing/selection are conditions imposed by C-I, so the syntax need not know anything about them. But B's model needs a syntax that is sensitive to and constrained by trigger features, deviating from simplest (= free) Merge. This is where I see the general conceptual advantage of interface-based explanations, although of course at the end of the day you want them in turn to be grounded in theories of the interfacing systems. And I think it would be premature to dismiss such approaches just based on the fact that we do not yet have those theories in place.
Something that also strikes me as relevant to this whole issue was mentioned by Omer: The Bare Output Conditions model requires (potentially massive) overgeneration of possible derivations, followed by filtration of those derivations whose outcomes do not meet the relevant conditions.
If you follow Chomsky and drop the idea that there is a significant notion of "well-formed formula" for natural language, then the term "overgeneration" has no real meaning. There's nothing incoherent about the idea that the grammar itself licenses all kinds of expressions along all dimensions of "acceptability," deviance, usefulness, etc. In fact, we know that acceptability and grammaticality cannot (and should not) be directly correlated, although many people seem to be assuming essentially this -- a misunderstanding, I think, rooted in analogies to formal-language theory in early GG (note that in FLT, there are no interface systems/conditions).
So once you drop that, "overgeneration" and conversely "crash-proof (grammar)" all become pretty much meaningless notions, or at least I don't see what they would mean, unless you either mistakenly equate acceptability (or something like that) with grammaticality or else give up on GG's fundamental tenet that competence can/must be meaningfully studied in total abstraction from real-time processes. Otherwise there's no problem whatsoever with having a grammar that yields all kinds of expressions, only a subset of which is usable by interfacing systems -- in fact, I think, it's the most desirable outcome, since it would allow you to get away with a maximally simple core-computation system, in the extreme.
If "sticking close to the SMT" means anything for actual linguistic research, this seems to me to be the guideline: try to show that Merge applies freely, and ascribe as much as the complexity we find beyond this (i.e., basically all the complexity) to the independently given interface systems. And again, I think the only reason this perspective is often frowned upon is the mistaken idea that acceptability = grammaticality, and therefore an efficient grammar should be "crash-proof." As I said before, Chomsky has made this point repeatedly, but it seems that it hasn't really had an impact.
The idea of severing grammaticality from acceptability has indeed not had an impact – at least, not on me – and I think for good reason. It's not because I think the idea of severing the two is, in principle, incoherent (on the contrary; see below). It's because I think that at the moment, it is a methodological dead-end. I assume as a methodological heuristic that acceptability = grammaticality, and frankly I have no idea how to do linguistics (or at least, syntax) without this assumption.
Things would be different if we had good independent evidence (by which I mean, independent of language) for what these "interface systems" are like and what their effects were on acceptability. But absent that, severing acceptability from grammaticality is just like saying "oh, sentence X is acceptable/unacceptable because of processing" without a theory of processing; it's a methodological trash-bin for things that one's theory of grammar cannot explain.
So I think, Dennis, that the two of us agree in principle that grammaticality and acceptability are not the same thing. Where I differ is that I think there is nothing incoherent about practicing linguistics by heuristically equating the two; and, moreover, I think it's methodologically unsound to sever the two without an explicit theory of acceptability – or, at least, an explicit procedure to identify when something is an "acceptability" effect vs. a "grammaticality" effect.
Until Chomsky or anyone else comes up with such a theory or procedure, I will continue my methodologically-motivated idealization of "acceptability = grammaticality" (just one of many idealizations that we, just like any other scientists, routinely make). And since Chomsky's current proposal does not come with such a theory or procedure (from what I can tell), I will continue to assert that his model of grammar is unsuited for realtime use – or at least, not as well-suited as a crash-proof grammar would be.
I have lots of sympathy with Omer's methodological point. One thing I believe that we should guard against is throwing out 60 years of work on FL in order to advance minimalist notions. For me, if MP doesn't account for the kinds of generalizations we have found over 60 years of work (it need not be ALL, but a good chunk of these), then so much the worse for MP. These are the empirical benchmarks of success, at least for me.
Last point: I agree that your version of the SMT is a prevalent one. And that's my problem with it. It's not a thesis at all, not even an inchoate one, until one specifies what the CI interface is (what's in it, e.g. how many cognitive modules does FL interact with?), what properties it has (what properties do these modules have?), and how grammatical objects get mapped to these. Only when this is done do we have a thesis rather than a feel-good slogan. And that's why I like other versions of the SMT more: they provide broad programs for investigation, ones that I can imagine being fruitfully pursued. They may be wrong, in the end, but they point in clearish research directions, unlike many of the versions of the SMT that I am familiar with. So a question: why are parsers, the visual system, and learners NOT considered part of the interfaces that FL interacts with? Why should we not consider how FL fits with these? Or, put another way: which interface modules "count" as relevant to the SMT and which not?
Last point: can you point to examples where the version of the SMT you outline is displayed? I don't mean mentioned in passing, but cases where, GIVEN some plausible property of CI (one with some independent support, e.g. not Full Interpretation, given your earlier remarks), that property serves to interestingly filter Gish products? It would be good to have a couple close to hand to massage intuitions. If you want to elaborate on this in a bigger setting, e.g. a post or two, feel free to send me some stuff and I will put it up. It would be a very useful service.
Omer, I agree with you to an extent that assuming a correlation between acceptability and grammaticality has been and continues to be an attractive heuristic premise. But note that we're placing a bet here, really, as it is in no way a priori given that there is *any* correlation between the two. We just hope that there is, for otherwise it's less clear what our empirical evidence would be (but see below). But I think we agree that the two notions are logically distinct.
However, I don't think that severing the two methodologically is contingent on having a theory of acceptability. That's a bit like asking for a theory of E-language, I think, in the sense that neither term is meant to denote a real theoretical category. Note that our notion of "acceptability" is entirely informal (and this is true whether or not we let people rate sentences on scales); all it means is that people find certain kinds of sentences funny, and evidently bazillions of cognitive factors enter into those judgments. By contrast, grammaticality (in the FLT sense) is a strictly theoretical/technical notion and well-defined as such.
I think it's in part for this reason that we shouldn't be misled into thinking that our theory is about acceptability. Again, pretending that it is has been a somewhat useful heuristic, but it is at points like the one we're discussing here that it can become seriously misleading, in my view. So I don't think "modeling (un)acceptability" is what we want our theory to do; rather, we want it to be a true theory of possible sound-meaning pairings, including those that are "deviant" or "unacceptable" -- the question of why some of those pairings strike us as deviant, register-dependent, word salad, etc. is a separate, secondary one. (This, I believe, is what Chomsky means when he says that we *want* the system to be such that it generates "deviant" expressions.) As I said, I think the field's obsession with acceptability has been useful to some extent, but it can become seriously misleading when this is taken to be anything but a heuristic simplification based on a non-trivial bet. (I've commented on this blog before that it is also misleading when acceptability is the basis of putative "empirical generalizations," for instance concerning islands. All we know in this and other cases is that sentences of certain kinds strike people as odd -- which is presumably a meaningful fact, but one that in and of itself tells us nothing about whether or not it has any relation to the theory of I-language.)
As for your last paragraph: I frankly don't understand what it means for a competence model to be "unsuited for realtime use," given that the system is by definition not one that operates in real time. And this logical point aside, we know that grammar cares very little about real-time use; just think of the standard examples of multiple center embedding, etc. So I wouldn't want to build any argument on this kind of reasoning.
Indeed, grammar may not care about realtime use; but realtime use seems to care about the grammar. (See the work previously cited by Norbert on this blog, e.g. work showing that parsing respects c-command for antecedent resolution; that it respects islands for filled gap effects; etc.)
So here is a choice point: either there is a second system, G', that is usable in realtime and mimics (to a non-trivial degree) the effects of the grammar G; or the two are one and the same. One option is to maintain that G and G' are distinct, in which case the burden is to explain why their effects are so darn similar. The other option is to accept that G can be used in realtime, in which case the price is that we can no longer use the refrain you invoked to free ourselves entirely of considerations of realtime computation. I choose option two.
@Dennis
"So I don't think "modeling (un)acceptability" is what we want our theory to do; rather, we want it to be a true theory of possible sound-meaning pairings, including those that are "deviant" or "unacceptable" -- the question of why some of those pairings strike us as deviant, register-dependent, word salad, etc. is a separate, secondary one."
I think that this is right and important: our theories are not ABOUT acceptability or even possible sound-meaning pairs. Our theories are ABOUT the structure of FL. That said, looking at relative acceptability of pairs has proven to be an excellent probe into FL. Why? Well one reason is that we believe that FL generates an infinite number of such pairs and so looking at their properties is a reasonable thing to do. Do we have a guarantee that this will work well into the future? No we do not. But that it has proven very effective till now suggests that it is a very fruitful method of investigating FL, somewhat surprisingly so given what a crude probe it really is.
Why do I say that this has been successful? Well, as in all things methodological, the proof of the pudding is in the eating. I judge our discoveries to date to be very non-trivial and this suggests that the tools used to make these discoveries have a lot going for them. So do I think that these tools are the last word? No, I suspect that these methods need supplementation at times and that there are some questions for which these methods will prove nugatory. But that's the way of all things scientific.
I suspect that you agree with this. As you say: "I agree with you to an extent that assuming a correlation between acceptability and grammaticality has been and continues to be an attractive heuristic premise. But note that we're placing a bet here, really, as it is in no way a priori given that there is *any* correlation between the two."
I agree entirely, with the caveat that this is a bet with a track record. And what else do we really have to go on? When Chomsky, for example, notes that reducing redundancy is a good strategy in theorizing he notes that this has worked before, so too looking for simple theories etc. I agree with him about this. But there is no guarantee that this should be so. Similarly with acceptability judgements (under interpretations). No guarantee, but one hell of a track record.
Omer: On a sociological note, I am not sure generative syntacticians use the heuristic of acceptability = grammaticality. There is clearly a whole lot of experience and intuition (reasonably so, IMO) used in selecting what is relevant data to try and explain through competence mechanisms.
For example, acceptability judgements are fine-grained and are surely modulated by sentence complexity. But, people aren't trying to explain this through competence mechanisms/principles in generative syntax. Instead it is more usual to try and pass it off to a less-than-explicit memory-based explanation. Another set of popular examples is that of "Escher" sentences and agreement attraction errors. In these cases, the intuition is the opposite: they are deemed acceptable by many, and again a lot of people (reasonably so, again) try to outsource these problems to less-than-specific performance factors. I don't see anything wrong with this methodologically. But, of course, I would say we should be keeping track of all these things in case later we find additional generalizations worth making. We don't want to exclude too much and make our competence theories almost trivial.
This of course raises the obvious question that worries you. So, how do you go about separating the two?
"I think it's methodologically unsound to sever the two without an explicit theory of acceptability – or, at least, an explicit procedure to identify when something is an "acceptability" effect vs. a "grammaticality" effect."
And the only honest answer that I can think of is that there is no clear-cut separation. Researchers necessarily have to use their noodle, their experience and intuitions in trying to slice the pie, and the only possible evidence that the pie has been cut the right way is the utility of that particular way of slicing the pie for future research. I think asking for an explicit procedure to separate acceptability from grammaticality is a bit naive, and will, IMO, result in stifling scientific innovation.
In fact, my bet is everyone, including you, applies similar intuition-based criteria in separating grammaticality from acceptability. At best, your pie slices might be different from others.
@Confused: I'm not sure I see the distinction you're making.
Let's take agreement attraction as an example. I think most people working on this phenomenon have at least some proposal in mind for when (i.e., under which structural and/or linear conditions) it takes place. That kind of proposal, even if it is just a set of working assumptions, qualifies as far as I'm concerned as a "procedure to identify when something is an 'acceptability' effect vs. a 'grammaticality' effect" (in the domain of agreement). While I don't know who is behind the Confused Academic moniker, I have a sneaking suspicion that you would agree that this is categorically different from discounting data points as "unacceptable but grammatical" or "ungrammatical but acceptable" on an ad hoc basis.
(Similarly, I could imagine someone saying that an "Escher sentence" is one where the acceptability rating shows higher-than-usual sensitivity to the amount of time the subject has to judge the sentence. I don't know if this particular criterion would work, but it's the kind of thing I have in mind.)
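For concreteness, here is a toy sketch of the kind of criterion I have in mind; the data format, the numbers and the margin are all invented, so treat it as a cartoon rather than a proposal. The idea is just to flag an item as an "Escher" candidate when its rating drops much more than usual once subjects get more time to reflect:

```python
from statistics import mean

def time_sensitivity(short_deadline_ratings, long_deadline_ratings):
    # how much the mean rating drops once subjects get more time to reflect
    return mean(short_deadline_ratings) - mean(long_deadline_ratings)

def flag_escher_candidates(items, margin=1.5):
    # items: {sentence_id: (ratings under a short deadline, ratings under a long deadline)}
    # Flag items whose drop is well above the average drop across all items,
    # i.e. "higher-than-usual sensitivity" to the amount of judgment time.
    drops = {sid: time_sensitivity(short, long) for sid, (short, long) in items.items()}
    usual = mean(drops.values())
    return [sid for sid, d in drops.items() if d > usual + margin]

# toy usage with made-up ratings on a 1-7 scale: "escher1" looks fine under time
# pressure but collapses on reflection, so it gets flagged
items = {
    "control1": ([6, 6, 7], [6, 7, 7]),
    "control2": ([5, 6, 6], [5, 6, 5]),
    "escher1":  ([6, 6, 5], [2, 3, 2]),
}
print(flag_escher_candidates(items))   # ['escher1']
```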
As for the biographical component, regarding my own work: yes and no. I've worked on some less-studied languages lately, at least in part because there's more to be discovered there even while looking at only monoclausal or biclausal sentences – which I feel are easier for speakers to judge. Now you could say that this is just another way of slicing the acceptability-grammaticality pie: "short > long" / "monoclausal > biclausal > ..." / something like that. This wouldn't be wrong. But if a quadruple-embedded example in Kaqchikel falsified my favorite theory of Kaqchikel agreement, I'm not sure I would feel more comfortable saying "that's an acceptability effect!" than I would just saying "hmm, that is something that my current theory does not explain."
@Omer: I think this points out exactly what I was trying to say.
"But if a quadruple-embedded example in Kaqchikel falsified my favorite theory of Kaqchikel agreement, I'm not sure I would feel more comfortable saying "that's an acceptability effect!" than I would just saying "hmm, that is something that my current theory does not explain.""
But right in this, one concedes that there is no obvious reason to think the quadruple embedding is a grammaticality issue either. Yet if acceptability = grammaticality were the heuristic, and if one had to treat everything as an issue of grammaticality whenever there is no explicit justification for the partition, then this would be a rather weird thing to do. As you rightly note (IMO), the very idea of cutting the pie into short and long betrays a subtle hope, perhaps, that the short effects might extend to longer effects, which in turn suggests a certain, vague but useful, understanding of how grammaticality is divorced from performance mechanisms.
The same is true of agreement attraction errors. The very fact that these have been termed as such suggests a certain partitioning of the pie. Note, the correct analyses of these errors might indeed involve grammatical constructs, but the fact that a lot of productive work on agreement has happened while systematically ignoring these as "errors" suggests a certain rightness to that way of thinking, namely, the somewhat arbitrary partitioning of the relevant data into grammatical vs. acceptable in the name of progress.
Omer, you wrote: So here is a choice point: either there is a second system, G', that is usable in realtime and mimics (to a non-trivial degree) the effects of the grammar G; or the two are one and the same. One option is to maintain that G and G' are distinct, in which case the burden is to explain why their effects are so darn similar. The other option is to accept that G can be used in realtime, in which case the price is that we can no longer use the refrain you invoked to free ourselves entirely of considerations of realtime computation. I choose option two.
If option two just means acknowledging that when engaging in (linguistic) behavior we're putting to use our (linguistic) knowledge/competence, then I agree; the two are trivially related in this sense. But the systems must be fundamentally different, as a matter of logic, if we want to maintain that the grammar is a mental representation of the speaker's linguistic knowledge, whereas other processes operate in real time to algorithmically assign meanings to sounds based on what the grammar identifies as licit sound-meaning pairs. I'm not sure what you mean when you say that production/comprehension "mimics (to a non-trivial degree) the effects of the grammar" or that "their effects are so darn similar;" the crucial point to me is that no such comparison can be more than informal, given that "operations" in the grammar have no procedural dimension (just like steps in a proof; I think Chomsky has used this analogy), whereas production/comprehension systems are necessarily procedural. So if option two means likening real-time processes to purely logical operations (steps in a derivation or whatever), then I don't see how this could possibly be stated coherently, without conflating logically distinct dimensions. And consequently, there's no burden attached to distinguishing the systems, since it's a matter of necessity.
The Atlantic had an interesting interview with Chomsky a while ago (it's online), where at some point he says that I-language "has no algorithm," it's just abstract computation, so it's incoherent to impute to it a procedural/algorithmic interpretation. Interestingly, a recent paper by Sprouse & Lau ("Syntax and the Brain") explicitly denies this right at the outset, stating that the processor is simply I-language at the algorithmic level. So Sprouse & Lau in effect view I-language as an input-output system whereas Chomsky takes it to be a system of competence (accessed by systems of use but distinct from them), and this seems to be just our disagreement here.
Sorry, Norbert, I missed your earlier comment. You wrote, Last point: I agree that your version of the SMT is a prevalent one. And that's my problem with it. It's not a thesis at all, not even an inchoate one, until one specifies what the CI interface is (what's in it, e.g. how many cognitive modules does FL interact with?), what properties it has (what properties do these modules have?), and how grammatical objects get mapped to these. Only when this is done do we have a thesis rather than a feel good slogan. And, that's why I like other versions of the SMT more: they provide broad programs for investigation, ones that I can imagine being fruitfully pursued. They may be wrong, in the end, but they point in clearish research directions, unlike many of the versions of the SMT that I am familiar with. So a question: why are parsers, visual system, learners NOT considered part of the interfaces that FL interacts with? Why should we not consider how FL fits with these? Or, put another way: what interface modules "count" as relevant to SMT and what not?
You may interpret this as dodging the question, but here I'm with Chomsky: we have to find out what the interfacing systems are and what constraints they impose as we proceed. But it's not like we have no idea what their effects are: we have things like Binding Theory, Theta Theory, etc. after all. And I'm with you that we should take the basic generalizations on which these theories rest seriously and to be too good to be entirely false (although our optimism is orthogonal to the issue). The task, as I see it, is to refine and restate, in a more principled fashion, these putative "modules of grammar" in terms of interface requirements, kind of like what Chomsky & Lasnik tried to do for Binding Theory in their 1993 paper. How is this a less clear or coherent research guideline than the more traditional one that seeks to capture the complexity in terms of syntax-internal constraints, as you imply?
Incidentally, I don't think this is the prevalent interpretation of SMT at all, at least not in practice. There's very little work, as far as I'm aware at least, that actually tries to do this.
@Dennis: Yes, you've zeroed in on it. I indeed reject Chomsky's views on this matter (e.g. what you quoted/paraphrased from the Atlantic). This used to be motivated, for me, by the usual scientific-method reasons (i.e., if realtime systems and the "competence grammar" actually do share some (or all) of their subsystems, you'd never discover it if you started out with the assumption that they're separate, e.g., "it's incoherent to impute to [the grammar] a procedural/algorithmic interpretation").
But now, as I said before, there is positive evidence accruing that the realtime procedures respect islands, c-command, etc. (ask some of the other UMD folks commenting on this blog; they know far more about it than I do). And if Chomsky's ontological choices make this comparison necessarily informal, then in my view, that's simply another strike against those ontological choices.
Lastly, I think the notion of "a system of competence (accessed by systems of use but distinct from them)" is incoherent if you think the systems of use can operate in realtime but the system of competence cannot (if the operation of SoU involves a step where SoC is accessed, and SoU operates in realtime, then at least that part of SoC that is accessed by SoU must also be able to operate in realtime).
@Dennis
Hmm. If we are talking about bets here, we might be putting our money in different places. I personally think that we will get a lot more from considering how the computational system works than thinking about how the interfaces interpret. I am also not particularly impressed with Chomsky's treatment of binding or theta theory. The latter, to the degree it says anything about CI, amounts to a restatement of the principle of full interpretation, which, as I recall, you're not impressed with. As for binding theory, virtually none of its properties follow from anything Chomsky has said. Why the locality and hierarchy conditions? He says that this is natural, but come on, really? Nowadays he ties locality and hierarchy to probe-goal dependencies: antecedents relate to heads they agree with, and these heads in turn probe anaphors. But this doesn't explain anything. It just restates the facts. I could go on, but the real point is that we need concrete examples that do real work before deciding on how reasonable this approach is. I'm waiting for affix hopping! Till I get this, I'll stick to my view that the idea is coherent but so underspecified as to be currently little more than a poetic hint of something that MIGHT be worthwhile if anyone figures out how to make it concrete.
Dennis,
To Omer's statement that "The Bare Output Conditions model requires (potentially massive) overgeneration of possible derivations, followed by filtration of those derivations whose outcomes do not meet the relevant conditions", you responded that "If you follow Chomsky and drop the idea that there is a significant notion of "well-formed formula" for natural language, then the term "overgeneration" has no real meaning" and that ""overgeneration" and conversely "crash-proof (grammar)" all become pretty much meaningless notions". I wonder if you could elaborate a bit on this. I'm not sure I understand how "overgeneration" becomes meaningless.
Brandon
Brandon (sorry for the delay): well, in formal languages where you stipulate the syntax, you have well-formed formulae (anything that abides by the syntactic rules you stipulated) and everything else (those formulae that don't). So that's a straightforward [+/-grammatical] distinction: there's a set of expressions the grammar generates, those are [+grammatical], and anything else is ungrammatical, i.e. not generated. Chomsky denies that there is such a distinction for natural language, or at least a meaningful one, since expressions can be more or less acceptable along various dimensions, acceptable in some contexts/registers but unacceptable in others, etc. So for natural language there's no notion of well-formed formula, or at least this is not what we're probing when we ask people for acceptability judgments (where a host of other factors come into play besides grammar). But the notion of "overgeneration" presupposes precisely that there is a notion of well-formed formula -- if you generate formulae that aren't in the language (= set of sentences), you "overgenerate." But in linguistics people typically, and mistakenly, apply the term to analyses that predict/imply the generation of "deviant" forms. This is at best a very informal notion of "overgeneration," and not one that is defined technically, *unless* (and that's the fallacy) you equate grammaticality and acceptability. Same for crash-proof grammars: as far as I can see, the goal of these is to generate "only what's acceptable," as though this were equivalent to the notion of well-formed formula in formal language theory. Even if this were a coherent goal (which I don't think it is, since acceptability is not a defined technical notion), it would be empirically dubious, since, as Chomsky has emphasized, we want the grammar to generate all kinds of deviant expressions which have perfectly coherent interpretations, and may even be used deliberately in certain contexts.
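To make the contrast vivid, here is a toy sketch (the language and the grammar are invented purely for illustration): a stipulated formal language and a grammar that generates strings outside it. Notice that "overgeneration" is well defined here only because membership in the language, i.e. the set of well-formed formulae, is itself well defined.

```python
import itertools

# Stipulated syntax for a toy formal language: strings of the form a^n b^n, n >= 1.
def well_formed(max_n=4):
    return {"a" * n + "b" * n for n in range(1, max_n + 1)}

# A grammar that "overgenerates": it produces every string a^m b^n (m, n >= 1),
# i.e. also formulae that are not in the stipulated language.
def sloppy_grammar(max_len=8):
    out = set()
    for m, n in itertools.product(range(1, max_len), repeat=2):
        if m + n <= max_len:
            out.add("a" * m + "b" * n)
    return out

generated = sloppy_grammar()
wff = well_formed()
# the difference is the set of overgenerated strings -- a notion that only makes
# sense because the set of well-formed formulae is itself a well-defined object
overgenerated = generated - wff
print(sorted(overgenerated)[:5])
```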
Does this make sense?
@Dennis
Aren't you conflating acceptability with grammaticality here? Your words, relevant part between *s:
"So that's a straightforward [+/-grammatical] distinction: there's a set of expressions the grammar generates, those are [+grammatical], and anything else is ungrammatical, i.e. not generated. Chomsky denies that there is such a distinction for natural language, or at least a meaningful one, *since expressions can be more or less acceptable along various dimensions*, acceptable in some contexts/registers but unacceptable in others, etc. So for natural language there's no notion of well-formed formula…"
This looks like it assumes that because utterances vary in acceptability, a grammar must eschew a notion of grammaticality. But does this follow? We know that one can get continuous effects from discrete systems (think genes and heights). So the mere fact that acceptability is gradient does not imply that grammaticality is too.
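Here is a toy simulation of that point (the factors, weights and rating scale are all made up for illustration): one binary grammaticality factor plus a handful of other discrete factors already yields a gradient-looking spread of judgments.

```python
import random

def simulated_acceptability(grammatical: bool, n_other_factors: int = 8) -> float:
    # discrete contribution of the grammar...
    score = 4.0 if grammatical else 1.0
    # ...plus several other binary factors (register fit, plausibility,
    # parsing load, ...), each of which is itself discrete
    for _ in range(n_other_factors):
        score += random.choice([0.0, 0.375])
    return score

random.seed(1)
ratings = sorted(simulated_acceptability(True) for _ in range(12))
print(ratings)  # a gradient-looking spread between 4.0 and 7.0, from purely discrete ingredients
```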
There are several factors you mention. Registers: but of course we know that people have multiple grammars, so we can say it is grammatical in one but not another. You also mention other factors: but this does not mean that a sentence may not be +/-grammatical, just that grammaticality is one factor in gauging acceptability. And we have tended to think that it is a non-negligible factor so that acceptability was a pretty good probe into grammaticality. And given the success of this method, this seems like a pretty good assumption, though there are interesting cases to argue over.
What I think you highlight, which is interesting, is that Chomsky has made a stand against thinking that all forms of unacceptability implicate the grammar. He did this before, of course (think "colorless green…"). But he seems to want to expand this yet more now. I am not really sure where he wants to draw the line and I can imagine that there is no principled line to draw. It's an empirical matter as they say. However, his current work clearly requires that some structures are ungrammatical and some are not. He believes that some unacceptability is not due to ungrammaticality but due to something else, e.g. a gibberish interpretation at the interface. So far this is the colorless-green strategy. Do you think that there is more?
BTW, I never understood why Chomsky thought we needed gradient GRAMMARS. Can you explain why? I can see that the combo of Gs and other things yield gradient judgments. But why gradient grammars?
@Omer: thanks, it seems like we've pinned down our disagreement (or, I guess I should say, the point where our intuitions diverge).
You say, But now, as I said before, there is positive evidence accruing that the realtime procedures respect islands, c-command, etc. (ask some of the other UMD folks commenting on this blog; they know far more about it than I do). And if Chomsky's ontological choices make this comparison necessarily informal, then in my view, that's simply another strike against those ontological choices.
That's one way of looking at it. I should certainly read (more of) the stuff you mention, but when it comes to matters of logic (like the knowledge/competence vs. use/performance distinction) I fail to see how any empirical evidence could bear on it in principle. I mean, if you accept that there's one system, the grammar, that is a purely logical-derivational system and another bunch of systems, those involved in production, that operate in real time, then what does it even mean to say that those systems share certain properties?
Also, Lastly, I think the notion of "a system of competence (accessed by systems of use but distinct from them)" is incoherent if you think the systems of use can operate in realtime but the system of competence cannot (if the operation of SoU involves a step where SoC is accessed, and SoU operates in realtime, then at least that part of SoC that is accessed by SoU must also be able to operate in realtime).
I don't see what's incoherent about the idea that production/perception systems access systems of knowledge. If I play a game and I have its logical structure/rules internalized, I will access that knowledge when I play the game. But that doesn't mean that the logical structure and my actions in playing the game are somehow equivalent -- they're quite distinct, but my behavior isn't random because I can make use of the knowledge I have. And this certainly doesn't imply that the logical structure is somehow instantiated "in real time" (except in the sense of "being there in my head in that moment").
@Dennis: You write, I should certainly read (more of) the stuff you mention, but when it comes to matters of logic (like the knowledge/competence vs. use/performance distinction) I fail to see how any empirical evidence could bear on it in principle. I mean, if you accept that there's one system, the grammar, that is a purely logical-derivational system and another bunch of systems, those involved in production, that operate in real time, then what does it even mean to say that those systems share certain properties?
Again, (a portion of) this logic is exactly what I'm finding fault with. The logic is not a given – it is part of the linguist's hypothesis structure. If this logic puts us in a position where there are certain robust facts about the world (those things you call "informal" similarities) that we will never be in a position to explain, then the logic is the wrong one to pursue (as scientists; it might still be an interesting thought experiment from a philosophical point of view).
If I play a game and I have its logical structure/rules internalized, I will access that knowledge when I play the game. But that doesn't mean that the logical structure and my actions in playing the game are somehow equivalent -- they're quite distinct, but my behavior isn't random because I can make use of the knowledge I have.
That last sentence is what I'm after: if you can make use of the knowledge you have while playing the game, then there must be a model of this knowledge that is implementable in realtime. That doesn't mean that the only way to represent this knowledge is using a realtime implementation; but if someone makes a proposal for the content of that knowledge which is fundamentally at odds with realtime implementation, then we know that that proposal is wrong – since, out in the world, people are "making use of that knowledge" (your words) in realtime.
Dennis:
Thanks for your attempt to clarify. Unfortunately, I'm more confused now than ever. Actually, I don't know if I don't understand or if I just disagree. When Chomsky proposed the notion of bare output conditions being the conditions that language must meet to be useable at all, didn't he mean to claim that the narrow syntax is free to generate whatever it will (this is the sense in which I understand there to be no well-formed formula: in the narrow syntax), and that those generated objects which meet BOC's form grammatical sentences, while those which do not meet BOC's form ungrammatical strings? In other words, I understand the "overgenerated" strings to be those which are generated in the narrow syntax but do not meet BOC's. If this is not the case, then I suppose I'm not sure what the importance of BOC's is anymore. (Just to be clear: when I talk about BOC's, I'm not talking about semantic interpretation; whether an object yields gibberish or some semantically coherent expression can only be evaluated once the object has undergone interpretation in the semantic component proper; but an object must obey BOC's to gain access to the semantic component in the first place.)
Also, I repeat a question of Norbert's: If acceptability and grammaticality are distinct notions, why should the observation of gradience in acceptability judgements lead us to postulate gradient grammars?
As always, I appreciate any insight you can provide.
Dennis said that in [Formal Language Theory], there are no interface systems/conditions. Well, there are.
There's an entire subfield of formal language theory that's concerned with the generative capacity of logically defined constraints. For instance, a string language is definable via constraints stated in monadic second-order logic iff it is recognized by a finite-state string automaton iff it is generated by a regular string grammar iff its Myhill-Nerode partition has finite index.
What FLT shows is that we can switch between all these perspectives as we see fit: features or constraints, grammars or recognizers, unbounded dependencies or local subcategorization. Some are more succinct, some are easier to implement, some highlight connections that are easy to miss otherwise (the logic perspective is very useful for comparing phonology and syntax). But they all have their own unique advantages.
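To make the "switching perspectives" point concrete, here is a toy sketch in Python; the particular constraint (a ban on adjacent b's) is just an arbitrary example of mine, not anything from the literature. The same regular string language is stated once as a declarative constraint and once as a finite-state recognizer, and the two agree on the sample strings:

```python
# Constraint perspective: a declarative ban on the local configuration "bb".
def satisfies_constraint(s: str) -> bool:
    return "bb" not in s

# Recognizer perspective: a finite-state automaton over {a, b} for the same language.
# State 0 = last symbol was not "b"; state 1 = last symbol was "b"; missing entries reject.
TRANSITIONS = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 0,                # (1, "b") is absent: two b's in a row -> reject
}

def dfa_accepts(s: str) -> bool:
    state = 0
    for ch in s:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return True

# The two perspectives pick out exactly the same strings:
samples = ["", "ab", "abab", "abba", "ba", "bab", "bb", "abb"]
assert all(satisfies_constraint(s) == dfa_accepts(s) for s in samples)
```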
That's why I'm having a hard time making sense of discussions like this (this = features VS constraints, not grammaticality VS acceptability). It doesn't seem to be a discussion of how one's research interests should inform what kind of technical devices one uses. Instead the sentiment is apparently that feature-based perspectives are fundamentally broken and a switch to constraints would easily solve that.
@Thomas:
The issue as I see it is not features versus constraints but which features and which constraints, implement as you will. The original MP conceit was that movement was forced, driven by feature checking (as in MP formalizations). However, there is another idea: movement is "free" and outputs are "acceptable" if they gain interpretation at the CI interface. Doing this means "fitting" with the system of meanings that live at CI. There is a problem with these views: re features, they are too cheap and hence lack explanatory power. Re Bare Output Conditions (CI conditions) we know virtually nothing about them. This also reduces their explanatory efficacy.
So, the problem is not whether to code what we know in terms of features or in terms of conditions, but an admission that we don't know enough to make use of these notions particularly enlightening. And for that there is no formal fix, so far as I can tell.
Thomas: Dennis said that in [Formal Language Theory], there are no interface systems/conditions. Well, there are. There's an entire subfield of formal language theory that's concerned with the generative capacity of logically defined constraints.
Yes, but that's different from interface conditions imposed by biological systems embedding the I-language. There is no reason to assume, as far as I can see, that the latter correspond to logical constraints on formal systems. I remember Chomsky talking about vacuous quantification, which seems to me to be a good illustration of this. It's really hard to assign an interpretation to something like "Who did John kiss Mary?," which in logical/formal terms is unproblematic. But C-I (or whatever you want to call it) doesn't permit it. Or take thematic roles, imposed on interpretation by C-I in a way that has little to do with logical constraints, but presumably with the format of "events" defined by C-I. And so on.
Norbert: movement is "free" and outputs are "acceptable" if they gain interpretation at the CI interface.
I think this is a misleading way of putting it. Rather, expressions have whatever interpretation they have at the interfaces, including deviant and nonsensical interpretations, or perhaps no coherent interpretation in the extreme. This is conceptually quite different from a "filtering" system implementing some notion of "acceptability" in terms of "reject" or "admit."
I replied to this above. Filter here is used very non-theoretically; it's whatever it is that explains our data without doing so on the generation side of the grammar. Btw, I've never really understood what the interpretive options are given that Chomsky has been loath to specify what the interpretive system does. But this may not be a fair criticism, as nobody has a good idea about this. However, if I want to say that something converges with a gibberish interpretation rather than not being interpretable at all, it would be nice to have some canonical examples of how this distinction is meant to be taken.
Last point, how do you understand the idea that the grammar is the optimal realization of interface conditions if the latter does not in some sense restrict the range of the former?
@Dennis: Yes, but that's different from interface conditions imposed by biological systems embedding the I-language
My point is a technical one: Let's assume, as you do, that there's a set C of constraints that hold at the interfaces. As long as these constraints fall within a certain complexity class, they can be automatically translated into syntactic constraints over derivations, which in turn can be encoded in terms of feature dependencies.
This class of feature-reducible constraints includes all the examples you give above. The only constraints in the literature that fall outside are those that invoke identity of meaning, but even there it's not clear-cut (Scope Economy, for instance, is fine if we do not care about actual change of meaning but just about whether the meaning of the sentence could in principle be altered by QR, which is what Fox actually argues for). So overall features and constraints can do the same work, but they do it in different ways and thus one of the two might be more suitable for certain tasks.
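Here is a toy illustration of that recompilation step, using the same made-up "no adjacent b's" restriction as in my earlier sketch (again my own example, not anyone's actual analysis): the restriction stated once as a post-hoc filter over freely generated outputs, and once as a local feature carried along during generation.

```python
from itertools import product

def all_strings(n):
    return ("".join(p) for p in product("ab", repeat=n))

# Perspective 1: generate freely, then filter at the "interface".
def generate_then_filter(n):
    return [s for s in all_strings(n) if "bb" not in s]

# Perspective 2: the same restriction recompiled into a local feature on the
# growing derivation -- a single bit recording whether the last symbol was "b".
def generate_with_feature(n, prefix="", last_was_b=False):
    if len(prefix) == n:
        return [prefix]
    out = generate_with_feature(n, prefix + "a", last_was_b=False)
    if not last_was_b:   # the feature blocks the offending extension locally
        out += generate_with_feature(n, prefix + "b", last_was_b=True)
    return out

# The two encodings do exactly the same work:
assert sorted(generate_then_filter(6)) == sorted(generate_with_feature(6))
```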
At any rate Norbert already provided the money quote: the problem is not whether to code what we know in terms of features or in terms of conditions, but an admission that we don't know enough to make use of these notions particularly enlightening. I find that admission very refreshing, but I have the impression that a fair share of syntactic work nonetheless wrestles with such matters of notation with the agenda to prove one superior.
@Thomas: I think I see what you mean, but in the case of I-language we're dealing, by assumption, with different systems interacting. So it does make a difference *where* you locate the complexity (in the grammar or in the interfacing systems), esp. if you take evolutionary considerations into account. That's not just a matter of notation, although it may be from a purely formal point of view.
So I'm not denying that you could restate everything in terms of features without increasing complexity, but it would still mean putting this stuff into UG rather than into other places that are hopefully in some meaningful sense "independently given." So while I agree with Norbert's assessment of how little we know, I think it's clear which route you want to go *if* you subscribe to the general idea of cutting down UG.
Norbert: Last point, how do you understand the idea that the grammar is the optimal realization of interface conditions if the latter does not in some sense restrict the range of the former?
I take "optimal realization" to mean something like the most minimal system that can satisfy interface conditions while being totally blind to them, i.e. generate expressions that end up useable. It may generate all kinds of unusable expressions, but it plainly has to generate those that are usable as well. And free Merge operating over the lexicon will give you an infinity of propositional expressions, but it will need to be supplemented with theories of interface mappings (at least PF) and, eventually, the outside systems accessing the resulting representations.
I think the issue is a methodological one: without a restrictive theory of the interfaces, nearly *anything* (well, except for Merge itself) can be relegated to the interfaces. One can then claim victory (i.e., that a very minimal UG has been achieved), but this move will have taught us very little – I dare say, nothing – about the human capacity for language. We have in effect stuck a "PF" or "LF" sticker on phenomena that still have no explanation.
The price is steep (again, methodologically speaking): what "PF" looks like to modern-day syntacticians makes no sense to any morpho-phonologists to whom I have posed the question; similarly for what many syntacticians take to be "LF" requirements. (This is why I was careful, earlier, to say that agreement cannot be enforced using Bare Output Conditions – if you allow LF to impose the condition "if there is a [plural]-bearing DP within the c-command domain of, and in the same phase as, T, then T must have agreed with that DP", then it certainly can be enforced "at the interfaces.")
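Just to make vivid what such a condition amounts to, here is a toy sketch (the Node encoding, the field names, and the way the c-command domain is handed in are all invented for illustration). Stated as an output filter, the condition simply restates the agreement rule:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                                    # e.g. "T", "DP"
    features: dict = field(default_factory=dict)  # e.g. {"num": "pl"}
    agreed_with: Optional["Node"] = None          # record of an Agree relation, if any
    phase_id: int = 0

def bare_output_condition(t: Node, c_command_domain: List[Node]) -> bool:
    # "If there is a [plural]-bearing DP within the c-command domain of, and in
    # the same phase as, T, then T must have agreed with that DP" -- checked
    # after the fact, over the finished derivation.
    for n in c_command_domain:
        if (n.label == "DP" and n.features.get("num") == "pl"
                and n.phase_id == t.phase_id and t.agreed_with is not n):
            return False
    return True

# Usage: a derivation in which T failed to agree with a same-phase plural DP is
# filtered out by this "interface" condition -- which is just the agreement rule
# relocated to the output side.
subj = Node("DP", {"num": "pl"}, phase_id=1)
t = Node("T", {}, agreed_with=None, phase_id=1)
print(bare_output_condition(t, [subj]))   # False
```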
I would say that in practice, these 'relegations' to PF/LF often do more to impede research than to foster it.
No disagreement here, and of course I didn't mean to suggest that putting "PF" and "LF" stickers on phenomena is per se an explanation. But I think there are some plausible analyses that go this way, which is encouraging. Just a whole lot of work left to do.