Sunday, September 8, 2013

A cute example

One of Chomsky’s more charming qualities is his way of making important conceptual points using simple linguistic examples. Who will ever forget that rowdy pair colorless green ideas sleep furiously and furiously sleep ideas green colorless and their pivotal roles in divorcing the notion ‘grammatical’ from ‘significant’ or ‘meaningful’ and in questioning the utility of bigram frequency in understanding the notion ‘grammaticality.’[1] Similarly, kudos go to Chomsky’s argument for structure dependence as a defining property of UG using Yes/No question formation. Simple examples, deep point. However, this talent has led to serious and widespread misunderstandings. Indeed, very soon “proofs” appeared “showing” that Chomsky’s argument did not establish that structure dependence was a built-in feature of FL, for it could have been learned from the statistical linear patterns available to the child (see here for a well-known recent effort). The idea is that one can compare the probabilities of bi/tri-gram sequences in a corpus of simple sentences and see if these suffice to distinguish the fine (1a) from the not-so-fine (1b).

(1)  a. Is the man who is in the corner smoking
b. *Is the man who in the corner is sleeping

It appears possible to do this, as the Reali and Christiansen (R&C) paper shows. However, the results, it seems (see here, p. 26), are entirely driven by the greater frequency of sequences like who is over sequences like who crying in their corpus, which in turn derives from the very high frequency of simple who is questions (e.g. who is in the room?) in the chosen corpus.[2]
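To see the shape of the R&C-style comparison, here is a minimal sketch of scoring the two strings by bigram frequency. The counts below are invented toy numbers, not R&C's actual corpus statistics; the one feature they share with the real case is the asymmetry the text describes: "who is" is very frequent while "who in" is essentially unattested.

```python
from collections import Counter

# Toy bigram counts (invented for illustration; R&C trained on child-directed speech).
# The crucial asymmetry: "who is" is very frequent, "who in" essentially absent.
bigram_counts = Counter({
    ("is", "the"): 50, ("the", "man"): 40, ("man", "who"): 10,
    ("who", "is"): 120,   # boosted by simple "who is ...?" questions
    ("is", "in"): 30, ("in", "the"): 80, ("the", "corner"): 5,
    ("corner", "smoking"): 1, ("is", "smoking"): 2,
    ("who", "in"): 0,     # never attested
    ("corner", "is"): 1, ("is", "sleeping"): 3,
})

def bigram_score(sentence, counts, smooth=0.1):
    """Product of (smoothed) bigram counts; higher = more 'familiar' string."""
    words = sentence.lower().split()
    score = 1.0
    for pair in zip(words, words[1:]):
        score *= counts[pair] + smooth
    return score

good = "is the man who is in the corner smoking"
bad = "is the man who in the corner is sleeping"
assert bigram_score(good, bigram_counts) > bigram_score(bad, bigram_counts)
```

On such toy counts the grammatical (1a) trounces the ungrammatical (1b), but the margin comes almost entirely from the single "who is" vs. "who in" contrast, which is exactly the worry raised above: the result reflects the frequency of simple who is questions, not any sensitivity to structure.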

There has been a lot of discussion of these attempts to evade the consequences of Chomsky’s original examples (the best one is here). However, true to form, Chomsky has found a way to illustrate the pointlessness of these efforts in a simple and elegant way. He has found a way of making the same point with examples that don’t affect the linear order of any of the relevant expressions (i.e. there are no bi/tri-gram differences in the relevant data). Here’s the example:

(2)  Instinctively, eagles that fly swim

The relevant observation is that instinctively in (2) can only modify swim. It cannot be understood as modifying fly. This despite the fact that, whereas it is true that eagles instinctively fly, they don’t instinctively swim. The point can be made yet more robustly if we substitute eat egg rolls with their sushi for swim. Regardless of how silly the delivered meaning, instinctively is limited to modifying the matrix predicate.

This example has several pleasant features. First, there is no string linear difference to piggyback on, as there was in Chomsky’s Y/N question example in (1). There is only one string under discussion, albeit with only one of two “possible” interpretations. Moreover, the fact that instinctively can only modify the matrix predicate has nothing to do with delivering a true or even sensible interpretation. In fact, what’s clear is that the is-fronting facts above and the adverb modification facts are exactly parallel. Just as there is no possible Aux movement from the relative clause, there is no possible modification of the predicate within the relative clause by a sentence-initial adverb. Thus, whatever is going on in the classical examples has nothing to do with differences in their string properties, as the simple contrast in (2) demonstrates.

Berwick et al. emphasize that the Poverty of Stimulus problem has always aimed to explain “constrained homophony,” a fact about absent meanings for given word strings. Structure has always been in service of explaining not only which sound-meaning pairs are available but, just as important, which aren’t. The nice feature of Chomsky’s recent example is that it neutralizes (neuters?) a red herring, one that the technically sophisticated seem to be endlessly hooking on their statistical lines. It is hoped that clarifying the logic of the POS in terms of “absent possible interpretations,” as Berwick et al. have done, will stop the diminishing school of red herrings from replenishing itself.

[1] See Syntactic Structures, pp. 15-16. Chomsky’s main point here, that we will need more than the linear order properties of strings to understand how to differentiate the first sentence from the second, has often been misunderstood. He is clearly pointing out that we need higher-order parts-of-speech categories to begin to unravel the difference. This “discovery” is remade every so often, with the implication that it eluded Chomsky. See here for discussion.
[2] See Berwick et al. here for a long and thorough discussion of the R&C paper. Note that the homophony between the relative pronoun and the question word appears to be entirely adventitious, and so any theory that derives its results by generalizing from questions to relative clauses on the basis of lexical similarities is bound to be questionable.


  1. One could also maintain the argument for Yes/No questions with more interesting sentences.

    It is a matter of finding cases where the extraction of the embedded Aux results in what are better bigram/trigram sequences than the extraction of the "correct" matrix Aux.

    Let the simple sentence be as in (1) below. If bigram/trigram frequencies are what drive the Yes/No question response, then (2) should be better than (3) since "Sean happy" is surely worse than "Sean is happy" as far as the child's input is concerned. However, it is (3) that is grammatical.

    (1) The boy who is called Sean is happy.
    (2) *Is the boy who called Sean is happy.
    (3) Is the boy who is called Sean happy.

  2. This post certainly was educational. I learned that instinctively swimming eagles are cute. My guess would have been they'd be wet. But the fascinating ornithological fact was the only novel insight for me. Predictably, non-Chomskyan modelling, this time in the incarnation of a dated Reali & Christiansen paper, has to be wrong. Predictably, virtually nothing from that paper was cited, let alone anything of the work the Christiansen lab has done after publishing that paper.

    Always the optimist, I had a look at the Berwick et al. paper Norbert kindly linked. Like Norbert, I use shortcuts when deciding whether a paper is worth reading. Mine is looking at the reference section to make sure the authors have covered RECENT literature. I gotta say I was very disappointed: nothing more recent than Reali & Christiansen, 2005 from that group. The name MacWhinney did not even appear, nor did ANY paper published in the important 2010 special issue of Journal of Child Language that was dedicated entirely to modelling and had several papers on multiple cue integration. Last but not least, there was a reference to the work of Jeff Elman:
    Elman, J. (2003). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99. The title sounded familiar, the year did not - so I looked it up: indeed, Elman published this paper, but it was in 1993, not 2003.

    Yes, there were also a few references to recent work by Alex Clark [and co-authors] and the 2011 Perfors et al. paper. But it remains a mystery how Berwick et al. can draw such far-reaching conclusions about the value of such work when 90% of the work in that field is simply ignored.

    And then there are the bees. Assuming for a moment that ALL non-Chomsky-style models need to be rejected because they cannot account for the non-ambiguity of flying eagles that also swim or eat or ... just what makes anyone think that bee communication is relevant to figuring out "Which expression-generating procedures do children acquire; and how do children acquire these procedures?" Is it hoped that the bee model outlined in Gallistel [2007] will succeed where Reali & Christiansen [2005] failed and give an account for the examples Norbert calls cute?

    1. I perfectly agree that, if there is more recent work, it ought to be taken into account.
      To make it easier for "outsiders" like myself, could you point to the more recent work by Christiansen and / or Reali (or others) that tackles anything like Norbert's (well, Chomsky's) (2)?

    2. @Benjamin: your question suggests that I did not make my point clear [maybe because I made it several times before]: people like Christiansen moved away from 'one size fits all' approaches and work on different types of models that can address different specific tasks/phenomena. Informed by how kids begin to learn, these researchers employ multiple cue integration models [here are a couple of references, but you really should e-mail Morten about this work if it interests you. As long as he is not called an irrational dogmatist or made fun of on public blogs he is massively helpful]

      Monaghan, P., Chater, N., & Christiansen, M. (2005). The differential contribution of phonological and distributional cues in grammatical categorization. Cognition, 96, 143-182.
      Monaghan, P. & Christiansen, M. (2008). Integration of multiple probabilistic cues in syntax acquisition. In H. Behrens (Ed.), Trends in corpus research: Finding structure in data. (pp. 139-163). Amsterdam: John Benjamins.

      In a previous discussion I cited Chater et al. [2011] saying that there are no fundamentalist Bayesians. I have no idea if Elman thought back in 1993 that just one kind of model could account for everything in language. But his recent work suggests he doesn't believe this now. Other recent work has shown that models that work for one task [word segmentation] in one language [English] fail miserably for another [Sesotho]; cf. Blanchard, D., Heinz, J., & Golinkoff, R. (2010). Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 37, 487-511.

      From the minimalist perspective such a finding might spell disaster. But if you do not assume that all humans share an innate mechanism that has to churn out every possible human language and, instead, focus on input, you can assume that different cues contribute to a different degree in the acquisition of different languages. This is the direction in which modelling has moved recently.

      Now your question [whether Reali & Christiansen or others have a working model for Norbert's (2)] indicates that you wish to apply the minimalist criteria [one [kind of] mechanism has to account for everything] to the work of researchers who have explicitly stated they reject this framework. Berwick et al. are right that models relying on just one mechanism to 'extract everything' from the input are likely to fail. But no one is claiming this. So before they can draw the grand conclusion they do, they need to address the work that is done besides the few models they discuss and find wanting.

      If you want to know whether anyone works on models to address Norbert's [2] you have to consult the literature. Assume someone is and has a good model. I predict Chomsky will come up with another 'cute' example for which no account exists. He has always been great at that. But before you get too excited about this possibility you may want to remember that there were 2 things that got Chomsky established as a game changer in linguistics: his evisceration of Skinner but maybe even more his own innovative, nay trail-blazing, account. Sadly, his work regarding the latter no longer lives up to what he is undoubtedly capable of, and other linguists easily find counterexamples to his proposals [for one tiny list look at: ].

      I think many people who were initially excited by Chomsky's contribution have become tired of the ongoing hostility he [and some of his comrades] shows towards anyone who has a different view. And they are tired of waiting for his POSITIVE account of the things he blasts others for not accounting for. So maybe instead of joining those who have excellent evisceration skills [like Norbert or David P.] you may want to consider working on that other part of the Chomskyan tradition: show how the mechanism for Norbert's (2) is implemented in a human brain...

    3. @Christina. The question was simply whether anyone has an account of the interpretative properties of (2) which does not involve some kind of innate preference for structure-dependent principles in the domain of language. If you can't point to such an account, then what is there for Berwick et al. to address?

    4. @Christina. I am not sure that I understand why you are so ironic about Chomsky and his examples. You seem to be saying that there is a group of researchers [Reali & Christiansen and others] who claim that every individual problem potentially has a different kind of solution. They cannot start tackling the kinds of examples Chomsky gives, because every time they do so, that horrible guy comes up with a new example for which they have to find a new solution.

      (And furthermore, you find Chomsky an unpleasant person and you do not like his behaviour; I am not sure what role that plays in your argumentation, but you put it forward with some force.)

      There is an obvious methodological problem with the kind of account you are describing, it seems to me. It is unfalsifiability in a very crude form. No claim is made, because everything imaginable can potentially be accounted for if we have enough time. It has thereby made itself immune to criticism, while at the same time being able to criticize the other party, for instance for not giving an account for (2). [I am not a syntactician, but to me it seems obvious that (2) can simply be accounted for by assuming that the products of Merge have no linear order; I am not sure why an account is only successful if we show how Merge is implemented in the brain. That seems a little bit too much to ask of the current state of brain science, to be honest.]

      I think that Norbert has described the other strategy (Chomsky's) in the discussion quite nicely: you show some crucial and fairly simple examples which you can account for and the other party cannot. Then, if the other party has added yet another instrument to their toolbox so that now they can deal with it, and remain unconvinced of anything like the Occam's Razor argument, he comes up with a new one. Maybe he is a very unpleasant person, but it seems to me the right way to go about questions like this.

    5. @Marc: in somewhat reverse order. You are right that, in general, it should not matter whether someone is an unpleasant person. But in Chomsky's case it is different because part of what makes him so unpleasant is that he distorts what his opponents say to make them look like idiots, and that he does so in spite of knowing better: in 2009 I contacted him about a distortion of Elman's work in Cartesian Linguistics and sent him the relevant papers that clearly show that what he said was wrong. He said at one point in the correspondence that 'the misunderstandings had been cleared up'. So one would expect ideally an apology to Elman, but at least that the distortions stop. Yet, in The Science of Language exactly the same distortion of Elman's work is published [as it is in Of Minds and Language]. This distortion is used to support claims like the entire field of computational modelling makes no contribution to science, proponents are irrational dogmatists, etc. This kind of intentional distortion is not merely unpleasant but violates very basic principles of scientific debating. It puzzles me that it is tolerated in linguistics.

      I am not ironic about Chomsky's examples at all. I have said they are GREAT. The problem is not their quality but that their main purpose seems to be to refute others. Why is Chomsky not spending the same amount of mental energy on improving his own proposals? Why is he not responding to the LINGUISTIC criticism of Postal that shows that his own proposals cannot account for the phenomena he claims to have accounted for?

      Further, I do not think Christiansen and his lab work on one different solution for every example Chomsky comes up with. They are trying to figure out what kids actually do when they learn language. And before kids get to examples like Norbert's (2), they have already learned a lot about language. So they never have to come up, in isolation from any other knowledge, with a strategy for understanding Norbert's (2) or any other example Chomsky may come up with. When Chomsky is asked for results he reminds us that linguistics is still a very young field. If this is true, why are the other guys expected to have all the answers already? Maybe you ought to give them some time too? And not forget that they do NOT claim there is NO internal structure in their models [this might also answer Alex's comment].

      People are not unconvinced by Occam's razor arguments but by the failure of Chomsky to make some progress on the main thing he promised to account for decades ago [according to SoL, back in the 1950s]: the BIOLOGY of language. Never mind accounting for examples like Norbert's (2). But if language is part of our biology, and at least part of language is not shared by any other cognitive domain and does not occur anywhere else in the animal kingdom, why do we still not have the foggiest idea about the uniquely linguistic biological 'structure'? If the Everett rebuttal is right [that Merge can be missing from the tool-kits of some languages], then Merge cannot be the part of our biology that allows for language - or we would all have to have that one tool.

      Given that you seem concerned about the falsifiability of the opposing account: just what would falsify Chomsky's account? Re-read the debate Norbert initiated a while ago: if non-falsifiability is not a flaw for the Chomskyan account, why would it be one for the accounts of others?

      To repeat what I said earlier: I do not think that anyone has all the answers yet. The main reason I continue reading the blog [in spite of the VERY unpleasant treatment I have gotten here] is that I hope for POSITIVE accounts of what it is that makes MP superior to what the other guys are doing. The Berwick et al. paper is mostly silent on that...

    6. @Christina. It's obviously possible that in the future someone might come up with an alternative explanation for why ‘instinctively’ can associate with ‘swim’ but not ‘fly’ in (2). As far as I know (and I'm happy to be corrected with references), the only plausible explanation currently on offer is that kids have some kind of innate bias towards structure-dependent principles. This is not a matter of one or two tricky examples. Structure dependence is a pervasive property of natural language grammars that is exhibited in a wide range of constructions. An acquisition model that can't explain why children invariably acquire grammars that have this property is just a non-starter.

      The point that often seems to get missed by non-linguists is that the argument based on (1)-(2) is supposed to be a completely trivial example serving to illustrate the logic of poverty of the stimulus arguments. It is a nice accessible example of the kind of argumentation that syntacticians use to figure out what's in UG. Aside from that it is of little interest, since its conclusion is (i) obviously true and (ii) entirely compatible with many empiricist approaches to language acquisition (most of which do not, of course, deny the existence of any innate biases whatsoever).

      In short, there are only two reasons to be bothered by the argument based on (1)-(2). First, if you take it to illustrate some error in the logic of POS arguments in general. Second, if you seriously think that kids could acquire non-structure-dependent rules just as easily as structure-dependent ones given the right environment.

    7. "They are trying to figure out what kids actually do when they learn language. And before kids get to examples like Norbert's (2), they have already learned a lot about language."
      As is everyone (around here) trying to do, at some level or other. The trouble I have with lots of the modeling work on "syntax" is that ultimately, examples such as (2) need to be handled properly. And if anyone proposes a model on "early" acquisition of syntax, it's only fair to ask them how their model will, in principle, account for what we know has to fall out of it at the end.

      "[I]f you [...] focus on input, you can assume that different cues contribute to a different degree in the acquisition of different languages. This is the direction in which modelling has moved recently."

      And "they" aren't the only ones doing that. Look at Constantine's continuation of the Gambell/Yang work, at recent analyses of why some of the current segmentation models might fail on languages other than English like Fourtassi et al. (2013), the Johnson/Frank et al. work done on social cues in identifying word-object mappings / synergies in learning word-object mappings jointly with performing word segmentation, to name a few.

      You hardly have to completely disown the idea that there are _strong_ biases involved in language acquisition in order to "appreciate the input".

      But ultimately, as Alex pointed out, all I was asking for is whether Berwick et al. had ignored any more recent work that was relevant to what they were looking at. So, short answer, no, right?

    8. @Alex: Thank you for the clarification of what is at issue. I asked 2 linguists and got 2 suggestions for alternatives to children having an innate bias towards structure-dependent principles. If I can find such proposals so quickly, I am sure that, as a linguist, you can as well - if you ask the right people [you know where I am getting my linguistic advice].

    9. This comment has been removed by the author.

    10. @Christina. It's been my misfortune that I never seem to know the right people at the right time, so if you could pass on some references to these proposals I'd very much appreciate it.

  3. This comment has been removed by the author.

  4. There is clearly a difference between an argument for the structure dependence of syntax (SDS) and an argument for the innateness of the structure dependence of syntax. Now I completely accept that natural language syntax has hierarchical structure in some sense and example 2 is a perfectly good argument for that -- but I don't see why this is an argument for the *innateness* of SDS.

    Norbert has been adamant that the learner learns from sound/meaning pairs, and if you are learning from sound meaning pairs then you will know that adverbs only modify verbs in the matrix clause because semantically you only ever see adverbs modifying verbs in the matrix clause.
    E.g. if you are learning from sound/meaning pairs then when you hear "tomorrow the boys that live next door are coming home" you know that tomorrow modifies "coming home" not "live next door".

    Now I don't think that Berwick et al. share Norbert's strongly held view (though Paul P may?) -- or at least there is no indication in that paper that they do, and given that all of the papers learn only from surface strings, surely they would have mentioned it if they thought one should learn from meanings as well.
    But even if they don't there is surely some more reasoning that you need to make it a learnability problem.

    Alex D says "Structure dependence is a pervasive property of natural language grammars that is exhibited in a wide range of constructions. An acquisition model that can't explain why children invariably acquire grammars that have this property is just a non-starter." If there is some universal property of language, I don't think that the acquisition model necessarily needs to explain it. This is confusing Plato's problem with Greenberg's problem, or getting UG confused with universal grammar. I mean, one explanation for why SD is pervasive is that it is built into the biases of the learning algorithm, and that is not a bad explanation (probably the right one), but there are other explanations. And one reason why children acquire grammars of this type might just be that the languages they are exposed to are uniformly of this type and they learn that. That is to say, the biases they have are towards a class of grammars that includes structure-dependent and structure-independent syntax and, as in Amy Perfors' work, the learner can figure out which is applicable.

    So for it to be a POS argument, you need at the very least to show that examples like (2) are rare?
    And sentences like "the boys that live next door play noisily/with their dog all the time" seem like they might be frequent.
    That was quite a big part of the arguments about (1). And you should be clear about the inputs: semantics or not? Because if the argument is about interpretation, then that becomes a really big deal.

    1. That all makes lots of sense to me. Yet, I always have a hard time with this idea:

      "And one reason why children acquire grammars of this type might just be that the languages they are exposed to are uniformly of this type and they learn that."

      Fair enough --- but why is it that the input uniformly looks like this? Why is it that "semantically you only ever see adverbs modifying verbs in the matrix clause."

    2. Actually, the PLD is not replete with subject relative clauses, and I will bet that sentences in CHILDES with fronted adverbs and subject relatives are not a dime a dozen. There are certainly very few Y/N questions with subject relatives in CHILDES, at least. My recollection is that on MacWhinney's count there were zero. Now, the WSJ may be another matter, and most kids these days get complimentary gift subscriptions at 2 months from the wizards of finance.

    3. @Alex. I accept of course that we cannot immediately jump to the conclusion that structure dependence is explained by properties of the acquisition mechanism itself. However, the question remains of what exactly an alternative explanation would be. To propose that children invariably acquire structure-dependent grammars because all languages have structure-dependent grammars doesn't get us very far. As Benjamin points out, the question immediately arises of why all languages are like that. The most obvious explanation would be that kids have an innate bias for structure dependent grammars!

      There might conceivably be functional considerations favoring structure dependence, but unless kids are busy calculating what kinds of grammar would be the most communicatively optimal, that still wouldn't explain why they don't mistakenly acquire non-structure-dependent grammars from time to time (assuming that the stimulus is indeed "poor"). I still find it puzzling that (1)-(2) give rise to such controversy. It seems that even within the empiricist camp very few people seriously dispute that kids have some kind of innate bias in favor of structure dependence.

    4. This comment has been removed by the author.

    5. I don't really find saying "X is innate" to be an explanation of X at all; and I think it is uncontroversial that on its own that sort of claim is not explanatory. One would like to have a bit of detail about what precisely is innate and how it causes whatever it is meant to explain. So in this case, it would be nice to have an idea of what the class of "structure dependent grammars" are and how the learning bias would explain the absence of structure independent grammars. This is non-trivial given that the current hypotheses (like say MGs/MCFGs) are also capable of representing the structure independent grammars, or representing languages that can e.g. front adverbs from a relative clause. So as far as I know, the current theories of generative syntax don't actually provide an explanation of this phenomenon. But I see from David Adger's comment below that he thinks that generative syntax is a theory of this sort of innate bias and it would be nice to see what the biases are and how they work.

      Moreover, and I think Norbert would maybe agree here (?), saying "X is innate, and that is why we see X rather than Y" just raises the question of why X is innate rather than Y being innate. So at least the rather thin explanations of structure dependence of syntax that one gets from e.g. Herbert Simon style architectural arguments or Edinburgh style evolutionary arguments do at least bottom out in some reasonable reduction rather than kicking them in to the long grass.

      Just to be clear, I am not entirely convinced by these "communicative efficiency" type arguments, or other functional arguments; my interest is rather the slightly meta question of what would count as good or at least adequate explanation of this phenomenon.

    6. @Alex. To begin with, it's worth noting that POS arguments are deductive arguments which establish their conclusions regardless of explanatory value. So, if a POS argument concludes that a bias for structure-dependence is innate, it does nothing to undermine the argument to show that the conclusion has no explanatory value. We just have to accept the conclusion (that is, if we don't reject the premises or the logic of the argument) and get on with trying to find a deeper explanation.

      This being said, it seems to me that if kids have an innate bias for structure-dependent grammars this would explain why all languages have structure-dependent grammars. After all, you can easily imagine a world in which there's no such innate bias and languages are consequently replete with non-structure-dependent syntactic rules. To point out that these two things are linked seems to have at least some explanatory value. It's far from providing a complete explanation, of course, and we would indeed like to have “an idea of what the class of ‘structure dependent grammars’ is.” That's what generative syntax is all about. So e.g., we try to find deeper explanations for island constraints, which restrict the class of acquirable grammars to ones in which adverbs can't front out of relative clauses.

      The explanations for structure-dependence that you refer to seem broadly compatible with the conclusion that there is an innate bias for structure-dependence. I assume you agree on this point, since you think that “X” probably is innate in this instance, and at the same time (like me) would like a deeper explanation for why this is so.

    7. You are right about the APS -- but you weren't using an APS. You were arguing that the absence of alternative functional explanations for X was evidence for innateness, which is a different argument. The APS is, you are right, logically valid, but one of the premises is very questionable -- namely the one that says that you can't learn that a language has X: structure-dependent syntax in this case. So I don't accept the conclusions.

      I think it is true that some theories of generative syntax do specify a UG which implies e.g. that adverbs can't move out of relative clauses, and that that sort of theory, would explain why this doesn't happen, albeit by stipulation in my view. But it verges towards a 'virtus dormitiva' explanation.
      But those P & P theories have been largely abandoned and I don't think the current MP proposals for UG, such as the one in the Berwick et al paper actually have this consequence, since the classes of grammars they consider seem to include structure independent grammars.

      Could you give me an example of the sort of current theory of UG you think does explain example 2 (or 1)?

    8. Could you elaborate a little bit on the structure-independent grammar bit? What exactly do you mean?
      Yes, CFGs properly include FSG, say, and so do MGs. Is that all you mean?

      But even so, the CFG-encoding of a Bigram-model induces structure (if trivial uniformly right-branching structure) --- don't you still get more than the yield-string? (That's even true for the FSG encoding of a Bigram model, no?)

      If your point is that saying "humans can only acquire MGs, and hence languages exhibit (non-trivial) structure dependence" isn't enough (and not even valid, as trivial structure can also be generated by MGs), I agree.

      Where I don't agree is that positing the bias towards structure-dependent grammars is non-explanatory. Yes, why there is this bias rather than another bias is a further question but unlike Norbert (and you?), I doubt we are in a good state to tackle questions like this. Irrespective of that, though, doesn't the same question arise with respect to the learning-mechanism strategy? That is, in addition to the pressing question why the input to learners uniformly exhibits non-trivial structure dependence (to which the innate-strategy _does_ give an answer), you'd also be faced with the question why it is learning mechanism Y rather than X, no?

      What I am just as interested in as you are (I think) is what this bias could look like in principle (and actually looks like, ultimately) --- can we neatly characterize a family of grammars which actually excludes the trivial-structure ones while accounting for all the non-trivial structure we find, together with (supposedly universal) constraints on these structures?

      Again, I agree that we are not exactly close to having answered this question to satisfaction (I think here I disagree with Norbert?) but it strikes me as the question the answer to which will give real substance to the currently rather vague (but nevertheless explanatory) innate-structure-bias-positing.

    9. Actually this raises an interesting angle on Norbert's comments on the Bayesian learning post(s); can we model the LAD just by defining a hypothesis class H without specifying a learning algorithm or inference mechanism?
      It seems if you don't have an inference mechanism, then in order to explain examples 1 and 2 you need to have a hypothesis class which simply doesn't contain *any* grammars which will give the wrong answers. Otherwise your theory of UG won't explain these facts. Ergo you are forced into having quite a restrictive theory of H. Indeed, depending on how you set things up, it might be impossible to enforce the restriction (for example, you can't define the class of CFGs that only define non-regular languages, because of undecidability, or the class of MCFGs/MGs that only define non-context-free languages).

      So you might want to have at least an evaluation measure or something that will favour one portion of the space rather than the other. But if you have to have that, then you probably don't need a separate theory of the hypothesis class H at all -- which I think is what Norbert uses the term "UG" for.

    10. Quite correct. There are in fact two kinds of current explanations from which the facts of structure dependence can be derived. The more conservative account depends on the fact that subjects are islands. If grammatical commerce across an island is verboten then both the polar question facts and the adverb modification facts follow as the subject relative is cut off from the rest of the clause wrt grammatical dependencies. This, then ultimately lays these data at the door of Ross's island conditions, which are themselves part of FL/UG.

      A more interesting explanation, recently mooted by Chomsky in his 'Problems of Projection', is that phrase markers have no linear order at all in the grammar. Linearization is a product of transfer to the S&M interfaces, and so grammatical operations cannot exploit left/right info, as the objects of grammatical manipulation, i.e. phrase markers, code no such information. This makes the non-structure-dependent predicates unavailable. Thus these grammars are literally not in the hypothesis space. If I understood you correctly, this is what you speculated in the previous post re learning methods. Thus, the outcome is not a function of the possible learning paths but due to the nature of the hypothesis space. This work builds on that of Kayne and his work on the LCA. This is now, in some form, a central feature of most current grammars.

      Hope this helps to answer your question. Yes, structure dependence is not deep, though if true it does explain why grammars use such rules. But, yes, there are more interesting and deeper attempts out there to derive structure dependence in a more natural way.

    11. "Indeed depending on how you set things up it might be impossible to enforce the restriction (for example, you can't define the class of CFGs that only define non regular languages, because of undecidability, or the class of MCFGs/MGs that only define non context-free languages)."

      Thanks, that was a helpful reminder of something that should have been clear to me :-) [though, doesn't that only hold with respect to string languages? not sure where that would go, but it seems easy enough to exclude "trivial" structures; say, no FSG would allow you to even have lexical-insertion using unary rules (which I know isn't part of current theorizing). so couldn't one require certain (motivated, of course; the way I put it all of this really is just stipulating) non-trivialities in the structures that can be generated by the admissible grammars, and try to define the class of admissible grammars through this?]

      "It seems if you don't have an inference mechanism, then in order to explain examples 1 and 2 you need to have a hypothesis class which simply doesn't contain *any* grammars which will give the wrong answers. Otherwise your theory of UG won't explain these facts. Ergo you are forced into having quite a restrictive theory of H."

      That sounds about right. Of course there still is non-trivial typological variation as well --- say, differences in domains of extractions between Italian and English (iirc, but the point really just is that there still is a fair range of possible variation, leaving a non-trivial acquisition problem even for the extremely restrictive theories).
      And with respect to possible patterns that (to the best of our knowledge) are absent from actual human languages, assuming they are excluded from the hypothesis space learners have seems to be at least reasonable. So assuming that adverb-fronting out of relative clauses is universally banned, on such a view there shouldn't be a grammar in the hypothesis space which allows it.

      Are you worried about a slippery slope here that makes the acquisition problem "trivial"?

    12. Benjamin, I posted that before I read your last post, so it probably seemed like a non-sequitur.

      You raise a very good point: "you'd also be faced with the question why it is learning mechanism Y rather than X, no?" This is something I have thought about a lot over the years, and I think that question is a lot easier to answer, because anyone can write down 20 different classes of grammars before breakfast, but computationally efficient learning algorithms for large subclasses of the class of PTIME languages are harder to find. So if you accept that a domain-general efficient learning mechanism is adaptive (I hear a sharp intake of breath from some people already: Naive adaptationism! Spandrels! etc. etc.), then since there aren't very many of those, the set of possible explanations is very constrained. Indeed at the moment the only candidates are distributional learning algorithms. So I accept this is early days yet, but it seems like there is less of a gap here than there is with a very rich stipulated UG. But still, as you point out, a gap.

    13. I'm sure you can guess that I entirely disagree. Indeed, there are very few variants on the basic view of UG out there. The various frameworks are all pretty much notational variants in general. So, there are not dozens of adequate grammars out there. Most code the same basic insights and in pretty much the same ways. That this is so is what has made asking why UG looks like it does so fruitful. There is actually relatively little debate about what the core grammatical dependencies are overall. In fact some version of GB pretty well sums up the kinds of dependencies grammarians have found. I say consensus because Ivan Sag and I were able to agree on a pretty good list despite our filigree disagreements about the exact versions.

    14. So is UG the class of all Minimalist Grammars or some subset?
      And if so what subset?

    15. Well, as Minimalism is a work in progress, we can say that it defines the class of possible grammars. For the question you asked -- whence structure dependence? -- if either of the two proposals is right and these are part of UG, then all grammar rules must be structure dependent in the relevant sense. As I said, the Chomsky story is the potentially more interesting one as it makes non-structure-dependent grammars in principle unavailable; they can't exist because they exploit predicates that the grammar doesn't countenance. If this is right, then rules of syntax dependent on linear properties of the string are impossible. So, if minimalist grammars encode some version of spell out/transfer (which most do at present) then structure dependence is the only possible kind of grammar on minimalist grounds.

      One more point: the idea that phrase markers do not code linear order goes back at least to McCawley. Kayne has repackaged the proposal in a very interesting way, but the conceit is a very old one within Generative Grammar. An interesting sidelight: Chomsky argued against this conception in Aspects, very strongly in fact.

    16. If you aren't going to put forward an alternative, then I will assume that MGs are the only option on the table.
      There are MGs that can generate languages which allow things like:
      Messily children who eat noodles get in trouble.
      where Messily modifies eat, and
      Instinctively eagles that fly swim
      where instinctively modifies fly.
      As well as languages which form questions by reversing the words in the sentence etc.
      (See Bouchard 2012 Biolinguistics paper for details).

      So Minimalism provides no explanation here for examples 1 and 2.

    17. There are none that obey subject islands that can generate:
      [Messily1 [[children who [eat noodles] t1] get in trouble]]
      This dependency is illicit given the subject condition.
      Nor are there grammars that allow you to target the linearly closest verb for modification, as such rules cannot be stated. I don't know how Bouchard generates the required examples but such MGs are not the ones that practitioners generally allow.

    18. The grammars that are counterexamples will appear unnatural. If they were natural then the theory would already have been fixed. But you seem to admit that the counterexamples are admitted by the theory and only excluded by informal practice. In which case the theory does not exclude the incorrect interpretations. So if you are claiming that the theory does, I would like to know which theory. And it would have to be quite precise, or you won't be able to show that for all grammars G allowed by your theory, it is not the case that G will allow that sound/meaning pair.

      Now I write this, I realise I don't even know what such a demonstration would look like. Don't you need a grammar class with some very strong "regularity conditions" so it doesn't have any finite exceptions? What forces [children who eat noodles] to have the particular syntactic type that would block the extraction? What blocks "messily" from having some exceptional feature marking that allows it to escape that restriction? etc etc How can you possibly claim that no grammar in the class will allow that interpretation? It seems like the argument is missing some crucial elements.

      (In case you were making the factual claim that MGs can't do that, I would remind you that MGs are strongly and weakly equivalent to MCFGs (Michaelis, Stabler etc.) which can certainly do this. The example is just about the observed string/meaning pairings. Converted into MG terms, there would probably have to be some additional unmotivated ad hoc features, and the resulting grammar would be larger and would look weird, for sure. But looking strange is not a theoretical well-defined concept.)

    19. I am making a simple point: virtually all MG grammars encode a "shortest move/attract" condition and/or island effects. If some formalizations DON'T, then they are not adequate formalizations. If one does encode such conditions, then the sentences you cite are not derivable. Period. Now, in a grammar that does not code linear relations, "shortest" cannot mean linearly closest. Why? Because there are no linear relations available. In such a grammar "shortest" will necessarily involve some hierarchical notion, hence will be structure dependent. In many minimalist grammars islands are understood in terms of Spell Out, successive spelling out making spelled-out structure unavailable. Islands like the CNPC are understood in such terms. Uriagereka, Lasnik and I have advanced such a proposal, among others. In such a theory there can be no grammatical commerce between elements within islands and those without, due to Spell Out.

      Now, my understanding of Stabler's formalism is that he codes a "shortest" constraint, though in a somewhat unconventional way. At any rate, if he does, then his formalism should result in the same consequences. I don't think he says much about Islands and Spell Out. Tant pis! The bottom line: either the formalizations encode these notions, in which case the derivations you cite are unavailable, or they do not, in which case they are inadequate and the formalists should get back to work if they want to address these issues.

    20. I think my point is very simple too, and let me make it really trivial so that it is easy to grasp, as I think I did a bad job of making it.
      Suppose w = Instinctively eagles that fly swim
      and s1 is the meaning where instinctively modifies swim
      and s2 is the meaning where it modifies fly.
      So English has (w,s1) but not (w,s2).

      So suppose UG consists of the hypothesis class H and it happens that there are NO grammars in H that generate (w,s2). Then sure, UG would explain why we don't see (w,s2) (or at least imply it, as I don't think it explains much, as I was discussing with Benjamin).

      But it is I think very obvious that there will be a grammar in H that generates (w,s2), and that is because the lexicon is not fixed.
      So let w' be the sentence "Eagles that instinctively fly swim."
      So since there is a grammar that generates (w',s2), we can just permute the lexicon to get a grammar English'
      where "Instinctively" is a plural count noun that means "eagles", "Eagles" is a relative pronoun, "that" is an adverb that means "instinctively" etc.
      So English' will generate (w,s2).
      And there is no reason to think that English' is not in H.
      (Since English is, and English' just has a shuffled lexicon).
      So there are grammars in H that generate (w,s2).

      So an extreme example, but I hope completely clear. It explains the more general argument I am making.
      e.g. English'' which is just like English, except it has those additional lexical items from English' for those five words etc etc. all the way up to cases where subject relative clauses have a slightly different feature based analysis so that there are extractions allowed and so on. But without some technical detail about which subclass of MGs you are thinking about I can't generate those examples. But they *certainly* exist, and just asserting that they don't does not make progress.
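Alex's permutation construction can be checked mechanically. A minimal Python sketch (my own toy code; the five-word lexicon and the category labels are illustrative assumptions, not a serious grammar):

```python
# Sketch of the lexicon-permutation argument: if a hypothesis class
# contains a grammar pairing w' with meaning s2, relabeling the lexicon
# yields a grammar (English') in the same class pairing w with s2.

# English-like lexicon: word -> syntactic category (toy labels)
english = {"eagles": "N", "that": "RelPro", "instinctively": "Adv",
           "fly": "V", "swim": "V"}

# w' = "Eagles that instinctively fly swim": the adverb sits inside the
# relative clause, so 'instinctively' modifies 'fly' (meaning s2).
w_prime = ["eagles", "that", "instinctively", "fly", "swim"]

# Permute the sound side of the lexicon (per the thread: "Instinctively"
# becomes a noun meaning eagles, "Eagles" a relative pronoun, "that" an
# adverb); the categories themselves are untouched.
permutation = {"instinctively": "eagles", "eagles": "that",
               "that": "instinctively", "fly": "fly", "swim": "swim"}
english_prime = {word: english[permutation[word]] for word in permutation}

# w = "Instinctively eagles that fly swim" now receives the SAME category
# string in English' as w' did in English, hence the same structure/meaning.
w = ["instinctively", "eagles", "that", "fly", "swim"]
cats_w_prime = [english[word] for word in w_prime]
cats_w = [english_prime[word] for word in w]
print(cats_w == cats_w_prime)  # True: English' pairs w with s2
```

Since English' assigns w exactly the category string, and hence the structure, that English assigns w', it generates (w,s2), which is the crux of the argument.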

    21. Very annoyingly I just lost my post - but here's the gist:

      Thanks for the clear example, Alex. I think it highlights a big difference of assumptions (or maybe I just still haven't understood it!).

      The point of the Chomsky example is to exemplify a pattern that holds generally, and so permuting the sound-meaning pairing of the lexical items is irrelevant to the generalization. The crucial thing is the structure-meaning pairing. Even your English'' doesn't have the initial element of the sentence (pronounced "instinctively" but meaning eagles) modify the verb in the constituent that has the relative clause meaning. That's what isn't in the hypothesis space. The Chomsky examples are about kinds of structural relations correlating with kinds of meaning, so particularities of assignments of phonological structure to meanings in lexical items aren't relevant. As a side point, of course your English'' probably wouldn't be learned anyway, as it would generate a slew of examples with wrong meanings "three instinctively fly" = "three eagles fly" etc, ramifying if we apply the strategy across countless examples ("Quietly the jaguar that crawled through the jungle stalked its prey" etc), so it would be descriptively inadequate.

      On the more general point, I think it's an amazing finding that Joshi's conjecture about the mild context sensitivity of human language looks so right. Presumably this is because of the nature of the system (not allowing certain kinds of cross-serial dependencies etc.) rather than because of some extensional fact about generable stringsets. However, of course FSLs aren't even a good model for human languages, as there are many possible FSLs that are nothing like human languages. So as well as saying that the generative power of our computational systems has to be restricted, we also need to say that the systems have yet further restrictions on them. Whether those restrictions come from general constraints on how we learn things, from parsing or production limitations, memory, functional pressures, etc., or from further constraints on the computational system itself is an open question. Generative syntax explores the idea that some of those further constraints are part of the system. For the case in point, the idea would be that subjects are impenetrable to syntactic dependencies. We'd want this to come out of the theoretical system, and the formalization of that system should capture that. Maybe that hunch is wrong, and the constraint on dependencies into subjects is to do with parsing or whatever, but then what we need to do is to delineate the positions and figure out what is right. So the recent debate between Hofmeister, Sag etc. and Phillips, Sprouse etc. on islands is exactly of this sort and was quite enlightening as to the possible strengths and weaknesses of the positions, I thought.

    22. Thx to David for making the point I was about to try and make, no doubt less well. The Generative Enterprise, as I understand it, has been to find invariant grammatical patterns, e.g. classes of licit and illicit dependencies. The data we use are sound-meaning pairs, as humans cannot, it seems, perceive these patterns simply qua patterns; they need lexical incarnation. The example you give, as David observes, does not challenge the relevant pattern, for permuting the meanings does not end up with an illicit dependency.

      I would like to add one more point. I also found your example useful, for it seems to indicate that we are interested in very different things. I am interested in the class of licit dependencies and how FL can be inferred by studying these. I am not sure what you are interested in. On reading the example it made me think of the following "physics" discussion: Newton's laws do not explain Kepler's data because the explanation relies on specific masses for the planets and the sun. But were the sun the mass of the earth then the planets would not revolve around it elliptically, and so Newton's laws cannot explain the observed pattern. This argument seems very unconvincing to me, both in the physics and the linguistics domains. But it seems to be the argument you are making.

    23. I hesitated before posting that example because a trivial example illustrating a point often makes people think the point is trivial. I picked it because it rests on a technical claim that I could make convincingly in 5 minutes. So here is a more controversial technical claim:
      let English+ be English with additionally the single incorrect pairing (w,s2). English+ can be generated by an MCFG; ergo it can be generated by an MG. English++ is English but additionally with the fronted adverbs out of subject relatives; again generable by an MG. (MG means Stabler's Minimalist grammars with shortest move constraint). So I think these claims are correct, and if not could someone technical chime in and correct me.

      So Norbert is right that the grammars will look strange. Very strange indeed if you actually convert them from an MCFG. But they are allowed by this class of grammars, which in a sense defines the notion of licit grammatical dependencies in the theory. So Norbert wants to say, oh well if my theory makes the wrong predictions then it has been formalized incorrectly, and when it is formalized correctly it will make the right predictions, Period. But while this is certainly a ballsy argument, it's not really playing the game.

      David makes the crucial step, which is the right way to deal with it, which is to raise a learnability argument. Namely, these grammars are so strange that they can't be learned. I agree that that is where the locus of explanation should be -- in the learning process, and that is what I am interested in, as I don't think restrictions on UG can actually be made precise enough to explain this on their own. Without, that is, going the full P & P, finite class of grammars route; and we all know how that panned out.
      So my assumption at the moment is that UG is something like MGs (something a bit beyond Joshi's class as I think one needs a copying operation for e.g. Kayardild case-stacking), but that that doesn't really explain examples 1 or 2, and so we need a learning algorithm to do a lot of the explanatory work. Which is what I work on. So to answer Norbert's question, I am interested in FL, sure; but I don't think that specifying some arbitrary UG is the right way to look at it. I think targeting the LAD itself directly is the better way. But that is a methodological assumption really rather than anything stronger.

    24. Thanks for all the elaborations, Alex, that's really very helpful and interesting.

      So on your view, the learning algorithm you take to be a crucial part of the explanation as to why certain "logically possible" configurations are absent from natural languages (to the best of our knowledge) would fail to acquire these configurations if they were to actually occur in the input, or it would be happy with them but that's no problem because, well, they never occur in the input.

      The first case _seems_ to make identifying the learning algorithm just as hard as you point out it is to identify the correct and "minimal" hypothesis space, i.e. you also seem to need to ensure that it'll fail on "unnatural" input. (I'm emphasizing "seems", I really don't understand enough about the kinds of learning algorithms you're talking about, something I'll hope to change once I have time...)
      The second case leaves unaddressed the (in my opinion) important question why the _input_ looks the way it does.

      Your basic point, of course, stands undisputed (as far as I'm concerned). I actually share your feeling that more emphasis should be given to the kind of work Ed and others are doing on properly formalizing proposals (and I think your illustration is helpful in seeing why that kind of work _is_ important) but given the previous "merits of formalization"-discussion on this blog, I suspect there won't be a consensus in this respect any time soon...

    25. A quick point: this argument on formalization is interestingly the other way round from the previous one. Before, I was arguing for formalization, because without formalization the empirical claims for UG are vacuous (argument here:). And this argument does seem to indicate that it is *possible* that some of the supposed universal limitations on structure might be artifacts of the mode of analysis (i.e. because you are looking for grammars that have P, you construct grammars that have P, even though you could well have picked a grammar that is not P, so the fact that you have grammars for 100 languages that are all P doesn't provide any evidence that P is a universal property of the I-language).

      But now Norbert needs a formalized theory, because he needs to quantify over all unobserved grammars (and structures) to show that no grammar in his class generates English+ or English++. So I am not expecting any radical changes of heart, but here formalisation would be in the service of proving Norbert right, not proving him wrong, so that might make it more palatable....

  5. I think an interesting case might be attraction errors in agreement: that is, where the verb agrees with the immediately adjacent noun rather than the structural subject. We know from the work of Bock and others that these are common, so we get things like

    1. The key for the cupboards are lost

    These errors appear in corpora, in production, and people often don't notice them in online perception. Now we also know that agreement is quite fragile in language change (e.g. sociolinguistic work by Ferguson and others). If the acquisition mechanism has no bias towards structure rather than linearity, one might expect that some SVO languages would have moved to a purely adjacency-based agreement system, where the verb agrees with the immediately preceding noun rather than the subject. A language acquirer will have a ton of evidence consistent with this (since linearly preceding and structural subject will both be analyses of NP V input). But this has never happened. In fact, I know of no language where agreement depends on the linearly adjacent noun, and children, to my knowledge, have never been reported to generalise their agreement rules in this way, although they generalize inflectional systems in other ways. It's hard to see how to explain this general fact about language in the absence of a bias towards structural rather than linear analyses. There's also the interesting work of Culbertson with Smolensky on how learners of artificial languages generalise statistically variable patterns in the input towards typologically common patterns, which again suggests an innate bias towards making particular kinds of generalisations where the evidence in the input is carefully designed to be equivocal, or biased in the other direction. So we would like some theory of what those innate biases are, and generative syntax provides one such theory (of course there are also competing accounts in functionalism that appeal to innate non-syntactic biases, although personally I think that these can never tell more than part of the story).

    1. There is an interesting paper on agreement attraction by Wagers, Lau and Phillips. They show that though very robust, it is unlikely to be a grammatical effect. The cutest part of their argument is their explanation of an intriguing asymmetry in which the possibility of attraction improves unacceptable sentences but does not degrade acceptable ones. It's worth a look if you are interested in these topics.

  6. This is in part a reply to Alex D.'s question and in part a response to several of the comments here:

    re alternative accounts; as Alex C. points out: there is "a difference between an argument for the [1] structure dependence of syntax (SDS) and an argument for the [2] innateness of the structure dependence of syntax". You can have a view about language like, say, Paul Postal's [one of the linguists I asked] on which accepting [1] does not commit you to accept [2]. Now given that Postal considers himself a linguist he works on language, not on language acquisition [which he considers to be a job for psychologists]. So he has no account of how children acquire language. But I think his attitude [let those who are actually experts on developmental psychology work on acquisition] is commendable. "Only" working on syntax seems quite a monumental task for one person to handle. This does not mean linguists should not INFORM psychologists about the results of their work. But there is no reason for linguists to do work for which they are not trained [like speculation about biology, neurophysiology etc. etc.]

    If Norbert could overcome his allergy to names like Geoffrey Sampson and would read his work he'd know that there are corpora besides WSJ that contain Y/N questions with subject relatives. And if he'd read the massive literature on acquisition he'd also know about proposals about how kids acquire the skill that do not require an innate LAD.

    Until we have actually independent evidence from brain research [e.g. some structure in the brain that generates the I-language] all the arguments for innate language specific constraints remain circular: You assume that there ARE such constraints and hence if you find structure dependence in the language you assume it is because of the innate constraints. But, Alex C. is quite right, to break the circle you need additional evidence for innateness.

    David A. attempts to use the persuasive, but misleading, argument from [perceived] uniformity of languages to draw the desired conclusion: there is uniformity in languages, there is [genetic] uniformity among humans - hence the uniformity of languages is explained by uniformity of innate biases shared by all humans. And he gives some evidence from a few languages [I have no idea how many he surveyed] and reveals: "I know of no language where agreement depends on the linearly adjacent noun". It continues to amaze me how Chomskyans can draw these grand conclusions based on the knowledge of a few languages [which, if one trusts Postal, is massively incomplete even for English at the moment]. May I remind you that seemingly in 2002 Hauser et al. knew of no language that lacked recursion and declared recursion a language universal, possibly the only component of FLN. Just three years later the same authors claimed that there might be languages that lack recursion [a claim repeated several times by Chomsky in print]. So how do you know that 10 years from now someone will not discover the kind of language you do not know of? [Ironically your argument would be stronger WITHOUT the 'no language' claim - English shows structure dependence, so psychologists need to account for how kids learn it regardless of what other languages may or may not have]

    Let me close by reminding you of something Chomsky has said recently. Far from what has been suggested about me, I do not think that everything in Science of Language [or even worse what Chomsky says] is 'crap'. Some of his remarks are certainly worth listening to:

    “... the fact that [something is] the first thing that comes to mind doesn’t make it true.... It is not necessarily wrong, but most first guesses are. Take a look at the history of the advanced sciences. No matter how well established they are, they almost always turned out to be wrong” (Chomsky, 2012, 38).

    1. Wow, you really hate these 'Chomskyans', don't you! Each one of your comments contains more information about the horrible things they do.

      To me it seems that drawing 'grand conclusions' is not a bad thing to do. It makes it relatively easy to identify where something is wrong with the theory, because somebody can come up with an example why the grand conclusion is not justified.

      Maybe I should also become a 'Chomskyan'! It would of course be a pity that you would start putting your scorn on me as well, but otherwise it sounds like it might be fun. Do you happen to know what I should do to join the club?

    2. @Marc: She is not a practising linguist, not a psycholinguist, and not an acquisitionist. The best way to respect her statement (1) is to disregard what she says otherwise completely. If I understand the argument properly, then she is indirectly claiming that she is in the wrong to discuss the stuff that she has been discussing on this blog and in other reviews. No one need argue against this suggestion.

      (1) "But there is no reason for linguists to do work for which they are not trained [like speculation about biology, neurophysiology etc. etc.]" (Behme 2013)

    3. @Marc: since when is criticizing the views of a person the same as hating that person? Is it not part of scientific discourse to criticize each other's views and hope that in the process one arrives at a better understanding? Do you believe my criticism expresses hatred because it is factually wrong? Then I would like specific examples of what is wrong and why I am not merely mistaken but actually hate "Chomskyans".

      May I also ask: if you believe my comments indicate that I hate "Chomskyans" [as opposed to the people whose specific views/comments/proposals I have actually criticized], maybe you can enlighten me as to what we ought to call Chomsky's attitude towards named individuals, unnamed individuals and entire fields expressed in so many of his works? In case you have not already, look at a few examples I list here: - Does Chomsky hate Dawkins, Everett, Dummett, Lassiter, Papineau, Elman and legions of unnamed opponents?

      Based on this PUBLISHED article: does Chomsky hate Margaret Boden? Does Norbert hate me based on the comments he made about one of my publications:
      Do Legate et al. hate Stephen Levinson because they wrote the paper I comment on here: - [which Norbert gleefully called an 'evisceration' - a term no one [including you] objected to]? Do the people I mention in the beginning of this review all hate Dan Everett?

      I do not know you and I certainly do not hate you. But I find it quite sad that you seem to have such a biased view and seem to base your evaluation of a person's motives on who is criticized, not on what is actually said. You obviously find my criticisms objectionable but have shown no similar sensibilities to Norbert calling my review 'such junk' here:

      You never objected to Norbert's 'evaluation' of a paper he did not like: "This paper is a must read. Until I got through it I thought that the art of the academic lampoon was dead. FBC have proved me wrong. There are levels of silliness, stupidity and obtuseness left to plumb. [...]" So let me ask you: why are you so biased against me?

      @karthik durvasula: Even though I do not know what an acquisitionist is, you are right about the first two. However, I am a philosopher and [as Norbert can confirm] our job is to ask questions - which I have done here for the most part. You can of course feel entirely free to ignore what I say [including references to work by people who are experts in fields I am not].

    4. "Hater" is clearly a correct description. When I google you, all I find, besides the Faculty of Language and LingBuzz postings, is the following kind of thing (and there is lots more, this is just a sample) -

      Do you do anything else at all but rubbish Chomsky and advertise your papers (which also rubbish Chomsky)? Do you write anything constructive?

    5. You googled me - how flattering. Instead of saying that my papers are 'rubbishing Chomsky' why don't you tell me specifically where I went wrong? I am certainly willing to correct mistakes.

      And, please, tell me: is Chomsky a hater of Elman? I am sure you agree with me that Chomsky is smart enough to KNOW that his description of Elman's work is entirely incorrect. So why does he continue the mischaracterization? It can hardly be called 'constructive'.

      Is Chomsky a hater of Everett? Based on what evidence did he call him a charlatan in a Brazilian newspaper? Everett may be wrong about Piraha, but does this justify calling him a charlatan in print? Is it constructive?

      Given that you seemingly accept Chomsky's relentless rhetoric against those whose only crime is to disagree with him, it is rather odd that you would call me a hater. But given that English is not my native language, maybe 'hater' does not mean what I think it does? If by hater you mean someone who criticizes that which is wrong then I am guilty as charged and, inter alia, Chomsky would not be a hater.

      Now, maybe you would like to show me how to do something constructive. One week ago I asked a question regarding a claim Robert Berwick made. So far none of the excellent linguists who have contributed thoughts since has answered what seemed to me like a very simple question by Paul Postal. Maybe you would be so good as to answer his question?

      I asked Paul Postal about the claim that "nothing about linear order matters (only hierarchical structure, what Noam called in Geneva the ‘Basic Property,’ matters)" and copy below part of what he replied. So maybe Robert can answer the question at the end?

      There is no doubt that in some cases A B is good and B A bad. Compare:

      (1) That remark was sufficiently stupid to...
      (2) *That remark was stupid sufficiently to...
      (3) *That remark was enough stupid to...
      (4) That remark was stupid enough to...

      Evidently, there is something which differentiates the adverbs 'enough' and 'sufficiently' and which shows up as contrasting word order requirements with respect to the adjective they modify. In what sense does the word order not matter, only hierarchy?

  7. Hi Christina. Re your question: it's an issue of analysis. Kayne has a proposal for capturing these kinds of order differences in English vs French (where assez comes before rather than after the adjective) which he ties down to a hierarchical property that secondarily entails a linear order. The idea is that there is a feature associated with the degree word (enough vs assez) differentially causing movement of the adjective to a position where it is hierarchically superior. That then causes it to be pronounced first. The absence of this feature means that the movement doesn't take place. He then uses this basic principle to capture quite a wide range of differences, and fairly subtle effects in order, both within English and between French and English, connected to this kind of phenomenon. It's worked out over a number of papers, but the analysis is sketched in his Parameters and Principles of Pronunciation on Lingbuzz. So Paul's counterexample really depends on whether the right analysis of this pattern is one that appeals to linear order or to hierarchy; that kind of example doesn't really help with the issue. That's why I tried the attraction error argument. You may be right that there's an SV language that conditions agreement linearly, but in my 25 or so years of obsessively reading syntax papers and grammars I've never come across any research reporting one, and the acquisition literature doesn't report that this is the kind of overgeneralisation that kids make, which is, I think, at least food for thought.

    1. David, thank you for your comment. Paul's comment was not meant as a counterexample [it ended with a question, not a claim]. And I have no doubt that within the minimalist framework your analysis is correct. But in other frameworks different analyses are possible. I am a bit busy at the moment but will post an alternative to Norbert's (2) some time over the weekend.

      Regarding the possibility of an SV language that conditions agreement linearly: again, my claim was not that there is such a language, or that I think one might be discovered any moment now. I just think that in light of the history of GG it is preferable to stay away from categorical claims that can be falsified by a single counterexample. Why take that risk, when for your purpose it is quite sufficient to say that English is a language that does not condition agreement linearly in all cases? Kids learn English, so we have to figure out how they do that. [The same concern was expressed regarding Robert's claim that word order does not matter, ONLY hierarchy does. Had he said word order alone cannot explain certain phenomena, no one would object - certainly FCB did not object to that, even though Norbert seems to imply they did]

      Maybe some amusement for the weekend: across the pond newspapers make silly claims about machine translation as well:

  8. As promised, below is the alternative to Norbert's analysis of the sentence
    (2) Instinctively eagles that fly swim.

    One of my linguist-friends proposes a scenario that involves no reference to phrase structure at all, using a term-labelled category logic built on top of the Lambek calculus, which infers strings from strings with no reference to any prior proof steps. This is intended simply to show that there is no necessity of recourse to structure. My friend also gave me the outline of the formal proof, but sadly this does not fit into this format, and it seems that unless one is Norbert one cannot add attachments here. But I am confident someone like Alex Clark can explain that part if people are interested [or answer other questions that may arise; I am the messenger, NOT the expert].


    Let's assume that 'instinctively' has an event (type)-variable argument. It identifies certain kinds of events as arising from wired-in neural programs. 'Instinctively, eagles that fly swim' then corresponds to a paraphrase: 'the swimming behavior of the subset of eagles that fly reflects a neurologically built-in, not a learned, capability'.

    'Eagles that fly', however, does not denote an event (type), but rather a subset of the set 'eagle'. So there is a type mismatch between 'instinctively' and 'eagles that fly'. If we build up meanings compositionally, then by the time we get to the point where 'instinctively' has something to apply to as a functor category, all we have is an event-class, in which a subset of eagles habitually carry out a particular activity. No configurational structure necessary.

    This scenario can be modeled as a proof in a term-labelled Lambek system with event semantics. The output string will be provable *just* in case the substring corresponding to the event-type 'eagles-that-fly swim' is available to the predicate whose argument type is a set of events. Which means that the intersection of the set eagle and the set fly has already been constructed and incorporated into the predication as an argument, of whatever type you use to model kinds. To get the string, that is, you have to have intersected the two sets to get the denotation of 'eagles that fly'. There is no way that 'instinctively' or whatever can get at the event variable of a predicate corresponding to one of the intersected sets; it is simply not available as a proof term at the point when 'instinctively' takes its argument, because you had to form the set intersection corresponding to the kind-object argument of 'eagles that fly'. To get the sentence in question, you have to get a predicate on event types taking as its argument the category whose denotation is a swim-event type, or you don't get the string. No reference to internal structure is necessary.

    Again: the final step in the proof supplies 'instinctively' with the argument λe swim(e, eagle ∩ fly). The event variable associated with 'fly' is inaccessible within the intersection that yields the subset of eagles such that each eagle flies.
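    To make the type mismatch concrete, here is a toy Python sketch of my own devising (the class names and the `instinctively` function are illustrative inventions, not the category logic itself): once 'eagles that fly' has been composed into a set-denoting argument, the event variable contributed by 'fly' is no longer visible to a functor over event types.

```python
# Toy illustration (NOT the Lambek proof itself): after intersection,
# only a set-typed object survives; the RC-internal event variable is gone.
from dataclasses import dataclass

@dataclass(frozen=True)
class EventType:          # type of event descriptions
    name: str

@dataclass(frozen=True)
class EntitySet:          # type of (plural/kind) entity denotations
    name: str

def intersect(a: EntitySet, b: EntitySet) -> EntitySet:
    # 'eagles that fly': intersecting two sets yields a set; the event
    # variable associated with 'fly' is consumed here and does not survive.
    return EntitySet(f"({a.name} ∩ {b.name})")

def instinctively(arg):
    # 'instinctively' is a functor over event types only
    if not isinstance(arg, EventType):
        raise TypeError(f"type mismatch: expected an event type, got {arg}")
    return EventType(f"instinctive({arg.name})")

eagles_that_fly = intersect(EntitySet("eagle"), EntitySet("fly"))
swim_event = EventType(f"swim({eagles_that_fly.name})")

print(instinctively(swim_event).name)   # matrix reading: composes fine
try:
    instinctively(eagles_that_fly)      # RC-internal 'fly': unreachable
except TypeError as e:
    print(e)
```

    The only reading that composes is the one where 'instinctively' applies to the matrix swim-event; applying it to the set-typed 'eagles that fly' fails on type grounds alone.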

  9. Maybe I am missing something, but I am not sure I would interpret this as an alternative to the structural explanation. Compositionality is not a purely semantic notion, but refers to the syntax-semantics mapping. So although you can formulate the interpretation of (2) (and the absent interpretation of (2)) in semantic terms, that does not mean that there is no implied reference to structure. The event variable of 'fly' cannot be picked up by 'instinctively' because 'fly' is interpreted as modifying 'eagles' before 'instinctively' is interpreted. The crucial word here is 'before'. That implies hierarchy, and the semantic type mismatch is a consequence of it.

  10. @Olaf. Thanks for your interest. I paste below the reply of my friend and hope this answers your question. I also recommend that you ask Norbert [or Robert Berwick] for an equally detailed analysis of the internal structure of the example on the minimalist view, and for suggestions as to how this structure is implemented in human brains [in other words, ask them to put their cards on the table as well].

    This comment reflects a serious misunderstanding of the structure of logical proof, unfortunately a common misconception among people who do not actually work in the proof theory of formal logic. The 'order' of the proof is irrelevant to the fact that the conclusion is a theorem of the assumptions. The fact that p --> q and p entail q \/ r is simply a consequence of the way in which the proof steps verify the correctness of the sequent p --> q, p |- q \/ r. 'Before' and 'after' refer to the way one verifies theoremhood. There are typically many ways to prove the same theorem in logic, all of them equally valid, though one can reify proofs as structured objects and define normal forms and construct metatheorems, as in the work of Gabbay, Girard, and the study of proof nets; but that has nothing to do with the status of the theorems of first order logic themselves. And a proof in category logic, such as the one that is informally summarized in the previous note, works exactly like a natural deduction proof in propositional logic; \ and / are implication operators, and just as in standard proofs of theorems in (non)classical logics, the order of inference is simply a demonstration that the rules of the logic, which have themselves no order with respect to each other, guarantee and verify the entailment corresponding to the theorem. The writer seems to believe that if I have

    p --> q |- p --> q, p |- p
    -------------------------------- imp elim
    p --> q, p |- q
    -------------------------------- \/ intro
    p --> q, p |- q \/ r
    -------------------------------- imp intro
    p --> q |- p --> (q \/ r)

    then the theorem p --> q |- p --> (q \/ r) depends on implication elimination applying 'before' implication introduction, or on the inference p --> q, p |- q 'existing before' p --> q |- p --> (q \/ r). He needs to understand that that belief is incoherent: a logical proof applies inference rules in some order in any particular proof, but the order of application of the rules has nothing to do with the theoremhood of the entailment relation between p and q when p |- q.

    What is established in the category-logical proof sketched is that the logical structure of 'eagles that fly swim' is not an object that can be an argument of the functor 'instinctively'. It is the wrong type. And the model-theoretic proof corresponding to the logical proof will show the same thing in terms of the algebraic interpretation of the proof terms. And since it is the identity of the category types of the lexical items involved which counts as the axioms in the proof, and since the outcome determines that you wind up with a category S proof term just in case the resulting string has the interpretation it has, the structure of the string depends purely on the category types of its words and the corresponding interpretations in a familiar model theory. Again, no structural assumptions are involved.
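    The point that theoremhood is independent of the route any particular proof takes can be checked mechanically; here is a minimal Lean 4 sketch of my own (not part of the original exchange), giving two differently ordered proofs of the same theorem:

```lean
-- Two proofs of (p → q) ⊢ p → (q ∨ r): the rule applications are
-- sequenced differently, but the theorem established is the same.
theorem viaTerm (p q r : Prop) (h : p → q) : p → q ∨ r :=
  fun hp => Or.inl (h hp)

theorem viaTactics (p q r : Prop) (h : p → q) : p → q ∨ r := by
  intro hp   -- implication introduction
  left       -- disjunction introduction
  exact h hp -- implication elimination
```

    Both proofs certify the same sequent; neither ordering of the inference steps is privileged.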

  11. Thanks Christina. I think that the proposal sketched slightly misses the point of the example, and within minimalism you'd say something more or less identical about the derivation of the right interpretation. The crucial point is to rule out a derivation, proof or representation which gives you the wrong meaning. The proposal sketched makes the assumption that the position of instinctively in the string is a consequence of its lexical properties (that it is a predicate or modifier of event types), and that seems very reasonable. But we also need to allow the system to relax that requirement to deal with examples like

    1. 'how instinctively do scientists think that eagles (that fly) swim'.

    Here we need 'how instinctively' to still be a modifier of events, and to get the meaning where instinctively modifies swim, which is one of the meanings available, it needs to modify the swimming events here. To capture effects like this in a categorial system we need a technology that allows a long distance dependency (e.g. modal operators in the proof system, functional composition etc.). So the crucial question is why we can't use whatever technology allows 1 to derive the Chomsky example with the bad meaning.

    It is possible to do syntax that doesn't generate constituent structures at all, in a categorial system or in a dependency grammar system. These systems though, just like structure based systems, will have to have some constraint in them which makes the bad meaning impossible for that string and my guess is that all systems are pretty similar and in agreement about this. Structure dependence is just a name for this phenomenon: that syntactic rules need to go beyond linear contiguity in the establishment of dependency (so the 'structure' can be constituent structure, or rich lexical structure, or dependency structure etc). The point of Chomsky's example, as Norbert stressed, is that linear contiguity via n-grams really won't work: you need something richer than that. I'm pretty sure most categorial grammarians would agree with that for this example, but do check with your consultant on this as I'd be interested in hearing what she/he has to say about that.

  12. This comment has been removed by the author.

  13. Maybe a clearer example that also satisfies Norbert's criteria for the cutest possible demonstration of structure dependence --

    "This example has several pleasant features. First, there is no string linear difference to piggyback on as there was in Chomsky’s Y/N question example in (1). There is only one string under discussion, albeit with only one of two “possible” interpretations. Moreover, the fact that instinctively can only modify the matrix predicate has nothing to do with delivering a true or even sensible interpretation. In fact, what’s clear is that the is fronting facts above and the adverb modification facts are exactly the same. Just as there is no possible Aux movement from the relative clause there is no possible modification of the predicate within the relative clause by a sentence initial adverb. Thus, whatever is going on in the classical examples has nothing to do with differences in their string properties [..]"

    -- might be an example that involves displacement, like the original Y/N question, while still having the other pleasant properties that Norbert lists. This way we are not discussing combinatory properties of lexical items but properties derived by whatever rule you posit in your favorite theory of grammar that licenses filler-gap structures, as in Chomsky's original example. These are not hard to construct either. Consider, for example, the fact that English "behave" without an overt quality-adverb means "behave well", and "speak Swedish" without an overt quality-adverb means "speak Swedish competently", and now consider:

    "Our study asked ...

    a. ... how well kids who behave at home behave at school."
    b. ... how well people who speak Swedish speak Danish."

    You could imagine two possible interpretations of such sentences:

    a'. "the correlation between behaving well at home and quality of behavior at school" OR "the correlation between quality of behavior at home and behaving well at school"

    b'. "the correlation between speaking Swedish well/natively and quality of speaking Danish" OR "the correlation between quality of speaking Swedish and speaking Danish well."

    -- but only the first reading is available. Every theory will say something about the kinds of things that wh-phrases are allowed to combine with, and (unlike with "instinctively") every theory will have to permit the ability to combine a wh-phrase with X (in languages like English) to have something to do with a gap inside X. Every theory will have to allow this gap to be some distance away, since (c) and (d) allow an interpretation of "well" with the rightmost verb as well as with a nearer verb:

    c. How well do you think Mary will argue that the kids behaved on their vacation.
    d. How well do you think John will argue that the Swedes speak Danish.

    -- but there is no reading that places the gap in the relative clause in a & b. (This is a trick you cannot perform with "instinctively" in Chomsky's examples.) Yet there is no string difference between the two readings of (a) and (b), they are acceptable sentences, and both the possible and impossible readings are sensible and plausible. Furthermore, (c) and (d) make it clear that it's not always the case that you have to associate "how well" with the rightmost verb - only when structural factors prevent an association with the closer verb (here, embedding in a relative clause). Maybe these examples might be more helpful starting points for discussion in the present context?

  14. [David Adger's comment was posted while I was busy writing mine, and it looks like our thoughts were very similar.]

  15. I am grateful to the two Davids for commenting. Since individual replies might get confusing, and since, as David P. pointed out, the comments are very similar in their essence, a reply to both:

    I am especially grateful that you defended me so effectively against the charge by Marc van Oostendorp that I am a Chomsky[an] hater [and I assure Marc the replies were not ‘staged’] He wrote: “I am not sure that I understand why you are so ironic about Chomsky and his examples. You seem to be saying that there is a group of researchers [Reali & Christiansen and others] who claim that … they cannot start tackling the kinds of examples Chomsky gives, because every time they do so, that horrible guy comes up with a new example for which they have to find a new solution”.

    Now you gave a wonderful demonstration of the point Marc was skeptical about: Norbert had provided a specific example that allegedly could not be handled by a non-structural analysis; my friend provided one, and voilà, you come up with a DIFFERENT example. So you are doing exactly what I said Chomskyans would do [based not on nastiness but on 10 years of research on Chomskyan argumentation].

    My friend tells me it is easy to handle the additional examples using any category logic with a nondirectional, intuitionistic implication operator --o, such as Muskens' Lambda Grammar, de Groote's Abstract CG, Pollard's Linear Categorial Grammar or Levine and Kubota's Hybrid Type Logical Grammar with both Lambek and nondirectional implication in the proof theory. He is a bit surprised that you are seemingly not aware that medial gaps are routinely dealt with in these frameworks and that adjunct extraction ambiguities can be obtained by straightforward proofs little different in kind from the one provided for the original example.

    Now, before providing the proof, he asks you to please put your best foot forward. Are these the examples that will convince you, or will you ask for another example and another and...? Do you have one example that will settle the debate, yes or no? If yes, please give it. If not, please tell Marc van Oostendorp that he cannot fault the other side for making unfalsifiable claims [which, for the record, Christiansen and his co-workers never did] as long as you are not able or willing to provide something that can be falsified.

    Further, I had asked for two additional things: [1] your own analysis of these examples. So far all you have said is that they indicate internal structure. I am still waiting for the derivation. And [2] please provide a proposal for how this structure is implemented in the brain. After all, this is what we are really after: the human hard [or wet]ware that can churn out these sentences. I am not interested in the details as long as you have a proposal that is, at least in principle, implementable in a human brain.

    1. Aha! So you are a scholar of 'Chomskyan argumentation', which, as far as I understand, does not just involve the argumentation by Chomsky, but also by 'Chomskyans'. Which is a good thing, because I guess only a specialist could see how what David A or David P said can 'defend' Christina (henceforth: you) against what I said.

      My 'charge' would be that you are attacking a group, the 'Chomskyans', that is not well-defined, and you are ascribing all kinds of behaviour to this group. Since you do not define beforehand what it takes to be a 'Chomskyan', I suppose anything that anybody does and which you do not like can of course justify your claim that Chomskyans are bad and evil (but you do not hate them).

      But I would like to learn more, maybe in order to become a scholar of 'Chomskyan argumentation' as well, since it seems a lot of fun. So maybe you can enlighten me first about the crucial point: how do we know that somebody is a Chomskyan? Is it enough that such a person does things which you do not like, or should they have some other property as well? Are you a Chomskyan yourself, being so obsessively involved with your field of Chomskyan argumentation, as others have observed above? Why not? Can a Chomskyan ever turn into a non-Chomskyan, and what does it take?

      I am very excited about learning all these new things!

    2. This comment has been removed by the author.

    3. Marc van Oostendorp, in case you are seriously interested in what others say about Chomskyans [even fundamentalist Chomskyans, a category I do not use] you may want to try:

      Unlike me, the author is a professional linguist, so maybe you take him seriously.

    4. Dear Christina, I have no idea why you think I take people more seriously when they are linguists. Frankly, I do not see why somebody's arguments should be weighed against their diploma.

      Your link refers to a comment of a Christina B on Pieter Seuren's weblog. I suspect that Christina B is the same as you, or at least she shares your passion for finding fault with Chomsky. Otherwise, there is only one reference to 'fundamentalist Chomskyans' by prof. Seuren. This term I understand; PS was part of a fight (some even call it a war) with Chomsky, and this is a word he uses for the people who were (and are) on Chomsky's side with this. Seuren might call himself a Chomskyan, but not a fundamentalist.

      But I really have no idea what YOU mean by that term. Really. I do not understand; and I also do not understand why you refuse to give a definition which might make me understand, even though I have asked for it a few times now. You distance yourself from the epithet 'fundamentalist', but that does not clarify much for me, almost to the contrary. Do you think that fundamentalism is inherent to Chomskyanism, maybe? What makes somebody into a Chomskyan? You are the expert on that, since you are a scholar of Chomskyan argumentation; can you please clarify this for us? I am honestly curious.

    5. Marc van Oostendorp, first, apologies for the link; I thought it would link to Pieter's blog, not my comment [as links to this blog do]. I should have tested it before - my mistake.

      Second, I did not think your questions were serious but took them as mocking. Presumably my fault as well. I do not have a scientific definition for a Chomskyan but can offer a few paradigm examples. Norbert would be one. He says somewhere on this blog that he unabashedly admires Chomsky and everything he does/writes. [My memory may be inaccurate here, so apologies for that, and I do not mean it disrespectfully in any way - unwavering loyalty is an admirable quality.] James McGilvray would be another example. So presumably someone who accepts Chomsky's ideas because they are Chomsky's ideas. Someone who applies different standards to the work of Chomsky than to the work of others. Like, say, David Pesetsky: have a look at his criticism of the work of some authors who were published in 'leading journals' in his LSA address. Or look at his comment on the Evans & Levinson BBS paper. I am NOT saying this work was not deserving of criticism. But compare the quality of that work to, say, "The Science of Language" - not a word of criticism about that. Presumably a further characteristic is disdain for those who disagree with Chomsky [Norbert calls his project a 'labour of hate'].

      I hope this answers your question [at least partly]. And maybe in return you could answer some I have repeatedly asked: why do you think it is okay [or at least not worth mentioning] that a linguist knowingly distorts the work of another researcher with seemingly no other purpose than making the other person look incompetent? Why do you think it is okay that Norbert argues in great detail against falsifiability
      [here: and in some subsequent posts] but at the same time thinks an unfalsifiable claim by Reali and Christiansen would be a bad thing? Do you not think the same standards should apply to everyone? Why do you not object to Norbert calling my review of 'Of Minds and Language' a piece of junk without offering one single example of what I did wrong? Maybe you can start with that: have a look at a version of the review below and let me know why it is junk:

  16. Hi Christina. Not sure about the `coming up with a new example' issue. My extra example was just to make the point that there are long distance dependencies involving adverbs like instinctively, so it's not a new example for analysis. I'm happy to let the original stand. I guess David was trying to clarify by his example, but I'll leave that aside (although I think David's example is actually sharper than Chomsky's).

    On my own post: the crucial thing is to *stop* the system generating the `wrong' meaning for the example. Of course I know that there are numerous ways to get long distance dependencies in categorial grammars (I mentioned two, so I have no idea why your friend was surprised - you did show him the post, right?). So given that, the question is how to stop that technology from being used to get the wrong meaning for the example [this is a point well understood in Lambek-style categorial grammars; see, for example, Carpenter's 1997 book, page 7]. Of course any reasonable theory will have a way to do this, but that way will have to appeal to the fact that that technology is not available, for whatever reason, when creating a dependency into a relative clause. That is therefore a structure-based reason, not a reason based on linearly contiguous sequences of words. As I already said, this doesn't necessarily mean constituent structure; it could be the interaction between inference rules and a rich lexical structure, where the lexical structure encodes the combinatoric possibilities. Everyone knows that we can build systems that don't create constituent structures, but all systems need to appeal to structured information to capture Chomsky's example.

    Ok, now the question of the derivation of Chomsky's example sentence. Let's adopt a system with Merge (both external and internal) and the following lexical items, using Stabler's minimalist grammar notation:

    fly::V, swim::V
    v:: =V=N v
    that::=v +wh C
    O::N -wh
    instinctively::=v Adv -Top
    Top::=C +Top
    C::=v C

    We can then build a derivation that looks as follows:

    1. Merge (v, swim) satisfying =V
    2. Merge (0, 1) satisfing =N
    3. Merge (that, 2) satisfying =v
    4. Merge (0, 3) (internal) satisfying -wh on 0

    this gives us `that swim'

    5. Merge (eagles, 4)
    6. Merge (v, fly) satisfying =V
    7. Merge (5, 6)
    8. Merge (instinctively, 7)
    9. Merge (C, 8)
    10. Merge (Top, 9)
    11. Merge (instinctively, 10) satisfying the -Top feature on instinctively

    This gives us: Instinctively eagles that swim fly, with the interpretation that instinctively modifies fly, since it was Merged with fly, and it's semantics is, just as in your friend's proposal, or that of, say, Higginbotham 1985, that it modifies an event.

  17. continued ...

    Why can't we have the same derivation but instead Merge instinctively with swim, and then internally Merge it with Top, which would give us the same string but a structure where instinctively would modify swim rather than fly (incorrectly, given the meaning)? This is because Merge is subject to a specifier impenetrability constraint, formalized as a condition on the applicability of structure building (see, for example, Kobele and Michaelis 2011, `Disentangling notions of specifier impenetrability', in Kanazawa et al., The Mathematics of Language, Berlin: Springer). This is essentially what Norbert appealed to in his post, and is, of course, just the `subject island' condition from Chomsky 1973. With this constraint in place, if we Merge instinctively with the vP in the relative clause, then we can never internally Merge it to the Top element in the matrix clause, as that is not a legitimate internal Merge operation. My guess is that whatever system your friend is using will have something that does this job too. Alternatively, one could allow the system to generate this meaning and fold the constraint into the processing of the structure, as Kluender has suggested.
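    The effect of the constraint can be illustrated with a toy check; the encoding is my own, not Kobele and Michaelis's formalization. A pending mover is recorded together with the path of nodes containing it, and internal Merge is blocked whenever that path crosses a specifier (here, the subject containing the relative clause).

```python
# Toy illustration of specifier impenetrability (my own simplified encoding).
def internal_merge_licensed(path_to_mover):
    """A matrix attractor cannot reach a mover whose containing path
    passes through a specifier node."""
    return "Spec" not in path_to_mover

# instinctively Merged in the matrix vP: attraction to matrix Top is fine
assert internal_merge_licensed(["CP", "TP", "vP"])

# instinctively Merged inside the relative clause, which sits inside the
# subject specifier: attraction to matrix Top is blocked
assert not internal_merge_licensed(["CP", "TP", "Spec", "NP", "CP_rel", "vP"])
```

    The blocked case is exactly the bad derivation: the string would be the same, but the structure-sensitive condition rules out the reading where instinctively modifies swim.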

    Stabler and others have shown that the addition of this constraint to Minimalist Grammars doesn't affect the expressive capacity of these grammars in terms of the kinds of languages they can build, but it does affect the possible linkages of structure with meaning, as in the case under discussion.

    So that was basically Chomsky's analysis of this phenomenon, somewhat fleshed out with implementational detail. Now how is this implemented in the brain? Well, Stabler provides in his 2012 Topics in Cognitive Science paper a rather elegant way of automatically translating a Minimalist analysis of this sort into a parsing strategy which is incremental and sorts parser predictions based on the grammar into linear order at each parsing step. See section 2 of his paper for the details. So that would be a proposal for how to implement this kind of grammar in the brain. What's quite interesting is that implementing the specifier impenetrability condition means the parser does not have to carry so much in its memory, so there's a nice transparency between the shape of the grammatical constraint and a way the parser can reduce memory load. Note that we can also make this parser statistically sensitive, while maintaining that the system of knowledge it uses is non-stochastic, which is another interesting property, since we all agree that there are clear statistical factors in the use of language that need to be embedded into our systems (while of course there is controversy about whether there are probabilistic effects in our knowledge of language).

    So this is how a neurophysiological system could be an implementation of a minimalist grammar with Merge (both internal and external) and feature checking driving the construction of a derivation.

    My own feeling is that all of this is really quite exciting. I have no doubt that we can try to understand the kind of phenomena pointed out by Chomsky's example in ways other than this specific analysis, and all of these ways are likely to be helpful, either in taking us forward in particular directions or in ruling out others.

    1. Thanks David for working through the example; it helps to understand the argument.
      As it happens I am working with Greg Kobele this week, and we discussed this very issue last night.

      So if you have a hypothesis class of grammars H, then we know that there are grammars that generate the wrong pair (s,w2), BUT, if we fix the lexical items for the words in the language, then this is no longer possible. Indeed, since we have fixed the lexicon, there is now only one grammar G -- MGs are a lexicalized formalism, so the grammar is all in the features of the lexicon (apart from a few details).

      So what you have shown is that there is a grammar G in H which generates the right reading and doesn't generate the wrong reading. Which is of course good (and it does it in a natural and compact way, which is even better).

      However the class H also contains grammars which generate the right reading and the wrong reading. These grammars will have different and perhaps unnatural lexicons.
      So what you need to show is that there are *no* grammars in the class that generate the right readings for English sentences and the wrong reading. And you can't do that because there are such grammars - the trivial example I gave above and many less trivial ones.

      So the constraint on the grammar doesn't on its own explain anything:
      all of the work is of course being done by the lexicon or rather the mechanism that learns the lexicon.
      And that is the right answer in my view: the "empiricist" answer. There is a large class H that includes both the right and wrong answers, both structure dependent and structure independent answers, and the right one is learned using, perhaps, a simplicity based learning mechanism.

      It would be really interesting if you could define a subclass of MGs or of MG lexica that excluded the bad grammars. But I think that would be hard/impossible to do.

      Now I assume that you and David P and Norbert H probably don't think this is the right analysis, or at the very least disagree with my deliberately tendentious use of the word "empiricist", so what is the alternative explanation? Are there for example innate constraints on the lexicon as well, or some additional constraints on the class of grammars? Or are you happy with it being a learnability answer, and therefore not "innate"?

    2. I feel like there is some important unexpressed assumption that you guys are making that I am not that accounts for our different perspective: I am worried I am missing the point. Maybe you are assuming for example that there is some fixed innate set of features F, and we are only considering MGs that use F? Or something along those lines?

    3. For what it's worth, I was thinking along those lines (although perhaps only in hindsight, so thanks for bringing this up). And isn't that basically what the idea of substantive universals (as opposed to "merely" formal ones) is all about?

  18. Hey Alex, I responded on the `Formalization' post (or at least tried to), attempting to articulate where the difference in assumptions might lie (it's really the second of the two posts that's relevant). Interesting topic, this; makes me think.

    Christina, I was wondering if you had any thoughts about the derivation and suggested mode of implementation you asked for and I provided.

    1. David, apologies for not replying. I appreciate you taking the time to spell out the derivation & mode of implementation. A few questions remain and I'll ask them at the weekend [deadlines loom large right now].

    2. I know the feeling! Semester is starting and my Deanly duties call. I may have to go back to just lurking for a bit, unfortunately.

    3. David, thanks again for the detailed earlier reply. If you still have the time maybe you can look at the following [or maybe someone else can]. You gave me an analysis for [2]:

      [2] Instinctively eagles that fly swim

      I had been wondering how [3] is handled on your analysis:

      [3] Instinctively eagles that fly swim and dive avoid drowning

      It would seem that instinctively modifies 'swim' in [2] but 'avoid drowning' in [3]. If 'Merge' works exclusively bottom-up, should it not commit itself to the [2] interpretation once it encounters 'fly' in [3]? Now my friend tells me that, in general, a left-to-right parser can handle coordinate structures. It would have a backup function to pursue alternative paths when some chosen path fails (also likely a limited lookahead function to help cut down on false paths). So even if it first committed to building a relative clause ending with 'fly' and wrongly guessed 'swim' was in the higher clause, when it came upon 'and', it would back up and pursue a path with 'swim' in the relative clause and eventually build the coordinate structure within the relative clause and then determine 'avoid ...' to be the main predicate etc.

      So my first question is if your model works roughly like that?

      Assuming it does, there seem to be several problems. First, it seems not very plausible that we process language like this. Even [3] is still very ‘cute’ compared to real-life conversation. So having to keep all the alternatives in ‘working memory’ seems not the way to go, for example. So what is the minimalist solution for this problem?
      Next, I see how the analysis you provided can work [even for [3]] when you already know the target sentence. But that is the issue in acquisition - kids don't. So how would the parser "know" ahead of time that 'and' requires backup? That 'avoid' can be a target after 'swim' ‘failed’, etc.? Do you [pl] claim that the parser is a universal/innate device with backup and perhaps lookahead built in? Or is some skeletal parser innate but there's some sort of co-learning going on?
      Further, not everything that works for English does work for other languages. So presumably there must be alternatives for that too? Now of course I worry: is all of this innate [specified by the genome in some way or other]? If not what IS innate?

  19. Hi Christina,

    On your example (3), the syntactic evidence is that the constituency looks as follows:

    (3') Instinctively [ [ eagles [ that fly swim and dive ] ] avoid drowning ]

    Because, for example, you can replace [eagles that fly swim and dive] with `they' but not [eagles that fly] with `they':

    (3'') Instinctively [ [ they] avoid drowning ]

    (3''') * Instinctively [ [ they swim and dive ] avoid drowning ]

    So the specifications that the various items have will ensure this constituency. The bottom-upness of Merge isn't relevant here (see the Stabler paper I mentioned for an implementation of the minimalist grammar I sketched as a top-down incremental beam parser).

    On the issue of linear parsing, involving backtracking: that's an empirical question and there's a lot of debate about the mechanism. If you adopt the backtracking model you sketched, where you do keep all the alternatives in working memory, that gives us a nice explanation of garden path effects. However, we have known since Crain and Steedman 1985 that these effects are somewhat vulnerable to discourse context, word frequency, etc. Interestingly, the Stabler beam parser can attach probabilities of transition to the lexical items, making certain transitions more or less likely (details in that paper), implementing a weakly interactive model of parser-discourse modularity (although the grammar itself stays resolutely categorical). That would allow us to model both the psychological fact that people baulk at garden path sentences and the fact that these effects are vulnerable to discourse/frequency. So that would be a plausible minimalist solution for that problem.
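    The division of labour described here can be sketched in a toy beam step; this is my own illustration, not Stabler's actual parser. The grammar determines which transitions are licensed at all, while weights (log-probabilities, reflecting frequency or discourse) only re-rank the licensed analyses, of which the top k survive.

```python
import heapq

# Toy weighted beam step (my own sketch): each analysis is a tuple of
# steps taken so far; "transitions" maps an analysis to its grammatically
# licensed continuations, each carrying a log-probability.
def beam_step(beam, transitions, k=2):
    """Expand each (log_prob, analysis) by its licensed transitions and
    keep the k highest-scoring candidates."""
    candidates = []
    for logp, analysis in beam:
        for t_logp, label in transitions.get(analysis, []):
            candidates.append((logp + t_logp, analysis + (label,)))
    return heapq.nlargest(k, candidates)

# Frequency/discourse can make the garden-path continuation less likely
# without changing which transitions the grammar licenses:
beam = [(0.0, ())]
licensed = {(): [(-0.1, "main-clause"), (-2.3, "relative-clause")]}
beam = beam_step(beam, licensed)
print(beam[0][1])   # ('main-clause',)
```

    The categorical grammar lives entirely in which continuations appear in the transition table; the statistics only decide their ranking, which is the weakly interactive picture sketched above.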

    On acquisition, the parser will just follow whatever properties of these words have already been learned by the kid, as the parser is essentially a direct implementation of the grammar. So if those words have already been acquired with the correct properties, there wouldn't be an issue. If they have only been partially acquired, or acquired with the `wrong' properties, you'd expect errors in production and parsing, which we know happen when the properties of words aren't fully acquired.

    What is innate is just the operations of the system (Merge, Check features) and constraints on the possible representations of lexical items (what features are allowed, constraints on combining features in a lexical item, constraints on scope, etc.). These provide a sort of skeleton that's fleshed out as the child learns what the particular properties of her/his language are from hearing the data around them that make, for example, Chinese grammatically different from, say, Warlpiri. That's the idea behind the so-called Borer-Chomsky Conjecture: there are indeed properties of linguistic elements that are learned, but these are confined to properties of grammatical lexical items (like markers for definiteness, mood, tense, voice, aspect, modality, etc.). The interpretation of the grammar as a parser is also innate, but perhaps not solely linguistic.

    Hope that's useful.

    1. Thank you, David, that is useful indeed. From the first part I gather that your account may differ in some details from other accounts but you have at the moment nothing that is light-years ahead of the competition [as Norbert wants us to believe]. Would that be a fair point?

      You certainly also know that what you call the garden path effect is rather common in my native language. Just for entertainment I post a link to Mark Twain's frustrated rant about German.
      Having an innate mechanism account for our ability to acquire with ease constructions your great poet found so perplexing is of course cool - so I am glad to learn that the Stabler beam parser may be able to handle such 'monuments' - has it ever been tested on one?

      Regarding lexical items your account seems different from Chomsky's, is that correct? Chomsky argues that a 'massive poverty of stimulus' supports the conclusion that the lexicon [words in your story] is innate and kids just learn to attach labels [say 'river' or 'Fluss'] to innate concepts. He claims the concepts are uniform among all humans [someone in New Guinea has the same concept of 'river' as you do], and could not possibly have been learned from the input.

      Now you say: "the parser will just follow whatever properties of these words have already been learned by the kid, as the parser is essentially a direct implementation of the grammar. So if those words have already been acquired with the correct properties, there wouldn't be an issue. If they have only been partially acquired, or acquired with the `wrong' properties..."

      So there is a fair deal of learning going on in your view; it's not just activating an innate concept that was already 'complete' [for lack of a better word] - on your view it's more skeletal?

      Now the most interesting part of your answer was what you say about innate:

      innate is just the operations of the system (Merge, Check features) and constraints on the possible representations of lexical items (what features are allowed, constraints on combining features in a lexical item, constraints on scope etc). .... The interpretation of the grammar as a parser is also innate, but perhaps not solely linguistic.

      So it seems on your view there is quite a bit innate besides Merge. Now it would seem rather unlikely that everything you list came into existence by one mutation and has remained constant over at least 50,000 - 100,000 years. Do you agree with that or do you think the single mutation view is plausible?