Monday, September 12, 2016

The Generative Death March, part 2.

I’m sitting here in my rocking chair, half dozing (it’s hard for me to stay awake these days) and I come across this passage from the Scientific American piece by Ibbotson and Tomasello (henceforth IT):

“And so the linking problem—which should be the central problem in applying universal grammar to language learning—has never been solved or even seriously confronted.”

Now I’m awake. To their credit, IT correctly identifies the central problem for generative approaches to language acquisition. The problem is this: if the innate structures that shape the ways languages can and cannot vary are highly abstract, then it stands to reason that it is hard to identify them in the sentences that serve as the input to language learners. Sentences are merely the products of the abstract recursive function that defines them, so how can one use the products to identify the function? As Steve Pinker noted in 1989 “syntactic representations are odorless, colorless and tasteless.” Abstractness comes with a cost and so we are obliged to say how the concrete relates to the abstract in a way that is transparent to learners.

And IT correctly notes that Pinker, in his beautifully argued 1984 book Language Learnability and Language Development, proposed one kind of solution to this problem. Pinker’s proposal was based on the idea that there are systematic correspondences between syntactic representations and semantic representations. So, if learners could identify the meaning of an expression from the context of its use, then they could use these correspondences to infer the syntactic representations. But, of course, such inferences would only be possible if the syntax-semantics correspondences were antecedently known. So, for example, if a learner knew innately that objects were labeled by Noun Phrases, then hearing an expression (e.g., “the cat”) used to label an object (CAT) would license the inference that that expression was a Noun Phrase. The learner could then try to determine which part of that expression was the determiner and which part the noun. Moreover, having identified the formal properties of NPs, certain other inferences would be licensed for free. For example, it is possible to extract a wh-phrase out of the sentential complement of a verb, but not out of the sentential complement of a noun:

(1) a. Who did you [VP claim [S that Bill saw __]]?
b.   * Who did you make [NP the claim [S that Bill saw __]]?

Again, if human children knew this property of extraction rules innately, then there would be no need to “figure out” (i.e., by general rules of categorization, analogy, etc.) that such extractions were impossible. Instead, it would follow simply from identifying the formal properties that mark an expression as an NP, which would be possible given the innate correspondences between semantics and syntax. This is what I would call a very good idea.

Now, IT seems to think that Pinker’s project is widely considered to have failed [1]. I’m not sure that is the case. It certainly took some bruises when Lila Gleitman and colleagues showed that in many cases, even adults can’t tell from a context what other people are likely to be talking about. And without that semantic seed, even a learner armed with Pinker’s innate correspondence rules wouldn’t be able to grow a grammar. But then again, maybe there are a few “epiphany contexts” where learners do know what the sentence is about and they use these to break into the grammar, as Lila Gleitman and John Trueswell have suggested in more recent work. But the correctness of Pinker’s proposals is not my main concern here. Rather, what concerns me is the second part of the quotation above, the part that says the linking problem has not been seriously confronted since Pinker’s alleged failure [2]. That’s just plain false.

Indeed, the problem has been addressed quite widely and with a variety of experimental and computational tools and across diverse languages. For example, Anne Christophe and her colleagues have demonstrated that infants are sensitive to the regular correlations between prosodic structure and syntactic structure and can use those correlations to build an initial parse that supports word recognition and syntactic categorization. Jean-Remy Hochmann, Ansgar Endress and Jacques Mehler demonstrated that infants use relative frequency as a cue to whether a novel word is likely to be a function word or a content word. William Snyder has demonstrated that children can use frequent constructions like verb-particle constructions as a cue to setting an abstract parameter that controls the syntax of a wide range of complex predicate constructions that may be harder to detect in the environment. Charles Yang has demonstrated that the frequency of unambiguous evidence in favor of a particular grammatical analysis predicts the age of acquisition of constructions exhibiting that analysis; and he built a computational model that predicts that effect. Elisa Sneed showed that children can use information structural cues to identify a novel determiner as definite or indefinite and in turn use that information to unlock the grammar of genericity. Misha Becker has argued that the relative frequency of animate and inanimate subjects provides a cue to whether a novel verb taking an infinitival complement is treated as a raising or control predicate, despite their identical surface word orders. In my work with Josh Viau, I showed that the relative frequency of animate and inanimate indirect objects provides a cue to whether a given ditransitive construction treats the goal as asymmetrically c-commanding the theme or vice versa, overcoming highly variable surface cues both within and across languages. 
Janet Fodor and William Sakas have built a large-scale computational simulation of the parameter-setting problem, illustrating how parameters could be set and making important predictions about how they are. I could go on [3].
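The logic behind Yang’s frequency result is simple enough to sketch. What follows is a toy simulation of my own (not Yang’s actual implementation, and all names and parameters here are invented for illustration): a learner entertains two candidate grammars, probabilistically samples one on each input, and rewards whichever grammar parses that input. The more frequent the unambiguous evidence for the target grammar, the faster the learner converges on it, which is the shape of Yang’s prediction about age of acquisition.

```python
import random

def variational_learner(inputs, parses, rate=0.05):
    """Toy variational learner over two grammars (0 and 1).

    inputs: sequence of input tokens.
    parses: dict mapping each grammar to a predicate saying whether
            that grammar can parse a given input.
    rate:   linear reward-penalty learning rate.
    Returns the final probability of selecting grammar 1 (the target).
    """
    p = 0.5  # initial probability of selecting grammar 1
    for x in inputs:
        g = 1 if random.random() < p else 0  # sample a grammar
        if parses[g](x):
            # reward the grammar that parsed the input
            p = p + rate * (1 - p) if g == 1 else p * (1 - rate)
        else:
            # penalize the grammar that failed, shifting probability
            # to its competitor
            p = p * (1 - rate) if g == 1 else p + rate * (1 - p)
    return p

# Grammar 1 parses everything; grammar 0 chokes on unambiguous inputs.
# A diet rich in unambiguous evidence drives p toward grammar 1.
random.seed(0)
final_p = variational_learner(
    ["unambiguous"] * 200,
    {1: lambda x: True, 0: lambda x: x != "unambiguous"},
)
```

The point of the sketch is only the dependency it exhibits: the rate at which `p` climbs is governed by how often unambiguous evidence shows up in the input, which is what links input frequency to age of acquisition.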

None of this work establishes the innateness of any piece of the correspondences. Rather, it shows that it is possible to use correlations across domains of grammar to draw inferences from observable phenomena in one domain to the abstract representations of another. The Linking Problem is not solved, but a large number of very smart people are working hard to chip away at it.

The work I am referring to is all easily accessible to all members of the field, having been published in the major journals of linguistics and cognitive science. I have sometimes been told, by exponents of the Usage Based approach and their empiricist cousins, that this literature is too technical, that, “you have to know so much to understand it.” But abbreviation and argot are inevitable in any science, and a responsible critic will simply have to tackle it. What we have in IT is an irresponsible cop out from those too lazy to get out of their armchairs.

I think it’s time for my nap. Wake me up when something interesting happens.


[1] IT also thinks that something about the phenomenon of ergativity sank Pinker’s ship, but since Pinker spent considerable time in both his 1984 and 1989 books discussing that phenomenon, I think these concerns may be overstated.

[2] You can sign me up to fail like Pinker in a heartbeat.

[3] A reasonable review of some of this literature, if I do say so myself, can be found in Lidz and Gagliardi (2015) How Nature Meets Nurture: Statistical Learning and Universal Grammar. Annual Review of Linguistics 1. The new Oxford Handbook of Developmental Linguistics (edited by Lidz, Snyder and Pater) is also full of interesting probes into the linking problem and other important concerns.


  1. Thanks, Jeff, especially for the concrete examples of existing linking attempts. To me the process of employing probabilistic information on the animacy of the subject/object in order to infer the underlying syntactic structure seems unclear, even in broad strokes. A clarification would be most welcome.

    1. [1 of 2] @Nina, in the case of the ditransitives, the idea is based on what has come to be known as Harley's generalization. Some languages are like English and express possession through a possessive construction in which the possessor c-commands the possessed. Other languages (e.g., Dine, Irish) express possession through a locative construction in which the possessed c-commands the possessor. Harley's Generalization is that only languages of the first type have ditransitive constructions in which the goal c-commands the theme (like the English double object construction). Everybody has the locative ditransitive (like the English prepositional dative) in which the theme c-commands the goal.

      The ditransitives where the goal c-commands the theme also show a restriction in which the goal has a quasi-possessive interpretation. (Think Oehrle's contrast: I sent the book to NY vs. #I sent NY the book).

      But how does a learner identify a given ditransitive construction as being the one where the goal c-commands the theme? Using subtle interpretive differences between the interpretation of "I sent John the book" vs. "I sent the book to John" seems pretty hopeless. Moreover, the surface form of these "possession ditransitives" varies pretty widely cross-linguistically. In English, word order distinguishes it; in Spanish, clitic doubling; in Kannada, a benefactive verbal affix; in Korean, a special auxiliary; in Basque, dative agreement; and so on.

    2. [2 of 2]. So, we observed that the possession ditransitives will have a more restricted set of goal arguments than the locative ditransitives because the set of NPs that can be possessors is a subset of the set of NPs that can be locations. Specifically, the possessors are much more likely to be animate than inanimate. This asymmetry in animacy then cues the underlying position of the goal argument. This then unlocks all kinds of facts that are dependent on the relative c-command relations among the underlying positions of the two internal arguments. See Viau & Lidz (2011) for a more thorough spelling out along with some pretty cool data on binding asymmetries in 4-year-old Kannada speakers. Also, Josh Viau shows in his dissertation that the age of acquisition of the English double object construction is predicted by the age of acquisition of other possession constructions (and that the age of acquisition of the prepositional dative is not).

      The case of raising vs. control is discussed fully in several of Misha Becker's papers and in her recent book. But the idea is pretty much the same. I'll leave it as an exercise for the interested reader.

    3. Thanks, Jeff. I knew about the ditransitive analysis you mention but failed to make what is now an obvious link between animacy and possession. So a rough path for the learner could be: if ANIMATE --> then POSSESSOR --> then CERTAIN STRUCTURAL CONFIGURATION, where the final link of the chain is part of the learner's innate knowledge in Pinker's semantic bootstrapping sense? Plausible given independent evidence on infants' awareness of ownership and spatial goals. Interesting in terms of the mechanics of implementing such an inference; somehow it seems more complex compared to the if OBJECT then NP link you discuss above.

    4. This comment has been removed by the author.

    5. Re the role of animacy in raising/control acquisition. It's not straightforward. There is a statistical difference in the % of inanimate subjects between raising and control predicates: basically, rare vs. very rare. To use this difference as a classifier, essentially a statistical version of indirect negative evidence, is not feasible, partly because the data in the child-directed input is very sparse.

    6. This comment has been removed by the author.

    7. Viau and Lidz (2011) is certainly an excellent piece to put in front of UG skeptics, but I think it might also be useful to formulate it with greater independence from the Minimalist implementation, which still has some issues, I think (has anyone fixed the problem of Phrase Structure Paradoxes in a generally accepted way yet?), so here would be my attempt:

      I: Structure Theory
      1. There is a notion of 'structural superiority', not necessarily expressed in linear order

      2. a pronoun bound by a quantifier can't both precede and be structurally superior to the quantifier

      3. there are two ways of structurally coding 'transfer', one involving a 'possession' predicate where the recipient/possessor is structurally superior to the theme, the other a 'location' predicate where the theme is structurally superior to the recipient/location

      II. Learning Theory (Cue/Trigger Theory?)
      the two structures can be distinguished statistically on the basis that in the possessional one, the recipient is almost always animate, but in the locational one, this is not the case.

    8. Thanks Avery. Yes, I think your summary is pretty much exactly what I think the paper says. Regarding the technology, there's always a tension between providing a fully explicit analysis and describing the key insight in a theory-independent way. In this case, we tried to use almost exclusively text-book GB analytical machinery. The paper could easily have been written prior to the existence of Minimalism with the very same technology. Perhaps the GB/Minimalism distinction is splitting hairs at some levels, though.

    9. I'll try to articulate why I think that it's necessary to have 'colloquial' presentations like the one I offered in addition to the more technical/implementational ones. My suggested reason is that in order to write an explicit generative grammar, you need to make a very large number of decisions (or just accept ones made by somebody else, such as Chomsky), and many of them will of necessity have to be at best rather lightly motivated, and others may object to them and strongly prefer to set things up differently. Therefore the solid empirical results need to be at least somewhat disentangled from the arbitrary decisions. Indeed, this post of yours below looks to me like advocacy for this approach.

      My little story was based on two collections of technical implementations of how predicates get hooked up to their arguments, the Minimalist ones and the LFG ones, both groups having multiple variants, but each also involving a distinctive collection of somewhat questionable assumptions. I think the first significant and successful work in this genre might have been Alec Marantz' story about grammatical relations in his 1984 book, & I think that's a model we should try harder to emulate.

      I'm currently looking at Artemis Alexiadou's 2015 book on determiner spreading, and trying to do something like that with some of the analyses there, but not getting very far yet, possibly due to the relative absence of well-worked out competitors. So I guess I'll have to try to cook some up.

  2. There are some interesting typological universals that really are observationally connected to some of these things ... for example there are afaik zero languages where words referring to kinds of things and living things (or 'Spelke objects', if you prefer that kind of terminology; there are many different possibilities that work pretty well for the purpose here) are split into multiple 'part of speech categories' (that is, ones that have an effect on word order principles). This is in striking contrast to the behavior of grammatical features, such as grammatical gender, which often do split up this conceptual category in crazy ways. Since there always is a single part of speech into which words for kinds of things and living things will fall, we can label this category 'noun'; then a bit of X-bar gives you NPs, and Pinker's program is basically back in the air (the basic idea is due to John Lyons, in his 1968 book _Theoretical Linguistics_, although I've modified the formulation)

    Note carefully that this says nothing about what else may or may not be dumped into the 'noun' category, for example languages often include words indicating kinds of actions ('give it a kick'). A bit oddly, I find it a lot harder to do the same kind of a job for verbs.

  3. Of all of the criticisms that one could aim at Mike Tomasello, surely failure to get out of the armchair is not one of them.

    1. Tomasello 2000 (I think the Cognition one) was one of the first papers I read in graduate school, all the way back in 2003. Tomasello has been trying to establish the death of generative linguistics by decree at least as far back as then. He even had a debate at BUCLD with Stephen Crain about this stuff (it must have been 2003/4?), so he cannot claim he is unaware of the responses by generativists to his criticisms. So if nothing else, at least one thing is definitely true: nothing about this SA piece is new. It's been around for at the very least 16 years, in virtually the same shape (same vague and underdeveloped ideas about analogical thinking and social cognition saving the day) and tone (generative grammar and/or its nativistic commitments are wrong/dead!). It was maybe provocative 16 years ago, now it is just plain sad and weird. Tomasello may be a great primatologist/social psychologist, but he has given zero evidence to the field that he has taken the time to do his homework on theories of language acquisition from the generative camp, no matter how many thoughtful responses he gets to his periodic outbursts.

  4. When it comes to doing linguistics, this is evidently the correct assessment. Of course, Tomasello has done a mountain of really important work outside of language and he has been involved in many papers on language acquisition. So obviously he is working hard. But when it comes to generative linguistics, in the 20 years I've been engaging with his work I have seen no demonstration that he has even tried to understand the perspective against which he has positioned himself. So, that's the armchair I am referring to.

    1. Yes, and it bites, doesn't it. Well, that seems to be the way non-generative linguists are treated by generative ones: they are simply dismissed as not doing Linguistics at all, and this is no longer even argued but is simply part of the subtext. The result is that a non-generative linguist reading generative papers feels like even when they agree, they disagree.

  5. I entirely agree that this sort of scorched-earth criticism of Generative Grammar from this quarter is completely unfounded and lazy and deserves the response Jeff offers. However, and I say this as one of the most steadfast Chomskians you will ever find, there is something substantive in what Tomasello and others have to say. So, in the spirit of charity, I’d like to zero in on the fundamentals of what they have (had) to say that we should take into account. From Diogo’s comment above, I looked up Tomasello’s 2000 paper (I found one in Cognitive Linguistics), and I found this very interesting passage from the Conclusion:

    “The general picture that emerges from my application of the usage-based view to problems of child language acquisition is this: When young children have something they want to say, they sometimes have a set expression readily available and so they simply retrieve that expression from their stored linguistic experience. When they have no set expression readily available, they retrieve linguistic schemas and items that they have previously mastered (either in their own production or in their comprehension of other speakers) and then ``cut and paste'' them together as necessary for the communicative situation at hand-what I have called ``usage-based syntactic operations'' … It is also important that the linguistic structures being cut and pasted in these acts of linguistic communication are a variegated lot, including everything from single words to abstract categories to partially abstract utterance or phrasal schemas.”

    I’d like to point out that this is remarkably similar to the picture I painted in the recent post. If you replace “schemas” with “treelets”, it is pretty much the same thing that I outlined (note that Tomasello later uses the word “linguistic structures” instead of schemas). I find it quite interesting that Tomasello arrives at this picture based on evidence from language acquisition, and I arrived at this picture from psycholinguistics, neuroscience, and even data from the traditional domain of linguistics (e.g., idioms). I think this is a very important generalization for people working in acquisition, sentence processing, and neuroscience of sentences. What Tomasello is missing is what Jeff and the Generative community have been banging on about for a long time – there is a gap between the data and the knowledge. I would like to fill that gap with a Minimalist UG that generates these stored structures, and I think this provides an excellent bridge that incorporates the insights of both camps.

    The Generative community has not dealt substantively with Tomasello’s generalization: the heavy use of stored linguistic structures. I find it interesting that I haven’t yet met a syntactician (with the possible exception of Alec Marantz) that denied the existence/use of stored structures. So why don’t we respond to Tomasello by accepting this empirical generalization and developing a model of language acquisition and online processing that makes use of stored structures /constructions / treelets, and arguing that a Minimalist grammar gives us a theory of how children acquire these stored structures?

    1. Actually, Charles Yang dealt with it pretty explicitly and showed that the math does not support the claim that children have stored structures without rules. That is, we cannot accept the empirical generalization because it is demonstrably false. This is not to say that stored chunks aren't an important part of how language is used, but if stored chunks exist, they do not imply that children don't have rules.

      The more general point, that a complete theory of language use will include stuff about the format of grammars, the nature of memory, the kinds of things that are stored in memory, etc, etc, etc, is obviously valid. And, it is false that the generative psycholinguistics community does not engage with literature coming from people who are not generative linguists. Indeed, much of my own work has been aimed at trying to learn as much as possible from stuff outside of linguistics in order to figure out how to understand what belongs properly to grammar. The Lidz and Gagliardi 2015 paper is an example of this.

    2. This comment has been removed by the author.

    3. "Actually, Charles Yang dealt with it pretty explicitly and showed that the math does not support the claim that children have stored structures without rules."

      I am making a charitable interpretation of Tomasello's generalization. I completely agree that the idea that there are no rules is false, but contained within his (wrong) proposal there is what seems to be a correct generalization: stored structures play a particularly prominent role in children's use of language. So shall we proceed in the guise of charity, then?

      I thought your 2015 paper with Gagliardi was excellent and clearly lays out the case for UG. I agree that your model of language acquisition does "engage with literature coming from people who are not generative linguists". However, this was not my intended claim. If I wasn’t clear, what I meant was this: the (charitable) Tomasello generalization is that stored structures and/or constructions are prominent in language use and acquisition. I have not seen substantial engagement on this specific point. I'm saying that this appears to be a lacuna that we should seriously consider addressing, and doing so would allow us to incorporate interesting insights from this work.

    4. I don't actually understand what you mean by saying that "stored structures and/or constructions are prominent in language use and acquisition" is not well represented in the field. The great majority of syntactic theories and processing theories are fundamentally lexicalist in nature, which means that a word carries its syntactic environment around with it. That's a stored structure/construction. What exactly is the lacuna? I don't mean to be dense. Lexicalism is rampant in psycholinguistics. And saying that people store constructions is not fundamentally different from this kind of lexicalism.

    5. Focusing on Minimalism, there are only two specific proposals I have ever encountered that aim to incorporate a minimalist grammar in a theory of online sentence processing (theories that take a Minimalist grammar as a real thing, not as a "specification" of a parser). One is Phillips 1996 and the other is Townsend & Bever, 2001. Neither of them uses stored tree structures in the way that other psycholinguists would like to use stored tree structures.

      I have never seen any acquisition model that explicitly draws the lines between a Minimalist grammar and what happens during language acquisition. I haven't looked into the parameter setting models of Fodor and Sakas, although these seem focused on Government and Binding and don't seem to incorporate the fundamental notion that stored tree structures are prominent in language use by children.

    6. In large measure, I think the reason you don't see minimalism deployed in language acquisition research is that minimalism, even after 20 years, is still pretty much a pipe dream. The theoretical constructs of GB are considerably more trustworthy, and since the main aim of minimalism is to try to derive the theoretical constructs of GB from more general principles, a project yet to be effectively carried out beyond a couple of interesting but speculative cases, it makes sense to operate at a grain size that is generally uncommitted about the theoretical origins of the mid-sized generalizations that were discovered in the GB era.

      I think it is very easy to read Chomsky's claims about "recursive merge" in the wrong way. What Chomsky has argued is that the minimum thing you'd need to add to general cognition to get language is recursive merge. I can hardly see how that could be wrong. But he also speculates that it is also the maximum. This is the dream of minimalist theorizing, and while it is a good methodological dictum to try to not use any other technology, I know of exactly zero results that suggest that it's true. So while we're in the business of calling people out on their bs, let's not forget that the minimalist program has delivered very few significant results of the sort that it aims to produce. Lots of good work has been done using "minimalist" technology (probes, goals, phrases, AGREE, etc), but most of this stuff is just another grammatical theory of roughly the same complexity as its forebears like GB, LFG, HPSG, etc.

      So, go ahead and develop a "minimalist" psycholinguistics, if that's your taste. Personally, I find it hard enough to do psycholinguistics at a grain size that is appropriate for what we really know about grammar.

    7. Thanks for the clarity of your thoughts. I think it should be made clear in all of our work which specific grammatical model is being assumed - not doing this has led to a tremendous amount of confusion on my part. I don't think that Minimalism and GB would make the same predictions in many cases of psycholinguistics, neurolinguistics, and acquisition, if indeed these predictions could be made more precise for Minimalism.

      As I mentioned in the last post, though, GB seems to be a poor model for capturing the neuropsychological and neuroimaging data that I have to deal with, and much of my post was focused on what I thought is a plausible way to incorporate minimalist grammar into a theory of processing and acquisition that would allow me to proceed adequately.

    8. @William: I urge you to look more deeply into the computational work on parsing minimalist grammars. There are so many degrees of freedom when coming up with procedures which do something you might call parsing; why not start with one that actually does exactly what the grammar says it should?

      There is no good reason to prefer the ontologically promiscuous approach that you seem to be pursuing. (If you can come up with one, it would be an important development.)

      There are already extremely precise linking theories for parsing algorithms which provably do exactly what the grammar says they should (and efficiently, at that), and well-defined and -understood ways to approach your ideas about treelets.

      Many smart people have worked for many years to develop and understand this stuff. You will achieve much more if you take their work into consideration.

    9. The problem I have with this work is that I can't get out of it any clear ontological claims. Let me be excessively clear.

      Chomsky (1995) says that the language faculty is the following:

      Lexical atoms
      Third factor principles

      Plus some other stuff. What I want to do is add specific other stuff that allows me to understand online processing, acquisition, and neuroscience. Here's what I add:

      Stored structures
      Memory retrieval operations that interact with stored structures
      Appropriate mechanisms for modifying and substituting stored structures

      I think I can figure out how to proceed with the mapping problem between biology and this list. What is the comparable list from this computational parsing work? I would like a list of ontological objects so that I can seek to understand how those objects would be implemented in the brain. I cannot extract this from the work you reference, and this could either be because I don't understand it well enough or claims like this aren't made. Can you establish this plainly and simply for me? I would be exceedingly grateful for it.

      What I really liked in the work by Lewis & Vasishth (2005) is that they made clear claims as to the ontology of language. They say that language consists of the following:

      Declarative memory chunks with particular kinds of features
      Phrase structure rules
      Memory retrieval operations

      Again, is there a comparable list from the computational parsing literature?

    10. Greg, could you be so kind as to cite some relevant starting literature? I think many would appreciate a road sign here. Not a full bibliography, but a nice place to start.

    11. This comment has been removed by the author.

    12. As an example of the kind of thing Greg is talking about, you could look at his own paper here:
      The full details of the parsing algorithm that they adopt are described in this paper by Stabler:
      There is more in the same spirit which others might like to add links to.

      But I suspect that, at least on a first reading, these papers might appear to not be addressing the kinds of issues William has been raising, because they take for granted certain background assumptions about what a grammar is, what a parser is, and the relationship between them. In order to familiarize oneself with those background assumptions, I think it's best to temporarily leave aside "linguistically realistic" grammatical frameworks like minimalism or TAG, and take context-free phrase structure grammars as a simple toy model. In other words, imagine that all the puzzles syntacticians worry about could be solved with the basic machinery of CFGs, and then ask your questions about how grammars get "put to use" within the context of that assumption. This reduces the number of moving parts one has to worry about, compared to asking those questions in the context of a more complicated and not-yet-settled grammatical framework. There are any number of places to read about parsing with CFGs; one source I like is Makoto Kanazawa's notes here:
      In particular, if we imagine that CFGs were a linguistically adequate formalism, and then take top-down parsing, bottom-up parsing, and left-corner parsing as three candidate hypotheses about how those grammars are put to use, then I'd be interested to hear versions of William's concerns about "what's missing".

      (One might object that since we know CFGs are inadequate, we should leave them behind entirely. That's fine, I'm only suggesting it as a practical matter as a route to understanding; there may be other routes too. In other words, the linguistic inadequacies of CFGs do not taint the general approach that is being carried over into the two papers I cited first above.)
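      To make the toy-model exercise concrete, here is a minimal top-down (recursive-descent) recognizer for an invented CFG; a bottom-up or left-corner parser would put the very same rules to use in a different order. The grammar, lexicon, and example sentences are all made up for illustration.

```python
# A toy CFG and a naive top-down (recursive-descent) recognizer.
# Grammar, lexicon, and sentences are invented for illustration.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["saw"], ["slept"]],
}

def parse(symbols, words):
    """Return True if the symbol sequence can derive exactly `words`."""
    if not symbols:
        return not words  # success only if the input is also exhausted
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:  # nonterminal: expand top-down using each rule
        return any(parse(expansion + rest, words)
                   for expansion in GRAMMAR[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "the cat saw the dog".split()))  # True
print(parse(["S"], "cat the saw".split()))          # False
```

      The point of the exercise: the dictionary GRAMMAR is the "static body of knowledge dictating well-formed objects", and parse is one of several possible machines that put it to use.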

    13. @Greg: Could you explain in what sense you find William's position to be "ontologically promiscuous"? I'd be particularly interested to hear your take on this – in a nutshell, I think it's an empirical truism that we store linguistically complex objects in our long-term memory, including some that don't have any special or non-compositional properties beyond being sufficiently frequent, and/or sufficiently recent. Do you find this premise contentious?

      I suppose one could accept this premise but still maintain that, while this storage exists, it is not used in the course of online parsing. But wouldn't that, as they say, require an extra stipulation?

    14. @Tim

      Thank you - this is quite helpful. Would it then be fair to say that this literature makes heavy use of something like phrase structure rules?

      If so, I don't think there is much of a difference between a phrase structure rule, e.g. S -> NP VP, and a treelet, e.g. an S node dominating NP and VP nodes - they both contain the same information. However, I think a path can be drawn between Merge and these kinds of abstract treelets (by taking a fully derived sentence and chopping off pieces), whereas this path is more obscure for phrase structure rules.

    15. I'm not sure I understand. On the one hand you say that a phrase structure rule contains the same information as a treelet, but on the other you say Merge is more closely related to treelets than phrase structure rules are? Could you explain a bit more?

    16. They contain the same structural information but they are not identical. A treelet is a subset of a full sentence. A phrase structure rule is an instruction for building structure given some input on the left hand side of the rule.

      Merge makes structures, not rules. So if Merge makes structures, those are by definition treelets, and if I chop off pieces of those structures, those are also treelets. But I don't see how Merge would construct phrase structure rules.

    17. @William, I think you're confused about what it means to say that Merge makes structures. The weakest view (i.e., one that depends least on ancillary assumptions) is that Merge defines well-formed syntactic objects. This is exactly what PS-rules do. The only difference is in whether you state the recursive function that generates complete trees by defining them in terms of the steps that take you from small objects to large objects vs. by defining them in terms of the steps that take you from large objects to small ones. They are not identical, but they stand in exactly the same kind of relation to treelets, as far as I can see.
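      A toy rendering of this point (invented rule inventory, nobody's actual formalism): one and the same lookup table can be read bottom-up, as a structure-building "Merge", or top-down, as a well-formedness check in the style of PS-rules.

```python
# Two ways to use the same table of category combinations: building
# large objects from small ones, or checking large objects by
# decomposing them. The rules are a made-up toy fragment.
RULES = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}

def merge(left, right):
    """Bottom-up direction: combine two objects into a larger labeled one."""
    label = RULES[(left[0], right[0])]  # look up the resulting category
    return (label, left, right)

def well_formed(tree):
    """Top-down direction: check a tree by decomposing it with the same table."""
    if len(tree) == 2:                  # (category, word): a lexical item
        return True
    label, left, right = tree
    return (RULES.get((left[0], right[0])) == label
            and well_formed(left) and well_formed(right))

np = merge(("Det", "the"), ("N", "cat"))
vp = merge(("V", "saw"), merge(("Det", "the"), ("N", "dog")))
s = merge(np, vp)
print(s[0])            # S
print(well_formed(s))  # True
```

      Both functions are determined by the same table; they differ only in whether the recursion runs from small objects to large ones or the reverse.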

    18. @William, I strongly recommend that you read this:
      Indeed, I recommend that all students of syntax read it to avoid getting confused about things that nearly all students of syntax get confused about (and which a great many never get unconfused about).

    19. @Jeff

      Thank you for the reference, but I am not confused on this point. I understand that this is claimed to be the case for Merge. However, I think that if you take this weakest view, then it becomes remarkably unclear how Merge is causally involved in anything. Taking this weakest view leads to the view of this computational literature that has been discussed, that the grammar is simply the "specification" of the parser. But why, and how? I don't understand how it's explanatory to say that there is some function Merge which defines well-formed syntactic objects, that is some kind of static knowledge which dictates to other systems how to do things. The connections are entirely opaque and incomprehensible.

      However, if we take the stronger position that includes more assumptions, that Merge is actually causally involved in things, that it is actually a real machine that makes stuff, then all sorts of avenues open up about how Merge determines the form of FL.

      While I understand that the former is the weakest view, to me it offers really no help in trying to understand linguistic performance, unless somebody can explain to me how you get from Merge as a static body of knowledge dictating well-formed objects to an account of how sentence production, comprehension, and acquisition work.

    20. I agree with Jeff; I think you are confused. A phrase structure rule is just as "direction agnostic" as a treelet. (Or, if you really insist that a phrase structure rule is an instruction for moving downwards in a tree, then make up a new name for the direction-agnostic information that is in both a phrase structure rule and a treelet, and substitute that term everywhere I say "phrase structure rule".) If you look into context-free parsing, you will, I think, get a sense of how "a static body of knowledge dictating well-formed objects" can be related to "a machine that makes stuff". In particular, you will read about how to set up a theory where a phrase structure rule like "S -> NP VP" is "actually causally involved in things". Understanding proposals about how more complicated grammatical machinery (e.g. of the sort that minimalist syntax uses) can be causally involved in things may be difficult without getting that simpler case understood first.

    21. I am fine with phrase structure rules being direction agnostic. And I am fine with phrase structure rules being causally involved in things. However, I think it is the case that this literature wants to have Merge as the specification for the form of phrase structure rules. What I am confused about is how Merge is causally involved in specifying the form of phrase structure rules - could you please clarify how this works? And preferably without adding anything to UG besides Merge and the lexicon.

      I am comfortable with Merge being causally involved in specifying the form of treelets, because Merge can be easily understood as causally involved in making tree structures.

    22. You lost me at "I think it is the case that this literature wants to have Merge as the specification for the form of phrase structure rules". Are you referring to minimalist syntax literature, or to literature of the sort that Greg mentioned above? (I don't recognize the idea in either case.)

      The general ideas about how grammatical systems relate to parsers that I mentioned a few posts back, and which I suggested might be best learned by considering CFG parsing, have nothing to do with phrase structure rules (or merge, or treelets, or ...).

    23. This is why I asked "Would it then be fair to say that this literature makes heavy use of something like phrase structure rules?"

      So let me ask again - the general ideas about how grammatical systems relate to parsers, which have nothing to do with phrase structure rules, merge, or treelets - what do they have to do with? If not phrase structure rules, what devices are they using? And if they have nothing to do with Merge - this seems like a problem, if these parsers are meant to incorporate minimalist syntactic theory.

    24. @William: My favorite view about syntactic structure is aptly summarized in the following quote from Mark Steedman (2000):

      "Syntactic structure is nothing more than the trace of the algorithm which delivers the interpretation."

      merge and move (and lexical items) are simply abstract generalizations about types of transitions between parser states. 'Treelets' are then statements about sequences of parser state transitions.

      This is the most faithful way of incorporating minimalist syntactic theory; any other approach is going to only approximate what syntacticians say.

      @Omer: I have zero problem with the contention that some linguistically complex objects are stored. This does in fact seem to entail that we need to be able to assign weights not just to individual parsing steps but also to sequences thereof, which means that our probability model should not be a unigram model but something more sophisticated.

      The 'ontological promiscuity' lies in the fact that you would need not only the grammar (although it is not clear to me in that case what it would be for), but also the parser, and its grammar (the statement of what the parser does). As I've said before, there is a way to make this reasonable (a la Thomas Graf's interpretation), but this is so close as to be a notational variant of the Marr position.

    25. William wrote: Would it then be fair to say that this literature [e.g. the Kobele and Stabler papers I linked to above] makes heavy use of something like phrase structure rules?

      I'm not sure how to answer this, to be honest. In a certain general sense, yes; but my impression is that you have something more specific in mind, and so I suspect a better answer is no.

      William wrote: the general ideas about how grammatical systems relate to parsers, which have nothing to do with phrase structure rules, merge, or treelets - what do they have to do with? If not phrase structure rules, what devices are they using? And if they have nothing to do with Merge - this seems like a problem, if these parsers are meant to incorporate minimalist syntactic theory.

      They are general ideas that can be applied to grammatical systems based on phrase structure rules, and to grammatical systems based on merge, and to grammatical systems based on treelets. My thought was that by seeing them applied to systems based on simple context-free phrase structure rules, you might be able to pull out an answer to the question of "how it's explanatory to say that there is [a set of phrase structure rules] which defines well-formed syntactic objects, that is some kind of static knowledge which dictates to other systems how to do things"; and that this might help you understand how people have approached your question of "how it's explanatory to say that there is a function merge which defines well-formed syntactic objects, that is some kind of static knowledge which dictates to other systems how to do things". But as I said, if you'd rather skip the intermediate step then feel free to dive straight into the Kobele and Stabler papers.

    26. I think there is this idea that there is a huge difference between Merge and phrase structure rules, and that phrase structure rules are really really bad. But phrase structure rules are just a form of Merge really so I don't quite see the objection. A rule like A -> BC just says you can merge B and C to produce A.

      There are a couple of important differences. A phrase structure rule has some order (which one can factor out in an ID/LP way), and it has labels, which Merge doesn't have, but Merge-based analyses need these at some point, or you can't parse. Deep in the guts of any MP parser (e.g. Sandiway Fong's) are hidden some PSG rules.

    27. There are deeper differences, though, e.g. the PSG-based MCFGs being less succinct than MGs because a PSG has to decompose everything into local dependencies. That's a more modern version of the argument Chomsky gave in favor of transformations.

      And PSGs offer less of a grip on certain issues, e.g. which dependencies are natural --- anything that can be passed around via labels is a natural dependency in a PSG. Of course you have a similar problem with Merge because of selection, but that's already a better starting point because we do have ideas about what selection should look like, e.g. for arguments vs. adjuncts.

      So yes, pretty much anything done by linguists can be expressed via PSGs, but you lose certain properties in the translation, properties that matter for the kinds of questions linguists are interested in. But you also gain stuff, like a straightforward link to computational parsing theory. As so often, it's a case of picking the right representation for the task at hand, and not being too literalist when it comes to the cognitive reality of linguistic theories.

    28. @Thomas: As it regards being literalist about linguistic theory, I couldn't disagree with you more. But we've hashed this out on the pages of this blog before :-)

    29. @Omer: In this specific case, it seems to me that there's very little room for disagreement. If you're not a literalist, you agree that one and the same structure/grammar can be represented in different ways. If you are a literalist, then you must also assume that the grammar and the parser are distinct objects rather than the former being an abstraction of the latter --- for that would once again mean that the grammar can be stated differently. But if the grammar and the parser are cognitively different objects, it need not be the case that the encoding of the grammar used by the parser is identical to the actual encoding of the grammar. So unless you adopt a very strong kind of transparent parser hypothesis, you can still be a literalist and assume that the parser operates with a PSG version of the actual grammar.

    30. @Thomas: as I think I said in the linked-to post, my assumption is that the grammar is used (as, say a subroutine or set of subroutines might) in the course of both parsing and production. I suppose this would qualify as a "very strong kind of transparent parser hypothesis," yes?

    31. @Omer: 'Being used' is not specific enough to count as anything.

    32. @Greg: I disagree (though it really does seem like we're sinking back into discussions that already took place in the comments here). Let's stick to the "subroutine" model for the sake of concreteness. If the grammar is "called" when you produce sentences or parse them, then the computational complexity of grammatical procedures is implicated in those two tasks. Remember: I'm a literalist when it comes to the cognitive interpretation of how the grammar is specified. (I know that you are not.)

    33. @Omer: I don't understand how your '"subroutine" model' is supposed to work, nor what the grammar is that it can be '"called"', nor what it should return upon being '"called"', nor what is done with that which is returned. All of these questions can be legitimately answered in many ways.

    34. Yup, I don't have answers here, either, Greg. However, since I'm a cognitive literalist, I think the job of linguistics (all cogsci, really) is precisely to investigate what the answer is. Finding arguments one way or another is going to be hard, and they will often not be cut-and-dry arguments. That's fine with me. But this is a substantively different view than one that treats two models the same as long as they are extensionally equivalent (in their weak & strong generative capacities). Only under the latter type of view can one realistically appeal to the kind of ontological parsimony you were talking about earlier. Again, I'm well aware that you view things differently, and that for you, if there is no mathematical utility in specifying a grammar independent of parsing and production, then we should not pursue models that have that as an independent entity. Fair enough, but be aware that some of us view things differently.

    35. @Omer: How can you then claim to have an assumption that should count as a '"very strong kind of transparent parser hypothesis"'?

      I would very much like to see a clear statement of what the commitments of a 'cognitive literalist' are.

      The Marrian position is exactly that it is extremely useful to specify the grammar independently of how it is implemented.

      Linguists (syntacticians) are trying to describe the structure of sentences. Their arguments are based on structure type properties. Therefore, any properties of the analysis that go beyond this are undermotivated. We could try to see what consequences these unmotivated properties could have given a particular linking theory, or we could view these properties as the unmotivated notational accidents they are, and try instead to understand the space of models that give rise to the same structures.

      I understand that some of us view things differently, but I have given reasons for viewing things my way, and am not familiar with reasons for taking notational accidents as revealing of essences.

    36. "What Chomsky has argued is that the minimum thing you'd need to add to general cognition to get language is recursive merge. I can hardly see how that could be wrong."

      In that case, it is unfalsifiable and therefore not a scientific theory. But indeed it is easy to see how it could be wrong: simply provide evidence of a non-language-user who can employ recursion. Unfortunately, no amount of studying human beings will do this: so we see that (ultra) Minimalism is really a theory about non-human communication: that is, the theory that n.h.c. cannot be recursive. It is of course an empirical question whether this is true or not.

    37. @Greg: It is simply not true that "Linguists (syntacticians) are trying to describe the structure of sentences." Linguists (syntacticians) are trying to describe the properties of the mental system that underlies sentence formation. And some of us are interested in more than the extensional properties of that system.

      As for "notational accidents," I'm afraid that the very use of that term presupposes your point of view. (To reiterate, I'm not saying your point of view is invalid, only that it is not my point of view.) These differences are only viewable as a mere "notational" issue if you presuppose that all that matters is the system's extensional properties; I deny this, therefore they are not notational accidents. They are real differences, suggestive of conflicting statements about the way the mental computation proceeds.

    38. @Omer: I'm fine with your correction. The fact remains, however, that

      Their arguments are based on structure type properties. Therefore, any properties of the analysis that go beyond this are undermotivated.

      Do you believe that it is a notational accident that Chomsky used curly brackets instead of square brackets when defining merge a b as {K,a,b}? (Of course you do.) Why? Because there is nothing in the data that forces/motivates that decision. In other words, you could change that systematically throughout and wind up with an extensionally equivalent theory. We both agree that linguistic theories are full of notational accidents; surely that can't presuppose my point of view.

    39. @Greg: Yes. What we don't agree on is that stating something as a generative rule or as a filtering constraint is the same kind of arbitrary choice as curly vs. square brackets. Assuming these two types of choices are identical presupposes your point of view.

    40. @Norbert: I think the most important references in this regard are:
      1) Henk Harkema's dissertation for definitions and proofs about parsing MGs
      2) Ed Stabler's paper adapting Henk's parallel top down parser to a backtracking serial parser, and comparing it with extensionally equivalent alternatives.
      3) John Hale's paper on explaining the accessibility hierarchy using sophisticated syntactic analyses combined with sophisticated linking theories
      4) Although not directly about MG parsing, everyone should read this paper by John Hale

    41. @Omer: What kind of data motivates you to state something as a generative rule vs a filtering constraint?

      To lay bare my plan in advance: I don't think there is any data that motivates you to state it one way vs the other. In that case, this is an accidental property of your analysis.

    42. @Greg: I didn't say that I have data that conclusively favor one over another. I am just not on board with going from that to saying that the distinction therefore doesn't matter. And I have reasons to suspect that we should prefer stating things in terms of generative rules rather than filtering constraints – namely, computational efficiency. (Note that we are not talking here about computational complexity theory; there is every reason to suspect that, e.g., multiplying by a constant matters a lot in the brain.) This is why I said, above and in the linked-to post, that this will get into the domain of less-than-cut-and-dry arguments, and it will be a hard, hard thing to probe. I'm just not comfortable going from that to the view that it therefore doesn't matter.

    43. @John re:

      "In that case, it is unfalsifiable and therefore not a scientific theory. But indeed it is easy to see how it could be wrong: simply provide evidence of a non-language-user who can employ recursion. Unfortunately, no amount of studying human beings will do this: so we see that (ultra) Minimalism is really a theory about non-human communication: that is, the theory that n.h.c. cannot be recursive. It is of course an empirical question whether this is true or not."

      Two comments. First, Chomsky's claim at the general level is hard to gainsay for it rests on the trivially observable fact that humans are linguistically "creative" in the sense that they produce and understand sentences never before encountered by them. To specify the objects that this observable capacity covers requires specifying a finite set of rules that can apply repeatedly and without limit. In other words, if you can't list the linguistic objects the only way to specify them is recursively and if Merge is what we call that recursive ingredient, then it is hard to see how Chomsky could be wrong given the obvious fact of linguistic productivity.

      This is similar to noting that it is hard to see how physics could refute the claim that the Atlantic Ocean exists, a very definite empirical claim whose falsifiability is hard to imagine. Ditto with recursion.

      Now, as for the particular detailed conception of recursion that Chomsky provides (Merge as a set-forming operation subject to Extension and Inclusiveness), this is very much refutable in the sense that one can find arguments pro and con. Indeed, if tucking-in exists or rules can add novel formatives in the course of the derivation (say indices are required and provided under movement), then Chomsky's particular proposal will need lots of revision. So, the general claim is trivially true and the specific one is quite definitely empirically challengeable (I prefer this to falsifiable as I don't believe that many claims are falsifiable anywhere).

      Second point: minimalism does not rest on the assumption that humans are uniquely capable of recursion. This again is taken to be a fact (nothing does language like humans do) and the program tries to accommodate this fact by reducing the differences between us and them to the smallest possible difference. Say this fact is wrong and that we find some other animals with the exact same kind of recursive powers that we have. I am not sure that would change much except that now we would need to explain why "they" don't talk like we do and how we and they are different from all other animals that don't have this recursive capacity. Species specificity is, IMO, mainly a red herring, though there is no current evidence to suggest that non-human animals have the kinds of recursive powers evident in our linguistic behavior.

    44. @Omer: So then the distinction between generating vs filtering is an accidental property of your analysis given the data. My position is just that, before we get emotionally attached to accidental properties of analyses, we should try to understand their essential properties.

      There is a deep philosophical question about when two programs implement the same algorithm, but one that many brilliant people have thought deeply about. Ed Stabler discusses this in detail in the context of linguistics.

      From the perspective of the kind of data we are using to motivate the theory, we cannot discriminate between analyses with the same essential properties. I am saying that I view them as equivalent ways of describing the competence facts, and the accidental details I think of mostly as meaningless accidents, but maybe also as different choices about how to realize the same essential analysis in a processing model.

      I think that your alternative is making an arbitrary distinction between 'obvious' notational accidents (like the color of ink you use) and 'less obvious' ones (like generating vs filtering), neither of which has any connection to the data you are looking at.

    45. @Greg: I don't disagree with anything you say here. But my position is that certain undermotivated (and therefore, tentative) commitments have – in the history of scientific inquiry of all kinds, I think – turned out to have great heuristic value in theory development. Given that they are undermotivated, there is by definition no principled way to choose which ones to make; you have to go by your hunches, and your choices are vindicated (or not) by the success or failure of the theory that these choices give rise to. I think equating the choice between generation and filtration with choice of ink color / the shape of braces / etc. is not appropriate, here, because – given what I said earlier (that things that don't matter to computational complexity theory might still matter when it comes to implementing grammar in the brain) – I don't think we can say with any certainty that this choice will be inconsequential for our object of study. Of course, I'm not asking you to buy my heuristic commitments; I'm trying to explain why I find them defensible despite what you're saying.

    46. This comment has been removed by the author.

    47. @Omer: Sorry for dropping out for a few days, but it seems Greg has made exactly the same arguments I would have. I would like to add one more example, and then there's one specific point that I find troubling about your perspective and I'm curious how you resolve it for yourself.

      First, to go back to the relation between grammar and parser, your assumption that the parser uses or calls the grammar still doesn't entail that the same encoding is used. For instance, in a lexicalist framework like Minimalism the grammar can be specified as a list of lexical items. That's a very simple data structure, but a parser may opt for a more complex prefix tree representation to improve retrieval speed. I don't think syntactic theory wants to say anything about this specific issue. But if we accept that, where do we draw the line?
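      A minimal sketch of that lexicon example (invented words; the "prefix tree" here is just an ordinary trie): the list and the trie are different encodings of the same lexicon, and nothing in the grammar-side specification dictates which one the parser uses.

```python
# Same toy lexicon, two encodings: the flat list is the "grammar-side"
# specification; the trie is a parser-side re-encoding that speeds up
# prefix-based retrieval. Words are invented for illustration.
LEXICON = ["cat", "catch", "car", "dog"]

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # descend, creating nodes as needed
        node["$"] = True                    # end-of-word marker
    return root

def lookup(trie, word):
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

trie = build_trie(LEXICON)
print(lookup(trie, "catch"))  # True
print(lookup(trie, "ca"))     # False
# Both encodings answer exactly the same membership questions:
print(all(lookup(trie, w) == (w in LEXICON)
          for w in ["cat", "ca", "dog", "do"]))  # True
```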

      This immediately leads to my main point of confusion. I completely agree with you that research is guided, for a good part, by hunches. I personally do not care what intuitions or ideas researchers use as fuel for their research as long as it produces new results. But it seems to me that your position is (or risks being) exactly the opposite of that: by committing to a literalist interpretation, you actively block off research paths that take a less literalist stance. Whatever result somebody produces, e.g. showing that one can freely switch between features and constraints (yes, I obviously have some skin in this particular game), you can just dismiss as non-literalist and hence not obeying what you have defined for yourself to be the rules of the game. If the results fit with your agenda, you may use them as additional support, otherwise you can dismiss them on literalist grounds. You are never forced to fully engage with the ideas. This probably isn't a fair characterization of your position, but as I hinted at above, I really don't understand how the line between substantial and notational difference is drawn in this case, and what it is meant to achieve.

      Now you might say that the same risk exists with the non-literalist stance: dismiss every idea that hinges on notation as ontologically over-eager malarkey. But that's not the case, and here's why: a theory or piece of notation, by itself, means nothing, just like a formula of first-order logic is just a series of symbols. Both need a specific interpretation attached to them. So a literalist may posit notation N as cognitively real, but what they actually mean is that notation N under some natural interpretation I is cognitively real. As a non-literalist, I can then study this pair [N,I] and look at other variants [N',I'], [N'',I''] that yield the same results for the domain of inquiry. And that way I'll work my way towards a more abstract understanding of what's going on until I can state the essence of [N,I] without being tied to N or I. For that reason, it is never prudent to outright dismiss a productive literalist claim, one that amounts to a new claim about [N,I], because that may be the starting point for a fruitful research program. The only thing I can dismiss as a non-literalist are negative arguments, those of the form "[N,I] is better than [N',I], so N is better than N' and nobody should use N'". These are inconsistent arguments that actively lock off research paths and easily end up impeding progress.

    48. @Thomas: I agree with you that one's hunches do not an argument make. And I therefore have no expectation that someone who is doing syntactic theory in a constraint-based framework, or building parsing software using exploded CFGs with two million category labels, would stop what they're doing just because Omer's hunches happen to say otherwise.

      But these types of commitments, I think, are indispensable for linguistics qua cognitive science (to be clear, I mean having some tentative commitments about what the psychological reality of our postulates is, not necessarily having my particular commitments on the issue). We are not studying the formal properties of some abstract object; we are studying the formal properties of a system that underlies an aspect of human behavior, a system ultimately implemented in wetware. So yes, go ahead and choose the commitments that make your work the most productive, and make your results make the most sense. Just don't expect me a priori to be moved by claims of "x being formally translatable to y"; it will depend on what x and y are.

    49. @Omer: I just want to clarify one important point. Of course we're talking about a cognitively real object. The issue under debate is not the cognitive commitment of posited theories, but the expression of that commitment. That's what I tried to convey with the discussion of interpretations: cognitive commitments are necessarily independent of notation because whatever [N,I] is supposed to achieve, there will be an equivalent [N',I']. Characterizing this commitment independent of N is a good thing, it will allow you to switch N as needed to simplify your work while retaining and sharpening the underlying ideas.

      A hard literalist stance precludes such an approach and actively discourages thinking in these more abstract terms. Literalism achieves exactly the opposite of what it is supposed to --- it fails to seriously engage with issues of cognitive reality because it squeezes them into the narrow corset of notation.

    50. @Thomas: Many of the relevant claims concerning "equivalence" are only valid given certain contingent background assumptions. E.g. an overgenerate-and-filter architecture is only equivalent to a crash-proof architecture under particular assumptions regarding memory limitations (or lack thereof) and running time. I don't see why I'm expected to buy those assumptions in the general case. (You are free to make them, of course; that would just be an instance of you following your hunches.) In Marr-ian terms, these are algorithmic concerns rather than computational ones; but asserting that one is not allowed to weigh algorithmic concerns when thinking about these issues would be yet another heuristic hunch, no more. And it is one that I reject. You are free to adopt it.

    51. @Omer: I think I've been following our discussion pretty well so far, but I don't understand at all what the last reply has to do with what came before.

      The assumptions for equivalence claims (weak, strong, succinctness, memory usage, processing, etc.) have little to do with what I mean by an interpretation or the equivalence of notation-interpretation pairs like [N,I] and [N',I']. Similarly, nobody is saying that algorithmic concerns cannot inform computational ones, just that notation is not the glue that keeps these layers together.

    52. It depends what falls under the heading of "notation" for you. If generate-and-filter vs. crash-proof is a notational difference (as I've heard it called before), then it is a notational difference that implies an algorithmic difference (see above), and so the choice between the two is not one of those things you can "switch as needed."

    53. You most definitely can because there is no need to implement a generate-and-filter specification in a generate-and-filter way. That's where your parser transparency is supposed to kick in, but I doubt that it can be defined in a way that allows the extra detail needed for an algorithmic implementation without generating enough leeway for "non-transparent" implementations.

      To give a concrete example from parsing theory already mentioned by Greg earlier: if you want to do left-corner parsing with a CFG, you can either use a left-corner parser with the original CFG, or a top-down parser with the left-corner transform of the CFG. From the outside you cannot tell them apart, and conceptually they also describe exactly the same parsing mechanism. Both are transparent in the sense that the original grammar can be easily recovered in an automatic fashion, and neither is fully transparent because the grammar is no longer a unified object and is instead baked into the inference rules of the parser. You can even modulate the degree of transparency depending on whether one defines the parser as a triple consisting of 1) a parsing schema, 2) a control structure, and 3) a data structure, or instead combines all of that into some kind of automaton, which is what would actually be implemented in any real-world program.
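      To make the construction concrete, here is a minimal sketch (mine, not from the thread; the toy grammar, symbol names, and the naive recognizer are all illustrative) of the standard left-corner transform, paired with a bounded breadth-first top-down recognizer, showing that the transformed grammar accepts the same strings as the original:

```python
# Illustrative sketch: the standard left-corner transform of a CFG.
# A top-down parse of the transformed grammar simulates a left-corner
# parse of the original grammar, step for step.
from collections import deque

def left_corner_transform(rules, terminals, nonterminals):
    """New categories 'A-X' mean: the rest of an A whose left corner X
    has already been found."""
    new = []
    for A in nonterminals:
        for a in terminals:                    # shift a terminal left corner
            new.append((A, (a, f"{A}-{a}")))
        new.append((f"{A}-{A}", ()))           # goal reached: A-A -> epsilon
        for B, rhs in rules:                   # project: B -> X beta yields
            if rhs:                            #   A-X -> beta A-B
                X, beta = rhs[0], tuple(rhs[1:])
                new.append((f"{A}-{X}", beta + (f"{A}-{B}",)))
    return new

def recognizes(rules, terminals, start, tokens, max_steps=5000):
    """Naive breadth-first top-down recognition with a step bound."""
    expand = {}
    for lhs, rhs in rules:
        expand.setdefault(lhs, []).append(tuple(rhs))
    agenda, seen, steps = deque([(tuple(tokens), (start,))]), set(), 0
    while agenda and steps < max_steps:
        steps += 1
        inp, stack = agenda.popleft()
        if (inp, stack) in seen:
            continue
        seen.add((inp, stack))
        if not stack:
            if not inp:
                return True
            continue
        top, rest = stack[0], stack[1:]
        if top in terminals:
            if inp and inp[0] == top:          # match a terminal
                agenda.append((inp[1:], rest))
        else:
            for rhs in expand.get(top, ()):    # expand a nonterminal
                agenda.append((inp, rhs + rest))
    return False

terminals = {"d", "n", "v"}
nonterminals = {"S", "NP", "VP"}
cfg = [("S", ("NP", "VP")), ("NP", ("d", "n")), ("VP", ("v", "NP"))]
lc = left_corner_transform(cfg, terminals, nonterminals)

s = "d n v d n".split()
print(recognizes(cfg, terminals, "S", s),
      recognizes(lc, terminals, "S", s))      # True True
```

      The payoff of the transform in real applications is that it also tames left recursion, which this toy grammar lacks; but even here the point stands: a left-corner parser over `cfg` and a top-down parser over `lc` traverse essentially the same search space, which is the sense in which they describe the same parsing mechanism.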

      Generate-and-filter vs. crash-proof is more technical, but you can play the same game. Often very fruitfully, and in a manner that tells you new things about the computational layer.

    54. @Thomas: I think this will be my last comment here, so I'll leave the last word to you. But from where I sit, it looks like you are presupposing your conclusion. I take the formulation of the grammar to impinge on the psychologically real steps taken in the algorithmic course of events. If you don't, then yes, obviously you can then change things around without committing to any changes at the algorithmic level. That seems tautological to me, though.

    55. @Omer: Do you reject the utility of the competence-performance distinction? I can make sense of what you are saying if I assume that you believe that we ultimately do not want to formulate the regularities between form and meaning independently of how they are computed by language users in real-time.

    56. This comment has been removed by the author.

    57. @Omer: I'll just conclude with a summary of my argument, which still stands as far as I'm concerned.

      1) In order for the tight link between grammar and parser to make sense, you have to be both a strong literalist and assume a strong version of parser transparency.

      2) The literalist stance is an impediment to research, rather than a catalyst. It is not a prerequisite of cognitive commitment. In fact, the non-literalist stance is better suited to studying language as a cognitively real object because it allows for more abstract characterizations that aren't tied to notational devices and are easier to connect with other concepts from cognitive science. And while the non-literalist can play around with literalist ideas, the opposite doesn't work.

      3) The notion of parser transparency you have in mind is too vague to support any concrete inferences. Take the example of the left-corner transform. The original grammar in this case still decides the steps taken by the parser, which is what you want, yet you seem to reject the construction as lacking in transparency. But presumably a left-corner parser operating on the original grammar would qualify, even though it is exactly the same parser. If even this isn't transparent enough, there's little to no room for any parsers at all. As far as I can tell, there is no way towards a principled, consistent answer here.

    58. @Greg: I wholeheartedly adopt the competence-performance distinction as useful and illuminating. But, of course, distinct does not mean disjoint; in particular, competence is one of several factors that go into performance. And since performance happens in real time and is underlain by wetware, and competence is implicated in performance, I think it is eminently reasonable to bring these considerations to bear on our model of competence.

      Let me also say something that, it seems to me, keeps getting lost in the shuffle: I am not trying to convince you (or Thomas) to change your position. I am merely insisting that there is nothing incoherent or methodologically problematic about my position (what I have called the cognitive-literalist position). It is only these critiques of the cognitive-literalist position that I take issue with.

    59. Small interjection on this exchange. I think we have specific reasons to distinguish generative rules vs. filtering constraints. Surveying a lot of different studies on different phenomena, I think that we find some things that are unacceptable and un-representable, and others that are judged unacceptable but that people have little trouble representing. Different phenomena call for different accounts, I suspect.

      Though I've spent much of my life worrying about grammar-parser relations, I think there's a need for caution in trying to connect them. Parsing differs from the standard concerns of grammarians in at least a couple of ways. (i) Building representations in real time. (ii) Doing so with little or no prior knowledge of the intended meaning. (i) is task independent, (ii) is task specific.

  6. @William

    Well, one could do worse than update Lewis & Vasishth's proposal with a more realistic MG instead of the CFG they use.

    Elsewhere, though, John Hale and others (including me) have advanced precise proposals about how MGs can be incorporated into a model of incremental comprehension. These models can furnish quantitative predictions for behavioral and neurophysiological data. I'd be interested in knowing what's missing from this line of work that you see as necessary for your own interests.

    Some recent examples:

    Hale 2016
    Brennan 2016

    1. Quantitative predictions are great. However, I think we have far greater tasks as neuroscientists than to correlate quantitative predictions of parsing states with activation patterns in different brain areas.

      What I mean is that we should be aiming to solve the mapping problems that are nicely described by Poeppel & Embick (P&E 2005; E&P 2015; P 2012). That is, we want to take a theory of what language is and figure out how the hell this could be implemented neurobiologically. This means we should have an understanding of what the elements of language are beyond those provided by syntactic theory and develop hypotheses about how brain structures might implement those elements of language, and doing this will require much more than being able to correlate parsing states with brain activations during comprehension.

      So what I want is to develop a theory of the faculty of language that will allow me to pursue that agenda. This requires a "list of parts" that comprise FL. Chomsky 1995 provides a list of parts. What is the list of parts provided by the literature that is continually recommended to me? I am very serious - I would love a summary of this.

    2. We certainly agree on the problem(s) that warrant attention! I guess all I’m doing is echoing Greg’s point that the existing literature offers candidates for the parts list you’ve described.

      Thinking of stored structures, the notion of chunking discussed in, among other places, Rick Lewis' 1993 dissertation, Tim O'Donnell’s 2011 dissertation, or John Hale’s 2014 book all offer different takes on how to derive cognitively plausible stored representations that conform to a formal system like Minimalism.

      On your other points, MG-friendly parsers, including Hale’s automata, formally describe machines that retrieve, modify and substitute stored structures.

      Now, these theories aren't sufficient, alone, to connect to neurobiology. P&E15: "although cognitive theories and [neurobiological] theories are advancing in their own terms, there are few (if any) substantive linking hypotheses connecting these domains" (357). So, let's add some linking hypotheses!

      Here’s one: a neuronal circuit consumes oxygen in proportion to the number of constituents that are recognized word-by-word. This is more-or-less what Pallier et al. propose in their 2011 PNAS paper. (I say something similar in a 2012 Brain and Language paper.) Hale’s information theoretic metrics offer an alternative linking hypothesis, something like: oxygen consumption correlates with changes in probability mass over possible structures (which might fall out of how structures are chunked in memory…)
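      As a toy illustration of what such a linking hypothesis looks like as a concrete predictor (this sketch is mine; the bracketing format and the particular count are simplified, and the published predictors differ in their details), one can compute a word-by-word constituent count from a bracketed parse:

```python
# Illustrative sketch: a word-by-word predictor for the hypothesis that
# oxygen consumption tracks constituent building. For each word, count
# the constituents that are completed at that word.

def constituents_closed_per_word(bracketed):
    """Given a bracketed parse like '[[the dog][chased [the cat]]]',
    return, for each word, how many constituents close right after it."""
    tokens = bracketed.replace("[", " [ ").replace("]", " ] ").split()
    counts = []
    for tok in tokens:
        if tok == "]":
            counts[-1] += 1        # a constituent ends at the last word
        elif tok != "[":
            counts.append(0)       # a new word, nothing closed yet

    return counts

print(constituents_closed_per_word("[[the dog][chased [the cat]]]"))
# [0, 1, 0, 0, 3]
```

      A vector like this is what would then be convolved with a hemodynamic response function and fit against region-by-region activity; swapping in a different count (open nodes, surprisal, entropy reduction) is exactly what comparing alternative linking hypotheses amounts to.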

      Maybe I’m way optimistic, but I think the current literature actually offers a few candidates for satisfying P&E’s mapping problem. We could be in a position to start comparing alternative proposals, e.g. by seeing which of several candidate linking hypotheses makes the best predictions for some neural signal of interest (e.g. turning the knob on how predictive the parser is re constituent building, do I get better or worse fits against oxygen-consumption in region X?)

      I fully agree that these correlations alone don’t answer the explanatory challenge posed by P&E (and Chomsky long before them): why does the brain carry out operations in a certain way and not some other way? I’d say we probably need a more granular parts list before making much headway on the explanatory issues.

      I'm still having an embarrassing time trying to embed links:

      Rick Lewis' 1993 dissertation
      Tim O'Donnell’s 2011 dissertation
      John Hale’s 2014 book