Saturday, September 10, 2016

The Generative Death March, Part 1

It’s been a bad few weeks to be a Chomskyan. It seems like everywhere you turn, someone is claiming that the core ideas of generative linguistics are relics of a bygone era, that the original lessons of the cognitive revolution can be discarded and that those of us who study the language faculty from the standpoint of classical cognitive science are standing around throwing hissy-fits while the real scientists repeatedly show us how the facts on the ground disprove every idea we ever had. 

In Scientific American, Ibbotson and Tomasello (henceforth IT), here, take issue with two key features of the generative enterprise. First, IT tell us that languages do not have “an underlying computational structure”. Then IT tell us that there is no special biological foundation for language, no innate structure that determines how languages can and cannot be structured. IT tell us that “instead, grammar is the product of history (the processes that shape how languages are passed from one generation to the next) and human psychology (the set of social and cognitive capacities that allow generations to learn a language in the first place).”

As I inch along to my death alongside the ideas of generative linguistics, IT's article looks not like a revolutionary new idea, but like the revival of some very old and discredited ideas with no answer to the fundamental questions that bankrupted them to begin with.

IT characterize their revolution thus: “They [children, JL] inherit the mental equivalent of a Swiss Army knife: a set of general-purpose tools—such as categorization, the reading of communicative intentions, and analogy making—with which children build grammatical categories and rules from the language they hear around them.”

The generative linguist is more than happy to grant the child these general-purpose tools. Surely the child must be able to identify regularities in the environment, to read communicative intentions, and to draw analogies. The central idea of Universal Grammar is that these things are not sufficient to explain the character of the languages we come to acquire. And the goal of the generative linguist is to add as little as possible to UG so that these abilities have increased potency. UG provides the dimensions along which analogies can be drawn and constrains the character of the representations that learners construct when faced with language data. It makes no claims about the trajectory of development, about the contributions of social cognition, memory architectures, cognitive control or statistical inference mechanisms. Rather, it says that, given these endowments, we also need one more thing. Thus, any evidence that nonlinguistic cognitive systems are involved in language acquisition is entirely silent about the existence of UG, unless it can be shown that those systems do work that we otherwise thought UG responsible for. Note also that the generativist would generally be delighted to learn that something they thought fell into their purview is better explained by something extralinguistic, for it allows UG to be smaller, which everyone agrees is the strongest scientific position.

So, what kinds of things is UG responsible for? In an earlier post here, I worked through one example concerning the treatment of subjects and the interpretive asymmetries between sentences like these:

(1) a. Norbert knows how proud of himself Alexander was after the conference.
b. Norbert knows which picture of himself Alexander posted after the conference.

(1b) is ambiguous in a way that (1a) is not, a fact not exhibited in speech to children and not obviously explained by factors external to grammar.

Here is another relevant case, dating back to several papers from the 1970s by Chomsky and Bresnan:

(2) a.  Valentine is a good value-ball player and Alexander is too
b. Valentine is a better value-ball player than Alexander is

In both of these examples, there is no pronounced predicate in the second clause, but we fill in this predicate in our minds as equivalent to the predicate in the first clause (i.e., a good value-ball player). Is this unpronounced predicate represented in the same way in the two sentences? Evidence suggests not. For example, in some contexts, they behave differently.

(3)  a. Valentine is a good value-ball player and I think Alexander is too
b. Valentine is a better value-ball player than I think Alexander is 
c.  Valentine is a good value-ball player and I heard a rumor that Alexander is too
d. * Valentine is a better value-ball player than I heard a rumor that Alexander is

The fact to be explained here is why the child learner, when building representations for (2), doesn't treat the silent predicate in the same way in the two cases. Both can be interpreted as identical to the main clause predicate in (3a/b); however, this dependency can hold across the expression "heard a rumor that…" in the coordinate (3c) but not in the comparative (3d). It is an analogy that could be drawn but apparently isn't. Moreover, it seems that (2b) has a structure analogous to the structure of interrogatives.

(4) a. What do Valentine and Alexander like to play together?
b. What do you think that Valentine and Alexander like to play together?
c. * What did you hear a rumor that Valentine and Alexander like to play together?

In (4) there is a dependency between the wh-phrase "what" and the verb "play". In (4b), we see that this dependency can be established across multiple clauses (just like the coordinate and comparative ellipses), and in (4c) we see that it cannot be established across "hear a rumor that" (like the comparative ellipsis and unlike the coordinate ellipsis).

Evidently the analogy that the child learner draws when acquiring English is that comparatives have the same kind of structure as wh-questions. Why do children draw this analogy and not the analogy between the comparative and the coordinate ellipsis, which shares more obvious surface features? These patterns, both the analogies that our grammars make and the ones that are tempting but not taken, have been at the center of the generative enterprise since the 1960s. They hold this privileged place because they invite grammar-internal explanations in the form of computational/representational mechanisms out of which sentences are built. To my knowledge, nobody in the field of usage-based linguistics has even attempted to show how such facts follow from "categorization, the reading of communicative intentions and analogy making." Their silence suggests one of two things: (a) that their Swiss Army knife doesn't have the right tool, or (b) that they have dismissed such cases as irrelevant because they haven't seen how to integrate them with things they do understand. I actually think the answer is a combination of these two, a point I will elaborate on in a second post.

In the meantime, when the usage-based theorists have something to say about the range of grammatical phenomena, and the deep similarities found among widely diverse languages that animate discussion in generative syntax, we will be ready to engage. Until then, my friends and I will continue our long slow march to scientific obsolescence.

31 comments:

  1. It's true that IT do not address phenomena such as these, which are indeed intriguing and worthy of in-depth study. However, it is completely unclear how innate knowledge could explain these facts, because (as far as we know) they are specific to English. I would be impressed if someone showed that there is an abstract pattern that is valid for all languages that helps explain these English facts.

    Replies
    1. Everett's Fallacy: x is an innate feature of the human linguistic capacity, therefore x must be observable in every human language.

      Unless you have an explanation for how "analogy", "intuiting", and other ill-defined notions lead to this knowledge ending up in the heads of English speakers, the only remaining explanation is that it was there in the first place, and perhaps in some languages specific properties conspire to make it unobservable.

    2. What remains surprising is the mathematical naivety that Tomasello shows, as someone who putatively does quantitative work. This includes (a) taking uneven distributions in child language use as evidence for memorization rather than a productive grammar, and (b) taking the absence of evidence to be evidence of absence (e.g., taking the failure to observe a construction, when only its periphrastic form is attested, to mean it's ungrammatical, as in IT's discussion of "donate"). Language acquisition theorists of all persuasions have argued against these views.

    3. For (a), see here (http://facultyoflanguage.blogspot.com/2013/04/give-nim-give-me.html) and for (b) see here (http://facultyoflanguage.blogspot.com/2016/04/indirect-negative-evidence.html).

    4. @Martin. It is also unclear how underspecified notions of analogy, categorization and intuition will lead learners to these conclusions. And since universality is not evidently an important feature of usage-based theories, they should not be as constrained as the generativist. If there were an answer within the usage-based framework that validated their approach, I think that would have a much stronger impact on our thinking than empty assertions about the demise of our field.

    5. Of course, if X is innate, it need not be observable directly, but it MUST constrain languages in such a way that there are empirical consequences. What are these? I compare languages worldwide, and I do find lots of universals, but not many that might help with acquisition. And please note that I'm not advocating IT's "solution" to the acquisition problem – I think they are pretty naive about syntax. What I'm saying is that it's quite unclear how UG provides a solution that has empirical consequences.

    6. I guess what Omer has in mind is the following. If a language has property X, then X is (universally) constrained in the following way. If a language lacks X, the relevant constraints are latent and therefore invisible at the surface. However, since we want our constraints to be as general as possible, this seems to me an ultimately undesirable situation.

    7. @Olaf: I don't see what is undesirable about this. It is par for the course in linguistics and, I suspect, in all empirical sciences. (Relativity is not damaged by the fact that I cannot see its effects in my backyard.)

      Example: I have noted (in work with Masha Polinsky) that if a verb X covaries in person/number/gender features with a nominal argument Y, then either (i) X and Y are clausemates, or (ii) X is in a higher clause than Y. But never: Y is in a higher clause than X. I have also argued, separately, that there is no such thing as "abstract" agreement (i.e., agreement that is null across the entire person/number/gender paradigm). This means that in a language like, say, Korean, the former property, concerning the structural relationship between verbs and nominals, is unobservable.
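
      To make the shape of that generalization concrete, here is a toy encoding (the numeric clause-depth representation is my own illustrative simplification, not an implemented analysis):

```python
# Toy encoding of the generalization above: a verb X can covary in
# person/number/gender features with a nominal Y only if X and Y are
# clausemates or X is in a higher clause than Y, never the reverse.
# Depth 0 = matrix clause; larger numbers = more deeply embedded.

def agreement_possible(verb_depth: int, nominal_depth: int) -> bool:
    return verb_depth <= nominal_depth

print(agreement_possible(1, 1))  # clausemates: True
print(agreement_possible(0, 1))  # verb in a higher clause: True
print(agreement_possible(1, 0))  # nominal in a higher clause: False
```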

      It's true that if everything that was true of our linguistic capacity was surface-true in every language, our jobs would be easier. But I think it would also be way less challenging :-)

    8. I agree, Omer, I think. I guess what I have in mind is the following. The constraint we're after to capture your generalization has to do with how syntactic dependencies can be encoded. Ideally, this constraint exceeds the phenomenon of verbal agreement. So if a language lacks verbal agreement altogether (Korean, Danish), one would hope to be able to discern the constraint in different phenomena in Korean and Danish. Otherwise the constraint ends up being structure-specific, and that wouldn't be an ideal constraint to have as part of your UG. So perhaps what I have in mind is more programmatic than "analysis in practice", because you cannot know in advance what the generality of the constraints we seek looks like.

  2. @Martin, the appropriate comparative question here is whether there is a language that has predicate ellipsis like English, but that treats the comparative the same way as the coordinate and differently from wh-questions. I'm pretty sure there isn't, but I haven't studied the phenomenon fully.

    Replies
    1. I would be really impressed if your explanation made a prediction that can be tested readily. Maybe "predicate ellipsis" can be compared across languages (though it hasn't been studied systematically, as far as I know), but comparatives and coordination patterns vary quite widely across languages, so that it isn't obvious at all what the relevant claim about UG (that is compatible with what we know so far) might be. So here's my challenge to you: Tell me what sort of language couldn't exist, according to this particular theory of UG.

    2. I can tell you what sort of grammar should be very hard to find. For example, one that uses mirror-image rules systematically, one that regularly allows adjuncts to form A'-chains into islands, one in which anaphoric dependents systematically c-command their antecedents, ones in which movement involves lowering rather than raising. Such Gs should be very thin on the ground. Now, you will come back and tell me that I am talking about Gs and you are interested in languages. And I would reply that's correct: we are talking about different things, which though related are not identical. But if GG is right, then Gs are the right objects of study, not languages. Is this a problem? Not really. But it does mean that from where I sit your challenge is ill posed. It should be what kind of G can't exist, and for this question we have quite a bit to report.

    3. Here is a prediction that can be readily made, specific to the phenomenon in the post (though Norbert's points are exactly right as well). If a language has predicate ellipsis like English (e.g., the aux remains but the remainder of the VP/AP is elided) and also has comparatives over predicates that involve ellipsis, then the comparative ellipsis will pattern like wh-movement (i.e., be island-sensitive). Moreover, we can make the prediction that if a comparative construction is sensitive to the complex-NP island we observe here, then it will also be sensitive to relative clause islands, adjunct islands, etc. (i.e., you won't find a construction which obeys only a subset of the islands). Further, if you have a comparative construction like this, it will not only be sensitive to islands, it will also induce islands for other island-sensitive phenomena (e.g., topicalization, clefting, wh-movement, relativization, etc.).

    4. Thanks, Jeff, this sounds like a pretty concrete prediction. I wish more generative authors spelled out their predictions in a clear way, and I wish that people turned this into an actual research programme, e.g. by applying for funding to test the predictions in a systematic fashion. It might be possible eventually to test the claims, but it might not (in practice), given that it's so hard even to know what should count as a VP, and given that few languages have comparative constructions that are at all like English comparatives. So it may be that the argument will remain at the abstract level, and we'll have to hope that somehow our opponents will go away or die, which would be really sad...

    5. Here is an excellent paper by Jason Merchant which is helpful for thinking about the typological and theoretical issues together: http://home.uchicago.edu/merchant/pubs/gk.comps.jgl.pdf
      I suspect Jason would have more to say about this topic, given his expertise in ellipsis and comparatives.

    6. I think Jeff's point can be made even more general, and help to address Martin's concern about English: if a language L with grammar G has a nonreduced clausal comparative (as German, English, Greek, and Russian do) and if L has a kind of predicate anaphor (whether involving ellipsis or a pro-form of some kind) that appears in both comparatives and non-comparatives (in coordinations, for example, but not just in those), then putting that predicate anaphor in a clausal comparative where the anaphor is separated from the marker of the standard of comparison ("than", "als", "apoti", "chem", etc.) by an island will be ill-formed. One version of this argument (from Norwegian and German) is given in Bentzen et al. (2013), for example.

    7. Nice to see the Greek and Norwegian data, but I'll get really excited once there is evidence from, say, 10-15 languages from different families and three different continents. Baker (2015) did such a study on case, in a really admirable way, but he didn't come up with any strong universal generalizations. It may well be, though, that ellipsis phenomena are less variable across languages than case – one would have to study them systematically.

    8. Well, the only big typological study of comparatives that also treats a bit of ellipsis is Stassen's 1985 book ("Comparison and Universal Grammar"), and he has to rely on secondary lit for descriptions of predicate ellipsis (mostly gapping and conjunction reduction), and this is highly partial and problematic. He posits a number of typological universals relating comparatives and ellipsis, but the data are too sparse to conclude much (he concludes). But in the crucial cases of interest to us here, he in a way concedes Jeff's point, writing "the index on the deleted predicate [what's left over after predicate "identity deletion"] will be pronominalized, relativized and adverbialized into some locative or instrumental case .... [it] will be syntacticized into a pronominal ... item with the original meaning 'to/at/by which'" (p.312), citing Russian and Albanian among others. If this is true, then we expect such wh-pronominals to be subject to island constraints. In other words, he doesn't believe that the plain vanilla ellipsis strategies available in those languages in his sample that have them (the sample seems to be 110 areally and genetically diverse languages) are actually the ones that will appear in the comparatives that show reductions: instead, a wh-pronominal appears. One reasonable interpretation of this, it seems to me, is that this is precisely Jeff's point. We just need the island facts to put the icing on the cake.

  3. One reason I'm sympathetic to the IT view (or parts of it, anyway) is that the data are more nuanced, and connected to the practicalities of usage, than a generative account admits. Consider this substitute for (4c):

    (i) Which opponents did you hear some news that Serena & Venus hate to play against?

    The point is not that an example can be improved, but that the data are not nearly as black & white as originally conceived. Added to this, I think the contrast you've presented in (3a) & (3b) is questionable: how do you know those are the same? The latter sounds worse to me, and I reckon naive judgments would reveal as much. I'm not trying to quibble, but rather to point out that the case you're making is built on data that have been pre-interpreted. Usage-based accounts, I think, take this variation in acceptability seriously as part of what needs to be explained, beyond just the mere existence of strong contrasts.

    More pertinent to the points of this post, I don't think usage-based accounts are conspicuously silent about the examples you show. Here's one silly but possible analysis: in English, comparatives and coordination structures differ in the probability of the right-daughter element beginning with an S-in-S structure. Similarly, wh-dependencies into complex NPs are rare, but not impossible and not unheard of. Of course, this raises the question of how things got that way, but that is not a question generative accounts answer either. A toy sketch of what such an account might compute follows.
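
    To be concrete, here is the kind of computation I have in mind; the counts are invented purely for illustration, and a real account would estimate them from a parsed corpus of usage data:

```python
# Toy sketch of a usage-based proposal: estimate, per construction, how
# often the silent-predicate dependency crosses a complex-NP (island)
# boundary. All counts below are invented for illustration only.
from collections import Counter

observations = [
    ("coordination", "local"),
    ("coordination", "cross_clause"),   # "...and I think Alexander is too"
    ("coordination", "cross_island"),   # "...a rumor that Alexander is too"
    ("comparative", "local"),
    ("comparative", "cross_clause"),    # "...than I think Alexander is"
    # crucially, no comparative dependencies into islands are attested
]

counts = Counter(observations)
for construction in ("coordination", "comparative"):
    total = sum(n for (c, _), n in counts.items() if c == construction)
    islands = counts[(construction, "cross_island")]
    print(f"P(island-crossing | {construction}) = {islands}/{total} "
          f"= {islands / total:.2f}")
```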

    Replies
    1. Regarding (3a) and (3b), Alex Drummond, Dave Kush and I conducted an acceptability rating study with naive speakers which shows the relevant interaction: (3a-c) are roughly equivalent and (3d) is markedly worse. We will surely publish these data soon, when we finish analyzing the relevant parts of the child-directed speech (which our initial analyses suggest provides almost no data of the kind that would allow learners to distinguish coordinate ellipsis from comparative ellipsis). So the usage-based theorist has no room to maneuver here about the data.

      As for your second point, I think the generative analysis has more to offer than you think. There are essentially three types of dependency that are relevant here: anaphoric dependencies, which can be cross-sentential; binding dependencies (everyone thinks that he is a force for good), which are obligatorily intra-sentential, but which originate in an A-position; and A-bar dependencies, which are obligatorily intra-sentential, but which originate in an A-bar position. When learners see that VP ellipsis is possible across sentence boundaries, it follows that the dependency is anaphoric and so will not be island-sensitive. Assuming that something about the meaning of comparatives with ellipsis requires them to be treated as A-bar dependencies (e.g., something requiring degree abstraction), the island facts follow. No string likelihoods required. And this is a good thing, because speech to children contains very little of the relevant data to generalize from. A toy rendering of this inference appears below.
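
      To make the learner's inference explicit, here is a toy sketch (the two cues and the predicate names are my own illustrative simplification, not anyone's implemented learning model):

```python
# Toy rendering of the inference sketched above. Two cues are assumed:
# whether the dependency is attested across sentence boundaries, and
# whether the construction's meaning requires degree abstraction.

def classify_dependency(cross_sentential: bool,
                        needs_degree_abstraction: bool) -> str:
    """Classify a silent-predicate dependency from the two cues."""
    if cross_sentential:
        return "anaphoric"   # e.g., VP ellipsis in coordinations
    if needs_degree_abstraction:
        return "A-bar"       # e.g., comparative ellipsis
    return "binding"         # A-position dependency

def island_sensitive(dep_type: str) -> bool:
    # On this view, only A-bar dependencies are island-sensitive.
    return dep_type == "A-bar"

# Coordinate ellipsis: attested cross-sententially -> anaphoric -> no islands.
print(island_sensitive(classify_dependency(True, False)))    # False
# Comparative ellipsis: needs degree abstraction -> A-bar -> islands.
print(island_sensitive(classify_dependency(False, True)))    # True
```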

      But my point about the conspicuous silence is not that it is impossible to engage using their theoretical vocabulary; it's that they dismiss the fundamental results of generative grammar rather than trying to engage them. I will talk more about this in a subsequent post.

    2. @Jeff: A clarification (playing the devil's advocate here): do people who accept (3b), "Valentine is a better value-ball player than I think Alexander is", also accept "Valentine is a better value-ball player than what I think Alexander is"? (Some English speakers accept the latter; I don't know what percentage.) No idea if child-directed speech has any examples of either (the version with "what" or without), but if the wh-versions exist in the data, won't a learner have evidence for "analogizing" with wh-moved questions rather than with coordinate ellipsis?

    3. Utpal, we didn't test the wh-comparative, but it is pretty clearly not a part of mainstream American English grammar, so I'd be surprised if the wh-comparative is widely judged to be acceptable. And, as far as CHILDES goes, there are no long-distance comparatives at all.

    4. This comment has been removed by the author.

    5. Yeah, that's interesting. Clausal comparatives are correlatives or relatives in many languages. Chomsky, in the wh-movement paper (citing Bresnan), mentions the presence of overt wh- for some speakers of English, which is why I was curious.

  4. Maybe I am alone here, but the thing that most struck me about the IT paper is how poorly it identified the main features of the GG program. There are no references, quotes of positions, outlines of arguments, data points, nothing. Just stream-of-consciousness dissatisfaction. Clearly T is a big name (maybe I is too, I don't know), but the shoddiness of the argument and reasoning is really breathtaking. We live in dark times when this is published and tacitly endorsed as serious thinking.

  5. Peggy Speas: I'm reading a book (Robert A. Burton, 'On Being Certain') that has a very nice analogy for the widespread misunderstanding of innateness. He says that claiming that nothing about language is innate is like reasoning that, because Amazon has no recommendations for you the first time you ever sign in and then generates recommendations based on its experience with your and others' purchases, Amazon must have no specific built-in algorithm for determining recommendations.

    Replies
    1. But do you have actual knowledge that it does? There are a lot of smart folks at Amazon, and I'd be surprised if the algorithm didn't involve a hefty amount of machine learning; it is by no means fixed.

    2. Of course it involves machine learning. (What was the other option?) And of course it's not fixed; that's why my recommendations are different from other people's recommendations. Does that mean that Amazon "has no specific built-in algorithm for determining recommendations"?

      What all those smart folks at Amazon spend their time doing is devising the initial state of the machine that performs the appropriate generalizations; if the initial state were not rich and intricate, it would not require such expertise.

    3. A fixed algorithm (in technical language, a pure functional algorithm) will of course produce different results on different inputs, but it will produce the same result on the same input every time you run it. (Note that either you or other people buying more books constitutes a change in input.) The fixed algorithm of addition will produce different results for different summands, but always produces the same result for the same summands.

      I'm saying that Amazon's algorithm is probably not like this, and that its improvement over time is not just a matter of substituting one fixed algorithm for another.
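
      To illustrate the distinction in code (a deliberately toy sketch; the ToyRecommender class is invented for illustration and bears no relation to Amazon's actual system):

```python
# A fixed (pure) function: the same input always yields the same output.
def add(x: int, y: int) -> int:
    return x + y

# A stateful recommender: its output depends on accumulated history, so
# the "same" query can yield different answers at different times.
class ToyRecommender:
    def __init__(self):
        self.purchases = []

    def record(self, item):
        self.purchases.append(item)

    def recommend(self):
        # Trivial policy: suggest a "sequel" to the latest purchase.
        if not self.purchases:
            return "a bestseller"
        return self.purchases[-1] + ", vol. 2"

r = ToyRecommender()
print(add(2, 3), r.recommend())   # 5, then "a bestseller"
r.record("On Being Certain")
print(add(2, 3), r.recommend())   # 5 again, but "On Being Certain, vol. 2"
```

      In the sketch, the recommender's behavior changes over time without anyone substituting a new algorithm; that is the sense in which I mean the system is not fixed.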

    4. OK, but Peggy's point seems to be independent of whether Amazon's system is fixed in this sense, i.e. independent of whether, for some given set of book purchases, Amazon's recommendation system provides different output now than "it" did two years ago.
