Monday, December 3, 2012

I'm posting, with her permission, a comment sent to me by Lila Gleitman.

I am posting it here, rather than as a comment, because she cites papers to which I have added links.


Hi Norbert, I read your blog and there is much to say.   Re the header above (i.e., False Truisms - NH), I only fear that you, like the rest of the world, might be taking these findings as an excuse to write off the domain-specific, structure-dependent scheme for the lexicon that I have devoted myself to for the last several decades!   What we say in this new paper, though without room for discussion at all, is that only a minute, tiny, microscopically small set of the words are acquired "by" observation (and even this leaves aside that no one has a clue about how seeing a dog could teach you the "meaning" of dog, though it could spotlight (pun intended) the intended referent in the best cases).   We did not sample the words.   We deliberately chose, from the teeniest set of whole-object basic-level nouns, those very few that our subjects could guess with any accuracy at all (roughly, 1/2 the time, with all other words guessed correctly a laughable maximum of 7% of the time), and only from specially chosen "good" exemplars.   All the same, "syntactic bootstrapping" flirts with circularity unless there is another -- non-syntactic -- procedure to learn a few "seed" words, those that, once and painfully acquired, can help you build a representation of the input (something like a rudimentary clause, or at least enough to decide on the structural position of the Subject NP); it is that improved input that next makes a snap of learning all the words.   So if we now say (and we do) that there is a procedure for asyntactic, domain-general learning of first words, this alone can't do for all the rest (try learning probably, think etc. from watching scenarios in which these words are uttered -- or try even learning jump, tail, animal from the evidence of the senses, as Plato rightly noted).   So please, though I'm pleased at your response to this new work, don't forget that its role is minute over the lexical stock, though crucial as the starting point.

Second, it turns out (I believe) that one trial learning is the rule rather than an exception that god makes for word learning in particular.  Despite my fondness for the small-set-of-options story we told in that first paper (don't you love it! something right about it), it turns out that subjects behave the same way if you put them in the icon-to-sound experimental condition studied by our opponents.  And I have attached the paper showing this!   The paper you read examines the natural case (using video of real contexts) and this paper examines the fake case, but in so doing achieves a level of experimental control we couldn't attain originally.  Again it is one-trial learning with no savings of the choices not made.  I think you'll like this, because it really exposes the logic.   And most important, it turns out (we at least mention the past-and-present literature on one-trial learning, particularly Gallistel, who speaks for the ants and the wasps) that learning in general (until, as I say, it becomes structured) across tasks and species has this character.   There has always been a small subset of psychologists (Rock, Guthrie, Gallistel...) who denied, on their data + logic, that associationist, gradualist learning was the key, but they have always been overwhelmed in number and influence by the associationists.  As you point out, even you (even Chomsky, if you want to go back to his early writings) think/thought about phoneme/morpheme learning this way.   A marvelous new paper from Roediger traverses this literature and ably makes the case that learning in the general case is more determinative and less statistical than you thought.

We have some new work, I think important, showing the temporal and situational conditions that actually support the primitive first procedure for word learning. 

10 comments:

  1. Well, it's always a pleasure to be instructed by Lila. The general point I was trying to wave hands at still appears to stand: viz. that statistical learning gets a grip just in case there is structure to guide it. So, the 'just guess' strategy operates when there is none of this, and then, once some structure emerges (e.g. syntactic bootstrapping becomes an option), fancier learning methods come into their own. If this interpretation is correct, it is very interesting for the general debate. Why? Because it strongly suggests that those interested in statistical learning methods NEED the structures that linguists are busy providing. Without these, the needed traction for these statistical learning methods is not there. Great. The moral: using data needs LOTS OF GUIDANCE. In other words: yay for UG!!
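
    To make the contrast concrete, here is a toy sketch (mine, not Lila's actual model; the words, scenes, and numbers are made up purely for illustration). A one-trial guesser keeps a single hypothesis per word and stores nothing about the candidates it passed over, while a gradualist cross-situational learner keeps counts over everything it has seen.

    # Toy contrast, purely illustrative: a one-trial learner that keeps a single
    # guess per word (and no record of the candidates not chosen) versus a
    # gradualist learner that tallies every word-referent co-occurrence.
    import random
    from collections import defaultdict, Counter

    def one_trial_learner(scenes, seed=0):
        rng = random.Random(seed)
        guess = {}  # word -> the single referent currently hypothesized
        for word, referents in scenes:
            if word in guess and guess[word] in referents:
                continue                          # current guess verified; keep it
            guess[word] = rng.choice(referents)   # otherwise guess afresh; nothing else is remembered
        return guess

    def cross_situational_learner(scenes):
        counts = defaultdict(Counter)             # word -> referent -> co-occurrence count
        for word, referents in scenes:
            for r in referents:
                counts[word][r] += 1
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}

    # Each (word, referents-in-view) pair is one observed scene; all invented.
    scenes = [("dax", ["DOG", "BALL", "CUP"]),
              ("dax", ["DOG", "SHOE"]),
              ("dax", ["DOG", "HAT", "TREE"])]
    print(one_trial_learner(scenes))          # one guess, kept only while later scenes verify it
    print(cross_situational_learner(scenes))  # graded evidence accumulated over all candidates

    The point of the toy is just the bookkeeping difference: the first learner stores nothing about the options it did not choose, which is what "no savings of the choices not made" amounts to.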

  2. That is yay for richly structured UG, right?

  3. Yes, that is self-evident [and a tad boring] - can we please be a tad less lawyer-like about this [and maybe more interesting in the process]?

    I am not really convinced by the conclusions Lila draws from her experiments BUT I could be mistaken. So if we assume with her that:

    "only a minute, tiny, microscopically small set of the words are acquired "by" observation"
    and "that one trial learning is the rule rather than an exception that god makes for word learning in particular".
    and "that learning in the general case is more determinative and less statistical than you thought"

    It would seem that what UG needs to provide to account for the incredible richness of natural language is a lot more than Merge. Otherwise it would, for example, seem impossible for the child to recover from error [if everything is "one-trial learning with no savings of the choices not made"]. So most of the interpretations of the buzzing confusion that are potentially possible but factually wrong [Quine's gavagai problem] must somehow be eliminated by the innate resources that allow for the structuring. You can of course make Chomsky's move and locate the innate lexicon outside of UG. But then UG tells me little about what makes language interesting. So what, besides Merge, should we locate in UG - on your view?

  4. Ah C, always thirsty for enlightenment, aren't you? First, whatever Merge does, it has only indirect relevance to word learning. It IS relevant if there is a lot of syntactic bootstrapping (as Lila has argued is the case). If that is correct, then grammars make a big difference to word acquisition. UG has a direct effect on grammar acquisition and so, if Lila is right, an indirect one on word acquisition. However, as Chomsky has repeatedly noted, this does not get to the real meat of word acquisition. He provides many examples of how our words have very intricate properties hardly visible in experience. This is a very hard problem, and so far as I know few people (Paul Pietroski being a notable exception; read his work) even try to deal with it.

    Now, UG and Merge. I believe that most of the generalizations of LGB are derivable from a simple Merge-like account plus general cognitive background conditions. So the richness of the LGB version of GB is roughly available to minimalists. What minimalists aim to show, and have started to show empirically, is that this reduction works, so that much of FL exploits cognitive "circuits" available in general cognition. That's my view and I present an attempted derivation in 'A Theory of Syntax.' Feel free to take a look. If this is correct, then FL is plenty structured; in fact it has more or less the structure of LGB. Is that rich enough for you?

    There is a confusion in many discussions. People assume that IF minimalists can show that the complexity of FL/UG rests on only a few language-particular operations, then FL does not exist. However, let's take Chomsky's 'organ' analogy seriously. The types of cells and how they function are similar across stomachs and kidneys. There aren't special stomach cells and special kidney cells; rather, there are general cell types organized into stomachs and kidneys. The 'Nimshal' (a rabbinic term): say Minimalism is 100% right and that there is only Merge differentiating FL from the other operations of cognition. Does that imply that FL is not richly organized? No more than the fact that stomachs and kidneys are built from the same general cell types implies that stomachs and kidneys are not richly organized, distinct organs. FL/UG has more or less the properties outlined in LGB; however, this is because of how the basic operations of cognition plus Merge interact.

    A more linguistically specific analogy: Chomsky 1977 shows how to reduce islands to Subjacency plus the cycle. Does this mean that islands don't exist? Nope. They do, but they are not fundamental. They are effects of the Subjacency Condition. The minimalist question: what fundamental theory has the principles of LGB as effects?

  5. you say: "I believe that most of the generalizations of LGB are derivable from a simple Merge-like account plus general cognitive background conditions. So the richness of the LGB version of GB is roughly available to minimalists. What minimalists aim to show, and have started to show empirically, is that this reduction works, so that much of FL exploits cognitive "circuits" available in general cognition. That's my view and I present an attempted derivation in 'A Theory of Syntax.' Feel free to take a look. If this is correct, then FL is plenty structured; in fact it has more or less the structure of LGB. Is that rich enough for you?"

    It is rich enough for me, but it would seem that if I need "a simple Merge-like account PLUS general cognitive background conditions", then the 'general cognitive background conditions' are where all the structure is located. And if this is the case, then I no longer see why a connectionist could not help himself to the same general cognitive background conditions + some Elman net that uses whatever those conditions provide as input. So what is it IN your account that makes it incompatible with what the other guys propose?

  6. First, connectionist accounts don't really do computation that well. See Gallistel and King on this; I think their argument dispositive. Second, if they could do it as I do (which, as G&K argue, they can't), great. It would mean that they have innately weighted connections looking for specifically linguistic dependencies. These nets would not look for much else. The problem with connectionism is not only the connections; it is also the rampant associationism. A richly structured net is fine with me, if it could actually do what it has to do - which G&K argue, convincingly, it can't.

  7. I did not make myself clear; I did not mean to suggest that connectionist models can account for the structure. I only said that if they have access to what is provided by the general cognitive background conditions - which you invoke to provide all the structure that for minimalists is no longer located in UG - THEN they could probably succeed as well. If all that is left in UG is Merge - 'You got an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them. That's Merge.' (Chomsky, 2012, p. 14) - now that is a computation a net could do as well...
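
    Just to fix ideas, here is a toy rendering of that quoted operation (purely illustrative, my own sketch and not anyone's official formalism): two already-built objects in, one bigger object out, applied recursively.

    # Merge, rendered naively as binary set formation over prior outputs.
    # The lexical items and the bracketing are invented for illustration.
    def merge(x, y):
        return frozenset({x, y})          # an unordered object built from two parts

    the_dog = merge("the", "dog")         # {the, dog}
    saw_the_dog = merge("saw", the_dog)   # {saw, {the, dog}}
    clause = merge("John", saw_the_dog)   # {John, {saw, {the, dog}}}
    print(clause)

    If that is all that is proprietary to UG, the question stands: what keeps a suitably structured net, fed by the same background conditions, from doing it?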

    Anyway, on a completely unrelated matter: you may want to post this in a more prominent place, as befits the occasion, and light some virtual candles:
    http://www.newyorker.com/online/blogs/newsdesk/2012/12/the-legacy-of-noam-chomsky.html

  8. I doubt that. The problems with connectionist models have been well rehearsed by others. They are inadequate as a cognitive theory (cf. Fodor and Pylyshyn, Marcus) because they have trouble coding algebraic dependencies, for reasons that Marcus goes into in detail. The main problem is that recursion requires a different use of memory than connectionist nets generally like. They are also bad as implementational models, for the reasons that Gallistel and King go into; cf. their discussion of the infinity problem (a problem that language makes especially salient but that they think extends even to simpler kinds of animal cognition). So, if it's a bad cognitive model and a bad implementational model, then I don't see how it can be saved.

    Add to this that connectionists have a strong attraction to associationism, and there's really no hope, in my opinion. BTW, this link to associationism is not entirely due to pig-headedness. Nets are very good pattern recognizers. So they are good at finding patterns IN data. Where they are much clumsier is in finding algebraic structure (i.e. patterns generated by procedures). I discussed this distinction in another post ("Patterns, patterning and learning...") but it is also relevant for evaluating connectionism.
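
    Here is the distinction in toy form (my own illustration; the syllables and the rule are invented, loosely in the spirit of Marcus's identity examples rather than his actual stimuli or model): a learner that only stores the patterns found IN its data has nothing to say about novel tokens, while a rule stated over variables - a pattern generated by a procedure - extends to items it has never seen.

    # "Pattern in the data" vs. "pattern generated by a procedure", in miniature.
    training = [("ga", "ti", "ga"), ("li", "na", "li")]   # invented ABA-style exemplars

    def memorizer(item, seen=frozenset(training)):
        # stores exactly the token strings it has encountered, nothing more
        return item in seen

    def aba_rule(item):
        # a procedure over variables: first and third slots match, whatever the tokens are
        a, b, c = item
        return a == c and a != b

    novel = ("wo", "fe", "wo")     # never seen in training
    print(memorizer(novel))        # False: no stored pattern covers it
    print(aba_rule(novel))         # True: the procedure generalizes to new tokens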

    All of this is to say that I don't see what you are driving at. Maybe this: that if we restrict the narrow faculty of language to Merge, then there's not much of a faculty of language. I think that this is incorrect, as I've tried to explain in "Minimality and the language specificity of UG." It is a natural conclusion to draw, but I think that there is a version of the program where this is just a mistake. Of course there may be versions where it follows. But whether it does or doesn't is not a consequence of taking a minimalist view of grammatical matters.

    Thanks for the link to Marcus. I will post it more prominently.
