
Sunday, September 8, 2013

A cute example


One of Chomsky’s more charming qualities is his way of making important conceptual points using simple linguistic examples. Who will ever forget that rowdy pair colorless green ideas sleep furiously and furiously sleep ideas green colorless and their pivotal roles in divorcing the notion ‘grammatical’ from ‘significant’ or ‘meaningful’ and in questioning the utility of bigram frequency for understanding the notion ‘grammaticality’?[1] Similarly, kudos goes to Chomsky’s argument for structure dependence as a defining property of UG using Yes/No question formation. Simple examples, deep point. However, this talent has led to serious and widespread misunderstandings. Indeed, very soon “proofs” appeared “showing” that Chomsky’s argument did not establish that structure dependence was a built-in feature of FL, for it could have been learned from the statistical linear patterns available to the child (see here for a well-known recent effort). The idea is that one can compare the probabilities of bi/tri-gram sequences in a corpus of simple sentences and see whether these suffice to distinguish the fine (1a) from the not-so-fine (1b).

(1)  a. Is the man who is in the corner smoking
b. *Is the man who in the corner is sleeping

It appears possible to do this, as the Reali and Christiansen (R&C) paper shows. However, the results, it seems (see here, p. 26), are entirely driven by the greater frequency of sequences like who is over who crying in their corpus, which in turn derives from the very high frequency of simple who is questions (e.g. who is in the room?) in the chosen corpus.[2]
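To see how such a comparison works mechanically, here is a minimal sketch of bigram scoring. The toy corpus, the add-one smoothing, and all the names in it are my illustrative assumptions, not R&C’s actual model:

```python
from collections import Counter

# A toy sketch (mine, not R&C's actual model) of bigram scoring:
# rank candidate strings by the product of their smoothed bigram
# probabilities. The corpus is invented for illustration; note the
# preponderance of simple "who is" questions, mirroring the point
# about the chosen corpus made above.
corpus = [
    "who is in the room",
    "who is smoking",
    "who is sleeping",
    "is the man in the corner",
    "the man who is in the corner is smoking",
]

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

unigram, bigram = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram.update(tokens)
    bigram.update(bigrams(tokens))

def score(sentence, alpha=1.0):
    """Product of add-alpha smoothed bigram probabilities."""
    p, vocab = 1.0, len(unigram)
    for w1, w2 in bigrams(sentence.split()):
        p *= (bigram[(w1, w2)] + alpha) / (unigram[w1] + alpha * vocab)
    return p

good = "is the man who is in the corner smoking"   # cf. (1a)
bad = "is the man who in the corner is sleeping"   # cf. (1b)
print(score(good) > score(bad))                    # True on this toy corpus
```

On this toy corpus the contrast comes out “right,” but it is carried almost entirely by the count of who is versus who in, which is just the sort of dependency Berwick et al. diagnose in the real case.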

There has been lots of discussion of these attempts to evade the consequences of Chomsky’s original examples (the best one is here). However, true to form, Chomsky has found a way to illustrate the pointlessness of these efforts in a simple and elegant way. He has found a way of making the same point with examples that don’t affect the linear order of any of the relevant expressions (i.e. there are no bi/tri-gram differences in the relevant data). Here’s the example:

(2)  Instinctively, eagles that fly swim

The relevant observation is that instinctively in (2) can only modify swim. It cannot be understood as modifying fly. This despite the fact that eagles do instinctively fly but do not instinctively swim. The point can be made yet more robustly if we substitute eat egg rolls with their sushi for swim. Regardless of how silly the delivered meaning, instinctively is limited to modifying the matrix predicate.

This example has several pleasant features. First, there is no string-linear difference to piggyback on, as there was in Chomsky’s Y/N question example in (1). There is only one string under discussion, albeit with only one of two “possible” interpretations. Moreover, the fact that instinctively can only modify the matrix predicate has nothing to do with delivering a true or even sensible interpretation. In fact, what’s clear is that the is-fronting facts above and the adverb-modification facts are exactly parallel. Just as there is no possible Aux movement from the relative clause, there is no possible modification of the predicate within the relative clause by a sentence-initial adverb. Thus, whatever is going on in the classical examples has nothing to do with differences in their string properties, as the simple contrast in (2) demonstrates.

Berwick et al. emphasize that the Poverty of Stimulus problem has always aimed to explain “constrained homophony,” a fact about absent meanings for given word strings. Structure has always been in service of explaining not only which sound-meaning pairs are available but, just as importantly, which aren’t. The nice feature of Chomsky’s recent example is that it neutralizes (neuters?) a red herring, one that the technically sophisticated seem to be endlessly hooking on their statistical lines. It is hoped that clarifying the logic of the POS in terms of “absent possible interpretations,” as Berwick et al. have done, will stop the diminishing school of red herrings from replenishing itself.






[1] See Syntactic Structures, pp. 15-16. Chomsky’s main point here, that we need more than the linear order properties of strings to differentiate the first sentence from the second, has often been misunderstood. He is clearly pointing out that we need higher-order notions, parts-of-speech categories, to begin to unravel the difference. This “discovery” is remade every so often, with the implication that it eluded Chomsky. See here for discussion.
[2] See Berwick et al. here for a long and thorough discussion of the R&C paper. Note that the homophony between the relative pronoun and the question word appears to be entirely adventitious, and so any theory that derives its results by generalizing from questions to relative clauses on the basis of lexical similarities is bound to be questionable.

Monday, December 3, 2012

I'm posting, with her permission, a comment to me from Lila Gleitman

I am posting this here, rather than as a comment, because she cites papers to which I have added links.


Hi Norbert, I read your blog and there is much to say.   Re the header above (i.e. False Truisms NH), I only fear that you, like the rest of the world, might be taking these findings as an excuse to write off the domain-specific structure-dependent scheme for the lexicon that I have devoted myself to over the last several decades!   As we say in this new paper, though without room for real discussion, only a minute, tiny, microscopically small set of the words are acquired "by" observation (and even this leaves aside that no one has a clue about how seeing a dog could teach you the "meaning" of dog, though it could spotlight, pun intended, the intended referent in the best cases).   We did not sample the words.   We deliberately chose from the teeniest set of whole-object basic-level nouns those very few that our subjects could guess with any accuracy at all (roughly half the time, with all other words guessed correctly a laughable maximum of 7% of the time), and only from specially chosen "good" exemplars.   All the same, "syntactic bootstrapping" skirts circularity unless there is another -- non-syntactic -- procedure to learn a few "seed" words: those that, once and painfully acquired, can help you build a representation of the input (something like a rudimentary clause, or at least a decision about the structural position of the subject NP); it is that improved input that next makes a snap of learning all the rest of the words.   So if we now say (and we do) that there is a procedure for asyntactic, domain-general learning of first words, this alone can't do for all the rest (try learning probably, think etc. from watching scenarios in which these words are uttered -- or try even learning jump, tail, animal from the evidence of the senses, as Plato rightly noted).   So, though I'm pleased at your response to this new work, please don't forget that its role is minute over the lexical stock, though crucial as the starting point.

Second, it turns out (I believe) that one-trial learning is the rule rather than an exception that god makes for word learning in particular.  Despite my fondness for the small-set-of-options story we told in that first paper (don't you love it! something right about it), it turns out that subjects behave the same way if you put them in the icon-to-sound experimental condition studied by our opponents.  And I have attached the paper showing this!   The paper you read examines the natural case (using video of real contexts) and this paper examines the fake case, but in so doing achieves a level of experimental control we couldn't attain originally.  Again it is one-trial learning with no savings of the choices not made.  I think you'll like this, because it really exposes the logic.   And most important, it turns out (we at least mention the past-and-present literature on one-trial learning, particularly Gallistel, who speaks for the ants and the wasps) that learning in general (until, as I say, it becomes structured) has this character across tasks and species.   There has always been a small subset of psychologists (Rock, Guthrie, Gallistel...) who denied, on their data plus logic, that associationist, gradualist learning was the key, but they have always been overwhelmed in number and influence by the associationists.  As you point out, even you (even Chomsky, if you want to go back to his early writings) think/thought about phoneme/morpheme learning this way.   A marvelous new paper from Roediger traverses this literature and ably makes the case that learning in the general case is more determinative and less statistical than you thought.

We have some new work, I think important, showing the temporal and situational conditions that actually support the primitive first procedure for word learning. 

Friday, October 5, 2012

Three psychologists walk into a bar…



Linguists should applaud recent efforts by the Royal Society to inject humor into research on language.  Clearly, the Royals believe that linguists have taken their work far too seriously, and in a welcome reversal of a previous policy of benign neglect they have published a near-pitch-perfect parody (“How hierarchical is language use?”) by the trio of Frank, Bod and Christiansen (hereafter FBC). These authors breezily describe a research program aimed at unseating a fixed point in research on natural language that has been virtually uncontested for the last several hundred years: that sentences and phrases have both linear and hierarchical dimensions. Like all good parodists, they are undaunted by the obvious, in this case the fact that anyone who has ever examined language has concluded that words combine into phrases that combine into sentences. No, this dynamic trio, in the grand tradition of J. Swift and W. Allen, suggests that all we need are bi-grams and tri-grams.  Using the powerful methods of neuroscience, computational linguistics and behavioral psychology, they propose that it’s all just one damn word after another, no hierarchy needed. Moreover, FBC do this without ever letting down their parodic cloak.  Indeed, the paper is so successfully crafted (it has such a vivid sense of seriousness) that, as a public service, I believe it is necessary to affix to it the right warning label, just in case those with arthritic funny bones are misled, take this exquisite lampoon too seriously, and thereby wander into an intellectual desert.

The paper, like all good parody, has a central rhetorical thread, woven from three strands.  The first is that linguistic hierarchy poses a problem for evolution due to its biological sui genericity.  The second, as FBC note, is that language use cares about sentential sequence. The third combines the two to conclude that because linguistic hierarchy is evolutionarily problematic (in contrast to sequence information) it couldn’t possibly have evolved (i.e. arisen in humans), and so linguistic systems can’t really have any.  Conclusion: language use is sensitive exclusively to sequence information, as it must be. The satire requires that you inadvertently slip to the conclusion that if language use exploits the linear properties of sentences and phrases, then hierarchy is dispensable for all linguistic analysis (after all: if you can hop on one leg, who needs two legs?). Occam and his razor are invoked to make sure that after you slip to their desired conclusion you don’t jump back up, incredulous.  It’s all neatly done and very amusing.

Let’s pull the conceit apart to better admire its artistry. FBC know that linguists from the thirteenth-century grammarians, through Bloomfield and Harris in the mid-1950s, to Chomsky today have all taken it as obvious that natural language grammars are hierarchically organized, labeled brackets (or parse trees) being the instrument of choice to display this. The reader who is in on their send-up knows why: there are centuries of data in its favor.  Here’s a taste.

Such bracketing allows one to distinguish the two readings of ‘old men and women,’ the ambiguity of ‘I photographed a woman with a camera,’ and the three readings of ‘I saw the girl sitting on the stoop.’ The several readings can be easily coaxed from these sentences, even if some jump to the ear faster than others.  So, if one’s interest is in accounting for how the same string of words can carry several readings, labeled brackets (or, equivalently, parse trees) are very, very handy.  And not only for this. Once one takes even a modestly serious look at language (something no satirist should do, as too much seriousness wrong-foots the parodic muse), one finds an inexhaustible number of intra-sentential relations that seem to supervene on hierarchical rather than linear properties of phrases and sentences.  This even has a name: grammatical rules are structure dependent.  Here’s a short (and not exhaustive) list of phenomena that advert to hierarchical structure: Aux fronting, WH question formation, topicalization, focus movement, VP fronting, VP ellipsis, reflexive binding, pronoun obviation, passivization, raising, negative concord, sluicing, parasitic gap licensing, donkey anaphora, island effects, etc., etc., etc. In fact, it is almost impossible to find a syntactic phenomenon that fails to exploit hierarchical relations.
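For readers who like their brackets explicit, here is a minimal sketch, my own and not anything from FBC, of how two labeled bracketings assign two meanings to one and the same word string; the tuple encoding of trees is an illustrative assumption:

```python
# Two labeled bracketings (as nested tuples) for "old men and women".
# Reading 1: [[old men] and [women]] -- only the men are old.
narrow = ("NP",
          ("NP", ("Adj", "old"), ("N", "men")),
          ("Conj", "and"),
          ("NP", ("N", "women")))

# Reading 2: [old [men and women]] -- both groups are old.
wide = ("NP",
        ("Adj", "old"),
        ("NP",
         ("NP", ("N", "men")),
         ("Conj", "and"),
         ("NP", ("N", "women"))))

def leaves(tree):
    """Read the terminal word string back off a parse tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

# Different structures, hence different meanings, but one string:
assert leaves(narrow) == leaves(wide) == ["old", "men", "and", "women"]
```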

Knowing all of this, what does a good satirist do? Ignore and misrepresent. So, for example, FBC find evidence from neuroscience and psychology that there are linear effects in language use.  Yup, those neuro guys, using their big expensive fMRIs, are finally able to show that Broca’s area lights up (I hope the pictures of Broca’s area are in purple, as red, blue and green are so last year) when both unimpaired and aphasic humans parse a sentence left to right. Even more astounding, Broca’s area also lights up in processing music! Wow, call the Nobel committee. This is a breathtaking discovery. Who would have thought that left/right order a difference might make in understanding a sentence!! Thank goodness for “repetitive transcranial magnetic stimulation” techniques, for without these the relevance of linear order to parsing sentences would surely have remained hidden from view. You gotta love this.  Said with a straight pen, this is very funny stuff.

But wait, there’s more. FBC (no doubt channeling Anthony Trollope, who made a similar observation in his 1883 autobiography) note that linear order can affect the recognition of agreement dependencies.  So we say, and hear approvingly, The coat with the ripped cuffs were hanging in the closet rather than was hanging in the closet when the linearly nearest nominal is plural rather than singular; an interesting and curious effect that implicates linear proximity in assessing agreement. This demonstrates that those linguists who natter on about hierarchy being relevant for coding agreement effects are just obtuse.  Of course, the budding satirist should take note here and learn how to use data to misdirect.  Quietly kick under a nearby rug the fact that The doors in this compound were/*is closed does not pattern like the master example, or that speakers asked to assess the acceptability of the first sentence above with was in place of were rate it no worse, or that for many sentences proximity is irrelevant (e.g. The book that the boys liked was/*were long). No good parody lingers over complications: if linear order rules in one example, then it does so everywhere, every time. Disagree and Occam (Sweeney Todd style) will gut you with his razor.  Anyone with aspirations to satire has to love the FBC senseis’ demonstrated artistry.

Consider one last wonderful maneuver.  Chomsky fathered one of the canonical arguments for structure dependence, based on aux inversion in Yes/No (Y/N) questions.  The argument is as follows. Consider how Y/N questions are formed in English. Here are some examples:
(1)       a. Can John come
b. Will Mary sing
c. Is Frank kissing Sue
Using (1a-c) as “data”, what kind of rule would one form? Easy: take the aux (i.e. the “helping verb”) and move it to the front.  Question: which aux? Easy to answer for the examples in (1), as there is but one.  So let’s consider more complex forms.  For example, what is the correct form of the Y/N question taking (2a) as an answer?  It is clearly (2b), not (2c):
(2)       a. John is saying that Mat can swim
            b. Is John saying that Mat can swim
            c. *Can John is saying that Mat swim
Ok, it seems that the rule needs to specify which helping verb to take when there is more than one.  Here are two possible answers. The first: take the one linearly closest to the front, i.e. the leftmost one. That works in correctly singling out (2b) over (2c). But, and there is always a ‘but’, isn’t there, what of yet more complex forms?  What do we do in (3a-c)?
(3)       a. The fact that John is sleeping should surprise Mary
            b. The man who Bill is talking to will surprise Mary
            c. That John was asleep all day might irritate Mary
Here, if we move the leftmost helping verb, the one linearly closest to the front, we get rather delightful word salad:
(4)       a. *Is the fact that John sleeping should surprise Mary
            b. *Is the man who Bill talking to will surprise Mary
            c. *Was that John asleep all day might irritate Mary
This indicates that the rule cannot be framed in terms of moving the linearly leftmost helping verb. So what’s the right restriction? Well, here is the second answer: move the “highest” one, the one next to the subject.  The subject in (3a) is the fact that John is sleeping, in (3b) it is the man who Bill is talking to, and in (3c) it is that John was asleep all day. So the right auxiliaries to move, forming the unimpeachable (5a,b,c), are should, will and might respectively:
(5)       a. Should the fact that John is sleeping surprise Mary
            b. Will the man who Bill is talking to surprise Mary
            c. Might that John was asleep all day irritate Mary
Note that to get this right we invoke hierarchical notions: we need to treat several words (in fact, the linear size of the subject is unbounded) as a single unit, i.e. the “subject.” Not surprisingly, the same rule works in the other cases as well.
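To put the two candidate rules side by side, here is a minimal sketch, my own rather than anything in FBC or Chomsky; the list encoding of clauses and the toy aux inventory are illustrative assumptions:

```python
# A toy contrast (mine, not FBC's or Chomsky's formalism) between the
# linear and the structural aux-fronting hypotheses.
# A clause is a list [label, subject, aux, predicate]; subtrees are
# lists and words are strings.
AUXES = {"is", "was", "can", "will", "should", "might"}

def words(tree):
    """Flatten a tree into its word string."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(words(child))
    return out

def front_linear(clause):
    """Linear hypothesis: front the leftmost aux in the word string."""
    ws = words(clause)
    aux = next(w for w in ws if w in AUXES)
    ws.remove(aux)  # removes the first (leftmost) occurrence
    return [aux] + ws

def front_structural(clause):
    """Structural hypothesis: front the matrix aux, the one next to
    the whole subject, ignoring any aux buried inside the subject."""
    _, subject, aux, predicate = clause
    return [aux] + words(subject) + words(predicate)

# (3a): "the fact that John is sleeping should surprise Mary"
subject = ["NP", "the", "fact", "that", ["S", "John", "is", "sleeping"]]
clause = ["S", subject, "should", ["VP", "surprise", "Mary"]]

print(" ".join(front_linear(clause)))
# is the fact that John sleeping should surprise Mary  -- word salad, cf. (4a)
print(" ".join(front_structural(clause)))
# should the fact that John is sleeping surprise Mary  -- fine, cf. (5a)
```

The two rules agree on simple cases like (1) and (2) and come apart exactly at (3), where the leftmost aux sits inside the subject, which is the point of the argument.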

This short survey of some facts is, of course, not the whole story. However, it gives one a good flavor of the kind of problem that has convinced grammarians that hierarchical structure matters, that humans are predisposed to exploit such structure in forming the rules of grammar, and that this predisposition is built into the powers that humans bring to the task of acquiring and using language.  The reasoning is simple: the data that would tell the child to choose the hierarchical specification over the linear one are only available in examples like (4) and (5), and given that such sentences are virtually absent from the data available to the child learning the grammar, the predisposition towards the hierarchical condition cannot be data driven.

Champions of the linear have analyzed this simple example to death (as Jerry Fodor once observed: this should teach Chomsky never to give a simple, uncomplicated illustrative example!).  They have worked mightily to construct algorithms using bi- and tri-grams that can distinguish the relevant good and bad cases, all the while eschewing hierarchical structure. FBC note this and get rid of all problems of hierarchy with the wave of a reference or two.  Moreover, being first-rate parodists, they keep hidden the fact that all these algorithms have a common problem: were the facts the opposite of those cited, the algorithms could “learn” these just as well. So, on this view, it is only an accident that there is no natural language with a rule analogous to the one that generates (4). For FBC such a natural language would be no less humanly accessible than the ones we happen to find.  We even know what this linguistic “gap” would look like ((4) good, (5) bad).  Thus the attested linguistic holes are, on this view, purely accidental.  It is of course perfectly conceivable that the absence of well-formed structures like (4) is an accidental gap and that children could learn to form Y/N questions in this way.  Accidents do happen.  And some people can be sold famous bridges.  Such niceties, however, would clog up a good parody, and so FBC wisely put them aside.

There are other titivations that FBC artfully use to ornament their piece, to make it seem like they are really serious about the overall proposal. Good satire demands detail and a straight face.  However, I don’t want to ruin your pleasure in finding these bonbons for yourself.  What I would like to do is end with an appreciation of what I take to be the real tour de force of their masterpiece. As noted at the outset, FBC wrap the whole discussion in delightful Darwinian packaging. It seems that natural selection cannot digest the kinds of hierarchy found in grammars. Thus, linear relations it is, or you are a creationist!! Like any good parody, this is more suggested than stated. But the evolutionary angle adds a welcome frisson to the discussion. So, not only is this paper really funny, but it smacks of the monumental. Just think: hierarchy:God:religious fanatic, linearity:Darwin:hard scientist.  Onto the ramparts! It’s time for some culture war.

Let me end with another round of kudos. This paper is a must read. Until I got through it I thought that the art of the academic lampoon was dead. FBC have proved me wrong. There are levels of silliness, stupidity and obtuseness left to plumb. Thanks to FBC and the Royal Society for demonstrating that parody and satire are still possible.