Sunday, September 8, 2013

A cute example


One of Chomsky’s more charming qualities is his way of making important conceptual points using simple linguistic examples. Who will ever forget that rowdy pair colorless green ideas sleep furiously and furiously sleep ideas green colorless, and their pivotal roles in divorcing the notion ‘grammatical’ from ‘significant’ or ‘meaningful’ and in questioning the utility of bigram frequency in understanding the notion ‘grammaticality’?[1] Similarly, kudos goes to Chomsky’s argument for structure dependence as a defining property of UG using Yes/No question formation. Simple examples, deep point. However, this talent has led to serious and widespread misunderstandings. Indeed, very soon “proofs” appeared “showing” that Chomsky’s argument did not establish that structure dependence was a built-in feature of FL, for it could have been learned from the statistical linear patterns available to the child (see here for a well-known recent effort). The idea is that one can compare the probabilities of bi/tri-gram sequences in a corpus of simple sentences and see if these suffice to distinguish the fine (1a) from the not-so-fine (1b).

(1)  a. Is the man who is in the corner smoking
b. *Is the man who in the corner is smoking

It appears possible to do this, as the Reali and Christiansen (R&C) paper shows. However, the results, it seems (see here, p. 26), are entirely driven by the greater frequency of sequences like who is over who crying in their corpus, which in turn derives from the very high frequency of simple who is questions (e.g. who is in the room?) in the chosen corpus.[2]
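To see the mechanics concretely, here is a minimal sketch (in Python) of the kind of bigram comparison at issue. It is not R&C's actual model: the toy corpus, the sentence markers, and the add-one smoothing below are invented purely for illustration. Still, the punchline comes out the same way: (1a) wins largely because “who is” is so frequent, courtesy of simple who is questions.

    from collections import Counter
    from itertools import chain
    from math import log

    # Toy stand-in for child-directed speech; R&C worked with a CHILDES corpus.
    corpus = [
        "who is in the room",
        "who is that",
        "is the man smoking",
        "the man who is in the corner is smoking",
    ]

    def bigrams(tokens):
        return list(zip(tokens, tokens[1:]))

    tokenized = [("<s> " + s + " </s>").split() for s in corpus]
    unigrams = Counter(chain.from_iterable(tokenized))
    bigram_counts = Counter(chain.from_iterable(bigrams(t) for t in tokenized))
    vocab = len(unigrams)

    def score(sentence):
        """Add-one-smoothed bigram log-probability of a candidate question."""
        total = 0.0
        for prev, cur in bigrams(("<s> " + sentence + " </s>").split()):
            total += log((bigram_counts[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        return total

    # (1a) fronts the matrix auxiliary; (1b) fronts the one inside the relative clause.
    print(score("is the man who is in the corner smoking"))  # higher on this toy corpus
    print(score("is the man who in the corner is smoking"))  # lower: 'who in' is unattested

The point of example (2) below, of course, is that it removes even this crutch: with a single string and two candidate meanings, there are no bigram differences for a model of this sort to exploit.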

There has been lots of discussion of these attempts to evade the consequences of Chomsky’s original examples (the best one is here). However, true to form, Chomsky has found a simple and elegant way to illustrate the pointlessness of these efforts: he has found a way of making the same point with examples that don’t affect the linear order of any of the relevant expressions (i.e. there are no bi/tri-gram differences in the relevant data). Here’s the example:

(2)  Instinctively, eagles that fly swim

The relevant observation is that instinctively in (2) can only modify swim. It cannot be understood as modifying fly. This despite the fact that, while it is true that eagles instinctively fly, they don’t swim. The point can be made yet more robustly if we substitute eat egg rolls with their sushi for swim. Regardless of how silly the delivered meaning, instinctively is limited to modifying the matrix predicate.

This example has several pleasant features. First, there is no string-linear difference to piggyback on, as there was in Chomsky’s Y/N question example in (1). There is only one string under discussion, albeit with only one of its two “possible” interpretations. Moreover, the fact that instinctively can only modify the matrix predicate has nothing to do with delivering a true or even sensible interpretation. In fact, what’s clear is that the is-fronting facts above and the adverb modification facts are exactly the same. Just as there is no possible Aux movement from the relative clause, there is no possible modification of the predicate within the relative clause by a sentence-initial adverb. Thus, whatever is going on in the classical examples has nothing to do with differences in their string properties, as the simple contrast in (2) demonstrates.

Berwick et al. emphasize that the Poverty of Stimulus Problem has always aimed to explain “constrained homophony,” a fact about absent meanings for given word strings. Structure has always been in service of explaining not only which sound-meaning pairs are available but, just as important, which aren’t. The nice feature of Chomsky’s recent example is that it neutralizes (neuters?) a red herring, one that the technically sophisticated seem to be endlessly hooking on their statistical lines. It is hoped that clarifying the logic of the POS in terms of “absent possible interpretations,” as Berwick et al. have done, will stop the diminishing school of red herrings from replenishing itself.






[1] See Syntactic Structures pp. 15-16. Chomsky’s main point here, that we will need more than the linear order properties of strings to understand how to differentiate the first sentence from the second, has often been misunderstood. He is clearly pointing out here that we need higher-order notions such as parts-of-speech categories to begin to unravel the difference. This “discovery” is remade every so often with the implication that it eluded Chomsky. See here for discussion.
[2] See Berwick et al. here for a long and thorough discussion of the R&C paper. Note that the homophony between the relative pronoun and the question word appears to be entirely adventitious, and so any theory that derives its results by generalizing from questions to relative clauses on the basis of lexical similarities is bound to be questionable.

73 comments:

  1. One could also maintain the argument for Yes/No questions with more interesting sentences.

    It is a matter of finding cases where the extraction of the embedded Aux results in what are better bigram/trigram sequences than the extraction of the "correct" matrix Aux.

    Let the simple sentence be as in (1) below. If bigram/trigram frequencies are what drive the Yes/No question response, then (2) should be better than (3) since "Sean happy" is surely worse than "Sean is happy" as far as the child's input is concerned. However, it is (3) that is grammatical.

    (1) The boy who is called Sean is happy.
    (2) *Is the boy who called Sean is happy.
    (3) Is the boy who is called Sean happy.

  2. This post certainly was educational. I learned that instinctively swimming eagles are cute. My guess would have been they'd be wet. But the fascinating ornithological fact was the only novel insight for me. Predictably, non-Chomskyan modelling, this time in the incarnation of a dated Reali & Christiansen paper, has to be wrong. Predictably, virtually nothing from that paper was cited, let alone anything of the work the Christiansen lab has done after publishing that paper.

    Always the optimist, I had a look at the Berwick et al. paper Norbert kindly linked. Like Norbert, I use shortcuts when deciding whether a paper is worth reading. Mine is looking at the reference section to make sure the authors have covered RECENT literature. I gotta say I was very disappointed: nothing more recent than Reali & Christiansen, 2005 from that group. The name MacWhinney did not even appear, nor did ANY paper published in the important 2010 special issue of the Journal of Child Language that was dedicated entirely to modelling and had several papers on multiple cue integration. Last but not least, there was a reference to the work of Jeff Elman:
    Elman, J. (2003). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99. The title sounded familiar, the year did not - so I looked it up: indeed, Elman published this paper, but it was in 1993, not 2003.

    Yes, there were also a few references to recent work by Alex Clark [and co-authors] and the 2011 Perfors et al. paper. But it remains a mystery how Berwick et al. can draw the far-reaching conclusions they do about the value of such work when 90% of the work in that field is simply ignored.

    And then there are the bees. Assuming for a moment that ALL non-Chomsky-style models need to be rejected because they cannot account for the non-ambiguity of flying eagles that also swim or eat or ... just what makes anyone think that bee communication is relevant to figuring out "Which expression-generating procedures do children acquire; and how do children acquire these procedures?" Is it hoped that the bee model outlined in Gallistel [2007] will succeed where Reali & Christiansen [2005] failed and give an account of the examples Norbert calls cute?

    Replies
    1. I perfectly agree that, if there is more recent work, it ought to be taken into account.
      To make it easier for "outsiders" like myself, could you point to the more recent work by Christiansen and/or Reali (or others) that tackles anything like Norbert's (well, Chomsky's) (2)?

    2. @Benjamin: your question suggests that I did not make my point clear [maybe because I made it several times before]: people like Christiansen moved away from 'one size fits all' approaches and work on different types of models that can address different specific tasks/phenomena. Informed by how kids begin to learn, these researchers employ multiple cue integration models [here are a couple of references, but you really should e-mail Morten about this work if it interests you. As long as he is not called an irrational dogmatist or made fun of on public blogs he is massively helpful]

      Monaghan, P., Chater, N., & Christiansen, H. (2005). The differential contribution of phonological and distributional cues in grammatical categorization. Cognition, 96, 143-182.
      Monaghan, P. & Christiansen, M. (2008). Integration of multiple probabilistic cues in syntax acquisition. In H. Behrens (Ed.), Trends in corpus research: Finding structure in data. (pp. 139-163). Amsterdam: John Benjamins.

      In a previous discussion I cited Chater et al. [2011] saying that there are no fundamentalist Bayesians. I have no idea if Elman thought back in 1993 that just one kind of model could account for everything in language. But his recent work suggests he doesn't believe this now. Other recent work has shown that models that work for one task [word segmentation] in one language [English] fail miserably for another [Sesotho], cf. Blanchard, D., Heinz, J., & Golinkoff, R. (2010). Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 37, 487 - 511.

      From the minimalist perspective such a finding might spell disaster. But if you do not assume that all humans share an innate mechanism that has to churn out every possible human language and, instead, focus on input, you can assume that different cues contribute to a different degree in the acquisition of different languages. This is the direction in which modelling has moved recently.

      Now your question [whether Reali & Christiansen or others have a working model for Norbert's (2)] indicates that you wish to apply the minimalist criteria [one [kind of] mechanism has to account for everything] to work of researchers who have explicitly stated they reject this framework. Berwick et al. are right that models relying on just one mechanism to 'extract everything' from the input are likely to fail. But no one is claiming this. So before they can draw the grand conclusion they do, they need to address the work that is done besides the few models they discuss and find wanting.

      If you want to know whether anyone works on models to address Norbert's [2] you have to consult the literature. Assume someone is and has a good model. I predict Chomsky will come up with another 'cute' example for which no account exists. He has always been great at that. But before you get too excited about this possibility you may want to remember that there were 2 things that got Chomsky established as a game changer in linguistics: his evisceration of Skinner but maybe even more his own innovative, nay trail-blazing, account. Sadly, his work regarding the latter no longer lives up to what he is undoubtedly capable of, and other linguists easily find counterexamples to his proposals [for one tiny list look at: http://ling.auf.net/lingbuzz/001686 ].

      I think many people who were initially excited by Chomsky's contribution have become tired of the ongoing hostility he [and some of his comrades] shows towards anyone who has a different view. And they are tired of waiting for his POSITIVE account of the things he blasts others for not accounting for. So maybe instead of joining those who have excellent evisceration skills [like Norbert or David P.] you may want to consider working on that other part of the Chomskyan tradition: show how the mechanism for Norbert's (2) is implemented in a human brain...

    3. @Christina. The question was simply whether anyone has an account of the interpretative properties of (2) which does not involve some kind of innate preference for structure-dependent principles in the domain of language. If you can't point to such an account, then what is there for Berwick et al. to address?

    4. @Christina. I am not sure that I understand why you are so ironic about Chomsky and his examples. You seem to be saying that there is a group of researchers [Reali & Christiansen and others] who claim that every individual problem potentially has a different kind of solution. They cannot start tackling the kinds of examples Chomsky gives, because every time they do so, that horrible guy comes up with a new example for which they have to find a new solution.

      (And furthermore, you find Chomsky an unpleasant person and you do not like his behaviour; I am not sure what role that plays in your argumentation, but you put it forward with some force.)

      There is an obvious methodological problem with the kind of account you are describing, it seems to me. It is unfalsifiability in a very crude form. No claim is made, because everything imaginable can potentially be accounted for if we have enough time. It has thereby made itself immune to criticism, while at the same time being able to criticize the other party, for instance for not giving an account of (2). [I am not a syntactician, but to me it seems obvious that (2) can simply be accounted for by assuming that the products of Merge have no linear order; I am not sure why an account is only successful if we show how Merge is implemented in the brain. That seems a little bit too much to ask of the current state of brain science, to be honest.]

      I think that Norbert has described the other strategy (Chomsky's) in the discussion quite nicely: you show some crucial and fairly simple examples which you can account for, and the other side cannot. Then, if the other party has added yet another instrument to their toolbox so that now they can deal with it, and remains unconvinced by anything like the Occam's Razor argument, he comes up with a new one. Maybe he is a very unpleasant person, but it seems to me the right way to go about questions like this.

    5. @Marc: in somewhat reverse order. You are right, that in general it should not matter whether someone is an unpleasant person. But in Chomsky's case it is different because part of what makes him so unpleasant is that he distorts what his opponents say to make them look like idiots and that he does so in spite of knowing better: in 2009 I contacted him about a distortion of Elman's work in Cartesian Linguistics, sent him the relevant papers that clearly show that what he said was wrong - he said at one point in the correspondence that 'the misunderstandings had been cleared up'. So one would expect ideally an apology to Elman but at least that the distortions stop. Yet, in Science of language exactly the same distortion of Elman's work is published [as is in Of Minds and Language]. This distortion is used to support claims like the entire field of computational modelling makes no contribution to science, proponents are irrational dogmatists, etc. This kind of intentional distortion is not merely unpleasant but violates very basic principles of scientific debating. It puzzles me that it is tolerated in linguistics.

      I am not ironic about Chomsky's examples at all. I have said they are GREAT. The problem is not their quality but that their main purpose seems to be to refute others. Why is Chomsky not spending the same amount of mental energy on improving his own proposals? Why is he not responding to the LINGUISTIC criticism of Postal that shows that his own proposals cannot account for the phenomena he claims to have accounted for?

      Further, I do not think Christiansen and his lab work on one different solution for every example Chomsky comes up with. They are trying to figure out what kids actually do when they learn language. And before kids get to examples like Norbert's (2), they have already learned a lot about language. So they never have to come up with a strategy for understanding Norbert's (2), or any other example Chomsky may come up with, in isolation from any other knowledge. When Chomsky is asked for results he reminds us that linguistics is still a very young field. If this is true, why are the other guys expected to have all the answers already? Maybe you ought to give them some time too? And not forget that they do NOT claim there is NO internal structure in their models [this might also answer Alex's comment].

      People are not unconvinced by Occam's razor arguments but by the failure of Chomsky to make some progress on the main thing he promised to account for decades ago [according to SoL, back in the 1950s]: the BIOLOGY of language. Never mind accounting for examples like Norbert's (2). But if language is part of our biology, and at least part of language is not shared by any other cognitive domain and does not occur anywhere else in the animal kingdom, why do we still not have the foggiest idea about the uniquely linguistic biological 'structure'? If the Everett rebuttal is right [that Merge can be missing in the tool-kits of some languages], then Merge cannot be the part of our biology that allows for language - or we all would have to have that one tool.

      Given that you seem concerned about the falsifiability of the opposing account: just what would falsify Chomsky's account? Re-read the debate Norbert initiated a while ago: if non-falsifiability is not a flaw for the Chomskyan account, why would it be one for the accounts of others?

      To repeat what I said earlier: I do not think that anyone has all the answers yet. The main reason I continue reading the blog [in spite of the VERY unpleasant treatment I have gotten here] is that I hope for POSITIVE accounts of what it is that makes MP superior to what the other guys are doing. The Berwick et al. paper is mostly silent on that...

    6. @Christina. It's obviously possible that in the future someone might come up with an alternative explanation for why ‘instinctively’ can associate with ‘swim’ but not ‘fly’ in (2). As far as I know (and I'm happy to be corrected with references), the only plausible explanation currently on offer is that kids have some kind of innate bias towards structure-dependent principles. This is not a matter of one or two tricky examples. Structure dependence is a pervasive property of natural language grammars that is exhibited in a wide range of constructions. An acquisition model that can't explain why children invariably acquire grammars that have this property is just a non-starter.

      The point that often seems to get missed by non-linguists is that the argument based on (1)-(2) is supposed to be a completely trivial example serving to illustrate the logic of poverty of the stimulus arguments. It is a nice accessible example of the kind of argumentation that syntacticians use to figure out what's in UG. Aside from that it is of little interest, since its conclusion is (i) obviously true and (ii) entirely compatible with many empiricist approaches to language acquisition (most of which do not, of course, deny the existence of any innate biases whatsoever).

      In short, there are only two reasons to be bothered by the argument based on (1)-(2). First, if you take it to illustrate some error in the logic of POS arguments in general. Second, if you seriously think that kids could acquire non-structure-dependent rules just as easily as structure-dependent ones given the right environment.

    7. "They are trying to figure out what kids actually do when they learn language. And before kids get to examples like Norbert's (2), they have already learned a lot about language."
      As is everyone (around here) trying to do, at some level or other. The trouble I have with lots of the modeling work on "syntax" is that ultimately, examples such as (2) need to be handled properly. And if anyone proposes a model on "early" acquisition of syntax, it's only fair to ask them how their model will, in principle, account for what we know has to fall out of it at the end.

      "[I]f you [...] focus on input, you can assume that different cues contribute to a different degree in the acquisition of different languages. This is the direction in which modelling has moved recently."

      And "they" aren't the only ones doing that. Look at Constantine's continuation of the Gambell/Yang work, at recent analyses of why some of the current segmentation models might fail on languages other than English like Fourtassi et al. (2013), the Johnson/Frank et al. work done on social cues in identifying word-object mappings / synergies in learning word-object mappings jointly with performing word segmentation, to name a few.

      You hardly have to completely disown the idea that there are _strong_ biases involved in language acquisition in order to "appreciate the input".

      But ultimately, as Alex pointed out, all I was asking for is whether Berwick et al. had ignored any more recent work that was relevant to what they were looking at. So, short answer, no, right?

    8. @Alex: Thank you for the clarification of what is at issue. I asked 2 linguists and got 2 suggestions for alternatives to children having an innate bias towards structure-dependent principles. If I can find such proposals this quickly, I am sure that, as a linguist, you can as well - if you ask the right people [you know where I am getting my linguistic advice].

    9. This comment has been removed by the author.

    10. @Christina. It's been my misfortune that I never seem to know the right people at the right time, so if you could pass on some references to these proposals I'd very much appreciate it.

  3. This comment has been removed by the author.

  4. There is clearly a difference between an argument for the structure dependence of syntax (SDS) and an argument for the innateness of the structure dependence of syntax. Now I completely accept that natural language syntax has hierarchical structure in some sense and example 2 is a perfectly good argument for that -- but I don't see why this is an argument for the *innateness* of SDS.

    Norbert has been adamant that the learner learns from sound/meaning pairs, and if you are learning from sound/meaning pairs then you will know that adverbs only modify verbs in the matrix clause because semantically you only ever see adverbs modifying verbs in the matrix clause.
    E.g. if you are learning from sound/meaning pairs then when you hear "tomorrow the boys that live next door are coming home" you know that tomorrow modifies "coming home" not "live next door".

    Now I don't think that Berwick et al share Norbert's strongly held view (though Paul P may?) -- or at least there is no indication in that paper that they do, and given that all of the papers they discuss learn only from surface strings, surely they would have mentioned that learners should learn from meanings as well.
    But even if they don't, there is surely some more reasoning that you need to make it a learnability problem.

    Alex D says "Structure dependence is a pervasive property of natural language grammars that is exhibited in a wide range of constructions. An acquisition model that can't explain why children invariably acquire grammars that have this property is just a non-starter.". If there is some universal property of language, I don't think that the acquisition model necessarily needs to explain it. This is confusing Plato's problem with Greenberg's problem, or getting UG confused with universal grammar. I mean one explanation for why SD is pervasive is that it is built into the biases of the learning algorithm, and that is not a bad explanation, (probably the right one), but there are other explanations. And one reason why children acquire grammars of this type might just be that the languages they are exposed to are uniformly of this type and they learn that. That is to say the biases they have are towards a class of grammars that include structure dependent and structure independent syntax and as in Amy Perfors work, the learner can figure out which is applicable.

    So for it to be a POS argument, you need at the very least to show that examples like (2) are rare?
    And sentences like "the boys that live next door play noisily/with their dog all the time" seem like they might be frequent.
    That was quite a big part of the arguments about (1). And you should be clear about the inputs: semantics or not? Because if the argument is about interpretation then that becomes a really big deal.

    Replies
    1. That all makes lots of sense to me. Yet, I always have a hard time with this idea:

      "And one reason why children acquire grammars of this type might just be that the languages they are exposed to are uniformly of this type and they learn that."

      Fair enough --- but why is it that the input uniformly looks like this? Why is it that "semantically you only ever see adverbs modifying verbs in the matrix clause"?

    2. Actually, the PLD is not replete with subject relative clauses and I will bet that sentences in CHILDES with fronted adverbs and subject relatives are not a dime a dozen. There are certainly very few Y/N questions with subject relatives in CHILDES, at least. My recollection is that on MacWhinney's count there were zero. Now the WSJ may be another matter, and most kids now get complimentary gift subscriptions at 2 months from the wizards of finance.

    3. @Alex. I accept of course that we cannot immediately jump to the conclusion that structure dependence is explained by properties of the acquisition mechanism itself. However, the question remains of what exactly an alternative explanation would be. To propose that children invariably acquire structure-dependent grammars because all languages have structure-dependent grammars doesn't get us very far. As Benjamin points out, the question immediately arises of why all languages are like that. The most obvious explanation would be that kids have an innate bias for structure dependent grammars!

      There might conceivably be functional considerations favoring structure dependence, but unless kids are busy calculating what kinds of grammar would be the most communicatively optimal, that still wouldn't explain why they don't mistakenly acquire non-structure-dependent grammars from time to time (assuming that the stimulus is indeed "poor"). I still find it puzzling that (1)-(2) give rise to such controversy. It seems that even within the empiricist camp very few people seriously dispute that kids have some kind of innate bias in favor of structure dependence.

    4. This comment has been removed by the author.

    5. I don't really find saying "X is innate" to be an explanation of X at all; and I think it is uncontroversial that on its own that sort of claim is not explanatory. One would like to have a bit of detail about what precisely is innate and how it causes whatever it is meant to explain. So in this case, it would be nice to have an idea of what the class of "structure dependent grammars" are and how the learning bias would explain the absence of structure independent grammars. This is non-trivial given that the current hypotheses (like say MGs/MCFGs) are also capable of representing the structure independent grammars, or representing languages that can e.g. front adverbs from a relative clause. So as far as I know, the current theories of generative syntax don't actually provide an explanation of this phenomenon. But I see from David Adger's comment below that he thinks that generative syntax is a theory of this sort of innate bias and it would be nice to see what the biases are and how they work.

      Moreover, and I think Norbert would maybe agree here (?), saying "X is innate, and that is why we see X rather than Y" just raises the question of why X is innate rather than Y being innate. So the rather thin explanations of the structure dependence of syntax that one gets from e.g. Herbert Simon style architectural arguments or Edinburgh style evolutionary arguments do at least bottom out in some reasonable reduction rather than kicking these questions into the long grass.

      Just to be clear, I am not entirely convinced by these "communicative efficiency" type arguments, or other functional arguments; my interest is rather the slightly meta question of what would count as good or at least adequate explanation of this phenomenon.

    6. @Alex. To begin with, it's worth noting that POS arguments are deductive arguments which establish their conclusions regardless of explanatory value. So, if a POS argument concludes that a bias for structure-dependence is innate, it does nothing to undermine the argument to show that the conclusion has no explanatory value. We just have to accept the conclusion (that is, if we don't reject the premises or the logic of the argument) and get on with trying to find a deeper explanation.

      This being said, it seems to me that if kids have an innate bias for structure-dependent grammars this would explain why all languages have structure-dependent grammars. After all, you can easily imagine a world in which there's no such innate bias and languages are consequently replete with non-structure-dependent syntactic rules. To point out that these two things are linked seems to have at least some explanatory value. It's far from providing a complete explanation, of course, and we would indeed like to have “an idea of what the class of ‘structure dependent grammars’ is.” That's what generative syntax is all about. So e.g., we try to find deeper explanations for island constraints, which restrict the class of acquirable grammars to ones in which adverbs can't front out of relative clauses.

      The explanations for structure-dependence that you refer to seem broadly compatible with the conclusion that there is an innate bias for structure-dependence. I assume you agree on this point, since you think that “X” probably is innate in this instance, and at the same time (like me) would like a deeper explanation for why this is so.

      You are right about the APS -- but you weren't using an APS. You were arguing that the absence of alternative functional explanations for X was evidence for innateness, which is a different argument. The APS is, you are right, logically valid, but one of the premises is very questionable -- namely the one that says that you can't learn that a language has X: structure-dependent syntax in this case. So I don't accept the conclusions.

      I think it is true that some theories of generative syntax do specify a UG which implies e.g. that adverbs can't move out of relative clauses, and that that sort of theory would explain why this doesn't happen, albeit by stipulation in my view. But it verges towards a 'virtus dormitiva' explanation.
      But those P & P theories have been largely abandoned and I don't think the current MP proposals for UG, such as the one in the Berwick et al paper, actually have this consequence, since the classes of grammars they consider seem to include structure independent grammars.

      Could you give me an example of the sort of current theory of UG you think does explain example 2 (or 1)?

    8. Could you elaborate a little bit on the structure-independent grammar bit? What exactly do you mean?
      Yes, CFGs properly include FSG, say, and so do MGs. Is that all you mean?

      But even so, the CFG-encoding of a Bigram-model induces structure (if trivial uniformly right-branching structure) --- don't you still get more than the yield-string? (That's even true for the FSG encoding of a Bigram model, no?)

      If your point is that saying "humans can only acquire MGs, and hence languages exhibit (non-trivial) structure dependence" isn't enough (and not even valid, as trivial structure can also be generated by MGs), I agree.

      Where I don't agree is that positing the bias towards structure-dependent grammars is non-explanatory. Yes, why there is this bias rather than another bias is a further question but unlike Norbert (and you?), I doubt we are in a good state to tackle questions like this. Irrespective of that, though, doesn't the same question arise with respect to the learning-mechanism strategy? That is, in addition to the pressing question why the input to learners uniformly exhibits non-trivial structure dependence (to which the innate-strategy _does_ give an answer), you'd also be faced with the question why it is learning mechanism Y rather than X, no?

      What I am just as interested in as you are (I think) is what this bias could look like in principle (and actually looks like, ultimately) --- can we neatly characterize a family of grammars which actually excludes the trivial-structure ones while accounting for all the non-trivial structure we find, together with (supposedly universal) constraints on these structures?

      Again, I agree that we are not exactly close to having answered this question to satisfaction (I think here I disagree with Norbert?) but it strikes me as the question the answer to which will give real substance to the currently rather vague (but nevertheless explanatory) innate-structure-bias-positing.

    9. Actually this raises an interesting angle on Norbert's comments on the Bayesian learning post(s); can we model the LAD just by defining a hypothesis class H without specifying a learning algorithm or inference mechanism?
      It seems if you don't have an inference mechanism, then in order to explain examples 1 and 2 you need to have a hypothesis class which simply doesn't contain *any* grammars which will give the wrong answers. Otherwise your theory of UG won't explain these facts. Ergo you are forced into having quite a restrictive theory of H. Indeed, depending on how you set things up, it might be impossible to enforce the restriction (for example, you can't define the class of CFGs that only define non-regular languages, because of undecidability, or the class of MCFGs/MGs that only define non-context-free languages).

      So you might want to have at least an evaluation measure or something that will favour one portion of the space rather than the other. But if you have to have that, then you probably don't need a separate theory of the hypothesis class H at all -- which I think is what Norbert uses the term "UG" for.

    10. Quite correct. There are in fact two kinds of current explanations from which the facts of structure dependence can be derived. The more conservative account depends on the fact that subjects are islands. If grammatical commerce across an island is verboten, then both the polar question facts and the adverb modification facts follow, as the subject relative is cut off from the rest of the clause wrt grammatical dependencies. This, then, ultimately lays these data at the door of Ross's island conditions, which are themselves part of FL/UG.

      A more interesting explanation, recently mooted by Chomsky in his 'Problems of Projection', is that phrase markers have no linear order at all in the grammar. Linearization is a product of transfer to the S&M interfaces, and so grammatical operations cannot exploit left/right info, as the objects of grammatical manipulation, i.e. phrase markers, code no such information. This makes the non-structure-dependent predicates unavailable. Thus these grammars are literally not in the hypothesis space. If I understood you correctly, this is what you speculated in the previous post re learning methods. Thus, the outcome is not a function of the possible learning paths but due to the nature of the hypothesis space. This work builds on that of Kayne and his work on the LCA. This is now, in some form, a central feature of most current grammars.

      Hope this helps to answer your question. Yes, structure dependence is not deep, though if true it does explain why grammars use such rules. But, yes, there are more interesting and deeper attempts out there to derive structure dependence in a more natural way.
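      Purely by way of illustration (this is my sketch, not anything in 'Problems of Projection'), here is what the 'no linear order' idea buys you. If sisters in a phrase marker are an unordered set, a structure-dependent notion like 'the predicate merged at the root' is trivially statable, while 'the linearly first verb' cannot even be written down, since the object supplies no notion of 'first'.

        # Phrase markers as unordered, set-like objects: hierarchy only, no precedence.
        # The bracketing for "eagles that fly swim" is deliberately simplified.

        class Node:
            def __init__(self, label, daughters=()):
                self.label = label
                self.daughters = frozenset(daughters)  # sisters have no left/right order

        fly    = Node("V:fly")
        that   = Node("C:that")
        rel    = Node("CP", {that, fly})
        eagles = Node("N:eagles")
        subj   = Node("NP", {eagles, rel})
        swim   = Node("V:swim")
        root   = Node("TP", {subj, swim})      # [[eagles [that fly]] swim]

        def matrix_predicate(pm):
            """Structure-dependent: return a V daughter of the root node."""
            for d in pm.daughters:
                if d.label.startswith("V:"):
                    return d.label
            return None

        print(matrix_predicate(root))  # V:swim -- the only verb 'instinctively' can target

        # By contrast, "the linearly first verb in the string" is not statable over this
        # object at all: a frozenset has no first member, so any rule we can write must
        # be cast in hierarchical terms, i.e. it is structure dependent by construction.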

    11. "Indeed depending on how you set things up it might be impossible to enforce the restriction (for example, you can't define the class of CFGs that only define non regular languages, because of undecidability, or the class of MCFGs/MGs that only define non context-free languages)."

      Thanks, that was a helpful reminder of something that should have been clear to me :-) [though, doesn't that only hold with respect to string languages? Not sure where that would go, but it seems easy enough to exclude "trivial" structures; say, no FSG would allow you to even have lexical insertion using unary rules (which I know isn't part of current theorizing). So couldn't one require certain (motivated, of course; the way I put it, all of this really is just stipulating) non-trivialities in the structures that can be generated by the admissible grammars, and try to define the class of admissible grammars through this?]

      "It seems if you don't have an inference mechanism, then in order to explain examples 1 and 2 you need to have a hypothesis class which simply doesn't contain *any* grammars which will give the wrong answers. Otherwise your theory of UG won't explain these facts. Ergo you are forced into having quite a restrictive theory of H."

      That sounds about right. Of course there still is non-trivial typological variation as well --- say, differences in domains of extractions between Italian and English (iirc, but the point really just is that there still is a fair range of possible variation, leaving a non-trivial acquisition problem even for the extremely restrictive theories).
      And with respect to possible patterns that (to the best of our knowledge) are absent from actual human languages, assuming they are excluded from the hypothesis space learners have seems to be at least reasonable. So assuming that adverb-fronting out of relative clauses is universally banned, on such a view there shouldn't be a grammar in the hypothesis space which allows it.

      Are you worried about a slippery slope here that makes the acquisition problem "trivial"?

    12. Benjamin, I posted that before I read your last post, so it probably seemed like a non-sequitur.

      You raise a very good point: "you'd also be faced with the question why it is learning mechanism Y rather than X, no?" This is something I have thought about a lot over the years, and I think that question is a lot easier to answer, because anyone can write down 20 different classes of grammars before breakfast, but computationally efficient learning algorithms for large subclasses of the class of PTIME languages are harder to find. So if you accept that a domain-general efficient learning mechanism is adaptive (I hear a sharp intake of breath from some people already: Naive adaptationism! Spandrels! etc. etc.), then since there aren't very many of those, the set of possible explanations is very constrained. Indeed at the moment the only candidates are distributional learning algorithms. So I accept this is early days yet, but it seems like there is less of a gap here than there is with a very rich stipulated UG. But still, as you point out, a gap.

    13. I'm sure you can guess that I entirely disagree. Indeed, there are very few variants on the basic view of UG out there. The various frameworks are all pretty much notational variants in general. So, there are not dozens of adequate grammars out there. Most code the same basic insights and in pretty much the same ways. That this is so is what has made asking why UG looks like it does so fruitful. There is actually relatively little debate about what the core grammatical dependencies are overall. In fact some version of GB pretty well sums up the kinds of dependencies grammarians have found. I say consensus because Ivan Sag and I were able to agree on a pretty good list despite our filigree disagreements about the exact versions.

    14. So is UG the class of all Minimalist Grammars or some subset?
      And if so what subset?

    15. Well, as Minimalism is a work in progress, we can say that it defines the class of possible grammars. For the question you asked - whence structure dependence? - IF either of the two proposals is right and these are part of UG, then all grammar rules must be structure dependent in the relevant sense. As I said, the Chomsky story is the potentially more interesting one as it makes non-structure-dependent grammars in principle unavailable; they can't exist because they exploit predicates that the grammar doesn't countenance. If this is right, then rules of syntax dependent on linear properties of the string are impossible. So, if minimalist grammars encode some version of spell out/transfer (which most do at present), then structure-dependent grammars are the only possible kind on minimalist grounds.

      One more point: the idea that phrases do not code linear order, and that grammars don't use it, goes back at least to McCawley. Kayne has repackaged the proposal in a very interesting way, but the conceit is a very old one within Generative Grammar. An interesting sidelight: Chomsky argued against this conception in Aspects, very strongly in fact.

    16. If you aren't going to put forward an alternative, then I will assume that MGs are the only option on the table.
      There are MGs that can generate languages which allow things like:
      Messily children who eat noodles get in trouble.
      where Messily modifies eat, and
      Instinctively eagles that fly swim
      where instinctively modifies fly.
      As well as languages which form questions by reversing the words in the sentence etc.
      (See Bouchard 2012 Biolinguistics paper for details).

      So Minimalism provides no explanation here for examples 1 and 2.

    17. There are none that obey subject islands that can generate:
      [Messily1 [[children who [eat noodles] t1] get in trouble]]
      This dependency is illicit given the subject condition.
      Nor are there grammars that allow you to target the linearly closest verb for modification, as such rules cannot be stated. I don't know how Bouchard generates the required examples but such MGs are not the ones that practitioners generally allow.

    18. The grammars that are counterexamples will appear unnatural. If they were natural then the theory would already have been fixed. But you seem to admit that the counterexamples are admitted by the theory and only excluded by informal practice. In which case the theory does not exclude the incorrect interpretations. So if you are claiming that the theory does, I would like to know which theory. And it would have to be quite precise, or you won't be able to show that for all grammars G allowed by your theory, it is not the case that G will allow that sound/meaning pair.

      Now that I write this, I realise I don't even know what such a demonstration would look like. Don't you need a grammar class with some very strong "regularity conditions" so it doesn't have any finite exceptions? What forces [children who eat noodles] to have the particular syntactic type that would block the extraction? What blocks "messily" from having some exceptional feature marking that allows it to escape that restriction? Etc. etc. How can you possibly claim that no grammar in the class will allow that interpretation? It seems like the argument is missing some crucial elements.


      (In case you were making the factual claim that MGs can't do that, I would remind you that MGs are strongly and weakly equivalent to MCFGs (Michaelis, Stabler etc.) which can certainly do this. The example is just about the observed string/meaning pairings. Converted into MG terms, there would probably have to be some additional unmotivated ad hoc features, and the resulting grammar would be larger and would look weird, for sure. But looking strange is not a theoretical well-defined concept.)

    19. I am making a simple point: virtually all MG grammars encode a "shortest move/attract" condition and/or island effects. If some formalizations DON'T then they are not adequate formalizations. If one does encode such then the sentences you cite are not derivable. Period. Now, in a grammar that does not code linear relations, "shortest" cannot mean linearly closest. Why, because there are no linear relations available. In such a grammar "shortest" will necessarily involve some hierarchical notion, hence will be structure dependent. In many minimalist grammars islands are understood in terms of Spell Out, successive spelling out making spelled out structure unavailable. In such grammars islands like the CNPC are understood in such terms. Uriagereka, Lasnik and I have advanced such a proposal, among others. In such a theory there can be no grammatical commerce between elements within islands and those without due to Spell Out.

      Now, my understanding of Stabler's formalism is that he codes a "shortest" constraint, though in a somewhat unconventional way. At any rate, if he does, then his formalism should result in the same consequences. I don't think he says much about Islands and Spell Out. Tant pis! The bottom line: either the formalizations encode these notions, in which case the derivations you cite are unavailable, or they do not, in which case they are inadequate and the formalists should get back to work if they want to address these issues.

    20. I think my point is very simple too, and let me make it really trivial so that it is easy to grasp, as I think I did a bad job of making it.
      Suppose w = Instinctively eagles that fly swim
      and s1 is the meaning where instinctively modifies swim
      and s2 is the meaning where it modifies fly.
      So English has (w,s1) but not (w,s2).

      So suppose UG consists of the hypothesis class H and it happens that there are NO grammars in H that generate (w,s2). Then sure, UG would explain why we don't see (w,s2) (or at least imply it, as I don't think it explains much, as I was discussing with Benjamin).

      But it is I think very obvious that there will be a grammar in H that generates (w,s2), and that is because the lexicon is not fixed.
      So let w' be the sentence "Eagles that instinctively fly swim."
      So since there is a grammar that generates (w',s2), we can just permute the lexicon to get a grammar English'
      where "Instinctively" is a plural count noun that means "eagles", "Eagles" is a relative pronoun, "that" is an adverb that means "instinctively" etc.
      So English' will generate (w,s2).
      And there is no reason to think that English' is not in H.
      (Since English is, and English' just has a shuffled lexicon).
      So there are grammars in H that generate (w,s2).

      So an extreme example, but I hope completely clear. It explains the more general argument I am making.
      e.g. English'' which is just like English, except that it has those additional lexical items from English' for those five words, etc. etc., all the way up to cases where subject relative clauses have a slightly different feature-based analysis so that extractions are allowed, and so on. But without some technical detail about which subclass of MGs you are thinking about I can't generate those examples. But they *certainly* exist, and just asserting that they don't does not make progress.
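      To render the point really concretely, here is a toy sketch of the permutation (mine, and far cruder than the MG/MCFG construction I have in mind): keep one derivation fixed, in which the adverb is merged inside the relative clause, and just permute the spell-outs of five lexical items. The permuted lexicon then pairs the surface string w with the banned meaning s2.

        # One fixed derivation: (S (NP N (RC REL (VP ADV V_emb))) V_mat).
        # The leaves are listed in surface order so the string can be read off;
        # the meaning is fixed by the derivation itself: ADV modifies V_emb (= s2).
        leaves = ["N", "REL", "ADV", "V_emb", "V_mat"]

        english = {            # category -> pronunciation (relevant fragment only)
            "N": "eagles", "REL": "that", "ADV": "instinctively",
            "V_emb": "fly", "V_mat": "swim",
        }
        english_prime = {      # same categories, same derivation, permuted pronunciations
            "N": "instinctively", "REL": "eagles", "ADV": "that",
            "V_emb": "fly", "V_mat": "swim",
        }

        def spell_out(lexicon):
            return " ".join(lexicon[leaf] for leaf in leaves)

        print(spell_out(english))        # eagles that instinctively fly swim  -> (w', s2)
        print(spell_out(english_prime))  # instinctively eagles that fly swim  -> (w, s2)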

    21. Very annoyingly I just lost my post - but here's the gist:

      Thanks for the clear example, Alex. I think it highlights a big difference of assumptions, (or maybe I just still haven't understood it!).

      The point of the Chomsky example is to exemplify a pattern that holds generally, and so permuting the sound-meaning pairing of the lexical items is irrelevant to the generalization. The crucial thing is the structure-meaning pairing. Even your English'' doesn't have the initial element of the sentence (pronounced "instinctively" but meaning eagles) modify the verb in the constituent that has the relative clause meaning. That's what isn't in the hypothesis space. The Chomsky examples are about kinds of structural relations correlating with kinds of meaning, so particularities of assignments of phonological structure to meanings in lexical items aren't relevant. As a side point, of course your English'' probably wouldn't be learned anyway, as it would generate a slew of examples with wrong meanings "three instinctively fly" = "three eagles fly" etc, ramifying if we apply the strategy across countless examples ("Quietly the jaguar that crawled through the jungle stalked its prey" etc), so it would be descriptively inadequate.

      On the more general point, I think it's an amazing finding that Joshi's conjecture about the mild context sensitivity of human language looks so right. Presumably this is because of the nature of the system (not allowing certain kinds of cross-serial dependencies etc.) rather than because of some extensional fact about generable stringsets. However, of course FSLs aren't even a good model for human languages, as there are many possible FSLs that are nothing like human languages. So as well as saying that the generative power of our computational systems has to be restricted, we also need to say that the systems have yet further restrictions on them. Whether those restrictions come from general constraints on how we learn things, from parsing or production limitations, memory, functional pressures, etc., or from further constraints on the computational system itself is an open question. Generative syntax explores the idea that some of those further constraints are part of the system. For the case in point, the idea would be that subjects are impenetrable to syntactic dependencies. We'd want this to come out of the theoretical system, and the formalization of that system should capture that. Maybe that hunch is wrong, and the constraint on dependencies into subjects is to do with parsing or whatever, but then what we need to do is to delineate the positions and figure out what is right. So the recent debate between Hofmeister, Sag, etc. and between Phillips, Sprouse, etc. on islands is exactly of this sort and was quite enlightening as to the possible strengths and weaknesses of the positions, I thought.

    22. Thx to David for making the point I was about to try and make, no doubt less well. The Generative Enterprise, as I understand it, has been to find invariant grammatical patterns, e.g. classes of licit and illicit dependencies. The data we use are sound-meaning pairs, as humans cannot, it seems, perceive these patterns simply qua patterns; they need lexical incarnation. The example you give, as David observes, does not challenge the relevant pattern, for permuting the meanings does not end up with an illicit dependency.

      I would like to add one more point. I also found your example useful, for it seems to indicate that we are interested in very different things. I am interested in the class of licit dependencies and how FL can be inferred by studying these. I am not sure what you are interested in. Reading the example made me think of the following "physics" discussion: Newton's laws do not explain Kepler's data because the explanation relies on specific masses for the planets and the sun. But were the sun the mass of the earth then the planets would not revolve around it elliptically, and so Newton's laws cannot explain the observed pattern. This argument seems very unconvincing to me both in the physics and the linguistics domains. But it seems to be the argument you are making.

    23. I hesitated before posting that example because a trivial example illustrating a point often makes people think the point is trivial. I picked it because it rests on a technical claim that I could make convincingly in 5 minutes. So here is a more controversial technical claim:
      let English+ be English with, additionally, the single incorrect pairing (w,s2). English+ can be generated by an MCFG; ergo it can be generated by an MG. English++ is English but additionally with the fronted adverbs out of subject relatives; again generable by an MG. (MG means Stabler's Minimalist grammars with the shortest move constraint.) So I think these claims are correct, and if not could someone technical chime in and correct me.

      So Norbert is right that the grammars will look strange. Very strange indeed if you actually convert them from an MCFG. But they are allowed by this class of grammars, which in a sense defines the notion of licit grammatical dependencies in the theory. So Norbert wants to say, oh well if my theory makes the wrong predictions then it has been formalized incorrectly, and when it is formalized correctly it will make the right predictions, Period. But while this is certainly a ballsy argument, it's not really playing the game.

      David makes the crucial step, which is the right way to deal with it, which is to raise a learnability argument. Namely, these grammars are so strange that they can't be learned. I agree that that is where the locus of explanation should be -- in the learning process, and that is what I am interested in, as I don't think restrictions on UG can actually be made precise enough to explain this on their own. Without, that is, going the full P & P, finite class of grammars route; and we all know how that panned out.
      So my assumption at the moment is that UG is something like MGs (something a bit beyond Joshi's class as I think one needs a copying operation for e.g. Kayardild case-stacking), but that that doesn't really explain examples 1 or 2, and so we need a learning algorithm to do a lot of the explanatory work. Which is what I work on. So to answer Norbert's question, I am interested in FL, sure; but I don't think that specifying some arbitrary UG is the right way to look at it. I think targeting the LAD itself directly is the better way. But that is a methodological assumption really rather than anything stronger.

    24. Thanks for all the elaborations, Alex, that's really very helpful and interesting.

      So on your view, the learning algorithm you take to be a crucial part of the explanation as to why certain "logically possible" configurations are absent from natural languages (to the best of our knowledge) would fail to acquire these configurations if they were to actually occur in the input, or it would be happy with them but that's no problem because, well, they never occur in the input.

      The first case _seems_ to make identifying the learning algorithm just as hard as you point out it is to identify the correct and "minimal" hypothesis space, i.e. you also seem to need to ensure that it'll fail on "unnatural" input. (I'm emphasizing "seems", I really don't understand enough about the kinds of learning algorithms you're talking about, something I'll hope to change once I have time...)
      The second case leaves unaddressed the (in my opinion) important question why the _input_ looks the way it does.

      Your basic point, of course, stands undisputed (as far as I'm concerned). I actually share your feeling that more emphasis should be given to the kind of work Ed and others are doing on properly formalizing proposals (and I think your illustration is helpful in seeing why that kind of work _is_ important) but given the previous "merits of formalization"-discussion on this blog, I suspect there won't be a consensus in this respect any time soon...

    25. A quick point: this argument on formalization is interestingly the other way round from the previous one. Before I was arguing for formalization, because without formalization the empirical claims for UG are vacuous (argument here: http://facultyoflanguage.blogspot.jp/2013/05/formalization-and-falsification-in.html) and this argument does seem to indicate that it is *possible* that some of the supposed universal limitations on structure might be artifacts of the mode of analysis. (i.e. because you are looking for grammars that have P, you construct grammars that have P, even though you could well have picked a grammar that is not P, so the fact that you have grammars for 100 languages that are all P, doesn't provide any evidence that P is a universal property of the I-language).

      But now Norbert needs a formalized theory, because he needs to quantify over all unobserved grammars (and structures) to show that no grammar in his class generates English+ or English++. So I am not expecting any radical changes of heart, but here formalisation would be in the service of proving Norbert right, not proving him wrong, so that might make it more palatable....

      Delete
  5. I think an interesting case might be attraction errors in agreement: that is, where the verb agrees with the immediately adjacent noun rather than the structural subject. We know from the work of Bock and others that these are common, so we get things like

    1. The key for the cupboards are lost

    These errors appear in corpora, in production, and people often don't notice them in online perception. Now we also know that agreement is quite fragile in language change (e.g. sociolinguistic work by Ferguson and others). If the acquisition mechanism has no bias towards structure rather than linearity, one might expect that some SVO languages would have moved to a purely adjacency based agreement system, where the verb agrees with the immediately preceding noun rather than the subject. A language acquirer will have a ton of evidence consistent with this (since linearly preceding and structural subject will both be analyses of NP V input). But this has never happened. In fact, I know of no language where agreement depends on the linearly adjacent noun, and children, to my knowledge, have never been reported to generalise their agreement rules in this way, although they generalize inflectional systems in other ways. It's hard to see how to explain this general fact about language in the absence of a bias towards structural rather than linear analyses. There's also the interesting work of Culbertson with Smolensky on how learners of artificial languages generalise statistically variable patterns in the input towards typologically common patterns, which again suggests an innate bias towards making particular kinds of generalisations where the evidence in the input is carefully designed to be equivocal, or biased in the other direction. So we would like some theory of what those innate biases are, and generative syntax provides one such theory (of course there are also competing accounts in functionalism that appeal to innate non-syntactic biases, although personally I think that these can never tell more than part of the story).
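
    Just to make that contrast concrete, here is a toy sketch in Python (the two 'rules' are hand-written stubs for this one example, not a model of any actual learner):

    SENTENCE = [("The", "Det"), ("key", "N.sg"), ("for", "P"),
                ("the", "Det"), ("cupboards", "N.pl"), ("BE", "V")]

    def linear_agreement(tokens):
        """Agree with the noun immediately preceding the verb (linear rule)."""
        noun_tags = [tag for _, tag in tokens[:-1] if tag.startswith("N")]
        return "are" if noun_tags[-1].endswith("pl") else "is"

    def structural_agreement(tokens):
        """Agree with the head of the subject NP (structural rule); in this
        flat stub the subject head is simply the first noun."""
        noun_tags = [tag for _, tag in tokens if tag.startswith("N")]
        return "are" if noun_tags[0].endswith("pl") else "is"

    print(linear_agreement(SENTENCE))      # 'are' - the attraction error
    print(structural_agreement(SENTENCE))  # 'is'  - the grammatical form

    The attraction data then suggest that speakers' competence patterns with the second rule, even though the first is just as easy to state over NP V strings.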

    ReplyDelete
    Replies
    1. There is an interesting paper on agreement attraction by Wagers, Lau and Phillips (http://ling.umd.edu/assets/publications/Wagers_Lau_Phillips_2009_1.pdf). They show that though the effect is very robust, it is unlikely to be a grammatical one. The cutest part of their argument is their explanation of an intriguing asymmetry: attraction improves unacceptable sentences but does not degrade acceptable ones. It's worth a look if you are interested in these topics.

      Delete
  6. This is in part a reply to Alex D.'s question and in part response to several of the comments here:

    re alternative accounts; as Alex C. points out: there is "a difference between an argument for the [1] structure dependence of syntax (SDS) and an argument for the [2] innateness of the structure dependence of syntax". You can have a view about language like, say, Paul Postal's [one of the linguists I asked] on which accepting [1] does not commit you to accepting [2]. Now given that Postal considers himself a linguist he works on language, not on language acquisition [which he considers to be a job for psychologists]. So he has no account of how children acquire language. But I think his attitude [let those who are actually experts on developmental psychology work on acquisition] is commendable. "Only" working on syntax seems quite a monumental task for one person to handle. This does not mean linguists should not INFORM psychologists about the results of their work. But there is no reason for linguists to do work for which they are not trained [like speculation about biology, neurophysiology etc. etc.]

    If Norbert could overcome his allergy to names like Geoffrey Sampson and would read his work he'd know that there are corpora besides the WSJ that contain Y/N questions with subject relatives. And if he'd read the massive literature on acquisition he'd also know about proposals for how kids acquire the skill that do not require an innate LAD.

    Until we have actual independent evidence from brain research [e.g. some structure in the brain that generates the I-language] all the arguments for innate language specific constraints remain circular: you assume that there ARE such constraints and hence, if you find structure dependence in the language, you assume it is because of the innate constraints. But, Alex C. is quite right, to break the circle you need additional evidence for innateness.

    David A. attempts to use the persuasive, but misleading, argument from [perceived] uniformity of languages to draw the desired conclusion: there is uniformity in languages, there is [genetic] uniformity among humans - hence the uniformity of languages is explained by uniformity of innate biases shared by all humans. And he gives some evidence from a few languages [I have no idea how many he surveyed] and reveals: "I know of no language where agreement depends on the linearly adjacent noun". It continues to amaze me how Chomskyans can draw these grand conclusions based on knowledge of a few languages [which, if one trusts Postal, is massively incomplete even for English at the moment]. May I remind you that seemingly in 2002 Hauser et al. knew of no language that lacked recursion and declared recursion a language universal, possibly the only component of FLN. Just three years later the same authors claimed that there might be languages that lack recursion [a claim repeated several times by Chomsky in print]. So how do you know that 10 years from now someone will not discover the kind of language you do not know of? [Ironically your argument would be stronger WITHOUT the 'no language' claim - English shows structure dependence, so psychologists need to account for how kids learn it regardless of what other languages may or may not have]

    Let me close by reminding you of something Chomsky has said recently. Contrary to what has been suggested about me, I do not think that everything in Science of Language [or, even worse, everything Chomsky says] is 'crap'. Some of his remarks are certainly worth listening to:

    “... the fact that [something is] the first thing that comes to mind doesn’t make it true.... It is not necessarily wrong, but most first guesses are. Take a look at the history of the advanced sciences. No matter how well established they are, they almost always turned out to be wrong” (Chomsky, 2012, 38).

    ReplyDelete
    Replies
    1. Wow, you really hate these 'Chomskyans', don't you! Each one of your comments contains more information about the horrible things they do.

      To me it seems that drawing 'grand conclusions' is not a bad thing to do. It makes it relatively easy to identify where something is wrong with the theory, because somebody can come up with an example why the grand conclusion is not justified.

      Maybe I should also become a 'Chomskyan'! It would of course be a pity if you started pouring your scorn on me as well, but otherwise it sounds like it might be fun. Do you happen to know what I should do to join the club?

      Delete
    2. @Marc: She is not a practising linguist, not a psycholinguist, and not an acquisitionist. The best way to respect her statement (1) is to disregard what she says otherwise completely. If I understand the argument properly, then she is indirectly claiming that she is in the wrong to discuss the things she has been discussing on this blog and in other reviews. No one need argue against this suggestion.

      (1) "But there is no reason for linguists to do work for which they are not trained [like speculation about biology, neurophysiology etc. etc.] (Behme 2013)

      Delete
    3. @Marc: since when is criticizing the view of a person the same as hating that person? Is it not part of scientific discourse to criticize each other's views and hope that in the process one arrives at a better understanding? Do you believe my criticism expresses hatred because it is factually wrong? Then I would like specific examples of what is wrong and of why I am not merely mistaken but actually hate "Chomskyans".

      May I also ask: if you believe my comments indicate that I hate "Chomskyans" [as opposed to the people whose specific views/comments/proposals I have actually criticized], maybe you can enlighten me as to what we ought to call Chomsky's attitude towards named individuals, unnamed individuals and entire fields expressed in so many of his works? In case you have not already, look at a few examples I list here: http://ling.auf.net/lingbuzz/001592. Does Chomsky hate Dawkins, Everett, Dummett, Lassiter, Papineau, Elman and legions of unnamed opponents?

      Based on this PUBLISHED article: http://www.chomsky.info/articles/20071011.htm, does Chomsky hate Margaret Boden? Does Norbert hate me, based on the comments he made about one of my publications (http://ling.auf.net/lingbuzz/001778)?
      Do Legate et al. hate Stephen Levinson because they wrote the paper I comment on here: http://ling.auf.net/lingbuzz/001840 [which Norbert gleefully called an evisceration - a term no one, including you, objected to]? Do the people I mention at the beginning of this review http://ling.auf.net/lingbuzz/001696 all hate Dan Everett?

      I do not know you and I certainly do not hate you. But I find it quite sad that you seem to have such a biased view and seem to base your evaluation of a person's motives on who is criticized, not on what is actually said. You obviously find my criticisms objectionable but have shown no similar sensibilities to Norbert calling my review 'such junk' here: http://facultyoflanguage.blogspot.ca/2013/07/inverse-reviews.html.

      You never objected to Norbert's 'evaluation' of a paper he did not like: "This paper is a must read. Until I got through it I thought that the art of the academic lampoon was dead. FBC have proved me wrong. There are levels of silliness, stupidity and obtuseness left to plumb." [http://facultyoflanguage.blogspot.ca/2012/10/three-psychologists-walk-into-bar.html] So let me ask you: why are you so biased against me?

      @karthik durvasula: Even though I do not know what an acquisitionist is, you are right about the first two. However, I am a philosopher and [as Norbert can confirm] our job is to ask questions - which I have done here for the most part. You can of course feel entirely free to ignore what I say [including references to work by people who are experts in fields I am not].

      Delete
    4. "hater" is clearly a correct description. When I google you, all I find, besides Faculty of Language and LingBuzz posting is the following kind of thing (and there is lots more, this is just a sample) -

      http://goo.gl/WWYIXj
      http://goo.gl/G0M7Rv
      http://goo.gl/V4zPZQ
      http://goo.gl/ZLCeqz
      http://goo.gl/37Bkrv
      http://goo.gl/5afEc0
      http://goo.gl/j2K0pb

      Do you do anything else at all but rubbish Chomsky and advertise your papers (that also rubbish Chomsky)? Do you write anything constructive?

      Delete
    5. You googled me - how flattering. Instead of saying that my papers are 'rubbishing Chomsky' why don't you tell me specifically where I went wrong? I am certainly willing to correct mistakes.

      And, please, tell me: is Chomsky a hater of Elman? I am sure you agree with me that Chomsky is smart enough to KNOW that his description of Elman's work is entirely incorrect. So why does he continue the mischaracterization? It can hardly be called 'constructive'.

      Is Chomsky a hater of Everett? Based on what evidence did he call him a charlatan in a Brazilian newspaper? Everett may be wrong about Piraha but does this justify calling him a charlatan in print? Is it constructive?

      Given that you seemingly accept Chomsky's relentless rhetoric against those whose only crime is to disagree with him, it is rather odd that you would call me a hater. But given that English is not my native language, maybe 'hater' does not mean what I think it does? If by hater you mean someone who criticizes that which is wrong then I am guilty as charged and, inter alia, Chomsky would not be a hater.

      Now, maybe you would like to show me how to do something constructive. One week ago I asked a question regarding a claim Robert Berwick made. So far none of the excellent linguists who have contributed thoughts since has answered what seemed to me like a very simple question by Paul Postal. Maybe you would be so good as to answer his question?


      -------------------------------
      I asked Paul Postal about the claim that "nothing about linear order matters (only hierarchical structure, what Noam called in Geneva the ‘Basic Property,’ matters)" and copy below part of what he replied. So maybe Robert can answer the question at the end?

      ---------------------
      There is no doubt that in some cases A B is good and B A bad. Compare:

      (1) That remark was sufficiently stupid to…
      (2) *That remark was stupid sufficiently to...
      (3) *That remark was enough stupid to..
      (4) That remark was stupid enough to..

      Evidently, there is something which differentiates the adverbs 'enough' and 'sufficiently' and which shows up as contrasting word order requirements with respect to the adjective they modify. In what sense does the word order not matter, only hierarchy?
      -----------------------------

      Delete
  7. Hi Christina. Re your question: it's an issue of analysis. Kayne has a proposal for capturing these kinds of order differences in English vs French (where assez comes before rather than after the adjective) which he ties down to a hierarchical property that secondarily entails a linear order. The idea is that there is a feature associated with the degree word (enough vs assez) differentially causing movement of the adjective to a position where it is hierarchically superior. That then causes it to be pronounced first. The absence of this feature means that the movement doesn't take place. He then uses this basic principle to capture quite a wide range of differences and fairly subtle effects in order, both within English and between French and English, connected to this kind of phenomenon. It's worked out over a number of papers but the analysis is sketched in his Parameters and Principles of pronunciation on Lingbuzz. So Paul's counterexample really depends on whether the right analysis of this pattern is one that appeals to linear order or to hierarchy, so that kind of example doesn't really help with the issue. That's why I tried the attraction error argument. You may be right that there's an SV language that conditions agreement linearly, but in my 25 or so years of obsessively reading syntax papers and grammars I've never come across any research reporting one, and the acquisition literature doesn't report that this is the kind of overgeneralisation that kids do, which is, I think, at least food for thought.
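
    To give a flavour of how a feature-driven account of this sort works (the feature name below is invented purely for illustration; this is not Kayne's actual system), the order difference falls out of whether the degree word triggers movement of the adjective:

    DEGREE_WORDS = {
        "enough":       {"attracts_adjective": True},
        "sufficiently": {"attracts_adjective": False},
    }

    def linearize(degree_word, adjective):
        """Base order is degree word + adjective; an attracting feature moves
        the adjective leftward, so the surface order difference follows from a
        hierarchical (feature-driven) property rather than a linear stipulation."""
        if DEGREE_WORDS[degree_word]["attracts_adjective"]:
            return f"{adjective} {degree_word}"
        return f"{degree_word} {adjective}"

    print(linearize("sufficiently", "stupid"))   # 'sufficiently stupid'
    print(linearize("enough", "stupid"))         # 'stupid enough'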

    ReplyDelete
    Replies
    1. David, thank you for your comment. Paul's comment was not meant as a counterexample [it ended with a question, not a claim]. And I have no doubt that within the minimalist framework your analysis is correct. But in other frameworks different analyses are possible. I am a bit busy at the moment but will post an alternative to Norbert's (2) some time on the weekend.

      Regarding the possibility of an SV language that conditions agreement linearly: again, my claim was not that there is such a language, or that I think one might be discovered any moment now. I just think, in light of the history of GG, it is preferable to stay away from categorical claims that can be falsified by a single counterexample. Why take that risk, when for your purpose it is quite sufficient to say that English is a language that does not condition agreement linearly in all cases? Kids learn English so we have to figure out how they do that. [The same concern was expressed regarding Robert's claim that word order does not matter, ONLY hierarchy does. Had he said word order alone cannot explain certain phenomena, no one would object - certainly FCB did not object to that, even though Norbert seems to imply they did]

      Maybe some amusement for the weekend: across the pond newspapers make silly claims about machine translation as well: http://www.spiegel.de/international/europe/google-translate-has-ambitious-goals-for-machine-translation-a-921646.html

      Delete
  8. As promised, below is the alternative to Norbert's analysis of the sentence
    (2) Instinctively eagles that fly swim.

    One of my linguist-friends proposes a scenario that involves no reference to phrase structure at all, using a term-labelled category logic built on top of the Lambek calculus which infers strings from strings, with no reference to any prior proof steps. This is intended simply to show that recourse to structure is not a necessity. My friend also gave me the outline of the formal proof, but sadly this does not fit into this format and it seems that unless one is Norbert one cannot add attachments here. But I am confident someone like Alex Clark can explain that part if people are interested [or answer other questions that may arise; I am the messenger, NOT the expert].

    ---------------------------------

    Let's assume that 'instinctively' has an event (type)-variable argument. It identifies certain kinds of events as arising from wired-in neural programs. 'Instinctively, eagles that fly swim' then corresponds to a paraphrase: 'the swimming behavior of the subset of eagles that fly reflects a neurologically built-in, not a learned, capability'.

    'Eagles that fly', however, does not denote an event (type), but rather a subset of the set 'eagle'. So there is a type mismatch between 'instinctively' and 'eagles that fly'. If we build up meanings compositionally, then by the time we get to the point where 'instinctively' has something to apply to as a functor category, all we have is an event-class, in which a subset of eagles habitually carry out a particular activity. No configurational structure necessary.

    This scenario can be modeled as a proof in a term-labelled Lambek system with event semantics. The output string will be provable *just* in case the substring corresponding to the event type 'eagles-that-fly swim' is available to the predicate whose argument type is a set of events. Which means that the intersection of the set eagle and fly has already been constructed and incorporated into the predication as an argument, of whatever type you use to model kinds. To get the string, that is, you have to have intersected the two sets to get the denotation of 'eagles that fly'. There is no way that 'instinctively' or whatever can get at the event variable of a predicate corresponding to one of the intersected sets; it is simply not available as a proof term at the point when 'instinctively' takes its argument, because you had to form the set intersection corresponding to the kind-object argument of 'eagles that fly'. To get the sentence in question, you have to get a predicate on event types taking as its argument the category whose denotation is a swim-event type, or you don't get the string. No reference to internal structure is necessary.

    Again: the final step in the proof supplies 'instinctively' with the argument λe swim(e, eagle ∩ fly). The event variable associated with 'fly' is inaccessible within the intersection that yields the subset of eagles such that each eagle flies.
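
    Very crudely, the same point can be put in code as follows (the type labels and the two-item lexicon are a simplification for exposition only, not the category-logical proof itself):

    ENTITY_SET, EVENT_PRED = "set-of-individuals", "event-predicate"

    def apply_fn(functor, argument):
        """Simple-minded function application with type checking."""
        in_type, out_type, name = functor
        arg_type, arg_name = argument
        if in_type != arg_type:
            raise TypeError(f"{name} wants {in_type}, got {arg_type} ({arg_name})")
        return (out_type, f"{name}({arg_name})")

    # 'instinctively' is a functor over event predicates
    instinctively = (EVENT_PRED, EVENT_PRED, "instinctively")

    # by the time the adverb applies, 'eagles that fly' is a plain set
    # (the intersection eagle ∩ fly) with no event variable left in it
    eagles_that_fly = (ENTITY_SET, "eagle ∩ fly")
    swim = (EVENT_PRED, "λe.swim(e, x)")

    print(apply_fn(instinctively, swim))        # the only well-typed combination
    # apply_fn(instinctively, eagles_that_fly)  # TypeError: no event variable left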

    ReplyDelete
  9. Maybe I am missing something but I am not sure I would interpret this as an alternative to the structural explanation. Compositionality is not a purely semantic notion, but refers to the syntax-semantics mapping. So although you can formulate the interpretation of (2) (and the absent interpretation of (2)) in semantic terms, that does not mean that there is no implied reference to structure. The event variable of 'fly' cannot be picked up by 'instinctively' because 'fly' is interpreted as modifying 'eagles' before 'instinctively' is interpreted. The crucial word here is 'before'. That implies hierarchy, and the semantic type mismatch is a consequence of it.

    ReplyDelete
  10. @Olaf: thanks for your interest. I paste below the reply of my friend and hope this answers your question. I also recommend that you ask Norbert [or Robert Berwick] for an equally detailed analysis of the internal structure of the example on the minimalist view, and for suggestions as to how this structure is implemented in human brains [in other words, ask them to put their cards on the table as well].

    ---------------------------
    This comment reflects a serious misunderstanding of the structure of logical proof, unfortunately a common misconception among people who do not actually work in the proof theory of formal logic. The 'order' of the proof is irrelevant to the fact that the conclusion is a theorem of the assumptions. The fact that p --> q and p entails q \/ r is simply a consequence of the way in which the proof steps verify the correctness of the sequent p --> q, p |- q \/ r. 'Before' and 'after' refer to the way one verifies theoremhood. There are typically many ways to prove the same theorem in logic, all of them equally valid, though one can reify proofs as structured objects and define normal forms and construct metatheorems, as in the work of Gabbay, Girard, and the study of proof nets; but that has nothing to do with the status of the theorems of first order logic themselves. And a proof in category logic, such as the one that is informally summarized in the previous note, works exactly like a natural deduction proof in propositional logic; \ and / are implication operators, and just as in standard proofs of theorems in (non)classical logics, the order of inference is simply a demonstration that the rules of the logic, which have themselves no order with respect to each other, guarantee and verify the entailment corresponding to the theorem. The writer seems to believe that if I have

    p --> q |- p --> q, p |- p
    -------------------------------- imp elim
    p --> q, p |- q
    -------------------------------- \/ intro
    p --> q, p |- q \/ r
    -------------------------------- imp intro
    p --> q |- p --> (q \/ r)

    then the theorem p --> q |- p --> (q \/ r) depends on implication elimination applying 'before' implication introduction, or the inference p --> q, p |- q 'existing before' p --> q |- p --> (q \/ r). He needs to understand that that belief is incoherent: a logical proof applies inference rules in some order in any particular proof, but the order of application of the rules has nothing to do with the theoremhood of the entailment relation between p and q when p |- q.
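
    If it helps, the independence of the theorem from any particular proof order can be underlined by brute-force checking it over all valuations (a three-line exercise that has nothing to do with the proof theory itself):

    from itertools import product

    def implies(a, b):
        return (not a) or b

    # p --> q |- p --> (q \/ r): the conclusion holds under every valuation
    # of p, q, r, so no ordering of inference steps is at stake
    print(all(implies(implies(p, q), implies(p, q or r))
              for p, q, r in product([True, False], repeat=3)))   # True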

    What is established in the category logical proof sketched is that the logical structure of 'eagles that fly swim' is not an object that can be an argument of the functor 'instinctively'. It is the wrong type. And the model theoretic proof corresponding to the logical proof will show the same thing in terms of the algebraic interpretation of the proof terms. And since it is the identity of the category types of the lexical items involved which count as the axioms in the proof, and since the outcome determines that you wind up with a category S proof term just in case the resulting string has the interpretation it has, the structure of the string depends purely on the category types of its words and the corresponding interpretations in a familiar model theory. Again, no structural assumptions are involved.

    ReplyDelete
  11. Thanks Christina. I think that the proposal sketched slightly misses the point of the example, and within minimalism you'd say something more or less identical about the derivation of the right interpretation. The crucial point is to rule out a derivation, proof or representation which gives you the wrong meaning. The proposal sketched makes the assumption that the position of instinctively in the string is a consequence of its lexical properties (that it is a predicate or modifier of event types) and that seems very reasonable. But we also need to allow the system to relax that requirement to deal with examples like

    1. 'how instinctively do scientists think that eagles (that fly) swim'.

    Here we need 'how instinctively' to still be a modifier of events, and to get the meaning where instinctively modifies swim, which is one of the meanings available, it needs to modify the swimming events here. To capture effects like this in a categorial system we need a technology that allows a long distance dependency (e.g. modal operators in the proof system, functional composition, etc.). So the crucial question is why we can't use whatever technology allows 1 to derive the Chomsky example with the bad meaning.

    It is possible to do syntax that doesn't generate constituent structures at all, in a categorial system or in a dependency grammar system. These systems though, just like structure based systems, will have to have some constraint in them which makes the bad meaning impossible for that string and my guess is that all systems are pretty similar and in agreement about this. Structure dependence is just a name for this phenomenon: that syntactic rules need to go beyond linear contiguity in the establishment of dependency (so the 'structure' can be constituent structure, or rich lexical structure, or dependency structure etc). The point of Chomsky's example, as Norbert stressed, is that linear contiguity via n-grams really won't work: you need something richer than that. I'm pretty sure most categorial grammarians would agree with that for this example, but do check with your consultant on this as I'd be interested in hearing what she/he has to say about that.


    ReplyDelete
  12. This comment has been removed by the author.

    ReplyDelete
  13. Maybe a clearer example that also satisfies Norbert's criteria for the cutest possible demonstration of structure dependence --

    "This example has several pleasant features. First, there is no string linear difference to piggyback on as there was in Chomsky’s Y/N question example in (1). There is only one string under discussion, albeit with only one of two “possible” interpretations. Moreover, the fact that instinctively can only modify the matrix predicate has nothing to do with delivering a true or even sensible interpretation. In fact, what’s clear is that the is fronting facts above and the adverb modification facts are exactly the same. Just as there is no possible Aux movement from the relative clause there is no possible modification of the predicate within the relative clause by a sentence initial adverb. Thus, whatever is going on in the classical examples has nothing to do with differences in their string properties [..]"

    -- might be an example that involves displacement, like the original Y/N question, while still having the other pleasant properties that Norbert lists. This way we are not discussing combinatory properties of lexical items but properties derived by whatever rule you posit in your favorite theory of grammar that licenses filler-gap structures, as in Chomsky's original example. These are not hard to construct either. Consider, for example, the fact that English "behave" without an overt quality-adverb means "behave well", and "speak Swedish" without an overt quality-adverb means "speak Swedish competently", and now consider:

    "Our study asked ...

    a. ... how well kids who behave at home behave at school."
    b. ... how well people who speak Swedish speak Danish."

    You could imagine two possible interpretations of such sentences:

    a'. "the correlation between behaving well at home and quality of behavior at school" OR "the correlation between quality of behavior at home and behaving well at school"

    b'. "the correlation between speaking Swedish well/natively and quality of speaking Danish" OR "the correlation between quality of speaking Swedish and speaking Danish well.

    -- but only the first reading is available. Every theory will say something about the kinds of things that wh-phrases are allowed to combine with, and (unlike with "instinctively") every theory will have to permit the ability to combine a wh-phrase with X (in languages like English) to have something to do with a gap inside X. Every theory will have to allow this gap to be some distance away, since (c) and (d) allow an interpretation of "well" with the rightmost verb as well as with a nearer verb:

    c. How well do you think Mary will argue that the kids behaved on their vacation.
    d. How well do you think John will argue that the Swedes speak Danish.

    -- but there is no reading that places the gap in the relative clause in a & b. (This is a trick you cannot perform with "instinctively" in Chomsky's examples.) Yet there is no string difference between the two readings of (a) and (b), they are acceptable sentences, and both the possible and impossible readings are sensible and plausible. Furthermore (c) and (d) make it clear that it's not always the case that you have to associate "how well" with the righthand verb. Only when there are structural factors preventing an association with the closer verb (here, embedding in a relative clause). Maybe these examples might be more helpful starting points for discussion in the present context?

    ReplyDelete
  14. [David Adger's comment was posted while I was busy writing mine, and it looks like our thoughts were very similar.]

    ReplyDelete
  15. I am grateful to the two Davids for commenting. Since individual replies might get confusing, and since, as David P. pointed out, the comments are very similar in their essence, here is a reply to both:

    I am especially grateful that you defended me so effectively against the charge by Marc van Oostendorp that I am a Chomsky[an] hater [and I assure Marc the replies were not 'staged']. He wrote: “I am not sure that I understand why you are so ironic about Chomsky and his examples. You seem to be saying that there is a group of researchers [Reali & Christiansen and others] who claim that … they cannot start tackling the kinds of examples Chomsky gives, because every time they do so, that horrible guy comes up with a new example for which they have to find a new solution”.

    Now you gave a wonderful demonstration of the point Marc was skeptical about: Norbert had provided a specific example that allegedly could not be handled on a non-structural analysis; my friend provided one, and voilà, you come up with a DIFFERENT example. So you are doing exactly what I said Chomskyans would do [based not on nastiness but on 10 years of research on Chomskyan argumentation].

    My friend tells me it is easy to handle the additional examples using any category logic with a nondirectional, intuitionistic implication operator --o such as Muskens' Lambda Grammar, de Groote's Abstract CG, Pollard's Linear Categorial Grammar or Levine and Kubota's Hybrid Type Logical Grammar with both Lambek and nondirectional implication in the proof theory. He is a bit surprised that, seemingly, you are not aware that medial gaps are routinely dealt with in these frameworks and that adjunct extraction ambiguities can be obtained by straightforward proofs little different in kind from the one provided for the original example.

    Now before providing the proof he asks you to please put your best foot forward. Are these the examples that will convince you, or will you ask for another example and another and…? Do you have one example that will settle the debate, yes or no? If yes, please give it. If not, please tell Marc van Oostendorp that he cannot fault the other side for making unfalsifiable claims [which for the record Christiansen and his co-workers never did] as long as you are not able or willing to provide something that can be falsified.

    Further, I had asked for two additional things: [1] your own analysis of these examples. So far all you have said is that they indicate internal structure. I am still waiting for the derivation. And [2] please provide a proposal for how this structure is implemented in the brain. After all, this is what we are really after: the human hard[- or wet]ware that can churn out these sentences. I am not interested in the details as long as you have a proposal that is, at least in principle, implementable in a human brain.

    ReplyDelete
    Replies
    1. Aha! So you are a scholar of 'Chomskyan argumentation', which, as far as I understand, does not just involve argumentation by Chomsky, but also by 'Chomskyans'. Which is a good thing, because I guess only a specialist could see how what David A or David P said can 'defend' Christina (henceforth: you) against what I said.

      My 'charge' would be that you are attacking a group, the 'Chomskyans', that is not well-defined, and you are ascribing all kinds of behaviour to this group. Since you do not define beforehand what it takes to be a 'Chomskyan', I suppose anything that anybody does and which you do not like can of course justify your claim that Chomskyans are bad and evil (but you do not hate them).

      But I would like to learn more, maybe in order to become a scholar of 'Chomskyan argumentation' as well, since it seems a lot of fun. So maybe you can enlighten me first about the crucial point: how do we know that somebody is a Chomskyan? Is it enough that such a person does things which you do not like, or should they have some other property as well? Are you a Chomskyan yourself, being so obsessively involved with your field of Chomskyan argumentation, as others have observed above? Why not? Can a Chomskyan ever turn into a non-Chomskyan, and what does it take?

      I am very excited about learning all these new things!

      Delete
    2. This comment has been removed by the author.

      Delete
    3. Marc van Oostendorp, in case you are seriously interested in what others say about Chomskyans [even fundamentalist Chomskyans, a category I do not use] you may want to try:

      http://pieterseuren.wordpress.com/2013/09/15/chomsky-in-retrospect-1/comment-page-1/#comment-482

      Unlike me, the author is a professional linguist, so maybe you take him seriously.

      Delete
    4. Dear Christina, I have no idea why you think I take people more seriously when they are linguists. Frankly, I do not see why somebody's arguments should be weighed against their diploma.

      Your link refers to a comment of a Christina B on Pieter Seuren's weblog. I suspect that Christina B is the same as you, or at least she shares your passion for finding fault with Chomsky. Otherwise, there is only one reference to 'fundamentalist Chomskyans' by prof. Seuren. This term I understand; PS was part of a fight (some even call it a war) with Chomsky, and this is a word he uses for the people who were (and are) on Chomsky's side with this. Seuren might call himself a Chomskyan, but not a fundamentalist.

      But I really have no idea what YOU mean by that term. Really. I do not understand; and I also do not understand why you refuse to give a definition which might make me understand, even though I have asked for it a few times now. You distance yourself from the epithet 'fundamentalist', but that does not clarify much for me, almost to the contrary. Do you think that fundamentalism is inherent to Chomskyanism, maybe? What makes somebody into a Chomskyan? You are the expert on that, since you are a scholar of Chomskyan argumentation; can you please clarify this for us? I am honestly curious.

      Delete
    5. Marc van Oostendorp, first, apologies for the link; I thought it would link to Pieter's blog, not my comment [as links to this blog do]. I should have tested it before - my mistake.

      Second, I did not think your questions were serious but took them as mocking. Presumably my fault as well. I do not have a scientific definition of a Chomskyan but can offer a few paradigm examples. Norbert would be one. He says somewhere on this blog that he unabashedly admires Chomsky and everything he does/writes. [My memory may be inaccurate here, so apologies for that, and I do not mean it disrespectfully in any way - unwavering loyalty is an admirable quality]. James McGilvray would be another example. So presumably someone who accepts Chomsky's ideas because they are Chomsky's ideas. Someone who applies different standards to the work of Chomsky than to the work of others. Like, say, David Pesetsky: have a look at his criticism of the work of some authors who were published in 'leading journals' in his LSA address. Or look at his comment on the Evans & Levinson BBS paper. I am NOT saying this work was not deserving of criticism. But compare the quality of that work to, say, "The Science of Language" - not a word of criticism about that. Presumably a further characteristic is disdain for those who disagree with Chomsky [Norbert calls his project a 'labour of hate'].

      I hope this answers your question [at least partly]. And maybe in return you could answer some questions I have repeatedly asked: why do you think it is okay [or at least not worth mentioning] that a linguist knowingly distorts the work of another researcher with seemingly no other purpose than making the other person look incompetent? Why do you think it is okay that Norbert argues in great detail against falsifiability
      [here: http://facultyoflanguage.blogspot.ca/2013/07/falsifiability.html and in some subsequent posts] but at the same time thinks an unfalsifiable claim by Reali and Christiansen would be a bad thing? Do you not think the same standards should apply to everyone? Why do you not object to Norbert calling my review of 'Of Minds and Language' a piece of junk without offering one single example of what I did wrong? Maybe you can start with that: have a look at a version of the review below and let me know why it is junk: http://ling.auf.net/lingbuzz/001778

      Delete
  16. Hi Christina. Not sure about the `coming up with a new example' issue. My extra example was just to make the point that there were long distance dependencies involving adverbs like instinctively, so it's not a new example for analysis. I'm happy to let the original stand. I guess David was trying to clarify by his example, but I'll leave that aside (although I think David's example is actually sharper than Chomsky's).

    On my own post: the crucial thing is to *stop* the system generating the `wrong' meaning for the example. Of course I know that there are numerous ways to get long distance dependencies in categorial grammars (I mentioned two, so I have no idea why your friend was surprised - you did show him the post, right?). So given that, the question is how to stop that technology from being used to get the wrong meaning for the example [this is a point well understood in Lambek-style categorial grammars; see for example Carpenter's 1997 book, page 7]. Of course any reasonable theory will have a way to do this, but that way will have to appeal to the fact that that technology is not available, for whatever reason, when creating a dependency into a relative clause. That is therefore a structure based reason, not a reason based on linearly contiguous sequences of words. As I already said: this doesn't mean constituent structure necessarily; it could be the interaction between inference rules and a rich lexical structure, where the lexical structure encodes the combinatoric possibilities. Everyone knows that we can build systems that don't create constituent structures, but all systems need to appeal to structured information to capture Chomsky's example.

    Ok, now the question of the derivation of Chomsky's example sentence. Let's adopt a system with Merge (both external and internal) and the following lexical items, using Stabler's minimalist grammar notation:

    eagles::N,
    fly::V, swim::V
    v:: =V=N v
    that::=v +wh C
    O::N -wh
    instinctively::=v Adv -Top
    Top::=C +Top
    C::=v C

    We can then build a derivation that looks as follows:

    1. Merge (v, swim) satisfying =V
    2. Merge (O, 1) satisfying =N
    3. Merge (that, 2) satisfying =v
    4. Merge (O, 3) (internal) satisfying -wh on O

    this gives us `that swim'

    5. Merge (eagles, 4)
    6. Merge (v, fly) satisfying =V
    7. Merge (5, 6)
    8. Merge (instinctively, 7)
    9. Merge (C, 8)
    10. Merge (Top, 9)
    11. Merge (instinctively, 10) satisfying the -Top feature on instinctively

    This gives us: Instinctively eagles that swim fly, with the interpretation that instinctively modifies fly, since it was Merged with fly, and its semantics is, just as in your friend's proposal, or that of, say, Higginbotham 1985, that it modifies an event.
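
    To make the feature checking concrete, here is a deliberately crude Python sketch that just replays steps 1-4 (the relative clause) and verifies that each Merge checks a matching pair of features; it says nothing about linear order or semantics, so treat it as an illustration of the bookkeeping rather than an MG implementation:

    from dataclasses import dataclass, field

    @dataclass
    class Item:
        phon: str
        feats: list                                 # e.g. ['=V', '=N', 'v']
        movers: list = field(default_factory=list)  # items awaiting licensing

    def external_merge(selector, selectee):
        """Check the selector's '=X' against the selectee's category 'X'."""
        assert selector.feats[0] == '=' + selectee.feats[0], "selection fails"
        merged = Item(f"({selector.phon} {selectee.phon})",
                      selector.feats[1:],
                      selector.movers + selectee.movers)
        if selectee.feats[1:]:                      # leftover features must move
            merged.movers.append(Item(selectee.phon, selectee.feats[1:]))
        return merged

    def internal_merge(expr):
        """Check the expression's '+f' against a pending '-f' mover."""
        assert expr.feats[0].startswith('+'), "no licensor to check"
        f = expr.feats[0][1:]
        for m in expr.movers:
            if m.feats[0] == '-' + f:
                expr.movers.remove(m)
                return Item(f"({m.phon} {expr.phon})", expr.feats[1:], expr.movers)
        raise AssertionError(f"no mover bearing -{f}")

    # steps 1-4: the relative clause `that O swim'
    swim = Item('swim', ['V'])
    v    = Item('v', ['=V', '=N', 'v'])
    O    = Item('O', ['N', '-wh'])
    that = Item('that', ['=v', '+wh', 'C'])

    step1 = external_merge(v, swim)      # checks =V
    step2 = external_merge(step1, O)     # checks =N; O's -wh becomes a pending mover
    step3 = external_merge(that, step2)  # checks =v
    step4 = internal_merge(step3)        # checks +wh against -wh
    print(step4.feats)                   # ['C'] - a complete relative CP

    The rest of the derivation is just more of the same bookkeeping; the interesting part is what blocks the alternative derivation, which I turn to below.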

    ReplyDelete
  17. continued ...

    Why can't we have the same derivation but instead Merge instinctively with swim, and then internally Merge it with Top, which would give us the same string but a structure where instinctively would modify swim not fly (incorrectly, given the meaning)? This is because Merge is subject to a specifier impenetrability constraint, formalized as a condition on the applicability of structure building (see, for example, Kobele and Michaelis 2011, `Disentangling notions of specifier impenetrability', in Kanazawa et al., "The mathematics of language", Berlin, Springer). This is essentially what Norbert appealed to in his post, and is, of course, just the `subject island' condition from Chomsky 1973. With this constraint in place, if we Merge instinctively with the vP in the relative clause, then we can never Internally Merge it to the Top element in the matrix clause, as that is not a legitimate internal Merge operation. My guess is that whatever system your friend is using, it will have something that does this job too. Alternatively, one could allow the system to generate this meaning and fold the constraint into the processing of the structure, as Kluender has suggested.
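
    One crude way to render the effect of the constraint in the toy sketch above (again, only the logic, not Kobele and Michaelis's formalization) is that pending movement features inside a phrase that has been merged as a specifier are simply frozen:

    def available_movers(pending_movers, merged_as_specifier):
        """Movement features a higher licensor may still attract: none, if the
        phrase carrying them has been merged as a specifier (here, the subject)."""
        return [] if merged_as_specifier else list(pending_movers)

    # the unwanted derivation: instinctively (bearing -Top) was merged inside
    # the relative clause, which ends up inside the subject, i.e. a specifier
    print(available_movers(['-Top on instinctively'], merged_as_specifier=True))
    # [] - the -Top feature can never be checked, the derivation crashes, and
    # the bad reading of the string is never generated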

    Stabler and others have shown that the addition of this constraint to Minimalist Grammars doesn't affect the expressive capacity of these grammars in terms of the kinds of languages they can build, but it does affect the possible linkages of structure with meaning, as in the case under discussion.

    So that was basically Chomsky's analysis of this phenomenon, somewhat fleshed out with implementational detail. Now, how is this implemented in the brain? Well, Stabler provides in his 2012 Topics in Cognitive Science paper a rather elegant way of automatically translating a Minimalist analysis of this sort into a parsing strategy which is incremental and sorts parser predictions based on the grammar into linear order at each parsing step. See section 2 of his paper for the details. So that would be a proposal for how to implement this kind of grammar in the brain. What's quite interesting is that implementing the specifier impenetrability condition actually means the parser doesn't have to carry so much in its memory, so there's a nice transparency between the shape of the grammatical constraint and a way that the parser can reduce memory load. Note that we can also make this parser statistically sensitive, while maintaining that the system of knowledge that it uses is non-stochastic, which is another interesting property, since we all agree that there are clear statistical factors in the use of language that need to be embedded into our systems (while of course there is controversy about whether there are probabilistic effects in our knowledge of language).

    So this is how a neurophysiological system could be an implementation of a minimalist grammar with Merge (both internal and external) and feature checking driving the construction of a derivation.

    My own feeling is that all of this is really quite exciting. I have no doubt that we can try to understand the kind of phenomena pointed out by Chomsky's example in other ways than this specific analysis, and all of these ways are surely likely to be helpful in either taking us forward in particular directions or ruling out others.


    ReplyDelete
    Replies
    1. Thanks David for working through the example; it helps to understand the argument.
      As it happens I am working with Greg Kobele this week, and we discussed this very issue last night.

      So if you have a hypothesis class of grammars H, then we know that there are grammars that generate the wrong pair (s,w2), BUT, if we fix the lexical items for the words in the language then this is no longer possible. Indeed since we have fixed the lexicon there is now only one grammar G -- MGs are a lexicalized formalism, so the grammar is all in the features of the lexicon. (apart from a few details).

      So what you have shown is that there is a grammar G in H which generates the right reading and doesn't generate the wrong reading. Which is of course good (and it does it in a natural and compact way, which is even better).

      However the class H also contains grammars which generate the right reading and the wrong reading. These grammars will have different and perhaps unnatural lexicons.
      So what you need to show is that there are *no* grammars in the class that generate the right readings for English sentences and the wrong reading. And you can't do that because there are such grammars - the trivial example I gave above and many less trivial ones.

      So the constraint on the grammar doesn't on its own explain anything:
      all of the work is of course being done by the lexicon or rather the mechanism that learns the lexicon.
      And that is the right answer in my view: the "empiricist" answer. There is a large class H that includes both the right and wrong answers, both structure dependent and structure independent answers, and the right one is learned using, perhaps, a simplicity based learning mechanism.

      It would be really interesting if you could define a subclass of MGs or of MG lexica that excluded the bad grammars. But I think that would be hard/impossible to do.

      Now I assume that you and David P and Norbert H probably don't think this is the right analysis, or at the very least disagree with my deliberately tendentious use of the word "empiricist", so what is the alternative explanation? Are there for example innate constraints on the lexicon as well, or some additional constraints on the class of grammars? Or are you happy with it being a learnability answer, and therefore not "innate"?

      Delete
      2. I feel like there is some important unexpressed assumption that you guys are making and I am not, which accounts for our different perspectives: I am worried I am missing the point. Maybe you are assuming for example that there is some fixed innate set of features F, and we are only considering MGs that use F? Or something along those lines?

      Delete
    3. For what it's worth, I was thinking along those lines (although perhaps only in hindsight, so thanks for bringing this up). And isn't that basically what the idea of substantive universals (as opposed to "merely" formal ones) is all about?

      Delete
  18. Hey Alex, I responded on the `Formalization' post (or at least tried to), attempting to articulate where the difference in assumptions might lie (it's really the second of the two posts that's relevant). Interesting topic, this. Makes me think.

    Christina, was wondering if you had any thoughts about the derivation and suggested mode of implementation you asked for and I provided.

    ReplyDelete
    Replies
    1. David, apologies for not replying. I appreciate you taking the time to spell out the derivation & mode of implementation. A few questions remain and I'll ask them at the weekend [deadlines loom large right now].

      Delete
    2. I know the feeling! The semester is starting and my Deanly duties call. I may have to go back to just lurking for a bit, unfortunately.

      Delete
    3. David, thanks again for the detailed earlier reply. If you still have the time maybe you can look at the below [or maybe someone else can]. You gave me an analysis for [2]

      [2] Instinctively eagles that fly swim

      I had been wondering how on your analysis [3] is handled:

      [3] instinctively eagles that fly swim and dive avoid drowning

      It would seem that instinctively modifies 'swim' in [2] but 'avoid drowning' in [3]. If 'merge' works exclusively bottom up, should it not commit itself to the [2] interpretation once it encounters 'fly' in [3]? Now my friend tells me that, in general, a left-to-right parser can handle coordinate structures. It would have a backup function to pursue alternative paths when some chosen path fails (also likely a limited lookahead function to help cut down on false paths). So even if it first committed to building a relative clause ending with "fly" and wrongly guessed "swim" was in the higher clause, when it came upon "and" it would back up and pursue a path with "swim" in the relative clause, eventually build the coordinate structure within the relative clause, and then determine "avoid .." to be the main predicate etc.
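
      In case it helps, my friend also sent the bare bones of what he means by 'backup' (purely schematic; the choice, viability and completion tests are placeholders, not an actual grammar):

      def backtrack(state, choices, viable, done):
          """Depth-first search with backup: try each choice in turn; if a path
          cannot be completed, return to the last choice point and try the next
          alternative (a limited lookahead would live inside 'viable')."""
          if done(state):
              return state
          for extend in choices(state):
              new_state = extend(state)
              if viable(new_state):
                  result = backtrack(new_state, choices, viable, done)
                  if result is not None:
                      return result
          return None   # every alternative failed at this choice point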

      So my first question is if your model works roughly like that?

      Assuming it does, there seem to be several problems. First, it seems not very plausible that we process language like this. Even [3] is still very ‘cute’ compared to real-life conversation. So having to keep all the alternatives in ‘working memory’ does not seem the way to go, for example. So what is the minimalist solution for this problem?
      Next, I see how the analysis you provided can work [even for [3]] when you already know the target sentence. But that is the issue in acquisition - kids don't. So how would the parser "know" ahead of time that 'and' requires backup? That 'avoid' can be a target after 'swim' failed, etc. etc.? Do you [pl] claim that the parser is a universal/innate device with backup and perhaps lookahead built in? Or is some skeletal parser innate but there's some sort of co-learning going on?
      Further, not everything that works for English works for other languages. So presumably there must be alternatives for that too? Now of course I worry: is all of this innate [specified by the genome in some way or other]? If not, what IS innate?

      Delete
  19. Hi Christina,

    On your example (3), the syntactic evidence is that the constituency looks as follows

    (3') Instinctively [ [ eagles [ that fly swim and dive ] ] avoid drowning ]

    Because, for example, you can replace [eagles that fly swim and dive] with `they' but not [eagles that fly] with they:

    (3'') Instinctively [ [ they] avoid drowning ]

    (3''') * Instinctively [ [ they swim and dive ] avoid drowning ]

    So the specifications that the various items have will ensure this constituency. The bottom-upness of Merge isn't relevant here (see the Stabler paper I mentioned for an implementation of the minimalist grammar I sketched as a top-down incremental beam parser).

    On the issue of linear parsing, involving backtracking, that's an empirical question and there's a lot of debate about the mechanism. So if you adopt the backtracking model you sketched, where you do keep all the alternatives in working memory, that gives us a nice explanation of garden path effects. However, we have known since Crain and Steedman 1985 that these effects are somewhat vulnerable to discourse context, word frequency etc. Interestingly, the Stabler beam parser can attach probabilities of transition to the lexical items, making certain transitions more or less likely (details in that paper), implementing a weakly interactive model of parser-discourse modularity (although the actual grammar itself stays resolutely categorical). That would allow us to model effectively both the psychological fact that people baulk at garden path sentences and the fact that they are vulnerable to discourse/frequency. So that would be a plausible minimalist solution for that problem.
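
    Schematically, one step of such a beam parser looks like this (the scores and the expansion function are placeholders; this is just the general keep-the-k-best idea, not Stabler's actual algorithm):

    import heapq

    BEAM_WIDTH = 3

    def beam_step(beam, next_word, expand):
        """beam: list of (score, partial_analysis) pairs; expand(analysis, word)
        returns weighted, grammar-licensed continuations. Keep only the k best."""
        candidates = []
        for score, analysis in beam:
            for transition_prob, new_analysis in expand(analysis, next_word):
                candidates.append((score * transition_prob, new_analysis))
        return heapq.nlargest(BEAM_WIDTH, candidates, key=lambda c: c[0])

    Raising or lowering the weights on particular transitions is then where discourse and frequency effects come in, while the grammar that licenses the transitions in the first place stays categorical.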

    On acquisition, the parser will just follow whatever properties of these words have already been learned by the kid, as the parser is essentially a direct implementation of the grammar. So if those words have already been acquired with the correct properties, there wouldn't be an issue. If they have only been partially acquired, or acquired with the `wrong' properties, you'd expect errors in production and parsing, which we know happen when the properties of words aren't fully acquired.

    What is innate is just the operations of the system (Merge, Check features) and constraints on the possible representations of lexical items (what features are allowed, constraints on combining features in a lexical item, constraints on scope etc). These provide a sort of skeleton that's fleshed out as the child learns what the particular properties of her/his language are from hearing the data around them that make, for example, Chinese grammatically different from say Warlpiri. That's the idea behind the so called Borer-Chomsky Conjecture: there are indeed properties of linguistic elements that are learned, but these are confined to properties of grammatical lexical items (like markers for definiteness, mood, tense, voice, aspect, modality etc). The interpretation of the grammar as a parser is also innate, but perhaps not solely linguistic.

    Hope that's useful.

    ReplyDelete
    Replies
    1. Thank you, David, that is useful indeed. From the first part I gather that your account may differ in some details from other accounts but that you have at the moment nothing that is light-years ahead of the competition [as Norbert wants us to believe]. Would that be a fair point?

      You certainly also know that what you call the garden path effect is rather common in my native language. Just for entertainment I post a link to Mark Twain's frustrated rant about German http://www.crossmyt.com/hc/linghebr/awfgrmlg.html
      Having an innate mechanism account for our ability to acquire with ease constructions your great poet found so perplexing is of course cool - so I am glad to learn that the Stabler beam parser may be able to handle such 'monuments' - has it ever been tested on one?

      Regarding lexical items your account seems different from Chomsky's, is that correct? Chomsky argues that a 'massive poverty of stimulus' supports the conclusion that the lexicon [words in your story] is innate and kids just learn to attach labels [say 'river' or 'Fluss'] to innate concepts. He claims the concepts are uniform among all humans [someone in New Guinea has the same concept of 'river' as you do], and could not possibly have been learned from the input.

      Now you say: "the parser will just follow whatever properties of these words have already been learned by the kid, as the parser is essentially a direct implementation of the grammar. So if those words have already been acquired with the correct properties, there wouldn't be an issue. If they have only been partially acquired, or acquired with the `wrong' properties..."

      So there is a fair deal of learning going on in your view; it's not just activating an innate concept that was already 'complete' [for lack of a better word], but on your view it's more skeletal?

      Now the most interesting part of your answer was what you say about innate:

      innate is just the operations of the system (Merge, Check features) and constraints on the possible representations of lexical items (what features are allowed, constraints on combining features in a lexical item, constraints on scope etc). .... The interpretation of the grammar as a parser is also innate, but perhaps not solely linguistic.

      So it seems on your view there is quite a bit innate besides Merge. Now it would seem rather unlikely that everything you list came into existence by one mutation and has remained constant over at least 50,000 - 100,000 years. Do you agree with that or do you think the single mutation view is plausible?

      Delete