Tuesday, February 5, 2013

There's No There There


I grew up in a philosophy department and so I always feel a warm glow of nostalgia when an eminent practitioner of this dark art turns his/her attention to my tiny areas of interest. Recently, Jesse Prinz, Distinguished Professor in Philosophy at CUNY, has directed his attention to the shortcomings of rationalist efforts in the mental sciences in an effort to resuscitate empiricist conceptions of mind. The book, Beyond Human Nature, is very much worth looking at (but for god’s sake don’t buy it!), for it is a good primer on just how little empiricist conceptions, despite the efforts of empiricism’s mightiest minds, have to offer those seriously interested in cognition. I’m not talking modest contributions, I’m talking NADA! Before proceeding, some warnings. I am a severe thaasophobe, with an attention span that requires quick capture. Banalities and weasel wording can induce immediate narcoleptic seizure. Prinz held me to the end of chapter 6 before Morpheus barred further progress. I never fight with gods. Consequently, my remarks here are limited to the first six chapters, and they concentrate mainly on chapter 6, this being dedicated to what I know best, the Chomsky program in generative grammar. With this caveat, let’s push forward, though caveat lector: this post is way too long.

My main objection to the book is that it refuses to play by the rules of the game. I have discussed this before (here), but it is worth reviewing what is required to be taken seriously. Here are the ground rules.

First, we have made many empirical discoveries over the years, and the aim must be to explain these facts. In linguistics these include island effects, fixed subject effects, binding theory effects, etc. I know I have repeated myself endlessly on this point, but no matter how often it is emphasized, critics refuse to address these matters. Prinz is no exception, as we shall see.

Second, if one is interested in debunking not merely generative grammar but the whole rationalist enterprise in cognition, then attention must be paid to the results there, and there have been many. We have reviewed some by Gleitman and Spelke (here, here) but there are many more (e.g. by Baillargeon on causality and Wynn on numbers, a.o.). Prinz touches on these but is coy about offering counter-analyses. Rather, he is satisfied with methodological reflections on the difficulties this kind of work must deal with, and he dismisses 40 years of research and literally hundreds of detailed proposals by pointing out the obvious, viz. that experiments must be done carefully and that this is hard. Not exactly big news.

Third, though acknowledging these data points is a necessary first step, more is required. In addition one must propose alternative mechanisms that derive the relevant facts.  It is not enough to express hopes, desires, expectations, wishes etc. We need concrete proposals that aim to explain the phenomena.  Absent this, one has contributed nothing to the discussion and has no right to be taken seriously.

That’s it. These are the rules of the game. All are welcome to play. So what does Prinz do? He adopts a simple argumentative form, which can be summarized as follows.

1. He accepts that there are biological bases for cognition but holds that they vastly underdetermine human mental capacities. He dubs his position “nurturism” (just what we need, another neologism) and contrasts it with “naturism.”[1]
2. His main claim is that cognition has a heavy cultural/environmental component and that rationalism assumes that “all brains function in the same way…” and “our behavior is mostly driven by biology” (102).
3. He reviews some of the empirical arguments for rationalism and concludes that they are not apodictic, i.e. it is logically possible that they are inconclusive.
4. He buttresses point 3 by citing work purporting to show methodological problems with rationalist proposals. Together, 3 and 4 allow Prinz to conclude that matters are unsettled, i.e. to declare a draw.
5. Given the draw, the prize goes to the “simpler” theory. Prinz declares that less nativism is always methodologically preferable to more, and so, given the empirical standoff, the laurel goes to the empiricists.

That’s the argument. Note what’s missing: no counterproposals about relevant mechanisms. In short, Prinz is violating the rules of the game, a no-no. Nonetheless, let’s look a bit more into his argument.

First, Prinz allows that there is some innate structure to minds (e.g. see around 152).[2] The question is not whether there is native structure, but how much and what kind. For Prinz, associationist machinery (i.e. anything coming in through the senses, subjected to any kind of statistical massaging) is permissible. Domain-specific modular knowledge with no simple perceptual correlates is not (cf. 171).

This is standard associationism at its grubbiest. So despite his insistence that the truth must lie at some point between the naturist and nurturist extremes, Prinz erects his standard on pretty conventional empiricist ground. No modularity for him. It’s general learning procedures or nothing.

Why does Prinz insist on this rather naïve version of empiricism? He wants to allow for cultural factors to affect human mental life. For some reason, he seems to think that this is inconsistent with rationalist conceptions of the mind. Why is beyond me. Even if the overall structure of minds/brains is the same across the species, this does not prevent modulation by all sorts of environmental and cultural factors. After all, humans have four-chambered hearts as a matter of biology, but how good an individual heart is for marathons is surely heavily affected by cultural/environmental factors (e.g. training regimens, diet, altitude, blood doping, etc.).

So too with cognition. Indeed, within linguistics, this has been recognized as a boundary condition on reasonable theorizing since the earliest days of generative grammar. The standard view is that UG provides design specifications for particular Gs, and particular Gs can be very different from one another. In standard P&P theories the differences are related to varying parameter settings, but even non-parametric theories recognize the fact of variation and aim to explain how distinct Gs can be acquired on the basis of the primary linguistic data (PLD).

Indeed, one of the standard arguments for some cognitive invariance (i.e. UG) arises from the fact that despite all the attested variation among particular Gs, they have many properties in common. Comparative syntax and the study of variation have been the source of some of the strongest arguments in favor of postulating a rich, domain-specific UG. In short, the problem from the outset has been to explain both the invariance and the variation. Given all of this, Prinz’s suggestion that rationalists ignore variation is simply mystifying.[3]

Moreover, he seems ignorant of the fact that to date the rationalist program is really the only game in town. Prinz is staking a lot on the new statistical learning techniques to supply the requisite mechanisms for his empiricism. However, to date, purely statistical approaches have had rather modest success. This is not to say that stats are useless. They are not. But they are not the miracle drug that Prinz seems to assume they are.

This emerges rather clearly in his discussion of that old chestnut, the poverty of the stimulus argument (POS), using the only example that non-linguists seem to understand, polar questions. Sadly, Prinz’s presentation of the POS demonstrates once again how subtle the argument must be, for he clearly does not get it. The problem (as Paul Pietroski went over in detail here and as I reviewed again here) is to explain constrained homophony (i.e. the existence of systematic gaps in sound-meaning pairings). It is not to explain how to affix stars, question marks and other diacritics to sentences (i.e. not how to rank linguistic items along an acceptability hierarchy). There has been a lot of confusion on this point, and it has vitiated much of the criticism of Chomsky’s original argument. The confusion likely stems from the fact that whereas an acceptability hierarchy is a standard byproduct of a theory of constrained homophony, the converse is not true, i.e. a theory of acceptability need not say much about the origins of constrained homophony. But as the question of interest is how to relate sound and meaning (viz. the generative procedures relating them), simply aiming to distinguish acceptable from unacceptable sentences is to aim in the wrong direction.

Why is this important? Because of the myriad dumb critiques of Chomsky’s original POS argument that fail precisely because they misconstrue the explanandum. The poster child of this kind of misunderstanding is Reali and Christiansen (R&C), which, of course, Prinz points to as providing a plausible statistical model for language acquisition. As Prinz notes (loc 2513), R&C’s analysis counts bigram and trigram word frequencies and, from these counts alone, is able to discriminate (1) from (2).

(1)  Is the puppy that is barking angry?
(2)  Is the puppy barking is angry?

Prinz is delighted with this discovery.  As he says:

This is an extremely important finding. By their second birthday, children have heard enough sentences to select between grammatical and ungrammatical questions even when they are more complex than the questions they have heard (loc 2513).

The problem however is that even if this is correct, the R&C proposal answers the wrong question. The question is why can’t kids form sentences like (2) with the meaning “is it the case that the angry puppy is barking” on analogy with (1)’s meaning “is it the case that the barking puppy is angry”? This is the big fact. And it exists quite independently of the overall acceptability of the relevant examples. Thus (3) carries only the meaning we find in (1), not (2) (i.e. (3) cannot mean “is it the case that the puppy that barked was the one that Bill kissed.”).

(3)  Did the puppy Bill kissed bark?

This is the same fact illustrated by (1) and (2), but with no unacceptable string to account for, i.e. no analogue of (2). Bigrams and trigrams are of no use here. What we need is a rule relating form to meaning and an explanation of why some conceivable rules are absent, resulting in the inexpressibility of some meanings by some sentences. Unfortunately for Prinz, R&C’s proposal does not even address this question, let alone provide a plausible answer.
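
To see how little of the problem an n-gram model even touches, here is a minimal sketch of a bigram scorer in the general spirit of (though in no way identical to) R&C’s model; the toy corpus, the smoothing, and all other details are invented purely for illustration. Its inputs and outputs are strings alone; meanings appear nowhere, so it cannot even state the question of which readings (3) does and does not have.

```python
# Toy bigram scorer, purely illustrative -- not Reali & Christiansen's
# actual model, corpus, or parameters.
from collections import Counter

toy_corpus = [
    "is the puppy angry",
    "the puppy that is barking is angry",
    "did the puppy bark",
]

bigram_counts = Counter()
for sentence in toy_corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    bigram_counts.update(zip(words, words[1:]))

def string_score(sentence):
    """Product of smoothed bigram counts: a crude proxy for string likelihood."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    score = 1.0
    for pair in zip(words, words[1:]):
        score *= bigram_counts[pair] + 0.1  # arbitrary add-0.1 smoothing
    return score

# The scorer can prefer (1) over (2) as strings...
print(string_score("is the puppy that is barking angry"))
print(string_score("is the puppy barking is angry"))

# ...but nothing in it represents "the barking puppy" vs. "the angry puppy",
# so it is silent on which meanings a string like (3) can and cannot carry.
```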

Why do Prinz and R&C so completely misunderstand what needs explaining? I cannot be sure, but here is a conjecture. They confuse data coverage with using data to probe structure. For Chomsky, the contrast between (1) and (2) results from the fact that (1) can be generated by a structure dependent grammar while (2) cannot be. In other words, these differences in acceptability reflect differences in possible generative procedures. It is generative procedures that are the objects of investigation not the acceptability data. As Cartwright argued (see here), empiricists are uncomfortable with the study of underlying powers/structures, viz. here the idea that there are mental powers with their own structural requirements. Empiricists confuse what something does with what it is. This confusion is clearly at play here, with the same baleful effects that Cartwright noted are endemic to empiricist conceptions of scientific explanation.

I could go on sniping at Prinz’s misunderstandings and poor argumentation. And I think I will do so to make two more points. 

First, Prinz really seems to have no idea how poor standard empiricist accounts have been. Classical associationist theories have been deeply unsuccessful. I want to emphasize this, for Prinz sometimes leaves the impression that things are not nearly so hopeless. They are. And not only in the study of humans, but in mammal cognition quite generally.

Gallistel is the go-to guy on these issues (someone I am sure Prinz has heard of; after all, he teaches just across the bridge at Rutgers). He and King review some of the shortcomings in Memory and the Computational Brain, but there is a more succinct recapitulation of the conceptual trials and empirical tribulations of the standard empiricist learning mechanisms in a recent paper (here). It’s not pretty. Not only are there a slew of conceptual problems (e.g. how to deal with the effects of non-reinforcement (69)), but the classical theories fail to explain much at all. Here’s Gallistel’s conclusion (79):

Associationist theories have not explained either the lack of effect of partial reinforcement on reinforcements to acquisition or the extinction-prolonging effect of partial reinforcement. Nor have they explained spontaneous recovery, reinstatement, renewal and resurgence except by ad hoc parametric assumptions…I believe these failures derive from the failure to begin with a characterization of the problem that specific learning mechanisms and behavioral systems are designed to solve. When one takes an analysis of the problems as one’s point of departure…insights follow and paradoxes dissolve. This perspective tends, however, to lead the theorist to some version of rationalism, because the optimal computation will reflect the structure of the problem, just as the structure of the eye and the ear reflect the principles of optics and acoustics.

Gallistel’s arguments hinge on trying to understand the detailed mechanisms underlying specific capacities. It’s when the rubber hits the road that the windy generalities of empiricism start looking less than helpful.  Sadly, Prinz never really gets down to discussions of mechanisms, i.e. he refuses to play by the rules.  Maybe it’s the philosopher in him.

So what does Prinz do instead? He spends a lot of time discussing methodological issues that he hopes will topple the main results. For example, he discusses how difficult it can be to interpret eye gaze, the standard measure used in infant and toddler studies (1547). Eye gaze can be hard to interpret. What change it is indexing can be unclear. Sometimes it indexes stimulus similarity, other times novelty. Sometimes it is hard to tell whether it is tracking a surface change in the stimulus or something deeper. And that’s why people who use eye gaze measures try to determine what eye gaze duration is actually indexing in the particular context in which it’s being used. That’s part of good experimental design in these areas. I know this because it is extensively discussed in the lab meetings I sit in on (thanks Jeff) whenever eye gaze duration is used to measure knowledge in the as yet inarticulate. The technique has been used for a long, long time. Hence its potential pitfalls are well known, and for precisely this reason it is very unlikely that all the work that uses it will go down the intellectual drain for the trivial methodological reasons that Prinz cites. To put it bluntly, Baillargeon, Carey, Spelke, Wynn etc. are not experimentally inept. As Prinz notes, there are hundreds (thousands?) of studies using this technique that all point in the same rationalist direction. However blunt a measure eye gaze is, the number of different kinds of experiments all pointing to the same conclusion is more than a little suggestive. If Prinz wants to bring this impressive edifice crashing down, he needs to do a lot more than note what is common knowledge, viz. that eye gaze needs contextual interpretation.

And of course, Prinz knows this. He is not aiming for victory. He is shooting for a tie (cf. 1512). He doesn’t want to show that rationalists are wrong (just that “they don’t make their case”) and empiricists right (OK, he does want this, but he clearly believes this goal is out of reasonable reach). Rather, he wants to muddy the waters, to insinuate that there is less to the myriad rationalist conclusions than meets the eye (and there is a lot here to meet an unbiased eye), and consequently (though this does not follow, as he no doubt knows) that there is more to empiricist conceptions than there appears to be. Why? Because he believes that “empiricism is the more economical theory” (1512) and should be considered superior until rationalists prove they are right.

This strategy, aside from setting a very low bar for empiricist success, conveniently removes the necessity of presenting alternative accounts or mechanisms for any phenomena of interest. Thus, whereas rationalists try to describe human cognitive capacities and explain how they might actually arise, Prinz is satisfied with empiricist accounts that just point out that there is a lot of variation in behavior and gesture towards possible socio-environmental correlates. How this all translates into what people know or why they do what they do is not something Prinz demands of empiricist alternatives.[4] He is playing for a tie, assured in the belief that this is all he needs. Why does he believe this? Because he believes that empiricism is “a more economical theory.”

Why assume this? Why think that empiricist theories are “simpler”? Prinz doesn’t say, but here is one possible reason: domain specificity in cognition requires an account of its etiology. In other words, how did the innate structure get there (think Minimalism)? But if this is the reason, then it is not domain specificity that is problematic, but any difference in cognitive power between descendant and ancestor. Here’s what I mean.

Say that humans speak language but other animals don’t. Why? Here’s one explanation: we have domain-specific structure they don’t. Here’s another: we have computational/statistical capacities they don’t. Why is the second account inherently methodologically superior to the first? The only reason I can think of is that enhanced computational/statistical capacities are understood as differences in degree (e.g. a little more memory) while domain-specific structures are understood as differences in kind. The former are taken to be easy to explain, the latter problematic. But is this true?

There are two reasons to think not. Consider the language case. Here’s the big fact: there’s nothing remotely analogous to our linguistic capacities in any other animal. If this is due to just a slight difference in computing capacity (e.g. some fancier stats package, a little more memory), then we need a pretty detailed story demonstrating this. Why? Because it is just as plausible that a slightly smaller computing capacity should not result in this apparently qualitative difference in linguistic capacity (indeed, this was the motivation behind the earlier teach-the-chimps/gorillas-to-talk efforts). What we might expect is more along the following lines: slower talk, shorter sentences, fewer ‘you know’s interspersed in speech. But a complete lack of linguistic competence: why expect this? Maybe the difference really is just a little more of what was there before, but, as I said, we need a very good story to accept this. Need I say that none has been provided?

Second, there are many different ways of adding to computational/statistical power. For example, some ways of statistically massaging data are computationally far more demanding than others (e.g. it is no secret that Bayesianism, if interpreted as requiring the updating of all relevant alternatives, is too computationally expensive to be credible, and that’s why many Bayesians claim not to believe that this is possible).[5] If the alternative to domain-specific structure is novel (and special) counting methods, then what justifies the view that the emergence of the latter is easier to explain than the former?
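
To make the cost point concrete, here is a toy sketch of exhaustive Bayesian updating (my own illustration; nothing here is drawn from Prinz or from any particular Bayesian proposal). The update must visit every hypothesis for every data point, so its cost per datum scales with the size of the hypothesis space, and spaces of candidate grammars are astronomically large.

```python
from itertools import product

def update_all(prior, likelihood, datum):
    """One exhaustive Bayesian update: the loop visits every hypothesis,
    so the cost per datum is proportional to the size of the space."""
    unnormalized = {h: p * likelihood(h, datum) for h, p in prior.items()}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# With n binary "parameters" there are 2**n candidate grammars; n = 30
# already gives over a billion hypotheses to revisit after every utterance.
n = 3  # kept tiny so the example runs instantly
hypotheses = {bits: 1 / 2 ** n for bits in product((0, 1), repeat=n)}
toy_likelihood = lambda h, d: 0.9 if h[d] == 1 else 0.1  # invented for illustration
posterior = update_all(hypotheses, toy_likelihood, datum=1)
print(posterior)
```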

Prinz’s methodological assumption here is not original. Empiricists often assume that rationalism is the more complex hypothesis. But this really depends on the details, and no general methodological conclusions are warranted. Sometimes domain-specific structures allow for economizing on computational resources.[6] At any rate, none of this can be adjudicated a priori. Specific proposals need to be made and examined. This is how the game is played. There are no methodological shortcuts to the argumentative high ground.

This post has gone on far too long. To wrap up, then: there is a rising tendency to think well again of ideas that have been deservedly buried. Prinz is the latest herald of the resurrection. However, empiricist conceptions of the mind should be left to mold peacefully with other discredited ideas: flat earth, epicycles, and phlogiston. Whatever the merits of these ideas may once have been (the last three did have some, once), they are no longer worth taking seriously. So too with classical empiricism, as Prinz’s book ably demonstrates.



[1] I’m going to drop the naturism/nurturism lingo and just return to the conventional empiricism/rationalism labels.
[2] All references are to the Kindle version.
[3] I’m no expert in these areas, but it seems obvious to me that the same can be said about most work in cognitive psychology. Actually, here, if anything, the obsession with individual differences (aka cognitive variation) has retarded the search for invariances. In the last few decades this has been partly remedied. Baillargeon, Spelke, Carey, Gleitman a.o. have pursued research strategies logically analogous to the one described above for generative grammar.
[4] I should be careful here. I am discussing capacities, but Prinz mainly trucks in behavior (loc 112-132). Most rationalists aim to understand the structure of mental capacities, not behavior. What someone does is only partially determined by his/her mental capacities. Behavior, at least for linguists of the generative variety, is not an object of study, at least at present (and, in my own view, never will be). I am going to assume that Prinz is also interested in capacities, though if he is not, then his discussion is irrelevant to most of what rationalists are aiming to understand.
[5] See here, here and here for discussion.
[6] Berwick had an excellent presentation to this effect at the recent LSA meeting in Boston. I’ll see if I can get the slides and post them.

29 comments:

  1. As regards the question of why the POS argument is so frequently misunderstood, I have to admit that I think we linguists could still do a better job of explaining it. As I see it the crucial point is this:
    The problem ... is to explain constrained homophony (i.e. the existence of systematic gaps in sound-meaning pairings). It is not to explain how to affix stars, question marks and other diacritics to sentences.

    But to my mind the following way of putting things, although relatively common/standard, muddies the waters a bit:
    They confuse data coverage with using data to probe structure. For Chomsky, the contrast between (1) and (2) results from the fact that (1) can be generated by a structure dependent grammar while (2) cannot be. In other words, these differences in acceptability reflect differences in possible generative procedures. It is generative procedures that are the objects of investigation not the acceptability data.
    This second quote seems to suggest that while the likes of R&C do in fact account for the acceptability data, there's something else besides the data that they are missing, or they're accounting for the data in some irrelevant or unsatisfying way.

    I think the more straightforward way to put things is simply to say that "acceptability" is a property of sound-meaning pairings, not sounds (or strings) alone. Then the acceptability data to account for has the form: these string-meaning pairings are acceptable (i.e. this string goes with that meaning, etc.), and these other string-meaning pairings are not. (The best concrete cases to flesh out this way might be Paul's examples about waffles and parking meters and lost hikers.) No doubt this is what every good linguist has in mind, in some sense, when they make the argument; I'm not trying to say anything new. But this is much simpler, I think, than the usual angle which invites the misunderstanding that getting the right set of strings is "the data", but pairing them with the right meanings is part of "the way you get the data", and there are "good ways" and "bad ways" to get the data.

    The fact that we are "using data to probe structure", and that "generative procedures ... are the object of investigation not the acceptability data", is of course relevant, but I think bringing up these issues just makes the objection to string-classifiers seem more philosophical or aesthetic than necessary. A clearer and blunter way to put it is simply that if the data take the form of classifications of string-meaning pairs, then a string-classifier doesn't cover the data.

    (To be honest I actually think that quite generally, the notion of acceptability as a property of strings might well do more harm than good. The only thing it seems to mean is derived from the acceptability of string-meaning pairs, i.e. we could define s to be string-acceptable iff there is some m such that (s,m) is acceptable. But we don't find much use for the mirror image notion of meaning-acceptable (i.e. there is some s such that (s,m) is acceptable), and is there reason to think that string-acceptability is any more relevant?)

    ReplyDelete
    Replies
    1. I think that is very much on the right track.

      I think to be fair the way that the POS has been put forward in the literature is much more as a 'weak' property, and that is why people like R & C have taken it as such. The 'constrained homophony' version of the POS is much more recent. (Indeed, searching for the phrase 'constrained homophony' doesn't produce any hits before 2009.)

      If you look at the 2002 special issue of The Linguistic Review, none of the retorts to Pullum and Scholz say, oh wait a minute, you have missed the point, it is really about the syntax/semantics interface and homophony (if I recall correctly). Similarly, if you look at Laurence and Margolis's portrayal of the POS, which is very pro-Chomsky, they talk just about the primary linguistic data (which they take to be the strings).
      Perhaps this has been exacerbated by Chomsky's general antipathy towards semantics, and perhaps also by the misuse of the autonomy of syntax?

      So I think the "constrained homophony" version is just a *new* version of the POS (and quite a good one, to be clear).

      But I am very willing to be corrected as there is a lot of literature I am not familiar with. So I would be genuinely very interested to know if there are some clear quotes from the period 1980-2000 that show that this version of the POS has some history.

      Delete
    2. P&S assumed (at least for the sake of argument) that what the child acquired was a rule relating questions and declarative statements, not merely the ability to mark certain strings as grammatical and others as ungrammatical. A rule relating questions to declarative statements is, implicitly at least, a rule which imposes constraints on homophony. So I think one reason that P&S didn't get that kind of response is that they didn't miss the point in the same way Reali and Christiansen did.

      Chomsky doesn't have much time for certain strands of formal semantics, but data points regarding what sentences can and can't mean have always played an important role in his work. (E.g., half of the ECP facts relating to movement of adjunct wh-phrases crucially depend on judgments regarding possible interpretations.)

      Delete
    3. Yes good point. And I guess your second point is exactly what Tim is getting at too.

      Delete
    4. I certainly agree with Alex C. that it's become much more common recently to talk explicitly about string-meaning relations in POS arguments, but my suspicion (which may well be wrong) is that this is indicative of people being forced to better express what they meant, rather than people adjusting what they meant. As Alex D. points out, data points about what things can and can't mean have always been part of the picture, it's just that there seems to have been a tendency at times to not clearly distinguish (a) the observable fact that certain string-meaning pairs are acceptable and others aren't, from (b) the theoretical move of assigning a certain kind of structure (and operations on this structure, etc.) in order to capture these facts. Failing to make this distinction would pave the way for sub-optimal presentations of the POS argument which said something like "It's not enough to just get the right set of strings, you have to also assign those strings the right structures", when what was really intended (or should have been intended) was "you have to also assign those strings the right meanings".

      But I haven't looked back at earlier arguments carefully, and I would also be genuinely interested to know about early presentations of POS arguments that talk relatively explicitly about the sound-meaning relation.

      Delete
    5. Sadly, the only explicit discussion of POS in the literature (a real disservice, I believe) is the one dealing with polar questions. There are many examples given by Crain that explicitly discuss absent meanings, and Chomsky has many cases where he observes that X cannot mean Y (think of the principle C illustrations ('he thinks John is tall' can't mean John thinks he is tall) and principle B ('John likes him' cannot mean John likes himself)). All of these have the ingredients for a POS argument, but they have been discussed as such less directly than Y/N questions. The problem with the literature redoing Chomsky is that it is obsessed with the idea that sentences don't really have any interesting structure (hence bi/trigram models). From the get-go the point has been to understand the kinds of rules that grammars, qua rule systems that pair sounds and meanings, employ. The term 'constrained homophony' is new. The concept and its place in POS arguments are very old. So far as I can tell, it's what the POS has always been about: why some kinds of rules of grammar and not others, given that many kinds that would be compatible with the PLD are nonetheless absent.

      Delete
    6. One thing I have been chewing over recently in the context of these arguments is what the inputs are.
      So if we consider the constrained homophony POS, then we are trying to get the right sound meaning pairs; so are we entitled to assume that the input is sound/meaning pairs?

      That is to say, if we take the POS as an argument against certain types of learning algorithm, or against the possibility of certain types of learners/acquisition systems, then what are the inputs to these learners? Just the strings, or the strings associated with the one contextually appropriate meaning?

      Delete
    7. From the get-go the PLD has consisted of sound/meaning pairs from simple sentences. There are various conceptions of simple. For Lightfoot it is roughly degree 0. For Wexler and Culicover, it was degree 2. The latter formalize the problem.

      What's the "meaning"? I believe that most have taken it to be the thematic structure; who did what to whom sort of thing. This has a certain "epistemological priority" in Chomsky's terms, i.e. it is observable in a context and is not language dependent, think Davidsonian events. I find it likely that non-speakers (e.g. my dog Sampson) can parse a simple scene into its participants. If this is so, then humans can too, or so we assume.

      The most interesting assumption is the degree 0 one. This may be too strong. There is some interesting work on this by Lisa Pearl, looking at the OV to VO change historically and trying to figure out whether we really don't look at embedding information or whether there is just not enough of it.

      So, yes, SM pairs as input. The M part being roughly a thematic decomposition. The "size" is roughly what you get in a simple degree 0 clause.

      Delete
    8. This comment has been removed by the author.

      Delete
    9. The PLD is what children hear, which consists largely of simple sentences but also some complex ones (this is the right terminology, I hope).

      I am not sure that I agree that the PLD has always been taken to be sound/meaning pairs. Chomsky is typically vague at crucial points, but for example in his Explanatory models in Linguistics, 1966, he is quite explicit that it is just the sentences, not sentence meaning pairs. Indeed he considers and rejects the role of semantics:
      "For example, it might be maintained, not without plausibility, that semantic information of some sort is essential even if the formalized grammar that is the output of the device does not contain statements of direct semantic nature. Here, care is necessary. It may well be that a child given only the inputs of (2) as nonsense elements would not come to learn the principles of sentence formation. This is not necessarily a relevant observation, however, even if true. It may only indicate that meaningful- ness and semantic function provide the motivation for language learning, while play- ing no necessary part in its mechanism, which is what concerns us here."

      (Of course when people try to solve the problem they may try to help themselves to additional information to make the model work, but that is a different issue, and perhaps related to your degree 0/2 comments).

      I googled primary linguistic data, and one of the hits was this article (by you) which again seems quite explicit that it is the set of sentences (let me try and do a functioning link here.)

      Obviously that is a long time ago, and things have changed since then, but again if you could come up with some explicit statement about the semantics as input, that would be interesting.

      And the BPYC paper that uses the phrase 'constrained homophony' is quite carefully worded throughout, and they do not mention that all three of the papers they critique only use sentences and not sentence/meaning pairs. Indeed the phrase 'constrained homophony' is itself very carefully chosen to leave the question open.

      So maybe this is not as open and shut as all that. I don't know if Bob B or Paul P would like to add anything here.

      Just to be clear -- I am actually on Chomsky's side here (or at least the 1966 Chomsky), I don't think that one should assume that semantics is importantly involved in learning syntax. And of course the Parameter setting models (like Charles Yang's) don't really use semantics at all, except maybe to bootstrap the syntactic categories early on.

      (This was missing a 'but' which I put in, and deleted the previous comment.)

      Delete
    10. Reply in two parts as needed more room.


      Grammars since "the earliest days of generative grammar" mediated sound meaning pairings. The POS was an argument to determine the proper shape of grammars, hence what rules/operations mediate sound/meaning pairings. Indeed, the reasons Deep Structure played such a prominent role in early discussions is that Chomsky wanted to make clear that the problem was not getting a surface parse of the string, but to establish rules that mapped structures with a surface parse to structures that that were adequate for semantic interpretation. Chomsky's point about semantics was not that grammars did not map to them but that their properties were not reducible to semantic properties, viz. the autonomy of syntax thesis.

      So, given this, one can ask what are the inputs to FL/UG that the LAD uses to establish a grammar, i.e. such a mapping? Certainly from very early on they included semantic information. Recall that very very early on people were interested in binding (Lees & Klima), control, scope of negation, etc. These all had semantic consequences and were rules that included semantic restrictions in their formulation (cf. Lees and Klima on identity conditions). Second, diagnostics for raising and passivization involved preservation of thematic information. So from the beginning grammar rules involved meaning. How much? Well, 'semantics' is a vague term, but at the very least thematic roles: who did what to whom. Theta information was part of the PLD. Plausibly quantifier scope and scope of negation etc. were not. Again, this is why Chomsky considered thematic information to enjoy "epistemological priority." This meant that it was usable by the LAD to develop a grammar; it did not presuppose having one, which notions like scope and binding plausibly do. So, the PLD consists of thematic info and phono info, and your job, Mr Phelps, should you agree to this mission, is to find the rules that generate the unbounded sets of sound/meaning dependencies. These rules are syntactic (that's what generates the dependencies, PF and LF being purely interpretive, though that's a little later) and the PLD does NOT involve all the sentences that will be generated, as this is impossible, the latter being unbounded.

      Delete
    11. Continuation:


      It took a little time to triangulate on the details of the problem, but Wexler and Culicover's work made a positive proposal about how the problem should be posed: they assumed sound meaning pairs up to degree 2. Others, e.g. Lightfoot, thought that this was too much and that we should aim for degree 0 (plus a little bit, roughly unembedded binding domains) as the right characterization of the PLD. This issue, I believe, is still being debated. What is not debated and has not been for quite a while is that the PLD is sound/meaning pairs.

      As regards constrained homophony: there are two kinds of data that syntacticians have used: (i) acceptability, (ii) acceptability under an interpretation. The former is a degenerate instance of the latter. Some sentences are unacceptable regardless of their interpretation, some only relative to one. Chomsky, to his (and our) everlasting consternation, illustrated the POS argument using examples that could be grasped using data like (i). Why? BECAUSE IT WAS INTENDED AS A SIMPLE ILLUSTRATION OF THE LOGIC OF THE PROGRAM FOR NEOPHYTES!!!!! This was grabbed by everyone and his brother and mangled to death. Recently, Bob, Paul, Noam used another term to highlight the fact that this illustration had been misunderstood and so to make this clear 'constrained homophony' was introduced. This is how I read the history.

      However, say that I am wrong about the history (I'm not): does it matter? The logic is clear: grammars relate sounds and meanings. POS is a tool to investigate the structure of grammars. The inputs are information from the relata, sounds and meanings, properly construed so as to be available to pre-linguistic LADs. Given this construal, most of what has been done by Chomsky's POS critics is simply beside the point. Nobody can stop them from doing this work (I wouldn't if I could, free country) but nobody has to take this stuff seriously either. It is largely beside the point. That's the take home message.

      Delete
    12. When you say "the PLD consists of thematic info and phono info", what exactly is the thematic info? Is it the thematic bare bones of the intended meaning of the utterance, or a sort of model of the thematic relations that held in (the relevant nearby parts of) the world when the utterance was made?

      The latter seems more natural, but leaves some extra work to be done by the learner obviously.

      Delete
    13. It has been uncontroversial that who-did-what-to-whom-how info is available. That's what I meant by thematic info: what one gets from a classical D-structure representation given something like UTAH or DS as a "pure representation of GF-theta." However, as we all know, there is more to semantics than this. There is scope information, dependency information (binding/control), and maybe more still. The classic assumption, I believe, is that the semantic info is limited to the thematic information (recall that DS, aka kernel sentences, were *given* in Syntactic Structures). If so, then all of the rest must follow in some principled fashion from general features of the grammar and the mapping to the interface. Some reasonable principles are that A scopes over B iff A c-commands B, or merge being interpreted as conjunction, or A semantically binds B iff A syntactically binds B (i.e. c-commands and is coindexed with it). I am not saying this is true, just that it is reasonable. Chomsky has generally assumed that there is no variation at LF (his argument being that it is too remote from "experience"). I'm not fully convinced by the argument, but I do think that it is reasonable to believe that thematic info is epistemologically prior in the relevant sense to be usable by the LAD to acquire Gs using UG. So, it's a nice place to start. Type-driven grammars make similar assumptions as regards QR: QR is required given certain mapping principles and the inherent interpretation that quantifiers bear. Again, I am not endorsing this, just observing.

      So, there is at least thematic info and many have hope that there is at most such info. All the rest of the *meaning* following from general features of G as described in UG.

      Oh yes, I do assume that it is readily accessible in the utterance situation; so thematic info in *relevant nearby parts of the world*, as you put it. UTAH is critical to allowing this sort of information to be mapped into usable syntactic phrase markers.

      Delete
    14. I am a bit confused about what parts of this discussion relate to your particular theory of language acquisition and which parts relate to the POS argument.

      If you assume that the child has access to theta roles in the PLD and you appeal to the UTAH, then you are already assuming that you have a nontrivial UG, aren't you?
      Namely, innate knowledge of theta roles. But that seems a little bit circular in the context of the POS.

      And isn't the whole point of the Gleitman/Medina/Trueswell articles that you can't do this? Which words cannot be learned from observation?

      Could you point me to some papers where these basic assumptions about the PLD are laid out in the context of the POS, so I can think about them carefully? BPYC don't discuss it as far as I can see.

      Blog posts only go so far.

      Delete
    15. Wexler and Culicover.

      Yes, the child has access to theta roles; these are "observable" in a context. So in a given situation a child knows how to identify an event and its participants. This may be wrong, but I assume it is not. Theta roles enjoy epistemological priority. I think I've mentioned this twice before. This means that you can identify agents, patients, etc. independently of an active FL. Even my dog can do that, or so I suppose. UTAH is a mapping principle that relates event descriptions to phrase markers. For example, it says that patients are internal arguments and agents are external arguments. This is part of UG. Why we map agents to external and patients to internal positions is an axiom. We cannot currently derive it from anything more general.

      The Gleitman et al. paper allows learning of words without grammar; it is just not very efficient. Things get interesting when the grammar kicks in, because then syntactic info can supplement this slower process to really get things moving. That's how I read them at least. I should add that identifying event participants, the nouns, is something people do OK, much better than identifying the relevant events. At any rate, yes, there is innate structure to UG. No, theta roles are not part of UG, but the mapping principle from theta roles to internal/external positions is part of UG (UTAH). The POS kicks in because even all of this is not enough. It does not tell you about possible movements, anaphoric dependencies, freezing effects, case licensing etc. There is a lot more to grammar than who did what to whom. However, the idea is that the info gleanable from simple clauses is all the input you need to get a fully operational G. Why? The rest is provided by UG. PLD is necessary but not nearly sufficient to get a G.

      Hope this helps. If not maybe I'll write a post on it.

      Delete
    16. Wexler and Culicover (the 1980 book) does not support what you are saying. It's a big book and I haven't read it for years, but I glanced at it this morning, and the relevant sections are 2.7 onwards, especially pages 79-84.
      They admit that the PLD does not contain this information and say, apropos of their model of learning from meaning/sound pairs (b,s):
      'Of course in no way is b actually presented to the learner ...
      Again for formal and notational purposes, we will make an assumption that appears to be at variance with this state of affairs [i.e. the reality of the situation, Alex]'.

      And again, the whole raft of arguments based on learnability theory deployed by nativists over the years is based on Gold's results, Horning's results, etc., which are based on the assumption that the PLD is just strings. So for example Partha Niyogi's 2006 book is entirely based on these assumptions.

      So I just don't think your historical claim that everybody has always assumed that the PLD includes meanings is tenable.
      And I don't think that the empirical argument that the child can extract the meanings of the utterances, starting at age 6 months, is tenable either; for the reasons that Tim alluded to earlier, as well as a lot of Gleitman's work, including, e.g. the work with Barbara Landau on blind children.

      And of course this matters hugely to the logic of the POS argument -- what can be learned from sound/meaning pairs by empiricist learners is very different from, and much larger than, what can be learned from sounds alone by empiricist learners. For me the force of the POS argument flows from considerations about how little the input contains, how thin it is. If you make the input much richer then you deprive the POS of much of its power.

      Delete
    17. The POS has whatever power it has in a given formulation of the input and the output. Let me reiterate: from the get-go the aim was to understand how to acquire grammars that generated sound/meaning pairs. To my knowledge, nobody ever thought it possible that this could be done in the absence of all information concerning meaning. The question was not whether meanings were relevant but which and how much. Try to explain the grammatical difference between 'John loves Mary' and 'Mary loves John' without saying something about theta roles and grammatical functions. Now, even given thematic information there is a lot that needs explaining. For example, even given the thematic information that in 'John loves himself' John bears two theta roles, deduce principle A, i.e. its locality conditions, its complementarity with pronouns, principle C. There is still a lot to explain even given the thematic information, and that's what UG was about. Given only thematic information about simple clauses, deduce the properties of movement, i.e. islands, fixed subject effects, WCO, SCO, ECP. As you will quickly see, the thematic information underdetermines these restrictions; that's why, if they are correct, they must be in UG.

      Without knowing something about thematic structure movement theory is impossible. You need to know where you started from to understand how far you have moved. Thus, some version of theta roles is required and some version of UTAH. At any rate, as I noted, from the very beginning the issue was how to learn grammars that mapped sound to meaning, not strings to trees.

      Now for some problems we can substitute the second problem for the first. Gold shows that even for doing the string to tree mapping one needs some restrictions (or that's how I read Niyogi's version of Gold). So IF this is how you think of the problem, you need something like a UG. But, and I emphasize this, it is NOT the problem generativists have been interested in. Did you ever notice that Chomsky has not discussed the Gold results much? Why? They formalize the wrong problem. That does not mean it is useless, just that it is not what we should be addressing.

      So, stop using the term "meaning." That's a very vague term. Use thematic roles (i.e. what DS provided) and assume that you can get what's in simple clauses. Then go ahead and see how far you get without a rich UG. Here's what I know: not nearly far enough. That's the power of the POS. What do we get? A pretty rich UG, roughly described by theories like LGB. Minimalism's aim: to make these LGB descriptions follow from more reasonable assumptions.

      Delete
    18. I agree that natural languages associate sounds and meanings. And I agree that children hear languages in situations, and that one of the ways they learn that the word 'cat' means cat is by observing cats in the environment.

      But that doesn't commit me either to the existence of theta roles, the UTAH or your favoured semantic bootstrapping theory of language acquisition. And looking at BPYC again, they don't talk about theta roles, but they do talk about learning trees or structural descriptions from strings.

      Gold had nothing to say about learning trees from strings; he only talks about weak learning. That is one of the main points of the BPYC paper; that we should be learning trees from strings rather than learning strings from strings.

      Again, please could you give me a reference to a paper where this thematic roles version of the POS is explained, because W & C aren't putting forward a POS argument, but rather proposing a solution to the problem using inputs that they frankly admit are implausible.

      Delete
    19. First the reference: as I said, Wexler and Culicover. They start with sound/meaning pairs, whether they think that this is empirically reasonable or not. Note, you should find this acceptable, as by your lights it makes the acquisition problem easier.

      Historically, as I noted, DS is taken as given: look at Syntactic Structures; we start with kernel sentences. There is no theory of these. The puzzle is to find the transformations that map these into surface structures. In Aspects, we assume that the lexicon is given and that lexical information provides enough information to give one a DS; actually there are PS rules, but the lexical entries serve to filter out the inappropriate PS structures via lexical insertion. In EST and GB, again, lexical entries are given with their argument structure. From these, X' structures are projected: i.e. the lexical structure of the head determines the phrase structure of the phrase. Again, this is assumed to be given. Now, the problem of course is what it means to be given. The POS arguments in syntax assume that we have lexical meanings and construct arguments from these. However, it is fair to ask where the lexical information comes from. Here Dowty and Baker, as well as our own Davidsonians (Paul is particularly good at this), consider how to go from theta roles (proto-roles in Dowty) to either lexical items or base phrase structures. UTAH is the proposed principle for doing this and is incorporated in virtually every theory of grammar I know of. Again, till now, POS arguments have assumed that this much is given. And even with this information there is a big gap between input and attained G.

      Now, what should we be learning? Not trees from strings but Gs that generate phone/sem pairs from PLD. Now, if I get you correctly, you don't want to use theta information as part of the PLD. Fine with me. I think you won't get off the ground if you don't, but far be it from me to stand in your way. So long as we agree that what needs explaining is how to get Gs that generate phone/sem pairs. That's the problem as generative grammar sees it.

      Now does anything I say commit you to this? Of course not. I can't commit you to anything. I am simply observing that if you don't address the indicated problem then what you are doing is at right angles to what it is that generativists are interested in. You want to talk to this problem, great. If not, that's also fine but don't expect me or other generativists to pay close attention.

      Last point: the mapping problem from theta roles/proto-roles to (roughly) DSs is discussed in Baker and Dowty. Moreover, I suspect that when you think about the phone/sem generation you too will assume theta information plays a role, for otherwise you will have nothing to say about most of the structures linguists have discussed for years (passive, raising, unaccusatives, questions, control, binding etc.). Of course you may not wish to address the properties of these kinds of constructions, but if you don't then most generativists will not really care about what you have to say, for it will be irrelevant to the identified problem.

      Delete
  2. Interesting post and discussion!
    It leads me to the question whether an A^nB^n grammar can tell neuroscientists anything at all about processing natural language.

    ReplyDelete
    Replies
    1. Unclear. If this cannot be done then it tells you something. If it can, probably not much.

      Delete
  3. I think one thing that might be useful is for the syntax community to start classifying cases where absence 'easily' counts as evidence, vs those where it does so less easily (in terms of the sophistication of the statistical technology that the learner would need to use in order to interpret the absence as evidence). The cases I class as 'easy' have the following two characteristics:

    a) they arise frequently (blocking of regular verb formation rules by irregular forms; basic binding theory)

    b) they involve the suppression of one conceivable way of expressing a meaning by another, where a meaning is construed as a collection of lexical meanings plus a scheme of semantic composition (blocking & binding, again). What makes this 'easy' is that when you hear something and understand it, you can run your grammar in generation mode to find out what other ways it's providing to express the meaning, and if it's possible to change the grammar to reduce the number of alternates without failing to parse existing utterances, you can tighten up your grammar (this is a grammar tightening/debugging method in the LFG/XLE community).

    In addition to blocking and binding, other easy cases would be the dative alternation, pro-drop, clitic doubling (but not constraints on clitic combinations), and basic word order constraints (for people who think they are produced by LP constraints added to ID rules).

    Not easy would include:

    1) some of the more complex structures that figure in the Icelandic syntax literature (I conjecture that I might have invented the very first ones ever to have been produced where a secondary adjective predicate shows case agreement with a quirky case-marked PRO subject (hún vonast til að vanta ekki eina/*ein í tímanum) - $20 reward for anyone who produces a naturally occurring example prior to 1976). Hard due to extreme rarity.

    2) island constraints: hard since the blocked structures often require considerable reformulation to express their meaning (as pointed out by Fodor & Crain wrt the Uniqueness Principle in their chapter in MacWhinney (1987)).

    3) no complex possessors in Piraha: hard because a complex periphrasis involving two clauses is needed to express the meaning of the blocked construction. Easy otoh would be the restrictions on prenominal possessors in German, since semantically equivalent postnominal ones are also available.

    Intermediate and unclear cases would include that-trace and wanna contraction, since Gathercole & Zukowski, respectively, have found the relevant data to be quite rare in the input of young children, at least, but they're certainly very common relative to the Icelandic agreement with QC PRO subject ones.

    ReplyDelete
  4. Oh yes, and the bottom line point being that alleged UG principles whose support comes only from easy (and probably intermediate) cases need to be regarded skeptically until or unless they can get solid support from typology. I'd consider structure dependence of Aux-inversion as an intermediate case that does get the needed support from typology.

    ReplyDelete
    Replies
    1. I completely endorse A's proposal. One caveat: indirect negative evidence, i.e. absence of evidence being evidence of absence, seems to me to require a UG to find the gaps. This said, for some cases using absence will be easy, sometimes not. E.g. we don't see many sentences with three levels of embedding; nonetheless their absence does not indicate they are not generable. I assume that this is not an option in UG, and so the absence here tells us nothing. However, cases like this indicate that the indirect negative evidence story does not make sense, I believe, without some UG to play off of. Maybe I misunderstood your last point, however. The proposal to find those cases where this strategy is workable, where INE can be used and where it cannot, strikes me as an excellent suggestion.

      Delete
    2. To clarify the last point (hopefully): the Morphological Blocking Principle in Andrews (1990) should be abandoned, because the stuff it covers falls into the easy class. The various proposals floating around for island constraints (traditional Chomsky, LFG's functional uncertainty, Pearl and Sprouse's node-paths) should not be, because they aren't easy in this sense (except in languages with 'syntactically active' resumptives (Asudeh 2012), i.e., resumptives that suspend island constraints).

      For island constraints, I think the way to go is to assume that extraction is impossible in the absence of evidence that it is possible, and then you need to specify a 'basis of generalization' to interpret the possibility evidence. E.g. on Pearl & Sprouse's scheme, if you hear 'who did you give a book to' you add VP PP to the list of possible paths, so then expect to be able to say 'who did you steal a book from'. I suspect that all of the three approaches I mentioned above can be made to work, and that it will be somewhere between very difficult and completely impossible to distinguish them, but they are all instances of the same strategy, which would be a relatively modest form of UG, at least for people who don't insist that UG has to be language-specific (how do we know that P&S's paths, or Chomskian escape hatches, aren't important for learning to tie your shoes? Seems implausible, but we don't *know* that it's false.)
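
      To make the strategy concrete, here is a toy sketch of the path-based idea (purely illustrative; it compresses Pearl & Sprouse's actual proposal considerably): the learner records the container-node paths of attested extractions and licenses extraction only over paths it has already seen.

```python
# Toy path-based extraction learner -- an illustration of the general
# strategy only, not Pearl & Sprouse's implementation.
attested_paths = set()

def observe(path):
    """Record the container-node path of an extraction heard in the input."""
    attested_paths.add(tuple(path))

def licensed(path):
    """Extraction is assumed impossible unless its path has been attested."""
    return tuple(path) in attested_paths

observe(["CP", "VP", "PP"])          # "who did you give a book to"
print(licensed(["CP", "VP", "PP"]))  # True: generalizes to "who did you steal a book from"
print(licensed(["CP", "NP", "CP"]))  # False: never attested, so treated as an island
```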

      Delete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
  6. This comment has been removed by a blog administrator.

    ReplyDelete
  7. So I started a blog about it:

    http://innatelysyntactic.wordpress.com/

    I think it will take some time to work through the details.

    [if you can seamlessly remove the two posts above plus this, that might be good]

    ReplyDelete