I grew up in a philosophy department and so I always feel a
warm glow of nostalgia when a book by an eminent practitioner of this dark art turns its attention to my tiny areas of interest. Recently, Jesse Prinz,
Distinguished Professor in Philosophy at CUNY, has directed his attention to
the shortcomings of rationalist efforts in the mental sciences in an effort to
resuscitate empiricist conceptions of mind. The book, Beyond Human Nature, is very much worth looking at (but for god’s
sake don’t buy it!) for it is a good primer on just how little empiricist
conceptions, despite the efforts of their mightiest minds, have to offer those
seriously interested in cognition. I’m not talking modest contributions, I’m
talking NADA! Before proceeding, some
warnings. I am a severe thaasophobe, with an attention span that requires quick
capture. Banalities and weasel wording
can induce immediate narcoleptic seizure. Prinz held me to the end of chapter 6
before Morpheus barred further progress. I never fight with gods. Consequently,
remarks here are limited to the first six chapters and of these they
concentrate mainly on 6, this being dedicated to what I know best, the Chomsky
program in generative grammar. With this caveat, let’s push forward, though caveat lector, this post is way too long.
My main objection to the book is that it refuses to play
by the rules of the game. I have discussed this before (here), but it is worth
reviewing what is required to be taken seriously. Here are the ground rules.
First, we have made many empirical discoveries over the
years and the aim must be to explain these facts. In linguistics these include Island effects,
fixed subject effects, binding theory effects etc. I know I have repeated
myself endlessly about this, but it seems that no matter how often we emphasize
this, critics refuse to address these matters. Prinz is no exception, as we
shall see.
Second, if one is interested in not merely debunking
generative grammar but the whole rationalist enterprise in cognition then
attention must be paid to the results there, and there have been many. We have
reviewed some by Gleitman and Spelke (here, here) but there are many more (e.g.
by Baillargeon on causality and Wynn on numbers a.o.). Prinz touches on these but
is coy about offering counter analyses.
Rather he is satisfied with methodological reflections on the
difficulties this kind of work must deal with and dismisses 40 years of
research and literally hundreds of detailed proposals by pointing out the
obvious, viz. that experiments must be done carefully and that this is hard.
Not exactly big news.
Third, though acknowledging these data points is a necessary
first step, more is required. In addition one must propose alternative mechanisms
that derive the relevant facts. It is
not enough to express hopes, desires, expectations, wishes etc. We need
concrete proposals that aim to explain the phenomena. Absent this, one has contributed nothing to
the discussion and has no right to be taken seriously.
That’s it. These are the rules of the game. All are welcome
to play. So what does Prinz do? He adopts a simple argumentative form, which
can be summarized as follows.
1. He accepts that there are biological bases for cognition but holds that they vastly underdetermine human mental capacities. He dubs his position “nurturism” (just what we need, another neologism) and contrasts it with “naturism.”[1]
2. His main claim is that cognition has a heavy cultural/environmental component and that rationalism assumes that “all brains function in the same way…” and “our behavior is mostly driven by biology” (102).
3. He reviews some of the empirical arguments for rationalism and concludes that they are not apodictic, i.e. it is logically possible that they are inconclusive.
4. He buttresses point 3 by citing work purporting to show methodological problems with rationalist proposals. Together 3 and 4 allow Prinz to conclude that matters are unsettled, i.e. to declare a draw.
5. Given the draw, the prize goes to the “simpler” theory. Prinz declares that less nativism is always methodologically preferable to more, and so given the empirical standoff, the laurel goes to the empiricists.
That’s the argument. Note what’s missing: no counter
proposals about relevant mechanisms. In short, Prinz is violating the rules of
the game, a no-no. Nonetheless, let’s
look a bit more into his argument.
First, Prinz allows that there is some innate structure to
minds (e.g. see around 152).[2]
The question is not whether there is native structure, but how much and what
kind. For Prinz, associationist
machinery (i.e. anything coming in through the senses with any kind of
statistical massaging) is permissible. Domain specific modular knowledge with
no simple perceptual correlates is not (cf. 171).
This is standard associationism at its grubbiest. So despite his insistence that the truth
must lie at some point between the naturist and nurturist extremes, Prinz
erects his standard on pretty conventional empiricist ground. No modularity for
him. It’s general learning procedures or nothing.
Why does Prinz insist on this rather naïve version of
empiricism? He wants to allow for cultural factors to affect human mental life.
For some reason, he seems to think that this is inconsistent with rationalist
conceptions of the mind. Why is beyond
me. Even if the overall structure of
minds/brains is the same across species, this does not prevent modulation by
all sorts of environmental and cultural factors. After all, humans have four-chambered hearts as a matter of biology, but how good an individual heart is for
marathons is surely heavily affected by cultural/environmental factors (e.g.
training regimens, diet, altitude, blood doping etc.).
So too with cognition.
Indeed, within linguistics, this has been recognized as a boundary
condition on reasonable theorizing since the earliest days of generative
grammar. The standard view is that UG provides design specifications for particular
Gs, and particular Gs can be very different from one another. In standard P&P theories the differences are related to varying kinds of parameter
settings, but even non-parameter theories recognize the fact of variation and
aim to explain how distinct Gs can be acquired on the basis of PLD.
Indeed, one of the standard arguments for some cognitive
invariance (i.e. UG) arises from the fact that despite all the attested variation
among particular Gs, they have many properties in common. Comparative syntax
and the study of variation has been the source of some of the strongest
arguments in favor of postulating a rich domain specific UG. In short, the
problem from the outset has been to explain both
the invariance and the variation. Given all of this, Prinz’s suggestion that rationalists ignore variation is simply mystifying.[3]
Moreover, he seems ignorant of the fact that to date this is
really the only game in town. Prinz is
staking a lot on the new statistical learning techniques to supply the
requisite mechanisms for his empiricism. However, to date, purely statistical
approaches have had rather modest success. This is not to say that stats are
useless. They are not. But they are not the miracle drug that Prinz seems to
assume they are.
This emerges rather clearly in his discussion of that old
chestnut, the poverty of the stimulus argument (POS) using the only example
that non-linguists seem to understand, polar questions. Sadly, Prinz’s presentation of the POS
demonstrates once again how subtle the argument must be for he clearly does not
get it. The problem (as Paul Pietroski went over in detail here and as I
reviewed again here) is to explain constrained homophony (i.e. the existence of
systematic gaps in sound-meaning pairings).
It is not to explain how to affix
stars, question marks and other diacritics to sentences (i.e. not how to rank linguistic
items along an acceptability hierarchy). There has been a lot of confusion on
this point and it has vitiated much of the criticism of Chomsky’s original
argument. The confusion likely stems
from the fact that whereas an acceptability hierarchy is a standard byproduct
of a theory of constrained homophony, the converse is not true, i.e. a theory
of acceptability need not say much about the origins of constrained
homophony. But as the question of
interest is how to relate sound and meaning (viz. the generative procedures
relating them), simply aiming to distinguish acceptable from unacceptable
sentences is to aim in the wrong direction.
Why is this important? Because of the myriad dumb critiques
of Chomsky’s original POS argument that fail precisely because they misconstrue
the explanandum. The poster child of this
kind of misunderstanding is Reali and Christiansen (R&C), which, of course,
Prinz points to as providing a plausible statistical model for language
acquisition. As Prinz notes (2513), R&C’s analysis counts bigram and trigram word frequencies and from just such counting is able to discriminate (1) from (2).
(1) Is the puppy that is barking angry?
(2) Is the puppy barking is angry?
Prinz is delighted with this discovery. As he says:
This is an extremely important
finding. By their second birthday, children have heard enough sentences to
select between grammatical and ungrammatical questions even when they are more
complex than the questions they have heard (loc 2513).
The problem however is that even if this is correct, the
R&C proposal answers the wrong question. The question is why can’t kids
form sentences like (2) with the meaning “is it the case that the angry puppy
is barking” on analogy with (1)’s meaning “is it the case that the barking
puppy is angry”? This is the big fact. And it exists quite independently of the
overall acceptability of the relevant examples. Thus (3) carries only the
meaning we find in (1), not (2) (i.e. (3) cannot mean “is it the case that the
puppy that barked was the one that Bill kissed.”).
(3) Did the puppy Bill kissed bark?
This is the same kind of fact that (1) and (2) illustrate, but with no unacceptable string to account for, i.e. no analogue of (2). Bigrams and trigrams are of no use here. What
we need is a rule relating form to meaning and an explanation of why some
conceivable rules are absent resulting in the inexpressibility of some meanings
by some sentences. Unfortunately for
Prinz, R&C’s proposals don’t even address this question let alone provide a
plausible answer.
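To make concrete what an n-gram model of this sort does and does not compute, here is a minimal sketch in the R&C spirit (my toy illustration, not their actual model; the corpus and the scoring function are invented for exposition). A string goes in and a number comes out; meanings are nowhere represented, so the fact that (3) can question the barking but not the kissing cannot even be stated in this vocabulary.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Invented stand-in for child-directed speech; R&C trained on a real CHILDES corpus.
corpus = [
    "is the puppy angry",
    "is the puppy barking",
    "the puppy that is barking is cute",
    "did the puppy bark",
    "the puppy bill kissed is cute",
]

bigram_counts = Counter()
for sent in corpus:
    tokens = ["<s>"] + sent.split() + ["</s>"]
    bigram_counts.update(ngrams(tokens, 2))

def attested_bigram_rate(sentence):
    """Fraction of a sentence's bigrams attested in the corpus: a crude stand-in
    for n-gram scoring. Note the types: a string goes in, a single number comes out."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    bgs = ngrams(tokens, 2)
    return sum(1 for bg in bgs if bigram_counts[bg] > 0) / len(bgs)

for s in ["is the puppy that is barking angry",   # (1)
          "is the puppy barking is angry",        # (2)
          "did the puppy bill kissed bark"]:      # (3)
    print(f"{s!r} -> {attested_bigram_rate(s):.2f}")

# However the scores come out, the model only ranks strings. There is no variable
# over meanings, so the constraint that (3) questions the barking, not the kissing,
# cannot even be stated, let alone explained.
```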
Why do Prinz and R&C so completely misunderstand what needs explaining? I cannot be sure, but here is a conjecture. They confuse data
coverage with using data to probe structure.
For Chomsky, the contrast between (1) and (2) results from the fact that
(1) can be generated by a structure dependent grammar while (2) cannot be. In
other words, these differences in acceptability reflect differences in possible
generative procedures. It is generative
procedures that are the objects of investigation not the acceptability data. As Cartwright argued (see here), empiricists
are uncomfortable with the study of underlying powers/structures, viz. here the
idea that there are mental powers with their own structural requirements. Empiricists
confuse what something does with what it is.
This confusion is clearly at play here with the same baleful effects
that Cartwright noted are endemic to empiricist conceptions of scientific
explanation.
I could go on sniping at Prinz’s misunderstandings and poor
argumentation. And I think I will do so to make two more points.
First, Prinz really seems to have no idea how poor standard
empiricist accounts have been. Classical
associationist theories have been deeply unsuccessful. I want to emphasize this for Prinz sometimes
leaves the impression that things are not nearly so hopeless. They are. And not only in the study of humans, but in
mammal cognition quite generally.
Gallistel is the go-to guy on these issues (someone I am sure Prinz has heard of; after all, he teaches just across the bridge at Rutgers). He and King review some of the
shortcomings in Memory and the
Computational Brain, but there is a more succinct recapitulation of the
conceptual trials and empirical tribulations of the standard empiricist learning
mechanisms in a recent paper
(here). It’s not pretty. Not only are there a slew of conceptual
problems (e.g. how to deal with the effects of non-reinforcement (69)), but the
classical theories fail to explain much at all. Here’s Gallistel’s conclusion
(79):
Associationist theories have not
explained either the lack of effect of partial reinforcement on reinforcements
to acquisition or the extinction-prolonging effect of partial reinforcement.
Nor have they explained spontaneous recovery, reinstatement, renewal and
resurgence except by ad hoc parametric assumptions…I believe these failures
derive from the failure to begin with a characterization of the problem that
specific learning mechanisms and behavioral systems are designed to solve. When
one takes an analysis of the problems as one’s point of departure…insights
follow and paradoxes dissolve. This perspective tends, however, to lead the
theorist to some version of rationalism, because the optimal computation will
reflect the structure of the problem, just as the structure of the eye and the
ear reflect the principles of optics and acoustics.
Gallistel’s arguments hinge on trying to understand the
detailed mechanisms underlying specific capacities. It’s when the rubber hits
the road that the windy generalities of empiricism start looking less than
helpful. Sadly, Prinz never really gets
down to discussions of mechanisms, i.e. he refuses to play by the rules. Maybe it’s the philosopher in him.
So what does Prinz do instead? He spends a lot of time discussing
methodological issues that he hopes will topple the main results. For example,
he discusses how difficult it can be to interpret eye gaze, the standard
measure used in infant and toddler studies (1547). Eye gaze can be hard to interpret. What
change it is indexing can be unclear. Sometimes it indexes stimulus similarity, other times novelty. Sometimes it is hard to tell if it’s tracking a surface
change in stimulus or something deeper. And that’s why people who use eye gaze
measures try to determine what eye gaze duration is actually indexing in the
particular context in which it’s being used.
That’s part of good experimental design in these areas. I know this
because this is extensively discussed in the lab meetings I sit in on (thanks Jeff)
whenever eye gaze duration is used to measure knowledge in the as yet
inarticulate. The technique has been
used for a long long time. Hence its potential pitfalls are well known and for
precisely this reason it is very unlikely that all the work that uses it will go down the intellectual drain for
the trivial methodological reasons that Prinz cites. To put it bluntly,
Baillargeon, Carey, Spelke, Wynn etc. are not experimentally inept. As Prinz notes, there are hundreds
(thousands?) of studies using this technique that all point in the same
rationalist direction. However blunt a
measure eye-gaze is, the number of different kinds of experiments all pointing
to the same conclusion is more than a little suggestive. If Prinz wants to bring this impressive
edifice crashing down, he needs to do a lot more than note what is common
knowledge, viz. that eye gaze needs contextual interpretation.
And of course, Prinz knows this. He is not aiming for victory. He is shooting
for a tie (cf. 1512). He doesn’t want to show that rationalists are wrong (just that “they don’t make their case”) and empiricists right (ok, he does
goal is out of reasonable reach), rather he wants to muddy the waters, to
insinuate that there is less to the myriad rationalist conclusions than meets
the eye (and there is a lot here to meet an unbiased eye), and consequently
(though this does not follow as he no doubt knows) that there is more to
empiricist conceptions than there appears to be. Why? Because he believes that
“empiricism is the more economical theory” (1512) and should be considered
superior until rationalists prove they are right.
This strategy, aside from setting a very low bar for
empiricist success, conveniently removes the necessity of presenting
alternative accounts or mechanisms for any phenomena of interest. Thus, whereas
rationalists try to describe human cognitive capacities and explain how they
might actually arise, Prinz is satisfied with empiricist accounts that just
point out that there is a lot of variation in behavior and gesture towards
possible socio-environmental correlates. How this all translates into what people know or why they do what they do is not something Prinz demands of
empiricist alternatives. [4]
He is playing for a tie, assured in the belief that this is all he needs. Why
does he believe this? Because he believes that Empiricism is “a more economical
theory.”
Why assume this? Why think that empiricist theories are
“simpler”? Prinz doesn’t say, but here is one possible reason: domain
specificity in cognition requires an account of its etiology. In other words,
how did the innate structure get there (think Minimalism)? But, if this is the reason then it is not
domain specificity that is problematic, but any difference in cognitive power
between descendant and ancestor. Here’s what I mean.
Say that humans speak language but other animals don’t. Why?
Here’s one explanation: we have domain specific structure they don’t. Here’s another: we have
computational/statistical capacities they don’t. Why is the second account
inherently methodologically superior to the first? The only reason I can think
of is that enhanced computational/statistical capacities are understood as
differences in degree (e.g. a little more memory) while domain specific
structures are understood as differences in kind. The former are taken to be easy to explain,
the latter problematic. But is this true?
There are two reasons to think not. Consider the language
case. Here’s the big fact: there’s nothing remotely analogous to our linguistic
capacities in any other animal. If this is due to just a slight difference in
computing capacity (e.g. some fancier stats package, a little more memory) then
we need a pretty detailed story demonstrating this. Why? Because it is just as
plausible that a little less computing capacity should not result in this
apparently qualitative difference in linguistic capacity (indeed, this was the
motivation behind the earlier teach-the-chimps/gorillas-to-talk efforts). What
we might expect is more along the following lines: slower talk, shorter
sentences, fewer ‘you know’s interspersed in speech. But a complete lack of
linguistic competence, why expect this? Maybe the difference really is just a
little more of what was there before, but, as I said, we need a very good story
to accept this. Need I say that none has been provided?
Second, there are many different ways of adding to
computational/statistical power. For example, some ways of statistically
massaging data are computationally far more demanding than others (e.g. it is no secret that Bayesianism, if interpreted as requiring updating of all relevant alternatives, is too computationally costly to be credible, and that’s why many Bayesians claim not to believe that this is possible).[5]
If the alternative to domain specific structure is novel (and special) counting
methods then what justifies the view that the emergence of the latter is easier
to explain than the former?
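To give the cost worry some flesh, here is a minimal sketch of what literal, exhaustive Bayesian updating involves (my toy illustration; the hypothesis space and likelihood function are invented). The only point is that the work per datum scales with the number of candidate grammars, which grows exponentially in the number of binary parameters.

```python
import itertools

def exhaustive_bayes_update(prior, likelihood, datum):
    """One step of literal Bayesian updating: every hypothesis gets touched."""
    unnormalized = {h: p * likelihood(h, datum) for h, p in prior.items()}
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}

# Toy hypothesis space: every setting of n binary "parameters" counts as a grammar.
n = 16                                              # already 2**16 = 65,536 grammars
hypotheses = list(itertools.product([0, 1], repeat=n))
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

# Invented toy likelihood: a datum is a parameter index; it is likelier if that parameter is on.
def likelihood(h, i):
    return 0.9 if h[i] == 1 else 0.1

posterior = exhaustive_bayes_update(prior, likelihood, datum=3)
print(len(posterior))  # 65,536 hypotheses re-weighted for a single datum

# Each datum costs O(2**n) likelihood evaluations. Taken literally over realistic
# grammar spaces this is intractable, which is why the scheme is usually read as a
# computational-level idealization rather than as a claim about online processing.
```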
Prinz’s methodological assumption here is not original. Empiricists often assume that rationalism is
the more complex hypothesis. But this really depends on the details and no
general methodological conclusions are warranted. Sometimes domain specific structures allow
for economizing on computational resources.[6]
At any rate, none of this can be adjudicated a priori. Specific proposals need to be made and examined. This is how the game is played. There are no
methodological shortcuts to the argumentative high ground.
This post has gone on far too long. To wrap up then: there is a rising tendency
to think well again of ideas that have been deservedly buried. Prinz is the latest herald of the
resurrection. However, empiricist conceptions of the mind should be left to
mold peacefully with other discredited ideas: flat earth, epicycles, and
phlogiston. Whatever the merits of these ideas may once have been (the last three did have some once), they are no
longer worth taking seriously. So too with classical empiricism, as Prinz’s
book ably demonstrates.
[1] I’m going to drop the naturism/nurturism lingo and just return to the conventional empiricism/rationalism labels.
[2] All references are to the Kindle version.
[3] I’m no expert in these areas but it seems obvious to me that the same can be
said about most work in cognitive psychology. Actually, here, if anything, the
obsession with dealing with individual differences (aka cognitive variation)
has retarded the search for invariances.
In the last decades this has been partly assuaged. Baillargeon, Spelke,
Carey, Gleitman a.o. have pursued research strategies logically analogous to
the one described above in generative grammar.
[4] I should be careful here. I am discussing capacities, but Prinz mainly trucks in behavior (loc 112-132). Most rationalists aim to understand the structure of
mental capacities, not behavior. What someone does is only partially determined
by his/her mental capacities. Behavior,
at least for linguists of the generative variety, is not an object of study, at
least at present (and in my own view, never).
I am going to assume that Prinz is also interested in capacities, though
if he is not then his discussion is irrelevant to most of what rationalists are
aiming to understand.
[6] Berwick had an excellent presentation to this effect at the recent LSA meeting
in Boston. I’ll see if I can get the slides and post them.
As regards the question of why the POS argument is so frequently misunderstood, I have to admit that I think we linguists could still do a better job of explaining it. As I see it the crucial point is this:
The problem ... is to explain constrained homophony (i.e. the existence of systematic gaps in sound-meaning pairings). It is not to explain how to affix stars, question marks and other diacritics to sentences.
But to my mind the following way of putting things, although relatively common/standard, muddies the waters a bit:
They confuse data coverage with using data to probe structure. For Chomsky, the contrast between (1) and (2) results from the fact that (1) can be generated by a structure dependent grammar while (2) cannot be. In other words, these differences in acceptability reflect differences in possible generative procedures. It is generative procedures that are the objects of investigation not the acceptability data.
This second quote seems to suggest that while the likes of R&C do in fact account for the acceptability data, there's something else besides the data that they are missing, or they're accounting for the data in some irrelevant or unsatisfying way.
I think the more straightforward way to put things is simply to say that "acceptability" is a property of sound-meaning pairings, not sounds (or strings) alone. Then the acceptability data to account for has the form: these string-meaning pairings are acceptable (i.e. this string goes with that meaning, etc.), and these other string-meaning pairings are not. (The best concrete cases to flesh out this way might be Paul's examples about waffles and parking meters and lost hikers.) No doubt this is what every good linguist has in mind, in some sense, when they make the argument; I'm not trying to say anything new. But this is much simpler, I think, than the usual angle which invites the misunderstanding that getting the right set of strings is "the data", but pairing them with the right meanings is part of "the way you get the data", and there are "good ways" and "bad ways" to get the data.
The fact that we are "using data to probe structure", and that "generative procedures ... are the object of investigation not the acceptability data", is of course relevant, but I think bringing up these issues just makes the objection to string-classifiers seem more philosophical or aesthetic than necessary. A clearer and blunter way to put it is simply that if the data take the form of classifications of string-meaning pairs, then a string-classifier doesn't cover the data.
(To be honest I actually think that quite generally, the notion of acceptability as a property of strings might well do more harm than good. The only thing it seems to mean is derived from the acceptability of string-meaning pairs, i.e. we could define s to be string-acceptable iff there is some m such that (s,m) is acceptable. But we don't find much use for the mirror image notion of meaning-acceptable (i.e. there is some s such that (s,m) is acceptable), and is there reason to think that string-acceptability is any more relevant?)
I think that is very much on the right track.
I think to be fair the way that the POS has been put forward in the literature is much more as a 'weak' property, and that is why people like R & C have taken it as such. The 'constrained homophony' version of the POS is much more recent. (Indeed searching for the phrase 'constrained homophony' doesn't produce any hits before 2009.)
If you look at the 2002 Special issue of the Linguistic review, none of the retorts to Pullum and Scholz say, oh wait a minute you have missed the point, it is really about the syntax/semantics interface, and homophony (If I recall correctly). Similarly if you look at Laurence and Margolis portrayal of the POS, which is very pro-Chomsky, they talk just about the primary linguistic data (which they take to be the strings).
Perhaps this has been exacerbated by Chomsky's general antipathy towards semantics and perhaps also by the misuse of the autonomy of syntax?
So I think the "constrained homophony" version is just a *new* version of the POS (and quite a good one, to be clear).
But I am very willing to be corrected as there is a lot of literature I am not familiar with. So I would be genuinely very interested to know if there are some clear quotes from the period 1980-2000 that show that this version of the POS has some history.
P&S assumed (at least for the sake of argument) that what the child acquired was a rule relating questions and declarative statements, not merely the ability to mark certain strings as grammatical and others as ungrammatical. A rule relating questions to declarative statements is, implicitly at least, a rule which imposes constraints on homophony. So I think one reason that P&S didn't get that kind of response is that they didn't miss the point in the same way Reali and Christiansen did.
Chomsky doesn't have much time for certain strands of formal semantics, but data points regarding what sentences can and can't mean have always played an important role in his work. (E.g., half of the ECP facts relating to movement of adjunct wh-phrases crucially depend on judgments regarding possible interpretations.)
Yes good point. And I guess your second point is exactly what Tim is getting at too.
I certainly agree with Alex C. that it's become much more common recently to talk explicitly about string-meaning relations in POS arguments, but my suspicion (which may well be wrong) is that this is indicative of people being forced to better express what they meant, rather than people adjusting what they meant. As Alex D. points out, data points about what things can and can't mean have always been part of the picture, it's just that there seems to have been a tendency at times to not clearly distinguish (a) the observable fact that certain string-meaning pairs are acceptable and others aren't, from (b) the theoretical move of assigning a certain kind of structure (and operations on this structure, etc.) in order to capture these facts. Failing to make this distinction would pave the way for sub-optimal presentations of the POS argument which said something like "It's not enough to just get the right set of strings, you have to also assign those strings the right structures", when what was really intended (or should have been intended) was "you have to also assign those strings the right meanings".
But I haven't looked back at earlier arguments carefully, and I would also be genuinely interested to know about early presentations of POS arguments that talk relatively explicitly about the sound-meaning relation.
Sadly, the only explicit discussion of POS in the literature (a real disservice, I believe) is the one dealing with polar questions. There are many given by Crain that explicitly discuss absent meanings, and Chomsky has many cases where he observes that X cannot mean Y (think of the principle C illustrations ('he thinks John is tall' can't mean John thinks he is tall) and B ('John likes him' cannot mean John likes himself)). All of these have the ingredients for a POS argument, but they have been discussed as such less directly than Y/N questions. The problem with the literature redoing Chomsky is that it is obsessed with the idea that sentences don't really have any interesting structure (hence bi/tri-gram models). From the get go the point has been to understand the kinds of rules found in grammars, qua rule systems that pair sounds and meanings. The term 'constrained homophony' is new. The concept and its place in POS arguments are very old. So far as I can tell, it's what the POS has always been about: why some kinds of grammar rules and not others, given that many kinds that would be compatible with the PLD are nonetheless absent.
One thing I have been chewing over recently in the context of these arguments is what the inputs are.
So if we consider the constrained homophony POS, then we are trying to get the right sound meaning pairs; so are we entitled to assume that the input is sound/meaning pairs?
That is to say, if we take the POS as an argument against certain types of learning algorithm, or against the possibility of certain types of learners/acquisition systems, then what are the inputs to these learners? Just the strings, or the strings associated with the one contextually appropriate meaning?
Since the get go the PLD has consisted of sound/meaning pairs from simple sentences. There are various conceptions of simple. For Lightfoot it is roughly degree 0. For Wexler and Culicover, it was degree 2. The latter formalizes the problem.
What's the "meaning"? I believe that most have taken it to be the thematic structure; who did what to whom sort of thing. This has a certain "epistemological priority" in Chomsky's terms, i.e. it is observable in a context and is not language dependent, think Davidsonian events. I find it likely that non-speakers (e.g. my dog Sampson) can parse a simple scene into its participants. If this is so, then humans can too, or so we assume.
The most interesting assumption is the degree 0 one. This may be too strong. There is some interesting work on this by Lisa Pearl, looking into the OV-to-VO change historically and trying to figure out whether we really don't look at embedding information or whether there just is not enough of it.
So, yes, SM pairs as input. The M part being roughly a thematic decomposition. The "size" is roughly what you get in a simple degree 0 clause.
The PLD is what children hear, which consists largely of simple sentences but also some complex ones (this is the right terminology I hope).
DeleteI am not sure that I agree that the PLD has always been taken to be sound/meaning pairs. Chomsky is typically vague at crucial points, but for example in his Explanatory models in Linguistics, 1966, he is quite explicit that it is just the sentences, not sentence meaning pairs. Indeed he considers and rejects the role of semantics:
"For example, it might be maintained, not without plausibility, that semantic information of some sort is essential even if the formalized grammar that is the output of the device does not contain statements of direct semantic nature. Here, care is necessary. It may well be that a child given only the inputs of (2) as nonsense elements would not come to learn the principles of sentence formation. This is not necessarily a relevant observation, however, even if true. It may only indicate that meaningful- ness and semantic function provide the motivation for language learning, while play- ing no necessary part in its mechanism, which is what concerns us here."
(Of course when people try to solve the problem they may try to help themselves to additional information to make the model work, but that is a different issue, and perhaps related to your degree 0/2 comments).
I googled primary linguistic data, and one of the hits was this article (by you) which again seems quite explicit that it is the set of sentences (let me try and do a functioning link here.)
Obviously that is a long time ago, and things have changed since then but again if you could come up with some point explicit statement about the semantics as input that would be interesting.
And the BPYC paper that uses the phrase 'constrained homophony' is quite carefully worded throughout, and they do not mention that all three of the papers they critique only use sentences and not sentence/meaning pairs. Indeed the phrase 'constrained homophony' is itself very carefully chosen to leave the question open.
So maybe this is not as open and shut as all that. I don't know if Bob B or Paul P would like to add anything here.
Just to be clear -- I am actually on Chomsky's side here (or at least the 1966 Chomsky), I don't think that one should assume that semantics is importantly involved in learning syntax. And of course the Parameter setting models (like Charles Yang's) don't really use semantics at all, except maybe to bootstrap the syntactic categories early on.
(This was missing a 'but' which I put in, and deleted the previous comment.)
Reply in two parts as needed more room.
Grammars since "the earliest days of generative grammar" mediated sound meaning pairings. The POS was an argument to determine the proper shape of grammars, hence what rules/operations mediate sound/meaning pairings. Indeed, the reason Deep Structure played such a prominent role in early discussions is that Chomsky wanted to make clear that the problem was not getting a surface parse of the string, but to establish rules that mapped structures with a surface parse to structures that were adequate for semantic interpretation. Chomsky's point about semantics was not that grammars did not map to them but that their properties were not reducible to semantic properties, viz. the autonomy of syntax thesis.
So, given this, one can ask what are the inputs to FL/UG that the LAD uses to establish a grammar, i.e. such a mapping? Certainly from very early on they included semantic information. Recall that very very early on people were interested in binding (Lees & Klima), control, scope of negation, etc. These all had semantic consequences and were rules that included semantic restrictions in their formulation (cf. Lees and Klima on identity conditions). Second, diagnostics for raising and passivization involved preservation of thematic information. So from the beginning grammar rules involved meaning. How much? Well 'semantics' is a vague term, but at the very least thematic roles: who did what to whom. Theta information was part of the PLD. Plausibly quantifier scope and scope of negation etc. was not. Again, this is why Chomsky considered thematic information to enjoy "epistemological priority." This meant that it was usable by the LAD to develop a grammar, it did not presuppose having one, which plausibly notions like scope and binding do. So, the PLD consists of thematic info and phono info and your job, Mr Phelps, should you agree to this mission, is to find the rules that generate the unbounded sets of sound/meaning dependencies. These rules are syntactic (that's what generates the dependencies, PF and LF being purely interpretive, though that's a little later) and the PLD does NOT involve all the sentences that will be generated as this is impossible, the latter being unbounded.
Continuation:
DeleteIt took a little time to triangulate on the details of the problem, but Wexler and Culicover's work made a positive proposal about how the problem should be posed: they assumed sound meaning pairs up to degree 2. Others, e.g. Lightfoot, thought that this was too much and that we should aim for degree 0 (plus a little bit, roughly unembedded binding domains) as the right characterization of the PLD. This issue, I believe, is still being debated. What is not debated and has not been for quite a while is that the PLD is sound/meaning pairs.
As regards constrained homophony: there are two kinds of data that syntacticians have used: (i) acceptability, (ii) acceptability under an interpretation. The former is a degenerate instance of the latter. Some sentences are unacceptable regardless of their interpretation, some only relative to one. Chomsky, to his (and our) everlasting consternation, illustrated the POS argument using examples that could be grasped using data like (i). Why? BECAUSE IT WAS INTENDED AS A SIMPLE ILLUSTRATION OF THE LOGIC OF THE PROGRAM FOR NEOPHYTES!!!!! This was grabbed by everyone and his brother and mangled to death. Recently, Bob, Paul, Noam used another term to highlight the fact that this illustration had been misunderstood and so to make this clear 'constrained homophony' was introduced. This is how I read the history.
However, say that I am wrong about the history (I'm not), does it matter? The logic is clear: grammars relate sounds and meanings. POS is a tool to investigate the structure of grammars. The inputs are information from the relata, sounds and meanings properly construed so as to be available to pre-linguistic LADs. Given this construal most of what has been done by Chomsky's POS critics is simply beside the point. Nobody can stop them from doing this work (I wouldn't if I could, free country) but nobody has to take this stuff seriously either. It is largely beside the point. That's the take home message.
When you say "the PLD consists of thematic info and phono info", what exactly is the thematic info? Is it the thematic bare bones of the intended meaning of the utterance, or a sort of model of the thematic relations that held in (the relevant nearby parts of) the world when the utterance was made?
DeleteThe latter seems more natural, but leaves some extra work to be done by the learner obviously.
It has been uncontroversial that who-did-what-to-whom-how info is available. That's what I meant by thematic info, what one gets from a classical D-structure representation given something like UTAH or DS as a "pure representation of GF-theta." However, as we all know, there is more to semantics than this. There is scope information, dependency information (binding/control), and maybe more still. The classic assumption I believe is that the semantic info is limited to the thematic information (recall DS, aka kernel sentences, were *given* in Syntactic Structures). If so, then all of the rest must follow in some principled fashion from general features of the grammar and mapping to the interface. Some reasonable principles are that A scopes over B iff A c-commands B, or merge being interpreted as conjunction, or A semantically binds B iff A syntactically binds B (i.e. c-commands and is coindexed with). I am not saying this is true, just that it is reasonable. Chomsky has generally assumed that there is no variation at LF (his argument being that it is too remote from "experience"). I'm not fully convinced by the argument, but I do think that it is reasonable to believe that thematic info is epistemologically prior in the relevant sense to be usable by the LAD to acquire Gs using UG. So, it's a nice place to start. Type driven grammars make similar assumptions as regards QR: QR is required given certain mapping principles and the inherent interpretation that quantifiers bear. Again, I am not endorsing this, just observing.
So, there is at least thematic info and many hope that there is at most such info. All the rest of the *meaning* follows from general features of G as described in UG.
Oh yes, I do assume that it is readily accessible in the utterance situation; so thematic info in *relevant nearby parts of the world* as you put it. UTAH is critical to allowing this sort of information to be mapped into usable syntactic phrase markers.
I am a bit confused about what parts of this discussion relate to your particular theory of language acquisition and which parts relate to the POS argument.
If you assume that the child has access to theta roles in the PLD and you appeal to the UTAH then you are already assuming that you have a nontrivial UG, aren't you?
Namely innate knowledge of theta roles. But that seems a little bit circular in the context of the POS.
And isn't the whole point of the Gleitman/Medina/Trueswell articles that you can't do this? Which words cannot be learned from observation?
Could you point me to some papers where these basic assumptions about the PLD are laid out in the context of the POS, so I can think about them carefully? BPYC don't discuss it as far as I can see.
Blog posts only go so far.
Wexler and Culicover.
Yes, the child has access to theta roles, these are "observable" in a context. So in a given situation a child knows how to identify an event, and the participants. This may be wrong, but I assume it is not. Theta roles enjoy epistemological priority. I think I've mentioned this twice before. This means that you can identify agents, patients, etc. independently of an active FL. Even my dog can do that, or so I suppose. UTAH is a mapping principle that relates Event descriptions to phrase markers. For example, it says that patients are internal arguments and agents are external arguments. This is part of UG. Why we map agents to external and patients to internal positions is an axiom. We cannot currently derive it from anything more general.
The Gleitman et al paper allows learning of words without grammar; it is just not very efficient. Things get interesting when the grammar kicks in because then syntactic info can supplement this slower process to really get things moving. That's how I read them at least. I should add that identifying event participants, the nouns, is something people do ok, much better than identifying the relevant events. At any rate, yes, there is innate structure to UG. No, theta roles are not part of UG but the mapping principles from theta roles to internal/external positions are part of UG (UTAH). The POS kicks in because even all of this is not enough. It does not tell you about possible movements, anaphoric dependencies, freezing effects, case licensing etc. There is a lot more to grammar than who did what to whom. However, the idea is that this info, gleanable from simple clauses, is all that you need to get a fully operational G. Why? The rest is provided by UG. PLD is necessary but not nearly sufficient to get a G.
Hope this helps. If not maybe I'll write a post on it.
Wexler and Culicover (the 1980 book) does not support what you are saying. It's a big book and I haven't read it for years, but I glanced at it this morning, and the relevant sections are 2.7 onwards especially pages 79-84.
They admit that the PLD does not contain this information and say, apropos of their model of learning from meaning/sound pairs (b,s):
'Of course in no way is b actually presented to the learner ...
Again for formal and notational purposes, we will make an assumption that appears to be at variance with this state of affairs [i.e. the reality of the situation , Alex]'.
And again the whole raft of arguments based on learnability theory deployed by nativists over the years is based on Gold's results, Horning's results, etc., which are based on the assumption that the PLD is just strings. So for example Partha Niyogi's 2006 book is entirely based on these assumptions.
So I just don't think your historical claim that everybody has always assumed that the PLD includes meanings is tenable.
And I don't think that the empirical argument that the child can extract the meanings of the utterances, starting at age 6 months, is tenable either; for the reasons that Tim alluded to earlier, as well as a lot of Gleitman's work, including, e.g. the work with Barbara Landau on blind children.
And of course this matters hugely to the logic of the POS argument -- what can be learned from sound/meaning pairs by empiricist learners is very different from and much larger than what can be learned from sounds alone by empiricist learner. For me the force of the POS argument flows from considerations about how little the input contains, how thin it is. If you make the input much richer then you deprive the POS of much of its power.
The POS has whatever power it has in a given formulation of the input and the output. Let me reiterate, from the get go the aim was to understand how to acquire grammars that generated sound/meaning pairs. To my knowledge, nobody ever thought it possible that this could be done in the absence of all information concerning meaning. The question was not whether meanings were relevant but which and how much. Try to explain the grammatical difference between 'John loves Mary' and 'Mary loves John' without saying something about theta roles and grammatical functions. Now, even given thematic information there is a lot that needs explaining. For example, even if I give you the thematic information that in 'John loves himself' John bears two theta roles, deduce principle A, i.e. its locality conditions, its complementarity with pronouns, principle C. There is still a lot to explain even given the thematic information and that's what UG was about. Given only thematic information about simple clauses, deduce the properties of movement, i.e. islands, fixed subject effects, WCO, SCO, ECP. As you will quickly see, the thematic information underdetermines these restrictions. That's why, if they are correct, they must be in UG.
DeleteWithout knowing something about thematic structure movement theory is impossible. You need to know where you started from to understand how far you have moved. Thus, some version of theta roles is required and some version of UTAH. At any rate, as I noted, from the very beginning the issue was how to learn grammars that mapped sound to meaning, not strings to trees.
Now for some problems we can substitute the second problem for the first. Gold shows that even for doing the string to tree mapping one needs some restrictions (or that's how I read Niyogi's version of Gold). So IF this is how you think of the problem, you need something like a UG. But, and I emphasize this, it is NOT the problem generativists have been interested in. Did you ever notice that Chomsky has not discussed the Gold results much? Why? They formalize the wrong problem. That does not mean it is useless, just that it is not what we should be addressing.
So, stop using the term "meaning." That's a very vague term. Use thematic roles (i.e. what DS provided) and assume that you can get what's in simple clauses. Then go ahead and see how far you get without a rich UG. Here's what I know: not nearly far enough. That's the power of the POS. What do we get? A pretty rich UG roughly described by theories like LGB. Minimalism's aim: to make these LGB descriptions follow from more reasonable assumptions.
I agree that natural languages associate sounds and meanings. And I agree that children hear languages in situations and that that is one of the ways that they learn that the word 'cat' means cat is by observing cats in the environment.
But that doesn't commit me either to the existence of theta roles, the UTAH or your favoured semantic bootstrapping theory of language acquisition. And looking at BPYC again, they don't talk about theta roles, but they do talk about learning trees or structural descriptions from strings.
Gold had nothing to say about learning trees from strings; he only talks about weak learning. That is one of the main points of the BPYC paper; that we should be learning trees from strings rather than learning strings from strings.
Again, please could you give me a reference to a paper where this thematic roles version of the POS is explained, because W & C aren't putting forward a POS argument, but rather proposing a solution to the problem using inputs that they frankly admit are implausible.
First the reference: as I said Wexler and Culicover. They start with sound/meaning pairs, whether they think that this is empirically reasonable or not. Note, you should find this acceptable as by your lights, it makes the acquisition problem easier.
Historically, as I noted, DS is taken as given: look at Syntactic Structures; we start with kernel sentences. There is no theory of these. The puzzle is to find the transformations that map these into surface structures. In Aspects, we assume that the lexicon is given and that lexical information provides enough information to give one a DS; actually there are PS rules but the lexical entries serve to filter out the inappropriate PS structures via lexical insertion. In EST and GB again lexical entries are given with their argument structure. From these X' structures are projected: i.e. the lexical structure of the head determines the phrase structure of the phrase. Again, this is assumed to be given. Now the problem of course is what it means to be given. The POS arguments in syntax assume that we have lexical meanings and construct arguments from these. However, it is fair to ask where the lexical information comes from. Here Dowty and Baker as well as our own Davidsonians (Paul is particularly good at this) consider how to go from theta roles (proto-roles in Dowty) to either lexical items or base phrase structures. UTAH is the proposed principle for doing this and is incorporated in virtually every theory of grammar I know of. Again, till now, POS arguments have assumed that this much is given. And even with this information there is a big gap between input and attained G.
Now, what should we be learning: not trees from strings but Gs that generate phone/sem pairs from PLD. Now, if I get you correctly, you don't want to use theta information as part of the PLD. Fine with me. I think you won't get off the ground if you don't, but far be it from me to stand in your way. So long as we agree that what needs explaining is how to get Gs that generate phone/sem pairs. That's the problem as generative grammar sees it.
Now does anything I say commit you to this? Of course not. I can't commit you to anything. I am simply observing that if you don't address the indicated problem then what you are doing is at right angles to what it is that generativists are interested in. You want to talk to this problem, great. If not, that's also fine but don't expect me or other generativists to pay close attention.
Last point: the mapping problem from theta roles/proto roles to (roughly) DSs is discussed in Baker and Dowty. Moreover, I suspect that when you think about the phone/sem generation you too will assume theta information plays a role, for otherwise you will have nothing to say about most of the structures linguists have discussed for years (passive, raising, unaccusatives, questions, control, binding etc). Of course you may not wish to address the properties of these kinds of constructions, but if you don't then most generativists will not really care about what you have to say, for it will be irrelevant to the identified problem.
Interesting post and discussion!
It leads me to the question whether an A^nB^n grammar can tell neuroscientists anything at all about processing natural language.
Unclear. If this cannot be done then it tells you something. If it can, probably not much.
I think one thing that might be useful is for the syntax community to start classifying cases where absence 'easily' counts as evidence, vs those where it does so less easily (in terms of the sophistication of the statistical technology that the learner would need to use in order to interpret the absence as evidence). The cases I class as 'easy' have the following two characteristics:
a) they arise frequently (blocking of regular verb formation rules by irregular forms; basic binding theory)
b) they involve the suppression of one conceivable way of expressing a meaning by another, where a meaning is construed as a collection of lexical meanings plus a scheme of semantic composition (blocking & binding, again). What makes this 'easy' is that when you hear something and understand it, you can run your grammar in generation mode to find out what other ways it's providing to express the meaning, and if it's possible to change the grammar to reduce the number of alternates without failing to parse existing utterances, you can tighten up your grammar (this is a grammar tightening/debugging method in the LFG/XLE community).
In addition to blocking and binding, other easy cases would be the dative alternation, pro-drop, clitic doubling (but not constraints on clitic combinations), and basic word order contraints (for people who think they are produced by LP constraints added ID rules).
Not easy would include:
1) some of the more complex structures that figure in the Icelandic syntax literature (I conjecture that I might have invented the very first ones ever to have been produced where a secondary adjective predicate shows case agreement with a quirky case-marked PRO subject (hún vonast til að vanta ekki eina/*ein í tímanum) - $20 reward for anyone who produces a naturally occurring example prior to 1976). Hard due to extreme rarity.
2) island constraints: hard since the blocked structures often require considerable reformulation to express their meaning (as pointed out by Fodor & Crain wrt the Uniqueness Principle in their chapter in MacWhinney (1987)).
3) no complex possessors in Piraha: hard because a complex periphrasis involving two clauses is needed to express the meaning of the blocked construction. Easy otoh would be the restrictions on prenominal possessors in German, since semantically equivalent postnominal ones are also available.
Intermediate and unclear cases would include that-trace and wanna contraction, since Gathercole & Zukowski, respectively, have found the relevant data to be quite rare in the input of young children, at least, but they're certainly very common relative to the Icelandic agreement with QC PRO subject ones.
Oh yes, and the bottom line point being that alleged UG principles whose support comes only from easy (and probably intermediate) cases need to be regarded skeptically until or unless they can get solid support from typology. I'd consider structure dependence of Aux-inversion as an intermediate case that does get the needed support from typology.
I completely endorse A's proposal. One caveat: indirect negative evidence, i.e. absence of evidence being evidence of absence, seems to me to require a UG to find the gaps. This said, for some cases, using absence will be easy, sometimes not. E.g. we don't see many 3-level-of-embedding sentences; nonetheless their absence does not indicate they are not generable. I assume that this is not an option in UG and so the absence here tells us nothing. However, cases like this indicate that the indirect neg evidence story does not make sense, I believe, without some UG to play off of. Maybe I misunderstood your last point however. The proposal to find those cases where this strategy was workable, where INE could be used and where not strikes me as an excellent suggestion.
To clarify the last point (hopefully), the Morphological Blocking Principle in Andrews (1990) should be abandoned, because the stuff it covers falls into the easy class. The various proposals floating around for island constraints (traditional Chomsky, LFG's functional uncertainty, Pearl and Sprouse's node-paths) should not be, because they aren't easy in this sense (except in languages with 'syntactically active' resumptives (Asudeh 2012), i.e., resumptives that suspend island constraints).
For island constraints, I think the way to go is to assume that extraction is impossible in the absence of evidence that it is possible, and then you need to specify a 'basis of generalization' to interpret the possibility evidence. E.g. on Pearl & Sprouse's scheme, if you hear 'who did you give a book to' you add VP PP to the list of possible paths, so then expect to be able to say 'who did you steal a book from'. I suspect that all of the three approaches I mentioned above can be made to work, and that it will be somewhere between very difficult and completely impossible to distinguish them, but they are all instances of the same strategy, which would be a relatively modest form of UG, at least for people who don't insist that UG has to be language-specific (how do we know that P&S's paths, or Chomskian escape hatches, aren't important for learning to tie your shoes? Seems implausible, but we don't *know* that it's false.)
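For concreteness, here is a minimal toy rendering of that strategy (a sketch only, not Pearl & Sprouse's actual model; the node labels and paths are invented): extraction is licensed only along container paths already attested in the input, so the default is no extraction and generalization runs over paths rather than over individual sentences.

```python
# Toy path-based extraction learner in the spirit described above (not P&S's model).
# The "paths" are supplied by hand; a real learner would read them off parsed input.

observed_paths = set()

def observe(path):
    """Record the sequence of container nodes an attested filler-gap dependency crosses."""
    observed_paths.add(tuple(path))

def licensed(path):
    """Default is no extraction: a dependency is licensed only along an attested path."""
    return tuple(path) in observed_paths

# "Who did you give a book to?"  -- gap inside a PP inside VP
observe(["CP", "VP", "PP"])

# Generalizes to "Who did you steal a book from?" -- same container path
print(licensed(["CP", "VP", "PP"]))        # True

# But not to extraction out of a relative clause (an island): that path was never
# attested, so absence of evidence is treated as evidence of absence.
print(licensed(["CP", "VP", "NP", "CP"]))  # False
```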
So I started a blog about it:
http://innatelysyntactic.wordpress.com/
I think it will take some time to work through the details.