
Sunday, November 23, 2014

There's no poverty of the stimulus? PISH.


Lately, I’ve been worried that many people (mostly psychologists, but also philosophers, computer scientists, and even linguists) do not appreciate the argument from the poverty of the stimulus. They simply do not see what this argument is supposed to show. This difficulty leads to skepticism about the poverty of the stimulus and about generative syntax more generally, which, in turn, interferes with progress towards solving the problem of how children learn language.
 
An argument from the poverty of the stimulus is based on two observations: (a) there exists a generalization about the grammar of some language and (b) the learner’s experience does not provide sufficient data to support that generalization over a range of alternative generalizations. These two observations support the conclusion that something other than experience must be responsible for the true generalization that holds of the speakers of the language in question. This conclusion invites hypotheses. Typically, these hypotheses have come in the form of innate constraints on linguistic representations, though nothing stops a theorist from proposing alternative sources for the relevant generalization.
 
But a common response to this argument is that we just didn’t try hard enough. The generalization really is supported in the data, but we just didn’t see the data in the right way. If we only understood a little more about how learners build up their representations of the data, then we would see how the data really does contain the relevant generalization. So, armed with this little bit of skepticism, one can blithely assert that there is no poverty of the stimulus problem based only on the belief that if we linguists just worked a little harder, the problem would dissipate. But skepticism is neither a counterargument nor a counterproposal.
 
So, I’d like to issue the following challenge. I will present a poverty of the stimulus argument that is not too complicated. I will then show how I looked for the relevant data in the environment and conclude that it really wasn’t there. I will then invite all takers (from whatever field) to show that the correct generalization, and none of the alternatives, really is available in the data. If someone shows that the relevant data really was there, then I will concede that there was no poverty of the stimulus argument for that phenomenon. Indeed, I will celebrate, because that discovery will represent progress that all students of the human language faculty will recognize as such. Nobody requires that every fact of language derive from innate knowledge; learning which ones do and which ones don’t sounds like progress. And with that kind of progress, I’d be more than happy to repeat the exercise until we discover some general principles.
 
But, if the poverty of the stimulus is not overturned for this case, then we can take that failure as recognition that the problem is real and that the way forward in studying the human language faculty is to ask what property of the learner makes the environmental data evidentiary for building a grammar.
 
With that preamble out of the way, let’s begin. Consider the judgments in 1-2, which Leddon and Lidz (2006) show with experimentally collected data are reliable in adult speakers of English:

(1) a.   Norbert remembered that Ellen painted a picture of herself
    b. * Norbert remembered that Ellen painted a picture of himself
    c.   Norbert remembered that Ellen was very proud of herself
    d. * Norbert remembered that Ellen was very proud of himself

(2) a.   Norbert remembered which picture of herself Ellen painted
    b.   Norbert remembered which picture of himself Ellen painted
    c.   Norbert remembered how proud of herself Ellen was
    d. * Norbert remembered how proud of himself Ellen was

The facts in (1) illustrate a very simple generalization: a reflexive pronoun must take its antecedent in the domain of the closest subject. In all of (1a-d) only Ellen can be the antecedent of the reflexive. Let us assume (perhaps falsely) that this generalization is supported by the learner’s experience and that there is no poverty of the stimulus problem associated with it.
 
The facts in (2) do not obviously fit our generalization about reflexive pronouns. If we take “closest subject” to be the main clause subject, then we would expect only (b) and (d) to be grammatical. If we take “closest subject” to be the embedded subject, then we expect only (a) and (c) to be grammatical. And, if we take “closest subject” to be underspecified in these cases, then we expect all of (a-d) to be grammatical. So, something’s gotta give. What we need is for the “closest” subject to be Ellen in (c-d), but not (a-b). And, we need closest subject to be underspecified in (a-b) but not (c-d). We’ll get back to a way to do that in a moment.
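To spell out the bookkeeping, here is a toy sketch (my own encoding; nothing in the argument depends on it) of what each fixed choice of “closest subject” would predict for (2a-d), given only the gender of the reflexive:

```python
# Toy check: which of (2a-d) each hypothesis about "closest subject" predicts
# to be grammatical, based solely on gender agreement with the reflexive.

REFLEXIVE_GENDER = {"2a": "fem", "2b": "masc", "2c": "fem", "2d": "masc"}
SUBJECT_GENDER = {"Norbert": "masc", "Ellen": "fem"}

HYPOTHESES = {
    "closest = matrix subject":   {"Norbert"},
    "closest = embedded subject": {"Ellen"},
    "closest underspecified":     {"Norbert", "Ellen"},
}

for label, subjects in HYPOTHESES.items():
    ok = [ex for ex, gender in REFLEXIVE_GENDER.items()
          if any(SUBJECT_GENDER[s] == gender for s in subjects)]
    print(f"{label}: predicts {ok} grammatical")

# closest = matrix subject:   predicts ['2b', '2d'] grammatical
# closest = embedded subject: predicts ['2a', '2c'] grammatical
# closest underspecified:     predicts ['2a', '2b', '2c', '2d'] grammatical
# None of these matches the attested pattern: (2a-c) good, (2d) bad.
```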
 
But first we should see how these patterns relate to the poverty of the stimulus. Leddon and Lidz (2006) also showed that sentences like those in (2) are unattested in speech to children. While we didn’t do a search of every sentence that any child ever heard, we did examine 10,000 wh-questions in CHILDES and we didn’t find a single example of a wh-phrase containing a reflexive pronoun, a non-reflexive pronoun or a name. So, there really is no data to generalize from. Whatever we come to know about these sentences, it must be a generalization beyond the data of experience.
 
One might complain, fairly, that 10,000 wh-questions is not that many and that if we had looked at a bigger corpus we might have found some with the relevant properties. We did search Google for strings containing wh-phrases like those in (2) and the only hits we got were example sentences from linguistics papers. This gives us some confidence that our estimate of the experience of children is accurate.
 
If these estimates are correct, the data of experience appears to be compatible with many generalizations, varying in whether Norbert, Ellen or both are possible antecedents in the (a-b) cases, the (c-d) cases or both.  With these possibilities, there are 8 possible patterns. But out of these eight, all English speakers acquire the same one. Something must be responsible for this uniformity. That is the extent of the argument. It doesn’t really have a conclusion, except that something must be responsible for the pattern. The argument is merely the identification of a mystery, inviting hypotheses that explain it.
 
Here’s a solution that is based on prior knowledge, due to Huang (1993). The first part of the solution is that we maintain our generalization about reflexives: reflexives must find their antecedent in the domain of the nearest subject. The second part capitalizes on the difference between (2a-b), in which the wh-phrase is an argument of the lower verb, and (2c-d), in which the wh-phrase is the lower predicate itself. In (2a-b), the domain of the nearest subject is underspecified. If we calculate it in terms of the “base position” of the wh-phrase, then the embedded subject is the nearest subject and so only Ellen can be the antecedent. If we calculate it in terms of the “surface position” of the wh-phrase, then the matrix subject is the nearest subject. For (2c-d), however, the closest subject is the same, independent of whether we interpret the wh-phrase in its "base" or "surface" position. This calculation of closest subject follows from the Predicate Internal Subject Hypothesis (PISH): The predicate carries information about its subject wherever it goes. Because of PISH, the wh-phrase [how proud of himself/herself] contains an unpronounced residue of the embedded subject and so is really represented as [how Ellen proud of himself/herself]. This residue (despite not being pronounced) counts as the nearest subject for the reflexive, no matter where the predicate occurs. Thus, the reflexive must be bound within that domain and Ellen is the only possible antecedent for that reflexive. So, as long as the learner knows the PISH, then the pattern of facts in (2) follows deductively. The learner requires no experience with sentences like (2) in order to reach the correct generalization.
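To make the deduction concrete, here is a toy sketch (my own encoding; Huang’s account is of course stated over syntactic representations, not Python objects) of how PISH fixes the antecedent options in (2):

```python
# Toy encoding of the PISH-based account. Each example in (2) reduces to two
# properties: whether the fronted wh-phrase is an argument or a predicate,
# and the gender of the reflexive it contains.

MATRIX_SUBJECT = ("Norbert", "masc")
EMBEDDED_SUBJECT = ("Ellen", "fem")

def closest_subjects(wh_type):
    """Subjects that can count as 'closest' to a reflexive inside the wh-phrase."""
    if wh_type == "argument":
        # (2a-b): the wh-phrase can be evaluated in its base position (embedded
        # clause) or its surface position (matrix clause), so both subjects qualify.
        return {EMBEDDED_SUBJECT, MATRIX_SUBJECT}
    else:
        # (2c-d): under PISH the predicate carries an unpronounced copy of its own
        # subject, and that copy is the closest subject wherever the phrase sits.
        return {EMBEDDED_SUBJECT}

def antecedents(wh_type, reflexive_gender):
    return {name for name, gender in closest_subjects(wh_type)
            if gender == reflexive_gender}

print(antecedents("argument", "fem"))    # (2a) -> {'Ellen'}
print(antecedents("argument", "masc"))   # (2b) -> {'Norbert'}
print(antecedents("predicate", "fem"))   # (2c) -> {'Ellen'}
print(antecedents("predicate", "masc"))  # (2d) -> set(): no antecedent, hence *
```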

Now, this argument only says that the learner must know that the predicate carries information about its subject with it in the syntax prior to encountering sentences like (2). It doesn’t yet require that knowledge to be innate. So, the poverty of the stimulus problem posed by (2) shifts to the problem of determining whether subjects are generated predicate internally.

Our next question is whether we have independent support for PISH and whether the data that supports PISH can also lead to its acquisition. I can think of several important patterns of facts that argue in favor of PISH. The first (due, I believe, to Jim McCloskey) concerns the relative scope of negation and a universal quantifier in subject position. Consider the following sentences:
 
(3) a.   Every horse didn’t jump over the fence
    b.   A Fiat is not necessarily a reliable car
    c.   A Fiat is necessarily not a reliable car

The important thing to notice about these sentences is that (3a) is ambiguous but that neither (3b) nor (3c) is. (3a) can be interpreted as making a strong claim that none of the horses jumped over the fence or a weaker claim that not all of them jumped.  This ambiguity concerns the scope of negation. Does the negation apply to something that includes the universal or not? If it does, then we get the weak reading that not all horses jumped. If it does not, then we get the strong reading that none of them did.
 
How does this scope ambiguity arise? The case where the subject takes scope over negation is straightforward if we assume (uncontroversially) that scope can be read directly off of the hierarchical structure of the sentence. But what about the reading where negation takes wide scope? We can consider two possibilities. First, it might be that the negation can take the whole sentence in its scope even if it does not occur at the left edge of the sentence.  But this possibility is shown to be false by the lack of ambiguity in (3c). If negation could simply take wide scope over the entire sentence independent of its syntactic position, then we would expect (3c) to be ambiguous, contrary to fact. (3c) just can’t mean what (3b) does. The second possibility is PISH: the structure of (3a) is really (4), with the struck-out copy of every horse representing the unpronounced residue of the subject-predicate relation:
 
(4) every horse didn’t [every horse] jump over the fence

Given that there are two positions for every horse in the representation, we can interpret negation as taking scope relative to either the higher one or the lower one.
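To make the two readings concrete, they can be paraphrased in first-order terms (my notation, not part of the original argument), one for each copy of every horse in (4):

```latex
% Negation below the higher copy (strong reading):
% "for every horse, it did not jump" = none of the horses jumped
\[ \forall x\,[\textit{horse}(x) \rightarrow \neg\,\textit{jump}(x)] \]

% Negation above the lower, unpronounced copy (weak reading):
% "it is not the case that every horse jumped" = not all of them jumped
\[ \neg\,\forall x\,[\textit{horse}(x) \rightarrow \textit{jump}(x)] \]
```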
 
Is there evidence in speech to children concerning the ambiguity of (3a)? If there is, then that might count as evidence that they could use to learn PISH and hence solve the poverty of the stimulus problem associated with (2). Here we run into two difficulties. First, Gennari and MacDonald (2005) show that these sentences do not occur in speech to children (and are pretty rare in speech between adults). Second, when we present such sentences to preschoolers, they appear to be relatively deaf to their ambiguity. Julien Musolino and I have written extensively on this topic, and the take-away message from those papers is (i) that children’s grammars can generate the wide-scope negation interpretation of sentences like (3a), but (ii) that it takes a lot of either pragmatic or priming effort to get that interpretation to reveal itself. So, even if such sentences did occur in speech to children, their dominant interpretation from the children’s perspective is the one where the subject scopes over negation (even when that interpretation is not consistent with the context or the intentions of the speaker), and so this potential evidence is unlikely to be perceived as evidence of PISH. And if PISH is not learned from that, then we are left with a mystery of how it comes to be responsible for the pattern of facts in (2).
 
A second argument (due to Molly Diesing) in favor of PISH concerns the interpretation of bare plural subjects, like in (5):
 
(5) Linguists are available (to argue with)

This sentence is ambiguous between a generic and an existential reading of the bare plural subject. Under the generic reading, it is a general property of linguists (as a whole) that they are available. Under the existential reading, there are some linguists who are available at the moment.
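Roughly glossed (my paraphrase), the two readings are:

```latex
% Generic reading: a quasi-universal claim about linguists in general
\[ \textsc{Gen}\,x\,[\textit{linguist}(x)]\;[\textit{available}(x)] \]

% Existential reading: some linguists are available right now
\[ \exists x\,[\textit{linguist}(x) \wedge \textit{available}(x)] \]
```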
 
Diesing observes that these two interpretations are associated with different syntactic positions in German. The generic interpretation requires the subject to be outside of the verb phrase. The existential interpretation requires it to be inside the verb phrase (providing evidence for the availability of the predicate-internal position crosslinguistically). So, Diesing argues that we can capture a cross-linguistic generalization about the interpretations of bare plural subjects by positing that the same mapping between position and interpretation occurs in English. The difference is that in English, the existential interpretation is associated with the unpronounced residue of the subject inside the predicate. This is not exactly evidence in favor of PISH, but PISH allows us to link the German and English facts together in a way that a PISH-less theory would not. So we could take it as evidence for PISH.
 
Now, this one is a bit trickier to think about when it comes to acquisition. Should learners take evidence of existential interpretations of bare plural subjects to be evidence of PISH? Maybe, if they already know something about how positions relate to interpretations. But in the end, the issue is moot because Sneed (2007) showed that in speech to children, bare plural subjects are uniformly used with the generic interpretation. How children come to know about the existential readings is itself a poverty of the stimulus problem (and one that could be solved by antecedent knowledge of PISH and the rules for mapping from syntactic position to semantic interpretation). So, if we think that the facts in (2) follow from PISH, then we still need a source for PISH in speech to children.
 
The final argument that I can think of in favor of PISH comes from Jane Grimshaw. She shows that it is possible to coordinate an active and a passive verb phrase:

(6) Norbert insulted some psychologists and was censured

The argument takes advantage of three independent generalizations. First, passives involve a relation between the surface subject and the object position of the passive verb, represented here by the invisible residue of Norbert:

(7) Norbert was censured [Norbert]

Second, extraction from one conjunct in a coordinated structure is ungrammatical (Ross’s 1968 Coordinate Structure Constraint):

(8)     * Who did Norbert criticize the book and Jeff insult

Third, extraction from a conjunct is possible as long as the extracted phrase is associated with both conjuncts (Across The Board extraction):

(9) Who did Norbert criticize and Jeff insult

So, if there were no predicate internal subject position in (6), then we would have the representation in (10):

(10) Norbert [VP insulted some psychologists] and [VP was censured [Norbert]]

This representation violates the coordinate structure constraint and so the sentence is predicted to be ungrammatical, contrary to fact. However, if there is a predicate internal subject position, then the sentence can be represented as an across the board extraction:

(11) Norbert [VP [Norbert] insulted some psychologists] and [VP was censured [Norbert]]

So, we can understand the grammaticality of (6) straightforwardly if it has the representation in (11), as required by PISH.
 
Do sentences like (6) occur in speech to children? I don’t know of any evidence about this, but I also don’t think it matters. It doesn’t matter because if the learner encountered (6), that datum would support either PISH or the conclusion that movement out of one conjunct in a coordinate structure is grammatical (i.e., that the coordinate structure constraint does not hold). If there is a way of determining that the learner should draw the PISH conclusion and not the other one, I don’t know what it is.
 
So, there’s a potential avenue for the stimulus-poverty-skeptic to show that the pattern in (2) follows from the data. First show that data like (6) occurs at a reasonable rate in speech to children, whatever reasonable means. Then show how the coordinate structure constraint can be acquired. Then build a model showing how putting (6) together with an already acquired coordinate structure constraint will lead to the postulation of PISH and not to the discarding of the coordinate structure constraint. And if that project succeeds, it will be party time; we will have made serious progress on solving a poverty of the stimulus problem.
 
But for the moment, the best solution on the table is the one in which PISH is innate. This solution is the best because it explains with a single mechanism the pattern of facts in (2), the ambiguity of (3), the interpretive properties of (5) and its German counterparts, and the grammaticality of (6). And it explains how each of these can be acquired in the absence of direct positive evidence. Once learners figure out what subjects and predicates look like in their language, these empirical properties will follow deductively because the learners will have been forced by their innate endowment to build PISH-compatible representations.
 
One final note. I am confident that no stimulus-poverty-skeptics will change their views on the basis of this post (if any of them even see it). And it is not my intention to get them to. Rather, I am offering an invitation to work on the kinds of problems that poverty of the stimulus arguments raise. It is highly likely that the analyses I have presented are incorrect and that scientists with different explanatory tastes would follow different routes to a solution. But we will all have a lot more fun if we engage at least some of the same kinds of problems and do not deny that there are problems to solve. The charge that we haven’t looked hard enough to find out how the data really is evidentiary is hereby dismissed. But if there are stimulus-poverty-skeptics who want to disagree about something real, linguists are available.

Jeff Lidz, November 23, 2014.

73 comments:

  1. This comment has been removed by the author.

  2. I have noticed that as I've moved further toward thinking about learning as a machine learning (ML) person and further away from thinking about learning as a linguist, I have begun to find poverty of the stimulus arguments progressively more difficult to grasp. (nb. I also don't see anything like these arguments in ML--perhaps because none of us can even imagine working with anything but impoverished data, except as theoretical constructs.) To me, a POS argument is necessarily bound up in lots of meaty, theory-internal constructs. You can't say whether the input is impoverished for some linguistic phenomenon without saying how said phenomenon is coded--and what the learning algorithm is. Thus, POS doesn't really have any content beyond specific hypotheses about properties of grammars and their learning algorithms, which are the big open questions of linguistic research! So, the "ultimate POS confirmation/refutation" will be a matter of accounting when all is said and done. (Btw, I say this as someone who periodically has dinner with an avowed member of the anti-POS camp, but the arguments somehow don't really stick in my ML-addled head.) In all, I think linguists should spend more time talking to ML theorists (not the language ones, just the ML theorists--some of them will drive you crazy, but the smart ones will be interesting allies).

    Replies
    1. If you're finding POS arguments hard to grasp, then it's because they are so obvious to you that you forget how they work (sort of like having a spot on your glasses that you can't see). The first day of any ML class begins with the idea that there is no learning without an inductive bias. And, it doesn't take much longer to get to the lesson that there is no such thing as a universal inductive bias. All of this is just the ML guy's way of saying that there is always a poverty of the stimulus. So, of course, any solution to a learning problem must have a representation of the input (the intake), a space of possible answers (the inductive bias), and a method of linking one to the other (the learning mechanism). These are *solutions* to the poverty of the stimulus. The solutions I'm talking about above are just statements of the inductive bias of the learner. What a good POS argument does is merely to direct attention towards something worthy of study. Nothing more, nothing less.

  3. Two remarks on this.

    1. I think that a big part of the problem is that discussion of POS arguments tends to devolve into fights about subject-auxiliary inversion and structure dependence. Which do not get us very far. We need to focus on more knotty cases, exactly what Jeff does here (yay!).

    2. Chris is exactly right that a POS argument is bound up in "meaty, theory-internal constructs". A typical POS argument does not say "the input is impoverished relative to all encodings and learning algorithms"; it says "the input is impoverished relative to an (implicit) superficial encoding and learning algorithm". The argument then proceeds to say that if we assume some more specific encoding of the input and some more specific learning algorithm, then the input is no longer impoverished at all. (Well, we hope that the second part is included, though that is often left as a promissory note.) Part of the problem, then, is that the implicit "simple learning model" is almost never spelled out, making it a bit of a moving target.

    When people say "if only you would work harder, you'd see how to learn from the input", the answer should be "that's exactly what our work is about." Of course, it's debatable whether the work of many linguists is indeed focused on that question.

  4. I also have several remarks:
    1. It is depressing to see Chris respond to Jeff's case in this way. What do we get? Well he has a gut feeling that some MT approach would work were MT to attend to the problem. Jeff presents a SPECIFIC case where he argues that data is very very sparse and the learning is both robust and well documented. So, he presents a challenge. The reply seems to be the kind that Jeff always gets: well it's very theory internal and you'd be surprised at what MT can do. Discussion of the actual cases? Nope. Payoff on the hunches? No. Recommendation: yup, talk to someone in MT and your problems will be laid to rest because these guys are oh soo good! Color me skeptical, and disappointed.
    2. I disagree with both Colin and Chris that the problems are bound up in "meaty theory-internal constructs." WRONG!!! The problems aren't, the SOLUTIONS proposed are. The problem as described by Jeff has NO theory internal constructs, just a range of observations concerning how sentences are judged and interpreted. The data is the data and it is not theory internal. So, Colin and Chris are just wrong here. Of course, the solutions ARE theoretically colored, as one would expect from an explanation. Explanations are ALWAYS theoretically wrapped. That's what makes something an explanation. Aren't MT explanations theoretically loaded? They sure look like they are from where I sit.
    3. Contrary to Colin, I don't believe that the kinds of case Jeff discusses need much more than the postulation of the innate learning constraint. This is not the kind of case that is near and dear to his heart; one where there is lots of linguistic variation. There is very little linguistic variation in the case Jeff discusses (actually, I believe none). In this kind of case the solution to the acquisition problem comes in the form of a hard constraint on admissible grammars. Period.

    The general reply to the innateness strategy Jeff adopts (rightly IMO) is that this is not an "interesting" way to solve the problem, for it removes the problem by assuming an innate constraint that just whisks it away. Just so. It does. There is, of course, another question; a minimalist one. Is this innate constraint one that we want? But that's not an argument that it does not solve the learnability problem (it does); it's an argument that it fails on other grounds, e.g. Darwin Problem grounds, which is another concern. It behooves us to distinguish these issues if we are to make progress. I am not implying that Colin confuses them, but that they are often confused.
    4. It seems to me from Chris's comments that Jeff will be forever disappointed. It is worth asking why. I believe that the answer is sociological more than conceptual. MT is a big fish. Linguistics is not. Therefore, linguistic arguments are seen as thorns in the lion's paw, a nuisance that must be removed, not addressed. Prestige plays a huge role in the sciences, and almost never for the good. That said, all that we can do is keep throwing down the gauntlet and hope that the failure to engage the empirics finally catches up. Here's praying.

  5. Norbert: I think that you are completely misunderstanding what Chris and I were saying. More later.

  6. Chris says "You can't say whether the input is impoverished for some linguistic phenomenon without saying how said phenomenon is coded--and what the learning algorithm is." I think this is the key problem with POS arguments as they are currently put forward.
    Every learning algorithm (except for a rote learner) goes beyond the data in some way. Every learner generalises, and that means it generalises to new examples that do not occur in the data. So the data is always impoverished.

    Suppose a child has heard "very funny" and "very very funny" but has never heard "very very very funny", but correctly determines that it is grammatical. We have a mini POS argument. There simply are no examples of "very^3" in the data and so this must be innately specified. It goes beyond what is in the data.



    This isn't an interesting POS of course, but it points towards some innate structure in the learner. But we can see easily, I hope, that in this case the innate structure can be quite simple, because it is easy to see why a superficial domain-general learner, working only with strings, can make this leap.


    So the real debate for me is about what Pullum and Scholz call the indispensability part of the argument -- the claim that in order to learn fact X we need to see examples of type Y.
    So Jeff makes this move -- in the phrase where he says "we didn’t find a single example of a wh-phrase containing a reflexive pronoun, a non-reflexive pronoun or a name."
    But this is like saying (allow me my hyperbole here) that "we didn't find a single example of "very^3 adjective".

    So the claim here is that you need to see an example of sentences like
    "Which pictures of John, which pictures of himself, which pictures of him" or so on, in order to learn that these are grammatical and to learn the syntactic facts. But why do we think this? This assumption only makes sense if we think of very superficial learners, as Colin observes. Deeper learners that work with more abstract representations, over tree structures, say the derivation trees of MCFGs may make generalisations that seem quite radical. But how can we tell what they are without studying learning algorithms for these rich grammar classes?

    So what if the learner has not heard any complex wh-phrases.
    The learner has heard phrases like "which cakes do you like" and "the cakes that Mummy baked" and "John takes lots of pictures of himself" and "which pictures does John like?"
    but has never heard "Which cakes that Mummy baked do you like?", or "which pictures of himself does John like?"
    So it has to generalise. The question is what directions does it generalise in? And why?
    And do we have any reason to think that this is something domain-specific or does it arise out of the interaction of two or three very general constraints? And finally, since none of us have the answers, what is the best way of putting our collective heads together to solve these problems?

    (PS I think it is best, as Jeff does, to think of the POS argument not as an argument -- with a definite conclusion -- but as a question: how on earth do we learn phenomenon X from data Y?)



    Replies
    1. Exactly right. If you can specify what the generalization function is that goes from (a) simple wh questions and (b) declaratives with reflexives to the distinctions that I show in the paper, then you will have answered the poverty of the stimulus problem for this phenomenon. My answer is that one way of writing that generalization function is to restrict it so that it must encode PISH. So, please, offer me an alternative. I take it that what you mean by asking "how can we tell what they are without studying the learning algorithms for these rich grammar classes?" is that it is possible to build a learning algorithm that will have PISH as a consequence without building PISH into the constraints on what a learning algorithm is. If that's the case, my invitation stands.

      And in answer to your final question "what's the best way of putting our collective heads together to solve some of these problems", I think the answer is obvious: we make specific proposals that solve specific problems. If someone finds a specific proposal unsatisfactory, then they offer an alternative and show that the alternative is more satisfactory and has at least as good empirical coverage. Just like in any other domain of scientific inquiry, analyses are disproven by better analyses.

    2. From my perspective -- and you may disagree -- your proposal is not specific enough as it does not contain a specification of some learning algorithm that could exploit the restricted hypothesis class of grammars. So I don't think it solves the problem of how we acquire e.g. reconstruction effects. In other words, I don't see how to restrict the generalisation function so that it encodes PISH.

      One specific problem is that it isn't clear to me whether PISH is vacuous or not. Is there an example of an E-language (a set of strings, or a set of string/meaning pairs) that is ruled out by PISH? I guess it is restrictive as a theory of I-language, but if it isn't restrictive as a claim about E-language then it doesn't help learnability.

    3. I'm not understanding your objection. If the only possible grammars are those in which PISH holds, then when learners build a representation for sentences like "Norbert wondered how proud of herself Ellen was", the only possible representation would be one in which the invisible residue of the subject is contained in the moved wh-phrase. A grammar that is ruled out by PISH is one in which that wasn't the case. For example, suppose subjects were generated in their surface positions. Then, the representation for the sentence above would not have a copy of the embedded subject in the wh-phrase and we would predict that the wh-predicates would show the same pattern as the wh-arguments.

      Similarly, if there were no PISH, then (assuming that the position of negation is fixed) we would not be able to generate the inverse scope interpretation of the "every horse didn't jump" sentences.

      So, clearly PISH is not vacuous.

    4. I'll add two conciliatory notes, though. First, I am quite certain that it is above my pay grade to actually make a machine do this. But I am also pretty sure that someone with your skills would be able to code this up. My impression from working with computational modelers is that if you have a clear enough idea, they can figure out a way to implement it. So, the implementation issue doesn't seem to me to be a problem, but that's speaking from a pretty ignorant vantage point.

      Second, while PISH predicts that subjects can reconstruct under negation (and hence get low scope), there are languages where a universal quantifier in subject position cannot take scope under negation (e.g., Chinese, Korean, and many others). I'm pretty sure that the correct analysis of those languages is not that they do not exhibit PISH, but rather that something else makes the inverse scope reading impossible. But this is a point for further investigation. Note, though, that if we were to take the view that the lack of inverse scope in some language is evidence for the lack of PISH in that language, then that would make the learning problem worse because now learners would have to learn PISH and there doesn't seem to be any data in the experience of the learner that would make that possible (as discussed in the post).

    5. Whether PISH is vacuous or not depends on a lot of other assumptions that are not spelled out -- perhaps because they are too obvious to you and to other readers of this blog. For example, do you assume that there is some fixed set of innate syntactic categories and many other restrictions on movement etc etc?

      Under some reasonable assumptions the class of languages with or without PISH is the same -- the class of languages defined by Minimalist Grammars/MCFGs etc. So adding PISH doesn't really solve anything unless there is a specified learning algorithm that can navigate the space of possible options in some way. For example it probably would be the case, even if the classes of languages are the same with and without PISH, that adding PISH changes the size of the grammar, making some size based learning algorithms more likely to generalize in one way rather than another. And that might be enough of an explanation. But I have no idea whether that is true or not, and establishing it would rely on a precise specification of the class of grammars that is not available at the moment.
      So I can see that there might be decent explanations that make use of an innate constraint like PISH, but just saying "PISH is innate" doesn't solve the learning problem and it does create a bunch of Darwin type problems.

    6. I'm not sure I understand what you're saying here at all, so forgive me if I respond in a way that doesn't make sense to you. But, I'm not making any assumptions at all about classes of grammars, generative capacity, etc. I'm saying that grammars with PISH are easily distinguishable from grammars without PISH. That is, if you hold everything else constant (whatever everything else happens to be), then it is simple to tell the difference between grammars incorporating PISH and those that do not (presumably this is true even if the generative capacity of these grammars is the same). Hence, not vacuous. For sure, you are right that saying "PISH is innate" doesn't solve the learning problem, if what you mean is explain everything about language acquisition. What it does do is explain a certain pattern of facts and put a very strong constraint on what an explicit learning model should be like (in this case, it is not at all obvious to me what the added value of making that explicit learning model is; where an explicit learning model is helpful is when you are trying to model the selection/construction of a particular grammar from the space of possible grammars). As for the Darwin problem, I think the strategy for approaching that ought to be to find some explicit solutions to particular instances of Plato's problem and then see how we can derive those explicit solutions from something that would have them as a consequence (which is what Norbert's version of the Minimalist agenda is). But as Norbert would say, if we derive PISH from something else, that doesn't make PISH any less explanatory or any less of a solution to Plato's problem. Solutions to Darwin's problem will encompass the solutions to Plato's problem that they are meant to derive. If they do not, then they are not really solutions. So, I'll leave it to the minimalist syntacticians to solve Darwin's problem. I'll just keep reminding us of what some instances of Plato's problem and their solutions look like so that they can keep their eye on the ball.

    7. Thanks for your patience, Jeff! My argument or observation rests on the distinction between restricting a set of grammars versus restricting the set of languages defined by those grammars. So for example, we can consider CFGs, which generate the class of CFLs, and we can consider the restricted class of CFGs with at most two symbols on the right hand side of a production (a bit like in Chomsky normal form); these define exactly the same class of languages as the unrestricted class. So as a restriction on the class of grammars this is not vacuous but as a restriction on the class of languages it is vacuous. So my query is whether PISH is vacuous in this sense. If it is, then it doesn't solve the learnability problem.
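      To make the grammar/language distinction concrete, here is a toy binarization (just an illustration; nothing hangs on the details):

```python
# Rewriting A -> X1 X2 ... Xn (n > 2) as a chain of binary rules leaves the
# generated string language untouched; only the shape of the grammar changes.

def binarize(rules):
    out, counter = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"{lhs}_{counter}"           # new nonterminal for the remainder
            out.append((lhs, [rhs[0], fresh]))
            lhs, rhs = fresh, rhs[1:]
        out.append((lhs, list(rhs)))
    return out

print(binarize([("S", ["NP", "VP", "PP"])]))
# [('S', ['NP', 'S_1']), ('S_1', ['VP', 'PP'])]
```

      So the binary-branching restriction is a genuine restriction on grammars but a vacuous one on the languages they generate.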

    8. Ok, but if the languages are form/meaning pairs, then clearly they define different languages.

    9. In Alex's context-free example, both grammar classes define the same sound-meaning pairings. (Assuming just a standard interpretation scheme using the lambda calculus.) The larger class contains the smaller one, and so clearly defines a superset of the sound-meaning pairings of the latter. To see that it defines exactly these, note that we can turn any grammar in the unrestricted class into one in the restricted form in a regular way (binarizing the RHSs of rules, adding new rules and non-terminals as need be, etc). If to each rule is associated a lambda term, this binarization process can be applied to the lambda terms as well.

      It is not obvious that the PISH restricts the definable sound-meaning pairs. (Or in comparison to what alternative analysis.) I would conjecture not.

    10. Let's suppose that PISH is vacuous in this sense (I also think that it probably is), then on its own claiming that PISH is an innate constraint on the class of *grammars* does not count as a solution to this particular POS problem.

      To move forward you either need to specify a learning algorithm of some sort, or to tighten up the ancillary assumptions a lot. But that seems problematic: the past attempts at this don't give one much cause for optimism.

    11. @Jeff: I think Alex is just saying that with a bit of creativity you could take a PISH grammar and make it do exactly the things you don't want. Or, put another way, when you say "holding everything else constant" it will probably depend a lot on what you're holding constant. I'm less imaginative but at the very least you would have to say what a predicate is and what a subject is and spell them out pretty nicely.

    12. Ok, so here's what I'm understanding at this moment. Sometimes one is surprised to learn that certain constraints do not restrict the weak generative capacity of a system. And, in my example, I'm assuming not just PISH but a whole lot of other stuff about how grammars can work. I get that and those seem like reasonable concerns. But, here's my question in a simpler form. If a sentential predicate travels with its subject, how can you generate a string where a reflexive within a predicate is NOT bound by the subject? You can't (by hypothesis). So, if the string "proud of himself" is always classified as a sentential predicate, it can never occur as a substring in a string whose interpretation has something other than the subject binding "himself". This is not a fancy grammatical argument and it does not seem to me to depend on much else. So, the only question you guys can be asking is whether it is fair to presume that "proud of himself" is recognized as a sentential predicate, hence subject to the PISH-related generalization. Is that it?

    13. Or to say it a different way, you're saying that a grammar in which "proud of himself" is not a sentential predicate and does not always travel with its subject can be made to behave like a grammar in which it is and it does.

      That is, what grammar in which PISH does not hold can make it so that "Norbert knows how proud of himself Ellen is" is bad?

      Can you show me how to do that (in a way that is still compositional)?

    14. @Jeff: I haven't had time to fully think through all the issues, but here's an informal sketch of a procedure that takes a grammar with PISH and translates it into one that is both strongly equivalent modulo PISH-traces and compositional (if by compositional you mean that the semantics is computed from the syntactic tree in a bottom-up fashion without look-ahead). The idea is actually very simple and just relies on the fact that you can replace certain instances of movement by base-merger while keeping the generated language and interpretations fixed:

      1) Define a function NoPISH that maps every derivation tree t to its PISH-free counterpart t' such that if phrase p is merged in a PISH-configuration in t and moves to SpecXP, p is directly merged in SpecXP in t'. Your grammar is specified by the set of derivation trees thus obtained.
      2) You already have a function SEM_PISH that computes the semantics for the PISH-trees. For the PISH-free counterparts, the semantic interpretation SEM_NoPISH is the composition of the inverse of NoPISH (the translation back into PISH-trees) and SEM_PISH.

      Without further restrictions on what grammars and PISH-configurations we consider, the PISH-free grammar might belong to a much more powerful class of grammars, and the same is true for SEM_NoPISH. But the case at hand is simple enough that this can be avoided, I think:

      1) MGs can be used to implement the PISH-based analysis you provide.
      2) Every MG's derivation tree language belongs to a specific class of tree languages, the regular tree languages.
      3) The function NoPISH preserves regularity as long as the predicate from which the phrase is moved does not move on its own after the phrase has been extracted (remnant movement). That condition is satisfied in your examples.
      4) MG derivation tree languages are closed under intersection with regular tree languages, so we can construct an MG lexicon with all the needed lexical items and intersect the grammar's derivation tree language with the output of NoPISH to obtain the desired PISH-free MG.
      5) The mapping from derivation trees to LF-trees is definable in monadic second-order logic (MSO).
      6) The inverse of an MSO-mapping is an MSO-mapping.
      7) MSO-mappings are closed under composition.

      The shakiest property is point 5, which depends a lot on what you want LF-trees to look like, in particular with respect to QR. But that is independent of whether your grammar uses PISH.

      Also keep in mind that this procedure actually achieves a lot more than what you asked for because it also keeps the generated trees almost exactly the same. If it is just about grammaticality and preserving the sound-meaning mappings, then things become a lot easier.

    15. Correction: (2c) involves remnant movement, contrary to my earlier claim in 3), but since it is extremely local it still does not pose a problem.

    16. @Thomas: This is also your recipe for dealing with (a case of) late operations (?); it looks reasonable to me!

      This is commensurate with Alex's original observation that the PISH may very well (despite being a normal form) change important properties of the allowable grammars; in your construction sketch, the unPISHed grammar can be much bigger than the original.

      What is being given up in the move to noPISH is also the general idea about how binding conditions should be stated, although it may be the case that (SEM_PISH . NoPISH^{-1}) can be deforested into something elegant.

    17. Of course PISH is vacuous in the sense that you can get rid of it if you also change a bunch of other stuff and have no regard for the simplicity/elegance of the resulting system. Pretty much any substantive universal will be vacuous by that criterion. I think that you guys are not objecting to PISH as such but to any attempt to assign an explanatory role to substantive universals in language acquisition. Or perhaps not, in which case I'd be interested to see an example of a non-vacuous (in your sense) substantive universal.

    18. @Greg: The idea is similar, yes, except that the Late Merge argument turns Merge into a particular kind of movement, and this one turns movement into Merge.

      @AlexD: I don't think anybody's directly objecting to PISH but rather trying to figure out how the individual assumptions interact. You're right that the grammar above is very inelegant; Greg's remark about binding theory also homes in on this. But as I mentioned in my other post further down, not everybody cares about elegance quite as much.

      Jeff's post starts out lamenting that certain groups of researchers do not appreciate POS arguments. Well here's one reason why: they see a viable alternative (or at least it looks like one to them) that's still good enough for their purposes and only loses those lofty properties we linguists care about, i.e. elegance, succinctness, and generality with respect to specific constructions. Some may even view that shortcoming as a virtue because "biological systems are messy and redundant". Others might just not care about the specific structural descriptions all that much because a state-of-the-art learning algorithm will automatically infer structures that work well enough for their narrowly restricted purposes (machine translation, data mining, etc.). We may not like that, we may consider them short-sighted, we may be bewildered by their pragmatism, but a more useful reaction is to evaluate how much of the argument can be stated without the linguistic backdrop.

      As for your second point about the importance of substantive universals, here's one that would have strong effects: the set of movement features consists of wh, case, and top. Now MGs can only generate a proper subclass of the MCFLs. Restrictions of this kind are hugely important for learnability. For example, the class of strictly local string languages is not learnable in the limit, but the subclass of grammars where the size of the locality domain is bounded by some k is learnable.
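      To illustrate that last contrast with a toy sketch (mine; the actual results are from the subregular learning literature): once the bound k = 2 is fixed in advance, the learner can simply memorize the attested bigrams.

```python
# Toy strictly-2-local (SL_2) learner: the "grammar" is just the set of bigrams
# attested in the positive data, with '#' marking word edges. With k fixed in
# advance this converges on any SL_2 language from positive data alone; with k
# unbounded, no such guarantee exists.

def learn_sl2(positive_examples):
    grammar = set()
    for s in positive_examples:
        padded = ["#"] + list(s) + ["#"]
        grammar.update(zip(padded, padded[1:]))   # collect attested bigrams
    return grammar

def accepts(grammar, s):
    padded = ["#"] + list(s) + ["#"]
    return all(bigram in grammar for bigram in zip(padded, padded[1:]))

g = learn_sl2(["ab", "abab"])
print(accepts(g, "ababab"))   # True: the learner generalizes beyond its data
print(accepts(g, "ba"))       # False
```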

    19. @Thomas: Do you see a viable alternative to Jeff’s account? I’m sure we all agree that PISH as such need not be involved (there are many possible ways of rejigging the grammar), but what would be a substantially different alternative analysis that would cast doubt on the existence of a similarly rich innate structure? I don’t see how giving up the lofty goals you mention is going to make it easier to solve the problem in this case.

      Regarding your suggestion for a non-vacuous substantive universal, the one you suggest only barely counts as substantive. The labels in themselves are meaningless, so all you are really proposing is a restriction to three movement features. I’m not sure if that falls clearly into the formal or substantive category, but in any case, it’s not all that similar to the sorts of formal universals that linguists typically propose.

    20. *to the sorts of substantive universals

    21. @Alex D: I knew you would class that as a formal example :)

      I'd be inclined to agree, but at the same time the prototypical example of a substantive universal is the restriction to specific categories (V, N, ...). So the same should go for movement features. One might say that the example only works because the substantive universal implies a formal one, but what kind of substantive universal doesn't? Phases being tied to C, v, and D still implies the concept of phases as a formal universal. Even PISH implies something universal about the grammar, namely that there is a mechanism for identifying subjects of predicates.

      Maybe the argument is that these also restrict the grammar beyond the proposed formal requirement? That wouldn't invalidate my argument above, though: if there are only specific types of movement features that can only be hosted by specific heads, that also rules out certain kinds of movement that would work with the basic restriction to n movement features. So overall, I share your feeling that the example above is very sneaky, but I can't think of a principled reason that would disqualify it.

      Turning to the issue of alternatives, I addressed that in my little thought experiment below. It doesn't matter whether there actually is a viable alternative, the issue is that many people believe they have an alternative and wouldn't be convinced by Jeff's argument because they can cast the whole argument into doubt by rejecting one of its premisses.

    22. As far as I know, the formal/substantive distinction is not precisely defined, so I have no objection to classifying the restriction to wh foc and top as formal. The thing is I'm having a hard time thinking of a substantive universal (that has actually been proposed independently by linguists) that meets your criteria for non-vacuity. In other words, I think the objection that you and others are making to Jeff's use of PISH is not specific to PISH but would apply to any other extant hypothesized substantive universal. Now you may be right to raise such a wide-ranging objection, but I think the wide range of the objection is worth noting.

    23. Yes, I agree. But doesn't that worry people? If they are all vacuous (in this sense) then the evidential base for them must be entirely based on meta-theoretic criteria like simplicity and elegance (as Thomas notes). And why should we think that the real grammars are simple or elegant? The methodological reasons don't apply, as these are natural biological objects, not scientific theories. Is the genome simple or elegant? It has 4 bases rather than 2.
      I worry that the metatheoretical criteria that syntacticians use to construct theories are not going to point to the psychologically real grammars.

    24. @Alex C: I'm not sure if the issue of simplicity and elegance is all that relevant to the POS. If it's actually PISH's ugly sibling that's doing the work, the POS argument still goes through mutatis mutandis. Inevitably, the evidence for principle X of the grammar being innate can be at most as strong as the evidence for principle X indeed being a principle of the grammar. Now if we are just going to be skeptics about every grammatical principle that generative syntacticians (think they) have discovered over the past 60 years, I don't know where to go from there.

    25. I don't understand that. If PISH is vacuous, then Jeff's proposed solution is not a solution. Because there are grammars in the class that will generalise in the wrong way from the data in question.

      I think there are two reasonable ways to answer this problem: one (which we can call the P & P way) is to say, PISH is not vacuous if we make additional assumptions X, Y and Z, and given these assumptions there are just no grammars in the class of possible grammars that can get the facts wrong in the relevant way, and therefore this is a valid explanation even without specifying a learning algorithm.

      The other way (the Aspects way) is to say: sure, PISH is vacuous, but the grammars without PISH are much larger (or rated worse on this evaluation metric) than the ones with PISH and so, given a learning algorithm that uses this particular evaluation metric we can still explain how the learner generalises in this way rather than the other.

  7. An aside that has nothing to do with the POS: the argument for the PISH based on the conjoinability of active and passive VPs is cute, but do we have any independent evidence that A-movement is subject to the CSC? (I've been wondering about this for a long time and this is the closest thing I've found to an appropriate time and place to ask about it. Hope it doesn't derail things.)

    Replies
    1. Consider:
      (1) It seems to be likely that Jeff is tall and that Norbert is handsome
      This cannot mean that it seems to be likely that Jeff is tall and it seems that Norbert is handsome. Thus, the second conjunct must be interpreted as the complement of 'likely' rather than that of 'seem.' Why? Well, the CSC would explain why, if A-movement were subject to it.
      Note that (2) seems fine:
      (2) John expected Bill to win the Preakness and that Frank would win the Kentucky Derby.
      If this is so, then the Vs can take both finite and non-finite complements, thus pre-empting one possible objection to the cases in (1).
      How's this?

    2. One of the first papers to take this up that I know of is:

      Vivian Lin. "A way to undo A-movement". WCCFL 20 Proceedings, ed. K. Megerdoomian and L. A. Bar-el, pp. 358–371. Somerville, MA: Cascadilla Press.

      https://web.archive.org/web/20100613180814/http://ling.wisc.edu/vlin/Downloads/LinAMove.pdf

      -- a very interesting paper too. (She developed the work a bit more in her dissertation: http://dspace.mit.edu/handle/1721.1/8151)

    3. Well, first, on certain raising-to-object-type assumptions the fact that (2) is OK looks like evidence that A-movement is *not* subject to the CSC, doesn't it?

      But let's just take it as given that you can conjoin finite and non-finite complements. What (1) tests, then, is whether you can have an expletive raising out of the first (non-finite) conjunct, and no raising out of the second (finite) conjunct. I don't have a clear judgement on (1), but can we make it simpler by replacing "to be likely that Jeff is tall" with "to rain"?
      (3) It seems to rain and that Norbert is handsome.
      This actually seems not too bad, at least to me. Certainly not as bad as a usual CSC violation. Hmmm.

  8. Back on the topic at hand (sorry, Tim) ...

    Norbert is right (see above) that I care more about the learning problems where there is cross-language variation. Because those are the ones where we have some work to do. (Folks need jobs, you know.) The ones that can be solved simply by appealing to innate constraints interest me less, because they shift the burden to the evolution problem ("Darwin's Problem"), and I have sworn to not attempt to say anything about evolution until I'm at least 55.

    [Terminological aside: Norbert should distinguish MT (his comment) from ML (Chris's comment). MT is what Google struggles with when they give me garbled translations from Japanese. ML is what Google struggles with when they send Norbert and me ads for hair care products.]

    Norbert's spirited reaction is out of proportion to what Chris or I said above. I was not objecting to Jeff's challenge, and certainly not complaining that it should be dismissed because it depends on theory-internal constructs. I don't think Chris was either. [In fact, if I'm not mistaking who Chris is, the one co-authored study featuring Chris, Jeff, and me is one that tests in adults exactly the reconstruction paradigm that Jeff discusses here for children (Omaki et al. 2007).] What I take Chris to be saying is not "meh, ML can solve this", but rather "sure, successful learners require a lot of detailed stuff; MLers don't even regard that as controversial." My point, inspired by Chris, is that POS arguments tend to be shorthand for model comparison arguments. When we say, you can't (directly) learn X from the available evidence, we're implicitly assuming a certain kind of superficial encoding and learning model. And when we say, "so you need to assume additional structure in the learner", we're effectively saying that there's an alternative encoding and learning model that makes the observed outcome straightforward. This is not trivializing Jeff's challenge. It's placing it in the context of difficult problems that learning theorists tackle all the time.

    And MLers are generally motivated by getting the job done, and don't mind building detailed stuff into the learner if it will achieve the desired result. Their models will adopt as much innate structure as you want, if it will ensure that the hair product ads go to Jeff rather than to Norbert and me.

    Replies
    1. Colin,
      I'm pretty sure that you didn't intend to endorse the following three presuppositions of your first paragraph. But I'd like to disagree with them anyway.

      (1) The classic POS arguments that address invariant constraints are also "learning problems." Perhaps some universals are universally learned. But I see no reason for thinking that the usual suspects are learned, unless we stipulate that all knowledge acquisition counts as learning, so long as experience was somehow causally relevant.

      (2) We, whoever we are, don't have work to do with regard to invariant constraints--at least not work that would count as, you know, a job--because those "learning problems" can be "solved simply by appealing to innate constraints."

      (3) Recognizing that innately determined constraints are innately determined "shifts the burden" to discussions of evolution--inviting just-so stories--as opposed to attempts to reduce the diverse constraints to a more principled basis, partly in an attempt to see if the constraints can be described as reflections of deeper architectural constraints, as in evo-devo (as opposed to neo-Darwinian) biology.

      Some of us, even some of us not yet 55, still think that the classic POS arguments--even the bad old ones involving aux inversion--provide a wonderful set of explananda for anyone trying to figure out which theoretical vocabulary is best for purposes of describing the particular I-languages that are among the humanly acquirable I-languages. I have nothing against attempts to ask how kids who grow up surrounded by people who speak idiolects of English end up acquiring idiolects of English. Some of my best friends do that kind of thing for a living. But if we all agree that addressing specific acquisition questions requires substantive assumptions about the vocabulary that kids use to formulate grammars and characterize the "data of experience," then I think it's super important to NOT downplay the POS arguments that address invariant constraints. Until we know how to formulate grammars in a way that makes the relevant space of options available--without also making a much larger space available--we're not going to formulate grammars in the ways that kids do in response to experience.

    2. First, thx: I did mean ML. I am T/L colorblind.
      Second, ah "context"! Great, I love contextualizers. If I understand you correctly, what you are saying is that there is no reason for Chris's avowed skepticism concerning POS arguments, as these are just part and parcel of what they always do. So the worry that one cannot deal with POS problems without solving all of linguistic theory is just the same problem that any ML project has: after all, you cannot piecemeal solve any ML problem until you have solved them all. Does anyone really think this? In any domain? It's all or nothing? You are beginning to sound like Alex C here.

      POS problems are interesting, for they manage to carve out problems in manageable bits. You can address a circumscribed (relatively speaking) problem and deal with it. What needs building in to address binding? Given the results of this, what needs to be built in to solve PISH-like data? The utility of POS is that it can be used to progressively deepen our understanding of FL, as Jeff has demonstrated in one example. But as we all know, there are endlessly many of these around.

      As for your preference in problems: there is no accounting for taste. But I personally believe that addressing YOUR problems will require solving some of mine. The principles that Jeff is hunting look to be invariant. They form a syntactic scaffolding for the ones that you are interested in. Take your that-t effect research with Dustin. It assumes that fixed subject effects are invariant and then hunts for ways to circumscribe these in Rizzi-ish ways. That is a well-known recipe, based on invariant UG principles which POS considerations helped discover.

      Let's end with some pablum: POS arguments are empirical, subject to all the vagaries thereof. ML models are as well. I'm glad that MLers are not put off by sparse/absent data issues. I just wish they would address problems that I care about rather than worrying about which hair products I ought to buy.

  3. I think that separating out learning problems into those where there is cross-linguistic variation and those where there isn't is a false distinction that presupposes some analyses. Learning problems that involve cross-linguistic variation are problems where we already have some understanding of the constraints on possible grammars. In many cases (as Norbert notes) the discovery of the constraints on possible grammars grows out of the initial discovery that there is a POS problem. Solutions to POS problems can be fundamentally "principle" based, i.e., the postulation of an invariant. Or they can be "parameter" based, i.e., the postulation of constrained variation. When you're in the domain of constrained variation, then the learning problems are better described as "Opacity of the Stimulus" problems, in that one has to discover which out of a constrained space of grammars is the actual grammar, which is nontrivial and certainly worth working on. But to be interested in Opacity of the Stimulus over Poverty of the Stimulus just means that you'd rather work on problems where some of the solutions to the POS have been figured out.

    4. Norbert,
      I’m still puzzled at your reaction. I think you must be seeing something in what I wrote that I did not intend. I think that carving out model problems of the kind that Jeff raises is great, and that making the problem and the solution as explicit as possible is helpful. I guess I’ll leave it at that.

      Paul,
      thanks for raising those issues and thereby giving me a chance to clarify. I owe an apology to folks whose labors I may have trivialized.

      #1: That’s right. I don’t think that the invariances are learned, and I make a fuss about this in Psycholinguistics II each spring. Any account that assumes that the invariances are always learned needs to explain why learning doesn’t sometimes fail. I don’t know of particularly good answers to that challenge.

      #2: Yes, I certainly don’t think that identifying invariances is trivial or worthless. (I’m not sure that I presupposed that either. Asserting that “it’ll take a lot of work to figure out Y” doesn’t entail that “there’s no (useful) work in figuring out X” … though I see that is a plausible inference.) I do think that the problems tend to invite different kinds of work.

      #3: Yes and no. You’re right that reducing diverse invariances to deeper underlying constraints is valuable and interesting. At its best, this kind of work is extremely important. But in practice it is easy to slide into just-so stories, and the good stuff can be hard to sort out from the not-so-good stuff. (The presupposition that you’re reacting to here is one that Norbert introduced into the thread.)

      As for the value of classic arguments involving subject-aux inversion (SAI) and the like. I’m afraid I do believe that they haven’t been terribly successful. Despite the efforts of many smart folks who I hold in high regard, colleagues included. What was introduced as a simple model case seems to have been interpreted as a grand challenge. This has apparently led some to assume that if they can solve SAI, then a solution to the entire language learning problem is just around the corner (it’s not, of course). It has shone a light on a domain that practicing linguists tend to regard as so straightforward as to be uninteresting. So it has drawn learning theorists' attention away from the harder problems. And it has placed the fight in a domain that Joe Linguist just doesn’t get too exercised about, because it doesn’t touch on his day-to-day concerns. That’s why I find Jeff’s challenge more promising.

    5. And then there’s the question of taste. It’s true that I personally find the learning problems involving cross-language variation more interesting, much as I acknowledge the importance of the other problems (and I agree with Jeff that they’re not as different as this discussion implies). But this thread has pushed me to think about why I like the problems of variation more. Here’s a try.

      a. Degree of confidence in the problem. When we take a series of invariances and try to reduce them to deeper principles, there might be a great unifying solution, or there might not. The world might turn out to be kind of messy. In contrast, when I encounter hard-to-observe properties that vary across languages (or idiolects), then I feel pretty confident that there’s an interesting solution, even if I don’t yet have a clue about what it is. Can constraints on movement and anaphora ultimately be reduced to the same thing? Maybe, maybe not. Even when looking at concrete proposals, it’s hard to assess their success. If Japanese and English show different possibilities for scope inversion, is there some way that learners figure this out from their experience? I’m pretty confident that there is, and I can fairly easily tell whether a solution works.

      b. Productive surprises. Across different domains, I tend to get exercised by surprising contrasts. Perhaps because I suspect that understanding them is likely to be particularly informative. The observation that the same experiments were eliciting brain activity in quite different regions, depending on whether you’re measuring using fMRI or MEG, was quite surprising, and it turned out to be hugely informative (Lau et al. 2008). The observation that people systematically screw up in computing subject-verb agreement but seem not to do the same thing when linking reflexives to subjects was a surprise, but it turned out to be the entry point into a goldmine (Dillon et al. 2013, and others). Cross-language variation in scope and island effects appeals to me for similar reasons.

      c. I’ve got to admit, I have “languistic” tendencies, to use Norbert’s term of endearment. Interest in the learning of variant properties harnesses those dangerous tendencies.

      d. Related to (c), one of my biggest disappointments is the decline of interest from linguists in learning problems, which has been accompanied by a loss of focus on understanding constraints on cross-language variation. (Yes, there’s a lot of work on the topic, but it’s hard to identify what would count as real surprises.) My hunch is that reconnecting these two problems would be mutually highly beneficial, and would help to provide more focus for the study of language variation. My sense is that the study of invariances hasn’t encountered similar challenges.

      So it’s a matter of taste. But also a matter of placing bets on where to dig for gold.

    6. Ewan:
      How much of this is driven by the ceteris paribus clause that all non-math work in the sciences relies on? Yes, there is more than PISH. There is also the binding theory and the theory of phrase structure, to name two kinds of assumptions that are being held constant. Is the claim simply that, were we to change these, we could derive the *cases even given PISH? Is that the problem? You know why I'm asking, right? Because if that is the problem, this is something that goes way beyond the infant domain of linguistics. Gravity does not "explain" elliptical orbits or how balls roll down inclined planes or the tides without a lot of ceteris being held paribus. In fact, absent a final theory, NOTHING goes forward without such assumptions. Is this really what the criticism amounts to? If so, might one not conclude that such criticism is, ahem, both uninteresting and very unhelpful?

      The way we try to explore issues, especially in the special sciences, is by comparing purported explanations of effects. Jeff provided one "sketch" that can be made more precise as required. The way to show it is on the wrong track is to provide another story, not the promissory note for one, but a particular one. We can then look to see how (or if) they fundamentally differ or if one is better than another on empirical or other grounds. That is what neither Alex C nor Greg K has yet provided. In my experience, the repeated claims that there are many different ways of explaining the "facts" have generally turned out to be incorrect. Most stories look pretty much alike, despite superficial-looking notational differences. So, coming out with an alternative is useful.

      Last point: POS arguments are NEVER dispositive. But this is just to say that science outside math does not truck in proofs. That's what makes it an empirical science, a place where ceteris is always extremely paribus.

      I completely agree, Norbert, that it depends on some ancillary assumptions, as everything does. I am asking what those ancillary assumptions are, because on one standard set of assumptions (i.e. Stablerian MGs) the result doesn't go through (I think ... I am not sure). A physicist, if you ask him/her about elliptical orbits, can give you the list of ancillary assumptions he/she needs -- point masses, inverse square law, Newton's laws of motion, err, no other forces, and that's it. They aren't a closely guarded secret only released on a need-to-know basis to those people who are putting forward other alternative theories.

      re Newton: these are necessary, but even given these assumptions, the conclusion need not follow. It depends what else one is excluding. Say, absence of large masses within the vicinity of the orbit, or space not being homogeneous (flat), etc. There are an unbounded number of such CP clauses that are taken as background until being shown to be possibly relevant. So, that's your job: show how to derive these results given PISH and other assumptions. I should add that you will be judged on how plausible these assumptions are.

      But to reply more concretely: we assume a standard kind of X'-phrase structure, and the binding theory. Jeff is taking as stable a kind of GBish theory: Binding at SS, X'-PS rules, WH movement to Spec CP. This still does not suffice to distinguish the two cases he discusses. Nor, given all of this plus the available PLD, is there any way of distinguishing them. Add PISH and things work out. Furthermore, there is independent evidence that PISH is a good idea. Lastly, there are some reasons for thinking that PISH is derivable from more basic assumptions (see Understanding Minimalism for a discussion).

      So that's the sketch of the argument. Now, given my views, if this argument cannot be reconstructed within MG, that's a problem as much for MG as it is for PISH. However, given that MG says nothing about these cases, it's hard to know what to say. There are versions of Minimalist Program theories that do have things to say consistent with Jeff's PISH argument (e.g. see A Theory of Syntax) but MGs (at least Stabler's discussions) do not deal with these kinds of data (though Kobele's thesis does go some way toward translating these results into MG formats). So there are, I would hazard, ways of doing this MGishly.

      But, and here I end, there is nothing sacred about MGs. They are attempts to formalize theories with certain properties. If they fail to do so, we can ask if the MG is a good formalization of these theories or not, OR we can ask whether we should re-analyze the data. Making the assumptions relatively explicit is not very hard. I've done so above. If you think that there is some MG or GBish grammar that, given PISH, still gets the "bad" cases, please feel free to provide it for inspection. We can then see if the claims are correct.

    9. Colin,

      Thanks for the clarifications. I’d like to offer another perspective, though, on SAI and other simple illustrations of constraints that human I-languages respect. You say, “As for the value of classic arguments involving subject-aux inversion (SAI) and the like. I’m afraid I do believe that they haven’t been terribly successful.” I think it depends on what counts as success.

      Suppose that indeed, “What was introduced as a simple model case seems to have been interpreted as a grand challenge. This has apparently led some to assume that if they can solve SAI, then a solution to the entire language learning problem is just around the corner...” I don’t think—and don't think you think—that the value of an argument lies with how people react to it, especially not if the reaction involves misunderstanding.

      I think the classic POS arguments were quite successful as models of how to focus on relatively clear phenomena—concerning how strings of words cannot be understood—that call for explanation. If such phenomena could be explained as side effects (spandrels, whatevers) of how kids respond to experience, I would find that very interesting, in part because I suspect the explanation will take a more Rationalistic form: blame the cognitive architecture that makes learnable hypotheses available, as opposed to processes of using the architecture to learn. Put another way, if the relevant constraints are learned, then POS arguments—of the sort I like—fail to identify aspects of knowledge that are not due to learning; in which case, I’d need to rethink a lot. (Put yet another way: while I’m unimpressed by the skeptical point that POS arguments might fail, I’d be very impressed by a plausible account of how the constraints in question are learned. For that would make me think that POS arguments do fail, not just in the sense of failing to convince, but as tools for getting at what they are supposed to get at.) However, if the constraints are not learned, then the classic POS arguments are interesting/successful as illustrations of how to identify explananda that call for a “non-learning” explanation.

      You speak of shining a light “on a domain that practicing linguists tend to regard as so straightforward as to be uninteresting.” I’m not sure who counts as practicing linguists, or how to assess their various tendencies. But if many who know the basic facts find the phenomena/constraints “so straightforward as to be uninteresting”—rather than (relatively) straightforward and very interesting—that’s just sad. In other domains of inquiry, finding generalizations that seem about right is cause for celebration, in part because we can then ask why the world works that way.

      For many purposes, physicists can take the law of gravity as given and get on with other questions. Still, it was good to be puzzled about how gravity could “act” at a distance, and why it is described with an inverse square law. (Trying to unify forces was also fruitful.) I agree that “learning theorists”—are they the practicing linguists?—can bracket questions about where invariant constraints come from, for purposes of addressing questions about how kids navigate the space of human I-languages given experience. But that which learning theorists presuppose can still be an important explanandum. It's not all about learning.

      I wouldn't recommend unlearned constraints as a topic for those who want to study what kids learn; though someone who wants to know what kids learn might care about the constraints. As for what Joe Linguist gets exercised about, I’d say that Joe ought to be puzzled about the source of unlearned universal principles: if Joe posits them blithely, leaving it to others to explain where they come from, then Joe should buy a beer for those who do the work invited by Joe’s posits. As many others have noted, POS arguments are basically tools for figuring out which hard work lies with coming up with the right learning algorithms, and which hard work lies elsewhere. But I agree that useful tools can be misused.

  9. Jeff asks if there is evidence that supports PISH that can lead to its acquisition. He goes through three pieces of evidence. Wouldn't a fourth be existential sentences with 'there', as in 'there is a dog in our backyard', with the subject occurring after the finite verb? I imagine that such 'there' sentences can be found in speech to children.

    Replies
      1. Nice point. But note, these are pretty poor with adjectival predicates: *there is a dog angry at Bill, sad about the loss of the Bruins. So the ECs are fine with PP predicates and verbal ones but not adjectives. The absence of these in the PLD, given some basic Bayesian magic, could lead the LAD to conclude that PISH does not hold for adjectival predicates, hence leading to the wrong conclusion wrt Jeff's data.

  10. Kratzer and Diesing showed, though, that the contrast is not really one of grammatical category but of the stage/individual distinction. Thus, we have the grammatical 'there are firemen available' or even 'there are stars visible tonight' but not '*there are firemen intelligent'.

    We could even imagine that the child, once he or she hears the EC, generalizes to all cases. So one shot of hearing an EC is enough to generalize to all categories, even if we find that there are no cases in the speech to young children of EC constructions to children. Then some other principle of the grammar rules out 'there are firemen intelligent'.

    I'm not arguing so much against POS in general as against the idea that the PISH is something that there is little/no evidence for in the primary data. That said, I haven't looked. But these ECs, even if they are mostly found with a certain category in the speech to children, do give something that supports entertaining that hypothesis. How the child takes it from there, though, is another story.

    Replies
    1. Sorry, I meant 'even if we find that there are no cases in the speech to young children of EC constructions with adjectives'.

    2. Reasonable point. If I recall correctly I argued for using ECs as evidence for VP internal subjects (not so called then) in my first paper in 1977. God that's long ago. At any rate, the HK observation goes back to Milsark's original thesis on ECs. The point you make is reasonable, though there still may be room for POS concerns depending on how one generalizes from the PLD. After all, in the cases Jeff cites, the predicates cannot appear in ECs.

      That said, let me add one more point. POS is ONE argument form. Even if Jeff is right about the problem, it does not mean that PISH is the solution. Others have provided different accounts (e.g. Heycock via reconstruction). I do not mean to suggest that you think otherwise. But others might. POS, even when successful, is only ONE argument for a proposal. Other proposals might also meet POS demands, and even proposals that do not meet them might have many other things going for them. But having one useful argument form is nothing to sneeze at.

  11. Part of the discussion here has centered on a question of the form, "is PISH vacuous?" -- itself understood as something of the form "does PISH expand/restrict the set of sound-meaning pairs available without PISH?"

    [Though note that, in most of the comments, it is presupposed that if PISH does anything, it will be on the restrict side, not the expand side.]

    So, as several commentators have pointed out, that question cannot be answered without saying "expand/restrict in comparison to what." In that vein, I'd like to make two comments:

    (i) PISH restricts the set of possible sound-meaning pairs even in comparison to a grammar where binders, operators, and quantifiers take scope strictly according to their surface positions. Forgetting for a moment that such a grammar would be a nonstarter for other reasons (e.g. Jeff's (3a)), such a grammar predicts that (2d) would be a fine sound-meaning pairing, and PISH eliminates that possibility (due to insisting on a predicate-internal binder, that 'himself' would then pick up as its closest antecedent; yes, that is an ancillary assumption, but show me a theory of 'himself' that doesn't assume something like this). So: restricts.

    (ii) PISH restricts the set of possible sound-meaning pairs even in comparison to a grammar where binders, operators, and quantifiers can take scope wherever they please. Forgetting for a moment that such a grammar, too, would be a nonstarter (see, e.g., Jeff's (1b,d), (3c)), such a grammar predicts that (2d) would also have a reading where 'Norbert' and 'Ellen' each chose, on a whim, to take scope in their surface positions. Meaning there exists a derivation that predicts (2d) to be a licit sound-meaning pairing, and PISH eliminates this possibility (in the same manner described in (i)). So: restricts.

    Would one of the people (Alex C, Greg K) with the "hunch" that PISH does not affect the set of licit sound-meaning pairs -- and is therefore vacuous -- please explain what the X is such that "PISH does not expand/restrict the set of sound-meaning pairs made available by the PISH-less X"?

    Thanks in advance.

    Replies
    1. X is a set of grammars, not an individual grammar. So if we have a particular grammar G, and we add PISH or switch it off, then this will change the set of sound/meaning pairs defined. But if we have a set of grammars SG, then the set of languages defined may not be changed, because of the high degree of flexibility in the grammar formalisms. Indeed there are, under standard assumptions, infinitely many different grammars that all define the same set of sound-meaning pairs as each other (say, those of English). And so ruling out one of these grammars is not enough -- even if it is your favourite grammar -- you need to rule out all of the other infinitely many grammars that may or may not use PISH. That turns out to be really hard.
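
      To make this concrete with a toy example (invented here just for illustration, in Python): the two little grammars below have visibly different rule sets but derive exactly the same strings, so data about the strings alone cannot tell them apart. Now imagine infinitely many such variants, with and without PISH.

      # Toy illustration: two different grammars, same string set.
      def derive(grammar, start, max_len):
          """Enumerate all terminal strings of length <= max_len that `grammar` derives."""
          strings = set()
          agenda = [(start,)]
          while agenda:
              form = agenda.pop()
              if len(form) > max_len:          # prune: safe here because no rule shrinks the form
                  continue
              i = next((k for k, sym in enumerate(form) if sym in grammar), None)
              if i is None:                    # no nonterminals left: a derived string
                  strings.add(" ".join(form))
                  continue
              for rhs in grammar[form[i]]:     # expand the leftmost nonterminal
                  agenda.append(form[:i] + rhs + form[i + 1:])
          return strings

      # Right-recursive vs. left-recursive grammars for "one or more a's".
      G1 = {"S": [("a",), ("a", "S")]}
      G2 = {"S": [("a",), ("S", "a")]}
      print(derive(G1, "S", 6) == derive(G2, "S", 6))   # True: same strings, different grammars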

    2. This comment has been removed by the author.

    3. Alex and Greg, this discussion is scattered across three threads now, but because I want to see how this works I'll prime the pump here. The thing Jeff says is:

      - if the string "proud of himself" is always classified as a sentential predicate [SP], it can never occur [with an interpretation with] other than the subject binding "himself" [because binding is always to the local subject]

      background,
      - there is something called an SP
      - it always has a single unique subject S
      - what S can be (in GB-ish terms) is either:
      - the thing internal to the SP in the specifier position, let's call that position B (i.e., the argument that himself isn't)
      - things that form a chain with B
      - and all this works equally for a reconstructed SP

      You say you couldn't translate these postulates into MGs in such a way that it would actually rule anything out. I say: show me why. I'm as surprised as everyone else, but I'll believe you if you show me.

      Here are the examples broken down (I hope I haven't made a mistake here, Jeff keeps implying the picture NP is also an SP, and I don't know enough about syntax to know if that's true, all I remember is people go back and forth about binding into picture NPs - correct me if I'm wrong)

      (1) a/b. "picture of herself" is an SP, Ellen is S (thus Norbert is not S) because Ellen forms a chain with B(SP)
      (1) c/d. "proud of herself" is an SP, Ellen is S (thus Norbert is not S) because Ellen forms a chain with B(SP)

      (2) a/b. "picture of herself" is an SP; that SP can be wh-reconstructed to painted _; Ellen is S (thus Norbert is not S) because Ellen forms a chain with B in the reconstructed SP
      (2) c/d. "proud of herself" is an SP; that SP can be wh-reconstructed to painted _; Ellen is S (thus Norbert is not S) because Ellen forms a chain with B in the reconstructed SP

      You will need some ancillary assumptions to make sure only Ellen and not Norbert can form a chain with B in both cases. In both, if I remember my syntax right, movement to the Norbert position would be too far for A movement, and I suppose it would also be bad because it's out of a specifier if it were out of the wh-raised SP in (2).

      If it comes down to these ancillary assumptions then a professional can help fill them in better than me. However I don't imagine it's these basic constraints on movement that are the problem. So go slowly and tell us which item (or interaction of items) looks hard to encode in an MG. Everything I see here rules out the offending sound-meaning pairs.

    4. Ewan, stick to phonology! "picture of herself" is not a sentential predicate and "proud of herself" is. That's pretty much the whole story. Because "picture of herself" is not a sentential predicate, then when "which pic of herself" is fronted in the embedded clause, it can be interpreted relative to the embedded subject or the matrix subject (allowing either to be the antecedent of the reflexive). "proud of herself" is a sentential predicate, so no matter where "how proud of herself" occurs, it must be related to the subject of that predicate.
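
      If it helps, here is that logic as a toy sketch (my own schematic Python encoding, invented just for this comment; the gender lexicon and the sentential-predicate flag are simply stipulated):

      GENDER = {"Ellen": "fem", "Norbert": "masc"}

      def antecedents(fronted_is_sentential_predicate, predicate_subject, matrix_subject):
          """Which subjects can antecede a reflexive inside the fronted phrase?"""
          if fronted_is_sentential_predicate:
              # A sentential predicate is interpreted relative to its own subject,
              # no matter where it surfaces.
              return {predicate_subject}
          # A fronted picture-NP can be interpreted relative to either clause.
          return {predicate_subject, matrix_subject}

      def ok(reflexive_gender, is_sp, predicate_subject="Ellen", matrix_subject="Norbert"):
          return any(GENDER[a] == reflexive_gender
                     for a in antecedents(is_sp, predicate_subject, matrix_subject))

      # Schematically: fronted "which picture of X-self" vs. fronted "how proud of X-self".
      print(ok("fem",  is_sp=False))   # True  -- herself, picture-NP
      print(ok("masc", is_sp=False))   # True  -- himself, picture-NP
      print(ok("fem",  is_sp=True))    # True  -- herself, sentential predicate
      print(ok("masc", is_sp=True))    # False -- himself, sentential predicate: the bad case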

    5. @AlexC: So wait, the argument is "PISH is vacuous because I can think of some (potentially infinite) class of grammars SG such that the effect of PISH on a particular grammar G does not move it from inside SG to outside of SG, or from outside of SG to inside of SG"? [And mind you, I have not yet seen what that SG is. You have alluded to it but not specified it.]

      That seems like a very low bar for calling something "vacuous"...

    6. Omer, you are quite right it depends on the class of grammars. For some classes of grammars it will be vacuous and for some others (e.g. classes consisting of only one grammar) it will not be vacuous. Under the standard MG assumptions it does seem to be vacuous. I don't quite know what assumptions Jeff has in mind, or even if they are precise enough to tell whether they make it vacuous or not -- but from my perspective it seems like it is crucially important to Jeff's explanation that it is *not* vacuous.

    7. @Omer: If Alex's and my hunch is right, we are saying that the PISH is a normal form for grammars (like Chomsky normal form); every grammar would have a PISH-equivalent. A normal form is important: it shows that the full power of the class of grammars can be realized by a syntactically limited class, which can be useful both for understanding (proofs) and for applications (such as parsing or learning).
      Saying that it is a `vacuous' restriction is not an insult (think of string vacuous movement), but rather a precise characterization of what effects it has on the class of describable languages.
      One of the difficulties in knowing whether this is the case (i.e., why it is just a `hunch') is that it is not clear how to formulate the PISH in a general way without making a host of other stipulations. For example, what is a predicate, and what is its subject. While we can settle this in many ways (by, for example, postulating a fixed universal category system, and stipulating that Vs and As (for example) are predicates), having a more general characterization allows us to pinpoint just what it is really doing, and what role it is playing in the grammar. (Otherwise, you are in the situation we are in now, where we say `PISH' but really mean `PISH + ... '.)
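
      To illustrate what a normal form does, here is a toy analogy in Python (an analogy only, not a formalization of the PISH): binarizing the rules of a context-free grammar restricts what rules may look like, yet every grammar has a binarized equivalent, so the restriction does not change the class of describable languages.

      from itertools import count

      def binarize(grammar):
          """Replace every rule A -> X1 ... Xn (n > 2) by a chain of binary rules.
          The rule *shapes* are restricted; the generated language is unchanged."""
          fresh = count()
          out = {}
          for lhs, rhss in grammar.items():
              for rhs in rhss:
                  head, rest = lhs, list(rhs)
                  while len(rest) > 2:
                      new_nt = "_{}{}".format(lhs, next(fresh))
                      out.setdefault(head, []).append((rest[0], new_nt))
                      head, rest = new_nt, rest[1:]
                  out.setdefault(head, []).append(tuple(rest))
          return out

      G = {"S": [("NP", "V", "NP")], "NP": [("Norbert",), ("Ellen",)], "V": [("praised",)]}
      print(binarize(G))
      # {'S': [('NP', '_S0')], '_S0': [('V', 'NP')], 'NP': [('Norbert',), ('Ellen',)], 'V': [('praised',)]}

      If our hunch is right, "assume the PISH" would be a restriction of this kind: a constraint on the form grammars take, not on the sound-meaning pairs the class of grammars can deliver.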

    8. I bristled at "picture" being a sentential predicate but deferred to the authority of the logic. (In my defense, this was because I misread a star in front of "Norbert remembered which picture of himself Ellen painted", i.e., took it to be bad, a judgment which I would have been perfectly happy to believe.)

      But if I understand Greg right, not one single detail turns out to be important, because the hunch is based merely on the fact that you need to stipulate that "proud" is a predicate and "picture" is not - really? Don't you need to do that anyway? How much deeper can you go?

  12. As a thought experiment I tried to pick out all the assumptions of the argument that POS-sceptics might disagree with. There's a lot of them --- if you're already opposed to POS (or anything stronger than the claim that blank slate learning is impossible), you've got a rich number of supposed deal breakers to choose from. They fall into three broad groups: 1) disagreement about the object of study, 2) disagreement about the technical machinery, 3) disagreement about the goals of the theory.

    An example of point 1 is the assumption of a categorical split between grammatical and ungrammatical and that the violation of a single principle induces ungrammaticality. In a probabilistic framework, it might easily be the case that the well-formed sentences also violate some binding principle, but there's other factors that keep them above the ungrammaticality threshold.
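
    For concreteness, here is a toy harmonic-grammar-style scorer in Python (the constraints, weights and threshold are all made up for illustration), in which a binding violation lowers a sentence's score but other factors can keep it above the acceptability threshold:

    # Toy gradient alternative to a categorical grammatical/ungrammatical split.
    WEIGHTS = {
        "binding_violation": -3.0,   # violating a binding principle costs, but is not fatal
        "long_dependency":   -1.0,
        "frequent_frame":    +2.0,   # independently motivated "other factors"
    }
    THRESHOLD = -2.0                 # sentences above this count as acceptable in the toy model

    def harmony(violations):
        """Sum of weighted constraint scores; higher = more acceptable."""
        return sum(WEIGHTS[c] * n for c, n in violations.items())

    good_but_marked = {"binding_violation": 1, "frequent_frame": 2}   # violates binding, still fine
    bad = {"binding_violation": 1, "long_dependency": 2}
    print(harmony(good_but_marked), harmony(good_but_marked) > THRESHOLD)  # 1.0 True
    print(harmony(bad), harmony(bad) > THRESHOLD)                          # -5.0 False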

    Point 2 is rather obvious: it's what most of the discussion so far has focused on. What are our assumptions about binding, phrase structure, movement, the mapping between strings and meanings, etc.?

    Point 3 is best exemplified by the following passage:

    the best solution on the table is the one in which PISH is innate. This solution is the best because it explains with a single mechanism the pattern of facts in (2), the ambiguity of (3), the interpretive properties of (5) and its German counterparts, and the grammaticality of (6).

    This is a linguist's notion of best solution: one explanation for many superficially different phenomena. An engineer, on the other hand, will worry whether this solution requires a more powerful formalism than if these phenomena were treated in isolation. Stating generalizations is expensive; MGs, for example, can generate copies even if they don't have copy movement, but only in a very roundabout way that is difficult to decipher. With copy movement, it becomes trivial to refer to copies, but the formalism also becomes much more powerful. And more powerful formalisms also have higher resource demands, so a psychologist might be similarly disinclined to sacrifice psychological feasibility for scientific elegance.

  13. I’m surprised at Thomas’s response. So I suspect that I must be misunderstanding something.

    #1. Replacing categorical good/bad with continuous acceptability scales does not make the problem any easier, as far as I can see. For what it’s worth, we tested the facts in Jeff’s paradigm in (2), and the ratings are uncommonly categorical (Omaki, Dyer, Malhotra, Sprouse, Lidz, & Phillips, 2007).

    2a. 5.69
    2b. 5.67
    2c. 5.39
    2d. 2.31
    (1-to-7 rating scale).

    #2. The problem doesn’t depend heavily on the technical machinery. You can be squeamish about the proposed solutions to a problem, but surely that doesn’t make the problem evaporate. (If only we could make all of our problems go away so easily.)

    #3. One can say, “I don’t find that solution very satisfying”, but in the absence of an alternative solution, it perhaps defaults to being the best solution. Again, dislike of a solution to a problem does not make the problem itself disappear, does it?

    But I think that Thomas’s formulation does accurately characterize some of the reactions that one encounters in this domain, where the rhetorical strategy is, roughly, “If I can pooh pooh your solution, or make your problem sound even harder, then perhaps I can persuade you that the problem doesn’t exist."

    Replies
    1. This comment has been removed by the author.

    2. The point of #1 was that somebody might even reject the framing of the problem as a two-way opposition between grammatical and ungrammatical. Basically, the pattern could be a side effect of several interacting factors in these sentences, so that if we look at other sentences with the same construction but other differences, the supposed pattern disappears. For instance, one may wonder how the judgments change if we increase the distance between the antecedent and the reflexive. My hunch is that, due to performance limitations, the values will get a lot closer, and you can probably also create grammaticality illusions. So if somebody thinks the learning problem is one of matching performance rather than competence, the entire argument goes down the drain immediately.

      Now suppose that we are talking to Dora the computer scientist, and Dora can be convinced to go beyond #1 and accept that the problem we see is one of the grammar showing a principled contrast with a specific construction. Then Dora still has to be convinced that it is a POS problem. Jeff cites evidence that sentences of this form do not occur in child-directed speech, which strictly speaking is enough to qualify it as an instance of POS. But in that form the argument is very weak, because it only disproves blank slate learning, which is already known to be impossible. If that's all we have to say, we'll get an annoyed "duh" from Dora.

      As Alex pointed out, every learning problem is a POS problem. The interesting part is what counts as the relevant evidence. As linguists we think in terms of binding, c-command, licensing, scope, etc., but as a computer scientist, Dora may be inclined to think in simpler terms, e.g. n-gram grammars (#2). If that works, it would be a very simple solution, and there's very little POS when it comes to n-grams: every sentence actively contributes to the grammar. It's also not obvious that this won't work for the specific problem Jeff discusses. The other problems might need a different explanation, but those might be very different problems --- you don't expect to have one common explanation for headaches and toothaches (#3).
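
      To make Dora's perspective concrete, here is the sort of baseline she might reach for (a toy Python sketch; the corpus and the smoothing are invented for illustration): a bigram model in which every observed sentence contributes counts directly, and any string, attested or not, gets a score.

      import math
      from collections import Counter

      def train_bigrams(corpus):
          """Count bigrams over a toy corpus; every sentence contributes directly."""
          counts, context = Counter(), Counter()
          for sent in corpus:
              tokens = ["<s>"] + sent.split() + ["</s>"]
              for a, b in zip(tokens, tokens[1:]):
                  counts[(a, b)] += 1
                  context[a] += 1
          return counts, context

      def logprob(sent, counts, context, vocab_size, alpha=1.0):
          """Add-alpha smoothed log-probability of a sentence under the bigram model."""
          tokens = ["<s>"] + sent.split() + ["</s>"]
          return sum(math.log((counts[(a, b)] + alpha) / (context[a] + alpha * vocab_size))
                     for a, b in zip(tokens, tokens[1:]))

      corpus = ["Ellen painted a picture", "Norbert was proud of himself"]
      counts, context = train_bigrams(corpus)
      vocab = {w for s in corpus for w in s.split()} | {"<s>", "</s>"}
      # The model happily scores strings it has never seen; whether such scores could
      # track the contrast in Jeff's paradigm is exactly what Dora would have to show.
      print(logprob("Ellen was proud of herself", counts, context, len(vocab)))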

      So Dora just shrugs and concludes that this problem would be solved already if linguists dropped their baroque theories and just used a much more reliable and versatile tool. And then we have yet another non-linguist who believes exactly what Jeff wanted to debunk: we linguists overstate the importance of POS, the problem is much easier than we make it out to be.

      So overall, I'm not surprised that many non-linguists are sceptical about the importance and difficulty of the POS problem; even a well-articulated and carefully constructed argument like the one Jeff presents can easily yield this undesired conclusion.

    3. Thomas, could you help us out here? It sounds like you're endorsing the view that you can make the problem disappear simply by vaguely wondering (i) whether the problem has greater complexity, or (ii) whether the correct generalization might somehow fall out of a very simplistic learning algorithm, despite the fact that Jeff has already told you more specific details of what is(n't) in the learner's experience. If you're merely describing a reaction from others, rather than endorsing this yourself, could you suggest ways that we might better convey the argument to the (lazy) skeptics that you're describing?

      If you have specific suggestions on how the paradigm in question might collapse under closer scrutiny, then we'd love to hear about them. The strategy of saying "Well, I've heard that judgments can vary when you manipulate additional factors; so perhaps this is all a hoax" is not entirely constructive.

      Are you really arguing that "Meh, there's nothing to worry about here"? I think either I'm misunderstanding you, or we're highlighting a fundamental difference in perspective.

    4. I am definitely *not* endorsing the argument. At the same time, I'm not quite sure how the argument could be adapted to make it more convincing to these people; the scenario I described above is an instance of this, I believe.

      The only way to address that is to give a formal proof that none of these simple ideas work (without a proof you can only debunk specific instances of it, so the other side can always switch to another simple idea), but that's extremely hard (I'd wager too hard for now) because of all the parameters. So if the other side doesn't want to play, no POS argument will change their mind.

      What might be more successful, though, is to think of reasons why assuming a strong version of POS --- irrespective of whether it is an empirical fact --- would be advantageous. Similar to why we model natural languages as infinite sets even if that isn't necessary in practice. Dora the computer scientist, for instance, should be easily swayed by a demonstration (formal proof, implemented model, etc.) that a highly restricted grammar space with a simple machine learner does better in certain tasks than a sophisticated learner with few priors. That's basically a point Alex C has mentioned several times: it's not obvious that a rich UG simplifies learning. And apparently Charles is working on this right now, so we might have a more convincing argument soon.

    5. Great, I totally screwed up the essential part of the first paragraph: I'm not endorsing the sceptic's logic for dismissing the POS, yet at this point I can't offer much to substantially improve on Jeff's argument.

    6. So, if I understand the line of argument here, perhaps Dora could be mollified by producing some reasonably rich UG-based systems that actually did manage to learn analyses and predict the properties of plausibly unseen data in ways that are at least prima facie challenging for n-grams, etc. (and not grossly at variance with what is known about linguistic typology), so that even if she were not convinced, she would nevertheless be confronted with a worthy enemy to defeat, by showing that n-grams can actually do the job, rather than just a cloud of propagandistic fluff.

      This is the 'Homeric Hero' theory of science, whereby to achieve immortal fame, you need to defeat a worthy enemy (or at least be defeated by one), rather than just be 'the only game in town'.

    7. @Avery: My idea was rather that if it isn't feasible to convince somebody that the problem is actually hard, show them that there are benefits to imposing that handicap on yourself. E.g. that taking POS arguments seriously and studying learners that can succeed under the conditions linguists assume --- irrespective of whether those are actually realistic --- can lead to new algorithms with useful applications somewhere else. But I also like your Homeric Hero interpretation, in that case there doesn't even need to be any practical benefit beyond the challenge. Not sure how well it would work on computer scientists, who tend to be fairly pragmatic (a natural predisposition amplified by an enormous pressure to bring in extramural funding, I'd presume).

  14. This comment has been removed by the author.

  15. Just so it doesn't get lost in the fray, and because I believe that there is far less disagreement in this dispute than there would seem to be, I'd like to highlight Greg's comment in response to Thomas's translation procedure:

    "What is being given up in the move to noPISH is also the general idea about how binding conditions should be stated, although it may be the case that (SEM_PISH . NoPISH^{-1}) can be deforested into something elegant."

    This comment, I think, has an important and correct presupposition. In Thomas's translation there will be some homologue to the effects created by the interaction of PISH and the standard Binding Theory, tuned to operate within a grammar that doesn't make as much use of A-movement as PISH grammars do (maybe it instead uses Function Composition). Greg now asks the important question of whether this homologue will be as nice, for whatever reason, as the version that Jeff presumes. So the battle for niceness is on. But in the spirit of the day, let us for now pause to give thanks that we have not one but two ways, in principle, of getting exactly the effect we're interested in, one sort of the upside-down version of the other.

    This shows us that it is not so much the PISH that matters but whatever aspects of the grammar implement the association of arguments with predicates, and the way these interact with the BT. The Huang suggestion, described by Jeff, was just the way one makes this claim in the 'mainstream' transformational tradition (where predicates and arguments don't wait to mate, etc.). And now we can talk about a generalization (i.e., abstraction) of this suggestion, via Thomas's translation, approved by Greg. No? Am I missing something?

    And now, importantly, it seems to me that even this generalization (which comes in PISH and PISHless variants), call it the Thanksgiving Theory or TT, is a worthwhile hypothesis about acquisition. At the relevant stage of acquisition, kids 'presume' TT.

    This does still leave an important question, which I believe Alex cares about. Namely, how much of the work done by the LAD has the form of (let's just call them) substantive constraints, versus the sorts of things that affect (weak?) generative capacity, versus strategies of induction, versus whatever else. But nobody is denying that it's important to figure that out, I think.
