Sunday, November 23, 2014

There's no poverty of the stimulus? PISH.


Lately, I’ve been worried that many people – mostly psychologists, but also philosophers, computer scientists, and even linguists – do not appreciate the argument from the poverty of the stimulus. They simply do not see what this argument is supposed to show. This difficulty leads to skepticism about the poverty of the stimulus and about generative syntax more generally, which, in turn, interferes with progress towards solving the problem of how children learn language.
 
An argument from the poverty of the stimulus is based on two observations: (a) there exists a generalization about the grammar of some language and (b) the learner’s experience does not provide sufficient data to support that generalization over a range of alternative generalizations. These two observations support the conclusion that something other than experience must be responsible for the true generalization that holds of the speakers of the language in question. This conclusion invites hypotheses. Typically, these hypotheses have come in the form of innate constraints on linguistic representations, though nothing stops a theorist from proposing alternative sources for the relevant generalization.
 
But a common response to this argument is that we just didn’t try hard enough. The generalization really is supported in the data, but we just didn’t see the data in the right way. If we only understood a little more about how learners build up their representations of the data, then we would see how the data really does contain the relevant generalization. So, armed with this little bit of skepticism, one can blithely assert that there is no poverty of the stimulus problem based only on the belief that if we linguists just worked a little harder, the problem would dissipate. But skepticism is neither a counterargument nor a counterproposal.
 
So, I’d like to issue the following challenge. I will present a poverty of the stimulus argument that is not too complicated. I will then show how I looked for the relevant data in the environment and conclude that it really wasn’t there. I will then invite all takers (from whatever field) to show that the data really does support the correct generalization and none of the alternatives. If someone shows that the relevant data really was there, then I will concede that there was no poverty of the stimulus problem for that phenomenon. Indeed, I will celebrate, because that discovery will represent progress that all students of the human language faculty will recognize as such. Nobody requires that every fact of language derives from innate knowledge; learning which ones do and which ones don’t sounds like progress. And with that kind of progress, I’d be more than happy to repeat the exercise until we discover some general principles.
 
But, if the poverty of the stimulus is not overturned for this case, then we can take that failure as a recognition that the problem is real and that the way forward in studying the human language faculty is by asking about what property of the learner makes the environmental data evidentiary for building a grammar.
 
With that preamble out of the way, let’s begin. Consider the judgments in (1)-(2), which Leddon and Lidz (2006) show, with experimentally collected data, to be reliable for adult speakers of English:

(1) a.     Norbert remembered that Ellen painted a picture of herself
      b.  * Norbert remembered that Ellen painted a picture of himself
      c.     Norbert remembered that Ellen was very proud of herself
      d.  * Norbert remembered that Ellen was very proud of himself

(2) a.     Norbert remembered which picture of herself Ellen painted
      b.    Norbert remembered which picture of himself Ellen painted
      c.    Norbert remembered how proud of herself Ellen was
      d. * Norbert remembered how proud of himself Ellen was

The facts in (1) illustrate a very simple generalization: a reflexive pronoun must take its antecedent in the domain of the closest subject. In all of (1a-d) only Ellen can be the antecedent of the reflexive, which is why (1b) and (1d), where the reflexive does not match Ellen in gender, are ungrammatical. Let us assume (perhaps falsely) that this generalization is supported by the learner’s experience and that there is no poverty of the stimulus problem associated with it.
 
The facts in (2) do not obviously fit our generalization about reflexive pronouns. If we take “closest subject” to be the main clause subject, then we would expect only (b) and (d) to be grammatical. If we take “closest subject” to be the embedded subject, then we expect only (a) and (c) to be grammatical. And, if we take “closest subject” to be underspecified in these cases, then we expect all of (a-d) to be grammatical. So, something’s gotta give. What we need is for the “closest subject” to be fixed as Ellen in (c-d), but underspecified between Norbert and Ellen in (a-b). We’ll get back to a way to do that in a moment.
 
But first we should see how these patterns relate to the poverty of the stimulus. Leddon and Lidz (2006) also showed that sentences like those in (2) are unattested in speech to children. While we didn’t do a search of every sentence that any child ever heard, we did examine 10,000 wh-questions in CHILDES and we didn’t find a single example of a wh-phrase containing a reflexive pronoun, a non-reflexive pronoun or a name. So, there really is no data to generalize from. Whatever we come to know about these sentences, it must be a generalization beyond the data of experience.
 
One might complain, fairly, that 10,000 wh-questions is not that many and that if we had looked at a bigger corpus we might have found some with the relevant properties. We did search Google for strings containing wh-phrases like those in (2) and the only hits we got were example sentences from linguistics papers. This gives us some confidence that our estimate of the experience of children is accurate.
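For concreteness, here is a toy version of the kind of search involved. It is only a sketch: the corpus file name, the word lists, and the crude heuristics for spotting fronted wh-phrases and names are illustrative, not the materials we actually used.

    import re

    # Toy search: scan a file of child-directed wh-questions (one utterance per
    # line; the file name is hypothetical) for fronted wh-phrases that contain
    # a reflexive, a non-reflexive pronoun, or a (capitalized) name.
    WH_WORDS = {"which", "what", "whose", "how"}
    REFLEXIVES = {"himself", "herself", "myself", "yourself", "itself", "ourselves", "themselves"}
    PRONOUNS = {"him", "her", "me", "you", "it", "us", "them"}

    def fronted_wh_phrase(utterance, max_words=6):
        """Return the first few words if the utterance begins with a wh-word, else []."""
        words = re.findall(r"[A-Za-z']+", utterance)
        return words[:max_words] if words and words[0].lower() in WH_WORDS else []

    hits = []
    with open("childes_wh_questions.txt") as corpus:   # hypothetical corpus file
        for line in corpus:
            phrase = fronted_wh_phrase(line)
            lowered = [w.lower() for w in phrase]
            looks_like_name = any(w[0].isupper() and w != "I" for w in phrase[1:])  # very crude
            if phrase and (set(lowered) & (REFLEXIVES | PRONOUNS) or looks_like_name):
                hits.append(line.strip())

    print(f"{len(hits)} candidate wh-phrases containing a reflexive, pronoun, or name")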
 
If these estimates are correct, the data of experience appears to be compatible with many generalizations, varying in whether Norbert, Ellen, or both are possible antecedents in the (a-b) cases, in the (c-d) cases, or in both. With these possibilities, there are 8 possible patterns. But out of these eight, all English speakers acquire the same one. Something must be responsible for this uniformity. That is the extent of the argument. It doesn’t really have a conclusion, except that something must be responsible for the pattern. The argument is merely the identification of a mystery, inviting hypotheses that explain it.
 
Here’s a solution that is based on prior knowledge, due to Huang (1993). The first part of the solution is that we maintain our generalization about reflexives: reflexives must find their antecedent in the domain of the nearest subject. The second part capitalizes on the difference between (2a-b), in which the wh-phrase is an argument of the lower verb, and (2c-d), in which the wh-phrase is the lower predicate itself. In (2a-b), the domain of the nearest subject is underspecified. If we calculate it in terms of the “base position” of the wh-phrase, then the embedded subject is the nearest subject and so only Ellen can be the antecedent. If we calculate it in terms of the “surface position” of the wh-phrase, then the matrix subject is the nearest subject. For (2c-d), however, the closest subject is the same, independent of whether we interpret the wh-phrase in its "base" or "surface" position. This calculation of closest subject follows from the Predicate Internal Subject Hypothesis (PISH): The predicate carries information about its subject wherever it goes. Because of PISH, the wh-phrase [how proud of himself/herself] contains an unpronounced residue of the embedded subject and so is really represented as [how Ellen proud of himself/herself]. This residue (despite not being pronounced) counts as the nearest subject for the reflexive, no matter where the predicate occurs. Thus, the reflexive must be bound within that domain and Ellen is the only possible antecedent for that reflexive. So, as long as the learner knows the PISH, then the pattern of facts in (2) follows deductively. The learner requires no experience with sentences like (2) in order to reach the correct generalization.

Now, this argument only says that the learner must know that the predicate carries information about its subject with it in the syntax prior to encountering sentences like (2). It doesn’t yet require that knowledge to be innate. So, the poverty of the stimulus problem posed by (2) shifts to the problem of determining whether subjects are generated predicate internally.

Our next question is whether we have independent support for PISH and whether the data that supports PISH can also lead to its acquisition. I can think of several important patterns of facts that argue in favor of PISH. The first (due, I believe, to Jim McCloskey) concerns the relative scope of negation and a universal quantifier in subject position. Consider the following sentences:
 
(3) a. Every horse didn’t jump over the fence
      b. A Fiat is not necessarily a reliable car
      c. A Fiat is necessarily not a reliable car

The important thing to notice about these sentences is that (3a) is ambiguous but that neither (3b) nor (3c) is. (3a) can be interpreted as making a strong claim that none of the horses jumped over the fence or a weaker claim that not all of them jumped.  This ambiguity concerns the scope of negation. Does the negation apply to something that includes the universal or not? If it does, then we get the weak reading that not all horses jumped. If it does not, then we get the strong reading that none of them did.
 
How does this scope ambiguity arise? The case where the subject takes scope over negation is straightforward if we assume (uncontroversially) that scope can be read directly off of the hierarchical structure of the sentence. But what about the reading where negation takes wide scope? We can consider two possibilities. First, it might be that negation can take the whole sentence in its scope even if it does not occur at the left edge of the sentence. But this possibility is shown to be false by the lack of ambiguity in (3c). If negation could simply take wide scope over the entire sentence independent of its syntactic position, then we would expect (3c) to be ambiguous, contrary to fact. (3c) just can’t mean what (3b) does. The second possibility is PISH: the structure of (3a) is really (4), with the bracketed copy of every horse representing the unpronounced residue of the subject-predicate relation:
 
(4) every horse didn’t [every horse] jump over the fence

Given that there are two positions for every horse in the representation, negation can take scope relative to either the higher copy or the lower one.
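Spelling the two readings out in first-order notation (simplifying the predicates) makes the correspondence explicit:

    Quantifier interpreted in the higher copy (every > not):  ∀x[horse(x) → ¬jump(x)]   (strong: none of the horses jumped)
    Quantifier interpreted in the lower copy (not > every):   ¬∀x[horse(x) → jump(x)]   (weak: not all of the horses jumped)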
 
Is there evidence in speech to children concerning the ambiguity of (3a)? If there is, then that might count as evidence that they could use to learn PISH and hence solve the poverty of the stimulus problem associated with (2). Here we run into two difficulties. First, Gennari and MacDonald (2005) show that these sentences do not occur in speech to children (and are pretty rare in speech between adults). Second, when we present such sentences to preschoolers, they appear to be relatively deaf to their ambiguity. Julien Musolino and I have written extensively on this topic and the take-away message from those papers is (i) that children’s grammars can generate the wide-scope negation interpretation of sentences like (3a), but (ii) that it takes a lot of either pragmatic or priming effort to get that interpretation to reveal itself. So, even if such sentences did occur in speech to children, their dominant interpretation from the children’s perspective is the one where the subject scopes over negation (even when that interpretation is not consistent with the context or the intentions of the speaker) and so this potential evidence is unlikely to be perceived as evidence of PISH. And if PISH is not learned from that, then we are left with a mystery of how it comes to be responsible for the pattern of facts in (2).
 
A second argument (due to Molly Diesing) in favor of PISH concerns the interpretation of bare plural subjects, like in (5):
 
(5) Linguists are available (to argue with)

This sentence is ambiguous between a generic and an existential reading of the bare plural subject. Under the generic reading, it is a general property of linguists (as a whole) that they are available. Under the existential reading, there are some linguists who are available at the moment.
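One common (and simplified) way of representing the two readings, with GEN as the generic operator, is:

    Generic:      GEN x [linguist(x)] [available(x)]     (linguists, as a kind, are available)
    Existential:  ∃x [linguist(x) ∧ available(x)]         (some linguists are available right now)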
 
Diesing observes that these two interpretations are associated with different syntactic positions in German. The generic interpretation requires the subject to be outside of the verb phrase. The existential interpretation requires it to be inside the verb phrase (providing evidence for the availability of the predicate-internal position crosslinguistically). So, Diesing argues that we can capture a cross-linguistic generalization about the interpretations of bare plural subjects by positing that the same mapping between position and interpretation occurs in English. The difference is that in English, the existential interpretation is associated with the unpronounced residue of the subject inside the predicate. This is not exactly evidence in favor of PISH, but PISH allows us to link the German and English facts together in a way that a PISH-less theory would not. So we could take it as evidence for PISH.
 
Now, this one is a bit trickier to think about when it comes to acquisition. Should learners take evidence of existential interpretations of bare plural subjects to be evidence of PISH? Maybe, if they already know something about how positions relate to interpretations. But in the end, the issue is moot because Sneed (2007) showed that in speech to children, bare plural subjects are uniformly used with the generic interpretation. How children come to know about the existential readings is itself a poverty of the stimulus problem (and one that could be solved by antecedent knowledge of PISH and the rules for mapping from syntactic position to semantic interpretation). So, if we think that the facts in (2) follow from PISH, then we still need a source for PISH in speech to children.
 
The final argument that I can think of in favor of PISH comes from Jane Grimshaw. She shows that it is possible to coordinate an active and a passive verb phrase:

(6) Norbert insulted some psychologists and was censured

The argument takes advantage of three independent generalizations. First, passives involve a relation between the surface subject and the object position of the passive verb, represented here by the invisible residue of Norbert:

(7) Norbert was censured [Norbert]

Second, extraction from one conjunct in a coordinated structure is ungrammatical (Ross’s 1968 Coordinate Structure Constraint):

(8)     * Who did Norbert criticize the book and Jeff insult

Third, extraction from a conjunct is possible as long as the extracted phrase is associated with both conjuncts (Across The Board extraction):

(9) Who did Norbert criticize and Jeff insult

So, if there were no predicate internal subject position in (6), then we would have the representation in (10):

(10) Norbert [VP insulted some psychologists] and [VP was censured [Norbert]]

This representation violates the coordinate structure constraint and so the sentence is predicted to be ungrammatical, contrary to fact. However, if there is a predicate internal subject position, then the sentence can be represented as an across the board extraction:

(11) Norbert [VP [Norbert] insulted some psychologists] and [VP was censured [Norbert]]

So, we can understand the grammaticality of (6) straightforwardly if it has the representation in (11), as required by PISH.
 
Do sentences like (6) occur in speech to children? I don’t know of any evidence about this, but I also don’t think it matters. It doesn’t matter because if the learner encountered (6), that datum would support either PISH or the conclusion that movement out of one conjunct in a coordinate structure is grammatical (i.e., that the coordinate structure constraint does not hold). If there is a way of determining that the learner should draw the PISH conclusion and not the other one, I don’t know what it is.
 
So, there’s a potential avenue for the stimulus-poverty-skeptic to show that the pattern in (2) follows from the data. First, show that data like (6) occurs at a reasonable rate in speech to children, whatever “reasonable” means. Then show how the coordinate structure constraint can be acquired. Then build a model showing how putting (6) together with an already acquired coordinate structure constraint will lead to the postulation of PISH and not to the discarding of the coordinate structure constraint. And if that project succeeds, it will be party time; we will have made serious progress on solving a poverty of the stimulus problem.
 
But for the moment, the best solution on the table is the one in which PISH is innate. This solution is the best because it explains with a single mechanism the pattern of facts in (2), the ambiguity of (3), the interpretive properties of (5) and its German counterparts, and the grammaticality of (6). And it explains how each of these can be acquired in the absence of direct positive evidence. Once learners figure out what subjects and predicates look like in their language, these empirical properties will follow deductively because the learners will have been forced by their innate endowment to build PISH-compatible representations.
 
One final note. I am confident that no stimulus-poverty-skeptics will change their views on the basis of this post (if any of them even see it). And it is not my intention to get them to. Rather, I am offering an invitation to work on the kinds of problems that poverty of the stimulus arguments raise. It is highly likely that the analyses I have presented are incorrect and that scientists with different explanatory tastes would follow different routes to a solution. But we will all have a lot more fun if we engage at least some of the same kinds of problems and do not deny that there are problems to solve. The charge that we haven’t looked hard enough to find out how the data really is evidentiary is hereby dismissed. But if there are stimulus-poverty-skeptics who want to disagree about something real, linguists are available.

Jeff Lidz, November 23, 2014.

73 comments:

  1. This comment has been removed by the author.

  2. I have noticed that as I've moved further toward thinking about learning as a machine learning (ML) person and further away from thinking about learning as a linguist, I have begun to find poverty of the stimulus arguments progressively more difficult to grasp. (nb. I also don't see anything like these arguments in ML--perhaps because none of us can even imagine working with anything but impoverished data, except as theoretical constructs.) To me, a POS argument is necessarily bound up in lots of meaty, theory-internal constructs. You can't say whether the input is impoverished for some linguistic phenomenon without saying how said phenomenon is coded--and what the learning algorithm is. Thus, POS doesn't really have any content beyond specific hypotheses about properties of grammars and their learning algorithms, which are the big open questions of linguistic research! So, the "ultimate POS confirmation/refutation" will be a matter of accounting when all is said and done. (Btw, I say this as someone who periodically has dinner with an avowed member of the anti-POS camp, but the arguments somehow don't really stick in my ML-addled head.) In all, I think linguists should spend more time talking to ML theorists (not the language ones, just the ML theorists--some of them will drive you crazy, but the smart ones will be interesting allies).

    Replies
    1. If you're finding POS arguments hard to grasp, then it's because they are so obvious to you that you forget how they work (sort of like having a spot on your glasses that you can't see). The first day of any ML class begins with the idea that there is no learning without an inductive bias. And, it doesn't take much longer to get to the lesson that there is no such thing as a universal inductive bias. All of this is just the ML guy's way of saying that there is always a poverty of the stimulus. So, of course, any solution to a learning problem must have a representation of the input (the intake), a space of possible answers (the inductive bias), and a method of linking one to the other (the learning mechanism). These are *solutions* to the poverty of the stimulus. The solutions I'm talking about above are just statements of the inductive bias of the learner. What a good POS argument does is merely to direct attention towards something worthy of study. Nothing more, nothing less.

  3. Two remarks on this.

    1. I think that a big part of the problem is that discussion of POS arguments tends to devolve into fights about subject-auxiliary inversion and structure dependence. Which do not get us very far. We need to focus on more knotty cases, exactly what Jeff does here (yay!).

    2. Chris is exactly right that a POS argument is bound up in "meaty, theory-internal constructs". A typical POS argument implicitly says not "the input is impoverished relative to all encodings and learning algorithms". It says "the input is impoverished relative to an (implicit) superficial encoding and learning algorithm". The argument then proceeds to say if we assume some more specific encoding of the input and some more specific learning algorithm, then the input is no longer impoverished at all. (Well, we hope that the second part is included, though that is often left as a promissory note.) Part of the problem, then is that the implicit "simple learning model" is almost never spelled out, making it a bit of a moving target.

    When people say "if only you would work harder, you'd see how to learn from the input", the answer should be "that's exactly what our work is about." Of course, it's debatable whether the work of many linguists is indeed focused on that question.

  4. I also have several remarks:
    1. It is depressing to see Chris respond to Jeff's case in this way. What do we get? Well he has a gut feeling that some MT approach would work were MT to attend to the problem. Jeff presents a SPECIFIC case where he argues that data is very very sparse and the learning is both robust and well documented. So, he presents a challenge. The reply seems to be the kind that Jeff always gets: well it's very theory internal and you'd be surprised at what MT can do. Discussion of the actual cases? Nope. Payoff on the hunches? No. Recommendation: yup, talk to someone in MT and your problems will be laid to rest because these guys are oh soo good! Color me skeptical, and disappointed.
    2. I disagree with both Colin and Chris that the problems are bound up in "meaty theory-internal constructs." WRONG!!! The problems aren't, the SOLUTIONS proposed are. The problem as described by Jeff has NO theory internal constructs, just a range of observations concerning how sentences are judged and interpreted. The data is the data and it is not theory internal. So, Colin and Chris are just wrong here. Of course, the solutions ARE theoretically colored, as one would expect from an explanation. Explanations are ALWAYS theoretically wrapped. That's what makes something an explanation. Aren't MT explanations theoretically loaded? They sure look like they are from where I sit.
    3. Contrary to Colin, I don't believe that the kinds of case Jeff discusses need much more than the postulation of the innate learning constraint. This is not the kind of case that is near and dear to his heart; one where there is lots of linguistic variation. There is very little linguistic variation in the case Jeff discusses (actually, I believe none). In this kind of case the solution to the acquisition problem comes in the form of a hard constraint on admissible grammars. Period.

    The general reply to the innateness strategy Jeff adopts (rightly IMO) is that this is not an "interesting" way to solve the problem for it removes the problem by assuming an innate constraint that just whisks it away. Just so. It does. There is, of course, another question; a minimalist one. Is this innate constraint one that we want? But that's not an argument that it does not solve the learnability problem (it does) but that it fails on other grounds, e.g. Darwin Problem grounds, another concern. It behooves us to distinguish these issues if we are to make progress. I am not implying that Colin confuses them, but that they are often confused.
    4. It seems to me from Chris's comments that Jeff will be forever disappointed. It is worth asking why. I believe that the answer is sociological more than conceptual. MT is a big fish. Linguistics is not. Therefore, linguistic arguments are seen as thorns in the lion's paw, a nuisance that must be removed not addressed. Prestige plays a huge role in the sciences, and almost never for the good. That said, all that we can do is keep throwing down the gauntlet and hope that the failure to engage the empirics finally catches up. Here's praying.

  5. Norbert: I think that you are completely misunderstanding what Chris and I were saying. More later.

  6. Chris says "You can't say whether the input is impoverished for some linguistic phenomenon without saying how said phenomenon is coded--and what the learning algorithm is." I think this is the key problem with POS arguments as they are currently put forward.
    Every learning algorithm (except for a rote learner) goes beyond the data in some way. Every learner generalises and that means it generalises to new examples that do not occur in the data. So the data is always impoverished.

    Suppose a child has heard "very funny" and "very very funny" but has never heard "very very very funny", but correctly determines that it is grammatical. We have a mini POS argument. There simply are no examples of "very^3" in the data and so this must be innately specified. It goes beyond what is in the data.



    This isn't an interesting POS of course, but it points towards some innate structure in the learner. But we can see easily, I hope, that in this case the innate structure can be quite simple, because it is easy to see why a superficial domain-general learner, working only with strings, can make this leap.
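    Just to make that concrete, here is a toy sketch (purely my own illustration) of such a learner: a strictly 2-local (bigram) learner that remembers the attested word pairs and accepts any string built entirely out of them.

        # Toy bigram (strictly 2-local) learner: remember every attested word
        # bigram in the input and accept any string whose bigrams are all attested.
        def bigrams(words):
            padded = ["<s>"] + words + ["</s>"]
            return set(zip(padded, padded[1:]))

        observed = [["very", "funny"], ["very", "very", "funny"]]
        attested = set().union(*(bigrams(s) for s in observed))

        def accepts(words):
            return bigrams(words) <= attested

        print(accepts(["very", "very", "very", "funny"]))  # True: generalizes to very^3 (indeed very^n)
        print(accepts(["funny", "very", "funny"]))         # False: contains unattested bigrams such as (funny, very)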


    So the real debate for me is about what Pullum and Scholz call the indispensability part of the argument -- the claim that in order to learn fact X we need to see examples of type Y.
    So Jeff makes this move -- in the phrase where he says "we didn’t find a single example of a wh-phrase containing a reflexive pronoun, a non-reflexive pronoun or a name."
    But this is like saying (allow me my hyperbole here) that "we didn't find a single example of "very^3 adjective".

    So the claim here is that you need to see an example of sentences like
    "Which pictures of John, which pictures of himself, which pictures of him" or so on, in order to learn that these are grammatical and to learn the syntactic facts. But why do we think this? This assumption only makes sense if we think of very superficial learners, as Colin observes. Deeper learners that work with more abstract representations, over tree structures, say the derivation trees of MCFGs may make generalisations that seem quite radical. But how can we tell what they are without studying learning algorithms for these rich grammar classes?

    So what if the learner has not heard any complex wh-phrases.
    The learner has heard phrases like "which cakes do you like" and "the cakes that Mummy baked" and "John takes lots of pictures of himself" and "which pictures does John like?"
    but has never heard "Which cakes that Mummy baked do you like?", or "which pictures of himself does John like?"
    So it has to generalise. The question is what directions does it generalise in? And why?
    And do we have any reason to think that this is something domain-specific or does it arise out of the interaction of two or three very general constraints? And finally, since none of us have the answers, what is the best way of putting our collective heads together to solve these problems?

    (PS I think it is best as Jeff does to think of the POS argument not as an argument -- with a definite conclusion -- but as a question: how on earth do we learn phenomenon X from data Y?)



    Replies
    1. Exactly right. If you can specify what the generalization function is that goes from (a) simple wh questions and (b) declaratives with reflexives to the distinctions that I show in the paper, then you will have answered the poverty of the stimulus problem for this phenomenon. My answer is that one way of writing that generalization function is to restrict it so that it must encode PISH. So, please, offer me an alternative. I take it that what you mean by asking "how can we tell what they are without studying the learning algorithms for these rich grammar classes?" is that it is possible to build a learning algorithm that will have PISH as a consequence without building PISH into the constraints on what a learning algorithm is. If that's the case, my invitation stands.

      And in answer to your final question "what's the best way of putting our collective heads together to solve some of these problems", I think the answer is obvious: we make specific proposals that solve specific problems. If someone finds a specific proposal unsatisfactory, then they offer an alternative and show that the alternative is more satisfactory and has at least as good empirical coverage. Just like in any other domain of scientific inquiry, analyses are disproven by better analyses.

    2. From my perspective -- and you may disagree -- your proposal is not specific enough as it does not contain a specification of some learning algorithm that could exploit the restricted hypothesis class of grammars. So I don't think it solves the problem of how we acquire e.g. reconstruction effects. In other words, I don't see how to restrict the generalisation function so that it encodes PISH.

      One specific problem is that it isn't clear to me whether PISH is vacuous or not. Is there an example of an E-language (a set of strings, or a set of string/meaning pairs) that is ruled out by PISH? I guess it is restrictive as a theory of I-language, but if it isn't restrictive as a claim about E-language then it doesn't help learnability.

    3. I'm not understanding your objection. If the only possible grammars are those in which PISH holds, then when learners build a representation for sentences like "Norbert wondered how proud of herself Ellen was", the only possible representation would be one in which the invisible residue of the subject is contained in the moved wh-phrase. A grammar that is ruled out by PISH is one in which that wasn't the case. For example, suppose subjects were generated in their surface positions. Then, the representation for the sentence above would not have a copy of the embedded subject in the wh-phrase and we would predict that the wh-predicates would show the same pattern as the wh-arguments.

      Similarly, if there were no PISH, then (assuming that the position of negation is fixed) we would not be able to generate the inverse scope interpretation of the "every horse didn't jump" sentences.

      So, clearly PISH is not vacuous.

    4. I'll add two conciliatory notes, though. First, I am quite certain that it is above my pay grade to actually make a machine do this. But I am also pretty sure that someone with your skills would be able to code this up. My impression from working with computational modelers is that if you have a clear enough idea, they can figure out a way to implement it. So, the implementation issue doesn't seem to me to be a problem, but that's speaking from a pretty ignorant vantage point.

      Second, while PISH predicts that subjects can reconstruct under negation (and hence get low scope), there are languages where a universal quantifier in subject position cannot take scope under negation (e.g., Chinese, Korean, and many others). I'm pretty sure that the correct analysis of those languages is not that they do not exhibit PISH, but rather that something else makes the inverse scope reading impossible. But this is a point for investigation. Note, though, that if we were to take the view that the lack of inverse scope in some language is evidence for the lack of PISH in that language, then that would make the learning problem worse because now learners would have to learn PISH and there doesn't seem to be any data in the experience of the learner that would make that possible (as discussed in the post).

    5. Whether PISH is vacuous or not depends on a lot of other assumptions that are not spelled out -- perhaps because they are too obvious to you and to other readers of this blog. For example, do you assume that there is some fixed set of innate syntactic categories and many other restrictions on movement etc etc?

      Under some reasonable assumptions the class of languages with or without PISH is the same -- the class of languages defined by Minimalist Grammars/MCFGs etc. So adding PISH doesn't really solve anything unless there is a specified learning algorithm that can navigate the space of possible options in some way. For example it probably would be the case, even if the classes of languages are the same with and without PISH, that adding PISH changes the size of the grammar, making some size based learning algorithms more likely to generalize in one way rather than another. And that might be enough of an explanation. But I have no idea whether that is true or not, and establishing it would rely on a precise specification of the class of grammars that is not available at the moment.
      So I can see that there might be decent explanations that make use of an innate constraint like PISH, but just saying "PISH is innate" doesn't solve the learning problem and it does create a bunch of Darwin type problems.

    6. I'm not sure I understand what you're saying here at all, so forgive me if I respond in a way that doesn't make sense to you. But, I'm not making any assumptions at all about classes of grammars, generative capacity, etc. I'm saying that grammars with PISH are easily distinguishable from grammars without PISH. That is, if you hold everything else constant (whatever everything else happens to be), then it is simple to tell the difference between grammars incorporating PISH and those that do not (presumably this is true even if the generative capacity of these grammars is the same). Hence, not vacuous. For sure, you are right that saying "PISH is innate" doesn't solve the learning problem, if what you mean is explain everything about language acquisition. What it does do is explain a certain pattern of facts and put a very strong constraint on what an explicit learning model should be like (in this case, it is not at all obvious to me what the added value of making that explicit learning model is; where an explicit learning model is helpful is when you are trying to model the selection/construction of a particular grammar from the space of possible grammars). As for the Darwin problem, I think the strategy for approaching that ought to be to find some explicit solutions to particular instances of Plato's problem and then see how we can derive those explicit solutions from something that would have them as a consequence (which is what Norbert's version of the Minimalist agenda is). But as Norbert would say, if we derive PISH from something else, that doesn't make PISH any less explanatory or any less of a solution to Plato's problem. Solutions to Darwin's problem will encompass the solutions to Plato's problem that they are meant to derive. If they do not, then they are not really solutions. So, I'll leave it to the minimalist syntacticians to solve Darwin's problem. I'll just keep reminding us of what some instances of Plato's problem and their solutions look like so that they can keep their eye on the ball.

    7. Thanks for your patience, Jeff! My argument or observation rests on the distinction between restricting a set of grammars versus restricting the set of languages defined by those grammars. So for example, we can consider CFGs, which generate the class of CFLs, and we can consider the restricted class of CFGs with at most two symbols on the right hand side of a production (a bit like in Chomsky normal form); these define exactly the same class of languages as the unrestricted class. So as a restriction on the class of grammars this is not vacuous but as a restriction on the class of languages it is vacuous. So my query is whether PISH is vacuous in this sense. If it is then it doesn't solve the learnability problem.

    8. Ok, but if the languages are form/meaning pairs, then clearly they define different languages.

    9. In Alex's context-free example, both grammar classes define the same sound-meaning pairings. (Assuming just a standard interpretation scheme using the lambda calculus.) The larger class contains the smaller one, and so clearly defines a superset of the sound-meaning pairings of the latter. To see that it defines exactly these, note that we can turn any grammar in the unrestricted class into one in the restricted form in a regular way (binarizing the RHSs of rules, adding new rules and non-terminals as need be, etc). If to each rule is associated a lambda term, this binarization process can be applied to the lambda terms as well.
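      To illustrate the binarization step, here is a deliberately simplified sketch (the rule format is made up, and the lambda-term side is only gestured at in a comment):

          # Illustrative only: binarize CFG rules with more than two symbols on the
          # right-hand side by introducing fresh nonterminals (assumed not already in
          # use). The string language is unchanged; an associated lambda term can be
          # curried along in the same way.
          def binarize(rules):
              """rules: list of (lhs, rhs_tuple) pairs. Returns an equivalent binary grammar."""
              out, fresh = [], 0
              for lhs, rhs in rules:
                  while len(rhs) > 2:
                      new = f"X{fresh}"
                      fresh += 1
                      out.append((lhs, (rhs[0], new)))
                      lhs, rhs = new, rhs[1:]
                  out.append((lhs, tuple(rhs)))
              return out

          print(binarize([("S", ("NP", "V", "NP", "PP"))]))
          # [('S', ('NP', 'X0')), ('X0', ('V', 'X1')), ('X1', ('NP', 'PP'))]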

      It is not obvious that the PISH restricts the definable sound-meaning pairs. (Or in comparison to what alternative analysis.) I would conjecture not.

    10. Let's suppose that PISH is vacuous in this sense (I also think that it probably is), then on its own claiming that PISH is an innate constraint on the class of *grammars* does not count as a solution to this particular POS problem.

      To move forward you either need to specify a learning algorithm of some sort, or to tighten up the ancillary assumptions a lot. But that seems problematic: the past attempts at this don't give one much cause for optimism.

    11. @Jeff: I think Alex is just saying that with a bit of creativity you could take a PISH grammar and make it do exactly the things you don't want. Or, put another way, when you say "holding everything else constant" it will probably depend a lot on what you're holding constant. I'm less imaginative but at the very least you would have to say what a predicate is and what a subject is and spell them out pretty nicely.

    12. Ok, so here's what I'm understanding at this moment. Sometimes one is surprised to learn that certain constraints do not restrict the weak generative capacity of a system. And, in my example, I'm assuming not just PISH but a whole lot of other stuff about how grammars can work. I get that and those seem like reasonable concerns. But here's my question in a simpler form. If a sentential predicate travels with its subject, how can you generate a string where a reflexive within a predicate is NOT bound by the subject? You can't (by hypothesis). So, if the string "proud of himself" is always classified as a sentential predicate, it can never occur as a substring in a string whose interpretation has something other than the subject binding "himself". This is not a fancy grammatical argument and it does not seem to me to depend on much else. So, the only question you guys can be asking is whether it is fair to presume that "proud of himself" is recognized as a sentential predicate, hence subject to the PISH-related generalization. Is that it?

    13. Or to say it a different way, you're saying that a grammar in which "proud of himself" is not a sentential predicate and does not always travel with its subject can be made to behave like a grammar in which it is and it does.

      That is, what grammar in which PISH does not hold can make it so that "Norbert knows how proud of himself Ellen is" is bad?

      Can you show me how to do that (in a way that is still compositional)?

    14. @Jeff: I haven't had time to fully think through all the issues, but here's an informal sketch of a procedure that takes a grammar with PISH and translates it into one that is both strongly equivalent modulo PISH-traces and compositional (if by compositional you mean that the semantics is computed from the syntactic tree in a bottom-up fashion without look-ahead). The idea is actually very simple and just relies on the fact that you can replace certain instances of movement by base-merger while keeping the generated language and interpretations fixed:

      1) Define a function NoPISH that maps every derivation tree t to its PISH-free counterpart t' such that if phrase p is merged in a PISH-configuration in t and moves to SpecXP, p is directly merged in SpecXP in t'. Your grammar is specified by the set of derivation trees thus obtained.
      2) You already have a function SEM_PISH that computes the semantics for the PISH-trees. For the PISH-free counterparts, the semantic interpretation SEM_NoPISH is the composition of the inverse of NoPISH (the translation back into PISH-trees) and SEM_PISH.

      Without further restrictions on what grammars and PISH-configurations we consider, the PISH-free grammar might belong to a much more powerful class of grammars, and the same is true for SEM_NoPISH. But the case at hand is simple enough that this can be avoided, I think:

      1) MGs can be used to implement the PISH-based analysis you provide.
      2) Every MG's derivation tree language belongs to a specific class of tree languages, the regular tree languages.
      3) The function NoPISH preserves regularity as long as the predicate from which the phrase is moved does not move on its own after the phrase has been extracted (remnant movement). That condition is satisfied in your examples.
      4) MG derivation tree languages are closed under intersection with regular tree languages, so we can construct an MG lexicon with all the needed lexical items and intersect the grammar's derivation tree language with the output of NoPISH to obtain the desired PISH-free MG.
      5) The mapping from derivation trees to LF-trees is definable in monadic second-order logic (MSO).
      6) The inverse of an MSO-mapping is an MSO-mapping.
      7) MSO-mappings are closed under composition.

      The shakiest property is point 5, which depends a lot on what you want LF-trees to look like, in particular with respect to QR. But that is independent of whether your grammar uses PISH.

      Also keep in mind that this procedure actually achieves a lot more than what you asked for because it also keeps the generated trees almost exactly the same. If it is just about grammaticality and preserving the sound-meaning mappings, then things become a lot easier.

    15. Correction: (2c) involves remnant movement, contrary to my earlier claim in 3), but since it is extremely local it still does not pose a problem.

    16. @Thomas: This is also your recipe for dealing with (a case of) late operations (?); it looks reasonable to me!

      This is commensurate with Alex's original observation that the PISH may very well (despite being a normal form) change important properties of the allowable grammars; in your construction sketch, the unPISHed grammar can be much bigger than the original.

      What is being given up in the move to noPISH is also the general idea about how binding conditions should be stated, although it may be the case that (SEM_PISH . NoPISH^{-1}) can be deforested into something elegant.

    17. Of course PISH is vacuous in the sense that you can get rid of it if you also change a bunch of other stuff and have no regard for the simplicity/elegance of the resulting system. Pretty much any substantive universal will be vacuous by that criterion. I think that you guys are not objecting to PISH as such but to any attempt to assign an explanatory role to substantive universals in language acquisition. Or perhaps not, in which case I'd be interested to see an example of a non-vacuous (in your sense) substantive universal.

    18. @Greg: The idea is similar, yes, except that the Late Merge argument turns Merge into a particular kind of movement, and this one turns movement into Merge.

      @AlexD: I don't think anybody's directly objecting to PISH but rather trying to figure out how the individual assumptions interact. You're right that the grammar above is very inelegant, Greg's remark about binding theory also homes in on this. But as I mentioned in my other post further down not everybody cares about elegance quite as much.

      Jeff's post starts out lamenting that certain groups of researchers do not appreciate POS arguments. Well here's one reason why: they see a viable alternative (or at least it looks like one to them) that's still good enough for their purposes and only loses those lofty properties we linguists care about, i.e. elegance, succinctness, and generality with respect to specific constructions. Some may even view that shortcoming as a virtue because "biological systems are messy and redundant". Others might just not care about the specific structural descriptions all that much because a state-of-the-art learning algorithm will automatically infer structures that work well enough for their narrowly restricted purposes (machine translation, data mining, etc.). We may not like that, we may consider them short-sighted, we may be bewildered by their pragmatism, but a more useful reaction is to evaluate how much of the argument can be stated without the linguistic backdrop.

      As for your second point about the importance of substantive universals, here's one that would have strong effects: the set of movement features consists of wh, case, and top. Now MGs can only generate a proper subclass of the MCFLs. Restrictions of this kind are hugely important for learnability. For example, the class of strictly local string languages is not learnable in the limit, but the subclass of grammars where the size of the locality domain is bounded by some k is learnable.

    19. @Thomas: Do you see a viable alternative to Jeff’s account? I’m sure we all agree that PISH as such need not be involved (there are many possible ways of rejigging the grammar), but what would be a substantially different alternative analysis that would cast doubt on the existence of a similarly rich innate structure? I don’t see how giving up the lofty goals you mention is going to make it easier to solve the problem in this case.

      Regarding your suggestion for a non-vacuous substantive universal, the one you suggest only barely counts as substantive. The labels in themselves are meaningless, so all you are really proposing is a restriction to three movement features. I’m not sure if that falls clearly into the formal or substantive category, but in any case, it’s not all that similar to the sorts of formal universals that linguists typically propose.

    20. *to the sorts of substantive universals

    21. @Alex D: I knew you would class that as a formal example :)

      I'd be inclined to agree, but at the same time the prototypical example of a substantive universal is the restriction to specific categories (V, N, ...). So the same should go for movement features. One might say that the example only works because the substantive universal implies a formal one, but what kind of substantive universal doesn't? Phases being tied to C, v, and D still implies the concept of phases as a formal universal. Even PISH implies something universal about the grammar, namely that there is a mechanism for identifying subjects of predicates.

      Maybe the argument is that these also restrict the grammar beyond the proposed formal requirement? That wouldn't invalidate my argument above, though: if there's only specific types of movement features that can only be hosted by specific heads, that also rules out certain kinds of movement that would work with the basic restriction to n movement features. So overall, I share your feeling that the example above is very sneaky, but I can't think of a principled reason that would disqualify it.

      Turning to the issue of alternatives, I addressed that in my little thought experiment below. It doesn't matter whether there actually is a viable alternative, the issue is that many people believe they have an alternative and wouldn't be convinced by Jeff's argument because they can cast the whole argument into doubt by rejecting one of its premisses.

    22. As far as I know, the formal/substantive distinction is not precisely defined, so I have no objection to classifying the restriction to wh foc and top as formal. The thing is I'm having a hard time thinking of a substantive universal (that has actually been proposed independently by linguists) that meets your criteria for non-vacuity. In other words, I think the objection that you and others are making to Jeff's use of PISH is not specific to PISH but would apply to any other extant hypothesized substantive universal. Now you may be right to raise such a wide-ranging objection, but I think the wide range of the objection is worth noting.

    23. Yes, I agree. But doesn't that worry people? If they are all vacuous (in this sense) then the evidential base for them must be entirely based on meta-theoretic criteria like simplicity and elegance (as Thomas notes). And why should we think that the real grammars are simple or elegant? The methodological reasons don't apply, as these are natural biological objects, not scientific theories. Is the genome simple or elegant? It has 4 bases rather than 2.
      I worry that the metatheoretical criteria that syntacticians use to construct theories are not going to point to the psychologically real grammars.

    24. @Alex C: I'm not sure if the issue of simplicity and elegance is all that relevant to the POS. If it's actually PISH's ugly sibling that's doing the work, the POS argument still goes through mutatis mutandis. Inevitably, the evidence for principle X of the grammar being innate can be at most as strong as the evidence for principle X indeed being a principle of the grammar. Now if we are just going to be skeptics about every grammatical principle that generative syntacticians (think they) have discovered over the past 60 years, I don't know where to go from there.

    25. I don't understand that. If PISH is vacuous, then Jeff's proposed solution is not a solution. Because there are grammars in the class that will generalise in the wrong way from the data in question.

      I think there are two reasonable ways to answer this problem: one (which we can call the P & P way) is to say, PISH is not vacuous if we make additional assumptions X, Y and Z, and given these assumptions there are just no grammars in the class of possible grammars that can get the facts wrong in the relevant way, and therefore this is a valid explanation even without specifying a learning algorithm.

      The other way (the Aspects way) is to say: sure, PISH is vacuous, but the grammars without PISH are much larger (or rated worse on this evaluation metric) than the ones with PISH and so, given a learning algorithm that uses this particular evaluation metric we can still explain how the learner generalises in this way rather than the other.

  7. An aside that has nothing to do with the POS: the argument for the PISH based on the conjoinability of active and passive VPs is cute, but do we have any independent evidence that A-movement is subject to the CSC? (I've been wondering about this for a long time and this is the closest thing I've found to an appropriate time and place to ask about it. Hope it doesn't derail things.)

    Replies
    1. Consider:
      (1) It seems to be likely that Jeff is tall and that Norbert is handsome
      This cannot mean that it seems to be likely that Jeff is tall and it seems that Norbert is handsome. Thus, the second conjunct must be interpreted as the complement of 'likely' rather than that of 'seem.' Why? Well, the CSC would explain why, if A-movement were subject to it.
      Note that (2) seems fine:
      (2) John expected Bill to win the Preakness and that Frank would win the Kentucky Derby.
      If this is so, then the Vs can take both finite and non-finite complements, thus pre-empting one possible objection to the cases in (1).
      How's this?

    2. One of the first papers to take this up that I know of is:

      Vivian Lin. "A way to undo A-movement". WCCFL 20 Proceedings, ed. K. Megerdoomian and L. A. Bar-el, pp. 358–371. Somerville, MA: Cascadilla Press.

      https://web.archive.org/web/20100613180814/http://ling.wisc.edu/vlin/Downloads/LinAMove.pdf

      -- a very interesting paper too. (She developed the work a bit more in her dissertation: http://dspace.mit.edu/handle/1721.1/8151)

    3. Well, first, on certain raising-to-object-type assumptions the fact that (2) is OK looks like evidence that A-movement is *not* subject to the CSC, doesn't it?

      But let's just take it as given that you can conjoin finite and non-finite complements. What (1) tests, then, is whether you can have an expletive raising out of the first (non-finite) conjunct, and no raising out of the second (finite) conjunct. I don't have a clear judgement on (1), but can we make it simpler by replacing "to be likely that Jeff is tall" with "to rain"?
      (3) It seems to rain and that Norbert is handsome.
      This actually seems not too bad, at least to me. Certainly not as bad as a usual CSC violation. Hmmm.

  8. Back on the topic at hand (sorry, Tim) ...

    Norbert is right (see above) that I care more about the learning problems where there is cross-language variation. Because those are the ones where we have some work to do. (Folks need jobs, you know.) The ones that can be solved simply by appealing to innate constraints interest me less, because they shift the burden to the evolution problem ("Darwin's Problem"), and I have sworn to not attempt to say anything about evolution until I'm at least 55.

    [Terminological aside: Norbert should distinguish MT (his comment) from ML (Chris's comment). MT is what Google struggles with when they give me garbled translations from Japanese. ML is what Google struggles with when they send Norbert and me ads for hair care products.]

    Norbert's spirited reaction is out of proportion to what Chris or I said above. I was not objecting to Jeff's challenge, and certainly not complaining that it should be dismissed because it depends on theory-internal constructs. I don't think Chris was either. [In fact, if I'm not mistaken about who Chris is, the one co-authored study featuring Chris, Jeff, and me is one that tests in adults exactly the reconstruction paradigm that Jeff discusses here for children (Omaki et al. 2007).] What I take Chris to be saying is not "meh, ML can solve this", but rather "sure, successful learners require a lot of detailed stuff; MLers don't even regard that as controversial." My point, inspired by Chris, is that POS arguments tend to be shorthand for model comparison arguments. When we say, you can't (directly) learn X from the available evidence, we're implicitly assuming a certain kind of superficial encoding and learning model. And when we say, "so you need to assume additional structure in the learner", we're effectively saying that there's an alternative encoding and learning model that makes the observed outcome straightforward. This is not trivializing Jeff's challenge. It's placing it in the context of difficult problems that learning theorists tackle all the time.

    And MLers are generally motivated by getting the job done, and don't mind building detailed stuff into the learner if it will achieve the desired result. Their models will adopt as much innate structure as you want, if it will ensure that the hair product ads go to Jeff rather than to Norbert and me.

    Replies
    1. Colin,
      I'm pretty sure that you didn't intend to endorse the following three presuppositions of your first paragraph. But I'd like to disagree with them anyway.

      (1) The classic POS arguments that address invariant constraints are also "learning problems." Perhaps some universals are universally learned. But I see no reason for thinking that the usual suspects are learned, unless we stipulate that all knowledge acquisition counts as learning, so long as experience was somehow causally relevant.

      (2) We, whoever we are, don't have work to do with regard to invariant constraints--at least not work that would count as, you know, a job--because those "learning problems" can be "solved simply by appealing to innate constraints."

      (3) Recognizing that innately determined constraints are innately determined "shifts the burden" to discussions of evolution--inviting just-so stories--as opposed to attempts to reduce the diverse constraints to a more principled basis, partly in an attempt to see if the constraints can be described as reflections of deeper architectural constraints, as in evo-devo (as opposed to neo-Darwinian) biology.

      Some of us, even some of us not yet 55, still think that the classic POS arguments--even the bad old ones involving aux inversion--provide a wonderful set of explananda for anyone trying to figure out which theoretical vocabulary is best for purposes of describing the particular I-languages that are among the humanly acquirable I-languages. I have nothing against attempts to ask how kids who grow up surrounded by people who speak idiolects of English end up acquiring idiolects of English. Some of my best friends do that kind of thing for a living. But if we all agree that addressing specific acquisition questions requires substantive assumptions about the vocabulary that kids use to formulate grammars and characterize the "data of experience," then I think it's super important to NOT downplay the POS arguments that address invariant constraints. Until we know how to formulate grammars in a way that makes the relevant space of options available--without also making a much larger space available--we're not going to formulate grammars in the ways that kids do in response to experience.

    2. First, thx: I did mean ML. I am T/L colorblind.
      Second, ah "context"! Great, I love contextualizers. If I understand you correctly, what you are saying is that there is no reason for Chris's avowed skepticism concerning POS arguments, as these are just part and parcel of what they always do. So the worry that one cannot deal with POS problems without solving all of linguistic theory is just the same problem that any ML project has: after all you cannot piecemeal solve any ML problem until you have solved them all. Does anyone really think this? In any domain? It's all or nothing? You are beginning to sound like Alex C here.

      POS problems are interesting, for they manage to carve out problems in manageable bits. You can address a circumscribed (relatively speaking) problem and deal with it. What needs building in to address binding? Given the results of this, what needs to be built in to solve PISH-like data? The utility of POS is that it can be used to progressively deepen our understanding of FL, as Jeff has demonstrated in one example. But as we all know there are endlessly many of these around.

      As for your preference in problems: there is no accounting for taste. But I personally believe that addressing YOUR problems will require solving some of mine. The principles that Jeff is hunting look to be invariant. They form a syntactic scaffolding for the ones that you are interested in. Take your that-t effect research with Dustin. It assumes that fixed subject effects are invariant and then hunts for ways to circumscribe these in Rizzi-ish ways. That is a well-known recipe, based on invariant UG principles which POS considerations helped discover.

      Let's end with some pablum: POS arguments are empirical, subject to all the vagaries thereof. ML models are as well. I'm glad that MLers are not put off by sparse/absent data issues. I just wish they would address problems that I care about rather than worrying about which hair products I ought to buy.

    3. I think that separating out learning problems into those where there is cross-linguistic variation and those where there isn't is a false distinction that presupposes some analyses. Learning problems that involve cross-linguistic variation are problems where we already have some understanding of the constraints on possible grammars. In many cases (as Norbert notes) the discovery of the constraints on possible grammars grows out of the initial discovery that there is a POS problem. Solutions to POS problems can be fundamentally "principle" based, i.e., the postulation of an invariant. Or they can be "parameter" based, i.e., the postulation of constrained variation. When you're in the domain of constrained variation, then the learning problems are better described as "Opacity of the Stimulus" problems, in that one has to discover which out of a constrained space of grammars is the actual grammar, which is nontrivial and certainly worth working on. But to be interested in Opacity of the Stimulus over Poverty of the Stimulus just means that you'd rather work on problems where some of the solutions to the POS have been figured out.

    4. Norbert,
      I’m still puzzled at your reaction. I think you must be seeing something in what I wrote that I did not intend. I think that carving out model problems of the kind that Jeff raises is great, and that making the problem and the solution as explicit as possible is helpful. I guess I’ll leave it at that.

      Paul,
      thanks for raising those issues and thereby giving me a chance to clarify. I owe an apology to folks whose labors I may have trivialized.

      #1: That’s right. I don’t think that the invariances are learned, and I make a fuss about this in Psycholinguistics II each spring. Any account that assumes that the invariances are always learned needs to explain why learning doesn’t sometimes fail. I don’t know of particularly good answers to that challenge.

      #2: Yes, I certainly don't think that identifying invariances is trivial or worthless. (I'm not sure that I presupposed that either. Asserting that "it'll take a lot of work to figure out Y" doesn't entail that "there's no (useful) work in figuring out X" … though I see that is a plausible inference.) I do think that the problems tend to invite different kinds of work.

      #3: Yes and no. You're right that reducing diverse invariances to deeper underlying constraints is valuable and interesting. At its best, this kind of work is extremely important. But in practice it is easy to slide into just-so stories, and the good stuff can be hard to sort out from the not-so-good stuff. (The presupposition that you're reacting to here is one that Norbert introduced into the thread.)

      As for the value of classic arguments involving subject-aux inversion (SAI) and the like. I'm afraid I do believe that they haven't been terribly successful. Despite the efforts of many smart folks whom I hold in high regard, colleagues included. What was introduced as a simple model case seems to have been interpreted as a grand challenge. This has apparently led some to assume that if they can solve SAI, then a solution to the entire language learning problem is just around the corner (it's not, of course). It has shone a light on a domain that practicing linguists tend to regard as so straightforward as to be uninteresting. So it has drawn learning theorists' attention away from the harder problems. And it has placed the fight in a domain that Joe Linguist just doesn't get too exercised about, because it doesn't touch on his day-to-day concerns. That's why I find Jeff's challenge more promising.

    5. And then there’s the question of taste. It’s true that I personally find the learning problems involving cross-language variation more interesting, much as I acknowledge the importance of the other problems (and I agree with Jeff that they’re not as different as this discussion implies). But this thread has pushed me to think about why I like the problems of variation more. Here’s a try.

      a. Degree of confidence in the problem. When we take a series of invariances and try to reduce them to deeper principles, there might be a great unifying solution, or there might not. The world might turn out to be kind of messy. In contrast, when I encounter hard-to-observe properties that vary across languages (or idiolects), then I feel pretty confident that there's an interesting solution, even if I don't yet have a clue about what it is. Can constraints on movement and anaphora ultimately be reduced to the same thing? Maybe, maybe not. Even when looking at concrete proposals, it's hard to assess their success. If Japanese and English show different possibilities for scope inversion, is there some way that learners figure this out from their experience? I'm pretty confident that there is, and I can fairly easily tell whether a solution works.

      b. Productive surprises. Across different domains, I tend to get exercised by surprising contrasts. Perhaps because I suspect that understanding them is likely to be particularly informative. The observation that the same experiments were eliciting brain activity in quite different regions, depending on whether you're measuring using fMRI or MEG, was quite surprising, and it turned out to be hugely informative (Lau et al. 2008). The observation that people systematically screw up in computing subject-verb agreement but seem not to do the same thing when linking reflexives to subjects was a surprise, but it turned out to be the entry point into a goldmine (Dillon et al. 2013, and others). Cross-language variation in scope and island effects appeals to me for similar reasons.

      c. I've got to admit, I have "languistic" tendencies, to use Norbert's term of endearment. Interest in the learning of variant properties harnesses those dangerous tendencies.

      d. Related to (c), one of my biggest disappointments is the decline of interest from linguists in learning problems, which has been accompanied by a loss of focus on understanding constraints on cross-language variation. (Yes, there’s a lot of work on the topic, but it’s hard to identify what would count as real surprises.) My hunch is that reconnecting these two problems would be mutually highly beneficial, and would help to provide more focus for the study of language variation. My sense is that the study of invariances hasn’t encountered similar challenges.

      So it’s a matter of taste. But also a matter of placing bets on where to dig for gold.

    6. Ewan:
      How much of this is driven by the ceteris paribus clause that all non-math work in the sciences relies on? Yes, there is more than PISH. There is also the binding theory and the theory of phrase structure, to name two kinds of assumptions that are being held constant. Is the claim simply that were we to change these we could derive the *cases even given PISH? Is that the problem? You know why I'm asking, right? Because if that is the problem, this is something that goes way beyond the infant domain of linguistics. Gravity does not "explain" elliptical orbits or how balls roll down inclined planes or the tides without a lot of ceteris being held paribus. In fact, absent a final theory, NOTHING goes forward without such assumptions. Is this really what the criticism amounts to? If so, might one not conclude that such criticism is, ahem, both uninteresting and very unhelpful?

      The way we try to explore issues, especially in the special sciences, is by comparing purported explanations of effects. Jeff provided one "sketch" that can be made more precise as required. The way to show it is on the wrong track is to provide another story, not the promissory note for one, but a particular one. We can then look to see how (or if) they fundamentally differ or if one is better than another on empirical or other grounds. That is what neither Alex C nor Greg K has yet provided. In my experience, the repeated claims that there are many different ways of explaining the "facts" have generally turned out to be incorrect. Most stories look pretty much alike, despite superficial-looking notational differences. So, coming out with an alternative is useful.

      Last point: POS arguments are NEVER dispositive. But this is just to say that science outside math does not truck in proofs. That's what makes it an empirical science, a place where ceteris is always extremely paribus.

    7. I completely agree, Norbert, that it depends on some ancillary assumptions, as everything does. I am asking what those ancillary assumptions are, because on one standard set of assumptions (i.e. Stablerian MGs) the result doesn't go through (I think ... I am not sure). A physicist, if you ask him/her about elliptical orbits, can give you the list of ancillary assumptions he/she needs -- point masses, inverse square law, Newton's laws of motion, er, no other forces, and that's it. They aren't a closely guarded secret only released on a need-to-know basis to those people who are putting forward other alternative theories.

    8. re Newton: these are necessary but even given these assumptions, the conclusion need not follow. Depends what else one is excluding. Say, absence of large masses within the vicinity of the orbit, or space not being homogeneous (flat) etc. There are an unbounded number of such CP clauses that are taken as background until being shown to be possibly relevant. So, that's your job: show how to derive these results given PISH and other assumptions. I should add that you will be judged on how plausible these assumptions are.

      But to reply more concretely: we assume a standard kind of X'-phrase structure, and the binding theory. Jeff is taking as stable a kind of GBish theory: Binding at SS, X'-PS rules, WH movement to Spec CP. This still does not suffice to distinguish the two cases he discusses. Nor, given all of this plus the available PLD, is there any way of distinguishing them. Add PISH and things work out. Furthermore, there is independent evidence that PISH is a good idea. Lastly, there are some reasons for thinking that PISH is derivable from more basic assumptions (see Understanding Minimalism for a discussion).

      So that's the sketch of the argument. Now, given my views, if this argument cannot be reconstructed within MG, that's a problem as much for MG as it is for PISH. However, given that MG says nothing about these cases, it's hard to know what to say. There are versions of Minimalist Program theories that do have things to say consistent with Jeff's PISH argument (e.g. see A Theory of Syntax) but MGs (at least Stabler's discussions) do not deal with these kinds of data (though Kobele's thesis does go some way toward translating these results into MG formats). So there are, I would hazard a guess, ways of doing this MGishly.

      But, and here I end, there is nothing sacred about MGs. They are attempts to formalize theories with certain properties. If they fail to do so, we can ask if the MG is a good formalization of these theories or not, OR we can ask whether we should re-analyze the data. Making the assumptions relatively explicit is not very hard. I've done so above. If you think that there is some MG or GBish grammar that, given PISH, still gets the "bad" cases, please feel free to provide these for inspection. We can then see if the claims are correct.

    9. Colin,

      Thanks for the clarifications. I’d like to offer another perspective, though, on SAI and other simple illustrations of constraints that human I-languages respect. You say, “As for the value of classic arguments involving subject-aux inversion (SAI) and the like. I’m afraid I do believe that they haven’t been terribly successful.” I think it depends on what counts as success.

      Suppose that indeed, “What was introduced as a simple model case seems to have been interpreted as a grand challenge. This has apparently led some to assume that if they can solve SAI, then a solution to the entire language learning problem is just around the corner...” I don’t think—and don't think you think—that the value of an argument lies with how people react to it, especially not if the reaction involves misunderstanding.

      I think the classic POS arguments were quite successful as models of how to focus on relatively clear phenomena—concerning how strings of words cannot be understood—that call for explanation. If such phenomena could be explained as side effects (spandrels, whatevers) of how kids respond to experience, I would find that very interesting, in part because I suspect the explanation will take a more Rationalistic form: blame the cognitive architecture that makes learnable hypotheses available, as opposed to processes of using the architecture to learn. Put another way, if the relevant constraints are learned, then POS arguments—of the sort I like—fail to identify aspects of knowledge that are not due to learning; in which case, I’d need to rethink a lot. (Put yet another way: while I’m unimpressed by the skeptical point that POS arguments might fail, I’d be very impressed by a plausible account of how the constraints in question are learned. For that would make me think that POS arguments do fail, not just in the sense of failing to convince, but as tools for getting at what they are supposed to get at.) However, if the constraints are not learned, then the classic POS arguments are interesting/successful as illustrations of how to identify explananda that call for a “non-learning” explanation.

      You speak of shining a light “on a domain that practicing linguists tend to regard as so straightforward as to be uninteresting.” I’m not sure who counts as practicing linguists, or how to assess their various tendencies. But if many who know the basic facts find the phenomena/constraints “so straightforward as to be uninteresting”—rather than (relatively) straightforward and very interesting—that’s just sad. In other domains of inquiry, finding generalizations that seem about right is cause for celebration, in part because we can then ask why the world works that way.

      For many purposes, physicists can take the law of gravity as given and get on with other questions. Still, it was good to be puzzled about how gravity could “act” at a distance, and why it is described with an inverse square law. (Trying to unify forces was also fruitful.) I agree that “learning theorists”—are they the practicing linguists?—can bracket questions about where invariant constraints come from, for purposes of addressing questions about how kids navigate the space of human I-languages given experience. But that which learning theorists presuppose can still be an important explanandum. It's not all about learning.

      I wouldn't recommend unlearned constraints as a topic for those who want to study what kids learn; though someone who wants to know what kids learn might care about the constraints. As for what Joe Linguist gets exercised about, I’d say that Joe ought to be puzzled about the source of unlearned universal principles: if Joe posits them blithely, leaving it to others to explain where they come from, then Joe should buy a beer for those who do the work invited by Joe’s posits. As many others have noted, POS arguments are basically tools for figuring out which hard work lies with coming up with the right learning algorithms, and which hard work lies elsewhere. But I agree that useful tools can be misused.

  9. Jeff asks if there is evidence that supports PISH that can lead to its acquisition. He goes through three pieces of evidence. Wouldn't a fourth be existential sentences with 'there', as in 'there is a dog in our backyard', with the subject occurring after the finite verb? I imagine that such 'there' sentences can be found in speech to children.

    Replies
    1. Nice point. But note, these are pretty poor with adjectival predicates: *there is a dog angry at Bill, sad about the loss of the Bruins. So the ECs are fine with PP predicates and verbal ones but not adjectives. The absence of these in the PLD, given some basic Bayesian magic, could lead the LAD to conclude that PISH does not hold for adjectival predicates, hence leading to the wrong conclusion wrt Jeff's data.

  10. Kratzer and Diesing showed, though, that the contrast is not really one of grammatical category but of the stage/individual distinction. Thus, we have the grammatical 'there are firemen available' or even 'there are stars visible tonight' but not '*there are firemen intelligent'.

    We could even imagine that the child, once he or she hears the EC, generalizes to all cases. So one shot of hearing an EC is enough to generalize to all categories, even if we find that there are no cases in the speech to young children of EC constructions to children. Then some other principle of the grammar rules out 'there are firemen intelligent'.

    I'm not arguing so much against POS in general as against the idea that there is little/no evidence for the PISH in the primary data. That said, I haven't looked. But these ECs, even if they are mostly found with a certain category in the speech to children, do give something that supports entertaining that hypothesis. How the child takes it from there, though, is another story.

    Replies
    1. Sorry, I meant 'even if we find that there are no cases in the speech to young children of EC constructions with adjectives'.

    2. Reasonable point. If I recall correctly I argued for using ECs as evidence for VP internal subjects (not so called then) in my first paper in 1977. God that's long ago. At any rate, the HK observation goes back to Milsark's original thesis on ECs. The point you make is reasonable, though there still may be room for POS concerns depending on how one generalizes from the PLD. After all, in the cases Jeff cites, the predicates cannot appear in ECs.

      That said, let me add one more point. POS is ONE argument form. Even if Jeff is right about the problem, it does not mean that PISH is the solution. Others have provided different accounts (e.g. Heycock via reconstruction). I do not mean to suggest that you think otherwise. But others might. POS, even when successful, is only ONE argument for a proposal. Other proposals might also meet POS demands, and even proposals that do not meet them might have many other things going for them. But having one useful argument form is nothing to sneeze at.

  11. Part of the discussion here has centered on a question of the form, "is PISH vacuous?" -- itself understood as something of the form "does PISH expand/restrict the set of sound-meaning pairs available without PISH?"

    [Though note that, in most of the comments, it is presupposed that if PISH does anything, it will be on the restrict side, not the expand side.]

    So, as several commentators have pointed out, that question cannot be answered without saying "expand/restrict in comparison to what." In that vein, I'd like to make two comments:

    (i) PISH restricts the set of possible sound-meaning pairs even in comparison to a grammar where binders, operators, and quantifiers take scope strictly according to their surface positions. Forgetting for a moment that such a grammar would be a nonstarter for other reasons (e.g. Jeff's (3a)), such a grammar predicts that (2d) would be a fine sound-meaning pairing, and PISH eliminates that possibility (due to insisting on a predicate-internal binder, that 'himself' would then pick up as its closest antecedent; yes, that is an ancillary assumption, but show me a theory of 'himself' that doesn't assume something like this). So: restricts.

    (ii) PISH restricts the set of possible sound-meaning pairs even in comparison to a grammar where binders, operators, and quantifiers can take scope wherever they please. Forgetting for a moment that such a grammar, too, would be a nonstarter (see, e.g., Jeff's (1b,d), (3c)), such a grammar predicts that (2d) would also have a reading where 'Norbert' and 'Ellen' each chose, on a whim, to take scope in their surface positions. Meaning there exists a derivation that predicts (2d) to be a licit sound-meaning pairing, and PISH eliminates this possibility (in the same manner described in (i)). So: restricts.

    Would one of the people (Alex C, Greg K) with the "hunch" that PISH does not affect the set of licit sound-meaning pairs -- and is therefore vacuous -- please explain what the X is such that "PISH does not expand/restrict the set of sound-meaning pairs made available by the PISH-less X"?

    Thanks in advance.

    Replies
    1. X is a set of grammars, not an individual grammar. So if we have a particular grammar G, and we add PISH or switch it off, then this will change the sets of sound/meaning pairs defined. But if we have a set of grammars SG, then the set of languages defined may not be changed because of the high degree of flexibility in the grammar formalisms. Indeed there are, under standard assumptions, infinitely many different grammars that all define the same set of sound meaning pairs as each other (say those of English). And so ruling out one of these grammars is not enough -- even if it is your favourite grammar -- you need to rule out all of the other infinitely many grammars that may or may not use PISH. That turns out to be really hard.
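      For concreteness, here is a toy sketch of that point (purely illustrative, using context-free string grammars rather than MGs): two grammars with different rules that nevertheless generate exactly the same strings, so no amount of string data can choose between them.

      from collections import deque

      def generate(rules, start, max_len):
          # Enumerate all terminal strings of length <= max_len derivable from `start`.
          # `rules` maps each nonterminal to a list of right-hand sides (tuples of symbols);
          # any symbol that is not a key of `rules` counts as a terminal.
          results, seen = set(), set()
          queue = deque([(start,)])
          while queue:
              form = queue.popleft()
              if sum(1 for sym in form if sym not in rules) > max_len:
                  continue  # too many terminals already; this branch can only get longer
              i = next((k for k, sym in enumerate(form) if sym in rules), None)
              if i is None:
                  results.add("".join(form))
                  continue
              for rhs in rules[form[i]]:
                  new = form[:i] + rhs + form[i + 1:]
                  if new not in seen:
                      seen.add(new)
                      queue.append(new)
          return results

      # G1:  S -> a S b | a b
      g1 = {"S": [("a", "S", "b"), ("a", "b")]}
      # G2:  S -> a T ;  T -> S b | b   (different rules, same string set: a^n b^n)
      g2 = {"S": [("a", "T")], "T": [("S", "b"), ("b",)]}

      print(generate(g1, "S", 8) == generate(g2, "S", 8))  # True: indistinguishable on strings

      Sound-meaning pairs and realistic grammar classes are of course far richer than this, but the moral carries over: ruling out one grammar on extensional grounds leaves all of its extensionally equivalent rivals standing.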

    2. This comment has been removed by the author.

    3. Alex and Greg, this discussion is scattered across three threads now, but because I want to see how this works I'll prime the pump here. The thing Jeff says is:

      - if the string "proud of himself" is always classified as a sentential predicate [SP], it can never occur [with an interpretation with] other than the subject binding "himself" [because binding is always to the local subject]

      background,
      - there is something called an SP
      - it always has a single unique subject S
      - what S can be (in GB-ish terms) is either:
      - the thing internal to the SP in the specifier position, let's call that position B (i.e., the argument that himself isn't)
      - things that form a chain with B
      - and all this works equally for a reconstructed SP

      You say you couldn't translate these postulates into MGs in such a way that it would actually rule anything out. I say show me why, I'm as surprised as everyone else, but I'll believe you if you show me.

      Here are the examples broken down (I hope I haven't made a mistake here, Jeff keeps implying the picture NP is also an SP, and I don't know enough about syntax to know if that's true, all I remember is people go back and forth about binding into picture NPs - correct me if I'm wrong)

      (1) a/b. "picture of herself" is an SP, Ellen is S (thus Norbert is not S) because Ellen forms a chain with B(SP)
      (1) c/d. "proud of herself" is an SP, Ellen is S (thus Norbert is not S) because Ellen forms a chain with B(SP)

      (2) a/b. "picture of herself" is an SP; that SP can be wh-reconstructed to painted _; Ellen is S (thus Norbert is not S) because Ellen forms a chain with B in the reconstructed SP
      (2) c/d. "proud of herself" is an SP; that SP can be wh-reconstructed to painted _; Ellen is S (thus Norbert is not S) because Ellen forms a chain with B in the reconstructed SP

      You will need some ancillary assumptions to make sure only Ellen and not Norbert can form a chain with B in both cases. In both, if I remember my syntax right, movement to the Norbert position would be too far for A movement, and I suppose it would also be bad because it's out of a specifier if it were out of the wh-raised SP in (2).

      If it comes down to these ancillary assumptions then a professional can help fill them in better than me. However I don't imagine it's these basic constraints on movement that are the problem. So go slowly and tell us which item (or interaction of items) looks hard to encode in an MG. Everything I see here rules out the offending sound-meaning pairs.

    4. Ewan, stick to phonology! "picture of herself" is not a sentential predicate and "proud of herself" is. That's pretty much the whole story. Because "picture of herself" is not a sentential predicate, then when "which pic of herself" is fronted in the embedded clause, it can be interpreted relative to the embedded subject or the matrix subject (allowing either to be the antecedent of the reflexive). "proud of herself" is a sentential predicate, so no matter where "how proud of herself" occurs, it must be related to the subject of that predicate.

    5. @AlexC: So wait, the argument is "PISH is vacuous because I can think of some (potentially infinite) class of grammars SG such that the effect of PISH on a particular grammar G does not move it from inside SG to outside of SG, or from outside of SG to inside of SG"? [And mind you, I have not yet seen what that SG is. You have alluded to it but not specified it.]

      That seems like a very low bar for calling something "vacuous"...

    6. Omer, you are quite right it depends on the class of grammars. For some classes of grammars it will be vacuous and for some others (e.g. classes consisting of only one grammar) it will not be vacuous. Under the standard MG assumptions it does seem to be vacuous. I don't quite know what assumptions Jeff has in mind, or even if they are precise enough to tell whether they make it vacuous or not -- but from my perspective it seems like it is crucially important to Jeff's explanation that it is *not* vacuous.

    7. @Omer: If Alex's and my hunch is right, we are saying that the PISH is a normal form for grammars (like Chomsky normal form); every grammar would have a PISH-equivalent. A normal form is important: it shows that the full power of the class of grammars can be realized by a syntactically limited class, which can be useful both for understanding (proofs) and for applications (such as parsing or learning).
      Saying that it is a `vacuous' restriction is not an insult (think of string vacuous movement), but rather a precise characterization of what effects it has on the class of describable languages.
      One of the difficulties in knowing whether this is the case (i.e., why it is just a `hunch') is that it is not clear how to formulate the PISH in a general way without making a host of other stipulations. For example, what is a predicate, and what is its subject. While we can settle this in many ways (by, for example, postulating a fixed universal category system, and stipulating that Vs and As (for example) are predicates), having a more general characterization allows us to pinpoint just what it is really doing, and what role it is playing in the grammar. (Otherwise, you are in the situation we are in now, where we say `PISH' but really mean `PISH + ... '.)

    8. I bristled at "picture" being a sentential predicate but deferred to the authority of the logic. (In my defense, this was because I misread the example as having a star in front of "Norbert remembered which picture of himself Ellen painted", i.e. as being bad, a judgment which I would have been perfectly happy to believe.)

      But if I understand Greg right, not one single detail turns out to be important, because the hunch is based merely on the fact that you need to stipulate that "proud" is a predicate and "picture" is not - really? Don't you need to do that anyway? How much deeper can you go?

  12. As a thought experiment, I tried to pick out all the assumptions of the argument that POS-sceptics might disagree with. There's a lot of them --- if you're already opposed to POS (or anything stronger than the claim that blank slate learning is impossible), you've got a rich supply of supposed deal breakers to choose from. They fall into three broad groups: 1) disagreement about the object of study, 2) disagreement about the technical machinery, 3) disagreement about the goals of the theory.

    An example of point 1 is the assumption of a categorical split between grammatical and ungrammatical and that the violation of a single principle induces ungrammaticality. In a probabilistic framework, it might easily be the case that the well-formed sentences also violate some binding principle, but there are other factors that keep them above the ungrammaticality threshold.

    Point 2 is rather obvious; it's what most of the discussion so far has focused on: our assumptions about binding, phrase structure, movement, the mapping between strings and meanings, etc.

    Point 3 is best exemplified by the following passage:

    the best solution on the table is the one in which PISH is innate. This solution is the best because it explains with a single mechanism the pattern of facts in (2), the ambiguity of (3), the interpretive properties of (5) and its German counterparts, and the grammaticality of (6).

    This is a linguist's notion of best solution: one explanation for many superficially different phenomena. An engineer, on the other hand, will worry whether this solution requires a more powerful formalism than if these phenomena were treated in isolation. Stating generalizations is expensive; MGs, for example, can generate copies even if they don't have copy movement, but only in a very roundabout way that is difficult to decipher. With copy movement, it becomes trivial to refer to copies, but the formalism also becomes much more powerful. And more powerful formalisms also have higher resource demands, so a psychologist might be similarly disinclined to sacrifice psychological feasibility for scientific elegance.

  13. I’m surprised at Thomas’s response. So I suspect that I must be misunderstanding something.

    #1. Replacing categorical good/bad with continuous acceptability scales does not make the problem any easier, as far as I can see. For what it’s worth, we tested the facts in Jeff’s paradigm in (2), and the ratings are uncommonly categorical (Omaki, Dyer, Malhotra, Sprouse, Lidz, & Phillips, 2007).

    2a. 5.69
    2b. 5.67
    2c. 5.39
    2d. 2.31
    (1-to-7 rating scale).

    #2. The problem doesn’t depend heavily on the technical machinery. You can be squeamish about the proposed solutions to a problem, but surely that doesn’t make the problem evaporate. (If only we could make all of our problems go away so easily.)

    #3. One can say, “I don’t find that solution very satisfying”, but in the absence of an alternative solution, it perhaps defaults to being the best solution. Again, dislike of a solution to a problem does not make the problem itself disappear, does it?

    But I think that Thomas's formulation does accurately characterize some of the reactions that one encounters in this domain, where the rhetorical strategy is, roughly, "If I can pooh-pooh your solution, or make your problem sound even harder, then perhaps I can persuade you that the problem doesn't exist."

    Replies
    1. This comment has been removed by the author.

    2. The point of #1 was that somebody might even reject the framing of the problem as a two-way opposition between grammatical and ungrammatical. Basically, the pattern could be a side-effect of several interacting factors in these sentences, so that if we look at other sentences with the same construction but other differences, the supposed pattern disappears. For instance, one may wonder how the judgments change if we increase the distance between the antecedent and the reflexive. My hunch is that, due to performance limitations, the values will get a lot closer, and you can probably also create grammaticality illusions. So if somebody thinks the learning problem is one of matching performance rather than competence, the entire argument goes down the drain immediately.

      Now suppose that we are talking to Dora the computer scientist, and Dora can be convinced to go beyond #1 and accept that the problem we see is one of the grammar showing a principled contrast with a specific construction. Then Dora still has to be convinced that it is a POS problem. Jeff cites evidence that sentences of this form do not occur in child-directed speech, which, strictly speaking, is enough to qualify it as an instance of POS. But in that form the argument is very weak, because it only disproves blank slate learning, which is already known to be impossible. If that's all we have to say, we'll get an annoyed "duh" from Dora.

      As Alex pointed out, every learning problem is a POS problem. The interesting part is what counts as the relevant evidence. As linguists we think in terms of binding, c-command, licensing, scope, etc., but as a computer scientist, Dora may be inclined to think in simpler terms, e.g. n-gram grammars (#2). If that works, it would be a very simple solution, and there's very little POS when it comes to n-grams: every sentence actively contributes to the grammar. It's also not obvious that this won't work for the specific problem Jeff discusses. The other problems might need a different explanation, but those might be very different problems --- you don't expect to have one common explanation for headaches and toothaches (#3).
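      For concreteness, here is a minimal sketch of the sort of bigram "grammar" Dora might reach for (the tiny corpus below is invented purely for illustration): every sentence directly updates the counts, and the model will happily score any string, but nothing in it represents which NP binds the reflexive.

      import math
      from collections import Counter

      def train_bigrams(corpus):
          # Every observed sentence directly updates the counts -- no gap between data and "grammar".
          bigrams, contexts = Counter(), Counter()
          for sentence in corpus:
              tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
              contexts.update(tokens[:-1])
              bigrams.update(zip(tokens, tokens[1:]))
          return bigrams, contexts

      def logprob(sentence, bigrams, contexts, vocab_size, alpha=1.0):
          # Add-alpha smoothed log-probability of a word string under the bigram model.
          tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
          return sum(math.log((bigrams[(prev, cur)] + alpha) /
                              (contexts[prev] + alpha * vocab_size))
                     for prev, cur in zip(tokens, tokens[1:]))

      # Hypothetical "child-directed" corpus, invented just for this illustration.
      corpus = ["Ellen painted a picture of herself",
                "Norbert was proud of himself",
                "Norbert remembered that Ellen was proud of herself"]
      bigrams, contexts = train_bigrams(corpus)
      vocab_size = len({w for s in corpus for w in s.lower().split()} | {"</s>"})

      # The model assigns this string a number, but the number says nothing about whether
      # "himself" can be bound by "Norbert" -- it tracks strings, not sound-meaning pairs.
      print(logprob("Norbert remembered how proud of himself Ellen was",
                    bigrams, contexts, vocab_size))

      Whether something along these lines could actually reproduce the contrast in (2) is exactly what Jeff's challenge asks Dora to show.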

      So Dora just shrugs and concludes that this problem would be solved already if linguists dropped their baroque theories and just used a much more reliable and versatile tool. And then we have yet another non-linguist who believes exactly what Jeff wanted to debunk: we linguists overstate the importance of POS, the problem is much easier than we make it out to be.

      So overall, I'm not surprised that many non-linguists are sceptical about the importance and difficulty of the POS-problem; even a well-articulated and carefully constructed argument like the one Jeff presents can easily yield this undesired conclusion.

    3. Thomas, could you help us out here? It sounds like you're endorsing the view that you can make the problem disappear simply by vaguely wondering (i) whether the problem has greater complexity, or (ii) whether the correct generalization might somehow fall out of a very simplistic learning algorithm, despite the fact that Jeff has already told you more specific details of what is(n't) in the learner's experience. If you're merely describing a reaction from others, rather than endorsing this yourself, could you suggest ways that we might better convey the argument to the (lazy) skeptics that you're describing?

      If you have specific suggestions on how the paradigm in question might collapse under closer scrutiny, then we'd love to hear about them. The strategy of saying "Well, I've heard that judgments can vary when you manipulate additional factors; so perhaps this is all a hoax" is not entirely constructive.

      Are you really arguing that "Meh, there's nothing to worry about here"? I think either I'm misunderstanding you, or we're highlighting a fundamental difference in perspective.

    4. I am definitely *not* endorsing the argument. At the same time, I'm not quite sure how the argument could be adapted to make it more convincing to these people; the scenario I described above is an instance of this, I believe.

      The only way to address that is to give a formal proof that none of these simple ideas work (without a proof you can only debunk specific instances of it, so the other side can always switch to another simple idea), but that's extremely hard (I'd wager too hard for now) because of all the parameters. So if the other side doesn't want to play, no POS argument will change their mind.

      What might be more successful, though, is to think of reasons why assuming a strong version of POS --- irrespective of whether it is an empirical fact --- would be advantageous. Similar to why we model natural languages as infinite sets even if that isn't necessary in practice. Dora the computer scientist, for instance, should be easily swayed by a demonstration (formal proof, implemented model, etc.) that a highly restricted grammar space with a simple machine learner does better in certain tasks than a sophisticated learner with few priors. That's basically a point Alex C has mentioned several times: it's not obvious that a rich UG simplifies learning. And apparently Charles is working on this right now, so we might have a more convincing argument soon.

    5. Great, I totally screwed up the essential part of the first paragraph: I'm not endorsing the sceptic's logic for dismissing the POS, yet at this point I can't offer much to substantially improve on Jeff's argument.

    6. So, if I understand the line of argument here, perhaps Dora could be mollified by producing some reasonably rich UG-based systems that actually did manage to learn analyses and predict the properties of plausibly unseen data in ways that are at least prima facie challenging for n-grams, etc, (and not grossly at variance with what is known about linguistic typology), so that even if she were not convinced, she would nevertheless be confronted with a worthy enemy to defeat, by showing that n-grams can actually do the job, rather than just a cloud of propagandistic fluff.

      This is the 'Homeric Hero' theory of science, whereby to achieve immortal fame, you need to defeat a worthy enemy (or at least be defeated by one), rather than just be 'the only game in town'.

    7. @Avery: My idea was rather that if it isn't feasible to convince somebody that the problem is actually hard, show them that there are benefits to imposing that handicap on yourself. E.g. that taking POS arguments seriously and studying learners that can succeed under the conditions linguists assume --- irrespective of whether those are actually realistic --- can lead to new algorithms with useful applications somewhere else. But I also like your Homeric Hero interpretation, in that case there doesn't even need to be any practical benefit beyond the challenge. Not sure how well it would work on computer scientists, who tend to be fairly pragmatic (a natural predisposition amplified by an enormous pressure to bring in extramural funding, I'd presume).

  14. This comment has been removed by the author.

  15. Just so it doesn't get lost in the fray, and because I believe that there is far less disagreement in this dispute than there would seem to be, I'd like to highlight Greg's comment in response to Thomas's translation procedure:

    "What is being given up in the move to noPISH is also the general idea about how binding conditions should be stated, although it may be the case that (SEM_PISH . NoPISH^{-1}) can be deforested into something elegant."

    This comment, I think, has an important and correct presupposition. In Thomas's translation there will be some homologue to the effects created by the interaction of PISH and the standard Binding Theory, tuned to operate within a grammar that doesn't make as much use of A-movement as PISH grammars do (maybe it instead uses Function Composition). Greg now asks the important question of whether this homologue will be as nice, for whatever reason, as the version that Jeff presumes. So the battle for niceness is on. But in the spirit of the day, let us for now pause to give thanks that we have not one but two ways, in principle, of getting exactly the effect we're interested in, one sort of the upside-down version of the other.

    This shows us that it is not so much the PISH that matters but whatever aspects of the grammar implement the association of arguments with predicates, and the way these interact with the BT. The Huang suggestion, described by Jeff, was just the way one makes this claim in the 'mainstream' transformational tradition (where predicates and arguments don't wait to mate, etc.). And now we can talk about a generalization (i.e., abstraction) of this suggestion, via Thomas's translation, approved by Greg. No? Am I missing something?

    And now, importantly, it seems to me that even this generalization (which comes in PISH and PISHless variants), call it the Thanksgiving Theory or TT, is a worthwhile hypothesis about acquisition. At the relevant stage of the acquisition, kids 'presume' TT.

    This does still leave an important question, which I believe Alex cares about. Namely, how much of the work done by the LAD has the form of (let's just call them) substantive constraints, versus the sorts of things that affect (weak?) generative capacity, versus strategies of induction, versus whatever else. But nobody is denying that it's important to figure that out, I think.
