
Friday, March 14, 2014

Hornstein's lament

Chomsky has long noted the tension between description and explanation. One methodological aim of the early Minimalist Program (MP) was to focus attention on this tension so as to nudge theories in a more explanatory direction.[1] The most important hygienic observation of early MP was that many proffered analyses within GB lacked explanatory power for they were as complex as the data they intended to “explain.” Early MP aimed to sharpen our appreciation of the difference between (re)description and explanation and to encourage us to question the utility of proposals that serve mainly to extend the descriptive range of our technical apparatus.

How well has this early MP goal been realized? IMO, not particularly well. I have had an opportunity of late to read more widely in the literature than I generally do and from this brief foray into the outside world it looks to me that the overriding imperative of the bulk of current syntax research is descriptive. Indeed, the explanatory urge, weak as it has been in the past, appears to me now largely non-existent. Here’s what I mean.

The papers I’ve read have roughly the following kinds of structure:

1. Phenomenon X has been analyzed in two different ways: W1 and W2. In this paper I provide evidence that both W1 and W2 are required.
2. Principle P forbids structures like S. This paper argues that phenomenon X shows that P must be weakened to P’ and/or that an additional principle P’’ is required to handle the concomitant over-generation.
3. Principle P prohibits S. Language L exhibits S. To reconcile P with S in L we augment the features in L with F, which allows L to evade P wrt S.

In each instance, the imperative to cover the relevant data points has been paramount. The explanatory costs of doing so are largely unacknowledged. Let me put this a different way. All linguists agree that ceteris paribus simpler theories are preferable to more complex ones (i.e. Ockham shaves us all!). The question is: What makes things not equal? Here is where the trade-off between explanation and description plays out.

MP considerations urge us to live with a few recalcitrant data points rather than weaken our explanatory standards.[2] The descriptive impulse urges us to weaken our theory to “capture” the strays. There is no right or wrong move in these circumstances. Which way to jump is a matter of judgment. The calculus requires balancing our explanatory and descriptive urges. My reading of the literature is that right now, the latter almost completely dominates. In other words, much (most?) current research is always ready to sacrifice explanation in service of data coverage. Why?

I can think of several reasons for this.

First, I think that the Chomsky program for linguistics has never been widely endorsed by the linguistic community. To be more precise, whereas Chomskian technology has generally garnered an enthusiastic following, the larger cognitive/bio-linguistic program that this technology was in service of has been warily embraced, if at all. How many times have you seen someone question a technical innovation because it would serve to make a solution to Plato’s or Darwin’s problem more difficult? How many papers have you read lately that aim to reduce rather than expand the number of operative principles in FL/UG? More pointedly, how often do we insist that students be able to construct Poverty of Stimulus (POS) arguments? Indeed, if my experience is anything to go by, the ability to construct a POS argument is not considered a necessary part of a linguist’s education. Not surprisingly, asking whether a given phenomenon or proposal raises “learnability” issues at all is generally not part of the evaluative calculus. The main concerns revolve around how to deploy/reconfigure the given technology to “capture the facts.”[3]

Second, most linguists take their object of study to be language not the faculty of language. Sophisticates take the goal of linguistics to be the discovery of grammatical patterns. This contrasts with the view that the goal of linguistics is to uncover the basic architecture of FL. I have previously dubbed the first group languists and the second linguists. There is no question that languists and linguists can happily co-exist and that the work of each can be beneficial to the other.[4] However, it is important to appreciate that the two groups (the first being very much bigger than the second) are engaged in different enterprises. The grandfather of languistics is Greenberg, not Chomsky. The goal of languistics is essentially descriptive, the prize going to the most thorough descriptions of the most languages. Its slogan might be no language left behind.

Linguists are far more opportunistic. The description of different languages is not a goal in itself. It is valuable precisely to the degree that it sheds light on novel mechanisms and organizing principles of FL. The languistic impulse is that working on a new or understudied language is ipso facto valuable. Not so the linguistic one. Linguists note that biology has come a long way studying essentially three organisms: mice, E. coli, and fruit flies. There is nothing wrong in studying other organisms, but there is nothing particularly right about it either. Why? Because the aim is to uncover the underlying biological mechanisms, and if these do not radically differ across organisms then myopically focusing on three is a fine way to proceed. Why assume that the study of FL will be any different? Don’t get me wrong here: studying other organisms/languages might be useful. What is rejected is the presupposition that this is inherently worthwhile.

Together, the above two impulses (actually two faces of the same one) reflect a deeper belief that the Chomsky program is more philosophy than science. This reflects a more general view that theoretical speculation is largely flocculent, whereas description is hard and grounded. You must know what I am about to say next: this reflects the age-old difference between empiricist and rationalist approaches to naturalistic explanation. Empiricism has always suspected theoretical speculation, conceiving of theory in largely instrumental terms. For a rationalist what’s real are the underlying mechanisms, the data being complex manifestations of interacting, more primitive operations. For the empiricist, the data are real, the theoretical constructs being bleached averages of the data, with their main virtue being concision. Chomsky’s program really is Rationalist (both in its psychology and its scientific methodology). The languistic impulse is empiricist. Given that Empiricism is the default “scientific” position (Lila Gleitman once remarked that it is likely innate), it is not surprising that explanation almost always gives way to description, for the former has no independent existence apart from the latter.

So what to do? I don’t know. What I suspect, however, is that the rise of languistics is not good for the future of the field. To the degree that it dominates the discipline it cuts Linguistics off from the more vibrant parts of the wider intellectual scene, especially cognitive and neuro-science. Philology is not a 21st century growth industry. Linguistics won’t be either if it fails to make room for the explanatory impulse.



[1] MP had both a methodological and a substantive set of ambitions. A good part of the original 93 paper was concerned with the former in the context of GB style theories.
[2] Living with them does not entail ignoring or forgetting about them. Most theories, even very good ones, have a stable of problem cases. Collecting and cataloguing them is worthwhile even if theory is insulated from them.
[3] This is evident in the review process as well. One of the virtues of NELS and WCCFL papers is that they are short and their main ideas easy to discern. By the time a paper is expanded for publication in a “real” journal it has bloated so much that often the main point is obscured. Why the bloat? In part this is a response to the adept reviewer’s empirical observations and scrambling by the author to cover the unruly data point. This is often done by adding some ad hoc excrescence that it takes pages and pages to defend and elaborate. Wouldn’t the world be a better place if the author simply noted the problem and went on, leaving the general theory/proposal untouched? I think it would. But this would require judging the interest of the general proposal, i.e. the theory would have to be evaluated independently of whether it covered every last data point.
[4] IMO, one of the reasons for GG’s success was that it allowed the two to live harmoniously without getting into disputes. MP sharpens the differences between languists and linguists and this may in part explain its more lukewarm embrace.

51 comments:

  1. How many times have you seen someone question a technical innovation because it would serve to make a solution to Plato’s or Darwin’s problem more difficult?
    I agree that linguistics has as its goals (among others) an account of how language is learned (Plato) and how language ability emerged (Darwin). I think it is uncontroversial that we also need an account of how language is used and how languages change over time. The methodology of classical cognitive science and GG has been to attempt to describe an underlying rule-governed system (competence) which will enter into the explanation of all four of these problems.
    Many people work on language processing and parsing, and so we understand somewhat how to hook up competence with some aspects of language behaviour. Thus, it makes some sense to object to certain proposals by appeal to their performance consequences. Very few people work on hooking up competence to language learning or evolution in any rigorous way in the GG community, and so it would be bizarre to object to a proposal by claiming that it would make learning or evolution more difficult to explain -- we just have no idea how that would work. Outside of the GG community, in NLP/ML we see that the mainstay adages about learning are just plain wrong, which gives us even less confidence in objecting to competence proposals on learning grounds.

    Replies
    1. …many adages are just plain wrong…
      Ok, which? I'd love to know. How about adages like cyclic bounds on computation (phases, minimality) would be useful for learning and parsing? Or how about the adage that the PLD is not rich enough to fix many of the principles we find operative in FL? Are these adages "just plain wrong"? That's not the impression I get from reading, say, Stabler's work on these matters, or Berwick's, or Marcus'.

      Maybe I am misinterpreting you here, however. Maybe you mean that the adages within the NLP/ML world are just plain wrong. I could easily agree with this. There is almost no recognition of POS problems within this world and much of the work within it is at right angles to linguistic concerns. It often looks good, but its payoff has been pretty slight, again excepting the sort of work I mention above.

      Last point: lots of interesting theory within GB has been motivated by the impulse towards unification. This can be pursued for simple methodological Ockhamite reasons or for more robust Darwin's Problem reasons. Either one is fine with me. My sense, however, is that this kind of work has been in abeyance of late and this is too bad. BTW, it's actually not that hard to object to many proposals on the grounds you find surprising. We do this all the time here at UMD. It does pay to ask the question of what one's proposal implies for these larger issues. It is even enlightening at times. Try it. You might find it interesting.

    2. "…many adages are just plain wrong…
      Ok, which? I'd love to know. How about adages like cyclic bounds on computation (phases, minimality) would be useful for learning and parsing? Or how about the adage that the PLD is not rich enough to fix many of the principles we find operative in FL? Are these adages "just plain wrong"? That's not the impression I get from reading, say, Stabler's work on these matters, or Berwick's, or Marcus'."

      Yes, those adages... also adages like you have to restrict the hypothesis space to obtain learnability. Or that you cannot retreat from an overgeneralisation on the basis of positive evidence alone. And so on.

    3. Ok, which? I'd love to know.
      I was thinking along the lines of finitely many parameters and parameter setting, which seemed to have the status of a sine qua non of learnability, but which is nowhere near a necessary requirement (see any work on learning in any computational paradigm). [The late Partha Niyogi in his book analyzes this kind of model in careful detail.]
      Chomsky also claimed that it was not possible to distinguish unlikely but grammatical sentences from ungrammatical ones on statistical grounds. (His `colorless green ideas'.) Pereira shows that this is no longer the case.
      There is also the issue of subject auxiliary inversion, which our very own Alex Clark demonstrated could be learned by a very simple distributional learner. Of course, all learners have inductive biases of some sort or another, but that is not at issue here (and I have not met anyone who has ever doubted it).
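      To make the Pereira point concrete, here is a minimal toy sketch in the spirit of a class-based bigram model; the word classes, the two-sentence corpus, and the numbers are all invented for illustration, and this is not Pereira's actual model:

      import math
      from collections import defaultdict

      # Invented word classes and a tiny made-up corpus; the point is the method, not the numbers.
      word_class = {
          "revolutionary": "ADJ", "new": "ADJ", "colorless": "ADJ", "green": "ADJ",
          "ideas": "N", "dogs": "N",
          "appear": "V", "sleep": "V", "bark": "V",
          "infrequently": "ADV", "furiously": "ADV", "loudly": "ADV",
      }
      corpus = [
          ["revolutionary", "new", "ideas", "appear", "infrequently"],
          ["new", "green", "dogs", "bark", "loudly"],
      ]

      # Count class bigrams (with sentence-boundary markers) in the training corpus.
      bigram_counts, context_counts = defaultdict(int), defaultdict(int)
      for sent in corpus:
          classes = ["<s>"] + [word_class[w] for w in sent] + ["</s>"]
          for a, b in zip(classes, classes[1:]):
              bigram_counts[(a, b)] += 1
              context_counts[a] += 1

      def class_logprob(sentence, alpha=0.1):
          """Add-alpha smoothed log-probability of the sentence's class sequence."""
          classes = ["<s>"] + [word_class[w] for w in sentence] + ["</s>"]
          n = len(set(word_class.values())) + 2  # number of classes plus boundary markers
          return sum(math.log((bigram_counts[(a, b)] + alpha) / (context_counts[a] + alpha * n))
                     for a, b in zip(classes, classes[1:]))

      good = "colorless green ideas sleep furiously".split()
      bad = list(reversed(good))
      print(class_logprob(good), class_logprob(bad))
      # The unlikely-but-grammatical order scores far higher than its reversal: its class
      # bigrams (ADJ ADJ, ADJ N, N V, V ADV) were attested in training, the reversal's were not.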
      How about adages like cyclic bounds on computation (phases, minimality) would be useful for learning and parsing?
      There are too many weasel words here to give a definitive answer. We do, however, know many things about how various conditions on movement in Minimalist Grammars affect the parsing problem, and it is not straightforward. And these results are not affected by forcing incremental linearization (phases) at every derivation step.
      Or how about the adage that the PLD is not rich enough to fix many of the principles we find operative in FL? Are these adages "just plain wrong"?
      This is a non sequitur (and a straw man). The linguist who argues against a particular proposal based on learning considerations has to have something more concrete than just "there are inductive biases." That is not in dispute.

    4. Ok, I'd be curious to hear about the solution to any one specific learning problem involving hard-to-observe phenomena that NLP/ML has now solved. I'm not aware of any, though perhaps I just haven't been paying attention. Preferably they would involve cases where there's apparently cross-language variation, and hence a prima facie learning problem. I'm thinking of cases like (i) why English has surface that-t effects but Spanish does not, (ii) why English has scope inversion but Japanese (generally) does not; (iii) why long-distance anaphors show blocking effects while others appear not to; etc. etc. These are just a few examples of cases that have been well known for 25+ years. I'd really love to know how folks have managed to solve problems like these. (Discussions of subject-aux inversion and the like don't count - far too easy.)

    5. @Greg, @Alex
      We've been around these turns several times before. The best course right now would be to agree to disagree. That's fine. I was largely addressing people who share enough assumptions with me to make argumentation useful. GG skeptics need not apply. There are other grounds for arguing with you, and I have done some of this elsewhere. Maybe I will try a more direct broadside in the future. Stay tuned.

    6. @colin (and actually norbert as well): I think you are misunderstanding my claim. I am not claiming that NLP/ML has solved the problems that linguists have been addressing, nor that NLP/ML is on the way to solving them. (Although I do think that ML is making amazing leaps forward, and that pioneers like Joshi and Steedman (et al) are connecting NLP work with rich linguistic work.) I am responding specifically to Norbert's lament that people don't appeal to learning considerations as often as he would like by claiming that, to the extent that they don't, this is good because we have no idea how to make an argument like that really stick. Evolutionary considerations are even less plausible. As I see it, the proper way to disagree with my comment is to claim that we do in fact have some ideas about a linking theory between competence and learning, or evolution. Then I could critically evaluate these claims, and we could get something like a productive discussion going.

      And @Norbert: I wonder what assumptions I need to be sharing with you that I'm not. I do not consider myself anything like a `GG skeptic.' I think I can give you pretty good reasons for the matters of doctrine that I disagree with you about.

    7. @Colin: I don't quite understand how the three examples you give are instances of a learning problem. The class of attested languages should be in the intersection of the class of languages carved out by your grammar formalism and the class of learnable languages. Now if this intersection contains languages with and without that-trace effects, as it apparently does, why would a learner struggle with learning one of the two from input data? Similarly for scope inversion. This is just a case where the class furnishes a number of options and the learner picks one based on input data. What's so special about these cases that you think the learner could not make the right choice?

      Example (iii) has more bite, it makes a claim about what is possible and impossible (blocking effects with locally bound anaphors). Let's assume that the generalization is actually correct (I have my doubts, blocking effects seem to be restricted to long-distance reflexives that can be discourse-bound, but there are pronouns like aapan in Marathi that must have a non-local syntactic antecedent and for which no blocking effects are reported afaik). Why would it be the job of the learning algorithm to derive this property? We are looking at the intersection of multiple classes, so why not put the work on a different component, e.g. the parser?

      Quite generally, I think there's a difference in how we weigh the evidence. You, like many other linguists, look at the millions of things for which no working learning algorithm has been proposed yet and conclude that "the usual adages" are our best solution right now. But nobody has ever given a conclusive proof that one cannot do without "the usual adages" --- and such a proof is a theoretical possibility, just like you can show that the class of regular languages is not Gold learnable --- while the standard arguments are being knocked down one after another. So from my perspective, even though machine learning might have comparatively little to show right now, it's a better bet long-term than buying into assumptions the necessity of which has not been proven conclusively.

    8. @Thomas. I don't follow the logic of your last paragraph. If there are many grammatical principles which appear not to be inductively learnable, shouldn't we conclude that they are probably not learned inductively? Take a specific example. Kids have to figure out somehow that (1) is ambiguous and (2) isn't:

      (1) How often did you tell John that he should take out the trash?
      (2) How often did you tell John why he should take out the trash?

      As far as I know, no progress whatever has been made in explaining how kids would acquire this bit of grammatical knowledge without having something like the ECP built in (whether in the grammar itself or as a consequence of the operation of the parser). Are you suggesting that machine learning will eventually get us there?

    9. @Alex: There are two questions here: Can an ML algorithm learn the contrast between (1) and (2)? Probably. Can it explain why the contrast exists? I don't see how, but that's not sufficient to conclude that it is impossible. Just as the fact that I don't see how P = NP could be true doesn't make it false, or even unlikely. Many complexity theorists believe P != NP, but that doesn't give them license to use P != NP to prove P != PSPACE.

      The ECP is a sufficient answer to the challenge posed by (1) and (2); it is not a necessary one. So why should I infer from (1) and (2) that the ECP (and nothing else) is true? And, going back to Hornstein's lament (I like that term), how could we turn something like (1) and (2) into an argument for or against specific syntactic proposals if we do not even know the range of possible answers to the problem?

    10. How would you justify the "probably"? For one thing, we're dealing with a pretty clear POS case: kids will likely have little or no relevant data, in which case it doesn't matter how fancy the learning algorithm is. Even if we assume that there is enough data, the kid still has to figure out the intended interpretations of the sentences and link their interpretative properties to the syntactic structure, control for plausibility confounds, etc. etc. AFAIK there are no learning algorithms out there which make any serious attempt at modeling all of this.

      In my view, then, one should infer that the ECP is built in because there is currently no other explanation for how kids come to acquire grammars in which (1) is ambiguous and (2) isn't. If someone comes up with a better explanation then I'll change my mind.

      Some of what you're saying here reads to me as a general skepticism about science. All we can ever do is make our best guess at what's going on given the available evidence and the ideas that people have come up with so far. So, no, we can't be certain that the ECP is the only or best explanation; such is scientific life.

    11. @Greg. Here are some: that UG has innately specified structure that it is our job to describe. That good examples of such include the principles of the binding theory, control theory, bounding theory etc. I assume that GG is not a set of techniques but a set of results. I take it that the data points very strongly to the conclusion that principles like the cycle, bounding theory (aka barriers, phases), binding domains are more or less correct descriptions of FL and either these principles themselves or more abstract principles from which they follow are parts of the innate specification that humans bring to the task of language learning and (given interesting work by Colin, Berwick, Marcus, Fodor, etc.) to parsing as well. So there is a body of doctrine that characterizes GG, and it requires, in part, accepting the descriptive success of GB (or notational variants thereof). This is the settled part, IMO. What is not settled is whether these principles are fundamental or derived from yet more general principles/mechanisms.

      To repeat something I've said many times before: I do not take it as open for real debate whether or not binding, island, ECP, case, *effects* are real. They are. Nor do I take it as a real open question whether these effects reflect innate structure operative in the domain of language. They do. Those who think otherwise are, IMO, outside the GG tradition. I keep an open enough mind that were someone to show me that the effects could follow without assuming a relatively rich innate structure then I would be happy to listen. To date, nobody has done so. As such, I take the empirical evidence to point to the conclusions noted above and take these to define the current GG lay of the land.

      BTW, I don't find the arguments even for Aux mvmt being learnable particularly compelling. The way I read the data, it's not enough to show that structure dependent operations are acquirable. We also need to show that non-structure dependent operations are not, or are at least very costly. To my knowledge this has not been demonstrated, nor even attempted.

      @Thomas. So far as I know, as Alex D has noted, the standard arguments have NOT been knocked down. Rather there have been some attempts to sidestep the argument based on Aux movement. Period. I have, for example, repeatedly asked for some other example: e.g. derive island effects or binding effects or ECP effects for me. Or take any of Colin's cases. Then we will talk. Till then, I see no reason to abandon the view that the powerful methods of ML/NLP have yet to put to rest the general innateness conclusions that characterize GG. From what I can tell, BTW, this is a view that people like Niyogi/Berwick/Marcus who took a look at some of these issues in detail have happily endorsed.

    12. @Norbert: I see no reason to abandon the view that the powerful methods of ML/NLP have yet to put to rest the general innateness conclusions that characterize GG. From what I can tell, BTW, this is a view that people like Niyogi/Berwick/Marcus who took a look at some of these issues in detail have happily endorsed.

      I don't think anybody's arguing that there is no innate component, at least I am not; Greg also admitted that there are priors, and Alex C would too. I explicitly said that you need both a grammar formalism and a learning algorithm. The question is how you distribute the workload over the two.

      Learnability arguments in Minimalism assume a very particular distribution without showing that this has to be the case. I'm fine with that because it has little impact on how most syntactic work is done --- which seems to be one of the reasons for your lament --- but that's also why it isn't prudent to use such factors to compare syntactic proposals: all of a sudden your syntactic theory depends on assumptions from a different area that might easily turn out to be wrong.

      That's not a matter of "scientific scepticism", as Alex D calls it. It's a matter of managing the cost and payoff of assumptions. For Colin, who works on problems of language acquisition and syntactic processing on a daily basis, the assumptions of a strongly nativist position may well have enough of an empirical pay-off to merit the risk of said assumptions being wrong. For syntax, I don't see why you would want to bring in this extra risk if you can do without it. And most languists seem to think that you can.

      So, quick summary of my posts so far, which touched on three issues that aren't all that closely related:
      1) I'm curious why Colin considers his (i) and (ii) to be learnability challenges.
      2) I'd like to see impossibility proofs for machine learning rather than just assertions, and
      3) I'm a fan of minimizing assumptions wherever possible, which is why I agree with Greg that questions like Plato's problem, which must be connected to the syntactic machinery via an elaborate and not particularly well-understood linking theory, do not make for a compelling case for or against a given proposal, at least not at this point.

    13. @Thomas. Apart from point (1), what you're saying seems to me to suffer from the usual problem in these discussions. Syntacticians bring up specific problems which have led them to postulate that X is innately specified and get told (i) that no-one has any better ideas but (ii) that we shouldn't assume that X is innately specified. Asking for impossibility proofs is setting the bar much too high: this is empirical science, not mathematics. We already have perfectly good POS arguments which make a strong case for, e.g., the ECP being innate.

    14. @Alex D: I don't see how it's setting the bar too high, in particular because we have impossibility proofs in the other direction, e.g. that the Trigger learning algorithm does not succeed even for finite classes of languages. Learnability theory is full of such impossiblity proofs, they are not a priori unattainable.

      Your (i) is a bit of a strawman because: 1) people do have other ideas, you just do not accept them as valid alternatives, and 2) you have not shown that your account of X actually solves the issue at the level of detail that you expect the ML accounts to live up to. That is to say, given e.g. a PnP perspective with innate ECP, prove that it successfully learns the data. Is there a PnP algorithm that successfully learns English from realistic input? Does it also give you the correct learning trajectory? Does it capture the semantic ambiguities above? And does it actually require the PnP component to succeed? There are some interesting Bayesian PnP proposals in the literature, but I don't think any of them would pass your litmus test.

      As for (ii), one can of course assume that the ECP is innate, but that's all it is at this point, an assumption --- in contrast to the fact that something like ECP effects exist. And the former is a weaker criterion than the latter for evaluating competing proposals because of the extra assumptions it relies on. I think this basic difference is not in dispute. The fuzzier issue is whether the former is too weak, which depends on how well-supported you think the extra assumptions are that it relies on.

    15. @Thomas. It's not just an assumption. It's a conclusion from a non-apodictic argument. I should add that these even exist in the rarefied realms of mathematics. See here: http://www.scottaaronson.com/blog/?p=1720 for example. One can give reasons that are not mathematically conclusive and one can give mathematical proofs that may well be irrelevant to the issues at hand. I cannot speak for all linguists, but for me I would be happy to discuss ML results that addressed the issues we linguists have been discussing for over 50 years now. I asked Alex C to provide one such example a while ago, with no luck. I am being told endlessly that there is another way to get the same results that we do by "assertion." But this is just false: show me the learning theory for the binding theory or bounding theory or ECP or whatever you want that deals with any of the myriad cases we have discussed.

      Can ML learn these? Of course, with the right priors and the right limits on the hypothesis space. After all, acquiring the ECP is no big deal if it is a prior or part of the definition of an admissible derivation in the hypothesis space. This was precisely Chomsky's point in Aspects, and who could doubt this? The ML world, however, insists that we don't need such specific priors and that more general learning methods will suffice. Ok: let's see. Put up! To date, this has not been done in any way that looks mildly compelling. Where it has been tried, it looks pretty dismal (the review of this by Berwick, Pietroski, Yankama and Chomsky that I noted in a much earlier post goes over some of the latest attempts). So, if you think that this is a real alternative, then it should be easy to produce a usable example. I await this eagerly.

      Last point: why speculate about the Plato/Darwin implications? I think the reason is that these speculations fuel further related research. Crain, for example, ran with several POS arguments and added further evidence from real time acquisition. Jeff Lidz has done this several times in his writing (and there is soon to be a book showing the utility of this way of thinking). Berwick and Weinberg ran with bounding theory and extended the speculations to parsing, Wexler and Culicover did so for grammar learning, as did Berwick later on. Yang has run with similar speculations and embedded them in reasonable learning theories. Dehaene and Moro have done this in neuro studies. I should add that these studies, though far less abstract than what we see out of the standard ML accounts, actually make contact with syntactic speculations and so the two benefit each other. To end: the speculations of syntacticians worried about these kinds of theoretical issues have gained traction in the past, and this is a very good thing.

      There is a tendency for researchers to believe that the only valid methods are the ones that they employ. That the only valid questions and considerations are the ones that they can exploit using their methods. I disagree. Syntax has prospered by pursuing questions that you seem to find ill defined and so useless. Maybe for you. Oddly, based on my evaluation of the past successes I would argue that we need more of these "delusions," not fewer. What I find unfortunate is that these considerations are being ENTIRELY displaced. They do not even have a small place anymore in active research. Given the successes we have had in considering these larger issues, this is an unfortunate state of affairs. No doubt you disagree, and this is precisely what I see as part of the problem. You are part of a very large majority.

    16. Your (i) is a bit of a strawman because: 1) people do have other ideas, you just do not accept them as valid alternatives

      In the case of the specific example I mentioned (the contrast between (1) and (2)), I would be genuinely interested to learn what these alternatives are, valid or otherwise.

    17. @Norbert: The ML world, however, insists that we don't need such specific priors and that more general learning methods will suffice. [...] So, if you think that this is a real alternative, then it should be easy to produce a usable example.
      Why would it be easy? If this were the early 20th century and I were to ask you to give a unified account of all the variation we see in, say, binding theory, you probably couldn't do it. Even in the 60s this was hardly an easy thing to do. Would you have concluded from this that generative grammar is bound to fail?

      You yourself have often emphasized that data coverage is not everything, and that new proposals shouldn't be expected to do as well as well-established ones. Why then does machine learning have to have the same coverage as a strong nativist perspective from the get go? Why can't we cut it some slack and see it as a promising alternative that still needs more work?

      Let me clarify that I don't have an axe to grind here; I'm agnostic about how much of the learning process is domain specific. But the arguments for the strong nativist position aren't very convincing to me. The argument against ML is an assertion that it cannot work, and the argument for a strong nativist position is an assertion that it does work. Neither one is obvious. The strong nativist position is not enough to guarantee learnability (cf. failure of the Trigger learner), and the fact that ML algorithms aren't doing too well right now is only a claim about current algorithms, not about domain general algorithms in general. So for now I stay on the sidelines and see how this pans out over the next 20 years.

    18. [continued]:
      why speculate about the Plato/Darwin implications? I think the reason is that these speculations fuel further related research.
      Yes, that's definitely true, and I admitted as much above when I pointed out that these assumptions can get you a lot in psycholinguistics, parsing, etc. But we were talking about the place of these implications for evaluating syntactic proposals, and there things seem a lot fuzzier to me.

      Syntax has prospered by pursuing questions that you seem to find ill defined and so useless.
      Where did I say they're useless? They certainly inspire questions, drive research programs and have an influence on what problems we consider Important with a capital I. But there's a difference between what motivates a researcher and how we evaluate research. There are many things I believe in that I cannot prove and that determine what kind of problems I work on. But I cannot use those beliefs of mine as an evaluation criterion for theories. Some claims are somewhere between personal belief and undisputed truth; they only hold under specific assumptions, which also limits their applicability as arguments for a given theory.

      I would say that this is the actual reason for languistic arguments: they depend on fewer assumptions, so if you can take down a proposal with that kind of argument, there's no reason to move on to more subtle issues; you've already delivered a major blow. Conversely, defending a theory that has been attacked on languistic grounds with the more indirect/subtle arguments arising from Plato's and Darwin's Problem is a harder sell because people are more likely to doubt at least one of your assumptions.

      What I find unfortunate is that these considerations are being ENTIRELY displaced. They do not even have a small place anymore in active research.
      I'm not sure to what extent this is actually a new state of affairs. From my historically naive perspective of a twenty-something, it looks like these questions enjoyed a brief period of very strong presence in the wake of Chomsky93, but with the stabilization of Minimalism after Chomsky01 things seem to be back in "puzzle solving" mode, similar to the GB period, where Plato's problem was considered solved thanks to PnP and the focus was on getting the descriptive details right. And that still generated a lot of interesting results. Of course, this might be a very distorted view of our field's history.

    19. We have probably run our course here. We are frequently told that ML is the way to go. Lots of promise. Maybe. I'll judge when there is an example or two to sink one's teeth into. But we can all agree, I hope, on what kind of example we want. Take whatever part of the theory of grammar people have worked on that has led to conclusions about the native structure of FL. Derive the relevant data domain with assumptions significantly different and more congenial to ML. Then we can talk. Till then, you bet on your horses and I will bet on mine.

    20. I think that there are useful ways in which we can use NLP/ML to help to put some teeth into Plato's problem, and in doing so address some of the linguistic/languistic tension. It is all too easy for people to dismiss POS arguments by saying "meh, the stimulus isn't so poor", though it's rare to actually go and check. NLP makes it easier to go and check what a learner plausibly gets to work with -- or better yet, to describe what input differences face learners of different languages that need to arrive at different outcomes. And then ML can be used to get a rough sense of how effective that input might be. (Sure, the ML tools might get better, but if the information isn't reliably present in the input, it doesn't much matter how fancy your math is.) There are going to be some phenomena for which the learner faces abundant evidence. And so it may be that many current areas of languistic fascination do not have accompanying POS arguments, and hence less need to be captured by Norbert's sought-after linguistic system. Those could perhaps vary rather arbitrarily. On the other hand, there are likely to be cases where the POS arguments are dreadfully strong, and where we therefore very much want variation to be tightly constrained, and preferably linked to readily observable facts, as in classic PnP accounts. A languist may object to Norbert's inclination to dismiss some of his flora and fauna, on the grounds that it ignores the facts. But we now have better tools than before that could help us to figure out which facts call for which kinds of explanation, in particular where Plato's problem is and is not relevant.

      Two concrete cases that I'm quite familiar with. (i) Morphological learning. Pinker's '84 book has an impressively rich discussion of how learners might proceed. A hot-off-the-presses paper in Language by Gagliardi and Lidz revisits some of these issues; they show what information children do and do not attend to in learning a complex morphological system, and in collaboration with Naomi Feldman they've used computational modeling to explore ways of capturing this selectivity. (ii) Pearl & Sprouse have an interesting and elegant model that attempts to learn island constraints from a realistic corpus of child-directed speech. Their corpus is very interesting for what it shows that kids *don't* encounter. I have a reply (in the new Sprouse/Hornstein islands book) in which I fawn over them for taking on a real problem, but argue that it doesn't work. And these are just a couple of examples: folks here are using a mix of linguistics, child studies, and computational/corpus research to explore similar challenges in phonology and semantics. Based on being around this kind of stuff all the time, I regard it as productive and fun, and not something to put off for a few decades.
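      To put some flesh on the "go and check" step, here is a rough sketch of that kind of corpus audit. The file name, the construction labels, and the crude regular expressions are hypothetical placeholders; serious work of the Pearl & Sprouse sort uses real child-directed-speech corpora and parsed representations, not regexes:

      import re
      from collections import Counter

      PATTERNS = {
          # Long-distance wh-question out of a finite embedded clause (very crude).
          "wh_long_distance": re.compile(r"^(who|what|where|how)\b.*\b(think|say|tell)\b.*\bthat\b"),
          # Embedded subject question, the configuration relevant to that-trace effects.
          "embedded_subject_q": re.compile(r"^who do you (think|say)\b"),
          # Aux-initial polar question, for comparison: these are abundant in the input.
          "aux_initial_polar_q": re.compile(r"^(is|are|can|will|does|did)\b"),
      }

      counts, total = Counter(), 0
      with open("child_directed_utterances.txt", encoding="utf-8") as f:  # hypothetical corpus, one utterance per line
          for line in f:
              utterance = line.strip().lower()
              if not utterance:
                  continue
              total += 1
              for name, pattern in PATTERNS.items():
                  if pattern.search(utterance):
                      counts[name] += 1

      for name, n in counts.most_common():
          print(f"{name}: {n} of {total} utterances ({n / total:.3%})")
      # Constructions that barely occur (or never do) are the ones the learner cannot simply
      # be reading off the input -- the shape of POS argument that a corpus audit like this
      # can support or undercut.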

    21. This comment has been removed by the author.

    22. Let me pick up on one point, though I agree with Norbert that we have perhaps gone as far as we can at the moment.

      So ML guys like me have made some progress on learning very simple syntactic phenomena like the aux movement in polar interrogatives, which we can model using CFGs, and there has been lots of theoretical work on learning MCFGs etc., but this has not been accompanied by any empirical demonstrations of learning of more complex phenomena of the type that are interesting to syntacticians. Norbert and others therefore take a "put up or shut up" stance on this: show us your learner that can learn parasitic gaps or whatever, and until you can do that we will assume that there are no such arguments.


      But where are the nativist models that show the acquisition of these phenomena? I will probably be pointed to Charles Yang's work: but this is not a model of language acquisition (it is a parameter-setting model -- which is only one tiny piece), and it doesn't output grammars that will parse (1) and (2) and give the right interpretations. Forget about a learning model; there isn't even a specified grammar formalism on the table, nor a parser.

      Thomas made this point earlier and I think it is a good one. You guys (AlexD, Norbert, etc.) seem to be setting a very high bar for opposing theories, a bar that your own alternative could not get over.

      Indeed, are there even adequate accounts in the MP of the acquisition of auxiliary fronting? Ideally one that has been implemented, but if not that, then a full description of how one goes from strings of phonemes or words to complete grammars. Can the nativist theories get over the bar that the empiricist ones have already passed?

      There are practical problems involved with doing the empirical experiments that would be most convincing --- MCFG parsers even of dimension 2 are computationally expensive, and MCFG learners more so --- and there are as a result some engineering problems to overcome. I think that is the main reason why the empirical experiments that you might find convincing haven't been done yet.

      So a question: what would be a good example that is
      a) uncontroversial (from a theoretical point of view),
      b) describable weakly (i.e. using grammaticality judgments rather than ambiguity/coreference judgments),
      c) acquired quite early,
      and d) convincing?

    23. @AlexC. It's certainly true that the ECP (or whatever) in and of itself is not an acquisition model. I personally think ML could have a lot to tell us about how acquisition works. I just think that the successful acquisition model will have to have the ECP etc. built in. In addition to that it may well have lots of fancy ML algorithms too. So it's great that people are working on improving ML techniques, but I don't see that this work has so far given any reason to reject the majority of POS arguments for innate specification of certain grammatical principles. In other words, ML accounts do not have to be in competition with nativist accounts unless you want to bear the burden of showing that all of the innate structure that appeared to be necessary can be done away with.

    24. @Alex C. You're quite right. There are not a host of successful accounts of learning hard-to-observe phenomena. The call is not to "put up or shut up", but rather for people on both sides to engage with the harder problems, and a hope that people on both sides will recognize that those problems are not already solved.

      (But I do wish that we would stop talking about aux-fronting as a model case. It diverts attention from the real challenges.)

      As for a good example to sink the teeth into, we've been working on cross-language variation in that-t effects, in a project led by Dustin Chacón. The facts are fairly clear (and we're re-confirming those more rigorously), the data can be described weakly, and we now have a reasonable idea of what the input corpus data looks like for English (from Pearl & Sprouse) and Spanish (from Dustin). Not sure that the contrast is mastered super-early, but I'm not sure that this is so essential. The theoretical analysis of these phenomena is not so clear, but that's also not such a barrier.

  2. Because the aim is to uncover the underlying biological mechanisms, and if these do not radically differ across organisms then myopically focusing on three is a fine way to proceed. Why assume that the study of FL will be any different? Don’t get me wrong here: studying other organisms/languages might be useful. What is rejected is the presupposition that this is inherently worthwhile.

    Would you agree that it's worthwhile to investigate the assumptions underlying a program or theory? If we don't currently know whether or not the underlying mechanisms differ across languages, it seems worthwhile to me to try to find out.

    As it happens, in biology, there have been some fairly high profile arguments that animal models can fail badly when we've tried to generalize from one organism to another (e.g., mouse to human). So, yes, there's been a lot of progress due to a focus on a small number of organisms in biology, but it's pretty clear that we won't get anything like the whole story doing only that. Why assume it's any different for language?

    Replies
    1. In principle I think that investigating assumptions is always a good idea. However, I may differ from you in having concluded that the basic assumptions underlying GG are largely correct. I would go further: the generalizations coded within GB (and many other frameworks) are largely empirically correct and so not really up for grabs. They may need refinement, but they are here to stay in more or less their current form. As such, I don't expect that we will find languages that use mechanisms that are that different from those we have already discovered. Indeed, I would say that one of the take home messages from cross linguistic investigation is that languages work pretty much in the same ways wrt phrase structure, movement, binding etc. There are some oddities, some have long distance anaphors, some move all WHs to the front, some move none, but as regards the basic operations, they are pretty similar. This is comparable to what we have found in biology: proteins are made in pretty much the same way regardless of organism, coding these from DNA via RNA and a host of other intermediaries is widely shared. The MECHANICS, it seems, is pretty much the same and lots of it is conserved. There are differences, and these are very important in a technological setting: drug testing, medical models etc. But the basic mechanisms for storing information and using it for development seem pretty much the same. I expect no less for FL/UG. Of course, I MIGHT be wrong, but I currently have no reason to think I am.

      Last point: I am not at all sure what "getting the whole story" means. We rarely get the whole story in Science. What we get is a decent account of the underlying mechanisms. The complexities of their interaction we largely outsource to technology. So if you mean that studying different languages will/may tell us things we didn't know, I must agree. Whether it will tell us anything new about the basic mechanisms? Here I am less sure, and right now I tend to doubt it.

    2. Using the phrase "the whole story" wasn't particularly apt. I agree that "the whole story" isn't what we're after with science. I should have said something along the lines of that we may well make significant mistakes if our assumptions don't hold.

      Please correct me if I'm wrong, since I am not particularly knowledgeable about syntax/GB/minimalism, but isn't minimalism (either as program or theory) in large part a reconceptualization of the basic mechanisms of syntax/FL/UG? That is, even if the empirical generalizations captured by GB are not up for grabs, the basic mechanisms are, aren't they?

      If so, to the extent that different languages exhibit or reflect different properties of the underlying mechanisms, it seems reasonable to me to suppose that studying different languages is likely to shed light on exactly what those mechanisms are.

      I gather that you disagree, but I guess I'm not entirely sure why.

    3. I don't disagree that they MIGHT be useful. Who knows? I am suggesting that the assumption that they MUST be is oversold. From my reading of the literature, when it comes to basic mechanisms not much has been learned from cross linguistic study. We have discovered that there is sometimes more variation than we expected (Fixed subject effects come to mind, and weak island variation) and we have discovered that there are species of expressions and constructions different from what we find in English, say, that exploit the same mechanisms for different ends (V fronting in various languages for contrastive focus effects). However, odd as it may seem, the basic mechanisms and principles have stayed pretty much the same. So, the question now is whether I believe that cross-ling study will be useful in getting clear how the GB generalizations relate to basic architectural issues. Maybe. But I personally believe this is not where the real action will be. My current aim is not to discourage this kind of work, however, but to make room for other attacks on the problem; those prizing unification as these make rather more direct claims on both Plato's and Darwin's problem, IMO.

    4. @Norbert: Might PCC effects be a counterexample? Here we have a phenomenon where the challenge comes from the fact that only 4 out of 64 logical possibilities are attested, and very specific technical assumptions about feature structures and the properties of Agree have been made to accommodate them. These extra assumptions are not (cannot be?) motivated by English data, yet they change the underlying mechanisms quite significantly.

    5. Yes. The place where things might be different is precisely in cases like the PCC where, as you note, we are dealing with consequences of feature theories. Why? Because the only current purchase we have on features is by looking at languages that express them overtly. So, I would agree that this particular area is a good counter-example. I should also add that, from what I can tell, though we have had to consult such feature-laden languages to find the relevant phenomenon, it appears that the mechanisms are more or less the same, though the relevant features are not. But, point conceded.

  3. Very much agree with you, Norbert, on pretty much all points. I personally think that a large part of the problem has been the rise of "cartography" in syntax, which is largely a recasting of descriptive generalizations in terms of unanalyzed and underived phrase structure. As far as I can tell analyses in this paradigm are typically just as complex as the phenomenon they seek to capture, yielding no genuine insight -- but satisfying a philological urge, apparently.

    Not sure if you know this short paper by Sascha Felix, which echoes many of your concerns: http://amor.cms.hu-berlin.de/~ottdenni/felix.pdf

  4. Interesting paper by the late Sascha Felix, Dennis. Did it appear somewhere? Year?

    Replies
    1. Jan -- it appeared in _Language and Logos: Studies in Theoretical and Computational Linguistics_, ed. Thomas Hanneforth & Gisbert Fanselow, de Gruyter, 2010. I believe it's a festschrift for Peter Staudacher.

      (Hope all is well in Groningen, btw!)

  5. As a generative languist/teacher who wants to stay more or less on top of things, I follow this very useful blog (and struggle through the bewildering number of abbreviations). Can I ask a 'simple' question, about SU-AUX inversion, the hypothesized showcase for innate-not innate? I read here that the ML camp has cracked it (Greg). I also read that it's not convincing (Norbert). And I read that we should leave this case aside (Colin). I have three why questions: why is it cracked, why is it not cracked and why does Colin think we should leave it aside? It would be very useful for the average languist/teacher to understand where precisely the point of disagreement lies. Not sure that the Berwick et al paper discusses the solution that Greg has in mind, not sure if Norbert can pinpoint why the analysis Greg has in mind does not convince him. And not sure why Colin thinks the case is irrelevant (because he agrees with Greg?).

    Replies
    1. Hi Olaf,
      To make things even more confusing, I think everyone is in agreement here (not that you could tell from our posts). Let me explain. As I see it, the Subject-Aux inversion argument was intended to establish that the child needed to be able to learn rules which referred to structure. (Usually, one takes an extra step and concludes that children can only learn such rules.)
      Alex Clark has a very simple (but very insightful) family of distributional learning algorithms in the spirit of Zellig Harris' work. (He has characterized himself as `ML' (= machine learning), but I meant something more restrictive; I would have put him into the `GI' (= grammatical inference) community, whose goals and methods I find much more congenial to linguists.) Amazingly, Alex's distributional learning algorithms give rise to grammars which front the right auxiliary. Moreover, they cannot learn to front the wrong one.
      Now the critiques.
      (1) (Colin) leave aside [because Alex is learning the kind of structure the argument is trying to establish we want to learn]
      me: That's right. Of course, Alex is using an extremely simple distributional learner, with very simple `innate biases'. It is interesting (for everybody) to see how much we can do with so little, so as to pinpoint exactly where things are going wrong; these then are the things we should all be focussing on.
      (2) (Norbert) not convincing [because the algorithm/grammar is too simple]
      me: That's right, and Alex is perfectly aware of this. It does, however, establish that simple distributional methods can learn patterns similar to the ones we find in language. Imho, the right thing to do is (as Alex is doing) try to understand where these methods go wrong, and try to fix them.

      Berwick et al do discuss Alex's work, only to dismiss it because it does not learn the `right structures'. Which is true, as Alex points out in his paper. Of course, Alex was only looking at strings, and `right structures' must derive the proper meanings and prosodic contours, and allow for good linking theories to processing et cetera. It seems to me a more productive response would have been to, instead of dismissing his work because it doesn't solve all problems at once, have said "hey, this is great, finally someone who is engaging with us and trying to pick up our explanatory slack from the wastebasket of innateness. Maybe it will pan out, maybe it won't, but it sure will be interesting to see what happens! Especially because we're interested in what exactly needs to be evolved, and the less in that wastebasket the better (well, maybe; if only we had a linking theory to evolution)."
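      In case the substitution idea is unfamiliar, here is a minimal toy sketch of the distributional step described above, run on an invented five-sentence corpus. It shows only the Harris-style grouping of strings by shared contexts, not Alex's actual algorithms, which go on to build a grammar over such classes:

      from collections import defaultdict
      from itertools import combinations

      corpus = [
          "the dog can bark",
          "the dog will bark",
          "the cat can bark",
          "can the dog bark",
          "will the dog bark",
      ]

      def contexts(sentences, max_len=2):
          """Map each substring (up to max_len words) to the (left, right) contexts it occurs in."""
          ctx = defaultdict(set)
          for s in sentences:
              words = s.split()
              for i in range(len(words)):
                  for j in range(i + 1, min(i + 1 + max_len, len(words) + 1)):
                      sub = " ".join(words[i:j])
                      ctx[sub].add((" ".join(words[:i]), " ".join(words[j:])))
          return ctx

      ctx = contexts(corpus)

      # Weak substitutability: substrings that share a context are candidates for the same
      # distributional class. This is the evidence such a learner generalizes from.
      for a, b in combinations(sorted(ctx), 2):
          shared = ctx[a] & ctx[b]
          if shared:
              print(f"{a!r} ~ {b!r}   shared context: {next(iter(shared))}")
      # Among the groupings: 'can' ~ 'will' and 'the dog' ~ 'the cat'. A grammar built over
      # classes like these treats the auxiliary and the subject phrase as distributional units,
      # which is why this style of learner fronts the right auxiliary rather than a wrong one.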

    2. Correct me if I'm wrong, but isn't it the case that Berwick et al. actually point out some rather severe shortcomings of the notion of "weak substitution" that Alex's algorithm builds on, rather than merely beating the "this isn't learning the right structures"-horse? Perhaps I'm a little behind the current state of the debate; if so, I'd highly appreciate a pointer to the most recent paper(s).

    3. I agree with Berwick's criticisms of that paper (which is basically the very first paper on the topic) and of the naive notion of substitution, but there has been a lot of progress since then, in various directions.

      On strong learning from positive examples (i.e. learning the right structures, but still using a naive substitution test), in JMLR: http://jmlr.org/papers/v14/clark13a.html

      On weak learning of a very big class, but using queries (so you can get feedback on whether a string is grammatical; see the toy sketch at the end of this comment), in Machine Learning, with Ryo Yoshinaka: "Distributional Learning of Parallel Multiple Context-Free Grammars".

      And if you don't like queries (who does?), then: Chihiro Shibata and Ryo Yoshinaka, "PAC Learning of Some Subclasses of Context-Free Grammars with Basic Distributional Properties from Positive Data", ALT 2013.

      Whether you can stick all three together with sufficiently large amounts of gaffer tape is an open problem.
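      (To make the `queries' item above concrete: a toy illustration -- not the algorithm from any of the papers just cited -- of what getting grammaticality feedback buys a distributional learner. It reuses the contexts() sketch from earlier in the thread, and the oracle here is just a finite list of attested strings standing in for a real informant, who could of course judge novel strings too.)

      ```python
      def query_substitutable(oracle, ctx, u, v):
          """Query-driven substitution check: swap v into every attested
          context of u and ask the oracle whether the result is grammatical."""
          return all(oracle(" ".join(left + v + right))
                     for left, right in ctx.get(u, set()))

      # Stand-in membership oracle over a finite list of strings.
      attested = {"the dog is happy", "the cat is happy",
                  "is the dog happy", "is the cat happy"}

      def oracle(sentence):
          return sentence in attested

      ctx = contexts(["the dog is happy", "is the dog happy"])
      print(query_substitutable(oracle, ctx, ("dog",), ("cat",)))    # True
      print(query_substitutable(oracle, ctx, ("dog",), ("happy",)))  # False
      ```

      The real results are about which classes of grammars such query-driven tests converge to; the sketch only shows the kind of feedback a membership query provides.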

    4. @Benjamin: Fair enough. I should have been more careful, and ended up understating what they do. (And unfairly criticizing them to boot.)

      I still wish the attitudes were more collegial instead of hostile, but, as you have aptly pointed out, I am contributing to the problem.

    5. @Greg. To clarify, I'm not being dismissive of the subj-aux inversion discussion because I have a problem with trying to do more with simpler algorithms. It's always interesting when you can do more with less. And I'm certainly not trying to be dismissive of AlexC, who is a model of engaging with folks from different perspectives and trying to find good solutions to hard problems. Rather, I'm complaining because I think that Chomsky created a huge distraction by drawing attention to that specific problem, and that the field has been held back by treating it as a test case. If subj-aux inversion were a parade case, then we'd have been done decades ago. My concern is not with the solution, but with the choice of problem.

    6. @Alex: thanks for the pointers.

      @Greg: I do share your sentiment that the kind of reply BPYC offers could be much more constructive than it is.

  6. @Colin, I agree that sometimes the field converges on a test case that, in retrospect, maybe isn't quite the right example: the English past tense being perhaps another example where the chosen problem isn't really hard enough. But a lot of the problems that syntacticians concern themselves with (and this is not intended as a criticism) are quite subtle effects from adult syntax where there is still debate about where the correct locus of explanation is. For example, island effects are still in dispute (e.g. Sag and Hofmeister). I am not trying to adjudicate the disputes there; I just note that, from the perspective of someone outside of syntax, the dust hasn't settled yet.
    So the right problems to look at are more complex than aux inversion and less complex than island effects, and it seems like there should be a lot of options to choose from. The very simple and the very complex aren't the only alternatives.

    Replies
    1. Alex, I agree that one wouldn't want to go down the rabbit hole of trying to solve a problem that turns out not to be a real problem at all. But I disagree with the implication that many of the harder cases are "quite subtle effects". For sure there are some questionable data points in current journal articles, some that haven't been carefully controlled, or that won't stand the test of time. But we have plenty of hard cases that have been around for 30-40 years. I'm sure that you're aware of the Sprouse et al. validation studies of acceptability judgments from recent journal articles. Also, for a workshop last fall in Potsdam I pulled together the results of all of the acceptability studies that we've done in our lab over the past 15 years - roughly 50 experiments on 1700+ participants - and the results were almost always boringly straightforward. Just one example: judgments about parasitic gaps that even syntacticians have tended to regard as subtle turn out to be quite robust; this has now been replicated in multiple labs. We have no shortage of obscure facts that are not subtle.

      A slightly different issue that your comment points to is whether some phenomena are poor test cases because their explanation is in dispute, and you point to the controversy over islands. I'll resist the urge to preach about islands and reductionist accounts, but I would frame this point differently. A good reason not to make the learning of relative clause islands a parade case is that it's a good candidate for a universal, and so it's debatable whether it is even a learning problem at all. (Aside: we're aware of Swedish, Korean, etc.; that's another story.) The cases to focus on would be ones where there's cross-language variation, and hence there's a clear learning challenge to be solved. And I should note that the cases of cross-language variation have not lent themselves to compelling reductionist explanations.

    2. Just to disentangle issues... If you suspect that many (alleged) universals are by-products of learning procedures, yielding common patterns in response to diverse sets of Primary Linguistic Data (PLD), then relative clause islands might be an excellent case study on which to test your suspicions. If you suspect that many cases of linguistic variation are by-products of learning procedures, yielding distinct patterns in response to diverse PLD-sets, then you'll want to focus on cases of linguistic variation. If you believe in innately determined universals, but suspect that appeals to learning rarely if ever *explain* anything about animal cognition, then you might try to reduce various universals to the simplest plausibly innate basis (in a Minimalist spirit).

    3. Yes, I agree with Paul here -- I don't necessarily agree with the "typologically universal implies innate and unlearned" implication, so even if something is universal in the Greenbergian sense it might not be built in, but rather learned. So non-obvious universals with no variation might be a good test case.

    4. Sure, it can be interesting to devise a learner that can derive RC islands from PLD, and to show that learners will arrive at the same conclusion given the PLD that they'd encounter in any language. (In fact, this is what Pearl & Sprouse 2013 have already done; well, the first part, at least.) But in such cases the other side can simply respond: "Meh, I think it's innate, so it's a non-problem." And others can simply respond: "Meh, it's just processing constraints." The reason why I emphasize the cases of linguistic variation is because those are the ones where everybody has to assume that the learner has to arrive at different outcomes based on differences in the PLD. So it should be harder to simply dismiss the problem. To use Alex's terms from somewhere up the page, everybody has to "put up or shut up".

  7. I don't understand the ground rules for when one gets to say 'meh'. If I suspect that a certain network of constraints (e.g., concerning spots on butterfly wings, or the character of bee dances) is innate, and someone shows how the proposed constraints could be epiphenomenal by-products of a not implausible learning algorithm, I might retain my suspicion...and I might start looking into how the spots/dances would go in certain ethologically unusual settings...and I might look into revising the statements of the constraints. Though in the interim, I'd better give up any "only game in town" arguments for the innateness of the constraints.

    On the other hand, if someone points to certain respects in which wings/dances differ across individuals (even individuals of the same species), I might say 'meh' if my goal was to understand the basic architecture that gives rise to the phenomenon (and ideally, how that architecture unfolds as the animal develops). Once someone has an actual theory of individual variation--and arguments that certain kinds of variation are not mere differences, but theoretically interesting phenomena that theories can and should capture--then of course, that expands the list of explananda that our theories should capture. (And one might have many independent reasons for studying variation.) But I'm leery of any suggestion that the real theoretical action lies with understanding variation, as opposed to deep commonalities.

    Replies
    1. (I'm not making rules about when to say "meh", merely observing situations where people actually do so.) My focus is not on variation in general, but specifically on situations where we can be confident that the end state of the language system is a consequence of regularities that it picks up from the input. When two groups of speakers (vaguely defined) arrive at two consistent yet different end states, then it's pretty likely that those end states are a consequence of what they got from the input. We know there's learning to be explained there. In cases where either (i) all speakers reach the same end state, irrespective of input (i.e. in any language), or (ii) speakers vary apparently randomly within a community, there are certainly interesting things to be said, but it's less obvious that this is a consequence of systematic interactions with the input. So there might or might not be a learning problem to be solved.

    2. (1) "When two groups of speakers (vaguely defined) arrive at two consistent yet different end states, then it's pretty likely that those end states are a consequence of what they got from the input."

      Well, it's pretty likely that the *difference* between the end states is somehow due to the difference between the inputs. The end states may still be overwhelmingly similar, and in no interesting sense consequences of the inputs.

      (2) "We know there's learning to be explained there."

      If 'learning' is just a label for input-sensitive acquisition, then we may know that learning occurred. Whether explicable learning occurred depends on whether there is any explanation to be had. And one might be agnostic about lots of particular cases, absent plausible specific accounts of how the input-sensitive acquisition occurred. Indeed, one can imagine a classical learning theorist who is agnostic about whether their model (perhaps designed for rats in boxes) explains specific differences between human languages, yet maintains (perhaps implausibly) that their model does explain how certain allegedly innate constraints can be by-products of learning.

      I agree that learning accounts of *acquisition* should focus on some actual cases of input-sensitive acquisition. But often, at least part of the debate is about whether we should appeal to Chomsky-style innate constraints as part of our explanation for why human grammars pair meanings with pronunciations in certain ways but not others. In my view, one can argue for an affirmative answer without having a clue about how actual acquisition occurs (or how actual variation emerges) within the space characterized by the constraints. Others disagree. And in this dispute, it would be wrong to burden the other side with the job of explaining how actual acquisition occurs (or how actual variation emerges). I shouldn't ask the other side to do what I cannot do...and vice versa. Moreover, nativists shouldn't set things up so that a "draw game" is the likely outcome. One doesn't need an account of how acquisition occurs (or how variation emerges) to argue that human grammars respect substantive constraints that go beyond by-products of learning--and hence, that whatever input-sensitive acquisition is, it is best viewed as a process of acquiring a grammar of a rather special sort (and not as a process of learning which grammar, of a more general sort, is being used by local adults).

      I worry that the alleged death/demise/decline of "poverty of stimulus arguments" is due in part to the (mistaken) thought that such arguments aren't any good if the advocates and critics are alike in failing to have good accounts of actual acquisition/variation/production. And I wouldn't want to foster this mistake by insisting that critics account for input-sensitive variation, even if many of the critics have conceptions of acquisition that make it reasonable to ask which cases of input-sensitive variation they can account for.

    3. Any argument for Chomsky-style innate constraints that relies on the hardness of acquisition (there are other arguments, I know) must have as a premise something like

      (P) Chomsky-style innate constraints make the acquisition problem easier.

      So that premise, in my opinion, needs some support (in fact I think it is false), and it is hard to see how one can argue for (P) without some sort of acquisition model.

    4. Alex: I agree. Hence my desire to distinguish debates about (P) from other debates in the neighborhood.
