Thursday, June 12, 2014

Two comments

Here are two unconnected comments about previous posts. The first is a little rantish as it relates to a shibboleth I believed I had chopped to the ground.  The second is an obvious point about Chomsky’s first lecture that I stupidly forgot to remark on but is really very important.  Here they are:

1.     Ewan makes the following comment (here):

I try not to give up hope that some day linguists and psychologists will deconflate indispensable generic notions like abstractness and similarity from issues about being domain-general or domain-specific. Completely orthogonal and there isn't much hope for the field until that distinction is made. The exchange between Norbert and Alex in the first comment thread gives me a bit of optimism (because Norbert came around) but it can get tiring to have to say it again and again and again.

There is nothing to object to in the content of this remark. Who wouldn’t want these notions “deconflated”?  However, I find a lot to object to in its connotation, and in two ways.

The first is the suggestion that linguists (like moi) have finally “come around” to recognizing that these dimensions are orthogonal. This suggests that there was a time when there was widespread confusion among linguists on this matter. If so, I don’t know who was so confused and when it was.  We have long known that both domain general and domain specific acquisition mechanisms need notions of abstractness and similarity to get them to go beyond the input. There is no “learning” without biases and there are no biases without a specification of dimensions of similarity, some of them “concrete” and some of them “abstract.” Abstract, in such contexts, means going beyond dimensions afforded by a purely perceptual quality space (see Reflections on Language for a long discussion of these themes in the context of a critique of Quine).  Empiricists have long argued that all we need are such perceptual dimensions and they have been more than happy to assume that coding these dimensions is an innate part of the mind. Part of what Rationalists have argued is that this is insufficient and that more “abstract” dimensions of generalization are required. 

This noted, there is a second question: what’s the nature of the abstractness required?  Is the abstractness that is required peculiar to some domain (e.g. peculiar to visual computations or linguistic computations or auditory computations alone), or are these all aspects of one and the same kind of abstract computation?[1]  This is where the domain general/domain specific dimension meets the abstract/non-abstract dimension. Let me explain.

Domain general explanations are, ceteris paribus, preferable to domain specific ones. Why? Because they are more general. That means that they potentially apply to a wider range of data than domain specific accounts do, precisely because they are domain general.  And, all things being equal, this means that were they empirically adequate they would have more kinds of evidence in their favor than domain specific accounts would have. So, imagine a world (not this one, from what we can tell at present) where we could unify vision and language in one set of common principles. And say that these principles could explain things like Island Effects or the ECP or Binding Effects etc. as well as the Müller-Lyer illusion or Common Fate. This would be a GREAT theory and clearly better than a purely linguistic specific explanation of Islands, the ECP etc. Moreover, it’s obvious why, but let me say it anyhow: it would be better because it explains more than the purely domain specific linguistic account does.  For the record, UG has nothing to say about the Müller-Lyer illusion. So, to repeat, were there such accounts linguists like me would rejoice, pop open the champagne, and nominate the relevant scientists for Nobel Prizes. All agree that such domain general accounts would be very nice to have, and there is a sense in which Minimalists are betting that some might be available. So as far as our druthers are concerned, we are all singing from the same hymnal.

So, given this huge agreement among all good thinking people, what’s the fighting all about? Well, it’s NOT about this! Rather, the above noted aspirations have never come close to realization. The problem linguists like me have with domain general explanations of linguistic phenomena of interest is that they do not currently exist!!! The reason I resist all the hype surrounding a very attractive prospect is that it seems to have relieved advocates of the burden of concretely coming up with the goods.  The reason I respect domain specific, i.e. UGish, accounts is that they are currently the only game in town (actually, I think that there are a few more general accounts of a minimalist variety, but they are rightly controversial).  In other words, I respect domain specific accounts for empirical reasons. Moreover, IMO, the big difference between those inclined to dismiss domain specific accounts and those that don’t is that the latter respect the discoveries GG has made over the last 60 years and the former really don’t.  Alex C and I had a long involved “debate” about this (here) and I think that it is fair to say that we agreed that we considered different things to be key data currently in need of explanation. I take the results of 60 years of GG work to give us a whole slew of effects that should be the targets of linguistic explanation. He is “interested in a different problem.”  And, as of this moment, he has nothing to say about these effects, so the domain specific accounts are the only ones available. As I’ve also suggested, I think that ignoring these effects is bolstered by insisting that they do not really exist. This leads different interests to couple with a kind of skepticism about what GG has found. It is psychologically, if not logically, comforting to ignore GG results if one denies the validity of these results (e.g., see here).

If this diagnosis is correct (and, of course, I believe it is) then the domain general vs domain specific issues have never been confused, nor have they ever hindered fruitful dialogue. The gulf between linguists and some cognitive researchers has to do with what the worthwhile problems are taken to be. GGers refuse to be diverted from their discoveries by promises of a potential theory that might one day, in ways we do not yet begin to comprehend, solve our problems.  As I’ve stated repeatedly, give us some concrete domain general accounts and they will be carefully considered. I personally hope they exist and are produced soon. But as I learned when I was about 7, wishing things to be so doesn’t make them so.

So, contra Ewan, the problem is not that GGers like me don’t “get” the distinctions he considers vital to internalize. We get them alright. We just don’t see how they are currently relevant. Moreover, I would argue that pointing to the “failure” Ewan identifies serves simply to throw sand in GGers’ eyes. It tells us to discard or demote results that we have labored hard to gain in favor of theories that don’t yet exist (not even in the faintest outline) and that address problems in ways that we think overly simplistic.  It’s cloud cuckoo land advice and serves only to mislead the interested parties as to what the real fight is about. It’s not that some prize domain general accounts and others prize domain specific ones. It’s that the phenomena GGers want to explain currently have only domain specific accounts, and that those with domain general stories to peddle are trying to convince us that we should stop worrying about the facts we have found.  No thanks.

2.     On Lecture 1

One point I should have made about lecture 1 but did not was that it’s amazing that this is Chomsky’s first lecture on a series of very technical topics in minimalist grammar. It’s clear that he thinks that what follows is interesting because it is situated in a long tradition of questions about minds, brains, evolution, learning and more.  Put another way, the technical discussion that follows should be understood as addressing these far more general questions.

This is Chomsky’s standard modus operandi, and it is what makes GG so exciting IMO.  GG has always been part of the Rationalist tradition (as the lecture makes clear). Thus, the success of the GG enterprise is philosophically very telling. It is rare that large philosophical issues can be related to concrete empirical issues, albeit abstract ones, but this is one such case.  What Chomsky has always been very good at showing is how very abstract philosophical concerns are reflected in detailed empirical worries and how empirical problems give hostages to large-scale philosophical positions. What the first lecture makes clear is that Chomsky is a modern incarnation of the 17th and 18th century natural philosophers. Big empirical issues have philosophical roots and each has implications for the other.  That’s an important insight, and nobody delivers it better than Chomsky.

[1] Gallistel and King, for example, doubt the coherence of a generalized “sensing” mechanism (as opposed to visual or olfactory sensing) and they seem to doubt the coherence of a purely domain general notion of sensing as an abstraction.


  1. "Domain general explanations are, ceteris paribus, preferable to domain specific ones. Why? Because they are more general. That means that they potentially apply to a wider range of data than domain specific accounts apply to precisely because they are domain general."

    I don't agree with this -- I think there are strong empirical arguments against domain specific models in linguistics, which are based on evolutionary considerations. These don't apply to the evolutionarily ancient parts of the brain like the visual system, but given what we know about evolution and how recently we think language evolved, and how stable it appears to be across the whole species, there is strong empirical evidence that the domain specific theories (e.g. P and P models) are just false.
    So this isn't a methodological argument from parsimony, just a straight empirical one -- aka Darwin's problem.

    Now in any event, I don't think that your current theories count as domain-specific, since you are no longer claiming, if I understand your view correctly, that things like the ECP are innate, but rather that there is some other unknown and unspecified thing X which is innate which derives the ECP like effects. And X may or may not be specific to the domain of language.

    1. I think you misspoke. What you mean to say is that in addition to the methodological advantages more general theories have over more specific ones, there is in linguistics an additional evolutionary reason for preferring domain general theories. That's the minimalist conceit, and you endorse it.

      As you know I agree with this view. However, there are interesting ways of addressing Darwin's Problem and very boring ways of doing so. The interesting ones require that you actually explain the properties that some cognitive system has. With regard to FL this means accounting for the fact that FL appears to give rise to myriad effects (I've enumerated them, as you know). These effects REFLECT innate features of FL, even if the effects themselves are not innate (effects are never themselves what give rise to effects). So ECP effects implicate built in features of FL. The question then is whether these features of FL are proprietary to FL or reflect more domain general properties of cognitive computation or anatomy or physics or whatever.

      Now, from where I sit, anyone who wants to explain how linguistic facility arose in humans but won't address these myriad effects has, basically, changed the topic of discussion. It would be like an evo explanation of bird locomotion that addressed how birds walk and hop but said nothing about how they came to fly. No doubt, as Chomsky once wryly observed, some birds can hop farther than others, so the implication might be waved about that once we understand hopping, flying will be understood as well (a kind of elaborate, very distant hopping). I think that this would be a very unpersuasive retort, but it is, of course, logically possible that it is correct. I have no trouble ignoring this kind of possibility. Your replies to me always seem to have this flavor. I ask you what you have to say about what GG has established as non-trivial properties of Gs over the last 60 years and you reply that you are interested in something else. Then you suggest that one day you might address the kinds of problems I focus on. Is this logically possible? Yes. Likely? Not IMO.

      So, what I object to is not the logic of your views, but what you take the problem requiring explanation to be. You abstract away from what is interesting. You are looking at hopping. I am interested in flying. And I want to make sure that this difference is clear and up front so that when people work on what they do they are not tempted by this bait and switch.

  2. I'm cross posting my comment replying to your previous objection to my objection on the other post:

    The reason I harp on this is lost then. The idea is that the sword is supposed to cut both ways. The people who think that there can be "assumption free" learning or learning which is based "only" on similarity without saying "similar in what sense" are wrong.

    But the point is that by that token the "good guys" cannot then turn around and set up a contrast between learning that "[only] works by generalization, compresses the regularities in the input" versus learning that doesn't. The whole point is that as soon as you buy into that false dichotomy you're automatically playing the wrong game.

    The whole point should always be to underscore that each time the "input driven" people say it's "just" this to point out that they've made a commitment to a particular way of doing it, one among many, there is no "just." That's the Goodman logic, that's the argument I want to see. The skepticism about input-driven-ness should go so deep that you come out and say "there is no serviceable notion of input-driven." Because I don't think there is, and therefore I think all rebuttals that presuppose there is are incoherent.

    1. ... the context in which it makes the most friction (on this blog) to contrast "input-driven" and not is when you're dealing with mathematically/computationally sophisticated people who largely get the lesson of Goodman, because they actually have to deal with plugging in the details - people who are just as ready as you to ask "input-driven how?" or to say "everything is input-driven in a trivial sense but if you think there is any non-trivial sense then cough it up and get precise."

      Anyway, that's a slightly different issue but it compounds my frustration to have bad communication between people who _agree_ - in addition to the main issue that there are tons of people, as you say, who still genuinely don't get it (for which see my last comment).

    2. Also should say I meant "backed off", that's the sense of "came around." I meant that in the original post you drew a sharp contrast (following the paper), then in Alex C's comment he suggested you shouldn't, then you said something to the effect of "yes of course not."

    3. I agree. But this is where the incipient Empiricism comes in. Input means driven by surface visible properties. The most obvious are sensory properties. Now, what you observe (and Goodman observed) is that to even discuss THESE one needs to specify the projectable predicates. Without a specification of these, there is no way of talking about similarity. However, I think that Plato's Problem highlights a second sense in which reliance on such a strategy will fail: there are features of FL that the effects highlight that do not track ANY surface properties of the input, if by that we mean the PLD. That's what Lidz and Gagliardi were at pains to show, that simple minded similarity based accounts fail in BOTH respects. GG has tended to focus on the second POS problem, but the first one is also worth making clear, though not at the price of forgetting the second. My objection to your remark was that it misplaced the point of disagreement between many GGers and most MLers. The latter want to think that all the problems of acquisition boil down to getting the set of surface projectable predicates right. I disagree. Why? Because of the myriad effects GG has discovered over the years. Virtually every one of these requires making assumptions about FL's inner structure, not the shape of the input data or generalizations over surface detectible properties. There are no patterns IN the data to track! That's what these effects show. And I for one think this is a very big deal. So, if you don't address these you are not addressing my concerns. And I want people to appreciate that this is what the fight between me and others (e.g. Alex C) is about. It's not about whether I would tolerate a domain general account of some set of effects (e.g. ECP). I would not only tolerate it, I would welcome it. What I don't tolerate is saying that these effects are not what we should be trying to explain.

    4. To be clear: you're locating the L&G cases under (i) we have done the non-trivial work of specifying the surface-projectible predicates and the generalization _still_ is not transparent with respect to those; or (ii) a case where there are no surface-projectible predicates at all? My point is that there is no surface. Now, once you specify FL/learner you have created a surface implicitly; furthermore the only meaningful notions of input, generalization, and similarity are to be found implicit in a learner/observer. The "input" is not the external world; the external is not what's of interest at all. Seen this way, I'm not sure how there are two different problems.

    5. Put another way, the game should be bigger. The stakes should be to defend the idea that every notion of similarity, of input-transparency, of generalization, is not only contingent, but is as arbitrary with respect to the external world as any other. If that's the case then I'm not sure I see the difference between problem 1 and problem 2 here.

    6. I guess I do see a difference. A concrete case: I have no problem believing that whether a learner concludes that its language is OV or VO is driven by inspecting instances of the language that it hears and hearing that it is one or the other. The data drives the conclusion arrived at. Now, there may be some debate about what the right predicates are: is it VO/OV or XDP/DPX, or obligatory movement to Spec v vs. optional, etc. But the conclusion will be heavily influenced by the data canvassed, as there is relevant data in the input to track. This contrasts with, e.g., ECP effects. There is NO data to track in the PLD, so far as we can tell, relevant to this. Indeed, this is one reason we think that ECP effects will appear in any G that allows adjuncts and arguments to enter into long distance dependencies. Precisely because there is no relevant PLD to allow variation, the bet is that there will be no variation. And if we find some, this is a real puzzle for these sorts of accounts.

      Now, can I make this distinction hard and fast? Can I define a difference? Not at the moment. But it seems like an important difference to me. In one case we can find predicates whose instances can be tracked in the PLD and in the other we cannot. So in the first we can go from instances to generalizations. In the latter we cannot. Of course, as Goodman notes, what counts as an instance can be a very complex affair. We need to specify the projectable predicates. But what are the projectable predicates with relevant instances in the PLD for ECP effects? Damned if I know. For OV/VO there are many, and of course, which one is the right one is an interesting problem, and not one that is fixed by the "external world."
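The OV/VO case above is concrete enough to sketch in code. Below is a toy illustration of the point that the head-direction choice is trackable in the input: the learner just tallies instances. The data format (verb and object positions per clause) and the decision margin are invented for illustration; this is not a claim about any actual acquisition model.

```python
def set_head_parameter(clauses, margin=0.9):
    """Toy head-direction learner. Each clause is a hypothetical
    (verb_position, object_position) pair extracted from parsed input;
    the OV/VO parameter is set by the relative frequency of the two
    orders. The representation and threshold are illustrative only."""
    vo = sum(1 for v, o in clauses if v < o)  # verb precedes object
    share = vo / len(clauses)
    if share >= margin:
        return "VO"
    if share <= 1 - margin:
        return "OV"
    return "undecided"  # mixed evidence: maybe the wrong predicates

# Uniformly verb-before-object input sets the parameter to VO:
print(set_head_parameter([(1, 2), (0, 3), (2, 4), (1, 2)]))  # VO
```

The contrast with ECP effects is exactly that no analogous tally exists: there are no PLD instances to count.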

    7. Wait, why ECP? That's a case where there's no evidence there's any learning at all. I'm not talking about the silly idea that you'd be learning the ECP. No the case you're talking about ought to be one where there is learning, but it takes place at some very abstract level removed from the input. Taking the Tsez noun class case though, I don't see why it doesn't just fall under specifying the surface-projectible predicates and the notion of similarity/generalization, mutatis mutandis.

    8. That's right, Ewan. We have to distinguish properties that occur in Gs but leave no surface evidence for themselves (eg, ECP, Verb-raising in a head-final language), properties that occur in Gs and leave very clear evidence for themselves (e.g, VO vs. OV; Noun classes) and properties that occur in Gs but whose surface signature is only a signature relative to a very specific, arbitrary and peculiar set of representational primitives (e.g., ditransitive alternations). As Norbert says (and Tal says, and probably everyone would say...) the division between these cases is not crystal clear, but the intuitions about what distinguishes the prototypical cases are.

      The noun class data doesn't necessarily require any particular inductive bias, other than that the learner is trying to organize the lexicon in some kind of efficient way (though it might require something to explain why the arbitrary semantic features are ignored but the statistically equivalent phonological features are not). The purpose of that example is to highlight the input-intake distinction. As we said, in thinking about intake it is also important to think about the difference between (a) the properties that make it into the intake only because of perceptual processes and (b) those that are selected from the perceptual intake as being specially relevant for some problem. The way UG focuses our attention on the input is to help us identify what makes some properties selectively attended (out of all the ones that make it into the perceptual intake).

      From my perspective, the purpose of all of this stuff is to give a fully specified theory of both the UG contribution and the way that UG makes selection of relevant properties from the input/perceptual intake possible, so that (as Norbert keeps saying) there are some cases on the ground where there are explicit proposals about not just the content of UG, but also how it helps learning in the cases where the input is important. With those proposals in hand, we have benchmarks against which other approaches can be measured. As Alex C says, he has no idea how to handle the Kannada ditransitives. So, if I were Norbert, I'd say he can't really make claims about eliminating UG until he comes up with a solution that is at least as good as the one we suggested. (Of course, I'm not Norbert, so all I'll say is: that's a pretty serious challenge, so get to it.)

    9. That all representational primitives are "specific, arbitrary and peculiar" is the goose that laid generative grammar, IMHO. I continue to see the burden of proof as being on those who claim that there is a coherent sense in which one could point to one or another kind of surface evidence as being somehow more objectively "clear" than another; evidence is exactly as clear as the "specific, arbitrary and peculiar" representational primitives make it.

      I am thinking now of the work cited in this paper by Alex C and Shalom Lappin. In two cases they point to ways to learn regular and context-free language representations using "easy" parameters that bear a fairly "transparent" relationship to sliced-up strings. So that would count as "surfacey."

      Put aside that these are for toy cases, which is Norbert's problem. Put aside that strings are specific, arbitrary and peculiar in themselves (although this is the sort of thing that people need to be hit over the head with). My question is what precisely to make of this notion of "transparent." I claim that, despite appearances, there is no there there, and so there is no point in debating these things.

      What would the reply be? That is, what precisely is it that makes this kind of evidence "close" to the surface and something else "further"? I can think of one notion but I'm not satisfied. Why should we even bother engaging with the idea that there are more or less "surfacey" properties? The learner does whatever it does, and things are "surfacey" in virtue of what it does. Whether they look "surfacey" to you is irrelevant.

    10. Ewan, philosophically you're right of course that every learner needs to make some assumptions to get off the ground, and what counts as surface data depends on those assumptions. But I wonder if you're not taking this a little too far when you say that all kinds of evidence are equally "surfacey". Suppose we know roughly what the initial perceptual representations in the brain look like (e.g. contours). Do you mean that there's no way to come up with a metric that we could use to rank the complexity of different input transformations, similarity functions, etc?

    11. @Ewan:
      What's close and what's removed from the surface is a good heuristic distinction. It is not a theory. Is the distinction useful? I think it has been. Like I said, we have no problem distinguishing cases like the head parameter from the ECP. We have no problem noting that there is really no PLD relevant to "learning" ECP effects. I could be wrong about this. Someone might come along and provide relevant data, but I am confident that this won't happen. Some things can be data driven IF WE CAN COME UP WITH WAYS TO SPECIFY IT (the right projectable predicates). Some things will have NO real reflexes in the data at all, and so this kind of strategy will not work. This does not mean, as you know, that there is no relevant data, but there is none in the PLD, and that's what counts.

      Is this distinction principled? Probably not. Is it useful? Yup. It suggests not looking for a data driven analysis of ECP effects. This is not a problem of finding the RIGHT projectable predicates, but a problem of there being none to be had. Cases illustrate this difference, and that is heuristic enough for me. I'll leave the deep philosophical issues to you.

    12. This comment has been removed by the author.

    13. I'm inclined to agree with Ewan. I think the gut feeling we often have that makes some pattern look like it's "there on the surface" is based on what a learner could discover if it analyzed its input in a beads-on-a-string way. Could this be the right way to make precise the heuristic Norbert is reaching for? If we want to agree that that's what "on the surface" means, that's fine, but in practice it does tend to sometimes get conflated with "no structure imposed by the learner" (see e.g. many discussions of the Saffran et al studies).

      We went over this ground at least once before here.

  3. Just to make sure I understand, what does Chomsky's divide between Empiricists and Rationalists translate to in terms of current computational models in cognitive science? Does any model that assumes innate knowledge (domain-specific or otherwise) count as Rationalist?

    1. No. As Chomsky has pointed out, everyone is a nativist. No biases, no learning. So even Empiricists require nativist assumptions. Chomsky, btw, makes this point in his critique of Quine. The divide between Es and Rs is over what they tolerate as built-in. Es have tended to allow generalizations across a sensory quality space. Rs allow much richer hypothesis spaces. The leading conceptual difference, I think, is between those that only tolerate generalizations from patterns IN the data and those that allow these and generative generalizations. I talked about this here.
      So, no: everyone is a nativist. It's the kind of nativism that one tolerates that distinguishes Es and Rs, at least as concerns "learning" theories.

    2. Well, as I think we agreed in the original thread ("POS and PLD"), once you allow nontrivial input representations it's not that easy to tell which patterns are in the data and which patterns aren't. In terms of existing models I believe that it's essentially just connectionist models that are Empiricist in the sense you describe; Bayesian models would be Rationalist, regardless of whether they assume domain-specific innate knowledge or not. Or did you mean that a model needs to have domain-specific assumptions to qualify as Rationalist?

    3. (just to be clear, by "Bayesian models" I was referring informally to the particular examples of symbolic probabilistic generative models that people have proposed in the last decade)

    4. Correct, there is nothing per se in being a Bayesian that makes you an E or an R. In fact, this is one of the things I liked about L&G's paper: they provide Bayesian models for some of their discussions. Ditto for Berwick's stuff. So there is nothing incompatible between Bayes and Rs (btw, I think that a case can be made that the acquisition model in ch 1 of Aspects is very Bayes like).

      So, nativism is orthogonal to domain specificity vs domain generality. I personally am open-minded about whether a given effect has a domain general or domain specific explanation, though as a matter of fact the accounts up till now have largely been domain specific, though, as Alex C has observed (and I agree), minimalist sentiments suggest that these cannot be fundamental. What's critical, IMO, is that the models we investigate explain THESE data. These really are very effective probes into the structure of FL. Our job is to provide theories that explain these data so as to reveal this structure. Given my minimalist druthers, I hope that we can find domain general principles that explain a lot of what's going on. I HOPE this will turn out to be so. Hoping, however, is not showing, and until we show how to do this the question of domain specificity is, IMO, open.

      Let me say this another way. Say that we find that our best stories tell us that FL has LOTS of domain specific info. Say that this makes Darwin's Problem harder to solve. There are two possible conclusions: either there is less domain specificity than we think, i.e. our best theories are wrong, or we don't understand how evolution works. I think that both are live options right now. Btw, from where I sit, our understanding of the mechanics of Evo is hardly as solid as physics seemed at the turn of the 20th century, and even that turned out to be wrong. So I have no problem believing that our best solutions to Darwin's Problem lie substantially in the future. This is a point that Chomsky's lecture 1 hinted at as well. The project of reconciliation is still valuable, but we should realize that we know very little about the details here and the theories are quite speculative.

      Nonetheless, personally I'd love to thread the needle. But, it's entirely possible we don't understand how evolutions works and it's possible that our best theories of FL are false. But, throwing the cases that reveal this tension under the rug will prevent us from ever adequately addressing these issues. That's what I want to prevent. And the easiest rug around under which to hide the issues is the one that ignores what GG has discovered over the last 60 years. People seem able to do this without batting an eye. I am doing all that I can to prevent them from stipulating away what needs to be explained.

    5. Yes, it would be extremely useful for the field to have a standard set of grammatical and ungrammatical sentences discovered by GG that every grammar induction algorithm would be evaluated on.

      Unfortunately I'm still not sure I understand what you mean by Empiricist -- what are some examples of models that belong to this category?

    6. I probably shouldn't get too hung up on these terms. I'm just not sure that there's a real divide here.

    7. Good examples, and ones that might be partially right, include those systems that, for example, learn words by transitional probabilities (something, btw, that Chomsky proposed in LSLT), then learn phrases by induction over the inductions that got us words, and then movement by induction over the inductions that got us phrases. This effectively revives the old discovery procedures of the structuralists. We know that this won't work, but it's back again, this time called deep learning. A mark of a general E kind of commitment is the idea that Gs are just compilations of these kinds of bottom up inductions, and then inductions over the inductions over the inductions, etc. What makes these E type theories? Well, just the idea that there is sufficient information in the signal (once we find the right predicates with which to analyze the signal) to determine all features of the grammar. So, just as we can use transitional probabilities to find words, we can use probability distributions over strings of words to find phrase structure, and distributions over phrase structures to find movement, etc.
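      For readers unfamiliar with the transitional-probability idea, here is a minimal sketch of the word-segmentation step. Everything here is illustrative: the toy syllable stream, the threshold, and the function names are invented for exposition, not a model anyone in this thread has proposed. The idea (familiar from Saffran-style statistical learning work) is that transitional probability is high within a word and drops at word boundaries.

```python
import random
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from bigram counts."""
    bigrams = Counter(zip(syllables, syllables[1:]))
    unigrams = Counter(syllables[:-1])
    return {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

def segment(syllables, tps, threshold=0.75):
    """Posit a word boundary wherever the transitional probability dips."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy "speech stream": the made-up words "baku" and "tila" in random order.
random.seed(0)
lexicon = [["ba", "ku"], ["ti", "la"]]
stream = [s for w in random.choices(lexicon, k=40) for s in w]

tps = transitional_probabilities(stream)
print(segment(stream, tps))
```

Within-word transitions (ba→ku, ti→la) occur with probability 1, while cross-word transitions hover around 0.5, so thresholding recovers the two words. The point in the text stands, though: this works only because the relevant regularity is present in the signal itself.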

      As noted above, this need not always be a wrong conception of the problem. Chomsky thought that something like this might be useful to identify morphemes (though Charles has argued that it is not sufficient once we consider languages that have words like English; he argues you need built-in stress information as well, roughly a principle that says one stress per word). But it could be right at least in part. This is a decidedly E view of the problem. An R view is one that says that though this may be necessary, it is hardly sufficient. One needs to build into FL principles for which there will be no good evidence in the PLD, e.g. whatever it is that gives one ECP effects.

      Now, I suspect that the right answer to these issues will borrow ideas from both traditions, and so most theories will be mixed. However, once one allows built-in principles that go beyond data grouping based on properties of the input string, you are in R territory.

      So is there a principled divide? I dunno. I do think that there is a practical one and I've reviewed some approaches in other posts that clearly seem to me on the E side of the street. Can one combine both kinds of considerations? I hope so, for it seems pretty clear to me that some G properties follow the PLD pretty closely and some don't.

    8. What makes these E type theories? Well, just the idea that there is sufficient information in the signal (once we find the right predicates with which to analyze the signal) to determine all features of the grammar.

      Ah, that parenthetical... But thanks, I think I understand the terms now.

    9. Let's grant everything you say, Norbert, for the moment: the question I am then interested in is "How can we figure out which G properties follow the PLD and which ones don't?" We could just use our intuition, but that's not very conclusive. If you take Ewan's and Jeff L's points seriously, then we can't answer that question without having a learning theory, because it is only with respect to a learning algorithm/theory that we can even ask what there is evidence for in the PLD.

    10. Yes, our intuition is not enough, though it is not bad either. And yes, I believe that developing learning theories is an excellent idea, hence my bringing the L&G paper to your attention. There is, IMO, a very interesting project out there that is very promising and that the L&G paper illustrates (as does Ewan's thesis work): how to bring UG discoveries together with Bayesianism. There is nothing that makes this marriage impossible, and I think that there is likely to be a good fit. There are problems with Bayes, I believe, but as a first pass, this is a reasonable project.

      I do, however, demur wrt your last claim. I think that we have a pretty good idea of how rich the PLD is for any given case. There is a reason that MLs shy away from "learning" ECP effects and Island violations: they are virtually unattested in the PLD. It is very hard to generalize from the nonexistent, regardless of your algorithm. Indeed, you said as much in an earlier comment: "But that still won't really account for this distinction since the relevant sentences are kind of rare in the PLD." Right. And if that is so, then learning algorithms, which generalize FROM the data, won't do much good.

      So, learning theories that specify the PLD and see how it is used? Of course. But we can be pretty confident that this won't be sufficient, given the absence of relevant data for most of the things we are interested in (remember those 30 and more effects!). Where does this leave us? It leaves us the exploration of models that COMBINE UG-like theories with learning algorithms that can use such pre-specified information. Bayes has this property, as we can use UG to organize the available hypothesis space or the weightings across it. And this is why I mentioned the L&G paper as a useful model.
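      A toy sketch of what "UG organizes the hypothesis space and Bayes does the updating" can look like. This is not the L&G model; the parameter, the grammars, and all the probabilities are invented for illustration. UG supplies a finite space of candidate grammars (here, two headedness settings); the learner's job reduces to reweighting them given the PLD.

```python
from math import prod

# Hypothetical UG-constrained hypothesis space: two grammars differing
# only in the headedness parameter. The likelihoods are made up.
grammars = {
    "head-initial": {"VO": 0.9, "OV": 0.1},  # P(utterance type | grammar)
    "head-final":   {"VO": 0.1, "OV": 0.9},
}
prior = {"head-initial": 0.5, "head-final": 0.5}  # flat prior over UG options

def posterior(data, grammars, prior):
    """Bayes' rule: P(G | data) is proportional to P(G) * product of P(d | G)."""
    scores = {g: prior[g] * prod(likes[d] for d in data)
              for g, likes in grammars.items()}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

# A handful of mostly head-initial-looking utterances in the PLD:
print(posterior(["VO", "VO", "OV", "VO"], grammars, prior))
```

The learning here is genuinely data driven, but only over options UG provides; this is the sense in which indirect evidence "only makes sense against a background of provided possibilities."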

    11. "There is a reason that MLs shy away from "learning" ECP effects and Island violations: they are virtually unattested in the PLD"
      So I guess the point at which I disagree is the claim that
      the only relevant data for learning ECP effects is the sort of examples that people use to illustrate ECP effects. What is claimed to be innate is not the ECP but some deep structural principle about grammars that the MP will eventually reduce the ECP to. Call this ECPX. So what counts as evidence for ECPX for a learner? Well, potentially anything; if it's a deep structural constraint then it is used in nearly everything.

    12. Maybe yes, maybe no. Faith is a wonderful thing to behold.

    13. Maybe yes and maybe no is all that I am claiming, and all that the dialectic needs. You need to have a definite no to sustain your argument.

      (I think ECP effects are a good example of something that might be a reflex of some structural property of the class of grammars-- so I think this is the easiest one for you to make your argument. If you can't do it for this, then you can't do it for any).

    14. Now coming back to this, and I think that Ewan/Tim are making a different point from Alex C, and that both of them have failed to appreciate what is, in fact, a real distinction. I say this with some hesitance because of how smart everyone in this discussion is, but here I go. Before I start (by the way), I have no idea what it means to be a goose that laid a golden egg, but it sounds bad (which is weird, since who wouldn't want a golden egg?). And if anyone has laid an egg here it is Ewan (and maybe Tim by agreeing with him) and Alex C. Ewan says that Norbert has not gone far enough in appreciating the Goodman/Fodor/Chomsky line about all possible (and actual) projectible predicates being arbitrary. He thinks it is a bad idea to even allow the possibility of empiricism. Empiricism is incoherent, in Ewan's view. But it isn't incoherent, it's just wrong. Why is it not incoherent? Because Norbert laid out a version of it that is coherent (it also happens to be that of Locke, Zellig Harris, Elissa Newport, Linda Smith, Jeff Elman and many many other people). Here it is. The learner represents the input in some way. Let's call that representation of the input the "intake". Let's say that this representation traffics only in features that are provided directly by the senses. These representations may be arbitrary from a philosophical perspective, but they aren't arbitrary from the perspective of the organism, since these are the representations that the senses provide. Now, let's also say that this creature has some way of noticing patterns among those features and, out of those patterns, has a way of making new categories, built up entirely out of the perceptual features but having their own internal consistency. These new categories become the new "intake".
Patterns among these new categories might be noticed in the same way, creating some more new categories, and so on and so on until you've got phonetics, phonology, morphology, syntax, semantics and pragmatics (i.e., a grammar). Any of the patterns that give rise to new categories (e.g., at a higher level of abstraction/representation) count as what NoHo and I were calling the "surface". Of course this "surface" isn't out there; it's a projection by the learner. But it's a projection based only on surface evidence. And the categories it creates are just a combination of whatever pattern matching equipment the learner has, paired up with the intake. Why do we think this is wrong? Because the properties that follow from the categories that can be noticed this way are richer than just the distributional categories themselves. I can notice the category "reflexive pronoun" distributionally and I can notice the category "wh-question" distributionally, but what I can't notice is how these interact (Norbert knows which picture of himself Jeff has in his office vs. Norbert knows how pleased with himself Jeff always seems to be). To the degree that I know something about these facts (e.g. that one is ambiguous and the other isn't), it follows not from the distributions but from something richer. So, Empiricism is both coherent and false. Good for Empiricism (bad for Ewan).

    15. Now Alex's problem is different. He says that if we only knew what the cool projectable predicates were we might discover that those projectable predicates do, in fact, leave a surface signature that the learner could notice and make new representations out of. But that's not an objection to UG, that's a statement of wholehearted agreement with UG. I know Norbert has now written a whole post about how you don't agree and aren't even engaging, but frankly if the comment repeated just below this sentence is any indication, it sounds to me like you've fully adopted the Nativist idiom. Unless I'm confused (which I don't think I am):

      "What is claimed to be innate is not the ECP but some deep structural principle about grammars that the MP will eventually reduce the ECP too. Call this ECPX. So what counts as evidence for ECPX for a learner? Well potentially anything; if it's a deep structural constraint then it is used in nearly everything."

    16. @Jeff:
      I would think that Alex believes that there is lots of data in favor of ECPX once we get the right projectable predicate. In other words, setting ECPX is data driven once we find the right way to characterize the data. Were this right, it would be interesting, sort of a high level version of the headedness parameter.

      This is coherent and, logically speaking, a possibility. However, I very much doubt it. A deep structural constraint is not fixed BY experience but allows for the very possibility of experience (think Kant here and say it reverentially). So, for many of our major effects, I think that we are tapping into what the structural features of FL really are. That's why they are interesting. I don't believe that Alex C believes this. He thinks that if we just find the right inductive predicates we will find that we learn ECPX just like we learn the headedness parameter. Again, this is a logical possibility, but I wouldn't bet on it.

    17. @Jeff: "But that's not an objection to UG, that's a statement of wholehearted agreement with UG." Exactly. When I say I am a good Chomskyan, this is not (just) to annoy Norbert. What makes me a bad Chomskyan is that my methodology is completely different: I start with what can be learned rather than what is not learned.

      Norbert argues that what I am trying to do (achieve explanatory adequacy) is completely irrelevant to what he is trying to do (explain the existence of universals), and that the latter is true GG. I am not sure what to think about this claim.

    18. I'm happy to borrow and play Norbert's "above my pay grade" card on whether empiricism is coherent or not. Let's say for the sake of argument that, in Alex C's words from the other post, "one can rescue a technical notion of surfacey/objective/empiricist". I just think that in practice it's probably best to separate (a) carrying out this rescue mission, from (b) answering the nuts and bolts questions of how language acquisition works. Mixing the two seems to create a tendency for discussion of (b) to lapse into the old-fashioned "is there innate stuff, yes or no?" discussion, and miss the now-familiar point that the question is a "which generalizations" question rather than a yes/no question.

      (And broadly speaking, it seems to me that the way to bring proposals labeled "empiricist" into the terminology of the "which generalizations" question is often to say that they rely on generalizing on the basis of linear-order predicates.)

    19. I agree with Tim that the distinction may often confuse rather than enlighten. My interest in it is as much philosophical as methodological. However, I don't think we can just attribute this to linear order issues. I assume that discovering that 'himself' is a reflexive in English is data driven. Of course, once you know this, IF BT-A is in some sense innate, then you know a lot about its distribution. But learning that it is an anaphor and that 'him' is not is a data driven enterprise. I mention this because it is then not just linear predicates that are at issue.

      Maybe a useful way of thinking about this revolves around those things that we KNOW are data driven, as they differ across Gs. These must be learned on the basis of exposure to the evidence. Most of the effects I cited before have a claim to being (relatively) invariant. This suggests (at least to me) that they are likely results of constitutive properties of FL. In this sense, they are likely preconditions for acquisition, rather than themselves acquired. That reflexives are interpreted as bound elements is likely a good clue as to which dependencies are worth scanning to look for anaphors. If something like this is correct, then it suggests we want built-in factors not merely because it's hard to see how they could be learned, but because they focus the attention of the learner on what is relevant. As an example, indirect negative evidence, if it plays a role, only makes sense against a background of provided possibilities. Where do these come from? UG.

      Last point: there is an old distinction that Chomsky once made much of, between epistemologically prior notions and those that are not. These epistemologically prior concepts are "visible" in the input without the application of any linguistic analysis. Chomsky suggested linear order as one example, and maybe transition probabilities (as in LSLT), and maybe thematic info, if events and their participants are visible without the aid of linguistic predicates (I tend to think animals "see" events). At any rate, these are clearly part of the input for Chomsky. They are distinguished from predicates that clearly call on linguistic notions, e.g. Subject, Clause, Binding Domain, C-command. These latter are not "visible", though their effects might be in the PLD. Then there are notions whose effects are invisible in the PLD. This 3-way distinction might be useful.

  4. FWIW, my view of the model in the L&G paper is that it is a spelling out of the model that appears towards the end of the 1st chapter of Aspects, but with some data to back it up.