Friday, January 25, 2013

Parameters, anyone?

This post was prompted by Neil Smith's recent reflections on parameters on this blog. (It also builds on my own writings on the matter, which the interested reader can find on Lingbuzz.)

Neil writes that "Parametric Variation (PV) has been under attack. ... there has been widespread reluctance to accept the existence of parameters as opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no consensus has been reached." I beg to differ. I think the attack's over and the consensus has been reached: Parameters (with a capital P, the parameters of the classical period, those actually worth fighting for) are gone. Proof? Even those who fought for them ("vigorously", as Neil puts it) have accepted defeat. They confess that parameters are not part of UG. They have begun to be pretty upfront about this. Here's a taste (from the first line of the abstract of "Mafioso Parameters and the Limits of Syntactic Variation", by Roberts and colleagues, vigorous fighters, who will present it at this year's WCCFL and at other variation-related events like the Bilbao conference in June):
"We build on recent work proposing non-UG-specified, emergent parameter hierarchies ([1]),
arguing that a system of this kind not only addresses objections levelled against earlier formulations
of parameters ([2], [3], [4]), but also potentially offers a suitably restrictive theory of the nature and
limits of syntactic variation."

See: non-UG-specified ...

So, I guess I have to qualify Norbert's statements stemming from his infatuation with the (L)GB model. We may have pretty good ideas about the laws of grammars regarding some principles, but not (I claim) about those laws/principles that had switches in them (i.e., parameters). The LGB promises in this domain have not been met (sadly). I won't go over the evidence backing up this statement. Take a look at Newmeyer's excellent book (OUP, 2005), my Lingbuzz papers, and references therein.
What I want to do in this post is look at why people hang onto parameters. Here's a list of things I've heard from people (I've discussed some of these in my papers, others are new additions.)

1. The LGB picture was too good to be false.
2. That's the only game in town.
3. The current parametric theory we have is so restrictive, hence so irresistible.
4. The current parametric theory makes great typological predictions
5. The worry concerning the exponential growth of parameters is nothing to worry about
6. The current parametric theory has such a nice deductive structure to it
7. Current parameters are no longer embedded within principles, so don't suffer from problems raised by Newmeyer, Boeckx, and others.
8. If we drop parameters, we are back to the land of Tomasello, Piaget, Skinner, Locke, Satan, mothers-in-law: the land of infinite variation!
9. Current parameters are no longer part of the first factor, they are part of the third factor, so everything is fine.

Let's look at each of these briefly (spoiler alert: all of these arguments in favor of hanging onto parameters fail miserably).

RE 1. People told me that yes, all the parameters put forth so far have not fared very well, certainly not as well as Chomsky predicted in LGB, but they have not lost hope of finding parameters of the right kind. When I ask them to give me an example, they insist that they are working on it, and that what they have in mind would be something more abstract than anything proposed so far. (When they pronounce the word abstract, I think I see raised eyebrows and maybe a repressed smile.) Since I have not (yet) figured out what such a parameter would look like (other than the ones in LGB, which did not materialize), I can't tell you if it exists.

RE 2 (only game in town). A theory can be demonstrably wrong even if we don't yet have a better theory.

RE 3 (restrictiveness). Part of the difficulty here is figuring out what people mean by "current parametric theory" (note: this is not an innocent point). Most of the time, they have the "Chomsky-Borer" conjecture in mind. That's part of the problem: it's a conjecture, not a theory. Luigi Rizzi, for example, put forth the following restrictive definition of a parameter:
"A parameter is an instruction for a certain syntactic action expressed as a feature on a lexical item and made operative when the lexical item enters syntax as a head."

Sounds great (and restrictive), except that no one's got the foggiest idea about what counts as a possible feature, lexical item, or head. To be restrictive, the characterization just mentioned ought to be embedded within a theory about these things (i.e., a theory of the lexicon). But even my grandmother knows we don't have a theory there (and she tells me she is not very optimistic we'll have one anytime soon).
Incidentally, this is why the LGB picture about parameters was so great: parameters were embedded within a general theory of grammatical principles. If you take parameters out of that context, they look quite bad.

RE 4. Typological predictions. First, let's not confuse Plato's problem and Greenberg's problem. Second, let's listen to Mark Baker: "If it is correct to reduce all macroparameters to a series of relatively independent microparameters in this way, then one would expect to find a relatively smooth continuum of languages. ... [But] if natural human language permits both macroparameters and microparameters, we would expect there to be parametric clusters in something like the classical way. But we'd expect these clusters to be much noisier, as a result of microparametric variation." I understand the logic, but once we accept (as good minimalists should) that principles of syntax are invariant, and (therefore) that macroparameters are aggregates of microparameters, Baker's picture is no longer tenable. Macro- and micro-parameters are not two different things.


RE 5. Exponential growth. Kayne writes that "We must of course keep in mind that as we discover finer- and finer-grained syntactic differences (by examining more and more languages and dialects) the number of parameters that we need to postulate, although it will rise, can be expected to rise much more slowly than the number of differences discovered, insofar as n independent binary-valued parameters can cover a space of 2^n languages/dialects (e.g. only 8 such parameters could cover 2^8 = 256 languages/dialects, etc.)" (Kayne)

True, but this does not address the worry concerning the exponential growth of parameters. The worry is about the size of the space the child has to go through to acquire her language. Once parameters are removed from the background of principles (see point 3), what's the structure that will guide the child?
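To make the arithmetic (and the worry) concrete, here is a minimal sketch. It is not anyone's actual acquisition model; the brute-force learner and its enumeration order are hypothetical, just to make the point vivid: n binary parameters do cover 2^n grammars, which is great for descriptive coverage, but that same 2^n space is what a learner has to navigate if nothing further structures the search.

```python
from itertools import product

def grammar_space(n_parameters):
    """All settings of n independent binary parameters: 2**n grammars."""
    return list(product((0, 1), repeat=n_parameters))

def naive_search(space, fits_data):
    """A brute-force learner that checks each setting against the data.
    In the worst case it inspects the entire space, which is the point
    of the acquisition worry."""
    for steps, setting in enumerate(space, start=1):
        if fits_data(setting):
            return setting, steps
    return None, len(space)

if __name__ == "__main__":
    for n in (8, 20, 30):
        print(n, "parameters ->", 2 ** n, "grammars")
    # Hypothetical target grammar: all parameters set to 1, i.e. the
    # worst case for this particular enumeration order.
    space = grammar_space(8)
    target = tuple([1] * 8)
    setting, steps = naive_search(space, lambda s: s == target)
    print("found", setting, "after", steps, "of", len(space), "settings")
```

The arithmetic is Kayne's; the sketch only adds the observation that, without principles (or cues, or a hierarchy) to prune the search, coverage and search space are the same number.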


RE 6. Deductive structure. Where does the deductive structure come from? In an impoverished theory like the one minimalists propose, it must come from somewhere else. (In addition, when you look at the deductive structure you are supposed to find in parametric proposals, what you find is something very ugly: nothing like the nice parameter hierarchy in Baker's Atoms of Language, but something like the Tokyo subway map. I'll devote a separate post to this issue, and will report on work done with my student Evelina Leivada, where we discovered more subway maps than we imagined.)

RE 7. Principles/parameters separation. It's true that parameters are no longer embedded inside GB principles. Minimalism got rid of those nice GB principles. But there is another sense in which current parameters are still embedded within principles (sorry if this is confusing, it's not my fault; read the papers!): many minimalists out there are crypto-GB folks: they still hold on to a rich UG, with lots of lexical principles (micro-principles, with micro-parameters in them). That's what provides artificial life to parameters. But if you get rid of these principles, and become a Proper Minimalist (All you need is Merge), then there is no room for parameters within that single Ur-principle.

RE 8. The fear of infinite variation. This is a myth, friends. Don't believe people who tell you that without parameters, there is no limit to variation in language. Hang on to your wallet. Tell them that even if there is just a little bit of invariant syntax (say, Merge), that already limits what can vary in grammar.
Don't believe me? Read Chomsky (note that he does not sound very worried):

"Most of the parameters, maybe all, have to do with the mappings [to the sensory-motor interface]. It might even turn out that there isn't a finite number of parameters, if there are lots of ways of solving this mapping problem"

By the way, this quote is taken from "The science of language" book that G. Pullum did not seem to like. Turns out that the book is not as "useless" as he said it was. Pullum wrote that the book has "essentially no academic value", but well, it looks like I found something of value in it (I always find something of value in what Chomsky writes. You should try too. It's very easy.)


RE 9. First/Third factor. They are not independent. Every good biologist (I've asked my friend Dick Lewontin) will tell you that the first, second, and third factors define one another; you can't separate them so easily (read Dick's "Triple Helix"). So, appealing to the third factor only, abstracting away from the first, is not fair.

In sum: Why would anyone hang onto parameters? How did this post get so long?

--Cedric Boeckx

28 comments:

  1. Perhaps due to the absence of a reasonably detailed non-P&P learning theory for things that vaguely resemble real grammars, one which doesn't employ 'oracles'? I would count Pinker's 1984 book as a sort of P&P for LFG, and of course there's Janet Fodor for GB/MP-based P&P. Statistical learning seems to me to be very promising and the right way to go, but the extant probabilistic theories (PCFG, basically) don't seem to have appropriate descriptive coverage. And almost all the probabilistic grammar work seems to be directed towards finding the best parse, but, scientifically, we need to understand what is producing the probabilities that people such as the Variational Sociolinguists observe (they observe them, but so far seem to me to be deeply uninterested in understanding anything about why they're there). There is Stochastic OT, but it has too many constraints/principles to be plausible, and I would regard it as a slightly exotic P&P flavor.

    Indeed, everybody knows that linguistic performance is probabilistic, but I haven't found anything sensible to read on the subject of whether any of the probabilities are represented in linguistic knowledge, and, if so, how. Perhaps I'm missing something.

    1. "the absence of a reasonably detailed non P&P learning theory" Very fair point, but let's not be too generous to parametric models: what's the reasonably detailed P&P learning theory *that does not play with toy parameters* (of the sort no morpho-syntacticians would buy)? I think the switchboard metaphor was great as an idea. It gave us a way to imagine how the logical problem of language acquisition could be solved. But it did not give us a way to imagine how the bio-logical problem of language acquisition could be solved.

    2. Pinker (1984) as P&P-ish LFG would be my example of something that, although it's certainly nowhere near fully formalized, addresses descriptive issues and more or less makes sense. All I know about Janet Fodor's work is that it exists, so I can't say to what extent its parameters are toy ones.

    3. Just a quick plug for my combinatorial variability model for dealing with sociolinguistic variables and their probabilistic distribution. It's surely not right as is, but it does at least attempt to tackle your question. Adger and Smith 2005 and Adger 2006, et seq (on lingbuzz).

  2. I agree with Avery -- I think argument 2 is the one that has real force.

    Back in the day, inference to the best explanation was one of the major arguments in favour of linguistic nativism (e.g. Fiona Cowie versus Fodor; see also the replies to Pullum and Scholz in The Linguistic Review). So in the absence of a suitable and acceptable alternative to P & P, parameters are here to stay.

    I am not as confident as Avery that a statistical learning theory will be any more convincing to advocates of parameters than one that uses oracles. Bob Berwick, for example, made it clear earlier that he thought the basic assumptions of statistical learning oversimplified the problem (e.g. "The assumption that sentences are pitched to the learner in i.i.d fashion is plainly wrong."), so even a probabilistic learner wouldn't be convincing.

    I think many people also hold to this view, from one of Cedric's earlier papers:
    "We think that there is sense in which a parametric model of language acquisition is "logically" necessary, under the constraints of the poverty of stimulus, a selective (not instructive) acquisition process, and the morpho-lexical variability of languages."

    So empirical arguments aren't going to dislodge this view.

    1. Not convinced argument 2 is that strong, Alex. The problem, really, is that we no longer have the principles that we need to get the parameters that we want, so I think we're left with no game in town. We had the idea of a game in town---a great idea---but one that we have had to abandon, not just for empirical reasons, but also theoretical ones (minimalism, broadly construed). People told me I am giving up too early. Perhaps, but that's why I find the quote from Roberts's abstract so interesting. Those who say I've given up too early are giving up too (though they don't always phrase it that way; they still use the word parameter, but it refers to a different thing).

    2. You are not giving up too early -- it's been over 30 years! I think you have shown admirable patience... like many, I have never found P & P theories remotely plausible.

      But what is the alternative? Even if there is no alternative theory that you like, what is the alternative 'research paradigm'? What do *you* think researchers, who think like us that the central problem is language acquisition, should work on? What is the right direction, in your opinion?

    3. People who are convinced of anything are probably a lost cause in any event; smart young people thinking of getting involved with linguistics are, I think, the most important targets.

      My interest in statistical learning comes from the fact that the 'Uniqueness Principle' of Culicover & Wexler is clearly false, but a principle of trying to maximize the probability of something being said in a given situation (especially to express a given meaning, if you can guess this from the context and already known word-meanings) has similar effects, without being incompatible with the apparent existence of inexplicable variation (see the toy sketch at the end of this comment).

      But the lack of a statistical production theory for any of the descriptively developed grammatical frameworks certainly is a problem (which can perhaps be temporarily evaded by assuming that all of the ways that the grammar provides for expressing a given meaning are equally probable).
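      Here's a toy version of the idea, just to fix intuitions; the candidate forms, the counting scheme, and the smoothing constant are placeholders of my own, not a worked-out proposal. The learner tracks how often each form has expressed a meaning and, in production, picks the currently most probable form. Unseen variants keep a small share of probability, so variation isn't ruled out, but repeated evidence concentrates mass on one form, which gives a Uniqueness-like effect without stipulating uniqueness.

```python
from collections import Counter, defaultdict

class ProbabilisticProducer:
    """Toy probability-maximizing production model (illustrative only)."""

    def __init__(self, smoothing=1.0):
        self.counts = defaultdict(Counter)   # meaning -> Counter over forms
        self.smoothing = smoothing           # keeps unseen forms possible

    def observe(self, meaning, form):
        self.counts[meaning][form] += 1

    def prob(self, meaning, form, candidates):
        c = self.counts[meaning]
        total = sum(c[f] for f in candidates) + self.smoothing * len(candidates)
        return (c[form] + self.smoothing) / total

    def produce(self, meaning, candidates):
        # Maximize the probability of the form given the intended meaning.
        return max(candidates, key=lambda f: self.prob(meaning, f, candidates))

learner = ProbabilisticProducer()
candidates = ["mice", "mouses"]              # hypothetical competing forms
for _ in range(5):
    learner.observe("PLURAL(mouse)", "mice")
print(learner.produce("PLURAL(mouse)", candidates))                   # mice
print(round(learner.prob("PLURAL(mouse)", "mouses", candidates), 3))  # small but nonzero
```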

    4. Is there perhaps too strong a connection being made here between P&P and linguistic nativism? Linguistic nativism of the kind argued for in Chomsky (1965) is perfectly compatible with kids using fancy statistical methods to learn the rules of their native languages. As I read Newmeyer, this is roughly the kind of model that he has in mind. If your language has, say, a rule of focus movement, then you learn that rule on the basis of the data. But you don't need to learn that focus movement triggers crossover effects, because that comes from UG (and couldn't be learned in any case due to the lack of sufficient data).

      Poverty of the Stimulus arguments aren't a form of inference to the best explanation. Take the well-known subject/aux inversion case as an example. If we grant the premise that kids never hear examples such as "Has the man who will arrive seen Mary?", then it simply follows logically that a kid could not learn that the structure dependent rule is correct, since all the available data are compatible with both the structure-dependent and linear rules. (Of course, there may well be a learning algorithm which could learn the correct rule, but that algorithm would have to have a built-in bias against the linear rule.) Any given POS argument stands or falls entirely independently of the success of a P&P acquisition model, so the arguments for nativism based on the POS are not tied to P&P.
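      The logic can be made concrete with a toy simulation; the representations and both rules below are my own illustrative stand-ins, not anyone's actual grammar or acquisition model. The two rules agree on the simple data and only come apart on the crucial sentence type the child is assumed never to hear.

```python
# Toy illustration of the subject/aux-inversion POS argument.
# Sentences are flat word lists; the main-clause auxiliary's position is
# supplied by hand, standing in for structural information.
AUX = {"will", "has", "can"}

def linear_rule(words):
    """Front the first auxiliary in the string (structure-independent)."""
    i = next(idx for idx, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def structural_rule(words, main_aux_index):
    """Front the auxiliary of the main clause (structure-dependent)."""
    i = main_aux_index
    return [words[i]] + words[:i] + words[i + 1:]

# Simple input: both rules yield the same question.
simple = (["the", "man", "will", "arrive"], 2)
# Crucial input type the child is assumed never to hear:
crucial = (["the", "man", "who", "will", "arrive", "has", "seen", "Mary"], 5)

for words, main_aux in (simple, crucial):
    print("linear:    ", " ".join(linear_rule(words)))
    print("structural:", " ".join(structural_rule(words, main_aux)))
```

      On the simple sentence both rules print "will the man arrive"; on the crucial one, the linear rule gives the ungrammatical "will the man who arrive has seen Mary" while the structural rule gives "has the man who will arrive seen Mary". Data of the first kind alone cannot decide between the rules, which is the point of the argument.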

    5. @AD: I think the connection is being emphasized here because if P&P was descriptively (including typologically) adequate, then it would almost certainly provide an explanation of why language is learnable, but the premise appears to be false. I think it's still possible to argue for some kind of Nativism on the basis of typology (especially stuff that doesn't happen, as brought up by Cedric a few posts ago), but it's weaker, less spectacular, and doesn't produce an immediate explanation of learnability.

    6. @AA: I agree with you on those points. I think it is worth emphasizing, though, that the classical arguments for Chomskyan nativism don't depend on the plausibility of some nativist model of language acquisition. That is, the central line of argument was not that the best available acquisition model was nativist (Chomsky didn't really put forward any acquisition model), but rather that people could be shown to know stuff that they couldn't possibly have learned. That line of argument can be very persuasive even in the absence of any plausible acquisition model. Just speaking personally, that's why I don't find my nativist convictions shaken by (what I take to be) the failure of P&P.

    7. Yes AD; "Any given POS argument stands or falls entirely independently of the success of a P&P acquisition model, so the arguments for nativism based on the POS are not tied to P&P." That is quite correct.

      POS arguments are arguments ultimately against empiricist learning algorithms rather than for nativist learners. So you could refute them by showing that there are empiricist learners, but not by showing that specific nativist learners fail.

      But nativism in this context is linguistic nativism, and it is not enough to show that there is some bias; one has to show that there is some linguistically specific bias. And a general bias that rules out a linear rule (by e.g. not considering movement rules at all) need not be linguistically specific. So at best the POS arguments are arguments in favour of general nativism (which is uncontroversial) and not in favour of Chomskyan linguistic nativism.

    8. That all depends on the particular POS argument. I agree that the subject/aux inversion example does not argue strongly for a specifically linguistic nativism, but as you know, that's just a trivial example that Chomsky used to illustrate how a POS argument works. If the piece of knowledge acquired is, say, the Weak Crossover constraint, then it's quite a bit less plausible to maintain that the child's acquisition of the correct rule is due to a non-language-specific learning bias. Minimalism, to the extent that it's been successful, has somewhat undermined this line of argument: the principles of GB theory were just too bizarre and idiosyncratic for anyone to take them to follow from general cognitive principles. In any case, I for one am happy to settle for non-linguistic nativism if that is where the chips fall. (I don't think that's where they will fall, but we'll have to wait and see.)

  3. I think it'd be really useful to try and get some categorialists in on this discussion. To the best of my knowledge, the closest thing to parameters in the categorial domain is Steedman's language-specific availability of different combinatorial operations.

    I think there's also some possible insight into the substance of a parameter-less theory from the categorial perspective in general. One could imagine that the space of all possible syntactic combinators has some sort of complexity metric defined over it (something like how specific the types of the premises are, or how asymmetric they are, or whatever), and that the LAD searches this space in order of complexity, trying out simpler combinators before more complex combinators. Then you wouldn't have any sort of parameters in the usual sense at all, you just have all logically possible combinatorial operations available, with a preference for grammars that employ simpler ones.
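    Here's a rough sketch of what a complexity-ordered search of that kind might look like; the 'combinators', the complexity scores, and the coverage test are placeholders I've made up for illustration, not Steedman's actual inventory or any real CCG learner.

```python
from itertools import combinations

COMBINATORS = {                 # hypothetical name -> assumed complexity score
    "application": 1,
    "composition": 2,
    "type_raising": 3,
    "crossed_composition": 4,
}

def covers(ops, datum):
    """Placeholder test: does a grammar using these operations license the datum?"""
    return datum["needs"] <= ops

def simplest_grammar(data, inventory=COMBINATORS):
    """Try subsets of the inventory from least to most total complexity and
    return the first one that covers all the data."""
    names = sorted(inventory, key=inventory.get)
    candidates = [set(c) for r in range(1, len(names) + 1)
                  for c in combinations(names, r)]
    candidates.sort(key=lambda ops: sum(inventory[o] for o in ops))
    for ops in candidates:
        if all(covers(ops, d) for d in data):
            return ops
    return None

# Toy "data": each datum records which operations it minimally requires.
data = [{"needs": {"application"}},
        {"needs": {"application", "composition"}}]
print(simplest_grammar(data))   # {'application', 'composition'}
```

    The substantive content, such as it is, lives in the complexity metric and the coverage test; the point is just that 'no parameters, only a preference ordering over operations' is a perfectly statable learning bias.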

  4. Yes, I had forgotten about the Kwiatkowski-Steedman learner, which seems to me to work because of the inherent restrictiveness of CCG. It also learns to parse using probabilities, but not to produce, if I remember correctly.

  5. Each time I come across the kind of arguments for nativism [AN] you use here, e.g., "the central line of argument was ... that people could be shown to know stuff that they couldn't possibly have learned", they seem just so wrong to me that I think I must be missing something really important. So maybe you can help me out here and explain what that is? I see 2 problems with the AN:

    1. Let's assume nativism is true; then it should be the case that for English we have some stuff people could have learned [L] and some stuff people could not have learned [IL]. IL then is 'innate'. IL is of course the same for every speaker of English while L differs depending on your input. But surely in a biological organ such as the language faculty, we ought to expect some variation, especially for traits that have little or no effect on survival [take eye-colour]. So how can we explain that for IL there is NO variation whatsoever? As you say, the persuasiveness of the argument for nativism is that it accounts for what seemingly could not have been learned from input. As Chomsky pointed out early on, some of the relevant constructions are such that a speaker could go through her entire life without encountering them - so they hardly can have much survival value...

    2. Assume we know the difference between L and IL for English. We really have only solved a tiny sub-problem. We also need to figure it out for German, Hungarian, Japanese, Piraha, and all the 6000+ languages currently spoken [+ languages that no longer are spoken and languages that might be spoken in the future]. Given that all languages have some IL and your infant could learn all of them if exposed to relevant input, it seems to follow that IL for every human language must be innate. So it must be in every human brain. Now, no matter how much I disagree with Chomsky on other things, I believe he is certainly right to say that at one point the theories we develop about nativism "have to be translated into some terms that are neurologically realizable" [Chomsky, 2012, p. 91]. So some neural structures must contain all this 'stuff that could not possibly be learned' for all human languages. These structures must have evolved and remained constant for some 50,000 to 100,000 years in every human.
    I'd be grateful for any suggestions re how to deal with 1. and 2.

    1. I take it that your IL is no more than a kind of negative definition of possible human grammars. We know pretty well how it works, for example, in chimps or dogs. Maybe some extraterrestrial creatures would easily identify our own. You are saying: "So some neural structures must contain all this 'stuff that could not possibly be learned' for all human languages. These structures must have evolved and remained constant for some 50,000 to 100,000 years in every human." Please let me know which neural structures in dogs contain their IL? Probably those that are not present or developed enough in them.

      If you want to think about things that have some survival value, you'd better have in mind language, or the expression of thoughts, rather than a particular sentence. The point is that children get it, not what value it has.

    2. You're confusing cause and effect. The effects of FLN (your IL) may not be evolutionarily advantageous, but FLN itself could well be. That is to say, the effect of, say, parasitic gaps, or the coordinate structure constraint, could easily just be a side-effect of some other thing, just like eye-color was never selected-for but melanin itself certainly was.

    3. First, IL is by my definition the stuff we could not have learned [Impossible to have been Learned] but nevertheless know. So it is not a 'negative definition' and not something dogs have, but stuff we allegedly all know and all share [that's what makes the POSA persuasive].
      Second, if passing on IL is like eye colour, then it would differ between individuals by now, just as eye colour does. But in that case it no longer explains what it was supposed to explain: why we all know the stuff we allegedly have no input for and all end up with the same Grammar [I-language or whatever your most current terminology is] for English.

    4. This comment has been removed by the author.

    5. There's no a priori reason why passing on IL should be like eye-color in that it differs between individuals. The analogy to eye-color was strictly to explain how something that's not directly selected-for can be a side-effect of something that IS selected-for.

      [edited for spelling]

    6. OK, I misunderstood your point.

      We have to learn walking, but nevertheless walking capacity is innate and it doesn't matter whether one's legs are shorter/longer, weaker/stronger etc. than the average. The immune system is innate, yet it have to "learn" how to respond to, say, the pig flu virus. And, moreover, what it has learned depends on the "input". Why should language be any different?

    7. Christina: But surely in a biological organ as the language faculty is, we ought to expect some variation, especially for traits that have little or no effect on survival [take eye-colour]. So how can we explain that for IL there is NO variation whatsoever?

      Why do you think that there is no variation? I wouldn't take that to be an explanandum in the first place. Certainly, various kinds of language disorders suggest that some people have ILs which don't work normally. Beyond that, there may well be small variations between individuals which don't have a very great effect on language acquisition. In the same way, there is individual variation between hearts, but leaving aside people with heart disorders, they all work pretty much the same (to the extent that students in a biology class can learn about how "the heart" works, abstracting away from individual variation).

    8. But this is the problem, isn't it? According to Chomsky 'every child unerringly knows X' [X stands for ANY and EVERY structure that could not have been learned from the evidence] - so it won't do to have variation here, or you do not have EVERY child UNERRINGLY knowing X - if there is variation then at least some children don't know X. Further, if you allow for small variation, then you have to accept that over many generations small variation can add up to huge variation. And by definition you have no input that could 'correct' for this - so over generations the innate component would change along different trajectories in different individuals and we would no longer understand each other. [Again, I assume here that Chomsky is right and the function of language is the expression of my thoughts, not communicating with you.]

      Your analogy to hearts is not very helpful. There is a fairly narrow range of possible variation; if the heart exceeds it, its owner can't survive and won't pass the variation on to the next generation. But no one dies because she has a slightly different language faculty. So we should expect huge variety by now for exactly the things that cannot possibly be learned from the input...

    9. I'm a bit lost here Christina. Do we have some strong reason to think that there isn't minor genetically-determined individual variation in ILs? Suppose that such variation did exist. Then we'd expect to find differences between the ILs of different individuals growing up in the same speech communities. That is, to all appearances, what we do in fact find! So where's the problem? I personally doubt that genetic variation accounts for much if any linguistic variation (pathological cases aside), but the available evidence is nonetheless perfectly consistent with the hypothesis that it does.

      So we should expect huge variety by now

      Wouldn't we have to know a lot more about the underlying biology to make predictions like that? I don't think armchair guesstimates of the expected levels of variation are worth anything.

    10. You say: "Wouldn't we have to know a lot more about the underlying biology to make predictions like that? I don't think armchair guesstimates of the expected levels of variation are worth anything."

      I could not agree more. But note my armchair guesstimates are just as good [or bad] as yours. What IS known about the underlying biology, about the genetic variation that does not account for much if any linguistic variation? Chomsky's entire biolinguistics is at this point based on armchair guesstimates. Maybe his guesstimates are good but without confirmation from biological/genetic research we really can't tell.

      I think we have been talking past each other [and probably will continue to do so, so we should just agree to disagree] because we focus on different aspects of evolution: I on the potential to produce change over time [e.g. from one celled organisms to humans...] you on the potential to preserve structure over time [e.g. the human spine now vs. 100,000 years ago]. Both are of course part of evolution and without knowing what the actual structures of the LF are it is pointless to debate which force is stronger in this case.

      I think armchair speculation has its place to challenge one's assumption, to motivate oneself to look for alternatives. But when it is used as [only] evidence for one's hypothesis it becomes a problem. And I am glad that we seem to agree on that last point.

    11. Why not analogize to bilateral symmetry? Some variation, but not enough to change the basic facts. Is it required for survival, or is it a basic feature of the biophysics? Just a thought.

  6. What Alex Drummond said was implicit in my last comment (or it was my intention at least). See the twin studies in Neil Smith's post (Parametric Variation and Darwin's Problem).

    Correction: "it has to "learn" how to respond "
