Friday, January 25, 2013

Parameters, anyone?

This post was prompted by Neil Smith's recent reflections on parameters on this blog. (It also builds on my own writings on the matter, which the interested reader can find on Lingbuzz.)

Neil writes that "Parametric Variation (PV) has been under attack. ... there has been widespread reluctance to accept the existence of parameters as opposed to particular rules (see e.g. Newmeyer 2005, Boeckx 2010). A vigorous counter-movement (see e.g. the special issue of Theoretical Linguistics 36.1, devoted to Holmberg 2010) shows that no consensus has been reached." I beg to differ. I think the attack's over, the consensus has been reached: Parameters (with a capital P, the parameters of the classical time, those actually worth fighting for) are gone. Proof? Even those who fought for them ("vigorously", as Neil puts it) have accepted defeat. They confess that parameters are not part of UG. They have begun to be pretty upfront about this. Here's a taste (from the first line of the abstract of "Mafioso Parameters and the Limits of Syntactic Variation", by Roberts and colleagues, vigorous fighters, who will present it at this year's WCCFL and at other variation-related events like the Bilbao conference in June):
"We build on recent work proposing non-UG-specified, emergent parameter hierarchies ([1]),
arguing that a system of this kind not only addresses objections levelled against earlier formulations
of parameters ([2], [3], [4]), but also potentially offers a suitably restrictive theory of the nature and
limits of syntactic variation."

See: non-UG-specified ...

So, I guess I have to qualify Norbert's statements stemming from his infatuation with the (L)GB model. We may have pretty good ideas about the laws of grammars regarding some principles, but not (I claim) about those laws/principles that had switches in them (i.e., parameters). The LGB promises in this domain have not been met (sadly). I won't go over the evidence backing up this statement. Take a look at Newmeyer's excellent book (OUP, 2005), my Lingbuzz papers, and references therein.
What I want to do in this post is look at why people hang onto parameters. Here's a list of things I've heard from people (I've discussed some of these in my papers, others are new additions.)

1. The LGB picture was too good to be false.
2. That's the only game in town.
3. The current parametric theory we have is so restrictive, hence so irresistible.
4. The current parametric theory makes great typological predictions
5. The worry concerning the exponential growth of parameters is nothing to worry about
6. The current parametric theory has such a nice deductive structure to it
7. Current parameters are no longer embedded within principles, so don't suffer from problems raised by Newmeyer, Boeckx, and others.
8. If we drop parameters, we are back to the land of Tomasello, Piaget, Skinner, Locke, Satan, mothers-in-law: the land of infinite variation!
9. Current parameters are no longer part of the first factor, they are part of the third factor, so everything is fine.

Let's look at each of these briefly (spoiler alert: all of these arguments in favor of hanging onto parameters fail miserably).

RE 1. People told me that yes, all the parameters put forth so far have not fared very well, certainly not as well as Chomsky predicted in LGB, but they have not lost hope of finding parameters of the right kind. When I ask them to give me an example, they insist that they are working on it, that what they have in mind would be something more abstract than anything proposed so far. (When they pronounced the word abstract, I think I saw raised eyebrows and maybe a repressed smile.) Since I have not (yet) figured out what such a parameter would look like (other than the ones in LGB, which did not materialize), I can't tell you if it exists.

RE 2 (only game in town). A theory can be demonstrably wrong even if we don't yet have a better theory.

RE 3 (restrictiveness). Part of the difficulty here is figuring out what people mean by "current parametric theory" (note: this is not an innocent point). Most of the time, they have the "Chomsky-Borer" conjecture in mind. That's part of the problem: it's a conjecture, not a theory. Luigi Rizzi, for example, put forth the following restrictive definition of parameter:
"A parameter is an instruction for a certain syntactic action expressed as a feature on a lexical item and made operative when the lexical item enters syntax as a head."

Sounds great (and restrictive), except that no one's got the foggiest idea about what counts as a possible feature, lexical item, and head. To be restrictive, the characterization just mentioned ought to be embedded within a theory about these things (i.e., a theory of the lexicon). But even my grandmother knows we don't have a theory there (and she tells me she is not very optimistic we'll have one anytime soon).
Incidentally, this is why the LGB picture about parameters was so great: parameters were embedded within a general theory of grammatical principles. If you take parameters out of that context, they look quite bad.

RE 4. Typological predictions. First, let's not confuse Plato's problem and Greenberg's problem. Second, let's listen to Mark Baker. "If it is correct to reduce all macroparameters to a series of relatively independent microparameters in this way, then one would expect to find a relatively smooth continuum of languages. ... [But] if natural human language permits both macroparameters and microparameters, we would expect there to be parametric clusters in something like the classical way. But we'd expect these clusters to be much noisier, as a result of microparametric variation." I understand the logic, but once we accept (as good minimalists should) that principles of syntax are invariant, and (therefore) that macroparameters are aggregates of microparameters, Baker's picture is no longer tenable. Macro- and micro-parameters are not two different things.

RE 5. Exponential growth. Kayne writes that "We must of course keep in mind that as we discover finer- and finer-grained syntactic differences (by examining more and more languages and dialects) the number of parameters that we need to postulate, although it will rise, can be expected to rise much more slowly than the number of differences discovered, insofar as n independent binary-valued parameters can cover a space of 2^n languages/dialects (e.g. only 8 such parameters could cover 2^8 = 256 languages/dialects, etc.)."

True, but this does not address the worry concerning the exponential growth of parameters. The worry is about the size of the space the child has to go through to acquire her language. Once parameters are removed from the background of principles (see point 3), what's the structure that will guide the child?
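Kayne's arithmetic can be made concrete in a few lines of code (a toy illustration of my own; the parameter counts are illustrative, not anyone's actual inventory). The same 2^n economy that makes parameters descriptively attractive is also the size of the hypothesis space confronting the child:

```python
# With n independent binary parameters, the grammar space has 2**n settings.
# Descriptively economical for the typologist, but it is the very same
# 2**n space that the learner must navigate without guidance.

def grammar_space(n_parameters: int) -> int:
    """Number of distinct grammars generated by n binary parameters."""
    return 2 ** n_parameters

for n in (8, 30, 100):
    print(f"{n} parameters -> {grammar_space(n):,} candidate grammars")
```

Eight parameters already cover Kayne's 256 languages/dialects, but thirty yield over a billion candidate grammars; without principles structuring the space, nothing tells the child which corners of it to explore first.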

RE 6. Deductive structure. Where does the deductive structure come from? In an impoverished theory like the one minimalists propose, it must come from some place else. (In addition, when you look at the deductive structure you are supposed to find in parametric proposals, what you find is something very ugly, nothing like the nice parameter hierarchy in Baker's Atoms of Language, but something like the Tokyo subway map. I'll devote a separate post to this issue, and will report on work done with my student Evelina Leivada, where we discovered more subway maps than we imagined.)

RE 7. Principles/parameters separation. It's true that parameters are no longer embedded inside GB principles. Minimalism got rid of those nice GB principles. But there is another sense in which current parameters are still embedded within principles (sorry if this is confusing, it's not my fault; read the papers!): many minimalists out there are crypto-GB folks: they still hold on to a rich UG, with lots of lexical principles (micro-principles, with micro-parameters in them). That's what provides artificial life to parameters. But if you get rid of these principles, and become a Proper Minimalist (All you need is Merge), then there is no room for parameters within that single Ur-principle.

RE 8. The fear of infinite variation. This is a myth, friends. Don't believe people who tell you that without parameters, there is no limit to variation in language. Hang on to your wallet. Tell them that even if there is a little bit of invariant syntax (say, Merge), then this already puts limits on what can vary in grammar.
Don't believe me? Read Chomsky (note that he does not sound very worried):

"Most of the parameters, maybe all, have to do with the mappings [to the sensory-motor interface]. It might even turn out that there isn't a finite number of parameters, if there are lots of ways of solving this mapping problem"

By the way, this quote is taken from "The science of language" book that G. Pullum did not seem to like. Turns out that the book is not as "useless" as he said it was. Pullum wrote that the book has "essentially no academic value", but well, it looks like I found something of value in it (I always find something of value in what Chomsky writes. You should try too. It's very easy.)

RE 9. First/Third factor. They are not independent. Every good biologist (I've asked my friend Dick Lewontin) will tell you that the first, second, and third factors define one another; you can't separate them so easily (read Dick's "Triple Helix"). So, appealing to the third factor only, abstracting away from the first, is not fair.

In sum: Why would anyone hang onto parameters? How did this post get so long?

--Cedric Boeckx


  1. Perhaps due to the absence of a reasonably detailed non-P&P learning theory for things that vaguely resemble real grammars, which doesn't employ 'oracles'? I would count Pinker's 1984 book as a sort of P&P for LFG, and of course there's Janet Fodor for GB/MP-based P&P. Statistical learning seems to me to be very promising and the right way to go, but the extant probabilistic theories (PCFG, basically) don't seem to have appropriate descriptive coverage. And almost all the probabilistic grammar work seems to be directed towards finding the best parse, but, scientifically, we need to understand what is producing the probabilities that people such as the Variational Sociolinguists observe (they observe them, but so far seem to me to be deeply uninterested in understanding anything about why they're there). There is Stochastic OT, but it has too many constraints/principles to be plausible, and I would regard it as a slightly exotic P&P flavor.

    Indeed, everybody knows that linguistic performance is probabilistic, but I haven't found anything sensible to read on the subject of whether any of the probabilities are represented in linguistic knowledge, and, if so, how. Perhaps I'm missing something.

    1. "the absence of a reasonably detailed non P&P learning theory" Very fair point, but let's not be too generous to parametric models: what's the reasonably detailed P&P learning theory *that does not play with toy parameters* (of the sort no morpho-syntacticians would buy)? I think the switchboard metaphor was great as an idea. It gave us a way to imagine how the logical problem of language acquisition could be solved. But it did not give us a way to imagine how the bio-logical problem of language acquisition could be solved.

    2. Pinker (1984) as P&P-ish LFG would be my example of something that, although it's certainly nowhere near fully formalized, addresses descriptive issues and more or less makes sense. All I know about Janet Fodor's work is that it exists, so I can't say to what extent its parameters are toy ones.

    3. Just a quick plug for my combinatorial variability model for dealing with sociolinguistic variables and their probabilistic distribution. It's surely not right as is, but it does at least attempt to tackle your question. Adger and Smith 2005 and Adger 2006, et seq (on lingbuzz).

  2. I agree with Avery -- I think argument 2 is the one that has real force.

    Back in the day, inference to the best explanation was one of the major arguments in favour of linguistic nativism (e.g. Fiona Cowie versus Fodor; see also the replies to Pullum and Scholz in the Linguistic Review). So in the absence of a suitable and acceptable alternative to P & P, parameters are here to stay.

    I am not as confident as Avery that a statistical learning theory will be any more convincing to advocates of parameters than one that uses oracles. Bob Berwick, for example, earlier made it clear that he thought that the basic assumptions of statistical learning oversimplified the problem (e.g. "The assumption that sentences are pitched to the learner in i.i.d fashion is plainly wrong."), so even a probabilistic learner wouldn't be convincing.

    I think also many people hold to this view from one of Cedric's earlier papers:
    "We think that there is sense in which a parametric model of language acquisition is "logically" necessary, under the constraints of the poverty of stimulus, a selective (not instructive) acquisition process, and the morpho-lexical variability of languages."

    So empirical arguments aren't going to dislodge this view.

    1. Not convinced argument 2 is that strong, Alex. The problem, really, is that we no longer have the principles that we need to get the parameters that we want, so I think we're left with no game in town. We had the idea of a game in town---a great idea---but one that we have had to abandon, not just for empirical arguments, but also theoretical ones (minimalism, broadly construed). People told me I am giving up too early. Perhaps, but that's why I find the quote from Roberts's abstract so interesting. Those who say I've given up too early are giving up too (though they don't always phrase it that way; they still use the word parameter, but it refers to a different thing).

    2. You are not giving up too early -- it's been over 30 years! I think you have shown admirable patience... like many, I have never found P & P theories remotely plausible.

      But what is the alternative? Even if there is no alternative theory that you like, what is the alternative 'research paradigm'? What do *you* think researchers, who think like us that the central problem is language acquisition, should work on? What is the right direction, in your opinion?

    3. People who are convinced of anything are probably a lost cause in any event; smart young people thinking of getting involved with linguistics are, I think, the most important targets.

      My interest in statistical learning comes from the fact that the 'Uniqueness Principle' of Culicover & Wexler is clearly false, but a principle of trying to maximize the probability of something being said in a given situation (especially to express a given meaning, if you can guess this from the context and already-known word-meanings) has similar effects, without being incompatible with the apparent existence of inexplicable variation.

      But the lack of a statistical production theory for any of the descriptively developed grammatical frameworks certainly is a problem (which can perhaps be temporarily evaded by assuming that all of the ways that the grammar provides for expressing a given meaning are equally probable).

    4. Is there perhaps too strong a connection being made here between P&P and linguistic nativism? Linguistic nativism of the kind argued for in Chomsky (1965) is perfectly compatible with kids using fancy statistical methods to learn the rules of their native languages. As I read Newmeyer, this is roughly the kind of model that he has in mind. If your language has, say, a rule of focus movement, then you learn that rule on the basis of the data. But you don't need to learn that focus movement triggers crossover effects, because that comes from UG (and couldn't be learned in any case due to the lack of sufficient data).

      Poverty of the Stimulus arguments aren't a form of inference to the best explanation. Take the well-known subject/aux inversion case as an example. If we grant the premise that kids never hear examples such as "Has the man who will arrive seen Mary?", then it simply follows logically that a kid could not learn that the structure-dependent rule is correct, since all the available data are compatible with both the structure-dependent and linear rules. (Of course, there may well be a learning algorithm which could learn the correct rule, but that algorithm would have to have a built-in bias against the linear rule.) Any given POS argument stands or falls entirely independently of the success of a P&P acquisition model, so the arguments for nativism based on the POS are not tied to P&P.
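      The logic of the subject/aux example can be sketched in code (the flat-list and triple representations below are my own toy assumptions, purely for illustration): the two rules agree on all the simple data the child plausibly hears, and diverge only on the unattested complex case.

```python
# Toy sketch of the subject/aux-inversion POS argument.
# Representations are invented for illustration: a sentence is either a
# flat token list (all the linear rule can see) or a (subject, aux, rest)
# triple (a stand-in for constituent structure).

AUX = {"has", "will", "can", "is"}

def linear_rule(tokens):
    """Front the linearly first auxiliary."""
    for i, word in enumerate(tokens):
        if word in AUX:
            return [word] + tokens[:i] + tokens[i + 1:]
    return tokens

def structural_rule(subject, aux, rest):
    """Front the auxiliary of the main clause (structure-dependent)."""
    return [aux] + subject + rest

# Simple datum: "the man has seen Mary" -- the two rules agree,
# so such data cannot decide between them.
subj, aux, rest = ["the", "man"], "has", ["seen", "Mary"]
assert linear_rule(subj + [aux] + rest) == structural_rule(subj, aux, rest)

# Complex datum: "the man who will arrive has seen Mary" -- they diverge.
subj = ["the", "man", "who", "will", "arrive"]
print(linear_rule(subj + [aux] + rest))   # fronts embedded "will": wrong
print(structural_rule(subj, aux, rest))   # fronts main-clause "has": right
```

      A learner exposed only to data like the first case has, in the data alone, no reason to prefer the structure-dependent rule.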

    5. @AD: I think the connection is being emphasized here because if P&P was descriptively (including typologically) adequate, then it would almost certainly provide an explanation of why language is learnable, but the premise appears to be false. I think it's still possible to argue for some kind of Nativism on the basis of typology (especially stuff that doesn't happen, as brought up by Cedric a few posts ago), but it's weaker, less spectacular, and doesn't produce an immediate explanation of learnability.

    6. @AA: I agree with you on those points. I think it is worth emphasizing, though, that the classical arguments for Chomskyan nativism don't depend on the plausibility of some nativist model of language acquisition. That is, the central line of argument was not that the best available acquisition model was nativist (Chomsky didn't really put forward any acquisition model), but rather that people could be shown to know stuff that they couldn't possibly have learned. That line of argument can be very persuasive even in the absence of any plausible acquisition model. Just speaking personally, that's why I don't find my nativist convictions shaken by (what I take to be) the failure of P&P.

    7. Yes AD; "Any given POS argument stands or falls entirely independently of the success of a P&P acquisition model, so the arguments for nativism based on the POS are not tied to P&P." That is quite correct.

      POS arguments are arguments ultimately against empiricist learning algorithms rather than for nativist learners. So you could refute them by showing that there are empiricist learners, but not by showing that specific nativist learners fail.

      But nativism in this context is linguistic nativism, and it is not enough to show that there is some bias; one must show that there is some linguistically specific bias. And a general bias that rules out a linear rule (by e.g. not considering movement rules at all) need not be linguistically specific. So at best the POS arguments are arguments in favour of general nativism (which is uncontroversial) and not in favour of Chomskyan linguistic nativism.

    8. That all depends on the particular POS argument. I agree that the subject/aux inversion example does not argue strongly for a specifically linguistic nativism, but as you know, that's just a trivial example that Chomsky used to illustrate how a POS argument works. If the piece of knowledge acquired is, say, the Weak Crossover constraint, then it's quite a bit less plausible to maintain that the child's acquisition of the correct rule is due to a non-language-specific learning bias. Minimalism, to the extent that it's been successful, has somewhat undermined this line of argument: the principles of GB theory were just too bizarre and idiosyncratic for anyone to take them to follow from general cognitive principles. In any case, I for one am happy to settle for non-linguistic nativism if that is where the chips fall. (I don't think that's where they will fall, but we'll have to wait and see.)

  3. I think it'd be really useful to try and get some categorialists in on this discussion. To the best of my knowledge, the closest thing to parameters in the categorial domain is Steedman's language-specific availability of different combinatorial operations.

    I think there's also some possible insight into the substance of a parameter-less theory from the categorial perspective in general. One could imagine that the space of all possible syntactic combinators has some sort of complexity metric defined over it (something like how specific the types of the premises are, or how asymmetric they are, or whatever), and that the LAD searches this space in order of complexity, trying out simpler combinators before more complex combinators. Then you wouldn't have any sort of parameters in the usual sense at all, you just have all logically possible combinatorial operations available, with a preference for grammars that employ simpler ones.
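    A complexity-ordered search of that kind could be sketched as follows (the combinator names and complexity scores below are invented placeholders for illustration, not Steedman's actual operations or any real metric):

```python
# Hedged sketch: enumerate candidate combinatory operations in order of a
# complexity metric, keeping the ones consistent with the evidence.
# Candidates and their scores are invented placeholders.
CANDIDATES = [
    ("forward-application", 1),
    ("backward-application", 1),
    ("forward-composition", 2),
    ("crossed-composition", 3),
]

def acquire(fits_data):
    """Try simpler combinators first; keep those the evidence licenses.

    `fits_data(name)` stands in for whatever test the learner applies
    against its input data.
    """
    chosen = []
    for name, _cost in sorted(CANDIDATES, key=lambda c: c[1]):
        if fits_data(name):
            chosen.append(name)
    return chosen

# Toy run: pretend the evidence licenses only the application rules.
print(acquire(lambda name: "application" in name))
```

    On this picture there are no parameters in the usual sense: every logically possible operation is available, and the simplicity ordering does the work of restrictiveness.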

  4. Yes, I had forgotten about the Kwiatkowski-Steedman learner, which seems to me to work because of the inherent restrictiveness of CCG. It also learns to parse using probabilities, but not to produce, if I remember correctly.

  5. Each time I come across the kind of arguments for nativism [AN] you use here, e.g., "the central line of argument was ... that people could be shown to know stuff that they couldn't possibly have learned", they seem just so wrong to me that I think I must be missing something really important. So maybe you can help me out here and explain what that is? I see 2 problems with the AN:

    1. Let's assume nativism is true; then it should be the case that for English we have some stuff people could have learned [L] and some stuff people could not have learned [IL]. IL then is 'innate'. IL is of course the same for every speaker of English, while L differs depending on your input. But surely in a biological organ, as the language faculty is, we ought to expect some variation, especially for traits that have little or no effect on survival [take eye-colour]. So how can we explain that for IL there is NO variation whatsoever? As you say, the persuasiveness of the argument for nativism is that it accounts for what seemingly could not have been learned from input. As Chomsky pointed out early on, some of the relevant constructions are such that a speaker could go through her entire life without encountering them - so they hardly can have much survival value...

    2. Assume we know the difference between L and IL for English. We really have only solved a tiny sub-problem. We also need to figure it out for German, Hungarian, Japanese, Piraha, and all the 6000+ languages currently spoken [+ languages that no longer are spoken and languages that might be spoken in the future]. Given that all languages have some IL and your infant could learn all of them if exposed to relevant input, it seems to follow that IL for every human language must be innate. So it must be in every human brain. Now, no matter how much I disagree with Chomsky on other things, I believe he is certainly right to say that at one point the theories we develop about nativism "have to be translated into some terms that are neurologically realizable" [Chomsky, 2012, p. 91]. So some neural structures must contain all this 'stuff that could not possibly be learned' for all human languages. These structures must have evolved and remained constant for some 50,000 to 100,000 years in every human.
    I'd be grateful for any suggestions re how to deal with 1. and 2.

    1. I take it that your IL is no more than a kind of negative definition of possible human grammars. We know pretty well how it works, for example, in chimps or dogs. Maybe some extraterrestrial creatures would easily identify our own. You are saying: “So some neural structures must contain all this 'stuff that could not possibly be learned' for all human languages. These structures must have evolved and remained constant for some 50,000 to 100,000 years in every human.” Please let me know which neural structures in dogs contain their IL? Probably those that are not present or developed enough in them.

      If you want to think about things that have some survival value, you’d better have in mind language, or the expressing of thoughts, rather than a sentence. The point is that children get it regardless of what value it has.

    2. You're confusing cause and effect. The effects of FLN (your IL) may not be evolutionarily advantageous, but FLN itself could well be. That is to say, the effect of, say, parasitic gaps, or the coordinate structure constraint, could easily just be a side-effect of some other thing, just like eye-color was never selected-for but certainly melanin itself was.

    3. First, IL is by my definition the stuff we could not have learned [Impossible to have been Learned] but nevertheless know. So it is not a 'negative definition' and not something dogs have, but stuff we allegedly all know and all share [that's what makes the POSA persuasive].
      Second, if passing on IL is like eye colour, then it would differ between individuals by now, just as eye colour does. But in that case it no longer explains what it was supposed to explain: why we all know the stuff we have allegedly no input for and all end up with the same Grammar [I-language or whatever your most current terminology is] for English.

    4. This comment has been removed by the author.

    5. There's no a priori reason why passing on IL should be like eye-color in that it differs between individuals. The analogy to eye-color was strictly to explain how something that's not directly selected-for can be a side-effect of something that IS selected-for.

      [edited for spelling]

    6. OK, I misunderstood your point.

      We have to learn walking, but nevertheless walking capacity is innate and it doesn't matter whether one's legs are shorter/longer, weaker/stronger etc. than the average. The immune system is innate, yet it have to "learn" how to respond to, say, the pig flu virus. And, moreover, what it has learned depends on the "input". Why should language be any different?

    7. Christina: But surely in a biological organ as the language faculty is, we ought to expect some variation, especially for traits that have little or no effect on survival [take eye-colour]. So how can we explain that for IL there is NO variation whatsoever?

      Why do you think that there is no variation? I wouldn't take that to be an explanandum in the first place. Certainly, various kinds of language disorders suggest that some people have ILs which don't work normally. Beyond that, there may well be small variations between individuals which don't have a very great effect on language acquisition. In the same way, there is individual variation between hearts, but leaving aside people with heart disorders, they all work pretty much the same (to the extent that students in a biology class can learn about how "the heart" works, abstracting away from individual variation).

    8. But this is the problem, isn't it? According to Chomsky 'every child unerringly knows X' [X stands for ANY and EVERY structure that could not have been learned from the evidence] - so it won't do to have variation here, or you do not have EVERY child UNERRINGLY knowing X - if there is variation, then at least some children don't know X. Further, if you allow for small variation, then you have to accept that over many generations small variation can add up to huge variation. And by definition you have no input that could 'correct' for this - so over generations the innate component would change along different trajectories in different individuals and we would no longer understand each other. [Again, I assume here that Chomsky is right and the function of language is the expression of my thought, not communicating with you.]

      Your analogy to hearts is not very helpful. There is a fairly narrow range of possible variation; if the heart exceeds it, its owner can't survive and won't pass the variation on to the next generation. But no one dies because she has a slightly different language faculty. So we should expect huge variety by now for exactly the things that cannot possibly be learned from the input...

    9. I'm a bit lost here Christina. Do we have some strong reason to think that there isn't minor genetically-determined individual variation in ILs? Suppose that such variation did exist. Then we'd expect to find differences between the ILs of different individuals growing up in the same speech communities. That is, to all appearances, what we do in fact find! So where's the problem? I personally doubt that genetic variation accounts for much if any linguistic variation (pathological cases aside), but the available evidence is nonetheless perfectly consistent with the hypothesis that it does.

      So we should expect huge variety by now

      Wouldn't we have to know a lot more about the underlying biology to make predictions like that? I don't think armchair guesstimates of the expected levels of variation are worth anything.

    10. You say: "Wouldn't we have to know a lot more about the underlying biology to make predictions like that? I don't think armchair guesstimates of the expected levels of variation are worth anything."

      I could not agree more. But note my armchair guesstimates are just as good [or bad] as yours. What IS known about the underlying biology, about the genetic variation that does not account for much if any linguistic variation? Chomsky's entire biolinguistics is at this point based on armchair guesstimates. Maybe his guesstimates are good but without confirmation from biological/genetic research we really can't tell.

      I think we have been talking past each other [and probably will continue to do so, so we should just agree to disagree] because we focus on different aspects of evolution: I on the potential to produce change over time [e.g. from one celled organisms to humans...] you on the potential to preserve structure over time [e.g. the human spine now vs. 100,000 years ago]. Both are of course part of evolution and without knowing what the actual structures of the LF are it is pointless to debate which force is stronger in this case.

      I think armchair speculation has its place to challenge one's assumption, to motivate oneself to look for alternatives. But when it is used as [only] evidence for one's hypothesis it becomes a problem. And I am glad that we seem to agree on that last point.

    11. Why not analogize to bilateral symmetry? Some variation, but not enough to change the basic facts. Is it required for survival, or is it a basic feature of the biophysics? Just a thought.

  6. What Alex Drummond said was implicit in my last comment (or it was my intention, at least). See the twin studies in Neil Smith's post (Parametric Variation and Darwin’s Problem).

    Correction: "it has to "learn" how to respond "