Faculty of Language: Reply to Alex (on parameters)

Sunday, January 27, 2013

Reply to Alex (on parameters)

Commenting on my post, Alex asked "what is the alternative? Even if there is no alternative theory that you like, what is the alternative 'research paradigm'? What do *you* think researchers, who think like us that the central problem is language acquisition, should work on? What is the right direction, in your opinion?"
I had to start a new post, because I could not find a way to insert images in the 'reply to comment' option. You'll see I needed 2 images.

So, what's the alternative? I don't have a grand theory to offer, but here are a few things that have helped me in my research on this topic. I hope this can be of help to others as well.

I think the first thing to do is to get rid of the (tacit) belief that the problem is easy ('essentially solved', I'm told), and to take seriously the possibility that there won't be an adequate single-level theory for Plato's problem. Here's what I mean: In "What Darwin got wrong", Fodor and Piattelli-Palmarini rightly point out (end of the book) that unlike single-level theories in physics, single-level theories in biology don't work very well. Biology is just too messy. Theories that assume it's all selection (or it's all parameter fixation) are just not the right kind of theory. We've got to be pluralistic and open-minded. Biologists made progress when they realized that mapping the genotype to the phenotype was not as easy as the modern synthesis had it. Bean bag genetics is no good. Bean bag linguistics is not good either.

I once attended a talk by Lila Gleitman (who, like Chomsky, is almost always right) where she said something that generative grammarians (and, of course, all the others, to the extent they care about Plato's problem) ought to remember at all times: learning language is easy (it's easy because you don't learn it, it's 'innate'), but learning a language is really hard and you (the child) throw everything you can at that problem. I agree with Lila: you throw everything you can. So for Plato's problem, you resort to all the mechanisms you have available. (The prize will be given to those who figure out the right proportions.)
For us generativists, this means learning from the other guys: I personally have learned a lot from Tenenbaum and colleagues on hierarchical Bayesian networks and from Jacob Feldman's work on human concept learning. I think the work of Simon Kirby and colleagues is also very useful. Culbertson's thesis from Hopkins is also a must-read.
All of these guys provide interesting biases that could add structure to the minimal UG some of us entertain.
Add to that the sort of pattern detection mechanisms explored by Jacques Mehler, Gervain, Endress, and others to help us understand what the child uses as cues.
None of this is specifically linguistic, but we just have to learn to lose our fear about this. If UG is minimal, we've got to find structure somewhere else. Specificity, modularity, ... they'll have to be rethought.

The second thing to do is to try to figure out the right kind of grammatical priors to get these biases to work in the right way. Figure out the points of underspecification in (minimal) UG: what are the things about which UG does not say anything? (For example, syntax does not impose linear order; something else does.) Since a growing number of people bet on variation being an 'externalization' issue (no parameter on the semantic side of the grammar), it would be great to have a fully worked out theory of the morphophonological component in the sense of Distributed Morphology (what are the operations there, what's the structure of that component of the grammar?).
Halle and Bromberger said syntax and phonology are different (Idsardi and Heinz have nice work on this too). Would be nice to be clear about where the differences lie. (Bromberger and Halle put their finger on rules (yes for phonology, no for syntax). I think they were right about that difference. Curiously enough, when Newmeyer talks about rules, those who defend parameters go crazy, saying rules are no good to capture variation, but no one went crazy at Halle when he talked about phonological rules, and boy does phonology exhibit [constrained] variation ...)

The third thing is to take lessons from biology seriously. Drop the idealization that language acquisition is 'instantaneous' and take development seriously ("evodevo"), just as biologists did once they recognized the limits of geno-centrism (in many ways, the same limits we find with parameters). There is good work by a few linguists in this area (see the work by Guillermo Lorenzo and Victor Longa), but it's pretty marginalized in the field. We should also pay a lot of attention to simulations of the sort Baronchelli, Chater et al. did (2012, PLOS). (Btw, the latter bears on Neil Smith's suggestions on the blog.)

The fourth thing (and this is why I could not use the 'reply to comment' option) is to develop better data structures. Not better data (I think we always have too many data points), but better data *structures*. Here's what I mean. Too many of us continue to believe that points of variation (parameters, if you want) will relate to one another along the lines of Baker's hierarchy. Nice binary branching trees, no crossing lines, no multi-dominance, like this (sorry for the resolution) [Taken from M. Baker's work]

Such representations are plausible with toy parameters. E.g., "pro-drop": Does your language allow pro? No, then your language is English. Yes, then next question: does it allow pro only in subject position? No, then your language is Chinese. Yes, then your language is Italian.
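To make the point concrete, here is a minimal sketch of the toy 'pro-drop' questionnaire just described, written as a two-question decision procedure. The function and argument names are made up for illustration; nothing here is meant as a serious analysis of any of the three languages.

```python
from typing import Optional


def toy_pro_drop(allows_pro: bool, subject_only: Optional[bool] = None) -> str:
    """Walk the two yes/no questions of the toy 'pro-drop' hierarchy.

    allows_pro:   does the language allow pro at all?
    subject_only: if it does, is pro allowed only in subject position?
    """
    if not allows_pro:
        # First question answered 'no': no further question is asked.
        return "English"
    if subject_only is None:
        raise ValueError("the second question must be answered when pro is allowed")
    # Second question: 'yes' gives Italian, 'no' gives Chinese.
    return "Italian" if subject_only else "Chinese"


for answers in [(False, None), (True, False), (True, True)]:
    print(answers, "->", toy_pro_drop(*answers))
```

Each answer strictly decides which question comes next, and that neat nesting is exactly what makes the picture a toy.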
We all know this is too simplistic, but this is ALWAYS the illustration people use. It's fine to do so (like Baker did) in popular books, but it's far from what we all know is the case.
But if it's not as simple, how complex is it?
As far as I know, only one guy bothered to do the work. For many years, my friend Pino Longobardi has worked on variation in the nominal domain. He's come up with a list of some 50+ parameters. Not like my toy 'pro-drop' parameter. Are there more than 50? You bet, but this is better than 2 or 3. More realistic. Well, look what he found: when he examined how parameters relate to one another (setting P1 influences setting P2, etc.), what you get is nothing like Baker's hierarchy, but something far more complex (the subway map in my previous post) [Taken from G. Longobardi's work]
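One way to picture the contrast in data structures: treat "setting P1 influences setting P2" as a directed edge and ask whether the result is still a Baker-style hierarchy (one root, at most binary branching, no node with two parents). The sketch below does just that. The parameter labels and edges are invented for illustration and are not taken from Longobardi's actual list.

```python
from collections import defaultdict


def is_baker_style_hierarchy(edges):
    """True if the (influencer, influenced) pairs form a single rooted tree
    with at most binary branching and no multi-dominance."""
    parents, children, nodes = defaultdict(set), defaultdict(set), set()
    for src, dst in edges:
        parents[dst].add(src)
        children[src].add(dst)
        nodes.update((src, dst))

    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False  # no root, or several unconnected starting points
    if any(len(parents[n]) > 1 for n in nodes):
        return False  # multi-dominance: a parameter depends on two others
    if any(len(children[n]) > 2 for n in nodes):
        return False  # more than binary branching

    # Every parameter must be reachable from the single root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen == nodes


# Hypothetical dependency sets, for illustration only.
toy_tree = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
toy_network = toy_tree + [("P3", "P4")]  # P4 now depends on both P2 and P3

print(is_baker_style_hierarchy(toy_tree))
print(is_baker_style_hierarchy(toy_network))
```

A single extra dependency is enough to turn the tree into a network, which is the sense in which the subway map is a different kind of object from the hierarchy.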

As they say, an image is worth a thousand words.
But the problem is that we only have this detailed structure for one domain of the grammar (my student Evelina Leivada is working hard on other domains, as I write, so stay tuned). Although we have learned an awful lot about variation in the GB days, when it comes to talking about parameter connectivity, we somehow refuse to exploit that knowledge (and organize it like Longobardi did), and we go back to the toy examples of LGB (pro-drop, wh-in-situ). This is an idealization we have to drop, I think, because when we get our hands dirty, as Longobardi did, and as Leivada is doing, we get data structures that don't resemble what we may have been led to expect from P&P. This dramatically affects the nature of the learning problem.

The fifth thing to do (this is related to the point just made) is to stop doing 'bad' typology. The big categories (Sapir's "genius of a language") like 'analytic, synthetic, etc' are not the right things to anticipate: there is no ergative language, analytic language, or whatever. So let's stop pretending there are parameters corresponding to these. (I once heard a talk about "[high analyticity] parameter" ... If you say 'no' to that parameter, do you speak a [less analytic] language? Is this a yes/no or more-or-less issue?) These categories don't have the right granularity, as my friend David Poeppel would say.

Most importantly, we should be clear about whether we want to do linguistics or languistics. Do we care about Plato's problem, or Greenberg's problem? These are not the same thing. IMHO, one of the great features of minimalism, compared to GB, is that it forces you to choose between the language faculty or languages. Lots of people still care about getting the grammar of English right (sometimes, they even say, I-English), but how about getting UG right? It's time we worry about the biological 'implementation' of (I-)language, as Paul (Pietroski) would say.

To conclude, step 0 of the alternative boils down to recognizing we have been wrong (that's the best thing we can aspire to, Popper would say, so why not admit it?).
Alex, I hope this answers your question.

18 comments:

AveryAndrews, January 27, 2013 at 2:55 PM
So how would this be truly different from the approach taken in the approaches to grammar that Ivan Sag calls 'FS' (formal syntax, as opposed to Universal Grammar, and Typology)? FS people have some idea for an invariant architecture (e.g. the levels of c-structure, f-structure and some kind of semantics and other stuff in LFG), and some principles concerning their relationships, such that every c-structure node has a unique f-structure correspondent, and then come up with notations for expressing the language particular restrictions (assuming that rule notations with an evaluation metric are a decent approximation to what is learned, salvaged in principle at least by the Bayesians).

The languistics/linguistics distinction strikes me as unnecessary and in fact dangerous, since it can encourage people to ignore inconvenient phenomena in the pursuit of attractive UG-ish visions. Baker (2008) ignoring the literature on case stacking in his account of concord is perhaps a recent example of this happening.

Alex Clark, January 28, 2013 at 7:14 AM
So I don't see the relevance of evo-devo to language acquisition, other than as a source of metaphors and analogies. And I don't see where the parameters that you discuss come from, as I thought you had abandoned them; or are these parameters rather than Parameters?

But other than that this seems very reasonable -- indeed it more or less summarises what non-Chomskyan linguists and cognitive scientists have been trying to do for the last 30 years. It also seems to summarise what Chomsky has been arguing against quite vociferously for the last 30 years. He is not a big fan of Bayesian learning, for example. So I don't see what if anything survives of the classic model here.

And one final trivial nit-pick. I have been getting told off over and over again by you, by Paul P, for using the word 'learning' -- it's not learning, it's acquisition or development or growth or whatever.
And now you are happily using 'learning a language' 'learning problem' etc.
So is it still taboo for dangerous empiricists like myself? Or can anyone use the term?

Alex Clark, January 29, 2013 at 11:48 AM
Avery,
for me the main point of the Chater and Vitanyi results (and the Horning results which they develop in some sense) is that they show that you don't need negative evidence. I don't take them as being a theory of language acquisition but just as an argument against the claims that you can't control overgeneralisation without negative evidence -- the 'logical problem' of language acquisition -- and related ideas like the Subset principle.

*I* think it does that quite well, but opinions may differ.

I agree about complexity problems -- these are asymptotic results and don't tell you anything much about finite sample behaviour which is what is important, and neglect the computational issues, but that latter point may not be convincing to the crowd here who seem sceptical about computation.

Alex Clark, January 30, 2013 at 7:31 AM
Norbert, I am not sure I understand your point about super/subsets.
If things don't fit neatly into super/subset relations then there is no learnability problem (of this 'logical' sort) because you will see positive evidence that will distinguish A from B and B from A. I thought the problem here was when A is a neat subset of B, and then if your hypothesis is B, then you will never see a positive example which will explicitly disconfirm B, and so you are doomed.
That is the (bad) argument that the statistical learners are meant to overcome. I think the statistical learners can also handle the non-neat case, just like any learner can.
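
A toy illustration of the neat-subset case, under strong simplifying assumptions that are not part of the results discussed above: both languages are finite, sentences are sampled uniformly from whichever language is the true one, and the learner is a plain Bayesian with a prior over just A and B. Every observed sentence lies in A, so nothing ever explicitly disconfirms B, yet the posterior still shifts towards A because B spreads its probability over more sentences.

```python
from fractions import Fraction


def posterior_subset(size_a, size_b, n_examples, prior_a=Fraction(1, 2)):
    """Posterior probability of the subset language A after n_examples
    positive examples, all of which lie in A (and therefore also in B).

    Assumes uniform sampling, so each example has likelihood 1/size under
    a language of that size. size_a should be smaller than size_b.
    """
    like_a = Fraction(1, size_a) ** n_examples
    like_b = Fraction(1, size_b) ** n_examples
    weighted_a = prior_a * like_a
    return weighted_a / (weighted_a + (1 - prior_a) * like_b)


# A has 2 sentences, B has 4 (the 2 in A plus 2 more).
for n in (0, 1, 3):
    print(n, posterior_subset(2, 4, n))
```

With these numbers the posterior for A goes from 1/2 with no data to 2/3 after one example and 8/9 after three, without any negative evidence.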