Thursday, January 31, 2013

Grammars and Brains

A post or two ago I got waylaid on my way to mentioning some important papers by David Poeppel and Dave Embick (P&E).  They have been writing about Broca’s Problem: how to fit linguistic work on FL/UG with neuro work on brains.  Several of their papers (e.g. here and here) review the state of the art. For the neuro illiterate (e.g. me) they offer a useful quick review of what “we” know about how brains embody language.  The general impression I get (though please read these for yourself) is that we know relatively little. Or to be more exact, we know a little about one kind of problem, the “where” problem (“Where are certain capacities localized?”, i.e. what Poeppel (here) calls the map problem), but almost nothing about what I, and P&E, take to be the more important issue, the “how” problem (“How do brains realize these capacities?”, i.e. what Poeppel calls the mapping problem).  Let me expand a bit.

P&E describe two hurdles to successfully integrating linguistic work with neuro-research. They dub these the Granularity Mismatch Problem (GMP) and the Ontological Incommensurability Problem (OIP). 

GMP names a fact that becomes obvious when a linguist (e.g. me) sits in on neuro-lab meetings (which I often do to make myself feel good about work and data in syntax) and listens to discussions about localizing syntax or semantics or whatever.  The units being mapped in the brain are, from a syntactician’s perspective, very big.  Syntax for me is not one thing but a richly structured system with many kinds of objects (nouns, verbs, phrases, chains, complements, adjuncts, antecedents, bound pronouns etc.), many kinds of relations (binding, agreement, case marking, subcategorization etc.), subserved by (possibly many) different kinds of operations (e.g. Merge, AGREE, Copy, Delete etc.), while in the neuro literature it is one undifferentiated thing, usually found on the left in conspicuous proximity to Broca’s area or BA 44/45 (sounds like the Arizona redoubt where the government keeps captured aliens, doesn’t it?).  Similarly, semantics as studied in the neuro literature hardly ever worries about compositionality or scope or binding but more about the interpretation of words and the semantic fields they are part of, something that linguistic semantics says almost nothing about (what’s the meaning of ‘life’? Well, LIFE’ or λx.LIFE(x), helpful huh?). P&E identify several problems that arise from the fact that “the distinctions made in the neurological study of language are coarse in comparison with the distinction made by linguists (7).”  One that they highlight concerns the (incorrect, in their view) identification of Broca’s area with syntactic computation. I’ve mentioned another review reaching the same conclusion (here).

P&E make two important points: (i) at a coarse level of description many apparently different things light up Broca’s area, and (ii) this should not be surprising, for the mistaken identification of Broca’s area with “syntax” rests on treating syntax as a “simplex unstructured computation” looking for a “single undifferentiated cortical region” to call home. Of course syntax is not simple and Broca’s area is not undifferentiated. What P&E believe to be true is that “one or perhaps several of the computational subroutines that are essential for syntactic processing are computed in the Inferior Frontal Gyrus (IFG). But these are not ‘syntax’ per se -- they are computational sub-components of syntax (9).” The game should be to identify these suboperations and see how they fit.

This is music to my ears, for from a minimalist perspective we should see many of the basic processes operative in the syntax also implicated in other kinds of cognitive manipulations. In other words, getting the grain right leads us to expect that Broca’s area should light up quite often. Or as P&E put it:

The natural assumption is that the differently structured cortical areas are specialized for performing different types of computations, and that some of these computations are necessary for language but also for other cognitive functions (10).

P&E also address the OIP and suggest that ‘computation’ is where linguistics and neurology might meet to unify into a serious neurolinguistics.  They note that this first requires dumping “terms like ‘psychologically real’ or ‘neurologically real.’” Why? Because these notions mislead. How?  They “imply that there is some other type of reality to linguistic computations beyond being computed in the brain.” (Yes! Yes! Yes!) Linguistic analyses, they insist, are proposals about the “computations/representations that are computed in the minds/brains of speakers.” So understood, the envisioned job of neurolinguistics is to identify how “these computations are implemented” in neural hardware (12). In other words, what linguists study and what neurolinguists study is one and the same object described in different ways. The aim is to unify these different descriptions, and this requires dispensing with the silly distinction between true and psychologically/neurally real.

Poeppel suggests that the right “place” for this unification is at the level of the ‘circuit.’ Circuits can be described functionally (this circuit adds two quantities, this one takes the difference between them) as well as physiologically (two and-gates plus a not-gate) and anatomically (current flows here when open, there when closed).  This is congenial to certain minimalist ways of talking (e.g. here) wherein the prize goes to whoever finds the right basic circuits (and their wiring) that underlie the complex operations of the grammar (e.g. binding, phrase structure, movement).  At any rate, maybe ontology and granularity can happily join hands at the computational circuit.
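The multi-level circuit idea can be made concrete with a toy sketch (mine, not an example from Poeppel or P&E): the same little circuit described functionally (it adds two one-bit quantities) and at the gate level (a half-adder wired from primitive AND/OR/NOT gates).

```python
# A toy "circuit" described at two levels (my illustration, not an
# example from Poeppel or P&E).

# Functional description: this circuit adds two one-bit quantities.
def add_bits(a: int, b: int) -> tuple:
    """Return (sum, carry) for two bits."""
    total = a + b
    return (total % 2, total // 2)

# Gate-level description: the same function realized as a wiring of
# primitive gates (a half-adder: sum = XOR, carry = AND).
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def half_adder(a: int, b: int) -> tuple:
    # XOR built from AND/OR/NOT: (a OR b) AND NOT (a AND b)
    s = and_gate(or_gate(a, b), not_gate(and_gate(a, b)))
    c = and_gate(a, b)
    return (s, c)

# Both descriptions pick out the same input-output behavior:
assert all(add_bits(a, b) == half_adder(a, b) for a in (0, 1) for b in (0, 1))
```

Unifying the functional and the implementational descriptions of one and the same circuit is, on this analogy, the job of a mature neurolinguistics.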

The above just touches lightly on the content of these papers. Poeppel also elaborates on the useful distinction between the map vs. the mapping problem. The latter is closely related to the OIP issues discussed in P&E, with additional instructive bells and whistles. Let me end with a sample quote to whet your appetites for more (and to show that I stole the ‘circuit’ talk from David P) (p.35):

In a typical cognitive neuroscience study …participants will engage in some…task…while their brain activity is monitored. The analyses show that some area or areas are selectively modulated, and it is then argued that activation of a given area underpins, say, phonological processing, or lexical access or syntax.  …[I]t is fair to say that the canonical results –very much at the center of current research- are correlational…but we have no explanation, no sense of which properties of neuronal circuits that we understand account for the execution of function. How to proceed? …[W]e [should] decompose the cognitive tasks under investigation into computational primitives that can be related to local brain structure and function, in a sense instrumentalizing the computational theory of mind…more aggressively.

And let us all say: Amen!

Tuesday, January 29, 2013

Competence and Performance Redescribed

In discussion with Jeff Lidz one morning on the way to getting my cup of tea (this is my favorite time to waylay unsuspecting colleagues for linguist-in-the-street interviews) we fell into a discussion of how to explain the C(ompetence)/P(erformance) (C/P) distinction.  One thing I have often found useful is to find alternative descriptions of the same thing. Here’s a version I’ve found helpful. Let’s cast the C/P distinction in terms of Data Structures (DaSt) and the operations (or algorithms (algs)) that operate on them. What’s the difference between these notions and how might this terminology illuminate the C/P distinction?

Chomsky has standardly distinguished what we know (competence) from how we put what we know to use (performance).  He further proposes describing the former as a computational state: the initial state describing the capacity for a certain kind of competence and the steady state describing the actual competence attained.  Thus, UG describes the initial state of FL and the G attained describes the steady state of the language acquisition device (i.e. yours and your linguistic conspecifics’).  How are we to understand these states?  The discussion in Gallistel & King (G&K) of DaSts and the operations/algs that work on them provides a useful handle.

G&K (149) describe a DaSt as “a complex symbol” whose “constituents are themselves symbols” and whose “referent …is determined by the referents of its constituents and the syntactic…relation between them.”   Thus, “[c]omplex data structures encode the sorts of things that are asserted in what philosophers and logicians call propositions” (150) and what linguists call (sets of) phrase markers. Actually, the linguistic notion might be closer, as the operations manipulate DaSts exclusively in virtue of their formal syntactic properties. How these are interpreted plays no role in what kinds of things operations on DaSts can do.  This makes them effectively syntactic objects like phrase markers rather than semantic objects like propositions.

Using this terminology, a theory of competence aims to describe the linguistic DaSts that the mind/brain has. The initial state (UG) is characterized by certain kinds of DaSts, the final state (Gs) by particular instances of such DaSts. A particular G is a recursive specification of the permissible DaSts in a given I-language. UG specifies the kinds of DaSts (i.e. Gs) that FL allows by describing the range of permissible generative procedures (rules for recursively specifying Gs/I-language-specific DaSts). The theory of UG aims to outline what kinds of features linguistic DaSts can have, what kinds of syntactic relations are permissible, what kinds of constituents are possible, etc.  The theory of a particular G (e.g. GE, the grammar of English) will specify which of the DaSt options that UG provides GE realizes. Both these accounts specify what the system “knows,” i.e. the kinds of representations it allows, aka the kinds of DaSts it tolerates.  In other words, a theory of linguistic DaSts is a competence theory.
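The phrase "recursive specification of the permissible DaSts" can be given a minimal concrete sketch. The grammar fragment below is invented purely for illustration; it recursively enumerates the phrase markers it licenses.

```python
from itertools import product

# A toy G: a recursive specification of the permissible DaSts
# (phrase markers). The fragment is invented for illustration only.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["dogs"], ["cats"]],
    "VP": [["sleep"], ["chase", "NP"]],
}

def generate(symbol="S", depth=3):
    """Yield the bracketed structures the grammar licenses for symbol."""
    if symbol not in GRAMMAR:          # terminal symbol
        yield symbol
        return
    if depth == 0:                     # bound the recursion for the demo
        return
    for rhs in GRAMMAR[symbol]:
        # recursively expand each daughter and combine the expansions
        daughters = [list(generate(d, depth - 1)) for d in rhs]
        for combo in product(*daughters):
            yield [symbol, *combo]

# Each licensed structure is a DaSt, e.g. ['S', ['NP', 'dogs'], ['VP', 'sleep']];
# a recursive fragment with self-embedding would specify an infinite set of them.
```

The specification says what structures the system "knows"; nothing here says anything about how fast they are built or retrieved, which is the point of the division of labor discussed below.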

Algs are procedures that use these data structures in various ways. For example, an alg can use DaSts to figure out what someone said, what to say, what rhymes with what, or what follows from what; an alg could operate on the phonological DaSts to determine the set of rhyming pairs. Algorithms can be fast or slow, computationally intensive or not, executable in linear time, etc.  None of these predicates apply to DaSts. Different algs for different tasks can use the same DaSts.  Algs are ways of specifying the different things one can do with DaSts. Algs are parts of any performance theory.
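Here is a minimal sketch of that division (the "phonological" representation is a toy I made up for the purpose): one data structure, two different algorithms that consult it.

```python
# One DaSt: each word paired with a crude code for its final rime.
# (The representation is a toy, not a serious phonological proposal.)
LEXICON = {
    "cat":  "aet",
    "hat":  "aet",
    "dog":  "og",
    "log":  "og",
    "tree": "ee",
}

# Alg 1: use the DaSt to compute the set of rhyming pairs.
def rhyming_pairs(lex):
    words = sorted(lex)
    return [(w1, w2)
            for i, w1 in enumerate(words)
            for w2 in words[i + 1:]
            if lex[w1] == lex[w2]]

# Alg 2: use the very same DaSt for a different task,
# grouping the vocabulary by rime.
def group_by_rime(lex):
    groups = {}
    for word, rime in lex.items():
        groups.setdefault(rime, []).append(word)
    return groups

# rhyming_pairs(LEXICON) -> [('cat', 'hat'), ('dog', 'log')]
```

Fast vs. slow, linear time, memory load: these are predicates of the two functions, not of LEXICON. That asymmetry is the C/P distinction redescribed.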

G&K emphasize the close connection between DaSts and Algs. As they note:

There is an intimate and unbreakable relation between how information is arranged [viz. the DaSts, N]…and the computational routines that operate on the information [viz. algs, N]…(164).

Consequently, it is quite possible that the computational context can reveal a lot about the nature of the DaSts and vice versa.  I have suggested as much here, when I mooted the possibility that systems that have bounded content addressable memories might guard against the problems such systems are prone to by enforcing something like relativized minimality on the way that DaSts are organized.  In other words, the properties of the users of DaSts can tell us something about properties of the DaSts.

Another way of triangulating on properties of a DaSt D1 is by considering the properties of the other DaSts (D2, D3, etc.) that D1 regularly interacts with. A well-designed D1 should reflect how it interacts with D2 and D3. How well do they fit together? What would be an optimal fit? As the government learns weekly when it tries to integrate its various databases, some DaSts play together more nicely than others do.

So, the theory of competence can be viewed as the theory of the linguistic DaSts: what are their primitive features? How are they assembled? What kinds of relations do they encode? The theory of performance can be viewed as asking about the algs that operate on these DaSts and the general cognitive architectural environment (e.g. memory) within which these algs operate.  Both DaSts and algs contribute to linguistic performance. In fact, one of minimalism’s bets is that one can learn a lot about linguistic DaSts by carefully considering what a well designed linguistic DaSt might be like given its interactions with other DaSts and how it will be used. So, performance considerations can (at least in principle) significantly inform us about the properties of linguistic DaSts. We want to know both about the properties of linguistic representations and about the operations that use these representations for various ends. However, though these are closely related, the aim of theory is to disentangle the contributions of each. And that’s why the distinction is important.

Sunday, January 27, 2013

Joining the Fun; A Ramble on Parameters

There is a very interesting pair of posts and a long thread of insightful comments relating to parameters, concerning both the empirical support for them and their suitability given current theoretical commitments.  Cedric, commenting on Neil’s initial post and then adding a longer elaboration, makes the point that nobody seems committed to parameters in the classical sense anymore. Avery and Alex C comment that, whatever the empirical shortcomings of parametric accounts, something is always better than nothing, so they reasonably ask what we should replace them with. Alex D rightly points out that the success of parametric accounts is logically independent of the POS and claims about linguistic nativism. In this post, I want to reconstruct the history of how parameter theory arose so as to consider where we ought to go from here. The thoughts ramble on a bit, because I have been trying to figure this out for myself.  Apologies ahead of time.

In the beginning there was the evaluation metric (EM), and Chomsky looked on his work and saw that it was deficient.  The idea in Aspects was that there is a measure of grammatical complexity built into FL and that children in acquiring their I-language (an anachronism here) choose the simplest one compatible with the PLD (viz. the linguistic data available to and used by the child). EM effectively ordered grammars according to their complexity. The idea riffed on ideas of minimal description length around at the time, but with the important addition that the aim of a grammatical theory with aspirations of explanatory adequacy was to find the correct UG for specifying the meta-language relevant to determining the correct notion of “description” and “length” in minimal description length. The problem was finding the right things to count when evaluating grammars. At any rate, on this conception, the theory of acquisition involved finding the best overall G compatible with PLD as specified by EM.  Chomsky concluded that this conception, though logically coherent, was not feasible as a learning theory, largely because it looked to be computationally intractable.  Nobody had (nor, I believe, has) a good tractable idea of how to compare grammars overall so as to have a complete ordering. Chomsky in LSLT developed some pairwise metrics for the local comparison of alternative rules, but this is a long way from the total ordering of the alternative Gs required to make EM accounts feasible.  Chomsky’s remedy for this problem: divorce language acquisition from the evaluation of overall grammar formats.

The developments of the Extended Standard Theory, which culminated in GB theories, allowed for an alternative conception of acquisition, one that divorces it from measuring overall grammar complexity.  How so? Well, first, we eliminated the idea that Gs were compendia of construction-specific rules. And second, we proposed that UG consists of biologically provided schemata (hence part of UG and hence not in need of acquisition) that specify the overall shape of a particular G. On this view, acquisition consists in filling in values for the schematic variables.  Filling in values of UG-specified variables is a different task from figuring out the overall shape of the grammar and, on the surface at least, a far more tractable task. The number of parameters being finite already distinguished this from earlier conceptions. On the earlier EM view of things there was no reason to think that the space of grammatical possibilities was finite. Now, as Chomsky emphasized, within a parameter setting model the space of alternatives, though perhaps very large, was still finite, and hence the computational problem was different in kind from the one lightly limned in Aspects. So, divorcing the question of grammatical formats (via the elimination of rules or their reduction to a bare minimum form like ‘move alpha’) from the question of acquisition allowed for what looked like a feasible solution to Plato’s Problem. In place of Gs being sets of construction-specific rules with EM measuring their overall collective fitness, we had the idea that Gs were vectors of UG-specified variables with two possible values (and hence “at most” 2^n possible grammars, a finite number of options). Finding the values was divorced from evaluating sets of rules, and this looked feasible.
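The switch-box picture can be sketched in a few lines (the parameter names below are invented for illustration; nothing here is a real GB proposal):

```python
from itertools import product

# A toy UG: a finite vector of binary parameters (names invented).
PARAMETERS = ("head_initial", "pro_drop", "wh_movement")

# The space of possible Gs is finite: one vector per grammar, 2**n in all.
all_grammars = [dict(zip(PARAMETERS, values))
                for values in product((0, 1), repeat=len(PARAMETERS))]
assert len(all_grammars) == 2 ** len(PARAMETERS)   # 8 grammars here

# "Acquisition" is then just fixing each variable's value, e.g.:
g_english_like = {"head_initial": 1, "pro_drop": 0, "wh_movement": 1}
assert g_english_like in all_grammars
```

Contrast this with the Aspects picture, where the hypothesis space is an unbounded set of construction-specific rule systems that EM must somehow totally order.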

Note that this is largely a conceptual argument. There is a reasonable hunch but no “proof.” I mention this because other conceptual considerations (we will get to them) can serve to challenge the conclusion and make parameter theories less appealing.

In addition to these conceptual considerations, the comparative grammar research in the 70s, 80s, and 90s provided wow-inducing empirical confirmation of parameter based conceptions. It is hard for current (youngish) practitioners of the grammatical dark arts to appreciate how exciting early work on parameter setting models was. There were effectively three lines of empirical support.

1.     The comparative synchronic grammar research. For example:
a.     The S versus S’ parameter distinguishing Italian from English islands (Rizzi, Sportiche, Torrego).
b.     The pro-drop parameter (correlating null subjects, inversion, and long movement apparently violating the fixed subject/that-t condition (Rizzi, Brandi and Cordin)).
c.     The parametric discussions of anaphoric classes (local and long-distance anaphors (Wexler, Borer)).
These lines of work, to name just three, all uncovered a huge amount of new linguistic data and argued for the fecundity of parametric thinking.
2.     Crain’s “continuity thesis,” which provided evidence that kids’ “mistakes” in acquiring their particular Gs all conform to actual adult Gs. This provided evidence that the space of G options is pretty circumscribed, as a parameter theory implies it is.
3.     The work on diachronic change by Kroch, Lightfoot, Roberts (and more formal work by Berwick and Niyogi) a.o., which indicated that large shifts in grammatical structure over time (e.g. SOV to SVO) could be analyzed as a small number of simple parameter changes.

So, there was a good conceptual reason for moving to parameter models of UG and the move proved to be empirically very fecund. Why the current skepticism?  What’s changed?

To my mind, three changes occurred. As usual, I will start with the conceptual challenges and then proceed to the empirical ones.

The first one can be traced to work first by Dresher and Kaye, and then taken up and further developed with great gusto by Fodor (viz. Janet) and Sakas. This work shows that finite parameter setting can present tractability problems almost as difficult as the ones that Chomsky identified in his rejection of EM models.  What this work demonstrates is that, given currently envisioned parameters, parameter setting cannot be incremental. Why not? Because parameter values are not independent.  In other words, the value of one parameter in a particular G may depend crucially on that of another. Indeed, the value of any may depend on the value of each, and this makes for a combinatorial explosion. It also makes incremental acquisition mysterious: how do the parameter values get set if any bit of later PLD can completely overturn values previously set?

There have been ingenious solutions to this problem, my favorite being cue-based conceptions (developed by Dresher, Fodor, Lightfoot a.o.). These rely on the notion that there is some data in the PLD that unambiguously determines the value of a parameter. Once set on the basis of this data, the value need never change.  Triggers effectively impose independence on the parameter space. If this is correct, then it renders UG yet more linguistically specific; not only are the parameters very linguistically specific, but the apparatus required to fix them is very linguistically specific as well. Those who don’t like linguistically parochial UGs should really hate both parameter theories and this fix to them. Which brings us to the second conceptual shift: Minimalism.
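The cue-based idea can be sketched as follows (the cues and parameters are invented for illustration; this is not Dresher's or Fodor's actual machinery):

```python
# Each parameter has designated unambiguous evidence (a "trigger").
# Once that evidence sets a value, the value is never revised, which
# restores independence to the parameter space.
TRIGGERS = {                      # cue -> (parameter, value)
    "null_subject_clause": ("pro_drop", 1),
    "overt_expletive":     ("pro_drop", 0),
    "V_before_O":          ("head_initial", 1),
    "O_before_V":          ("head_initial", 0),
}

def set_parameters(pld_stream):
    settings = {}
    for cue in pld_stream:                     # one incremental pass over the PLD
        if cue in TRIGGERS:
            param, value = TRIGGERS[cue]
            settings.setdefault(param, value)  # set once, never overturned
    return settings

# set_parameters(["V_before_O", "null_subject_clause"])
# -> {'head_initial': 1, 'pro_drop': 1}
```

Note what the sketch makes vivid: the trigger table itself is highly language-specific machinery, which is exactly the cost noted above.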

The minimalist conceit is to eliminate the parochialism of FL and show that the linguistically specific structure of UG can be accounted for in more general cognitive/computational terms. This is motivated both on general methodological grounds (factoring out what is cognitively general from what is linguistically specific is good science) and as a first step to answering Darwin’s Problem, as we’ve discussed at length in other posts. FL-internal parameters are a very big challenge to this project. Why? Because UG-specified parameters encumber FL with very linguistically specific information (e.g. it’s hard to see how the pro-drop parameter (if correct) could possibly be stated in non-linguistically specific terms!).

This is what I meant earlier when I noted that conceptual reasons could challenge Chomsky’s earlier conceptual arguments.  Even if parameters made addressing Plato’s Problem more tractable, they may not be a very good solution to the feasibility problem if they severely compromise any approach to Darwin’s. This is what motivates Cedric’s concerns (and others’, e.g. Terje Lohndal’s), I believe, and rightly so.  So, the conceptual landscape has changed and it is not surprising that parameter theories have become less appealing and so open to challenge.

Moreover, as Cedric also stresses, the theoretical landscape has changed as well. A legacy of the GB era that has survived into Minimalism is the agreement that Gs do not consist of construction-based rules. Rather, there are very general operations (Merge) with very general constraints (e.g. Extension, Minimality) that allow for a small set of dependencies universally.  Much of this (not all, but much) can be reanalyzed in non-linguistically specific terms (or so I believe). With this factored out, there are featural idiosyncrasies located in demands made by specific lexical items, but this kind of idiosyncrasy may be tolerable as it is segregated to the lexicon, a well known repository of eccentrics.[1]  At any rate, it is easy to see what would motivate a reconsideration of UG-internal parameters.

The tractability problems related to parameter setting noted by Dresher-Fodor and company simply add to these motivations. 

That leaves us with the empirical arguments. These alone are what make parameter accounts worth endorsing, if they are well founded, and this is what is currently up for grabs and way beyond my pay grade. Cedric and Fritz Newmeyer (among others) have challenged the empirical validity of the key results. The most important discoveries amounted to the clumping of surface effects with the settings of single values, e.g. pro drop + subject inversion + no that-t effects patterning together as a unit. Find one, you find them all.  However, this is what has been challenged. Is it really true that the groupings of phenomena under single parameter settings are correct?  Do these patterns coagulate as proposed? If not, and this I believe is Newmeyer’s point, strongly emphasized by Cedric, then it is not clear what parameters buy us.  Yes, I-languages are different. So? Why think that this difference is due to different parameter settings? So, there is an empirical argument: are there data groupings of the kind earlier proposals advocated? Is the continuity thesis accurate, and if so how does one explain it without parameters? These are the two big empirical questions, and this is likely where the battle over parameters has been joined and, one hopes, will ultimately get resolved.

I’d like to emphasize that this is an empirical question.  If the data fall on the classical side then this is a problem for minimalists and exacerbates our task of addressing Darwin’s problem. So be it. Minimalism as I understand it has an empirical core, and if it turns out that there is richer structure to UG than I would like, well, tough cookies on me (and you, if your sympathies tend in the same direction)!

Last point, and I will end the rambling here. One nice feature of parameter models is the pretty metaphor they afforded for language acquisition as parameter setting. The switch-box model is intuitive and easy to grasp. There is no equivalent for EM models, and this is partly why nobody knew what to do with the damn thing.  EM never really got used to generate actual empirical research the way parameter setting models did, at least not in syntax. So can we envision a metaphor for non-parameter-setting models? I think we can. I offered one in A Theory of Syntax and I’d like to push it again here (I know that this is self-aggrandizing, but tooting one’s own horn can be so much fun).  Here’s what I said there (chapter 7):

Assume for a moment that the idea of specified parameters is abandoned. What then?  One attractive property of the GB story was the picture that it came with.  The LAD was analogized to a machine with open switches.  Learning amounts to flipping the switches ‘on’ or ‘off’.  A specific grammar is then just a vector of these switches in one of the two positions.  Given this view there are at most 2^P grammars (P = number of parameters).  There is, in short, a finite amount of possible variation among grammars.
            We can replace this picture of acquisition with another one.  Say that FL provides the basic operations and conditions on their application (e.g. like minimality).  The acquisition process can now be seen as a curve fitting exercise using these given operations.  There is no upper bound on the ways that languages might differ though there are still some things that grammars cannot do.  A possible analogy for this conception of grammar is the variety of geometrical figures that can be drawn using a straight edge and compass.  There is no upper bound on the number of possible different figures.  However, there are many figures that cannot be drawn (e.g. there will be no triangles with 20 degree angles).  Similarly, languages may contain arbitrarily many different kinds of rules depending on the PLD they are trying to fit.

So think of the basic operations and conditions as the analogues of the straight edge and compass and think of language acquisition as fitting the data using these tools. Add to this a few general rules for figure fitting: add a functional category if required, pronounce a bottom copy of a chain rather than a top copy, add an escape hatch to a phase head. These are general procedures that can allow the LAD to escape the strictures of the limited operations a minimalistically stripped down FL makes available.  The analogy is not perfect. But the picture might be helpful in challenging the intuitive availability of the switch box metaphor.

That’s it. This post has also been way too long. Kudos to Neil and Cedric and the various very articulate commenters for making this such a fruitful topic for thought, at least for me. 

[1] Though I won’t discuss this now, it seems to me that the Cartographic Project and its discovery of what amounts to a universal base for all Gs is not so easily dismissed. The best hope is to see these substantive universals explicated in semantic terms, not something I am currently optimistic will soon appear.

Reply to Alex (on parameters)

Commenting on my post, Alex asked: "what is the alternative? Even if there is no alternative theory that you like, what is the alternative 'research paradigm'? What do *you* think researchers, who think like us that the central problem is language acquisition, should work on? What is the right direction, in your opinion?"
I had to start a new post, because I could not find a way to insert images in the 'reply to comment' option. You'll see I needed 2 images.

So, what's the alternative? I don't have a grand theory to offer, but here are a few things that have helped me in my research on this topic. I hope this can be of help to others as well.

I think the first thing to do is to get rid of the (tacit) belief that the problem is easy ('essentially solved', I'm told), and to take seriously the possibility that there won't be an adequate single-level theory for Plato's problem. Here's what I mean: In "What Darwin got wrong", Fodor and Piattelli-Palmarini rightly point out (end of the book) that unlike single-level theories in physics, single-level theories in biology don't work very well. Biology is just too messy. Theories that assume it's all selection (or it's all parameter fixation) are just not the right kind of theory. We've got to be pluralistic and open-minded. Biologists made progress when they realized that mapping the genotype to the phenotype was not as easy as the modern synthesis had it. Bean bag genetics is no good. Bean bag linguistics is not good either.

I once attended a talk by Lila Gleitman (who, like Chomsky, is almost always right) where she said something that generative grammarians (and, of course, all the others, to the extent they care about Plato's problem) ought to remember at all times: learning language is easy (it's easy because you don't learn it, it's 'innate'), but learning a language is really hard and you (the child) throw everything you can at that problem. I agree with Lila: you throw everything you can. So for Plato's problem, you resort to all the mechanisms you have available. (The prize will be given to those who figure out the right proportions.)
For us, generativists, this means, learning from the other guys: I personally have learned a lot from Tenenbaum and colleagues on hierarchical Bayesian networks and from Jacob Feldman's work on human concept learning. I think the work of Simon Kirby and colleagues is also very useful. Culbertson's thesis from Hopkins is also a must-read.
All of these guys provide interesting biases that could add structure to the minimal UG some of us entertain.
Add to that the sort of pattern detection mechanisms explored by Jacques Mehler, Gervain, Endress, and others to help us understand what the child uses as cues.
None of this is specifically linguistic, but we just have to learn to lose our fear about this. If UG is minimal, we've got to find structure somewhere else. Specificity, modularity, ... they'll have to be rethought.

The second thing to do is to try to figure out the right kind of grammatical priors to get these biases to work in the right way. Figure out the points of underspecification in (minimal) UG: what are the things about which UG does not say anything? (For example, syntax does not impose linear order; something else does.) Since a growing number of people bet on variation being an 'externalization' issue (no parameters on the semantic side of the grammar), it would be great to have a fully worked out theory of the morphophonological component in the sense of Distributed Morphology (what are the operations there, what's the structure of that component of the grammar?).
Halle and Bromberger said syntax and phonology are different (Idsardi and Heinz have nice work on this too). It would be nice to be clear about where the differences lie. (Bromberger and Halle put their fingers on rules (yes for phonology, no for syntax). I think they were right about that difference. Curiously enough, when Newmeyer talks about rules, those who defend parameters go crazy, saying rules are no good to capture variation, but no one went crazy at Halle when he talked about phonological rules, and boy does phonology exhibit [constrained] variation ...)

The third thing is to take lessons from biology seriously. Drop the idealization that language acquisition is 'instantaneous' and, just as biologists recognized the limits of geno-centrism (in many ways, the same limits we find with parameters), take development seriously ("evo-devo"). There is good work by a few linguists in this area (see the work by Guillermo Lorenzo and Victor Longa), but it's pretty marginalized in the field. We should also pay a lot of attention to simulations of the sort Baronchelli, Chater et al. did (2012, PLOS) (btw, the latter bears on Neil Smith's suggestions on the blog).

The fourth thing (and this is why I could not use the 'reply to comment' option) is to develop better data structures. Not better data (I think we always have too many data points), but better data *structures*. Here's what I mean. Too many of us continue to believe that points of variation (parameters, if you want) will relate to one another along the lines of Baker's hierarchy. Nice binary branching trees, no crossing lines, no multi-dominance, like this (sorry for the resolution) [Taken from M. Baker's work]

Such representations are plausible with toy parameters. E.g., "pro-drop": Does your language allow pro? No, then your language is English. Yes, then next question: does it allow pro only in subject position? No, then your language is Chinese. Yes, then your language is Italian.
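The toy 'pro-drop' story above is, in effect, a two-question decision tree. Here is a minimal sketch of it (purely illustrative; the function name and the "X-type" labels are my own shorthand, not anyone's actual proposal):

```python
# Purely illustrative: the toy two-parameter 'pro-drop' decision tree
# described above. Labels like "English-type" are shorthand, not typology.

def classify(allows_pro, subject_only=False):
    """Walk the toy hierarchy: two yes/no questions, one language type each."""
    if not allows_pro:
        return "English-type"      # no null subjects at all
    if subject_only:
        return "Italian-type"      # pro restricted to subject position
    return "Chinese-type"          # pro allowed more freely

print(classify(False))                     # English-type
print(classify(True, subject_only=True))   # Italian-type
print(classify(True))                      # Chinese-type
```

Notice that the sketch is a strict tree: each question has exactly one parent, and answering it sends you down exactly one branch.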
We all know this is too simplistic, but this is ALWAYS the illustration people use. It's fine to do so (like Baker did) in popular books, but it's far from what we all know is the case.
But if it's not as simple, how complex is it?
As far as I know, only one guy bothered to do the work. For many years, my friend Pino Longobardi has worked on variation in the nominal domain. He's come up with a list of some 50+ parameters. Not like my toy 'pro-drop' parameter. Are there more than 50? You bet, but this is better than 2 or 3. More realistic. Well, look what he found: when he examined how parameters relate to one another (setting P1 influences setting P2, etc.), what you get is nothing like Baker's hierarchy, but something far more complex (the subway map in my previous post) [Taken from G. Longobardi's work]

As they say, a picture is worth a thousand words.
But the problem is that we only have this detailed structure for one domain of the grammar (my student Evelina Leivada is working hard on other domains, as I write, so stay tuned). Although we have learned an awful lot about variation in the GB days, when it comes to talking about parameter connectivity, we somehow refuse to exploit that knowledge (and organize it like Longobardi did), and we go back to the toy examples of LGB (pro-drop, wh-in-situ). This is an idealization we have to drop, I think, because when we get our hands dirty, as Longobardi did, and as Leivada is doing, we get data structures that don't resemble what we may have been led to expect from P&P. This dramatically affects the nature of the learning problem.
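The tree-vs-map contrast can be made concrete. In a Baker-style hierarchy every parameter hangs under exactly one parent; in a Longobardi-style map a parameter's setting can be constrained by several others at once, so the natural data structure is a directed graph, not a binary tree. A minimal sketch, with invented parameter names (nothing here reflects Longobardi's actual parameters):

```python
# Illustrative only: parameter connectivity as a directed graph.
# All parameter names (P1..P4) are invented for this sketch.
from collections import defaultdict

deps = defaultdict(set)  # deps[p] = parameters whose setting p constrains

def add_dependency(source, target):
    """Record that setting `source` constrains the setting of `target`."""
    deps[source].add(target)

# P3 is constrained by three different parameters -- multiple parents,
# which a strict binary-branching hierarchy cannot represent.
add_dependency("P1", "P2")
add_dependency("P1", "P3")
add_dependency("P2", "P3")
add_dependency("P4", "P3")

parents_of_p3 = sorted(p for p, targets in deps.items() if "P3" in targets)
print(parents_of_p3)  # ['P1', 'P2', 'P4']
```

Once the dependencies form a general graph rather than a tree, setting one parameter no longer prunes a single branch of the search space, which is one concrete sense in which the shape of the data structure changes the learning problem.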

The fifth thing to do (this is related to the point just made) is to stop doing 'bad' typology. The big categories (Sapir's "genius of a language") like 'analytic, synthetic, etc.' are not the right things to anticipate: there are no ergative languages, analytic languages, or whatever. So let's stop pretending there are parameters corresponding to these. (I once heard a talk about a "[high analyticity] parameter" ... If you say 'no' to that parameter, do you speak a [less analytic] language? Is this a yes/no or a more-or-less issue?) These categories don't have the right granularity, as my friend David Poeppel would say.

Most importantly, we should be clear about whether we want to do linguistics or languistics. Do we care about Plato's problem, or Greenberg's problem? These are not the same thing. IMHO, one of the great features of minimalism, compared to GB, is that it forces you to choose between the language faculty and languages. Lots of people still care about getting the grammar of English right (sometimes they even say I-English), but how about getting UG right? It's time we worried about the biological 'implementation' of (I-)language, as Paul (Pietroski) would say.

To conclude, step 0 of the alternative boils down to recognizing we have been wrong (that's the best thing we can aspire to, Popper would say, so why not admit it?).
Alex, I hope this answers your question.