Alex C (in the comment section here (Feb. 1)) makes a point that I’ve encountered before and that I would like to comment on. He notes that Chomsky has stopped worrying about Plato’s Problem (PP) (as has much of “theoretical” linguistics, as I noted in the previous post) and suggests (maybe this is too much to attribute to him; if so, sorry Alex) that this is due to Darwin’s Problem (DP) occupying center stage at present. I don’t want to argue with this factual claim, for I believe that there’s lots of truth to it (though IMO, as readers of the last several posts have no doubt gathered, theory of any kind is largely absent from current research). What I want to observe is (1) that there is a tension between PP and DP and (2) that resolving it opens up an important space for theoretical speculation. IMO, one of the more interesting facets of current theoretical work is that it proposes a way of resolving this tension in an empirically interesting way. This is what I want to talk about.
First the tension: PP is the observation that the PLD the child uses in developing its G is impoverished in various ways when compared to the properties of the Gs that children attain. PP, then, is another name for the Poverty of Stimulus Problem (POS). Generative Grammarians have proposed to “solve” this problem by packing FL with principles of UG, many of which are very language specific (LS), at least if GB is taken as a guide to the content of FL. By LS, I mean that the principles advert to very linguisticky objects (e.g. Subjects, tensed clauses, governors, case assigners, barriers, islands, c-command, etc.) and very linguisticky operations (agreement, movement, binding, case assignment, etc.). The idea has been that making UG rich enough and endowing it with LS innate structure will allow our theories of FL to attain explanatory adequacy, i.e. to explain how, say, Gs come to obey islands despite the absence from the PLD of the good and bad data relevant to fixing them.
By now, all of this is pretty standard stuff (which is not to say that everyone buys into the scheme (Alex?)), and, for the most part, I am a big fan of POS arguments of this kind and their attendant conclusions. However, even given this, the theoretical problem that PP poses has hardly been solved. What we do have (again assuming that the POS arguments are well founded (which I do believe)) is a list of (plausibly) invariant(ish) properties of Gs and an explanation for why these can emerge in Gs in the absence from the PLD of the data required to fix them. Thus, why do movement rules in a given G resist extraction from islands? Because something like the Subjacency/Barriers theory is part of every Language Acquisition Device’s (LAD) FL, that’s why.
However, even given this, what we still don’t have is an adequate account of how the variant properties of Gs emerge when planted in a particular PLD environment. Why is there V to T in French but not in English? Why do we have inverse control in Tsez but not in Polish? Why wh-in-situ in Chinese but multiple wh to C in Bulgarian? The answer GB provided (and, so far as I can tell, still the answer) is that FL contains parameters that can be set in different ways on the basis of PLD, and the various Gs we have are the result of differential parameter setting. This is the story, but we have known for quite a while that this is less a solution to the question of how Gs emerge in all their variety than it is an explanation schema for a solution. P&P models, in other words, are not so much well worked out theories as they are part of a general recipe for a theory that, were we able to cook it, would produce just the kind of FL that could provide a satisfying answer to the question of how Gs can vary so much. Moreover, as many have observed (Dresher and Janet Fodor are two notable examples, see below), there are serious problems with successfully fleshing out a P&P model.
Here are two: (i) the hope that many variant properties of Gs would hinge on fixing a small number of parameters seems increasingly empirically uncertain. Cedric Boeckx and Fritz Newmeyer have been arguing this for a while, and while their claims are debated (and by very intelligent people, so, at least for a non-expert like me, the dust is still too unsettled to reach firm conclusions), it seems pretty clear that the empirical merits of earlier proposed parameterizations are less obvious than we took them to be. Indeed, there appears to be some skepticism about whether there are any macro-parameters (in Baker’s sense), and many of the micro-parametric proposals seem to end up restating what we observe in the data: that languages can differ. What made early macro-parameter theories interesting is the idea that differences among Gs come in largish clumps. The relation between a given parameter setting and the attested surface differences was understood as one to many. If, however, it turns out that every parameter correlates with just a single difference, then the value of a parametric approach becomes quite unclear, at least so far as acquisition considerations are concerned. Why? Because it implies that surface differences are just due to differing PLD, not to the different options inherent in the structure of FL. In other words, if we end up with one parameter per surface difference, then variation among Gs will not be as much of a window into the structure of FL as we thought it could be.
Here’s another problem: (ii) the likely parameters are not independent. Dresher (and friends) has demonstrated this for stress systems and Fodor (and friends) has provided analogous results for syntax. The problem with a theory where parameters are not independent is that it makes it very hard to see how acquisition could be incremental. If it turns out that the value of any parameter is conditional on the value of every other parameter (or very many others), then it would seem that we are stuck with a model in which all parameters must be set at once (i.e. instantaneous learning). This is not good! To evade this problem, we need some way of imposing independence on the parameters so that they can be set piecemeal without fear of having to re-set them later on. Both Dresher and Fodor have proposed ways of solving this independence problem (both elaborate a richer learning theory for parameter values to accommodate this problem). But, I think that it is fair to say that we are still a long way from a working solution. Moreover, the solutions provided all involve greatly enriching FL in a very LS way. This is where PP runs into DP. So let’s return to the aforementioned tension between PP and DP (a toy sketch of the independence problem follows just below, for those who like these things made concrete).
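To see the independence problem in miniature, here is a minimal toy sketch in Python. Everything in it is made up for illustration: the two “parameters” and the mapping to word orders are hypothetical, and this is emphatically not a rendering of Dresher’s or Fodor’s actual systems. The point is just that when parameters interact, a single datum can be compatible with settings that disagree with each other, so neither parameter can be fixed in isolation.

```python
from itertools import product

# Two hypothetical binary parameters whose surface effects interact:
#   A = "the verb raises", B = "the object shifts".
# The learner only ever observes surface word order.
def surface_order(A: bool, B: bool) -> str:
    """Toy deterministic mapping from parameter settings to a word order."""
    if A and not B:
        return "S-V-O-Adv"
    if not A and B:
        return "S-V-O-Adv"   # a different grammar, same surface string
    if A and B:
        return "S-V-Adv-O"
    return "S-O-V-Adv"

# A single datum the child hears:
datum = "S-V-O-Adv"

# Which parameter settings are compatible with it?
compatible = [(A, B) for A, B in product([True, False], repeat=2)
              if surface_order(A, B) == datum]
print(compatible)  # [(True, False), (False, True)]
```

The datum is compatible with two settings that disagree on both parameters, so a greedy learner that fixes A on the strength of this datum may have to undo that choice later. That is the non-independence problem in miniature: either the parameters get set jointly (instantaneous learning), or the learner needs extra machinery (cues, ordered defaults, and the like) of the general sort Dresher and Fodor propose.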
One way to solve PP is to enrich FL. The problem is that the richer and more linguistically parochial FL is, the harder it becomes to understand how it might have evolved. In other words, our standard GB tack in solving PP (LS enrichment of FL) appears to make answering DP harder. Note I say ‘appears.’ There are really two problems, and they are not equally acute. Let me explain.
As noted above, we have two things that a rich FL has been used to explain: (a) the invariances characteristic of all Gs and (b) the attested variation among Gs. In a P&P model, the first ‘P’ handles (a) and the second handles (b). I believe that we have seen glimmers of how to resolve the tension between PP’s demands on FL and DP’s as regards the principles part of P&P. Where things have become far more obscure (and even this might be too kind) is with the second, parametric P. Here’s what I mean.
As I’ve argued in the past, one important minimalist project has been to do for the principles of GB what Chomsky did for islands and movement via the theory of subjacency in On Wh Movement (OWM). What Chomsky did in this paper was theoretically unify the disparate island effects by proposing that all non-local (A’) dependency constructions have a common movement core (viz. move WH) subject to locality restrictions characterized by Bounding Theory (BT). This was a terrifically inventive theory, and aside from rationalizing/unifying Ross’s very disparate Island Effects, the combination of Move WH + BT predicted that all long movement would have to be successive cyclic (and even predicted a few more islands, e.g. subject islands and Wh-islands).
But to get back to PP and DP, one way of regarding MP work over the last 20 years is as an attempt to do for the GB modules what Chomsky did for Ross’s Islands. I’ve suggested this many times before, but what I want to emphasize here is that this MP project is perfectly in harmony with the PP observation that we want to explain many of the invariances witnessed across Gs in terms of an innately structured FL. Here there is no real tension, if this kind of unification can be realized. Why not? Because if successful we retain the GB generalizations. Just as Move WH + BT retained Ross’s generalizations, a successful unification within MP will retain GB’s (more or less), and so we can continue to tell the very same story as before about why Gs display the attested invariances. Thus, wrt this POS problem, there is a way to harmonize DP concerns with PP concerns. Of course, this does not mean that we will successfully manage to unify the GB modules in a Move WH + BT way, but we understand what a successful solution would look like and, IMO, we have every reason to be hopeful, though this is not the place to defend this view.
So, the principles part of P&P is, we might say, DP compatible (little joke here for the cognoscenti). The problem lies with the second P. FL on GB was understood not only to provide the principles of invariance but also to specify all the possible ways that Gs could differ. The parameters in GB were part of FL! And it is hard to see how to square this with DP given the terrific linguistic specificity of these parameters. The MP conceit has been to try to understand what Gs do in terms of one (perhaps) linguistically specific operation (Merge) interacting with many general cognitive/computational operations/principles. In other words, the aim has been to reduce the parochialism of the GB version of FL. The problem with the GB conception of parameters is that it is hard to see how to recast them in similarly general terms. All the parameters exploit notions that seem very, very linguo-centric. This is especially true of micro-parameters, but it is even true of macro ones. So, theoretically, parameters present a real problem for DP, and this is why the problems alluded to earlier have been taken by some (e.g. me) to suggest that maybe FL has little to say about G-variation. Moreover, it might explain why, with DP becoming prominent, some of the interest in PP has seemed to wane. It is due to a dawning realization that maybe the structure of FL (our theory of UG) has little to say directly about grammatical variation and typology. Taken together, PP and DP can usefully constrain our theories of FL, but mainly by licensing certain inferences about what kinds of invariances we will likely discover (indeed have discovered). However, when it comes to understanding variation, if parameters cannot be bleached of their LSity (and right now this looks to me like a very rough road), they will never be made to fit with the leading ideas of MP, which are in turn driven by DP.
So, Alex C was onto something important IMO. Linguists tend to believe that understanding variation is key to understanding FL. This is taken as virtually an article of faith. However, I am no longer so sure that this is a well founded presumption. DP provides us with some reasons to doubt that the range of variation reflects intrinsic properties of FL. If that is correct, then variation per se may be of little interest for those interested in limning the basic architecture of FL. Studying various Gs will, of course, remain a useful tool for getting the details of the invariant principles and operations right. But, unlike earlier GB P&P models, there is at least an argument to be made (and one that I personally find compelling) that the range of G-variation has nothing whatsoever to do with the structure of FL and so will shed no light on two of the fundamental questions in Generative Grammar: what’s the structure of FL and why?
Though Baker, a really smart guy, thinks that there are, so please don’t take me as endorsing the view that there aren’t any. I just don’t know. This is just my impression from linguist-in-the-street interviews.
The confirmation of this prediction was one of the great successes of generative grammar, and the papers by, e.g., Kayne and Pollock, McCloskey, Chung, Torrego, and many others are still worth reading and re-reading. It is worth noting that the Move WH + BT story was largely driven by theoretical considerations, as Chomsky makes clear in OWM. The gratifying part is that the theory proved to be so empirically fecund.
Note the ‘perhaps.’ If even Merge is, in the current parlance, “third factor,” then there is nothing taken to be linguistically special about FL.
Note that this leaves quite a bit of room for “learning” theory. For if the range of variation is not built into FL, then why we see the variation we do must be due to how we acquire Gs given FL/UG. The latter will still be important (indeed critical) in that any learning theory will have to incorporate the isolated invariances. However, a large part of the range of variation will fall outside the purview of FL. I discuss this somewhat in the last chapter of A Theory of Syntax for any of you with a prurient interest in such matters. See, in particular, the suggestion that we drop the switch analogy in favor of a more geometrical one.