Friday, May 25, 2018

Three quintessentially minimalist projects

I was in Barcelona last week giving some lectures on the triumphant march forward of the Minimalist Program (MP). As readers may know, I believe that MP has been a success in its own terms in that it has gone a fair way towards answering the questions it first posed for itself, viz. Why do we have the FL we actually have and not some other? Others are more skeptical, but I believe that this is mainly because critics demand that MP address questions not central to its research mission. Of course, answering the MP question might leave many others untouched, but that hardly seems like a reason to disparage MP so much as a reason for pursuing other programs simultaneously. At any rate, this was what the lectures were about and I thank the gracious audience at the Autonomous University of Barcelona for letting me outline these views for their delectation. 

In getting these lectures into shape I started thinking about a question prompted by recent comments from Peter Svenonius (thx, see here). Peter thinks, if I understand him correctly, that the MP obsession (ok, my obsession) with Darwin’s Problem (DP) really adds nothing to the MP enterprise. Things could proceed more or less as we see them without this biological segue. This got me thinking about the following question: Which MP projects are largely motivated by DP concerns? I can think of three. They may well be motivated on other grounds as well. But they seem to me a direct consequence of taking the DP perspective on the emergence of FL seriously. This post is a first stab at enumerating these and explaining why I think they are intimately tied to DP (in much the same way that the P&P project was intimately tied to Plato’s Problem (PP)). So what are the three lines of inquiry? A warning: The order of discussion does not imply anything about their relative salience or importance to MP. I should also note that many of the points I make below I have made elsewhere before. So do not expect to be enlightened. This is probably more for my benefit than for yours.

First, Unification of the Modules (UoM). MP is based on the success of the P&P program, in particular the perceived success of GBish conceptions of FL/UG. Another way of saying this is that if you don’t think that GB managed to limn a fairly decent picture of the fine structure (i.e. the universals) of FL/UG then the MP project will seem to you to be, at best, premature and, at worst, hubristic. 

I believe that a lot of the problems that linguists have with MP have less to do with its failure to make progress in answering the MP question above than with the belief that the whole project presupposes accepting as roughly right a deeply flawed conception of FL/UG (viz. roughly the GB conception). So, for example, if you don’t like classical case theory (and many syntacticians today do not) then you won’t like a project that takes it to be more or less accurate and tries to derive its properties from deeper principles. If you don’t think that classical binding theory is more or less correct then you won’t like a project that tries to reduce it to something else. The problem for many is that what MP presupposes (namely that GB was roughly correct, but not fundamental) is precisely what they believe ought to be very much up for grabs. 

I personally have a lot of sympathy for this attitude. However, I also think that it misses a useful feature of the presupposition. MP motivated unification/reduction doesn’t require that the GB description be correct so much as that it be plausible enough (viz. that it be of roughly the right order of complexity) to make deriving its properties a useful exercise (i.e. an exercise that if successful will provide a useful model for future projects armed with a more accurate conception of FL/UG). To put this another way: the GB conception of FL/UG has identified design features of FL/UG (aka, universals) that are theoretically plausible and empirically justifiable and so it is worth asking why principles such as these should govern the workings of our linguistic capacities. In Poeppel’s immortal phrase, these GB principles are the right grain size for analysis, have non-trivial empirical backing and so the exercise of showing why they should be part of FL would be doing something useful even should they fail to reflect the exact structure of FL.[1]

So, a central project animated by DP is unification of the disparate GB modules. And this is a very non-trivial project. As many of you know, GB attributes a rather high degree of internal modularity to FL. There are diverse principles regulating binding vs control vs movement vs selection/subcategorization vs theta role assignment vs case assignment vs phrase structure. From the perspective of Plato’s Problem, the diversity of the modules does not much matter as their operations and principles are presumed to be innate (and hence not learned). In fact, the main impetus behind P&P architectures was to isolate the plausibly invariant features of Gs, and explain them by attributing them to the internal workings of FL thereby constraining the Gs FL produces to invariably respect these features. Thus the reason that Gs are always structure dependent is that FL has the property of only being able to construct Gs that are structure dependent. The reason that movement and binding require c-command is that FL imposes c-command as a condition on these diverse modular operations. The aim of GB was to identify and factor out the invariant properties of specific Gs and treat them as fixed features of FL/UG so that they did not have to be acquired on the basis of PLD (and a good thing too as there is not sufficient data in the PLD to fix them (aka PoS considerations apply to these)). The problem of G acquisition could then focus on the variable parts (where Gs differ) using the invariant parts as Archimedean fixed points for leveraging the PLD into specific Gs. That was the picture. And for this P&P project to succeed, it did not much matter how “complex” FL/UG was so long as it was innate. 

All of this changes, and changes dramatically, once one asks how this system could have arisen. Then the internal complexity matters, and matters a lot. Indeed, once one asks this question there is a great premium on simple FL architectures, with fewer modules and fewer disparate principles, for the simpler the structure of FL, the easier it is to imagine how it might have arisen from the cognitive architecture of predecessors that did not have one. 

If this is correct, then one central MP project is to show that the diversity of the GB modules is only apparent and that they are only different reflections of the same underlying operations and principles. In other words, the project of unifying the modules is central to MP and it is central because of DP. A solution to DP requires that what appears to be a very complex FL system (i.e. what GB depicts) is actually quite simple and what appear to be very different modules with different operations and regulative principles are really all reflections of the same underlying generative procedures. Why? Because short of this it will be impossible to explain how the system that GB describes could have arisen from a mind without it. 

This is entirely analogous, in its logic, to Plato’s Problem. How can kids acquire the Gs they do with the properties they have despite a poverty of the linguistic stimulus? Because much of what they know they do not have to learn. How could humans have evolved an FL from non-FL cognitive minds? Because FL minds are only a very small very simple step away from the minds that they emerged from and this requires that the modular complexity GB attributes to FL is only apparent. It’s what you get when you add to the contents of non-linguistic minds the small simple addition MP hypothesizes bridged the ling/non-ling gap.

Are there other plausible motives for such a project, the project of unifying the modules? Well perhaps. One might argue that an FL with unified modules is in some methodological sense better than one with non-unified ones. Something like a principle that says fewer modules are better than more. Again, I think that this is probably correct, but let’s face it, this kind of methodological Ockhamist accounting is very weak (or at least perceived to be so). When push comes to shove data coverage (almost?) always trumps such niceties (remember the ceteris paribus clause that always accompanies such dicta). So it is worth having a big empirical fact of interest driving the agenda as well. And there are few facts bigger and heftier than the fact that FL arose from non-FL capable minds. It is easier to explain how this could have happened if FL capable minds are only mildly different from non-FL capable minds, and this means that the complex modularity that GB attributes to FL capable minds is almost certainly incorrect. That’s the line of argument. It rests on DPish assumptions and, to my mind, provides a powerful empirical motivation for module unification, which is what makes unification a central MP project.

It suggests a second related project: not only must the modules be unified, but the unification should make use of the fewest possible linguistically proprietary operations and principles. In other words, linguistically capable minds, ones with FLs, should be as minimally linguistically special as possible. Why? Because evolution proceeds most smoothly when there is minimal qualitative difference between the evolved states. If the aim is to explain how language ready minds appeared from non language ready minds then the fewer the differences between the two, the easier it will be to account for the emergence of the former from the latter. If one assumes that what makes an FL mind language ready are linguistically special operations and principles then the fewer of these the better. In fact, in the best case there will be exactly a single relatively simple difference between the two, language ready minds just being non-language ready ones plus (at most) one linguistically special simple addition (the desideratum that it be simple motivated by the assumption that simple additions are more likely to become evolutionarily available than complex ones).[2]

So let’s assess: there are two closely related MP projects: unify the GB modules and unify them using largely non-linguistically proprietary operations and principles. How far has this project gotten? Well, IMO, quite far. Others are sure to disagree. But the projects, though somewhat open textured, have proven to be manageable and the first, in particular, has generated useful hypotheses (e.g. the Merge Hypothesis and extensions thereof, like the Movement Theory of Control and Construal), which even if wrong have the right flavor (I know, I know, this is self serving!). Indeed, IMO, trying to specify exactly where and how these theories go wrong (if they do, color me skeptical but I have dogs in these fights) and why they go wrong as they do, is a reasonable extension of the basic MP projects. It is a tribute to how little MP concerns drive contemporary syntax that such questions are, IMO, rarely broached. Let me rant a bit.

Darwin’s Problem (DP) enjoys as little interest among linguists today as Plato’s Problem (PP) does (and did, in earlier times). Indeed, from where I sit, even PP barely animates linguistic investigations. So, for example, people who study variation rarely ask how it might be fixed (though there are notable exceptions). Similarly, people who propose novel principles and operations rarely ask whether and how they might be integrated/unified with the rest of the features of FL. Indeed, most syntacticians take the basic apparatus as given and rarely critically examine it (e.g. how many people worry about the deep overlap between Agree and I-merge?). These are just not standard research concerns. IMO, sadly, most linguists couldn’t care less about the cognitive aspects of GG, let alone its possible bio-linguistic features. The object of study is language, not FL, and the technical apparatus is considered interesting to the degree that it provides a potentially powerful philological tool kit. 

Ok, so MP motivates two projects. There is one more, and it concerns variation. GB took variation to be bounded. It did this by conceiving UG as providing a finite set of parameter values and conceived of language acquisition as fixing those parameters. So, even if the space of possible Gs is very large, for GB, it is finite. Now, given the linguistic specificity of the parameters, and given that GB treats them as internal to FL, the idea that variation is a matter of parameter setting proves to be a deep MP challenge. Indeed, I would go so far as to say that if MP is on the right track, then FL does not contain a finite list of possible binary parameters and G acquisition cannot be a matter of parameter setting. It must be something else, something that is not specific to G acquisition. And this idea has caught on, big time. Let me explain.

I have many times mentioned the work by Berwick, Lidz and Yang on G acquisition. Each contains what is effectively a learning theory that constructs Gs from PLD using FL principles. It appears that this general idea is quite widely accepted now, with former parameter setting types (e.g. David Lightfoot) now arguing that “UG is open” and that there is “no evaluation of I-languages and no binary parameters” (1).[3] This view is much more congenial to MP as it removes the very specific parametric options from FL and treats variation as entirely a “learning” problem. G learning is no different than other kinds, it is just aimed at Gs.[4]

Of course, making this work will require specifying what kids come to the learning problem with, what kinds of data they exploit, and what the details of the G learning theory are. And this is hard. It requires more than pointing to differences in the PLD and attributing differences in Gs to these differences. Doing only that is a long way from an actual learning theory which specifies how PLD and properties of FL combine to give you a G. Not the least important fact is that there are many ways to generalize from PLD to Gs and kids only exploit some of these.[5] That said, if there is an MP “theory” of variation it will consist of adumbrating the innate assumptions the LAD uses to fix a particular G on the basis of PLD. To date, we have some interesting proposals (in particular from Lidz and Yang and their colleagues in syntax) but no overarching theory.

Interestingly, if this project can be made to fly, then it will also be the front end of an MP theory of variation. To date, the main focus of research has been on unifying and simplifying FL and trying to determine how much of FL is linguistically proprietary. However, there is no reason that the considerable current typological work on G variation shouldn’t feed into developing theories of learning aimed at explaining why we find the variation we do. It is just that this project is going to be very hard to execute well, as it will demand that linguists develop skills that are not currently part of standard PhD training, at least not in syntax (e.g. courses in stats, machine learning, and computation). But isn’t this as it should be? 

So, does taking MP seriously make a difference? Yes! It spawns three concrete projects, all animated by the MP problematic. These projects make sense in the context of trying to specify the internal structure of an FL that could have evolved from earlier minds. So the programmatic aspects of MP are quite fecund, which is all that we can ask of a program.

And results? Well, here too I believe that we have made substantial progress as regards the first project, some as regards the third (though it is very very hard) and a little concerning the second. IMO, this is not bad for 25 years and suggests that the DPish way of framing the MP issues has more than paid for itself.

[1]It’s worth adding that this sort of exercise is quite common in the real sciences. Ideal gases are not actual gases, planets are not point masses, and our universe may not be the only possible one but figuring out how they work has been very useful. 
[2]There is a lot of hand waving going on here. Thus, what evolves are genomes and what we are talking about here are phenotypic expressions thereof. We are assuming that simple phenotypic differences reflect simple genetic differences. Who knows if this is right. However, it is the standard assumption for this kind of biological speculation so it would be a form of methodological dualism to treat it as suspect only in the linguistic case. See here for discussion of this “phenotypic gambit” and its role in evolutionary thinking.
[3]See “Discovering New Variable Properties without Parameters,” in Massimo Piattelli-Palmarini and Simin Karimi, eds., “Parameters: What are they? Where are they?” Linguistic Analysis 41, special edition (2017).
            A very terse version of this view is advanced in Hornstein (2009) on entirely MP grounds. The main conceptual difference between approaches like Lightfoot’s and the one I advanced is that the former relies on the idea that “children DISCOVER variable properties of their language through parsing” (1), whereas I waved my hands and mumbled something about curve fitting given an enhanced representation provided by FL (see here for slightly more elaboration).
[4]This folds together various important issues, the most important being that there is no overall evaluation metric for parameter setting. Chomsky argued that the shift from evaluation metrics to parameter setting models increased the latter’s feasibility because applying global evaluation metrics to Gs is computationally intractable. I think Chomsky might have thought that parameter setting is more localized than G evaluation and so will not require fancy learning theories. It turns out, as Dresher and Kaye long ago noted, that parameter setting models have their own tractability issues unless the parameters can be set independently of one another. If they are not independent, problems quickly arise (e.g. it is hard to fix parameters once and for all). 
Furthermore, it is not clear to me that something like global measures of G fitness can be entirely avoided, though Lightfoot insists that they should be. The main reason for my skepticism is empirical and revolves around the question of whether the space of G options is scattered or not. At least in syntax, it seems that different Gs are kept relatively separate (e.g. bilinguals might code switch between French and English but they don’t syntactically blend them to get an “average” of the two in Frenglish. Why not?). This suggests that Gs enjoy an integrity and this is what keeps them cognitively apart. Bill Idsardi tells me that this might be less true on the sound side of things. But as regards the syntax, this looks more or less correct. If it is, then some global measure distinguishing different Gs might be required. 
I should add that more recently, if I recall correctly, Fodor and Sakas have argued that the evaluation metric cannot be completely dispensed with even on their “parsing” account.
[5]So, for example, invoking “parsing” as the driver behind acquisition does not do much unless one specifies how parsing works. Recall that standard parsers (e.g. the Marcus Parser) embody Gs that guide how input data is analyzed. No G, no parsing. But if the aim is to explain how Gs are acquired then one cannot presuppose that the relevant G already exists as part of the parser. So what does a parse consist in, in detail? This is a hard problem and it turns out that there are many factors that the child uses to analyze a string so as to recover a meaning. The MP project is to figure out what these are, not just to name them.


  1. I’m happy to serve as a foil, but I don’t actually think that the pursuit of Darwin’s Problem is wrongheaded. I agree that the question of how FL evolved is useful in driving research into the nature of FL, and would naturally encourage unifying modules, eliminating parameters, and getting as much as possible out of independently motivated aspects of FLB, just as you say, and I agree that those are all good things.

    All I said, or meant to say, was that I don’t know how to determine what aspects of FL are “specific to language.” I like to think I understand a lot about how language works, but I don’t pretend to know much about how the rest of cognition works. So when I get to thinking about, for example, whether Internal Merge is Triggered or Free, then I have to decide whether interpretation at the CI interface could force subject raising out of the vP phase in order to allow the vP to have an unambiguous label, and what that would predict about where the subject ends up. I can see how it makes sense to try to get as much as possible to fall out from third factor principles of efficient computation and so on (Chomsky 2005), but failing that, it just seems premature to me to even speculate about whether the connection between labels and interpretation at the CI interface is specific to language or whether it also features in nonlinguistic cognition.

    Obviously the zero hypothesis is never going to be that anything is specific to language, but at the same time, just formulating the observations clearly enough to discuss them requires me to frame them in terms that are going to be very theory-internal and hence specific to language (since, remember, I don’t know anything about anything else).

    Clearly, it is very silly of the Tomaselli out there to claim that there’s nothing specific to language. They don’t spend enough quality time with the problem sets to understand what is at stake in the discussion of Free versus Triggered Merge, or in what direction probes probe, or whether exponent insertion is postsyntactic, or whether prosodic phrases correspond to phase complements, and if they did they would have no idea what aspect of “general cognition” was supposed to provide the relevant mechanism. The people who are most emotionally invested in denying the existence of specific-to-language-UG are talking out of their hats. But their perfidy is not an argument for the existence of specific-to-language-UG.

    1. Thx for being a joyous foil. One remark: conceding that there are many issues for which it is hard to say whether the principle etc. is linguistically specific or not, I am not sure that this is always the case. To return to one that seems ripe for rethinking along more general lines, Minimality seems more than a tad related to properties of biological memory wherein we find tons of similarity based interference effects. This does not imply that Minimality IS similarity based interference, but that there is a likely relation between the two and it would be nice to figure out what it is to the benefit of linguistics. Ditto many other locality notions (as Chomsky himself urged when we first started looking into bounding and islands).

      IMO, as you no doubt know, the central research question in linguistics is the structure of FL/UG. The minimalist gloss on this is the question, what is specific to language (UG) and what is not (either cognitively general or computationally necessary). That's what we want to be able to specify. Your observation is that our tools are now too blunt to address these questions. Say you are right. What is the upshot? Here is one: new tools! IF that is the right question, then we look for ways to answer it. We don't ignore it because our tools don't look apposite.

  2. The traditional portrayal of Darwin's Problem always seemed problematic to me. If the goal is to find "a single relatively simple difference between the two, language ready minds just being non-language ready ones", why is language unique to humans? If it only takes a small change, why hasn't that change arisen in other cases?

    This other DP seems to have two possible solutions: (a) language has arisen in other species but it is either not advantageous enough to be selected for (or actively disadvantageous) or (b) the evolution of language is so complex that it is unlikely to arise more than once. Given that language seems to be the main advantage of human beings and the current success of humans (a) doesn't seem very promising. If we move to (b) and adopt the simplifying assumption that phenotypic complexity recapitulates genotypic complexity, we should be trying to derive the most baroque possible FoL.

    1. A small change given the right background. So, given ALL THE OTHER NON LINGUISTIC FACTORS adding a small novelty can have big effects. This implies that such an addition would not everywhere have the effects seen in humans.

      Is this plausible? I don't know. I do think that language has arisen exactly once in our species, though the cognitive factors that underlie it are probably widely present in other animals. Why other animals did not develop similar systems I leave to the low probability that ANY novelty persists. It is my understanding that novelties are hard to come by, and even if they arise, they are even harder to keep. So, a lucky accident plus the right background and we find language in humans.

  3. So it is worth having a big empirical fact of interest driving the agenda as well. And there are few facts bigger and heftier than the fact that FL arose from non-FL capable minds

    I think part of the reason for the varying degrees of interest in the MP is the tendency to not think of this as a "fact", or at least not a "linguistic fact". Linguistic facts are things that come with glosses, translations and example numbers! Similarly for the later point that "even PP barely animates linguistic investigations".

    Personally I'm inclined to think that general Ockham's razor arguments provide the most convincing support for the advances you mention here, rather than the arguments from Darwin's Problem, just because I don't see how the dots join up (perhaps because I don't know enough about biology). But I completely agree that Darwin's Problem takes the form of a fact to be explained.