The converse is also true: to the degree that a field of inquiry enjoys a “revolution” every decade, it suggests that the field has not yet made the jump from (possibly informed) speculation to scientific inquiry. Revolutions tend to discard the past rather than build on it, and this implies that what is discarded is not worth retaining. Permanent scientific revolution within a field is a leading indicator of ignorance.
Given this epistemological yardstick, it is germane to ask how well modern Generative Grammar (GG) has measured up. Not well, is one standard view. Here we argue that this misreads the history of GG. We aim to back this judgment up by providing a Whig history (WH) of GG that displays its cumulative character. Before proceeding, a word of caution. WHs are not “real” histories. They present the past as “an inevitable progression towards ever greater…enlightenment.” Real history is filled with dead ends, lucky breaks, misunderstandings, confusions, petty rivalries, and more. WHs are not. They focus on the “successful chain of theories and experiments that led to the present-day science, while ignoring failed theories and dead ends.” A less pejorative term for WH is “rational reconstruction.” At its best, a WH can expose the logic behind a set of questions and the attendant inquiries into them, showing how the theories we consider today have built on the results of earlier theoretical and empirical investigations. A domain with a credible WH is a domain where the growth of knowledge has in fact been cumulative, even if the path leading to this body of knowledge has been extremely crooked. GG has a very credible WH, as we intend to demonstrate.
1. Where to begin? The modern Generative enterprise rises from the consideration of a handful of obvious facts:
(1) A competent speaker of a given natural language (NL) has the capacity to deal with an effectively infinite number of linguistic objects.
(2) The linguistic objects in question are pairings of meanings with “sounds.” Thus, among the things a native speaker of English knows is that “Dogs chase cats” does not mean the same thing as “Cats chase dogs,” while “Cats are chased by dogs” does. There are an unbounded number of such facts that a competent native speaker of a given NL knows.
(3) Any human child can acquire competence in any NL when placed in the appropriate speech community. Thus, any child of any parentage will grow up speaking English if it grows up in NYC, Chinese if it grows up in Shanghai, Hungarian in Budapest, etc. Moreover, all children, regardless of the language or the child, do this in essentially the same way.
(4) This capacity to acquire anything like an NL is a species-specific capacity that humans have. In other words, no other animals do language like we humans do language.
The first fact suggests that linguistic competence consists (in part) in mastery of a system of rules that specify the NL mastered. Why a rule system? Because that is the only way to specify an effectively infinite capacity. We cannot just list the objects in the domain of a native speaker’s competence. The capacity can only be specified in terms of a finite procedure that describes (aka: generates) it. Thus, we conclude that linguistic mastery consists (in part) in acquiring a set of rules (aka: a Grammar (G)) that generate the kinds of linguistic objects that a native speaker has competence with.
The second fact tells us something more about these Gs. They must specify pairings of meanings with sounds. Thus the rule systems that native speakers have mastered are rules that generate objects with two distinctive properties. Gs are generative procedures that tie a specific meaning profile together with a specific sound profile, and they do this over an effectively infinite domain. So Gs are functions whose range is meaning-sound pairs, viz. an infinite number of objects like this: <m,s>. What’s the domain? Some finite set of “atoms” that can combine again and again to yield more and more complex <m,s> pairs. Let’s call these atoms ‘morphemes.’
Putting this all together, we know from the basic facts and some very elementary reasoning that native speakers master Gs (recursive procedures) that map morphemes into an unbounded range of <m,s>s. This we know. What we don’t know is what the specific rules that Gs contain look like (or for that matter what the ‘m’s and ‘s’s look like). And that brings us to our first research question: describe some rules of NL Gs and specify their variety.
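The reasoning above can be made concrete with a minimal sketch of a finite rule system that generates an unbounded set of <m,s> pairs. Everything in it is an assumption made for illustration: the two-noun lexicon, the `generate` function, the "think that" embedding rule, and the predicate-style meaning notation are hypothetical and are not proposed as the actual rules of any NL G.

```python
# Toy illustration only: the lexicon, rules, and meaning notation are
# hypothetical and far simpler than any real grammar of English.

NOUNS = {"dogs": "DOG", "cats": "CAT"}       # sound -> meaning atom
VERBS = {"chase": ("CHASE", "chased")}       # sound -> (meaning atom, participle)


def generate(depth):
    """Yield (meaning, sound) pairs with at most `depth` levels of embedding."""
    for subj_s, subj_m in NOUNS.items():
        for verb_s, (verb_m, participle) in VERBS.items():
            for obj_s, obj_m in NOUNS.items():
                meaning = f"{verb_m}({subj_m}, {obj_m})"
                # Active and passive pair the same meaning with different sounds.
                yield meaning, f"{subj_s} {verb_s} {obj_s}"
                yield meaning, f"{obj_s} are {participle} by {subj_s}"
    if depth > 0:
        # Recursive rule: embed any smaller sentence under "X think that ...".
        for inner_m, inner_s in generate(depth - 1):
            for subj_s, subj_m in NOUNS.items():
                yield f"THINK({subj_m}, {inner_m})", f"{subj_s} think that {inner_s}"


if __name__ == "__main__":
    for d in range(3):
        # The rule set stays fixed while the number of pairs grows with depth.
        print(d, sum(1 for _ in generate(d)))
```

The point of the sketch is only that the rule set is finite while the set of pairs it specifies has no upper bound: each increase in `depth` yields new pairs, and no list could stand in for the procedure.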
We know another thing about these Gs. They can be acquired by any human when placed in the appropriate linguistic environment. You might be a native speaker of English, but you could just as well have been a native speaker of Swahili but for an accident of birth location. There is nothing that intrinsically makes you a native speaker of English (nor of Swahili, Dutch, Japanese, etc.), though, given the third and fourth facts noted above, there is likely to be something biologically intrinsic to you that makes you capable of acquiring an NL (i.e. a G that generates the NL) at all. Let’s call the capacity that humans have to acquire an NL the ‘Faculty of Language’ (FL). Note that postulating that there is an FL does not tell us what’s in FL. It simply names a fact: that humans (qua humans) are able to acquire any NL G in more or less the same way. We can think of FL as a function whose range is Gs of NLs. An obvious research question is what the fine structure of this function FL is.
This second research question has been addressed in at least two different ways. The first has been to inspect many NL Gs to induce what all Gs have in common. Note that doing this requires having a largish set of candidate Gs and a reasonable variety of them (e.g. Romance Gs, Germanic Gs, Semitic Gs, Austronesian Gs, East Asian Gs, etc.). Clearly, we cannot inspect their commonalities without having them to inspect.
There is a second way of addressing this question. We can ask what FL must contain in order to produce even a single G guided only by the kinds of data the child uses. Call the data the child actually uses to guide its G acquisition ‘Primary Linguistic Data’ (PLD). We can investigate the structure of FL by asking what (if anything) we must assume about the structure of FL to allow it to use the PLD(NL) (i.e. the PLD of a given NL, e.g. uttered bits of English) to arrive at the G(NL) (i.e. the G of that NL, e.g. the grammar of English) that the child attains.
What do we know about the PLD? Actually quite a bit. We know that it consists of relatively simple linguistic forms. There are not many sentences in the PLD that the child has access to (e.g. child directed language, a superset, most likely, of what it actually uses) that involve more than two levels of embedding. Indeed, most of the PLD seems to consist of simple phrases and sentences. Moreover, there are virtually no grammatically ill-formed utterances addressed to the child and no correction of mistakes that the child spontaneously makes. Thus, to a good first approximation, we can take the PLD to be simple well-formed phrases and sentences addressed to the child. From this the child builds a G that permits it to generate an effectively unbounded number of phrases and sentences, both simple (like the ones it is exposed to) and complex (language bits with structures unattested in the PLD). The idea is that we can investigate how FL is structured using the following argument form, called ‘The Poverty of the Stimulus Argument’ (POS): assume that anything that the child can acquire on the basis of the PLD it does acquire in this way. However, any case in which the PLD is insufficient to fix an attained property implicates some built-in (i.e. innate) feature of FL. In contrast to the comparative method noted above, the POS licenses inferences about FL based on the properties of a single G(NL). The method is effectively subtractive: assume that whatever can be acquired from the PLD is so acquired; what’s left after you subtract this out is due to fixed (viz. innate) features of FL.
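The subtractive logic of the POS can be sketched as a set difference. This is a schematic illustration under stated assumptions: the function name `innate_residue` and the property labels are hypothetical placeholders, not empirical claims about any particular G or about the PLD.

```python
# Hypothetical placeholders: the property labels stand in for whatever
# properties an analysis attributes to a G; they make no empirical claim.

def innate_residue(attained, fixable_from_pld):
    """Return the attained properties of G(NL) that the PLD cannot fix."""
    # Whatever the PLD can fix is assumed to be acquired from it;
    # the remainder is attributed to built-in features of FL.
    return attained - fixable_from_pld


attained_g = {"property_A", "property_B", "property_C"}
fixable = {"property_A", "property_B"}

print(innate_residue(attained_g, fixable))
```

Note that the argument needs only one G(NL) and one estimate of what its PLD can fix, which is what distinguishes it from the comparative method.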
Two last points before proceeding further. Note (i) that investigating FL in either way requires that we have some candidate Gs. The output of FL is a G, so until we know something about the operations that NL Gs contain, it is fruitless to pursue this question about FL. Note (ii) that the two forms of hunting for innate features of FL are complementary and both are useful. As we will see in our WH of GG below, both what we have learned from studying multiple particular Gs and POS evaluations of single Gs have contributed to GG’s understanding of FL’s structure.
Given (4), we know that FL is a biological novelty. That means that there was a time at which our ancestors did not have an FL. A reasonable question to ask is how FL arose in humans. Note that just as having candidate Gs is a precondition for studying the properties of FL, having plausible candidate FLs is a precondition for studying how FL arose in the species. Let’s be a little clearer here.
Let’s divide the features of FL into (i) those that are domain specific to the use and mastery of FL (call these principles ‘Universal Grammar’ (UG)), (ii) those that FL shares with other cognitive capacities (call these ‘domain general cognitive features’ (DGCF)), and (iii) those that FL has by physical necessity (PN). The more that FL is constructed from operations and principles drawn from (ii) and (iii), the simpler the story of how FL could have arisen in the species becomes. For example, if all of the ingredients but one necessary to build our FL are in DGCF or PN, then we can trace the rise of FL in humans to the emergence of that one distinctive property of UG. Conversely, the more that must be packed into UG, the more involved will be an explanation for how FL arose.
The difficulty is further exacerbated if FL is a relatively recent cognitive innovation, as this leaves less time for the (oft-believed) gradual methods of natural selection to work their magic. This leads to the following conclusion: in the best of all possible worlds, FL uses operations and principles that cognitively pre-date the emergence of FL, and these suffice to construct an FL with the properties ours has once a (hopefully very) small number (possibly zero, but more likely one or two) of language-specific cognitive innovations (i.e. UG) are added to the prior mix. This line of reasoning suggests another clear project: show that UG is pretty sparse and that very modest UGs can derive the operations and principles of richer UGs when combined with identifiable DGCFs and PNs.
The above formulation highlights an important tension between explaining how Gs arise in a single individual and explaining how FL arose in the species. The more we pack into UG (operations and principles specific to linguistic capacity), the easier we make the child’s task of projecting a G(NL) from the PLD(NL) it exploits. However, the richer the UG component of FL, the harder it is for our cognitive ancestors (who, by assumption, were sans FL) to evolve an FL like ours. That’s the tension, and contemporary GG tries to address it. For now, however, let’s just observe the tension and note that the enterprise of discussing how FL arose in the species can only get productively started once we have some idea of what properties FL has, and for this it is very useful to have some understanding of what operations and principles of FL might be linguistically specific (i.e. part of UG).
To conclude our little conceptual tour of the problem: we have here outlined the logic of the GG research program based on some pretty elementary facts. This project addresses three questions:
(5) a. What properties do individual Gs have?
b. What properties must FL have so as to enable it to acquire these Gs?
c. Which of these properties of FL are proprietary to language and which are more general?
Our WH will illustrate how research in GG can be understood as progressively addressing each of these questions, thereby setting the stage for a fruitful investigation of the next questions. This is what we should expect given our observation that (5a) is a precondition for (fruitfully) addressing (5b) and (5b) is a precondition for fruitfully addressing (5c). Note that ‘precondition’ is here used in the conceptual sense. It does not mean that answers to (5b) might not lead to rethinking claims about (5a), or answers to (5c) to rethinking claims about (5b). Nor does it mean that these questions cannot be (and aren’t) pursued in tandem. As a matter of practice, all three questions are often addressed simultaneously. Rather, what we intend by ‘precondition’ is that, given the nature of the questions asked, it is nugatory to pursue the later questions without some answers to the earlier ones.
One more caveat before we get the show on the road: this is a very idealized account of the problem and its empirical boundary conditions. For example, most GGers do not think that humans have a single grammar of their NL (i.e. it is almost certain that humans develop multiple Gs). Nor do they think that NLs are natural kinds (e.g. looked at carefully, there is nothing like ‘English’ that all so-called speakers of English speak). Ontologically, Gs are more “real” than the NLs they are related to. However, this idealization is adopted because it is recognized that describing the properties of G and FL even under these idealized assumptions is already very hard, and GGers believe that relaxing the idealization will not significantly affect the shape of the answers provided. Of course, this may be incorrect. But we doubt it, and what follows does not much worry about the legitimacy of so idealizing.
2. So given these three questions, we can divide the WH of GG into three epochs. Early work in GG (say from the mid 50s to the early 70s) concentrated on constructing sample Gs for fragments of a given NL. The second epoch goes from the mid 70s to the early 90s. This concentrated on simplifying the rules/operations of FL. This involved categorizing the various rule types a G could have, factoring out common features within these types, and enriching UG to prevent massive over-generation (a natural consequence of simplifying the rules, as we shall see). The third epoch goes from the mid 90s to the present. This period focuses on simplifying FL, trying to figure out which aspects of FL’s properties are language specific and which follow from more general cognitive and/or computational principles. The aim here has been to factor out those features of FL that are computationally general, leaving, it is hoped, a very small domain specific residue. So, three epochs: (i) exploring the rules Gs contain and how they interact, (ii) simplifying the structure of Gs by articulating the structure of UG, and (iii) simplifying FL by separating the computationally general wheat from the linguistically specific chaff.
In what follows we describe in slightly more detail the kinds of results each epoch delivered and illustrate the progressive nature of the GG enterprise. Let’s begin with the first epoch.
 Though inspecting many Gs for what they have in common sounds like an obvious method to pursue in studying the structure of FL, it is actually quite a bit harder to do than one might think. The reason is that Gs do not tend to have the same rules. What they have in common is far more abstract: e.g. all the rules adhere to a common rule schema, or all the rules obey similar constraints, in the sense that no rule within any G shows evidence of disobeying them. This makes any simple-minded process of looking for commonalities quite difficult.
 It is worth observing that the POS sets a very high standard for concluding that some property reflects innate structural features of FL. POS assumes that any feature of G that could be acquired from the data is so acquired. Strictly speaking, this does not follow: a feature that could be data driven might nonetheless be built in. Even so, POS allows one to isolate a whole slew of promising candidate generalizations useful in probing the built-in structure of FL.
 Even if natural selection can operate quickly, time pressures may matter.
 There are other questions beyond these three: How is FL instantiated in the brain? How are Gs used in linguistic performance? How is FL used to acquire a G in real time? We return to these.
 Adopting these idealizations does not mean that GGers have not explored models that relax them. For example, a staple of work in historical linguistics is to assume that speakers have multiple Gs that compete. Change is then modeled as the changing dominance relations between these multiple Gs.