The converse is also true: to the degree that a field of
inquiry enjoys a “revolution” every decade, to that degree it suggests that it
has not yet made the jump from (possibly informed) speculation to scientific
inquiry. Revolutions tend to discard the past rather than build on it, and
this implies that what is discarded is not worth retaining. Permanent
scientific revolution within a field is a leading indicator of ignorance.
Given this epistemological yardstick, it is germane to ask
how well modern Generative Grammar (GG) has measured up. One standard view is
that it has not measured up well. Here we argue that this view misreads the history
of GG. We aim to back this judgment up by providing a Whig history (WH) of GG
that displays its cumulative character.
Before proceeding, a word of caution. WHs are not “real” histories. They
present the past as “an inevitable progression towards ever greater…enlightenment.”
Real history is filled with dead ends, lucky breaks, misunderstandings,
confusions, petty rivalries, and more. 
WHs are not. They focus on the “successful chain of theories and
experiments that led to the present-day science, while ignoring failed theories
and dead ends” (see here).
A less pejorative term for WH is “rational reconstruction.” At its best, a WH
can expose the logic behind a set of questions and the inquiries they prompted,
showing how the theories we consider today have built on the results of
earlier theoretical and empirical investigations. A domain with a credible WH is a domain where
the growth of knowledge has in fact been cumulative even if the path leading to
this body of knowledge has been extremely crooked. GG has a very credible WH,
as we intend to demonstrate.
1. Where to begin? 
The modern Generative enterprise arises from the consideration of a
handful of obvious facts:
(1) A competent speaker of a given natural language (NL) has the capacity to
deal with an effectively infinite number of linguistic objects.
(2) The linguistic objects in question are pairings of meanings with “sounds.”
Thus, among the things a native speaker of English knows is that “Dogs chase
cats” does not mean the same thing as “Cats chase dogs,” while “Cats are chased
by dogs” does. There are an unbounded number of such facts that a competent
native speaker of a given NL knows.
(3) Any human child can acquire competence in any NL when placed in the
appropriate speech community. Thus, any child of any parentage will grow up
speaking English if it grows up in NYC, Chinese if it grows up in Shanghai,
Hungarian if it grows up in Budapest, etc. Moreover, all children, regardless
of the language or of the child, do this in essentially the same way.
(4) This capacity to acquire anything like an NL is a species-specific
capacity that humans have. In other words, no other animal does language the
way humans do.
The first fact suggests that linguistic competence consists
(in part) in mastery of a system of rules that specify the NL mastered. Why a rule system? Because a finite system of rules is the only
way to specify an effectively infinite capacity. We cannot just list the objects in the domain
of a native speaker’s competence. The capacity can only be specified in terms
of a finite procedure that describes (aka: generates) it. Thus, we conclude
that linguistic mastery consists (in part) in acquiring a set of rules (aka: a
Grammar (G)) that generate the kinds of linguistic objects that a native
speaker has competence with.   
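To make the point concrete, here is a minimal sketch in Python of a finite rule system that generates an unbounded set of sentences. The rules, the category labels S, NP and VP, and the function name `expand` are hypothetical illustrations chosen for this sketch, not a proposal about the actual rules of English. All the sketch is meant to show is that a handful of rules, one of which reintroduces S inside VP, yields ever more sentences as the permitted depth of embedding grows.

```python
import itertools

# Hypothetical toy grammar (illustration only, not a claim about English).
# Keys are categories; each value lists the alternative expansions.
# The VP rule reintroduces S, which is what makes the rule set recursive.
RULES: dict[str, list[list[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["John"], ["Mary"]],
    "VP": [["sleeps"], ["thinks", "that", "S"]],
}


def expand(symbol: str, depth: int) -> list[str]:
    """Return every word string derivable from `symbol`,
    allowing at most `depth` clauses embedded below it."""
    if symbol not in RULES:  # a word: nothing left to rewrite
        return [symbol]
    if depth < 0:  # embedding budget exhausted
        return []
    results = []
    for rhs in RULES[symbol]:
        # Expand each daughter; an embedded S uses up one level of depth.
        daughters = [expand(d, depth - 1 if d == "S" else depth) for d in rhs]
        for words in itertools.product(*daughters):
            results.append(" ".join(words))
    return results


if __name__ == "__main__":
    for d in range(4):
        print(d, len(expand("S", d)))
```

Run as a script, it prints the number of sentences generated at embedding depths 0 to 3: 2, 6, 14 and 30 (in general 2^(d+2) - 2). The five rule expansions never change, yet the set they generate has no upper bound, which is the sense in which a finite G specifies an effectively infinite capacity.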
The second fact tells us something more about these Gs. They
must specify pairings of meanings with sounds. Thus the rule systems that
native speakers have mastered are rules that generate objects with two distinctive
properties. Gs are generative procedures that tie a specific meaning profile
together with a specific sound profile, and they do this over an effectively
infinite domain. So Gs are functions whose range is meaning-sound pairs, viz.
an infinite number of objects like this: <m,s>. What’s the domain? Some
finite set of “atoms” that can combine again and again to yield more and more
complex <m,s> pairs. Let’s call these atoms ‘morphemes.’ 
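The outputs of a G are pairings rather than bare strings. The sketch below, again hypothetical and in Python, models an <m,s> pair as a record whose m is a toy predicate-argument triple and whose s is just a spelled-out string standing in for a sound. The functions `active` and `passive` and the `PARTICIPLE` table are illustrative stand-ins for rules of English, assumed for this example only. The sketch reproduces the facts cited in (2): “Dogs chase cats” and “Cats chase dogs” receive different meanings, while “Cats are chased by dogs” receives the same meaning as “Dogs chase cats.”

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """A toy <m,s> pair: m is a stand-in meaning, s a stand-in 'sound'."""
    m: tuple[str, str, str]  # (predicate, agent, patient)
    s: str                   # orthographic string in place of a phonetic form


# Hypothetical mini-lexicon: verb stem -> passive participle.
PARTICIPLE = {"chase": "chased"}


def active(verb: str, agent: str, patient: str) -> Pair:
    """Agent in subject position, e.g. 'Dogs chase cats'."""
    return Pair((verb, agent, patient), f"{agent} {verb} {patient}".capitalize())


def passive(verb: str, agent: str, patient: str) -> Pair:
    """Patient in subject position, agent in a by-phrase."""
    sound = f"{patient} are {PARTICIPLE[verb]} by {agent}".capitalize()
    return Pair((verb, agent, patient), sound)


if __name__ == "__main__":
    p1 = active("chase", "dogs", "cats")
    p2 = active("chase", "cats", "dogs")
    p3 = passive("chase", "dogs", "cats")
    print(p1.s, "|", p2.s, "|", p3.s)
    print(p1.m == p2.m, p1.m == p3.m)  # False True
```

This fragment is finite and handles only plural subjects (it always uses “are”), so it illustrates the pairing of meanings with sounds, not recursion or agreement.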
Putting this all together, we know from the basic facts and
some very elementary reasoning that native speakers master Gs (recursive
procedures) that map morphemes into an unbounded range of <m,s>s. This we
know. What we don’t know is what the specific rules that Gs contain look like
(or for that matter what the ‘m’s and ‘s’s look like). And that brings us to
our first research question: describe some rules of NL Gs and specify their
variety.
We know another thing about these Gs. They can be acquired
by any human when placed in the appropriate linguistic environment. You might
be a native speaker of English, but you could just as well have been a native
speaker of Swahili but for an accident of birth location. There is nothing that
intrinsically makes you a native speaker of English (nor of Swahili, Dutch,
Japanese, etc.), though, given the third and fourth facts noted above, there is
likely to be something biologically intrinsic to you that makes you capable of
acquiring an NL (i.e. a G that generates the NL) at all. Let’s call the capacity
that humans have to acquire an NL the ‘Faculty of Language’ (FL). Note that postulating an FL does not
tell us what’s in FL. It simply names a fact: that humans (qua humans) are able to acquire any NL G in more or less the same
way. We can think of FL as a function whose range is Gs of NLs. An obvious research question is what the
fine structure of this function is.
This second research question has been addressed in at least
two different ways. The first has been to inspect many NL Gs to induce what all
Gs have in common. Note that doing this requires having a largish set of
candidate Gs and a reasonable variety of them (e.g. Romance Gs, Germanic Gs,
Semitic Gs, Austronesian Gs, East Asian Gs, etc.). Clearly, we cannot inspect their
commonalities without having them to inspect.[1]
There is a second way of addressing this question. We can
ask what FL must contain in order to produce even a single G, guided only by
the kinds of data the child uses. Call the data the child actually uses to
guide its G acquisition ‘Primary Linguistic Data’ (PLD). We can investigate the structure of FL by
asking what (if anything) we must assume about the structure of FL to allow it
to use the PLD(NL) (i.e. the PLD of a given NL, e.g. uttered bits of English)
to arrive at the G(NL) (i.e. the G of that NL, e.g. the grammar of English)
that the child attains.
What do we know about the PLD? Actually quite a bit. We know
that it consists of relatively simple linguistic forms. Few of the
sentences in the PLD that the child has access to (e.g. child-directed
language, most likely a superset of what it actually uses) involve more
than two levels of embedding. Indeed, most of the PLD seems to consist of simple
phrases and sentences. Moreover, there
are virtually no grammatically ill-formed utterances addressed to the child and
no correction of mistakes that the child spontaneously makes. Thus, to a good
first approximation, we can take the PLD to be simple well formed phrases and
sentences addressed to the child. From this the child builds a G that permits
it to generate an effectively unbounded number of phrases and sentences, both
simple (like the ones it is exposed to) and complex (language bits with
structures unattested in the PLD). The idea is that we can investigate how FL
is structured using the following argument form, called ‘The Poverty of the Stimulus
Argument’ (POS): assume that anything the child can acquire on the basis of the
PLD it does acquire in this way. However, wherever the PLD is insufficient
to fix an attained property, that property implicates some built-in (i.e. innate) feature of
FL. In contrast to the comparative
method noted above, the POS licenses inferences about FL based on the
properties of a single G(NL). The method
is effectively subtractive: whatever can be acquired from the PLD, assume is so acquired;
what’s left after you subtract this out is due to fixed (viz. innate) features
of FL.[2]
Two last points before proceeding further. (i) Investigating FL in either way
requires that we have some candidate Gs. The
output of FL is a G, so until we know something about the operations that NL Gs
contain, it is fruitless to pursue this question about FL. (ii) The two
forms of hunting for innate features of FL are complementary, and both are
useful. As we will see in our WH of GG below, what we learned from studying
multiple particular Gs and from POS evaluations of single Gs has contributed
to GG’s understanding of FL’s structure.
Given (4), we know that FL is a biological novelty. That
means that there was a time at which our ancestors did not have a FL. A
reasonable question to ask is how FL arose in humans. Note that just as having
candidate Gs is a precondition for studying the properties of FL, having plausible
candidate FLs is a precondition for studying how FL arose in the species.  Let’s be a little clearer here.
Let’s divide FL into (i) those features that are domain
specific to the use and mastery of language (call these principles ‘Universal Grammar’
(UG)), (ii) those that FL shares with other cognitive capacities (call these ‘domain
general cognitive features’ (DGCF)), and (iii) those that FL has by physical
necessity (PN). The more FL is
constructed from operations and principles drawn from (ii) and (iii), the
simpler the story of how FL could have arisen in the species.
For example, if all of the ingredients but one necessary to build our FL are in
DGCF or PN, then we can trace the rise of FL in humans to the emergence of that
one distinctive property of UG.
Conversely, the more that must be packed into UG, the more involved any
explanation of how FL arose will be.
The difficulty is further exacerbated if FL is a relatively
recent cognitive innovation, as this leaves less time for the (oft-believed) gradual
methods of natural selection to work their magic.[3] This
leads to the following conclusion: in the best of all possible worlds, FL uses
operations and principles that cognitively pre-date the emergence of FL, and
these suffice to construct an FL with the properties ours has once the
(hopefully, very) small number (possibly zero, but more likely one or two) of language
specific cognitive innovations (i.e. UG) is added to the prior mix. This line of reasoning suggests another clear
project: show that UG is pretty sparse and that very modest UGs can derive the
operations and principles of richer UGs when combined with identifiable DGCFs
and PNs.
The above formulation highlights an important tension
between explaining how Gs arise in a single individual and explaining how FL
arose in the species.  The more we pack
into UG (operations and principles specific to linguistic capacity), the easier
we make the child’s task of projecting a G(NL) from the PLD(NL) it exploits. However, the richer the UG component of FL, the
harder it would have been for our cognitive ancestors (who, by assumption, were sans FL) to
evolve an FL like ours. That’s the
tension, and contemporary GG tries to address it. For now, however, let’s just
observe the tension and note that the enterprise of discussing how FL arose in
the species can only get productively started once we have some idea of what properties
FL has, and for this it is very useful to have some understanding of which
operations and principles of FL might be language specific (i.e. part of UG).
To conclude our little conceptual tour of the problem: we have
here outlined the logic of the GG research program based on some pretty
elementary facts. This project addresses three questions:[4]
(5) a. What properties do individual Gs have?
    b. What properties must FL have so as to enable it to acquire these Gs?
    c. Which of these properties of FL are proprietary to language, and which
       are more general?
Our WH will illustrate how research in GG can be understood
as progressively addressing each of these
questions, thereby setting the stage for a fruitful investigation of the
next questions. This is what we should expect given our observation that (5a)
is a precondition for (fruitfully) addressing (5b) and (5b) is a precondition
for fruitfully addressing (5c). Note that ‘precondition’ is here used in the
conceptual sense. It does not mean that
answers to (5b) cannot lead us to rethink claims about (5a), or that answers to (5c)
cannot lead us to rethink claims about (5b). Nor does it mean that these questions cannot be (and aren’t)
pursued in tandem. As a matter of practice, all three questions are often addressed
simultaneously. Rather, what we intend by ‘precondition’ is that, given the nature of
the questions asked, it is nugatory to pursue the later questions without some
answers to the earlier ones.
One more caveat before we get the show on the road: this is
a very idealized account of the
problem and its empirical boundary conditions. For example, most GGers do not
think that humans have a single grammar of their NL (indeed, it is almost certain
that individual speakers develop multiple Gs). Nor do they think that NLs are natural kinds
(e.g. looked at carefully, there is nothing like ‘English’ that all so-called
speakers of English speak). Ontologically, Gs are more “real” than the NLs they
are related to. This idealization is nonetheless adopted because it is recognized
that describing the properties of G and FL even under these idealized
assumptions is already very hard, and GGers believe that relaxing the
idealization will not significantly affect the shape of the answers
provided. Of course, this may be
incorrect. But we doubt it, and what follows does not much worry about the
legitimacy of so idealizing.[5]
2. Given these three questions, we can divide the WH of
GG into three epochs. Early work in GG (say from the mid 50s to the early 70s)
concentrated on constructing sample Gs for fragments of a given NL. The second epoch runs from the mid 70s to the
early 90s. It concentrated on simplifying the rules/operations of Gs. This
involved categorizing the various rule types a G could have, factoring out
common features within these types, and enriching UG to prevent massive
over-generation (a natural consequence of simplifying the rules, as we shall
see). The third epoch runs from the mid 90s to the present. This period focuses
on simplifying FL, trying to figure out which aspects of FL’s properties are
language specific and which follow from more general cognitive and/or computational
principles. The aim here has been to factor out those features of FL that are
computationally general, leaving, it is hoped, a very small domain specific
residue. So, three epochs: (i) exploring the rules Gs contain and how they
interact, (ii) simplifying the structure of Gs by articulating the structure of
UG, and (iii) simplifying FL by separating the computationally general wheat
from the linguistically specific chaff.
In what follows we describe in slightly more detail the
kinds of results each epoch delivered and illustrate the progressive nature of
the GG enterprise. Let’s begin with the first epoch.
[1]
Though this sounds like an obvious method to pursue in studying the structure
of FL, it is actually quite a bit harder to do than one might think. The reason
is that Gs do not tend to have the same rules. What they have in common is far
more abstract: e.g. all the rules adhere to a common rule schema, or all the
rules obey similar constraints, in the sense that no rule in any G shows
evidence of disobeying them. This makes any simple-minded process of
looking for commonalities quite difficult.
[2]
It is worth observing that the POS sets a very high standard for concluding
that some property reflects innate structural features of FL. POS assumes that
any feature of G that could be acquired from the data is so acquired. However,
this assumption need not hold: a feature that could in principle be learned from
the PLD might nonetheless be innate. Nonetheless,
POS allows one to isolate a whole slew of promising candidate generalizations
useful in probing the built-in structure of FL.
[3]
Even if natural selection can operate quickly, time pressures may matter.
[4]
There are others: How is FL instantiated in the brain? How are Gs used in
linguistic performance? How is FL used to acquire a G in real time? We return
to these.
[5]
This is not to say that GGers have not explored models that relax these
idealizations. For example, a staple of work in historical linguistics is to
assume that speakers have multiple Gs that compete. Change is then modeled as
shifts in the dominance relations among these competing Gs.
