Friday, April 17, 2015

Do the LSA and its flagship journal 'Language' have any regard for generative grammar?

It has come to my attention that Language is considering making an "event" of Vyvyan Evans' junk book The Language Myth. What do I mean by an "event"? Well, and here I quote: "because of the potentially controversial nature of the book, Language is planning a new type of review, in which we target the book for commentary papers by 4-5 individuals who have different academic perspectives." These reviews are to be about 1500 words each. So, Language is going to make a BIG DEAL (6,000-7,500 words of criticism plus a reaction by Evans, I would assume) out of this book, pretending that there is something there, pretending that it is "controversial" in the sense that the book raises many interesting issues that people of good faith can understand in different ways and that debating them would be enlightening. This is false. It raises no such issues, and that's because the book is junk. The suggestion that Language (and, by extension, the LSA) believes otherwise is a terrible message to send.

Let me be clear: the book is not controversial. It is junk. Pure, unadulterated, complete junk. Reading it will make you dumber. The fact that Language is doing a "new type of review" will only suggest that this is not so. It will suggest that there really are various reasonable sides to the issues Evans' book discusses and that the views in the book are worth taking seriously. After all, Language, the journal of the LSA, the main professional organization of linguistics, thinks that the book is "controversial" (which in common parlance suggests well argued, if still a bit out there). It does not suggest that the book is junk. Moreover, getting a wide range of reviews virtually guarantees that at least one of them will suggest that the views are not junk. After all, I bet Language wants to be "fair." What piece of junk could ask for a better endorsement than this?

Generativists have always considered Language the place you publish when you can't get your stuff into LI or NLLT, or Lingua or… It is far down the list of desirable publishing venues. If it ever was the journal that published the stuff at the cutting edge, it no longer is that journal. Nonetheless, it is the official journal of the LSA and as such it should care about whether the works it highlights meet even minimal professional standards (one would hope for more than that, of course). The Evans book does not. To repeat, it's junk. So why exactly does Language want to showcase it? Do the editors hate Generative Grammar that much? Do they really think that generative linguistics has been an intellectual disaster? It would be nice to know if this is what the editors think, for if it is, maybe it's time either for Generativists to leave the LSA or for the LSA to consider replacing the editors.

So, either Language hates 2/3 of the field (always a possibility) or the editors are filled with self-loathing. I find it hard to believe that any other professional journal would showcase work that is shoddy, unprofessional, uninformed and logically lacking. Can you see Physical Review doing a special review on the latest approaches to perpetual motion? Or the American Anthropological Review doing a special issue on creation science? I can't. They have more self-respect than that. They know that these topics are junk. But apparently Language is different. It's "open-minded" and willing to consider even junk as worthy of showcasing because of its "controversial" nature. This is not the first time Language has done this (see here). Someone like me might get the impression that Language in no way respects what it is that Generative Grammar has done over the last 65 years. The idea that Evans' book is "controversial" suggests that the editors have lost all critical sense and are willing to admit the most egregious junk into their journal. This is not to say that Evans' book does not deserve special treatment in the pages of Language. It does. Language should be highlighting the fact that work like this is not worth the paper that it is written on. A decent hatchet job, now that I understand. But a "new type of review"? It sends entirely the wrong message.

Thursday, April 16, 2015

A (shortish) whig history of generative grammar (part 4, the end)

3.     Minimalism: The third epoch

Where are we?  We reviewed how the first period of syntactic research examined how grammars that had to generate an unbounded number of hierarchically organized objects might be structured. It did this by postulating rules whose interactions yielded interesting empirical coverage, generating both a fair number of acceptable sentences and not generating an interesting number of unacceptable sentences. In the process, this early work discovered an impressive number of effects that served as higher-level targets of explanation for subsequent theory. To say the same thing a little pompously, early GG discovered a bunch of “effects” which catalogued deep-seated generalizations characteristic of the products of human Gs.  These effects sometimes fell together as “laws of grammar” and were taken, reasonably, as consequences of the built-in structural properties of FL.

This work set the stage for the second stage of research: a more direct theoretical investigation of the properties of FL. The relevant entrée to this line of investigation was Plato’s Problem: the observation that what native speakers know about their languages far exceeds what they could have learned about it by examining the PLD available to them in the course of language acquisition. Conceptually, addressing Plato’s Problem suggested a two-prong attack: first, radical simplification of the rules that Gs contain and second, enrichment of what FL brings to the task of acquisition. Factoring the complexity built into previous rules out into simple operations like Move α made the language-particular rules easier to acquire. This simplification, however, threatened generative chaos. The theoretical task was to prevent this. This was accomplished by enriching the innate structure of FL in principled ways. The key theoretical innovation was trace theory.  Traces simplified derivations by making them structure preserving, and they allowed for the unification of movement and binding. These theoretical moves addressed the over-generation problem.[1] They also set the stage for contemporary minimalist investigations. We turn to this now.

The main problem with the GB theory of FL from a minimalist perspective is its linguistic specificity.  Here’s what we mean.

Within GB, FL is both very complex and the proposed innate principles and operations are very linguistically specific. The complexity is evident in the modular architecture of the basic GB theory as well as in the specific principles and operations within each module. (26) and (27) reiterate the basic structure of the theory.

  (26)   a. X’ theory of phrase structure
         b. Case
         c. Theta
         d. Movement
               i. Subjacency
               ii. ECP
         e. Construal
               i. Binding
         f. Control

(27)     DS: X’-rules, Theta Theory, input to T-rules
          |    Move α (T-rules)/trace theory, output SS
         SS: Case Theory, Subjacency, γ-marking, BT
        /  \   Move α (covert movement)
      PF    LF: BT, *+γ

Though some critical relations crosscut (many of) the various modules (e.g. government), the modules each have their own special features. For example, X’ theory traffics in notions like specifier, complement, head, maximal projection, adjunct and bar level. Case theory also singles out heads but distinguishes between those that are case assigning and those that require case. There is also a case filter, case features and case assigning configurations (government). Theta theory also uses government but for the assignment of θ-roles, which are assigned in D-structure by heads and are regulated by the theta criterion, a condition that requires every argument to get one and at most one theta role. Movement exploits another set of concepts and primitives: bounding node/barrier, escape hatch, subjacency principle, antecedent government, head government and γ-marking, among others. Last, the construal rules come in four different types: one for PRO, one for local anaphors like reflexives and reciprocals, one for pronouns and one for all the other kinds of DPs, dubbed R-expressions. There is also a specific licensing domain for anaphors and pronouns, indexing procedures for the specification of syntactic antecedence relations and hierarchical requirements (c-command) between an antecedent and its anaphoric dependent. Furthermore, all of these conditions are extrinsically ordered to apply at various derivational levels specified in the T-model.[2]

If the information outlined in (26) and (27) is on the right track, then FL is richly structured with very domain specific (viz. linguistically tuned) information. And though such linguistic specificity is a positive with regard to Plato’s Problem, it raises difficulties when trying to address Darwin’s Problem (i.e. how FL could have arisen from a pre-linguistic cognitive system). Indeed, the logic of the two problems seems to have them pulling in largely opposite directions.  A rich linguistically specific FL plausibly eases the child’s task by restricting what the child needs to use the PLD to acquire.  However, the more cognitively sui generis FL is, the more complicated the evolutionary path to FL.  Thus, from the perspective of Darwin’s problem, we want the operations and principles of FL to be cognitively (or computationally) general and very simple.  It is this tension that modern GG aims to address.

The tension is exacerbated when the evolutionary timeline is considered. The consensus opinion is that humans became linguistically facile about 100,000 years ago and that the capacity that evolved has remained effectively unchanged ever since.  Thus, whatever the addition, it must have been relatively minor (the addition of at most one or two operations/principles). Or, putting this another way, our FL is what you get when you wed (at most) one (or two) linguistically specific features with a cognitively generic brain. 

Threading the Plato’s/Darwin’s problem needle suggests a twofold strategy: (i) Simplify GB by unifying the various FL internal modules and (ii) Show that this simplified FL can be distilled into largely general cognitive and/or computational parts plus (at most) one linguistically specific one.[3]

Before illustrating how this might be managed, note that GB is the target of explanation. In other words, the Minimalist Program (MP) takes GB to be a good model of what FL looks like. It has largely correctly described (in extension) the innate structure of FL. However, the GB description is not fundamental. If MP is realizable, then FL is less linguistically parochial than GB supposes. If MP is realizable, then FL exploits many generic operations and principles (i.e. operations and principles not domain restricted to language) in its linguistic computations. Yet MP takes GB’s answer to Plato’s Problem to be largely correct, though it disagrees with GB about how domain specific the innate architecture of FL is. More concretely, MP agrees with GB that Binding Theory provides a concise description of certain grammatical laws that accurately reflect the structure of FL. But, though GB’s BT accurately describes these laws/effects and correctly distinguishes the data that reflects the innate features of FL from what is acquired, the actual fundamental principles of binding are different from those identified by GB (though the GB principles are roughly derivable from the less domain specific ones that characterize the structure of FL). Borrowing (perhaps grandiosely) terminology common in physics, MP takes GB to be a good effective theory of FL but denies that it is the fundamental theory of FL. A useful practical consequence of this is to take the principles of GB to be targets for derivation by the more fundamental principles that minimalist theories will discover.

That’s the basic idea.  Let’s illustrate with some examples. First let’s do some GB clean-up. If MP is correct, then GB must be much simpler than it appears to be. One way to simplify the model is to ask which features of the T-model are trivially true and which are making substantive claims. Chomsky (1993) argues that whereas PF and LF are obviously part of any theory of grammar, DS and SS are not (see below for why). The former two levels are unexciting features of any conceivable theory, while the latter two are empirically substantive. To make this point another way: PF and LF are conceptually motivated while DS and SS must be empirically motivated. Or, because DS and SS complicate the structure of FL, we should attribute them to FL only if the facts require it.  Note that this implies that if we can reanalyze the facts that motivate DS and SS in ways that do not require adverting to these two levels, we can simplify the T-model to its bare conceptual minimum.

So what is this minimum? PF and LF simply state that FL interfaces with the conceptual system and the sound system (recall the earlier <m,s> pairs in section 0). This must be true: after all, linguistic products obviously pair meanings and sounds (or motor articulations of some sort). So FL’s products must interact with the thought system and the sound system.  Bluntly put, no theory of grammar that failed to interact with these two cognitive systems would be worth looking at.

This is not so with DS and SS. These are FL internal levels with FL internal properties. DS is where θ-structure and syntax meet. Lexical items are put into X’-formatted phrase markers in a way that directly reflects their thematic contributions. In particular, all and only θ-positions are occupied in DS and the positions they occupy reflect the thematic contributions the expressions make. Thus, DPs understood as being the logical objects of predicates are in syntactic object positions, logical subjects in syntactic subject positions etc.

Furthermore, DS structure building operations (X’-operations, Lexical Insertion) are different in kind from the Transformations that follow (e.g. Move α) and the T-model stipulates that all DS operations apply before any Move α operation does (viz. DS operations and Move α never interleave). Thus, DS implicitly defines the positions Move α can target and it also helps distinguish the various kinds of phonetically empty categories Gs can contain (e.g. PRO versus traces left by Move α).

So, DS is hardly innocent: it has distinctive rules, which apply in a certain manner, and produces structures meeting specific structural and thematic conditions.  All of this is non-trivial and coded in a very linguistically proprietary vocabulary.  One reasonable minimalist ambition is to eliminate DS by showing that its rules can be unified with those in the rest of the grammar, that its rules freely mix with other kinds of processes (e.g. movement), and that θ-relations can be defined without the benefit of a special pre-transformational level.

This is what early minimalist work did. It showed that it was possible to unify phrase structure building rules with movement operations, that both these kinds of processes could interleave and (this is more controversial) that both structure building rules and movement operations could discharge θ-obligations. In other words, it showed that the properties that DS distinguished were not proprietary to DS when looked at in the right way and so DS did not really exist.

Let’s consider the unification of phrase structure and movement.  X’ theory took a big step in eliminating phrase structure rules by abandoning the distinction between phrase structure rules and lexical insertion operations.  The argument for doing so is that PS rules and lexical insertion processes are highly redundant. In particular, a set of lexical items with specified thematic relations determines which PS rules must apply to generate the right structures to house them. In effect, the content of the lexical items determines the relevant PS rules. By reconceiving phrase structure as the projection of lexical information, the distinction between lexical insertion and PS rules can be eliminated. X’ theory specifies how to project a given lexical item with given lexical information into a syntactic schema based on its lexical content instead of first generating the phrase structure and then filtering the inappropriate options via lexical insertion.

Minimalist theories carry this one step further. They unify the operations that build phrases and the operations that underlie movement.  The unified operation has been dubbed Merge. What is it? Merge takes two linguistic items X, Y and puts them together in the simplest imaginable way. In particular, it just puts them together: it specifies no order between them, it does not change them in any way when putting them together and it puts them together in all the ways that two things can be put together.  Concretely, it takes two linguistic items and forms them into a set. Let’s see how.

Take the items eat and bagels. ‘Merge (eat, bagels)’ is the operation that forms the set {eat, bagels}. This object, the set, can itself be merged with Noam (i.e. Merge (Noam, {eat, bagels})) to form {Noam, {eat, bagels}}. And we can apply Merge to this too (i.e. Merge (bagels, {Noam, {eat, bagels}})) to get {bagels, {Noam, {eat, bagels}}}.  This illustrates the two possible applications of Merge (X, Y): The first two instances apply to linguistic elements X and Y where neither contains the other. The last applies to X and Y where one does contain the other (e.g. Y contains X). The first kind models PS operations like complementation, the second movement operations like Topicalization.  Merge is a very simple rule, arguably (and Chomsky has argued this) the simplest possible rule that derives an unbounded number of structured objects. It is recursive and it is information preserving (as the Projection Principle was). It unifies phrase building and movement. It also models movement without the benefit of traces. Two occurrences of the same element (i.e. the two instances of bagels above) express the movement dependency. If sufficient, then, Merge is just what we need to simplify the grammar. It simplifies it by uniting phrase building and movement, and it models movement without resorting to very “grammatiky” elements like traces.[4]
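The bagel derivation can be mimicked in a few lines of Python. This is only a toy sketch under stated assumptions: lexical items are modeled as plain strings, Merge as the formation of an unordered `frozenset`, and the helper names (`merge`, `terms`, `contains`, `merge_kind`) are invented for illustration rather than taken from any established formalization.

```python
# Toy model (an assumption for illustration): lexical items are strings,
# and Merge forms an unordered, immutable two-membered set.

def merge(x, y):
    """Form the set {x, y}; neither input is changed and no order is imposed."""
    return frozenset([x, y])

def terms(obj):
    """Yield obj itself and every object contained anywhere inside it."""
    yield obj
    if isinstance(obj, frozenset):
        for member in obj:
            yield from terms(member)

def contains(whole, part):
    """True if part is properly contained in whole."""
    return part != whole and any(t == part for t in terms(whole))

def merge_kind(x, y):
    """'internal' if one input contains the other (movement), else 'external'."""
    return "internal" if contains(x, y) or contains(y, x) else "external"

vp = merge("eat", "bagels")             # {eat, bagels}
clause = merge("Noam", vp)              # {Noam, {eat, bagels}}
topicalized = merge("bagels", clause)   # {bagels, {Noam, {eat, bagels}}}

first = merge_kind("eat", "bagels")     # external: phrase building
second = merge_kind("Noam", vp)         # external: phrase building
third = merge_kind("bagels", clause)    # internal: movement
```

Because the sets are immutable, the inputs survive untouched inside the output, and the two occurrences of `"bagels"` in `topicalized` play the role that a moved element and its trace played in GB.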

Merge has other interesting properties when combined with plausible generic computational constraints.  For example, as noted, the simplest possible combination operation would leave the combining elements unchanged. Call this the No Tampering Condition (NTC). The NTC clearly echoes the GB Projection Principle in being a conservation principle: objects once created must be conserved. Thus, structure once created cannot be destroyed. Interestingly, the NTC entails some important grammatical generalizations that traces and their licensing conditions had been used to explain before. For example, it is well known that human Gs do not have lowering rules like the one depicted in (27′) (we use traces to mark whence the movement began): [5]

            (27′) [ t1 … [ … [β α1 … ] … ] ]

The structure in (27′) depicts the output of an operation that takes α and moves it down, leaving behind an unlicensed trace t1. In GB, the ECP and Principle A filtered such derivations out. However, the NTC suffices to achieve the same end without invoking traces (which, recall, MP aims to eliminate as being too “linguistiky”). How so? Lowering rules violate the NTC. In (27′), if α lowers into the structure labeled β, the constituent that is input to the operation (viz. β without α in it) will not be preserved in the output.  This serves to eliminate this possible class of movement operations without having to resort to traces and their licensing conditions (a good thing given Minimalist ambitions with regard to Darwin’s Problem).

Similarly, we can derive the fact that when movement occurs the moved expression moves to a position that c-commands the original movement site (another effect derived via the ECP and Principle A in GB). This is illustrated in (28). Recall, Movement is just merging a subpart of a phrase marker with another part. So say we want to move α and combine it with the phrase marked β. Then, unless α merges with the root, the movement will violate the NTC. Thus, if Merge obeys the NTC, movement will always be to a c-commanding position. Again a nice result, achieved without the benefit of traces.

            (28)  [β … [ … α … ] … ] → [α [β … [ … α … ] … ]]

In fact, we can go one step further: the NTC plausibly requires the elimination of traces and their replacement with “copies.” In other words, not only can we replace traces with copies, given the NTC we must do so. The reason is that defining movement as an operation that leaves behind a “trace” violates the NTC if strictly interpreted. In (28), for example, were we to replace the lower α on the right of the derivational arrow with a trace (i.e. [e]1) we would not be conserving the input to the derivation in the output. This violates the NTC. Thus, strengthening the Projection Principle to the NTC eliminates the possibility of traces and requires that we derive earlier trace theoretic results in other ways. Interestingly, as we have shown, the NTC itself already prohibits lowering and requires that movement be to a c-commanding position, two important consequences of the GB theory of trace licensing. Thus, we derive the same results in a more principled fashion. In case you were wondering, this is a very nice result.[6]
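The NTC reasoning can be checked mechanically in a toy setting. The sketch below assumes, purely for illustration, that phrases are immutable sets, that the strings `"alpha"`, `"X"`, `"Y"`, `"Z"` and `"t"` stand in for lexical items and a trace, and that "obeying the NTC" means every object that was input to an operation can still be found intact inside the output. The function name `obeys_ntc` is hypothetical.

```python
# Toy model (an assumption for illustration): strings are lexical items,
# frozensets are phrases, and "t" stands in for a GB-style trace.

def merge(x, y):
    return frozenset([x, y])

def terms(obj):
    """Yield obj itself and every object contained anywhere inside it."""
    yield obj
    if isinstance(obj, frozenset):
        for member in obj:
            yield from terms(member)

def obeys_ntc(inputs, output):
    """True if every input object survives unchanged somewhere in the output."""
    surviving = list(terms(output))
    return all(item in surviving for item in inputs)

beta = merge("X", merge("Y", "alpha"))                    # a phrase containing alpha

# Raising to the root with a copy left behind, as on the right side of (28).
raised = merge("alpha", beta)
# The same movement, but with the lower copy replaced by a trace.
with_trace = merge("alpha", merge("X", merge("Y", "t")))

# Lowering: alpha starts above a phrase {X, Y} and is moved down into it.
small = merge("X", "Y")
lowered = merge("t", merge("X", merge("Y", "alpha")))

# Movement to a non-root position: alpha re-merges with beta inside {Z, beta}.
root = merge("Z", beta)
non_root = merge("Z", merge("alpha", beta))

ok_raised = obeys_ntc(["alpha", beta], raised)            # True
ok_trace = obeys_ntc(["alpha", beta], with_trace)         # False: beta was altered
ok_lowered = obeys_ntc(["alpha", small], lowered)         # False: {X, Y} was altered
ok_non_root = obeys_ntc(["alpha", root], non_root)        # False: the root was altered
```

Only merger at the root with a full copy left in place passes the check, which is the pattern the argument in the text relies on.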

In sum, not only can we unify movement and phrase building given a very simple rule like Merge, but arguably the computationally simplest version of it (viz. one that obeys the NTC, a very generic (non language specific) cognitive principle regarding computations) will also derive some of the basic features of movement that traces and their specific licensing conditions accounted for in GB.  In other words, we get many of the benefits of GB movement theory without its language specific apparatus (viz. traces).

We can go further still. In the best of all possible worlds, Merge should be the sole linguistically specific operation in the FL.  That means that the GB relations that the various different modules exploited should all resolve to a single Merge style dependency. In practice, this means that all non-local dependencies should resolve to movement dependencies. For example, binding, control, and case assignment should all be movement dependencies rather than licensed under the more parochial conditions GB assumed.  Once again, we want the GB data to fall out without the linguistiky GB apparatus. 

So, can the modules be unified as expressions of various kinds of movement dependencies? The outlook is promising. Let’s consider a couple of illustrative examples to help fix ideas. Recall that the idea is that dependencies that in GB are not movement dependencies are now treated as products of movement (which, recall, can be unified with phrase structure under a common operation, Merge).  So, as a baseline, let’s consider a standard case of subject to subject raising (A-movement). The contrast in (29) illustrates the well-known fact that raising out of the subject position of a non-finite clause is possible, while raising out of the subject position of a finite clause is not.

(29)     a. John1 was believed t1 to be tall
            b. *John1 was believed t1 is tall

Now observe that case marking patterns identically in the same contexts (cf. (30)).  These are ECM structures, wherein the embedded subject him in (30a) is case licensed by the higher verb believe.[7] On the assumption that believe can case license him iff they form a constituent, we can explain the data in (30) on the model of (29) by assuming that the relevant structure for case licensing is that in (31). Note that where t1 is acceptable in (29) it is also acceptable in (31) and where not, not. In other words, we can unify the two cases as instances of movement.

(30)     a. John believes him to be tall
            b. *John believes him is tall
(31)     a. John [him1 believes [t1 to be tall]]
            b. *John [him1 believes [t1 is tall]]

The same approach will serve to unify Control and Reflexivization with movement. (32) illustrates the parallels between Raising and Control. If we assume that PRO is actually the residue of movement (i.e. the grammatical structure of (32c,d) is actually (32e,f)), we can unify the two cases.[8] Note the structural parallels between (29b), (31b), (32b) and (32f).
(32)     a. John1 seems t1 to like Mary (Raising)
            b. *John1 seems t1 will like Mary
            c. John1 expects PRO1 to like Mary (Control)
            d. *John1 expects PRO1 will like Mary
            e. John1 expects [t1 to like Mary]
            f. *John1 expects [t1 will like Mary]

The same analysis extends to explain the Reflexivization data in (33) on the assumption that reflexives are the morphological residues of movement.

(33)     a. John1 expects himself1 to win
            b. John1 expects t1 to win
            c. *John1 expects (t1=)himself1 will win
            d. *John1 expects t1 will win

These are just illustrations, not full analyses. However, we hope that they serve to motivate as plausible a project aiming to unify phenomena that GB treated as different.

There are various other benefits of unification. Here’s one more: an explanation of the c-command condition on antecedent-anaphor licensing that is part of the GB BT. Recall that, in a Merge based theory, the requirement that a moved expression c-command its launch site is a simple consequence of the NTC. Thus, if Reflexivization is an instance of movement, then the c-command condition packed into BT follows trivially. There are other nice consequences as well, but here is not the place to go into them. Remember, this is a shortish Whig History!
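To make the c-command point concrete, here is a small sketch in the same toy style. It assumes a set-based definition of c-command (an occurrence of x c-commands y if a sister of x is y or contains y), and it uses a deliberately crude placeholder structure for "John expects himself to win" that ignores reflexive morphology, tense and labels. The function name `c_commands` and the structure are illustrative assumptions, not an analysis.

```python
# Toy model (an assumption for illustration): strings are lexical items,
# frozensets are phrases built by Merge.

def merge(x, y):
    return frozenset([x, y])

def terms(obj):
    """Yield obj itself and every object contained anywhere inside it."""
    yield obj
    if isinstance(obj, frozenset):
        for member in obj:
            yield from terms(member)

def c_commands(root, x, y):
    """True if some occurrence of x in root has a sister that is y or contains y."""
    for node in terms(root):
        if isinstance(node, frozenset) and x in node:
            for sister in node - {x}:
                if y in terms(sister):
                    return True
    return False

# Placeholder structure: {expects, {John, {to, win}}}
embedded = merge("expects", merge("John", merge("to", "win")))
# "Movement" of John: re-merge it with the root, leaving the lower copy in place.
moved = merge("John", embedded)

before = c_commands(embedded, "John", "John")   # False: only one occurrence
after = c_commands(moved, "John", "John")       # True: higher copy c-commands lower
unrelated = c_commands(moved, "win", "expects") # False
```

Since the NTC forces the re-merged copy to attach at the root, its sister is the whole prior structure, so the higher copy cannot fail to c-command the lower one.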

Before summing things up, note two features of this Minimalist line of inquiry. First, it takes the classical effects and laws very seriously. MP approaches build on prior GG results. Second, it extends the theoretical impulses that drove GB research. The NTC bears more than a passing family resemblance to the Projection Principle. The radical simplification of Merge continues the process started with Move α. The unification of movement, case, reflexivization and control echoes the unification of movement and binding in GB. The replacement of traces with copies continues the process of eliminating the cognitive parochialism of grammatical processes that the elimination of constructions by GB began, as does the simplification of the T-model by the removal of D-structure and S-structure as “special” grammatical levels (which, incidentally, is a necessary step in the unification of the four phenomena above in terms of movement). So the simplification of rules and derivations and the unification of the various dependencies are well-established themes within GG research, ones that Minimalism is simply extending further. Modern syntax sits squarely on the empirical and theoretical results of earlier GG research. There is no radical discontinuity, though there is, one would hope, empirical and theoretical progress.

4. Conclusion.

As we noted at the outset, one mark of a successful science is that it is both empirically and theoretically cumulative. Even “revolutions” in the sciences tend to be conservative in the sense that new theories are (in part) evaluated by how they explain results from prior theory.  Einstein did not discredit Newton. He showed that Newton’s results were a special case of a more general understanding of gravity. Quantum mechanics did not overturn classical mechanics but showed that the latter is a special case of the former (when lots of stuff interacts).  This is the mark of a mature discipline. Its earlier discoveries serve as boundary conditions for developing novelties.

Moreover, this is generally true in several ways.  First, a successful field generally has a budget of “effects” that serve as targets of theoretical explication. Effects are robust generalizations of (often) manufactured data. By “manufactured” we mean not generally found in the wild but the result of careful and deliberate construction. In physics there are many, many of these. Generative Grammar has a nice number of these as well (as we reviewed in section 2).  A nice feature of effects is that they are relatively immune to shifts in theory. SCO effects, Complex NP effects, CED effects, Fixed Subject Condition effects, Weak and Strong Crossover effects etc. would be robust phenomena even were there no good theory to explain why they have the properties they do.[9] This is why effects are good tests for theories.

Groups of effects, aka “laws,” are more theory dependent than effects but still useful targets of theoretical explanation. Examples of these in GG are Island conditions (which unify a whole variety of distinct island effects), Binding conditions, Minimality effects, Locality effects, etc.  As noted, these are more general versions of the simpler effects noted above and their existence relies on theoretical unification.  Unlike the simpler effects that compose them, laws are more liable to reinterpretation as theory progresses, for they rely on more theory for their articulation. However, and this is important, a sign of scientific progress is that these laws are also generally conserved in later theoretical developments.  There may be some tidying up at the edges, but by and large treating Binding as a unified phenomenon applying to anaphoric dependencies in general has survived the theoretical shifts from the Standard theory to GB to MP.  So too with the general observations concerning how movement operations function (viz. cyclically, no lowering, to c-commanding positions).  Good theories, then, conserve prior effects and tend to conserve prior laws.  Indeed, successful novel theories tend to treat prior theoretical results and laws as limit cases in the new schema.  As our WH above hopefully illustrates, this is also a characteristic of GG research over the last 60 years.

Last but not least, novel theories tend to conserve the themes that motivated earlier inquiry. GB is a very nice theory, which explains a lot.  The shift to Minimalist accounts, we have argued, extends the style of explanation that GB initiated. The focus on simplification of rules and derivations and the ambition to unify what appear to be disparate phenomena is not an MP novelty. What is novel (perhaps) is the scope of the ambition and the readiness to reanalyze grammar specific constructs (traces, DS, SS, the distinction between phrase structure and movement, etc.) in more general terms.  But, as we have shown, this impulse is not novel.  And, more important still, the ambition has been made possible by the empirical and theoretical advances that GB consolidated.  This is what happens in successful inquiry: the results of prior work provide a new set of problems that novel theory aims to explain without losing the insights of prior theory.

As we’ve argued, GG research has been both very successful and appropriately conservative.  Looked at in the right way (our WH!), we are where we are because we have been able to extend and build on earlier results.  We are making haste slowly and deliberately, just as a fruitful scientific program should. Three cheers for Generative Grammar!!!

[1] The prettiest possible theory, one that Chomsky advanced in early GB, failed to hold empirically. The first idea was, effectively, to treat all traces as anaphoric. Though this worked very well for A-traces, it proved inadequate for A’-traces, which seemed to function more like R-expressions than anaphors (or at least the “argument” A’-traces did). A virtue of assimilating A’-traces to R-expressions is that it led to an explanation of Strong Cross Over effects in terms of Principle C.  Unfortunately, it failed to explain a range of subject-object and argument-adjunct asymmetries that crystallized as the ECP. These ECP effects led to a whole new set of binding-like conditions (so-called “antecedent government”) that did not fit particularly comfortably with other parts of the theory. Indeed, the bulk of GB theory in the last part of the second epoch consisted in investigations of the ECP and various ways of trying to explain the subject/object and argument/adjunct effects.  Three important ideas came from this work. First, the domains relevant for ECP effects are largely identical to those relevant for subjacency effects. Second, ECP effects really do come in two flavors, with the subject-object cases being quite different from the argument-adjunct cases. Third, Relativized Minimality.  This was an important idea due to Rizzi, and one that fit very well with later minimalist conceptions. This said, ECP effects, especially the argument/adjunct asymmetries, have proven theoretically refractory and still remain puzzling, especially in the context of Minimalist theory.
[2] By ‘extrinsically’ we mean that the exact point in the derivation at which the conditions apply is stipulated.
[3] We distinguish cognitively general from computationally general for there are two possible sources of relief from GB specificity. Either the operations/principles are borrowed from other pre-linguistic cognitive domains or they arise as a general feature of complex computational systems as such. Chomsky has urged the possibility of the second in various places, suggesting that these general computational principles might be traced to as yet unknown physical principles.  However, for research purposes, the important issue lies with the non-linguistic specificity of the relevant operations and principles, not whether they arise as a result of general cognition or natural physical law.
[4] So understanding movement also raises a question that trace theory effectively answered by stipulation: why are some “copies” phonetically silent?  As traces were defined as phonetically empty, this is not a question that arose within GB. However, given a merge based conception it becomes important to give a non-stipulative answer to this question, and lots of interesting theoretical work has tried to answer it. This is a good example of how pursuit of deeper theory can reveal explanatory gaps which earlier accounts stipulated away rather than answered. As should be obvious, this is a very good thing.
[5] Traces are now being used for purely expository purposes. Minimalist theories eschew traces, replacing them with copies.
[6] Before we get too delighted with ourselves, we should note that there are other trace licensing effects that GB accommodated that are not currently explained in the more conceptually svelte minimalism.  So for example, there is currently no account for the argument/adjunct asymmetries that GB spent so much time and effort cataloguing and explaining.
[7] This is not quite correct, but it will serve for a Whig History.
[8] To repeat, we are using trace notation here as a simple convenience. As indicated above, traces do not exist. What actually occupy the trace positions are copies of the moved expression, as discussed above.
[9] Which is not to imply that there are none. As of writing, there are interesting attempts to account for these effects in minimalist terms, some more successful than others.