Thursday, May 21, 2015

Manufacturing facts; the case of Subject Advantage Effects

Real science data is not natural. It is artificial. It is rarely encountered in the wild and (as Nancy Cartwright has emphasized (see here for discussion)) it standardly takes a lot of careful work to create the conditions in which the facts are observable. The idea that science proceeds by looking carefully at the natural world is deeply misleading, unless, of course, the world you inhabit happens to be CERN. I mention this because one of the hallmarks of a progressive research program is that it supports the manufacture of such novel artificial data and their bundling into large scale “effects,” artifacts which then become the targets of theoretical speculation.[1] Indeed, one measure of how far a science has gotten is the degree to which the data it concerns itself with is factitious and the number of well-established effects it has managed to manufacture. Actually, I am tempted to go further: as a general rule only very immature scientific endeavors are based on naturally available/occurring facts.[2]

Why do I mention this? Well, first, by this measure, Generative Grammar (GG) has been a raging success. I have repeatedly pointed to the large number of impressive effects that GG has collected over the last 60 years and the interesting theories that GGers have developed trying to explain them (e.g. here). Island and ECP effects, binding effects and WCO effects do not arise naturally in language use. They need to be constructed, and in this they are like most facts of scientific interest.

Second, one nice way to get a sense of what is happening in a nearby domain is to zero in on the effects its practitioners are addressing. Actually, more pointedly, one quick and dirty way of seeing whether some area is worth spending time on is to canvass the variety and number of different effects it has manufactured. In what follows I would like to discuss one such effect that has recently come to my attention and that holds some interest for a GGer like me.

A recent paper (here) by Jiwon Yun, Zhong Chen, Tim Hunter, John Whitman and John Hale (YCHWH) discusses an interesting processing fact concerning relative clauses (RC) that seems to hold robustly cross-linguistically. The effect is called the “Subject Advantage” (SA). What’s interesting about this effect is that it holds both in languages where the head precedes the relative clause (like English) and in languages where the head follows it (like Japanese). Why is this interesting?

Well, first, this argues against the idea that the SA simply reflects increasing memory load as a function of linear distance between gap and filler (i.e. head). Linear distance cannot be the relevant variable. It could account for SA effects in languages like English, where the head precedes the RC (thus making the subject gap closer to the head than the object gap). But in Japanese-style RCs, where the head follows the clause, the object gap is linearly closer to the head than the subject gap, which predicts an object advantage, contrary to experimental fact.

Second, and here let me quote John Hale (p.c.):

SA effects defy explanation in terms of "surprisal". The surprisal idea is that low probability words are harder, in context. But in relative clauses surprisal values from simple phrase structure grammars either predict effort on the wrong word (Hale 2001) or get it completely backwards --- an object advantage, rather than a subject advantage (Levy 2008, page 1164).

Thus, SA effects are interesting in that they appear to be stable across languages as diverse as English on the one hand and Japanese on the other, and they seem to be refractory to many of the usual processing explanations.

Furthermore, SA effects suggest that grammatical structure is important, or to put this in more provocative terms, that SA effects are structure dependent in some way. Note that this does not imply that SA effects are grammatical effects, only that G structure is implicated in their explanation. In this, SA effects are a little like Island Effects as understood (here).[3] Purely functional stories that ignore G structure (e.g. linearly dependent memory load or surprisal based on word-by-word processing difficulty) seem to be insufficient to explain these effects (see YCHWH 117-118).[4]

So how to explain the SA? YCHWH proposes an interesting idea: what makes object relatives harder than subject relatives is that they involve different amounts of “sentence medial ambiguity” (the former more than the latter), and that resolving this ambiguity takes work that is reflected in processing difficulty. Or, put more flatfootedly, finding an object gap requires getting rid of more grammatical ambiguity than finding a subject gap, and getting rid of this ambiguity requires work, which is reflected in processing difficulty. That’s the basic idea. The work is in the details that YCHWH provides. And there are a lot of them. Here are some.

YCHWH defines a notion of “Entropy Reduction” based on the weighted possible continuations available at a given point in a parse. One feature of this is that the model provides a way of specifying how much work parsing is engaged in at a particular point. This contrasts with, for example, a structural measure of memory load. As note 4 observes, such a measure could explain a subject advantage, but, as John Hale (p.c.) has pointed out to me concerning this kind of story:

This general account is thus adequate but not very precise. It leaves open, for instance, the question of where exactly greater difficulty should start to accrue during incremental processing.

That said, whether to go for the YCHWH account or the less precise structural memory load account is ultimately an empirical matter.[5] One thing that YCHWH suggests is that it should be possible to obviate the SA effect given the right kind of corpus data. Here’s what I mean.

YCHWH defines entropy reduction by (i) specifying a G for a language that defines the possible G continuations in that language and (ii) assigning probabilistic weights to these continuations. Thus, YCHWH shows how to combine Gs and probabilities of use of these. Parsing, not surprisingly, relies on the details of a particular G and the details of the corpus of usages of those G possibilities. Thus, what options a particular G allows affects how much entropy reduction a given word licenses, as do the details of the corpus that probabilize the G. This means that it is possible that the SA might disappear given the right corpus details. Or, it allows us to ask what, if any, corpus details could wipe out SA effects. This, as Tim Hunter noted (p.c.), raises two possibilities. In his words:

An interesting (I think) question that arises is: what, if any, different patterns of corpus data would wipe out the subject advantage? If the answer were 'none', then that would mean that the grammar itself (i.e. the choice of rules) was the driving force. This is almost certainly not the case. But, at the other extreme, if the answer were 'any corpus data where SRCs are less frequent than ORCs', then one would be forgiven for wondering whether the grammar was doing anything at all, i.e. wondering whether this whole grammar-plus-entropy-reduction song and dance were just a very roundabout way of saying "SRCs are easier because you hear them more often".

One of the nice features of the YCHWH discussion is that it makes it possible to approach this problem analytically. It would be nice to know what the answer is, both analytically and empirically.
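To make the quantity at issue concrete, here is a toy sketch of word-by-word Entropy Reduction in Python. To be clear, this is not YCHWH's model: in their paper the set of remaining analyses at each word falls out of intersecting a corpus-weighted Minimalist Grammar with the sentence prefix, whereas the analysis labels and weights below are simply invented for illustration. The only point is to show what gets computed and where the corpus weights enter, which is exactly the lever Hunter's question is about.

```python
import math

def entropy(weights):
    """Shannon entropy (in bits) over a weighted set of remaining analyses."""
    total = sum(weights.values())
    return -sum((w / total) * math.log2(w / total) for w in weights.values() if w > 0)

def er_profile(states):
    """Word-by-word Entropy Reduction: the drop in entropy at each step, clipped at zero."""
    hs = [entropy(s) for s in states]
    return [max(prev - cur, 0.0) for prev, cur in zip(hs, hs[1:])]

def orc_states(src_w, orc_w, other_w=1.0):
    # Invented analysis sets remaining after each word of an English object
    # relative, "the reporter who the senator attacked ...". The labels and
    # weights are hypothetical; in YCHWH they come from a weighted MG.
    return [
        {"SRC": src_w, "ORC": orc_w, "other": other_w},  # after "who"
        {"ORC": orc_w, "other": other_w},                # after "the": SRC analysis gone
        {"ORC": orc_w, "other": other_w},                # after "senator"
        {"ORC": orc_w},                                   # after "attacked": gap located
    ]

for corpus, (s, o) in {"SRC-heavy corpus": (6.0, 3.0),
                       "ORC-heavy corpus": (3.0, 6.0)}.items():
    profile = er_profile(orc_states(s, o))
    print(corpus, [round(x, 2) for x in profile])
```

Flipping the SRC/ORC weights changes the word-by-word profile, which is the sense in which corpus details could, in principle, attenuate or wipe out the predicted asymmetry; whether any realistic corpus actually does so is the analytic and empirical question raised above.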

Another nice feature of YCHWH is that it demonstrates how to probabilize MGs of the Stabler variety so that one can view parsing as a general kind of information processing problem. In such a context, difficulties in language parsing are the natural result of general information processing demands. Thus, this conception of parsing locates it in a more general framework of information processing, parsing being one specific application where the problem is to determine the possible G compatible continuations of a sentence. Note that this provides a general model of how G knowledge can get used to perform some task.
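As a deliberately crude illustration of what "probabilizing" a grammar from a corpus amounts to, the snippet below does relative-frequency estimation over toy rewrite rules. Stabler-style MGs require considerably more care (the weights attach to lexical items and derivational choices, not CFG-style rules), and the rules and counts here are invented, but the basic move of turning corpus counts into a probability distribution over grammatical options is the same.

```python
from collections import defaultdict

# Invented rule counts standing in for corpus data; an MG treatment would
# weight lexical items / derivational choices rather than phrase structure rules.
rule_counts = {
    ("NP", ("Det", "N")):       70,
    ("NP", ("NP", "RC")):       30,
    ("RC", ("who", "VP")):      60,   # schematic subject relative
    ("RC", ("who", "NP", "V")): 40,   # schematic object relative
}

# Relative-frequency estimation: normalize counts within each left-hand side.
lhs_totals = defaultdict(float)
for (lhs, _rhs), count in rule_counts.items():
    lhs_totals[lhs] += count

rule_probs = {rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()}

for (lhs, rhs), p in sorted(rule_probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}   p = {p:.2f}")
```

Once every grammatical option carries a probability of this sort, the entropy calculation sketched above can be run over whatever analyses the grammar makes available at each word.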

Interestingly, on this view, parsing does not require a parser. Why? Because parsing just is information processing when the relevant information is fixed. It’s not like we do language parsing differently than we do, say, visual scene interpretation once we fix the relevant structures being manipulated. In other words, parsing on the YCHWH view is just information processing in the domain of language (i.e. there is nothing special about language processing except the fact that it is Gish structures that are being manipulated). Or, to say this another way, though we have lots of parsing, there is no parser that does it.

YCHWH is a nice example of a happy marriage of grammar and probabilities to explain an interesting parsing effect, the SA. The latter is a discovery about the ease of parsing RCs that suggests that G structure matters and that language independent functional considerations just won’t cut it. It also shows how easy it is to combine MGs with corpora to deliver probabilistic Gs that are plausibly useful in language use. All in all, fun stuff, and very instructive.




[1] This is all well discussed by Bogen and Woodward (here).
[2] This is one reason why I find admonitions to focus on natural speech as a source of linguistic data to be bad advice in general. There may be exceptions, but as a general rule such data should be treated very gingerly.
[3] See, for example, the discussion in the paper by Sprouse, Wagers and Phillips.
[4] A measure of distance based on structure could explain the SA. For example, there are more nodes separating the object trace and the head than separating the subject trace and the head. If memory load were a function of depth of separation, that could account for the SA, at least at the whole-sentence level. However, until someone defines an incremental version of the whole-sentence structural memory load theory, it seems that only Entropy Reduction can account for the word-by-word SA effect across both English-type and Japanese-type languages.
[5] The following is based on some correspondence with Tim Hunter. Thus he is entirely responsible for whatever falsehoods creep into the discussion here.

Monday, May 18, 2015

Two things to read

1. Alex Drummond sent me this link to a nice little paper on what appears to be an old topic that still stumps physicists. The chestnut is the question of whether hot water freezes more quickly than cold. The standard answer is "you gotta be kidding," and then lots of aspersions are cast on those who think that they have proven the contrary empirically. Read this, but what's interesting is that nobody ever thought that the right answer was anything but the obvious one. However, experiments convinced many over centuries that the unintuitive view (i.e. that hot water does freeze faster) was correct. The paper reviews the history of what is now called the "Mpemba Effect," named after a high school student who had the courage of his experiments and was ridiculed for this by teachers and fellow students until bigger shots concluded that his report was not nuts. Not that it was correct, however. It turns out that the question is very complex, takes a lot of careful reasoning to state clearly, and is incredibly hard to test. It's worthwhile reading for linguists, for it gives a good taste of how complex interaction effects stymie even advanced sciences. So, following the adage that if it's tough for physics don't be surprised if it's tough for linguistics, it's good to wallow in the hardships and subtleties of a millennia-old problem.


2. Here's a recent piece on how hard it is to think cleanly in the sciences. None of it is surprising. The bottom line is that there is lots of wiggle room, even in the best sciences, for developing theories that would enhance one's standing were they true. So, there is a strong temptation to find them true, and there are lots of ways of fudging the process so that what we would like to be the case has evidence in its favor. I personally find none of this surprising or disheartening.

Two points did strike me as curious.

First, there is the suggestion that a success rate of 15% is something to worry about. Maybe it is, but what should we, a priori, believe the success rate should be? Maybe 15% is great for all we know. There is this presupposition that the scientific method (such as it is) should insulate us from publishing bad papers. But why think this? IMO, the real issue is not how many bad papers get out there but how many good ones. Maybe an 85% miss rate is required to generate the small number of good papers that drive a field forward.

Second, there is the suggestion that this is in part due to the exigencies of getting ahead in the academic game. The idea is that pressures today are such that there is lots to gain in painting rosy research pictures of ever expanding revolutionary insight. Maybe. But do we really know if things were better in more relaxed times when these sorts of pressures were less common? I don't know. It would be nice to have a diachronic investigation to see whether things have gotten worse. Personal anecdote: I once read through the proceedings of the Royal Society from the 17th and 18th centuries. It was a riot. Lots of the stuff was terrible. Of course, what survives to the present day is the gold, not the dross. So, how do we know that things have gotten worse and that the reason for this is contemporary pressures?

That's it. Science is hard. Gaining traction is difficult. Lots of useless work gets done and gets published. Contrary to scientific propaganda, there is no "method" for preventing this. Of course, we might be able to do better and we should if we can. But I for one am getting a little tired of this sky-is-falling stuff. The idea seems to be that if only we were more careful all problems could be solved. Why would anyone believe this? As the first paper outlines, even apparently simple problems are remarkably difficult, and this in areas we know a lot about.

Friday, May 15, 2015

David Adger talks to Thomas Graf about some very big issues

This David Adger post speaks for itself.

***

I’d intended this to be a response to Thomas’s comments but it got too long, and veered off in various directions.

Computational and other levels

Thomas makes the point that there’s too much work at the ‘implementational’ level, rather than at the proper Marrian computational level, and gives examples to do with overt vs covert movement, labelling etc. He makes an argument that all that stuff is known to be formally equivalent, and we essentially shouldn’t be wasting our time doing it. So ditch a lot of the work that goes on in syntax (sob!).

But I don’t think that’s right. Specification at the computational level for syntax is not answered fully by specifying the computational task as solving the problem of providing an infinite set of sound-meaning pairings; it’s solving the issue of why these pairings, and not some other thinkable set. So, almost all of that `implementation’ level work about labels or whatever is actually at the computational level. In fact, I don’t really think there is an algorithmic level for syntax in the classical Marrian sense: the computational level for syntax defines the set of pairings, and sure, that has a physical realization in terms of brain matter, but there isn’t an algorithm per se. The information in the syntax is accessed by other systems, and that probably is algorithmic in the sense that there’s a step-by-step process to transform information of one sort into another (to phonology, or thinking, or various other mental subsystems), but the syntax itself doesn’t undergo information transforming processes of this sort; it’s a static specification of legitimate structures (or derivations). I think that the fact that this isn’t appreciated sometimes within our field (and almost never beyond it) is actually a pretty big problem, perhaps connected with the hugely process oriented perspective of much cognitive psychology.

Back to the worry about the actual `implementational’ issue to do with Agree vs Move etc. I think that Thomas is right, and that some of it may be misguided, inasmuch as the different approaches under debate may have zero empirical consequences (that is, they don’t answer the question: why this pairing and not some other - derivations/representations is perhaps a paradigm case of this). In such cases the formal equivalence between grammars deploying these different devices is otiose and I agree that it would be useful to accept this for particular cases. But at least some of this ‘implementational’ work can be empirically sensitive: think of David Pesetsky’s arguments for covert phrasal as well as covert feature (=Agree) movement, or mine and Gillian’s work on using Agree vs overt movement to explain why Gaelic wh-phrases don’t reconstruct like English ones do but behave in a way that’s intermediate between bound pronouns and traces. The point here is that this is work at Marr’s computational level to try to get to what the correct computational characterization of the system is.

Here’s a concrete example. In my old paper on features in minimalism, I suggested that we should not allow feature recursion in the specification of lexical items (unlike HPSG). I still think that’s right, but not allowing it causes a bunch of empirical issues to arise: we can’t deal with tough constructions by just saying that a tough-predicate selects an XP/NP predicate, like you can in HPSG, so the structures that are legitimized (or derivations if you prefer) by such an approach are quite different from those legitimized by HPSG. On the other hand, there are a whole set of non-local selectional analyses that are available in HPSG that just aren’t in a minimalist view restricted in the way I suggested (a good thing). So the specification at the computational level about the richness of feature structure directly impacts on the possible analyses that are available. If you look at that paper, it looks very implementational, in Thomas’s sense, as it’s about whether embedding of feature structures should be specified inside lexical items or outside them in the functional sequence, but the work it’s actually doing is at the computational level and has direct empirical (or at least analytical) consequences. I think the same is true for other apparently ‘implementational’ issues, and that’s why syntacticians spend time arguing about them.

Casting the Net

Another worry about current syntax that’s raised, and this is a new worry to me so it’s very interesting, is that it’s too ‘tight’: that is, that particular proposals are overly specific, which is risky, because they’re almost always wrong, and ultimately a waste of energy. We syntacticians spend our time doing things that are just too falsifiable (tell that to Vyv Evans!). Thomas calls this net-syntax, as you try to cast a very particularly shaped net over the phenomena, and hence miss a bunch. There’s something to this, and I agree that sometimes insight can be gained by retracting a bit and proposing weaker generalizations (for example, the debate between Reinhart-style c-command for bound variable anaphora and the alternative Higginbotham/Safir/Barker-style Scope Requirement looks settled, for the moment, in the latter’s favour, and the latter is a much weaker claim). But I think that the worry misses an important point about the to and fro between descriptive/empirical work and theoretical work. You only get to have the ‘that’s weird’ moment when you have a clear set of theoretical assumptions that allow you to build on-the-fly analyses for particular empirical phenomena, but you then need a lot of work on the empirical phenomenon in question before you can figure out what the analysis of that phenomenon is such that you can know whether your computational level principles can account for it. That analytical work methodologically requires you to go down the net-syntax type lines, as you need to come up with restrictive hypotheses about particularities in order to explore the phenomenon in the first place. So specific encodings are required, at least methodologically, to make progress. I don’t disagree that you need to back off from those specific encodings, and not get too enraptured by them, but discovering high level generalisations about phenomena needs them, I think. We can only say true things when we know what the empirical lay of the land is, and the vocabulary we can say those true things in very much depends on a historical to and fro between quite specific implementations until we reach a point where the generalizations are stable. On top of this, during that period, we might actually find that the phenomena don’t fall together in the way we expected (so syntactic anaphor binding, unlike bound variable anaphora, seems to require not scope but structural c-command, at least as far as we can tell at the moment). The difference between syntax and maths, which was the model that Thomas gave, is that we don’t know in syntax where the hell we are going much of the time and what the problems are really going to be, whereas we have a pretty good idea of what the problems are in maths.

Structure and Interpretation

I’ll (almost) end on a (semi-)note of agreement. Thomas asks why we care about structure. I agree with him that structures are not important for the theoretical aspects of syntax, except as what systems generate, and I’m wholly on board with Thomas’s notion of derivational specifications and their potential lexicalizations (in fact, that was sort of the idea behind my 2010 thing on trying to encode variability in single grammars by lexicalising subsequences of functional hierarchies, but doing it via derivations as Thomas has been suggesting is even better).  I agree that if you have, for example, a feature system of any kind of complexity, you probably can’t do the real work of testing grammars by hand as the possible number of options just explodes. I see this as an important growth area for syntax: what are the relevant features, what are their interpretations, how do they interact, and my hunch is that we’ll need fairly powerful computational techniques to explore different grammars within the domains defined by different hypotheses about these questions, along the lines Thomas indicates. 

So why do we have syntax papers filled with structures? I think the reason is that, as syntacticians, we are really interested in how sign/sound relates to meaning (back to why these pairings), and unless you have a completely directly compositional system like a lexicalized categorial grammar, you need structures to effect this pairing, as interpretation needs structure to create distinctions that it can hook onto. Even if you lexicalize it all, you still have lexical structures that you need a theory of. So although syntactic structures are a function of lexical items and their possible combinations, the structure just has to go somewhere.

But we do need to get more explicit about saying how these structures are interpreted semantically and phonologically. Outside our field, the `recursion-only’ hypothesis (which was never, imo, a hypothesis that was ever proposed or one that anyone in syntax took seriously), has become a caricature that is used to beat our backs (apologies for the mixed metaphor). We need to keep emphasizing the role of the principles of the interpretation of structure by the systems of use. That means we need to talk more to people who are interested in how language is used, which leads me to …

The future’s bright, the future’s pluralistic.

On the issue of whether the future is rosy or not, I actually think it is, but it requires theoretical syntacticians to work with people who don’t automatically share our assumptions and to respect what assumptions those guys bring, and see where compatibilities or rapprochements lie, and where there are real, empirically detectable, differences. Part of the sociological problem Thomas and others have mentioned is insularity and perceived arrogance. My own feeling is that younger syntacticians are not as insular as those of my generation (how depressing – since when was my generation a generation ;-( ), so I’m actually quite sanguine about the future of our field: there’s a lot of stellar work in pure syntax, but those same people doing that work are engaging with neuroscientists, ALL people, sociolinguists, computational people etc. But it will require more work on our (i.e. we theoretical syntacticians’) part: talking to non-syntacticians and nonlinguists, overcoming the legacy of past insularity, and engaging in topics that might seem outside of our comfort zones. But there is a huge amount of potential here, not just in the more computational areas that Thomas mentioned, but also in areas that have not had as much input from generative syntax as they could have had: multilingualism, language of ageing, language shift in immigrant populations, etc. These are areas we can really contribute to, and there are many more. I agree with Thomas that we shouldn’t shirk `applied’ research: we should make it our own.


Thursday, May 14, 2015

Thomas' way too long comment on the Athens conference

This was originally meant as a comment on the vision statements for the Athens conference, but it just kept growing and growing so I decided to use my supreme contributor powers instead and post it directly to the blog (after an extended moment of hesitation because what I have to say probably won't win me any brownie points).

Since I'm presenting a poster at the conference, I've had access to the statements for a bit longer than the average FoL reader. And that's not a good thing, because they left me seriously worried about the future of the field.

Wednesday, May 13, 2015

What the invitees to the Athens conference are thinking

The organizers of the conference have set up the following web site: https://castl.uit.no/index.php/conferences/road-ahead. Among the things you can find there are short papers by the invitees answering the questions about the future of the field that the organizers asked be addressed. I posted the original announcement here, along with my own answers to the posed questions. Well, those interested can now see what the invitees are thinking, and it is a very fancy group.

Moreover, it will allow everyone to "participate" in the discussion. I would recommend reading the answers and commenting about them here. I am sure that the participants in Athens will take a look at what you are saying and try to respond. I know that I will. So, consider this an opportunity to engage in this important discussion.

I would suggest that comments be pegged to specific position papers to make it easier to follow. I'm looking forward to hearing how others react.

Tuesday, May 12, 2015

Darwin's problem; some reasonable worries

Karthik Durvasula has pointed me to a thoughtful blog post by the Confused Academic (CA) critical of using Darwin’s Problem (DP) kinds of considerations as a criterion in the evaluation of linguistic proposals (see here).[1] The main thrust of the (to repeat, very reasonable) points made is that we know very little about how evolutionary considerations apply to cognitive phenomena in general and linguistic phenomena in particular and, as such, we should not expect too much from DP kinds of considerations. Indeed, as I noted here, there are several problems with giving a detailed account of how FL evolved. Let me remind you of some of the more serious issues, as CA adverts to similar ones.

The most obvious is the remove between cognitive powers and genetic ones. In particular, for an account of the evolution of FL we need a story of how minds are incarnated in brains and how brains are coded in genes. Why? Because evolution rejiggers genomes, which in turn grow brains, which in turn secrete cognition. Sadly, every link of this chain is weak, most particularly in the domain of language (though the rest of cognition is not in much better shape, as Lewontin’s famous piece (noted by CA) emphasizes). We really don’t know much about the genetic bases of brain development, nor do we know much about how brains realize FL. So, though we do know a fair bit about the cognitive structure of FL, we don’t have any really good linking hypotheses taking us from this to the brain and the genome, which are the structures that evolution manipulates to work its magic. In other words, to really explain how FL evolved (at least in detail) we need to account for how the brain structures that embody FL arose via alterations in our ancestors’ genomes (broadly construed to include epigenetic factors), and right now, though we have decent cognitive descriptions of FL, we have no good way of linking these up to brains and genes.

Second, if Lewontin is right (and I for one found his discussion entirely persuasive) then the prospects of giving standard evo accounts of how FL evolved will be largely nugatory due to the virtual impossibility of finding the relevant evidence, e.g. there really exist no “fossil” records to exploit and there is really nothing like our linguistic facility evident in any of our primate “cousins.” This makes constructing a standard evolutionary account empirically very dicey. 

In sum, things do not look good, so there arises the very reasonable question of what good DP thinking is for the practicing linguist. That’s the question CA asks. Here’s what I think (acknowledging that the problems CA notes are serious): Despite these problems, I still think that DP thinking is useful. Let me say why.

As I’ve noted (I suspect too many times) before (e.g. here) there is a tension between Plato’s Problem (PP) and DP. The former encourages packing as much as possible into UG to make the acquisition problem easier, while the latter eschews this strategy to make the evolvability problem more tractable. Put another way: the more linguistic knowledge is given rather than acquired[2], the easier it is to account for why acquisition is as easy and effortless as it seems to be. However, the more there is packed into FL/UG the more challenging it is to explain how this knowledge could have evolved. This is the tension, and here is why I like DP: this tension is a creative one. These two problems together generate a very interesting theoretical problem: how to have one’s DP cake and PP’s too (i.e. how to allow for a solution to both PP and DP). I have suggested several strategies for how this might be accomplished that lead, IMO, to an interesting research program (e.g. here). My claim is that if you find this program attractive, then you need to take DP semi-seriously. What do I mean by “semi” here? Well, nobody expects to explain how FL actually evolved given how little we know about the relevant bridging assumptions (see above), but by thinking of the problem in tandem with PP we know the kinds of things we need to do (e.g. effectively eliminate the G internal modularity characteristic of GB style theories and show that all the apparently different linguistic dependencies outlined in the separate GB modules are effectively one and the same). That’s the first necessary step needed to reconcile DP and PP.[3]

The second, also motivated by DP, is yet more interesting (and challenging): to try and factor out those G operations, principles and primitives that are not linguistically specific. Thus, given DP we want not only a simpler (more elegant, more beautiful yadda, yadda, yadda) theory, but a particular kind of simpler etc. theory. We want one with as little linguistic specificity as possible. Maybe an example would help here.

The cyclic nature of derivations has long been a staple of GG theory. Ok, how to explain this? Earlier GG theory simply stipulated it: rules apply cyclically. The Minimalist Program (MP) has tried to offer slightly deeper motivation. There are two prominent accounts in the literature: the cycle as expression of the Extension Condition (EC) and the cycle as the expression of feature checking (Featural Cyclicity (FC)). FC is the idea that “bad” (unvalued or un-interpretable) features must be discharged quickly (e.g. in the phase or phrase that introduces them). EC says that derivations are monotonic (i.e. constituents that are inputs to a G operation must be constituents in the output of the operation).

There are various empirical linguistic-theory internal reasons that have been offered for preferring one or the other of these ideas. Both apply over a pretty general domain of cases and, IMO, it is hard to argue that either is in any relevant sense “simpler” than the other. However, IMO, the world would be a better place minimalistically were the EC the right way to conceptually ground the cycle. Why? Because it has the right generic feel to it: it looks like a very generic property of computations. In other words, IMO, it would be natural to find that cognitive rule systems in general are monotonic (information preserving), so that to find this holding of Gs would not be a surprise. FC, on the other hand, strikes me as relying on quite linguistically special assumptions about the toxicity of certain linguistic features and how quickly they need to be neutralized (some examples of very linguistically special properties: only probes contain toxic features, only phase heads have toxic features, toxic features must be very quickly eliminated, Gs contain toxic features at all). Personally, I find it hard to see FC and its special conception of features generalizing to other cognitive domains or third factor considerations. Of course I could be wrong (indeed, given my track record, I am likely wrong!). What’s nice about a DP perspective is that it encourages us to try and make this kind of evaluation (i.e. ask how generic/specific a proposed operation/principle is) and, if my reasoning is on the right track, it suggests that we should try to maintain EC as a core feature precisely because it is plausibly not linguistically specific (i.e. not tied to special properties of human Gs).[4]
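For what it’s worth, the monotonicity reading of EC is easy to state in purely generic, non-linguistic terms. Here is a minimal sketch using a toy tuple encoding of trees (my own illustration, not a proposal from the literature): an operation respects EC just in case every constituent of its inputs is still a constituent of its output, which is what root-extending Merge guarantees and what a counter-cyclic operation violates.

```python
# Toy encoding: a tree is either a word (string) or a tuple (label, left, right).

def constituents(tree):
    """Return the set of all constituents (subtrees) of a tree."""
    if isinstance(tree, str):
        return {tree}
    _label, left, right = tree
    return {tree} | constituents(left) | constituents(right)

def respects_ec(inputs, output):
    """EC as monotonicity: every constituent of every input survives in the output."""
    out = constituents(output)
    return all(constituents(t) <= out for t in inputs)

vp = ("VP", "see", "it")

# Merge that extends the root keeps all prior constituents intact:
extended = ("TP", "will", vp)
print(respects_ec(["will", vp], extended))    # True

# A counter-cyclic operation that rebuilds material inside VP destroys one:
tucked_in = ("TP", "will", ("VP", ("V'", "see", "often"), "it"))
print(respects_ec(["will", vp], tucked_in))   # False: ("VP", "see", "it") is gone
```

Nothing in the check mentions features, probes, or phases; that generic flavor is the point being made above about EC versus FC.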

You could rightly object that such considerations are hardly dispositive, and I would agree. But if the above form of reasoning is even moderately useful, it suggests that DP considerations can have some theoretical utility. So, DP points to certain kinds of accounts and encourages one to develop theories with a certain look. It encourages the unification of GB modules and the elimination of linguistically specific features of Gs. It does this by highlighting the tension between DP and PP. One might construe all of this in simplicity terms, but from where I sit, DP encourages a very specific kind of simple, elegant theory (i.e. it favors some simple theories over others) and for this alone it earns its theoretical keep.[5]

CA has one other objection to DP that I would like to briefly touch on: DP encourages reductionism and reductionism is not a great methodological stance. I have two comments.

First, I am not personally against reduction if you can get it. My problem with it is that it is very hard to come by, and I do not expect it to occur any day soon in my neck of the scientific woods. However, I do like unification and think that we should be encouraged to seek it out. As my good and great friend Elan Dresher once sagely noted: there are really only two kinds of linguistic papers. The first shows that two things that appear completely different are roughly the same. The second shows that two things that are roughly the same are in fact identical. I don’t know about linguistic papers in general, but this is a pretty good description of what a good many theory papers look like. Unification is the name of the game. So if by reduction we mean unification, then I am for it. Nothing wrong with it and a great deal that is right. In fact, there is nothing wrong with real reduction either, if you can pull it off. Sadly, it is very, very hard to do.

Second, I think that for MP to succeed we should expect lots of reduction/unification. We will need to unify the modules (as noted above) and unify certain cognitive and computational operations. If we cannot manage this, then I would conclude that the main ideas/intuitions behind MP are untenable/unworkable/sterile. So, maybe unlike CA, I welcome the reductionist/unificationist challenge. Not only is this good science, DP is betting that it is the next important step for GG.

This said, there is something bad about reduction, and maybe this is what CA was worried about. Some reductionism takes it as obvious that the reducing science is more epistemologically privileged than the reduced one. But I see no reason for thinking that the metaphysics of the case (e.g. if A reduces to B) implies that the reducing theory is more solidly grounded epistemologically than the reduced one. So the fact that A might reduce to B does not mean that B is the better theory and that A’s results must bow to B’s theoretical diktats. Like all science, reduction/unification is a late-stage affair. It is best attempted when there is a reasonable body of doctrine in the theories to be related. I suspect that CA (and I am pretty sure Karthik) thinks that this is actually the main problem with DP at this time. It is premature (and if Lewontin is right, it will remain premature for the foreseeable future) precisely because of the problems I noted above concerning the assumptions required to bridge genes and cognition. My only response to this is that I am (slightly) more optimistic. I think DP considerations, even if inchoate, have purchase, though I agree that we should proceed carefully given how little we know of the details. So, we should make haste very, very slowly.

Let me end. I think that DP has raised important questions for theoretical linguistics. Like most questions, they are as yet somewhat undefined and fuzzy. Our job is to try and make them clearer and find ways of making them empirically viable. I believe that MP has succeeded in this to a degree. This said, CA (and Karthik) are right to point out the problems. Needless to say (or as I have said), I remain convinced that DPish considerations can and should play a role in how we develop theory moving forward, for if nothing else (and IMO this is enough) it helps imbue the all too vague notions of elegance and simplicity with some local linguistic content. In other words, DP clarifies what kinds of elegant and simple theories of FL and UG we should be aiming for.





[1] Importantly, CA is not anti-minimalist and believes that general criteria like elegance and simplicity (suitably contextualized for linguistics) can play a useful role. It’s DP that bothers CA, not theoretical desiderata on syntactic theories.
[2] Indeed, the more linguistically specific this knowledge is, the easier it is to explain the facility of acquisition despite the limitations in the PLD for G construction.
[3] I should be careful here: it is conceptually possible that one small genetic change creates a brain that looks GBish. Recall, we really don’t know how a fold here and there in a brain unlocks cognition. But, if we assume that the small thing that happened genetically to change our brains resulted in a simple new cognitive operation being added to our previous inventory, we are home free. This seems like a reasonable assumption, though it might be wrong. Right now, it is the simplest least convoluted assumption. So it is a good place to start.
[4] To say what need not be said: none of this implies that EC is right and FC wrong. It means that EC has more MPish value than FC does if you share my views. So, if we want to pursue an MPish line of inquiry, then EC is a very good way to go. Or, you should demand lots of good empirical reasons for rejecting it. DP, like PP, when working well, conceptually orders hypotheses making some more desirable than others all things being equal.
[5] These considerations were prompted by a very fruitful e-mail exchange with Karthik. He pointed out that notions like simplicity etc., in order to be useful, need to be crafted to apply to a domain of inquiry. The above amounts to suggesting that DP helps in isolating the right kind of simplicity considerations.