Tuesday, December 20, 2016

Talking brains

In a recent paper (here), Tecumseh Fitch (TF) and colleagues argue that monkey vocal tracts are structurally adequate for the production of human speech sounds. Why is this important? Because, as the paper puts it:

Our findings imply that the evolution of human speech capabilities required neural changes rather than modifications of vocal anatomy. Macaques have a speech-ready vocal tract but lack a speech-ready brain to control it.

This, in other words, puts another nail in the coffin of those who look to provide a continuity-thesis style of explanation of human linguistic facility based on a quantitative extension of what appears in our nearest cousins (i.e. if this is right, then Phil Lieberman was wrong). If TF is right, then all the effort expended in trying to teach primates to speak was a waste of time (which it was), and the failure was not one that could be resolved by teaching them sign (which in fact didn’t help), because the problem was neural, not vocal. IMO, the futility of this line of inquiry has been pretty obvious for a very long time, but it is always nice to have another nail in a zombie’s coffin.

The results are interesting for one other reason. They suggest that Chomsky’s assumption that externalization is a late add-on to linguistic competence is on the right track. TF provides evidence that vocalization of the kind that humans have is already in place, engineering-wise, in macaques. Their vocal tracts have the wherewithal to produce a range of vowels and consonants similar to those found in natural language. If they don’t use this to produce words and sentences (or movie reviews or poems), it is not because they lack the vocal tract structure to do so. What they lack is something else, something akin to FL. And this is precisely Chomsky’s suggestion. Whatever changed later was coupled with an already available system of externalization. This coupling of the new, biologically unique system with the old, more generally available one was bound to be messy, given that they were not made for each other. Getting the two to fit together required gerrymandering, and thus was born (that messy mongrel) morpho-phonology. TF supports this picture in broad outline.

One more point: if externalization follows the emergence of FL, then communication cannot be the causal root of FL. Clearly, whatever happened to allow FL to emerge came to take advantage of an in-place system capable of exploitation for verbal communication. But it seems that these capacities stayed fallow, language-wise, until the “miracle” that allowed FL to emerge obtained. If coupling FL with an externalization mechanism took time, then the selective pressure that kept the “miracle” from being swept away cannot have been communicative enhancement (or at least not verbal communicative enhancement). This means that the Chomsky-Jacob suggestion (here) that the emergence of FL allowed for the enhancement of thought, and that this is what endowed it with its evolutionary advantage, is also on the right track.


All in all, not a bad set of results for MP types.

Tuesday, December 13, 2016

Domesticating parameters

I have a confession to make: I am not a fan of parameters. I have come to dislike them for two reasons. First, they don’t fit well with what I take to be Minimalist Program (MP) commitments. Second, it appears that the evidence that they exist is relatively weak (see here and here for some discussion). Let me say a bit more about each point.

First, the fit with MP: as Chomsky has rightly emphasized, the more we pack into UG (the linguistically proprietary features of FL) the harder it is to solve Darwin’s Problem in the domain of language. This is a quasi-logical point, and not really debatable. So, all things being equal, we would like to minimize the linguistically specific content of FL. Parameters are poster children for this sort of linguistically specific information. So, any conception of FL that comes with an FL-specified set of ways that Gs can differ (i.e. a specification of the possible dimensions of variation) comes at an MP cost. This means that the burden of proof for postulating FL-internal parameters is heavy, and that such postulation should be resisted unless we face overwhelming evidence that we need them.[1]

This brings us to the second point: it appears that the evidence for such FL-internal parameters is weak (or so my informants tell me when I do fieldwork among variationists). The classical evidence for parameters comes from the observation that Gs differ wholesale, not just retail. What I mean by this is that surface changes come in large units. The classic example is Rizzi’s elegant proposal linking the fixed subject constraint, pro-drop and subject inversion. What made these proposals more than a little intriguing is that they reduced what looked like very diverse G phenomena to a single source that, further, appeared to be fixable on the basis of degree-0 PLD. This made circumscribing macro-variation via parameters empirically very enticing. The problem was that the proposals that linked the variation together in terms of single parameter-setting differences proved to be empirically problematic.

What was the main problem? It appears that we were able to find Gs that dissociated each of the relevant factors. So we could get absence of fixed subject condition effects without subject inversion or pro-drop. And we could find pro-drop without subject inversion. And this is puzzling if these surface differences all reflect the setting of a single parameter value.

I used to be very moved by these considerations but a recent little paper on domestication has started me rethinking whether there may not be a better argument for parameters, one that focuses less on synchronic facts about how Gs differ and more on how Gs change over time. Let me lay out what I have in mind, but first I want to take a short detour into the biology of domestication because what follows was prompted by an article on animal domestication (here).

This article illustrates the close conceptual ties between modern P&P theories and biology/genetics. This connection is old news and leaders in both fields have noted the links repeatedly over the years (think Jacob, Chomsky).  What is interesting for present purposes is how domestication has the contours of a classic parameter setting story.

It seems that Darwin was the first to note that domestication often resulted in changes not specifically selected for by the breeder (2):

Darwin noticed that, when it came to mammals, virtually all domesticated species shared a bundle of characteristics that their wild ancestors lacked. These included traits you might expect, such as tameness and increased sociability, but also a number of more surprising ones, such as smaller teeth, floppy ears, variable colour, shortened faces and limbs, curly tails, smaller brains, and extended juvenile behaviour. Darwin thought these features might have something to do with the hybridisation of different breeds or the better diet and gentler ‘conditions of living’ for tame animals – but he couldn’t explain how these processes would produce such a broad spectrum of attributes across so many different species.

So, we select for tameness and we get floppy ears. Darwin’s observation was strongly confirmed many years later by the dissident Soviet biologist Dimitri Belyaev, who domesticated silver foxes. More specifically (5):

He selected his foxes based on a single trait: tameness, which he measured by their capacity to tolerate human proximity without fear or aggression. Only 5 per cent of the tamest males and 20 per cent of the tamest females were allowed to breed.

Within a few generations, Belyaev started noticing some odd things. After six generations, the foxes began to wag their tails for their caretakers. After 10, they would lick their faces. They were starting to act like puppies. Their appearance was also changing. Their ears grew more floppy. Their tails became curly. Their fur went from silver to mottled brown. Some foxes developed a white blaze. Their snouts got shorter and their faces became broader. Their breeding season lengthened. These pet foxes could also read human intentions, through gestures and glances.

So, selecting for tameness, narrowly specified, brought in its wake tail wagging, floppy ears, etc. The reasonable conclusion from this large-scale change in traits is that they are causally linked. As the piece puts it (5):

What the Belayaev results suggest is that the manifold aspects of domestication might have a common cause in a gene or set of genes, which occur naturally in different species but tend to be selected out by adaptive and environmental pressures.

There is even a suggested mechanism: something called “neural crest cells.” But the details do not really matter. What matters is the reasoning: things that change together do so because of some common cause. In other words, common change suggests common cause. This is related to (but not identical to) the discussion about Gs above. The earlier discussion looks at whether G traits necessarily co-occur at any given time. The discussion here zeroes in on whether, when they change, they change together. These are different diagnostics. I mention this because the fact that the traits are not always found together does not imply that they would not change together.

The linguistic application of this line of reasoning is found in Tony Kroch’s diachronic work. He argued that tracking the rates of change of various G properties is a good way of identifying parameters.[2] However, what I did not appreciate when I first read this is that the fact that features change together need not imply that they must always be found together. Here’s what I mean:

Think of dogs. Domestication brings with it floppy ears. So select for approachability and you move from feral foxes with pointy ears to domesticated foxes with floppy ears. However, this does not mean that every domesticated dog will have floppy ears. No, this feature can be detached from the others (and breeders can do this while holding many other traits constant), even though, absent any attempt to detach it, the natural change will be toward floppy ears. So we can select against a natural trait even if the underlying relationship is one that links the traits together. As the quote above puts it: traits that occur naturally together can be adaptively selected out.

In the linguistic case, this suggests that even if a parameter links some properties together (so that if one changes they all will), we need not find them together in any one G. What we find at any one time will be due to a confluence of causes, some of which might obscure otherwise extant causal dependencies.

So where does this leave us? Well, I mention all of this because, though I still think that MP considerations argue against FL-internal parameters, I don’t believe that the observation that Gs can treat these properties atomically (i.e. dissociate them) is a dispositive argument against their being parametrically related. Changing together looks like a better indicator of parametric relatedness than living together.
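To make the “changing together” diagnostic a bit more concrete, here is a minimal sketch, with invented numbers, of the rate-comparison logic behind Kroch’s work (see note 2): if two surface constructions reflect a single underlying parameter, their trajectories of change should be parallel in logit space, i.e. they should show the same rate even if they start at different frequencies. Nothing below comes from Kroch’s actual data; the figures and the code are purely illustrative.

```python
import numpy as np

# Invented frequencies (not Kroch's data): the share of the innovative variant
# for two surface constructions thought to reflect one underlying parameter.
years = np.array([1400, 1450, 1500, 1550, 1600])
p_A = np.array([0.10, 0.25, 0.50, 0.75, 0.90])   # e.g. loss of V-to-I in one clause type
p_B = np.array([0.03, 0.09, 0.24, 0.50, 0.76])   # the same change in another context

def logit(p):
    return np.log(p / (1 - p))

# One parameter, one rate: in logit space the two trajectories should be
# (roughly) parallel lines -- same slope, possibly different intercepts.
slope_A, intercept_A = np.polyfit(years, logit(p_A), 1)
slope_B, intercept_B = np.polyfit(years, logit(p_B), 1)

print(f"rate A: {slope_A:.4f} per year, rate B: {slope_B:.4f} per year")
# Similar slopes are consistent with a common cause; clearly different slopes
# suggest the two constructions are changing independently.
```

The point of the sketch is just that the diagnostic lives in the slopes, not in whether the two constructions happen to co-occur in any single G at a given moment.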

Last point: common change implies common cause. But common cause need not rest on there being FL-internal parameters. Parameters are one way of causally linking seemingly disparate factors, but it is not clear that they are the only or even the best way. What made Rizzi’s story so intriguing (at least for me) is that it tied together simple changes visible in main clauses with variation in embedded clause effects that are not visible in the PLD. So one could infer from what is available in the PLD what will be true of the LD in general. These are the cases where parameter thinking really pays off, and they still seem to be pretty thin on the ground, as we might expect if indeed FL has no internal parameters.



[1] There is another use of ‘parameter’ where the term is descriptive and connotes the obvious fact that Gs differ. Nobody could (or does) object to parameters in this sense. The MP-challenging one is the usage wherein FL prescribes a (usually finite) set of options that (usually, finitely) circumscribes the number of possible Gs. Examples include the pro-drop parameter, the V-raising parameter and the head parameter.
[2] See his “Reflexes of grammar in patterns of language change.” You can get this from his website here. Here’s a nice short quote summing up the logic: “…since V to I raising in English is lost in all finite clauses with tensed main verbs and at the same rate, there must be a factor or factors which globally favor this loss” (32).


It's that time of year again

My daughter loved this time of year for its gift-giving possibilities. Why choose Hanukkah over Xmas over Kwanzaa when you can celebrate all gift-giving holidays? Ecumenism has its advantages, or so she would have had me believe (though not in these exact words). There is one other nice feature of this time of year: the opportunity to get together and celebrate, and part of the festivities is often a skit. I recently got hold of a possible script for those who wish to cheer the season in with a distinctive GG accent. The dialogue is by John Collins (here) and, after reading it, I cannot imagine a better way to ring in the seasonal joy. It hits all the high points and is even something that you can give to (or, better, do with) friends and family who ask you to explain what you do. It is a tad rich on the conceptual underpinnings of the enterprise, but it provides a fine foundation for further talk about your latest paper that you can conduct over the dinner table. So, happy holidays, and feel free to report how the performances went.

Sunday, December 11, 2016

Two more little provocative pieces

I ran across two more things of interest.

The first is this post on Gelman's blog reviewing a recent paper in PNAS suggesting that the standard stats used to interpret fMRI findings are very, very unreliable ("the general message is clear: don't trust FMRI p-values"). Gelman provides what seems to me a reasonable set of comments on all of this, including another discussion of the perverse incentives favoring statistical abuse. However, there is another issue that gets shorter shrift. It appears that even seasoned practitioners have a very hard time applying the techniques correctly (unless we make the silly assumption that most everyone using fMRI over the last 30 years is a fraud). This suggests that we ought to be very skeptical about any stats-based report about anything. What the recent replication problems indicate is that even the best labs have a weak grasp of their stats tools.
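For readers who want a feel for why nominal p-values can mislead at this scale, here is a toy simulation. It is a generic illustration of massive multiple testing, not a reconstruction of the specific cluster-inference problem the PNAS paper documents: with pure noise and no real effect anywhere, thousands of "voxels" still come out "significant" at p < 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_voxels, n_subjects = 50_000, 20   # hypothetical sizes, chosen for illustration

# Pure noise: there is no real effect at any "voxel".
data = rng.normal(size=(n_voxels, n_subjects))

# One-sample t-test per voxel against zero.
t = data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n_subjects))
p = 2 * stats.t.sf(np.abs(t), df=n_subjects - 1)

print((p < 0.05).sum())   # ~2,500 voxels look "significant" from noise alone
```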

Coming from a field which is often lectured on the primitive nature of its data collection techniques, I admit to experiencing quite a bit of very pleasant schadenfreude reading that the biggest problem in science today seems to be coming from just the techniques that my own field of interest has done without. IMO, linguistics has done very well despite eschewing statistical sophistication, or indeed statistical crudeness. Of course I know the response: the right use of these stats techniques is what linguistics needs. My reply: first show me that the techniques can be applied correctly and reliably! Right now it seems that this is far from obvious.

Indeed, it suggests a counter: maybe the right position is not to try to apply these hard-to-apply techniques correctly but to figure out how to get results that don't rely on them at all. Call this Rutherford's dictum: "If your experiment needs statistics, you ought to have done a better experiment." One of the happy facts about most of linguistics is that our experiments, informal as they are, are, generally speaking, so good as not to require stats to interpret the results. Lucky us!

The second post is an interview with Freeman Dyson. It is short and fun. He says three things that I found provocative, and I'd be interested to hear opinions on them.

The first is his observation that great teachers are ones that can "find the right problem for each student, just difficult enough but not too difficult." I think that this is indeed one important mark of a great graduate mentor, and it is not something that I myself have been very good at. It also focuses on something that we often tend to take for granted. The capacity to generate good solvable problems is as important, maybe more important, than the ability to provide solutions to said problems. Getting the problem "right" is more than half the battle, IMO, but I suspect that we tend to identify and value those who do this less than we should.

Second, Dyson rails against the PhD as an academic hurdle. He never received one and considers himself lucky never to have been required to. He thinks it antiquated, too arduous, and too intellectually disruptive.

Up to a point I agree. Certainly the classical thesis, which develops a single topic over 300 pages with extensive critical review of the literature, is better suited to fields where the book is the primary research vehicle. Many places have long since replaced the book with the "stapled dissertation," in which several research papers on possibly diverse topics constitute the thesis. This does not mean that a long-form single-topic thesis is a bad idea, only that a paper-oriented dissertation is a legit option. What the long form provides that the stapled essays don't is an opportunity to take a broader view of the discipline in which one is becoming a professional. Once one is a professional, thinking big is often (always?) frowned upon until one attains senior status. This might be the only chance many get to see forests and not just trees. That said, I'd be curious to know what my younger colleagues think, and what, if anything, could replace the PhD that would be fair and useful.

The last point Dyson raises is where originality comes from. His vote: ignorance.
First of all, it helps to be ignorant. The time when I did my best work was when I was most ignorant. Knowing too much is a great handicap.
One of the complaints that older people always have about their younger colleagues concerns how little they know. Don't they know that we've done this before? Don't they know about so-and-so's research? Don't they know anything about the field before [put in very recent date here]? At any rate, what Dyson notes is that knowing too much may well be a problem. In fact, growing knowledge, rather than the loss of energy that comes with age, may be what slows down senior scholars.

Syntacticians had an antidote for this until recently. Chomsky used to change the landscape every decade or so, unnerving past students and emboldening young'uns. When junior, you loved this. If senior, you grumped. If Dyson is right, what Chomsky did was a great service for the field, for he made it possible to be legitimately ignorant: things had changed, so being erudite was not that important. Dyson's view is that ignorance is not so much bliss as liberating, allowing one to think about issues in new ways. Is he right? I'm not sure, but then look how old I am.
 

Thursday, December 8, 2016

Some things to read

Angel Galego passed along these links (here and here) to a recent Q&A that Chomsky had in Spain. The questions were fun and the answers illuminating. The discussion ranges over a wide variety of topics, including the ripeness of conjunctive work in linguistics (acquisition, processing, neuro), what Chomsky thinks of corpus work, deep learning and most statistical work currently done on language, phases and Spell Out, acceptability and grammaticality, and many more issues. I enjoyed them, and you might too.

Here is a second link (here). It's to a piece I wrote with Nathan Robinson on the recent Wolfe and Knight nonsense. It goes over old ground, but it is in a more accessible place.

Friday, December 2, 2016

What's a minimalist analysis

The proliferation of handbooks on linguistics points to a gap in the field. There are so many now that there is an obvious need for a handbook of handbooks, consisting of papers that summarize the various handbook summaries. And once we take this first tiny recursive step, as you all know, the sky’s the limit.

You may be wondering why this thought crossed my mind. Well, it’s because I’ve been reading some handbook papers recently and many of those that take a historical trajectory through the material often have a penultimate section (before a rousing summary conclusion) with the latest minimalist take on the relevant subject matter. So, we go through the Standard Theory version of X, the Extended Standard Theory version, the GB version and finally an early minimalist and late minimalist version of X. This has naturally led me to think about the following question: what makes an analysis minimalist? When is an analysis minimalist and when not? And why should one care?

Before starting let me immediately caveat this. Being true is the greatest virtue an analysis can have. And being minimalist does not imply that an analysis is true. So not being minimalist is not in itself necessarily a criticism of any given proposal. Or at least not a decisive one. However, it is, IMO, a legit question to ask of a given proposal whether and how it is minimalist. Why? Well because I believe that Darwin’s Problem (and the simplicity metrics it favors) is well-posed (albeit fuzzy in places) and therefore that proposals dressed in assumptions that successfully address it gain empirical credibility. So, being minimalist is a virtue and suggestive of truth, even if not its guarantor.[1]

Perhaps I should add that I don’t think that anything guarantees truth in the empirical sciences and that I also tend to think that truth is the kind of virtue that one only gains slantwise. What I mean by this is that it is the kind of goal one attains indirectly rather than head on. True accounts are ones that economically cover reasonable data in interesting ways, shed light on fundamental questions and open up new avenues for further research.[2] If a story does all of that pretty well then we conclude it is true (or well on its way to it). In this way truth is to theory what happiness is to life plans. If you aim for it directly, you are unlikely to get it. Sort of like trying to fall asleep. As insomniacs will tell you, that doesn’t work.

That out of the way, what are the signs of a minimalist analysis (MA)? We can identify various grades of minimalist commitment.

The shallowest is technological minimalism. On this conception an MA is minimalist because it expresses its findings in terms of ‘I-merge’ rather than ‘move,’ ‘phases’ rather than ‘bounding nodes’/‘barriers,’ or ‘Agree’ rather than ‘binding.’ There is nothing wrong with this. But depending on the details there need not be much that is distinctively minimalist here. So, for example, there are versions of phase theory (so far as I can tell, most versions) that are isomorphic to previous GB theories of subjacency, modulo the addition of v as a bounding node (though see Barriers). The second version of the PIC (i.e. where Spell Out is delayed to the next phase) is virtually identical to 1-subjacency and the number of available phase edges is identical to the specification of “escape hatches.”

Similarly for many Agree-based theories of anaphora and/or control. In place of local coindexing, we express the identical dependency in terms of Agree in probe/goal configurations (antecedents as probes, anaphors as goals)[3] subject to some conception of locality. There are differences, of course, but largely the analyses inter-translate, and the novel nomenclature serves to mask the proposed account’s continuity with prior analyses. In other words, what makes such analyses minimalist is less a grounding in basic features of the minimalist program than a technical isomorphism between current and earlier technology. Or, to put this another way, when successful, such stories tell us that our earlier GB accounts were no less minimalist than our contemporary ones. Or, to put this yet another way, our current understanding is no less adequate than our earlier understanding (i.e. we’ve lost nothing by going minimalist). This is nice to know, but given that we thought that GB left Darwin’s Problem (DP) relatively intact (this being the main original motivation for going minimalist, i.e. beyond explanatory adequacy), analyses that are effectively the same as earlier GB analyses likely leave DP in the same opaque state. Does this mean that translating earlier proposals into current idiom is useless? No. But such translations often make a modest contribution to the program as a whole, given the suppleness of current technology.

There is a second more interesting kind of MA. It starts from one of the main research projects that minimalism motivates. Let’s call this “reductive” or “unificational minimalism” (UM). Here’s what I mean.

The minimalist program (MP) starts from the observation that FL is a fairly recent cognitive novelty and thus what is linguistically proprietary is likely to be quite meager. This suggests that most of FL is cognitively or computationally general, with only a small linguistically specific residue. This suggests a research program given a GB backdrop (see here for discussion). Take the GB theory of FL/UG to provide a decent effective theory (i.e. descriptively pretty good but not fundamental) and try to find a more fundamental one that has these GB principles as consequences.[4] This conception provides a two pronged research program: (i) eliminate the internal modularity of GB (i.e. show that the various GB modules are all instances of the same principles and operations (see here)) and (ii) show that of the operations and principles that are required to effect the unification in (i), all save one are cognitively and/or computationally generic. If we can successfully realize this research project then we have a potential answer to DP: FL arose with the adventitious addition of the linguistically proprietary operation/principle to the cognitive/computational apparatus the species antecedently had.

Those are the main contours of the research program. UM concentrates on (i) and aims to reduce the different principles and operations within FL to the absolute minimum. It does this by proposing to unify domains that appear disparate on the surface and by reducing G options to an absolute minimum.[5] A reasonable heuristic for this kind of MA is the idea that Gs never do things in more than one way (e.g. there are not two ways (viz. via matching or raising) to form relative clauses). This is not to deny that different surface patterns obtain, only that they are not the products of distinct operations.

Let me put this another way: UM takes the GB disavowal of constructions to the limit. GB eschewed constructions in that it eliminated rules like Relativization and Topicalization, seeing both as instances of movement. However, it did not fully eliminate constructions, for it proposed very different basic operations for (apparently) different kinds of dependencies. Thus, GB distinguishes movement from construal, binding from control, and case assignment from theta checking. In fact, each of the modules is defined in terms of proprietary primitives, operations and constraints. This is to treat the modules as constructions. One way of understanding UM is that it is radically anti-constructivist and recognizes that all G dependencies are effected in the same way. There is, grammatically speaking, only ever one road to Rome.

Some of the central results of MP are of this ilk. So, for example, Chomsky’s conception of Merge unifies phrase structure theory and movement theory. The theory of case assignment in the Black Book unifies case theory and movement theory (case assignment being just a specific reflex of movement) in much the way that move alpha unifies question formation, relativization, topicalization, etc. The movement theory of control and binding unifies both modules with movement. The overall picture, then, is one in which binding, structure building, case licensing, movement, and control “reduce” to a single computational basis. There aren’t movement rules versus phrase structure rules versus binding rules versus control rules versus case assignment rules. Rather, these are all different reflexes of a single Merge-effected dependency, with different features being licensed via the same operation. It is the logic of On wh movement writ large.

There are other examples of the same “less is more” logic: the elimination of D-structure and S-structure in the Black Book, Sportiche’s recent proposals to unify promotion and matching analyses of relativization, the unification of reconstruction and movement via the copy theory of movement (in turn based on a set-theoretic conception of Merge), Nunes’s theory of parasitic gaps, and Sportiche’s proposed elimination of late merger, to name five. All of these are MAs in the specific sense that they aim to show that rich empirical coverage is compatible with a reduced inventory of basic operations and principles, and that the architecture of FL as envisioned in GB can be simplified and unified, thereby advancing the idea that a (one!) small change to the cognitive economy of our ancestors could have led to the emergence of an FL like the one that we have good (GB) evidence to think is ours. Thus, MAs of the UM variety clearly provide potential answers to the core minimalist DP question and hence deserve their ‘minimalist’ modifier.

The minimalist ambitions can be greater still. MAs have two related yet distinct goals. The first is to show that svelter Gs do no worse than the more complex ones that they replace (or at least don’t do much worse).[6] The second is to show that they do better. Chomsky contrasted these in chapter three of the Black Book and provided examples illustrating how doing more with less might be possible. I would like to mention a few by way of illustration, after a brief running start.

Chomsky made two methodological observations. First, if a svelter account does (nearly) as well empirically as a grosser one, then it “wins” given MP desiderata. We noted why this was so above regarding DP, but really nobody considers Chomsky’s scoring controversial, given that it is a lead-footed application of Ockham. Fewer assumptions are always better than more, for the simple reason that, for a given empirical payoff K, an explanation based on N assumptions leaves each assumption with greater empirical justification than one based on N+1 assumptions. Of course, things are hardly ever this clean, but often they are clean enough, and the principle is not really contestable.[7]
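For the record, here is the bookkeeping behind that last sentence made explicit (my gloss, not anything in the Black Book): if the payoff K is spread evenly over the assumptions that earn it, then

\[
\frac{K}{N} \;>\; \frac{K}{N+1} \qquad \text{for any } K > 0,
\]

so each of N assumptions carries more empirical support than each of N+1 would.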

However, Chomsky’s point extends this reasoning beyond simple assumption counting. For MP it’s not only the number of assumptions that matters but their pedigree. Here’s what I mean. Let’s distinguish FL from UG. Let ‘FL’ designate whatever allows the LAD to acquire a particular G for a language L (a G_L) based on the corresponding PLD (PLD_L). Let ‘UG’ designate those features of FL that are linguistically proprietary (i.e. not reflexes of more generic cognitive or computational operations). An MA aims to reduce the UG part of FL. In the best case, it contains a single linguistically specific novelty.[8] So, it is not just a matter of counting assumptions. Rather, what matters is counting UG (i.e. linguistically proprietary) assumptions. We prefer those FLs with minimal UGs and minimal language-specific assumptions.

An example of this is Chomsky’s argument against D-structure and S-structure as internal levels. Chomsky does not deny that Gs interface with interpretive interfaces; rather, he objects to treating these as having linguistically special properties.[9] Of course, Gs interface with sound and meaning. That’s obvious (i.e. “conceptually necessary”). But this assumption does not imply that there need be anything linguistically special about the G levels that do the interfacing beyond the fact that they must be readable by these interfaces. So, any assumption that goes beyond this (e.g. the theta criterion) needs defending, because it requires encumbering FL with UG strictures that specify the extras required.

All of this is old hat, and, IMO, perfectly straightforward and reasonable. But it points to another kind of MA: one that does not reduce the number of assumptions required for a particular analysis, but that reapportions the assumptions between UGish ones and generic cognitive-computational ones. Again, Chomsky’s discussions in chapter 3 of the Black Book provide nice examples of this kind of reasoning, as does the computational motivation for phases and Spell Out.

Let me add one more (and this will involve some self-referentiality). One argument against PRO-based conceptions of (obligatory) control is that they require a linguistically “special” account of the properties of PRO. After all, to get the trains to run on time, PRO must be packed with features which force it to be subject to the G constraints it is subject to (PRO needs to be locally minimally bound, occurs largely in non-finite subject positions, and has very distinctive interpretive properties). In other words, PRO is a G-internal formative with special G-sensitive features (often of the possibly unspecified phi-variety) that force it into G relations. Thus, it is MP problematic.[10] A proposal that eschews PRO is therefore prima facie an MA story of control, for it dispenses with the requirement that there exist a G-internal formative with linguistically specific requirements.[11] I would like to add, precisely because I have had skin in this game, that this does not imply that PRO-less accounts of control are correct or even superior to PRO-based conceptions. No! But it does mean that eschewing PRO has minimalist advantages over accounts that adopt PRO, as it minimizes the UG aspects of FL when it comes to control.

Ok, enough self-promotion. Back to the main point. The point is not merely to count assumptions but to minimize UGish ones. In this sense, MAs aim to satisfy Darwin more than Ockham. A good MA minimizes UG assumptions and does (about) as well empirically as more UG-encumbered alternatives. A good sign that a paper is providing an MA of this sort is a manifest concern to minimize the UG nature of the principles assumed.

Let’s now turn to (and end with) the last and most ambitious kind of MA: one that not merely does (almost) as well as more UG-encumbered accounts, but does better. How can one do better? Recall that we should expect MAs to be more empirically brittle than less minimalist alternatives, given that MP assumptions generally restrict an account’s descriptive apparatus.[12] So, how can a svelter account do better? It does so by having more explanatory oomph (see here). Here’s what I mean.

Again, the Black Book provides some examples.[13] Recall Chomsky’s discussion of examples like (1) with structures like (2):

(1)  John wonders how many pictures of himself Frank took
(2)  John wonders [[how many pictures of himself] Frank took [how many pictures of himself]]

The observation is that (1) has an idiomatic reading just in case Frank is the antecedent of the reflexive.[14] This can be explained if we assume that there is no D-structure level or S-structure level. Without these, binding and idiom interpretation must both be defined over the G level that is input to the CI interface. In other words, idiom interpretation and binding are computed over the same representation, and we thus expect that the requirements of each will affect the possibilities of the other.

More concretely, getting the idiomatic reading of take pictures requires using the lower copy of the wh phrase. Getting John as a potential antecedent of the reflexive requires using the higher copy. If we assume that only a single copy can be retained on the mapping to CI, this implies that if take pictures of himself is understood idiomatically, Frank is the only available local antecedent of the reflexive. The prediction relies on the assumption that idiom interpretation and binding exploit the same representation. By eliminating D-structure, the theory can no longer make D-structure the locus of idiom interpretation, and by eliminating S-structure, it cannot make that the locus of binding. Thus, by eliminating both levels, the proposal predicts a correlation between idiomaticity and reflexive antecedence.

It is important to note that a GBish theory where idioms are licensed at D-structure and reflexives are licensed at S-structure (or later) is compatible with Chomsky’s reported data, but does not predict it. The relevant data can be tracked in a theory with the two internal levels. What is missing is the prediction that they must swing together. In other words, the MP story explains what the non-MP story must stipulate. Hence, the explanatory oomph. One gets more explanation with less G internal apparatus.

There are other examples of this kind of reasoning, but not that many. One of the reasons I have always liked Nunes’s theory of parasitic gaps is that it explains why they are licensed only in overt syntax. One of the reasons that I like the Movement Theory of Control is that it explains why one finds (OC) PRO in the subject position of non-finite clauses. No stipulations necessary, no ad hoc assumptions concerning flavors of case, no simple (but honest) stipulations restricting PRO to such positions. These are minimalist in a strong sense.

Let’s end here. I have tried to identify three kinds of MAs. What makes proposals minimalist is that they either answer or serve as steps towards answering the big minimalist question: why do we have the FL we have? How did FL arise in the species? That’s the question of interest. It’s not the only question of interest, but it is an important one. Precisely because the question is interesting, it is worth identifying whether and in what respects a given proposal might be minimalist. Wouldn’t it be nice if papers in minimalist syntax regularly identified their minimalist assumptions, so that we could not only appreciate their empirical virtuosity but also evaluate their contributions to the programmatic goals?


[1] If pressed (even slightly) I might go further and admit that being minimalist is a necessary condition of being true. This follows if you agree that the minimalist characterization of DP in the domain of language is roughly accurate. If so, then true proposals will be minimalist for only such proposals will be compatible with the facts concerning the emergence of FL. That’s what I would argue, if pressed.
[2] And if this is so, then the way one arrives at truth in linguistics will plausibly go hand in hand with providing answers to fundamental problems like DP. Thus, proposals that are minimalist may thereby have a leg up on truth. But, again, I wouldn’t say this unless pressed.
[3] The Agree dependency here established is accompanied by a specific rule of interpretation whereby agreement signals co-valuation of some sort. This, btw, is not a trivial extra.
[4] This parallels the logic of On wh movement wrt islands and bounding theory. See here for discussion.
[5] Sportiche (here) describes this as eliminating extrinsic theoretical “enrichments” (i.e. theoretical additions motivated entirely by empirical demands).
[6] Note that, a priori, one expects simpler proposals to be empirically less agile than more complex ones and therefore to cover less data. Thus, if a cut-down account gets roughly the same coverage, this is a big win for the more modest proposal.
[7] Indeed, it is often hard to individuate assumptions, especially given different theoretical starting points. However (IMO surprisingly), this is often doable in practice so I won’t dwell on it here.
[8] I personally don’t believe that it can contain less, for that would make the fact that nothing does language like humans do a complete mystery. This fact strongly implies (IMO) that there is something UGishly special about FL. MP reasoning implies that this UG part is very small, though not null. I assume this here.
[9] That’s how I understand the proposal to eliminate G internal levels.
[10] It is worth noting that this is why PRO in earlier theories was not a lexical formative at all, but the residue of the operation of the grammar. This is discussed in the last chapter here if you are interested in the details.
[11] One more observation: this holds even if the proposed properties of PRO are universal, i.e. part of UG. The problem is not variability but linguistic specificity.
[12] Observe that empirical brittleness is the flip side of theoretical tightness. We want empirically brittle theories.
[13] The distinction between these two kinds of MAs is not original with me but clearly traces to the discussion in the Black Book.
[14] I report the argument. I confess that I do not personally get the judgments described. However, this does not matter for purposes of illustration of the logic.