Tuesday, January 15, 2019

Movement, islands and the ECP

Some papers reset the research agenda. This one by Lu and Yoshida (L&Y), I believe, is one of those (here is a slide conveying the basic point; the paper is under submission at LI and I assume it will be accepted and rapidly published (if not, this will tell us more about LI than it will about the quality of this paper)). The topic is the island status of Wh-in-situ (WIS) constructions in Chinese. The finding is that judgment studies of the Sprouse experimental syntax (ES) variety provide evidence for two stunning conclusions: (i) that WISs respect islands and (ii) that there is no evidence for an argument/adjunct distinction wrt WISs. Both data points are theoretically pregnant and this post will largely concentrate on drawing out some of the implications. Many of these are mentioned in the paper (yup, I have a draft), so are not original with me. Let’s start.

L&Y is motivated by the premise that ES provides a useful tool for refining linguistic judgments. The idea, as Sprouse has convincingly argued, is that grammatical complexity should induce a super-additivity effect in well-constructed judgment experiments (see, e.g. here and here for discussion and here for a nice review of the methodology). Importantly, super-additivity profiles arise in cases where less involved rating studies find nothing indicating un-grammaticality.
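For readers who have not met super-additivity before, here is a minimal sketch of how the effect is usually quantified, assuming the common 2x2 factorial design (structure: non-island vs island; dependency: short vs long) and a differences-in-differences (DD) score. The function name, condition labels and ratings below are invented for illustration and are not taken from L&Y.

```python
from statistics import mean

def dd_score(ratings):
    """Differences-in-differences score for a 2x2 island design.

    `ratings` maps (structure, dependency) pairs to lists of z-scored
    acceptability ratings. The labels are hypothetical placeholders.
    """
    m = {cond: mean(vals) for cond, vals in ratings.items()}
    # Cost of a long dependency when no island structure is present.
    length_cost_plain = m[("non-island", "short")] - m[("non-island", "long")]
    # Cost of a long dependency when it crosses an island structure.
    length_cost_island = m[("island", "short")] - m[("island", "long")]
    # Super-additivity: the island case costs more than the two factors
    # would predict if they simply added up (DD clearly above zero).
    return length_cost_island - length_cost_plain

# Invented ratings showing a super-additive profile.
superadditive = {
    ("non-island", "short"): [1.0, 0.8],
    ("non-island", "long"): [0.6, 0.4],
    ("island", "short"): [0.7, 0.5],
    ("island", "long"): [-0.9, -0.7],
}

# Invented ratings where the two costs merely add up.
additive = {
    ("non-island", "short"): [1.0, 0.8],
    ("non-island", "long"): [0.6, 0.4],
    ("island", "short"): [0.7, 0.5],
    ("island", "long"): [0.3, 0.1],
}

print(round(dd_score(superadditive), 3))
print(round(dd_score(additive), 3))
```

A real study would of course test the interaction statistically across participants and items rather than eyeball one number; the point of the sketch is only that the island effect is defined as an interaction, not as a low rating for one sentence type.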

Before pressing on, let’s make an important and obvious point: all GGers distinguish (or should distinguish) acceptability from grammaticality. Acceptability is a probe for grammaticality. Acceptability is an observable property of utterances. Grammaticality is an abstract property of I-linguistic mental representations. Grammaticality is inferred from acceptability under the right conditions (given the right controls as realized by the appropriate minimal pairs). All of this is old hat, but a still very stylish and durable hat. 

Happily, for most of what we have done in GG, acceptability closely tracks grammaticality, but we also know that the two notions can and do diverge (see here for some discussion). ES is particularly useful for cases where this happens and the simple judgment elicitation procedure (e.g. ask a native speaker) indicates all is well. Diogo Almeida has dubbed cases of the latter “subliminal.” One of ES’s important contributions to syntax has been the discovery of such subliminal effects (SE), SEs being cases where ES procedures reveal super-additivity effects while more regular elicitation suggests grammaticality. So, for example, we now have many examples where standard elicitation has indicated that a certain dependency in a certain language shows no island sensitivity (i.e. the sentences are judged (highly) acceptable) while ES techniques indicate sensitivity to these same island effects (i.e. the relevant data display super-additivity effects).

We also find the converse: standard techniques indicating a profound difference in acceptability, while ES techniques show nothing at all.[1] All in all then, ES has provided a useful additional kind of data, one that is often more sensitive to G structure than the quick and dirty (and largely accurate and hence very useful) standard judgment techniques, which sometimes fail to track these.

So, back to the main point: L&Y is an ES study of WISs in Chinese and it has two important findings: that all WISs in Chinese exhibit relative clause island effects (henceforth RCI) (i.e. they all display the super-additivity profile) and that there is no ES evidence that long “why” movement from an RCI is appreciably worse than long “why” movement absent an RCI (i.e. these cases when contrasted do not show a super-additivity profile). The first result argues that WISs are island sensitive and the second argues that there is no additional ECP effect distinguishing WISs like who/what from WISs like why. If correct, this is very big news, and, IMO, very welcome news. Let me say why.

First, as L&Y emphasizes, this result rules out most of the standard approaches to WIS constructions. In particular the result rules out two kinds of theories: (i) accounts that distinguish between overt movement vs covert movement (e.g. Huang’s) and treat island effects as effectively reflexes of overt movement (say, via a chain condition at SS) and (ii) theories that postulate two different kinds of operations (Movement vs Binding) to license WISs with movement subject to islands and binding exempt from them (as in, say, a Rizzi-Cinque approach to ECP effects). Both such kinds of theories will have problems with the apparent fact that WISs induce super-additivity effects.

It is worth noting, furthermore, that the sensitivity of WISs to islands is not the only example of apparent non-movement generated structures being island compliant. The same holds wrt resumptive pronoun (RP) constructions. These also appear to respect islands despite the absence of the main hallmark of movement (i.e. a gap in the “movement” site).[2] Both this RP data and now the WIS data point to the same conclusion: that island effects are not PF effects.[3] From my reading of the literature, treating islands as PF effects is the most popular current approach and it has some terrifically interesting evidence in support (in particular the fact that some ellipsis (i.e. sluicing) obviates island violations). However, if L&Y are right, then we may have to rethink this assumption (see note 3 however).

Indeed, I would go further (and here it is NH speaking rather than L&Y). There have long been two general approaches to islands. 

First, we have Chomsky’s view of subjacency elaborated in ‘On wh movement’ that treats islands as reflecting bounds on the computational procedure. Island effects reflect the subjacency condition (aka PIC), which bounds the domain of computation (an idea motivated by the reasonable assumption that bounding a domain of computation makes doing computations more tractable).[4]

The second approach can be traced back to Ross’s thesis (islands restrict chopping rules) but has been developed as part of the linearization industry spurred by Kayne’s seminal work and mooted most explicitly by Uriagereka.[5]

The L&Y results argue pretty strongly, IMO, for Chomsky’s original conception precisely because they appear to hold whether or not the construction involves an obvious phonetic gap (gaps being problematic as they undo linearizations). If this is so, then it argues against linearization based approaches to the problem (leaving, of course, a very big question: what to do about sluicing).[6]

We can go further still. The L&Y results also argue for a Merge only syntax. Here is what I mean. IMO, the central empirical thesis of the Minimalist Program (MP) is the Merge Hypothesis (MH). MH is the claim that the only specifically linguistic operation of FL is Merge. This entails that all G dependencies are merge mediated. The strong version of the thesis excludes operations like long distance Agree, which spans the same domains as I-merge but is a different operation. Note that it is natural to suppose that I-merge is movement and Agree is some kind of binding or feature sharing. At any rate, the classical conceptions gain empirical benefit from the “observation” that WISs do not display island effects. Why? Because, we might say, they are licensed by Agree not by I-merge and only the latter (being the MP analogue of movement) is subject to subjacency (or its current analogue). But as L&Y indicates this is precisely the wrong conclusion. WISs are subject to islands. A merge only syntax insists that all A’-dependencies are formed in the same way, via I-Merge, as this is the only way to establish any non-local grammatical dependency. So if WISs are G licensed, then they must be G licensed via I-merge and so will form a natural class with overt Wh movement. And this is what L&Y find. Both show super-additivity effects across islands. Thus, L&Y’s findings are what we should expect from a merge only syntax and they caution against larding this best of MP theories with Agree/Probe-Goal titivations.[7]

We can milk a second important conclusion from L&Y. It solves a giant problem for MP. Which problem? The problem of unifying subjacency with the ECP. I have suggested elsewhere (see here) that the argument/adjunct asymmetries at the heart of the ECP are very MP problematic. This is so for a variety of reasons. The three that move me most are the fact that the ECP is a trace licensing condition and MP eschews traces, the huge theoretical redundancy between ECP and subjacency, and the “ugliness” of the basic technical machinery required to allow the ECP to track the argument/adjunct asymmetry. One of the nice implications of L&Y is that we need not worry about the problems that the ECP generates for MP because the theoretical apparatus is based on a mistaken description of the data. If L&Y is right, then there is no argument/adjunct asymmetry. Poof, the MP problem disappears and with it the ad-hoc theoretically unmotivated (within MP) technical apparatus required to track it.  

Of course, this overstates matters. It behooves us to go over the ECP data more carefully and see how to resolve the difference in acceptability that the standard literature identified. Why, after all, if all Whs are created equal, do long distance adjuncts resist movement more fiercely than do long distance arguments?[8] L&Y offer a suggestion (no spoiler from me, read the squib when it comes out). But whatever the right answer is, it does not rest on making an invidious grammatical distinction between the two kinds of dependencies. And this is just what MP needs in order to start distinguishing ECP effects from the ECP theoretical apparatus in GB.

Let me hit this a bit harder. The ugliest parts of theories of A’-dependency within GB arise in response to argument/adjunct asymmetry effects. The technical machinery in Barriers (built on Lasnik and Saito foundations), though successful empirically (IMO, Lasnik and Saito’s theory was considerably more empirically effective than Barriers) had little of the virtual conceptual necessity MPers pine for. Nor were alternative theories (Generalized Binding, Connectedness) much prettier. Nonetheless, we put up with that stuff and developed it theoretically because it appeared to be empirically called for. The right aesthetic conclusion should have been (and actually was) that it was too contrived to be correct. L&Y provides courage for our aesthetic convictions. We should have judged these theories as suspect because ugly, though we would have been empirically premature in drawing that conclusion. Given L&Y, the facts are not what we took them to be despite reflecting very different acceptability profiles. 

There is a moral here, and you can all guess what it is but I cannot resist making it explicit anyhow. L&Y provide evidence for a methodological precept that we fail to respect enough: facts can change, no less than theory can. Or to put this another way: just as we can make theoretical wrong turns that we come to revise, we can adopt empirical generalizations that turn out to be misleading. The standard view is that data is hard and theory is fluffy and when the two clash it is better to revise the theory than to rethink the data. L&Y provides a case where this is reversed. And I say hooray!

Let me make one more point and I will end this overly long post (long, and yet, filled with endlessly many loose ends). L&Y exemplifies something that I think is important. It is an empirical paper whose purpose is to directly test a core theoretical assumption. This is not something that we generally see within syntax. Most papers are not out to test theoretical assumptions. Most papers use theory to explore more data. Theory might be tested but it is generally a by-product of better descriptive coverage. L&Y works differently. It starts from the theory and constructs an empirical intervention to probe it. Moreover, the question is quite precise and the assumptions required to answer it are clear within the confines of the project. This has all the look and smell of an honest to god experiment, a process whereby we query the theory using a relatively well-understood probe. Both empirical methods of exploration are worthwhile, but they are different and it is only relatively recently, I think, that we are seeing examples of the second experimental kind gaining traction.

Curiously (perhaps), a feature of this second kind of paper is that the paper is short. L&Y is a squib. Empirical explorations in linguistics often read like novellas. L&Y is very definitely a very very short story. I would like to suggest that experimental papers like L&Y reflect the scientific health of linguistics. It is now possible to ask a sharp question, and give a sharp answer. We need more of these kinds of short pointed experimental forays into the data starting from well-formulated theoretical starting points.

That’s it. I have gone on far too long. L&Y is terrific. If correct, it is very important. I personally hope the results stand up. It would go a long way to cleaning up a particularly untidy part of syntactic theory and thereby further vindicating the promise of MP, indeed a particularly strong version of MP, one that endorses a Merge only conception of grammar.

[1]ES techniques have, for example, suggested that adjunct island effects might not be of a piece with other islands as they (often) fail to display super-additivity effects.
[2]To be slightly more careful and squinting at the ES results wrt resumptives the following is more accurate: resumptives uniformly ameliorate fixed subject constraint violations but not “mere” subjacency violations. Thus, resumptives inside islands seem to show the same super-additivity profiles as their moved counterparts.
[3]Perhaps a more felicitous way of putting matters is that RPs and WISs are also products of I-merge and so expected to be subject to islands. One reason for treating islands as PF effects is to capture the distinction between these cases and more conventional examples of overt movement. If, however, they pattern the same then the motivation for treating islands as PF effects weakens. This said, I am pretty sure it is possible to model these cases as formed via movement/I-merge by, for example, treating them as cases of remnant movement, where the moved Wh or Q morpheme starts as part of a doubled structure including the RP or WIS.
[4]See Chomsky’s ‘On wh movement’ for discussion. See here for more prose.
[5]Versions of this original idea were developed by many people including Hornstein, Lasnik and Uriagereka, and Fox and Pesetsky. The idea centers on the idea that the problem with movement is that it reorders elements and so can come into conflict with the ordering algorithm. In this sense, gaps are a big deal and what distinguish movement from other kinds of long distance dependencies like binding.
[6]Or again, it argues for treating the operation (e.g. movement) as in need of constraint rather than the output of the operation (a gap, or new linear order). What plausibly unifies cases of “overt” WH (as in English), “covert” WH (as in Chinese) and resumptive WH (as in Hebrew) is that they all involve relating an A’ element to a non-local syntactic position that can be arbitrarily far away. It’s the span that seems to matter, not what sits at the tail of the chain.
[7]RP constructions must also be formed via I-merge and so too all forms of binding. This requirement fits well with the observation that RPs obey islands. Binding, especially pronominal binding, is likely to be more problematic. As they say at this point in a journal paper in reply to referee 2; these are topics for further research.
[8]As I have noted in other places, this description of the ECP contrast is not quite correct, as we have known since Rizzi’s work on minimality. The distinction seems less a matter of argument vs adjunct than object centered vs non object centered quantification. But this is a topic for another time.

Monday, January 14, 2019

More Elsevier Open Access defections

Today, in Nature, another editorial board resigns in protest of Elsevier's fees and open access policies. Of course, linguistics has been there, done that too.

Monday, January 7, 2019

A puzzler

Kleanthes sent me this little note yesterday:

George Walkden posted this brainteaser on Facebook recently:
“We do not want to know languages, we want to know language; what language is, how it can form a vehicle or an organ of thought; we want to know its origin, its nature, its laws”. Who wrote this, and when? (No Googling!)  

…Interestingly, the same author also wrote some ten years after the previous quote that because "between the language of animals and the language of man there is *no* natural bridge", "to account for human language such as we possess would require a faculty of which no trace has ever been discovered in lower animals”.  

I don't facebook. So in case others who read FoL don't either or did not chance to see the puzzle, you can try your luck here. Answer in footnote 1.[1]

The author turns out to be a fascinating person (and no I did not know who he was till Klea sent me the query) very much in the tradition of von Humboldt. Indeed, his remarks are apposite to this day. I have started reading some of his work (they are available online). They are well worth the time. 

Thx to Klea for sending me this puzzler and the other material.

[1]Max Muller. The first is from his lectures in 1861. They are available online and well worth looking at. The second is from a public lecture that he gave on Darwin and Evolang. It is reported in Nature Dec 26 1872:145. Thx to Klea for sending me this info.

Omer on what syntax is vs what it looks like

Here is a new post from Omer on how we ought to think of syntax, and some confusions that seem to pervade some of our thinking. Enjoy.

Saturday, January 5, 2019

Open Access and the Piratical Journals

Adam Liter sent me this piece a while ago. Sybaritic pursuits over the New Year period kept me from posting it earlier (sorry, hiccup!). At any rate, the piece describes a fight that the U Cal system is about to have or is currently having with Elsevier (E) over open access to the articles that it publishes. It seems that E has a way of double dipping: charging for access to the journals and then charging again for access to the articles. The ins and outs are complex but it basically relates to how and when we move to a fully open access system. From where I sit, U Cal is fighting the good fight and we should hope that they succeed.

There is something quite interesting about academic publishing. The content it receives it does not in any way pay for. The curation of this content (selection, editing, improving) is largely also unpaid for by them. In fact, both are effectively paid for by public bodies like the NSF, NIH, Wellcome Fund, SSHRC etc. yet the public that funds it does not have free access to it. Not surprisingly, this system has generated enormous profit for the relevant academic publishers (here). The profit margins are enormous (36% for E) and it is not incidentally tied to the fact that most of the content is freely provided. I am sure that E provides a service. What I am also sure about is that this service is vastly overpriced. At any rate, the days of this kind of monopolistic parasitism might be numbered. One can only hope.

Turing and Chomsky

There are two observations that motivate the Minimalist Project. 

The first is that the emergence of FL is a rather recent phenomenon biologically, say roughly 50-100kya. The argument based on this observation is that if biological complexity is a function of natural selection (NS) and NS is gradual, then the observation that language biologically arose “merely” 50-100kya implies that whatever arose could not have been particularly complex. Why? Because complexity would require shaping by slow selection pressures and 50-100,000 years is not enough time to shape anything very complex. That’s the argument. And it relies, ahem, on many assumptions, not all of them at all obvious.

First, why think that 50-100,000 years is not enough time to develop a complex cognitive organ? Maybe that’s a lot of time. Second, how do we measure complexity? Biology selects genes, but MP measures complexity wrt the simplicity of the principles of FL/UG. Why assume that the phenotypic simplicity of linguistic descriptions of FL/UG lines up well with the simplicity of the genetic foundations that express these phenotypic traits?[1]

This second problem is, in fact, not unique to EvoLang. It is part and parcel of the “phenotypic gambit” that I discussed elsewhere (here). Nonetheless, the fact that this is a general issue in Evo accounts does not mean it is not also a problem for MP arguments. Third, every time one picks up the papers nowadays one reads that someone is arguing that language emerged further and further back. Apparently, many believe that Neanderthals jabbered as much as we did and if this is the case we push back the emergence of language many 100,000s of years. Of course, we have no idea what such language consisted in even if it existed (did it have an FL like ours?), but there is no question that were this fact established (and it is currently considered admissible I am told) then the simple minded argument noted above becomes less persuasive.

All in all then, the first kind of Evo motivation for a simpler FL/UG, though not nothing, is not particularly dispositive (some might even think it downright weak (and we might not be able to strongly rebut this churlish skepticism)). 

But there is a second argument, and I would like to spotlight it here. The second argument is that whenever it arose it has remained stable since its inception. In other words, FL/UG has been conserved in the species since it arose. How do we know this? Well, largely because any human kid can learn any human language in effectively the same way if prompted by the relevant linguistic input. We should be very surprised that this is so if indeed FL/UG is a very complex system that slowly arose via NS. Why? Because if it did so slowly arise, why did it suddenly STOP evolving? Why don’t we have various FL/UGs with different human groups enjoying bespoke FL/UGs specially tailored to optimally fit the peccadillos of their respective languages or dialects? Why don’t we have ethnically demarcated FL/UGs, some of which are ultra sensitive to rich morphology and some more sensitive to linear properties of strings? In other words, if FL/UG is complex why is it basically the same across the species, even in groups that have been relatively isolated from other human groups over longish periods of time? Note, the problem of stability is the flip side of the problem of recency. If large swaths of time make for easier gradual selection stories, they also exacerbate the problem of stability. Stasis in the face of environmental diversity (and linguistic environments sure have the appearance of boundless diversity, as my typologically inclined colleagues never tire of reminding me) is a problem when gradual NS is taken to shape genetic material to optimally fit environmental demands.

Curiously, the fact of stability over large periods of Evo time has become a focus of interest in the Evo world (think of Hox genes). The term of art for this sort of stability is “strong conservation” and the phenomenon of interest has been the strong conservation of certain basic genetic mechanisms over extremely long periods of Evo time. I just read about another one of these strongly conserved mechanisms in Quanta (here). The relevant conserved mechanism is one that explains biological patterns like those that regulate “[t]he development of mammalian hair, the feathers of birds and even those ridges on the roof of your mouth” (2). It is a mechanism that Turing first mooted before anyone knew much about genes or development or much else of our contemporary bio wisdom (boy was this guy smart!). There are two interesting features of these Turing Mechanisms (TMs). First, they are very strongly conserved (as we shall see) and second, they are very simple. In what follows I would like to moot a claim that is implicit in the Quanta discussion: that simplicity enables strong conservation. You can see why I like this idea. It provides a biological motivation for “simple” mechanisms that seems relevant to the language case. Let me discuss the article a bit.

It makes several observations. 

First, the relevant TM, what is called a “reaction-diffusion” mechanism is “beautifully simple.” Here is the description (2):

It requires only two interacting agents, an activator and an inhibitor, that diffuse through tissue like ink dropped in water. The activator initiates some process, like the formation of a spot, and promotes the production of itself. The inhibitor halts both actions. 

Despite this simplicity, the process can regulate widely disparate kinds of patterns: “spaced dots, stripes, and other patterns” including the pattern of feathers on birds, hair, and, of relevance in the article, denticles (the skin patterning) on sharks (2). 

Second, this mechanism is very strongly conserved. As the same TM regulates bird feathers and denticles then we are talking about a mechanism conserved over hundreds of millions of years (4). As the article puts it quoting the author of the study (2):

According to Gareth Fraser, the researcher who led the study, the work suggests that the developing embryos of diverse backboned species set down patterns of features in their outer layers of tissue in the same way — a patterning mechanism “that likely evolved with the first vertebrates and has changed very little since.”

Third, the simplicity of the basic pattern forming mechanism does not preclude variation of patterns. Quite the contrary in fact. The simplicity of the mechanism lends itself to accommodating variation. Here is a longish quote (6):

To test whether a Turing-like mechanism could create the wide range of denticle patterns seen in other sharks and their kin, the researchers tweaked the production, degradation and diffusion rates of the activator and inhibitor in their model. They found that relatively simple changes could produce patterns that matched much of the diversity seen in this lineage. The skates, for example, tend to have more sparsely patterned denticles; by either increasing the diffusion rate or decreasing the degradation rate of the inhibitor, the researchers could make more sparse patterns emerge.
Once the initial pattern is set, other, non-Turing mechanisms complete the transformation of these rows into fully formed denticles, feathers or other epithelial appendages. “You have these deeply conserved master regulator mechanisms that act early on in the development of these appendages,” Boisvert explained, “but downstream, species-specific mechanisms kick in to refine that structure.” Still, Boisvert stressed how remarkable it is that the mechanism underlying so many different biological patterns was theorized “by a mathematician with no biological training, at a time when little about molecular biology was understood.”
So, the simple mechanisms can be tweaked to generate pattern diversity and can be easily combined with other downstream non-TM “species-specific” mechanisms to “refine the structure” the basic TM lays down.
Fourth, the similarity of mechanism exists despite a wide variety of functions supported. Feathers are not hairs, and hairs and feathers are not denticles. They serve different functions, yet formally they are generated by the same mechanism. In other words, the similarity is formal not functional and it is at this abstract formal (think “syntactic”) level that the common biological basis of these traits is revealed.
Fifth, the discovery of TMs like this one (and Hox, I assume) “bolsters a growing theme in developmental biology that “nature tends to invent something once, and plays variations on that theme”” (quote is from Alexander Schier of Harvard bio). 
Sixth, the article moots the main point relevant to this wandering disquisition; that the reason TMs are conserved is because they are so very simple (6):
Turing mechanisms are theoretically not the only ways to build patterns, but nature seems to favor them. According to Fraser, the reliance on this mechanism by so many far-flung groups of organisms suggests that some kind of constraint may be at work. “There simply may not be many ways in which you can pattern something,” he said. Once a system emerges, especially one as simple and powerful as a Turing mechanism (my emphasis, NH), nature runs with it and doesn’t look back.
What makes the mechanism simple? Well, one feature that is relevant for linguists of the MP stripe is that you really cannot take part of the reaction-diffusion function and get it to work at all. You need both parts to generate a pattern and you need nothing but these two parts to generate the wide range of patterns attested.[2] In other words, half a reaction-diffusion mechanism does you no good and once you have one you need nothing more (see first quoted passage above). I hope that this sounds familiar (don’t worry, I will return to this in a moment).
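To make the "both parts and nothing more" point concrete, here is a small sketch that checks the textbook linear-stability conditions for a diffusion-driven (Turing) instability in a two-component system. It is not the model from the shark study: the linearized reaction terms `a, b, c, d`, the function name and all the numbers are assumptions chosen for illustration.

```python
def turing_unstable(a, b, c, d, diff_act, diff_inh):
    """Check the standard conditions for a Turing instability.

    The linearized kinetics near a uniform steady state are assumed to be
        d(act)/dt = a*act + b*inh + diff_act * laplacian(act)
        d(inh)/dt = c*act + d*inh + diff_inh * laplacian(inh)
    A pattern can form when the uniform state is stable without diffusion
    but becomes unstable once diffusion is switched on.
    """
    trace = a + d
    det = a * d - b * c
    # Without diffusion the uniform state must be stable.
    stable_without_diffusion = trace < 0 and det > 0
    # With diffusion some spatial wavelength must grow.
    mix = diff_inh * a + diff_act * d
    unstable_with_diffusion = mix > 0 and mix * mix > 4 * diff_act * diff_inh * det
    return stable_without_diffusion and unstable_with_diffusion

# Hypothetical kinetics: the activator promotes itself (a > 0) and the
# inhibitor (c > 0); the inhibitor suppresses the activator (b < 0) and
# decays (d < 0).
kinetics = dict(a=1.0, b=-1.0, c=2.0, d=-1.5)

print(turing_unstable(**kinetics, diff_act=1.0, diff_inh=1.0))   # equal diffusion rates
print(turing_unstable(**kinetics, diff_act=1.0, diff_inh=10.0))  # inhibitor spreads much faster
# Cutting the coupling between the two agents leaves half a mechanism.
print(turing_unstable(a=1.0, b=0.0, c=0.0, d=-1.5, diff_act=1.0, diff_inh=10.0))
```

Under these assumed numbers only the second call reports an instability: two coupled agents with a faster-diffusing inhibitor are enough, and removing either ingredient leaves no patterning at all.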
I think that each point made is very linguistically suggestive, and we could do worse than absorb these suggestions as regulative ideals for theoretical work in linguistics moving forward. Let me elaborate.
First, simplicity of mechanism can account for stability of that mechanism in that simple mechanisms are easily conservable. Why? Because they are the minimum required to generate the relevant patterns (the reaction-diffusion pattern is as simple a system as one needs to generate a wide variety of patterns). Being minimal means that so long as such patterns eventuate in functionally useful structure at least this much will be needed. And given that simple generative procedures combine nicely with other more specific “rules” they will be able to accommodate both variation and species-specific bespoke adjustments. Simple rules then are both stable (because simple) and play well with others (because they can be added onto) and that is what makes them very biologically useful.[3]
IMO, this carries over to operations like Merge perfectly. Merge based dependencies come in a wide variety of flavors. Indeed, IMO, phrase structure, movement, binding, control, c-selection, constituency, structure dependence, case, theta assignment all supervene on merge based structures (again, IMO!). This is a wide variety of different linguistic functions all built on the same basic Merge generated pattern. Moreover, it is compatible with a large amount of language specific variation, variation that will be typically coded into lexical specifications. In effect, Merge creates an envelope of possibilities that lexical features will choose among. The analogy to the above Turing Mechanisms and the specificity of hair vs skin vs feathers should be obvious.
Second, Merge, like TMs, is a very simple recursive function. What does it do? All it does is combine two expressions and nothing more! It doesn’t change the expressions in combining them in any way. It doesn’t do anything but combine them (e.g. adds no linear information). So if you want a combination operation then Merge will be as simple an operation as you could ask for. This very simplicity and the fact that it can generate a wide range of functionally useful dependencies is what makes it stable, on a par with TMs.
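Just how little Merge does can be seen in a toy sketch, assuming (as is common in the minimalist literature) that Merge forms an unordered two-membered set. Strings stand in for lexical items, and the helper names `contains` and `internal_merge` are made up for illustration; nothing here is a claim about implementation in the mind/brain.

```python
def merge(x, y):
    """Combine two expressions into an unordered set and do nothing else."""
    return frozenset({x, y})

def contains(obj, term):
    """True if `term` is `obj` itself or occurs somewhere inside `obj`."""
    if obj == term:
        return True
    if isinstance(obj, frozenset):
        return any(contains(part, term) for part in obj)
    return False

def internal_merge(obj, term):
    """I-merge: merge `obj` with an expression it properly contains."""
    if obj == term or not contains(obj, term):
        raise ValueError("term must be a proper part of obj")
    return merge(term, obj)

# E-merge builds phrase structure; the output carries no linear order.
vp = merge("see", "what")
print(vp == merge("what", "see"))

# I-merge relates "what" to a second, non-local position using the same operation.
cp = internal_merge(vp, "what")
print(vp in cp, "what" in cp)

# The inputs are left untouched: vp is still exactly what it was.
print(vp == frozenset({"see", "what"}))
```

The design choice worth noticing is that `internal_merge` adds no new combinatorial machinery; it is `merge` applied to an object and one of its own parts, which is the sense in which phrase structure and non-local dependencies come from one operation.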
Third, we should steal a page from the biologists and assume that “nature tends to invent something once.” In the linguistic context this means we should be very wary of generative redundancy in FL/UG, of having different generative operations serving the same kinds of structural ends. So, we should be very suspicious of theories that multiply ways of establishing non-local dependencies (e.g. both I-merge and Agree under Probing) or two ways of forming relative clauses (e.g. both matching (Agree) and raising (i.e. I-merge)).[4] In other words, if Merge is required to generate phrase structure and it also suffices to generate non-local dependencies then we should not immediately assume that we have other ways of generating these non-local dependencies. It seems that nature is Ockhamist, and so venerating Ockham is both methodologically and metaphysically (i.e. biologically, linguistically) condign.
Fourth, it is hard to read this article and not recognize that the theoretical temperament behind Turing’s conjectures about mechanism is very similar to the one that motivates Chomsky. Here is a nice version of that theoretical sentiment (6):
“Biological diversity, across the board, is based on a fairly restricted set of principles that seem to work and are reused over and over again in evolution,” said Fraser. Nature, in all its exuberant inventiveness, may be more conservative than we thought.
And all that linguistic diversity we regularly survey might also be the output of a very restricted set of very simple Generative Procedures. That is the MP hope (and as I have noted, IMO it has been reasonably well vindicated (as I have argued in various papers recently released or forthcoming)), and it is nice to see that it is finding a home in mainstream biology.[5]
Enough. The problem of stability of FL/UG smells a lot like the problem of deep conservation in biology. It also seems like simplicity might have something to say about why this might be the case. If so, the second motivation for MP simplicity might just have some non-trivial biological motivation.[6]
[1]It is likely worse than this. As Jerry Fodor often noted, we are doubly removed from the basic mechanisms in that genes grow brains and brains secrete minds. The inference from behavior to genes thus must transit through tacit assumptions about how brains subvene minds. We know very little about this in general and especially little about how brains support linguistic cognition. Hence, all inferences from phenotypic simplicity to genetic simplicity are necessarily tenuous. Of course, if this is the best that one can do, one does it realizing the pitfalls. Hence this is not a critique, just an observation, and one, apparently, that extends to virtually every attempt to ground “behavior” in genes (as Lewontin long ago noted). 
[2]Here’s another thought to chew on: it is the generative procedure that is the same (a reaction-diffusion mechanism) not the outputs. So it is the functions in intension that are conserved, not the extensions thereof, which are very different.
[3]I cannot currently spell this out but I suspect that simplicity ties in with modularity. You get a simple mechanism and it easily combines with others to create complexity. If modularity is related to evolvability (which sure smells right) then simplicity will be the kind of property that evolving systems prize.
[4]This is one reason I am a fan of Sportiche’s recent efforts to reanalyze all relativization in terms of raising (aka, I-merge). More specifically, we should resist the temptation to assume that when we see different constructions evincing different patterns that the generative procedures underlying these patterns are fundamentally different.
[5]And we got there first. It is interesting to see that Chomsky’s reasoning is being recapitulated inside biology. Indeed, contrary to the often voiced complaint that linguistics is out of step with the leading ideas in biology, it seems to have been very much ahead of the curve. 
[6]Of course, it does not need this to be an important ideal. Methodological virtue also prizes simplicity. But this is different, and if tenable, important.