Minimalists are moved by simplicity. But what is it that moves us, and are we right to be so moved? What makes a hypothesis simple and why is simpler better? What makes a svelte G or a minimal UG better than its more rococo cousins? Here is a little discussion by Elliott Sober, reprising some of the main themes of a new book on Ockham and his many Razors (here). It makes for some interesting reading. Here are a few comments.
Sober’s big point is that simplicity is not an everywhere virtue. We know this already when it comes to art, where complicated and ornate need not mean poor. However, as Sober notes, unless one is a theist, the scientific virtues of simplicity need defending (Newton, for example, defends simple theories by grounding them in the “perfection of God’s works,” not a form of argument that would be that popular today). As he puts it, how deep a razor cuts “depends on empirical assumptions about the problem.”
I mention this because “simplicity” and Ockham have been important in minimalist discussion, and Sober’s point suggests that arguing for one or another position based on simplicity is ultimately an empirical argument. Therefore, identifying the (implicit) empirical assumptions that license various simplicity claims is important. Sober discusses three useful versions.
The first of Ockham’s Razors rests on the claim that simpler theories are often empirically more probable. Thus, for example, if you can attribute a phenomenon to a mundane cause rather than an exotic one, go for the mundane one. Why? Because common causes are common and hence more likely. Sober describes this as “avoid chasing zebras.”
This form of argument occurs quite a lot in linguistic practice. Here’s one personal example. In my experience, linguists love to promote the distinctiveness of the non-English language they are expert in. One way this is done is by isolating novel-looking phenomena and providing them with novel-looking analyses. Here is an example.
There is a phenomenon of switch reference (SR) wherein the subject of an embedded or adjunct clause is (or is not) marked as coreferential with the matrix one. SR is generally found in what I would call more “exotic” languages. Thus, for example, English is not generally analyzed as an SR language. But why not? We find cases where subjects of non-matrix clauses are either controlled or obviative with respect to higher subjects (e.g. John1 left the party without PRO1/*him1 kissing Mary or John1 would prefer PRO1/for *him1 to leave). When there is a PRO, the non-matrix subject must be coreferential with the matrix subject, and when there is a pronoun it must be obviative. The English data are typical instances of control. Control phenomena are well studied and common and, so, not particularly recondite. Ockham would suggest treating SR as an instance of control if possible, rather than as something special to these “exotic” languages. However, historically, this is not how things have played out. Rather than reduce the “exotic” to the linguistically “common,” analyses have treated SR as a phenomenon apart. All things being equal, Ockham would argue against this move. Don’t go exotic unless absolutely forced to, and even then only very reluctantly.
Consider now a second razor: all the lights in the house go out. There are two explanations: each light bulb independently burned out, or the house lost power. Both explain why the lights are out. However, the single-cause account is preferable. Why? Here’s Sober (7): “Postulating a single common cause is more parsimonious than postulating a large number of independent, separate causes.”
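To put rough numbers on the lights example (the probabilities below are entirely made up, purely for illustration), the arithmetic behind the razor is straightforward:

```python
# Made-up numbers, purely to illustrate the common-cause razor:
# suppose each of 10 bulbs independently burns out on a given night
# with probability 0.01, while the house loses power with probability 0.001.
p_bulb, p_power, n_bulbs = 0.01, 0.001, 10

# The many-separate-causes explanation: all bulbs happen to fail at once.
p_all_bulbs = p_bulb ** n_bulbs

# The single-common-cause explanation: one power failure.
p_outage = p_power

# The common-cause explanation is astronomically more probable.
print(p_all_bulbs < p_outage)  # True
```

Nothing hangs on the particular numbers: so long as bulb failures are independent and individually rare, their joint probability shrinks exponentially with the number of bulbs, while the single common cause does not.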
Again, this form of simplicity argument is applicable to linguistic cases. For example, this reasoning underlies Koster’s 1984 argument for unifying A-chains, binding and obligatory control. I have personally found this simplicity argument very compelling (so compelling that I stole the idea and built on it in slightly altered form). Of course, it could be that the parallelisms are adventitious. But a single cause is clearly the simpler hypothesis, as it would explain why the shared features are shared. Is the simpler account also true? Well, who knows? We cannot conclude that the simplest hypothesis is also the true one. We can only conclude that it is the default story, favored until proven faulty, and that we need good reasons to abandon it for a multi-causal account, which, as we can see, will have no explanation for the overlapping properties of the “different” constructions.
There is one last razor Sober discusses: “parsimony is relevant to discussing how accurately a model will predict new observations” (8). Put simply, simple hypotheses benefit from not overfitting data. Conversely, the more parameters a theory has, the easier it is for unrepresentative data to mislead it.
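The overfitting point can be made concrete with a toy sketch (the data and functions here are made up, just to show the flavor of the argument): a two-parameter line and a six-parameter interpolant are both fit to the same small noisy sample drawn from a linear process, and the many-parameter model ends up tracking the unrepresentative noise.

```python
# Toy illustration (made-up data): a 2-parameter line vs. a
# 6-parameter interpolant fit to the same noisy sample.
import random

random.seed(0)

# True process: y = 2x plus noise. Six noisy training points.
train = [(x, 2 * x + random.gauss(0, 1.0)) for x in range(6)]

def fit_line(data):
    """Least-squares fit of y = a*x + b: two adjustable parameters."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def fit_interpolant(data):
    """Lagrange interpolation: one parameter per data point, so it fits
    the training sample perfectly -- the maximally 'supple' model."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(data):
            term = yi
            for j, (xj, _) in enumerate(data):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

line = fit_line(train)
supple = fit_interpolant(train)

# Compare predictions against the true process at new x values.
new_xs = [6.5, 7.5, 8.5]
line_err = sum((line(x) - 2 * x) ** 2 for x in new_xs)
supple_err = sum((supple(x) - 2 * x) ** 2 for x in new_xs)

# The supple model fit the noise, so it predicts new observations
# far worse than the blunt two-parameter line.
print(line_err < supple_err)
```

The interpolant has zero training error by construction, yet its extra parameters were spent memorizing noise, so its predictions for new observations go badly astray, which is exactly the sense in which the simpler model is favored here.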
This is related to another way that simplicity can matter. Simple theories are useful because they are lead-footed. They make predictions. The more subtle or supple a theory is, the more adjustable parameters it has, the more leeway it provides, the less it says. Simple theories are blunt (and brittle), and even if they are wrong, they may not be very wrong. So, theories that cover a given stretch of empirical ground more successfully may be paying a high predictive/explanatory price for this success.
Here is another way of making this point. The more supple a theory, the more data it can fit. And this is the problem. We want our theories to be brittle, and simple theories have less wiggle room. This is what allows them to make relatively clear predictions.
Sober ends his short piece by noting that simplicity needs to be empirically grounded. Put another way, there is no a priori notion of simplicity; the notion is somewhat indexical. So when we talk simplicity, it is in a certain context of inquiry. I say this because Ockham has come to play an ever larger role in modern syntactic theory in the context of the minimalist program. Unfortunately, however, it is not always clear in what way simplicity is to be understood in this context. Sometimes the claim seems to be that some stories should be favored because the concepts they deploy are “simpler” than those being opposed (e.g. sets are simpler than trees); sometimes the claim is that more general theories are to be preferred to those which assume the same mechanism but with some constraints (e.g. the most general conception of merge reduces E- and I-merge to the same basic operation, the implicit claim being that the most general is the simplest); sometimes it is argued that the simplest operations are the computationally optimal ones (e.g. merge plus inclusiveness plus extension is simpler than any other conception of merge). Whatever the virtues of these claims, they do not appear to be of the standard Ockham’s Razor variety. Let me end with one example that has exercised me for a while.
Chomsky has argued that treating displacement as an instance of merge (I-merge) is simpler than treating it as the combination of merge plus copy. The argument seems to be that there is no “need” for the copy operation once one adopts the simplest conception of merge. The Ockham’s Razor argument might go as follows: everyone needs an operation that puts two separate expressions together. The simplest version of that operation also has the wherewithal to represent displacement. Hence a theory that assumes a copy operation in addition to this conception of merge is adding a superfluous operation. Or: Merge+Copy does no more than Merge alone, and so Ockham prefers the latter.
But do the two theories adopt the exact same merge operation? Not obviously, at least to me. Merge in the Copy+Merge theory can range over roots alone (call this merge1). Merge in the “simpler” theory (call this merge2) must range over roots and non-roots. Is one domain “simpler” than another? I have no idea. But it seems at least an open question whether having a larger domain makes an operation simpler than one that has a more restricted domain. Question: Is addition ranging over the integers more “complex” than addition ranging over the rationals? Beats me.
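For concreteness, here is a toy sketch of the two derivational routes (hypothetical code: it models syntactic objects as bare sets, in the spirit of the standard set-theoretic conception of merge, and is no more than an illustration of the domain difference just discussed):

```python
# Toy model: syntactic objects are frozensets; lexical items are strings.
# Hypothetical illustration only, not a serious grammar formalism.

def merge(x, y):
    """Merge as set formation: Merge(X, Y) = {X, Y}."""
    return frozenset({x, y})

def terms(so):
    """All terms (sub-objects) of a syntactic object, including itself."""
    result = {so}
    if isinstance(so, frozenset):
        for part in so:
            result |= terms(part)
    return result

# External merge: both inputs are roots.
vp = merge("eat", "apples")     # {eat, apples}
tp = merge("T", vp)             # {T, {eat, apples}}

# Merge2: the same operation may also take a non-root term as input.
# Internal merge (displacement): vp is a term of tp.
moved = merge(tp, vp)           # {{T, {eat, apples}}, {eat, apples}}

# Merge1 + Copy: merge ranges over roots only, so displacement needs
# an explicit Copy step that makes a term available as a root again.
def copy(so, term):
    assert term in terms(so)    # Copy targets a term of so
    return term                 # the copied object, now usable as a root

moved2 = merge(tp, copy(tp, vp))

# Same output, different derivational routes.
print(moved == moved2)  # True
```

The point of the sketch is just that the two theories deliver the same representations; they differ in the domain of the basic operation, and it is that difference whose “simplicity” is at issue.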
One might also argue that Merge2 should be preferred because it allows for one fewer operation in FL (i.e. it does not have or need the Copy operation). However, how serious an objection is this (putting aside whether Merge2 is simpler than Merge1)? Here’s what I mean.
Here is a line of argument: at bottom, simplicity in UG operations matters in minimalism because we assume that the evolutionary emergence of simple structures/operations is easier to explain than the emergence of complexity. The latter requires selection and selection requires time, often lots of time. Thus, if we assume that Merge is the linguistically distinctive special sauce that allowed for the emergence of FL, then we want merge to be simple so that its emergence is explicable. We also want the emergence of FL to bring with it both structure building and displacement operations. So, the emergence of Merge should bring with it hierarchical structure building plus displacement. And postulating Merge2 as the evolutionary innovation suffices to deliver this.
How about if we understand merge along the lines of Merge1? Then to get displacement we need Copy in addition to Merge1. Doesn’t adding Copy as a basic operation add to the evolutionary problem of explaining the emergence of structured hierarchy with displacement? Not necessarily. It all depends on whether the copy operation is linguistically proprietary. If it is, then its emergence needs explanation. However, if Copy is a generic cognitive operation, one that our pre-linguistic ancestors had, then Copy comes for free and we do not need Merge2 to explain how displacement arose in FL. Displacement should arise if we add Merge1, given that Copy is already an available operation. So, from the perspective of Darwin’s Problem, there is no obvious sense in which Merge2 is simpler than Merge1. It all really depends on the pre-linguistic cognitive background.
So that’s it. Sober’s essay (and book that it advertises (and that I am now reading)) is useful and interesting for the minimalistically inclined. Take a look.
 And not only because we are no longer theistically inclined. After all, why does God prefer simple theories to complex ones? I love Rube Goldberg devices. Some even have a profound taste for the complicated. For example, Peter Gay says of Thomas Mann: “Mann did not like to be simple if it was at all possible to be complicated.” So, invoking the deity’s preferences can only get one so far (unless, perhaps, one is Newton).
 These are tongue-in-cheek quotes.
 I should add that I am a fan of Merge2, though I once argued for the combo of Merge1 + Copy. However, my reason for opting for Merge2 is that it might explain something that is a problem for the combo theory (viz. why “movement” is target oriented). This is not the place to go into this, however.