Monday, May 23, 2016

The return of behaviorism

There is a resurgence of vulgar Empiricism (E). It’s rampant now, but be patient: it will die out as the groundless, extravagant claims made on its behalf are, yet again, seen to prove sterile. But it is back and getting an airing in the popular press.

Of the above, the only part that is likely difficult to understand is what I intend by ‘vulgar.’ I am not a big fan of the E-weltanschauung, but even within Empiricism there are more and less sophisticated versions. The least sophisticated in the mental sciences is some version of behaviorism (B). What marks it out as particularly vulgar? Its complete repudiation of mental representations (MR). Most of the famous E philosophers (Locke and Hume for example) were not averse to MRs. They had no problem believing that the world, through the senses, produces representations in the mind and that these representations are causally implicated in much of cognitive behavior. What differentiates classical E from classical Rationalism (R) is not MRs but the degree to which MRs are structured by experience alone. For E, the MR structure pretty closely tracks the structure of the environmental input as sampled by the senses. For R, the structure of MRs reflects innate properties of the mind in combination with what the senses provide of the environmental landscape. This is what the debate about blank/wax tablets is all about. Not whether the mind has MRs but whether the properties of the MRs we have reduce to sensory properties (statistical or otherwise) of the environment. Es say ‘yes,’ Rs ‘no.’

Actually, this is a bit of a caricature. Everyone believes that the brain/mind brings something to the table. Thus, nobody thinks that the brain/mind is unstructured, as such brains/minds cannot generalize, and everyone believes that brains/minds that do not generalize cannot acquire/learn anything. The question, then, is really how structured the brain/mind is. For Es, the mind/brain is largely a near-perfect absorber of environmental information with some statistical smoothing techniques thrown in. For Rs, extracting useful information from sensory input requires a whole lot of given/innate structure to support the inductions required. Thus, for Es the gap between what you perceive and what you acquire is pretty slim, while for Rs the gap is quite wide and bridging it requires a lot of pre-packaged knowledge. So everyone is a nativist. The debate is over what kind of native structure is imputed.

If this is right, the logical conclusion of E is B. In particular, in the limit, the mind brings nothing but the capacity to perfectly reflect environmental input to cognition. And if this is so, then all talk of MRs is just a convenient way of coding environmental input and its statistical regularities. And if so, MRs are actually dispensable and so we can (and should) dump reference to them. This was Skinner’s gambit. B takes all the E talk of MRs as theoretically nugatory given that all MRs do is recapitulate the structure of the environment as sampled by the senses. MRs, on this view, are just summaries of experience and are explanatorily eliminable. The logical conclusion, the one that B endorses, is to dump the representational middlemen (i.e. MRs) that stand between the environment and behavior. All the brain is, on this view, is a way of mapping between stimulus inputs and behavior, all the talk of MRs just being misleading ways of talking about the history of stimuli. Or, we don’t need this talk of minds and the MR talk it suggests, we can just think of the brain as a giant I/O device that “somehow” maps stimuli to behaviors.

Note that without representations there is no real place for information processing and the computer picture of the mind. Indeed, this is exactly the point that critics of E and B have long made (e.g. Chomsky, Fodor, and Gallistel, to name three of my favorites). But, of course, the argument can be aimed in the reverse direction (as Jerry Fodor sagely noted, someone’s modus ponens can be someone else’s modus tollens): ‘If B then the brain does not process information’ (the contrapositive of ‘If the brain processes info then not B’). And this is what I mean by the resurgence of vulgar E. B is back, and getting popular press.

Aeon has a recent piece against the view of the brain as an information processing device (here). The author is Robert Epstein. The view is B through and through. The brain is just a vehicle for pairing inputs with behaviors based on reward (no, I am not kidding). Here is the relevant quote (13):

As we navigate through the world, we are changed by a variety of experiences. Of special note are experiences of three types: (1) we observe what is happening around us (other people behaving, sounds of music, instructions directed at us, words on pages, images on screens); (2) we are exposed to the pairing of unimportant stimuli (such as sirens) with important stimuli (such as the appearance of police cars); (3) we are punished or rewarded for behaving in certain ways.

No MRs mediate input and output. I/O is all there is. 

Misleading headlines notwithstanding, no one really has the slightest idea how the brain changes after we have learned to sing a song or recite a poem. But neither the song nor the poem has been ‘stored’ in it. The brain has simply changed in an orderly way that now allows us to sing the song or recite the poem under certain conditions. When called on to perform, neither the song nor the poem is in any sense ‘retrieved’ from anywhere in the brain, any more than my finger movements are ‘retrieved’ when I tap my finger on my desk. We simply sing or recite – no retrieval necessary (14).

No need for memory banks or MRs. All we need is “the brain to change in an orderly way as a result of our experiences” (17). So sensory inputs, rewards, behavioral outputs. And the brain? That organ that mediates this process. Skinner must be schepping nachas!

Let me end with a couple of references and observations.

First, there are several very good long detailed critiques of this Epstein piece out there (Thx to Bill Idsardi for sending them my way). Here and here are two useful ones. I take heart in these quick replies for it seems that this time around there are a large number of people who appreciate just how vulgar B conceptions of the brain are. Aeon, which published this piece, is, I have concluded, a serious source of scientific disinformation. Anything printed therein should be treated with the utmost care, and, if it is on cog-neuro topics, the presumption must be that it is junk. Recall that Vyvyan Evans found a home here too. And talk about junk!

Second, there is something logically pleasing about articles like Epstein’s; they do take an idea to its logical conclusion. B really is the natural endpoint of E. Intellectually, its vulgarity is a virtue, for it displays what much E succeeds in hiding. Critics of E (especially Randy and Jerry) have noted its lack of fit with the leading ideas of computational approaches to neuro-cognition. In an odd way, the Epstein piece agrees with these critiques. It agrees that the logical terminus of E (i.e. B) is inimical to the information processing view of the brain. If this is right, the brain has no intrinsic structure. It is “empty,” a mere bit of meat serving as physiological venue for combining experience and reward with an eye towards behavior. Randy and Jerry and Noam (and moi!) could not agree more. On this behaviorist view of things the brain is empty and pretty simple. And that’s the problem with this view. The Epstein piece has the logic right, it just doesn’t recognize a reductio, no matter how glaring.

Third, the piece identifies B’s fellow travellers. So, not surprisingly, embodied cognition makes an appearance and the piece is more than a bit redolent of connectionist obfuscation. In the old days, connectionists liked to make holistic pronouncements about the opacity of the inner workings of neural nets. This gave the theory a nice anti-reductionist feel and rendered questions about how the innards of the system worked unaskable. It gave the whole theory a kind of new age, post-modern gloss with an Aquarian appeal. Well, the Epstein piece assembles the same cast of characters in roughly the same way.

Last observation: the critiques I linked to above both dwell on how misinformed this piece is. I agree. There is very little argumentation and what there is, is amazingly thin. I am not surprised, really. It is hard to make a good case for E in general and B in particular. Chomsky’s justly famous review of Skinner’s Verbal Behavior demonstrated this in detail. Nonetheless, E is back. If this be so, for my money, I prefer the vulgar forms, the ones that flaunt the basic flaws. And if you are looking for a good version of a really bad set of Eish ideas, the Epstein article is the one for you.

Sunday, May 22, 2016

Closing open access?

For those that have heard, it seems that Elsevier is getting its hands on one of the largest open access repositories in the social sciences. This, recall, is the same publisher that saw a defection of the Lingua editorial board. At any rate, it seems that Elsevier is at it again. They have just acquired SSRN (here, here and here).

Elsevier claims that it will leave everything as it is. The reaction to this is, at best, wary and skeptical. At any rate, you might want to take a look and weigh in on the pros and cons.

To help the debate along, I found a pretty good defense of the value added that big publishers bring to the scholarly enterprise (see here). So, even if Elsevier does overcharge and make obscene profits from what is largely the unpaid work of others, maybe they do provide a useful service or two. Take a look.

Tuesday, May 17, 2016

Slow profs and the crapification of the university

Karthik sent me this link to a piece on slowing down the pace of university life; the academic analogue of the slow food movement. The piece makes the case that the basic function of the university is being sacrificed to a progressive corporatization of university life where more papers in more journals, more talks at more conferences, more grants from more sources set the tone of current academic life. This leads, it is argued, to a place where there is little room to think because everyone is rushing around trying to meet the ever higher "standards" of success. Here's a quote from the slow prof whose book is being reviewed:

"The corporate university's language of new findings, technology transfer, knowledge economy, grant generation, frontier research, efficiency, and accountability dominates how academic scholarship is now framed both within the institution and outside it."

I have more than a little sympathy for this view of things. Moreover, as Karthik noted in his email to me, this is not just the fault of faceless bureaucrats out there running the place. Much of the professoriate has internalized this conception, taking bigger CVs and more grant money to be important marks of success. Some of this is in response to pressures from above regarding hiring, promotion and tenure. However, some of it reflects an acceptance of these standards.

In my experience, what has become much harder to find than it used to be is time to think. Thinking by its nature involves lost time. It is not immediately productive. It is desultory, not clearly aimed in a particular direction. It seems self-indulgent. Despite this, IMO, it is critical to enlarge the imagination and it is necessary for producing the best work. And we have less and less of this.

And this not only affects the profs. When I was a grad student, I had lots of time to waste. I find that students today have much less time than I did to just sit around and waste time thinking, talking, exchanging inane possibilities, weeding these out, joking, intellectually playing, etc. There is less unorganized intellectual life in departments than there used to be. It's a little like what one finds with parents and play dates. It used to be that kids just got together and played. Now, they also play, but the activities are organized, supervised, scheduled, etc. So we have plenty of lab meetings, mentoring sessions, talks, colloquia, conferences, but relatively little time for just chat. There is a perception that just sitting around and talking is wasted time, impeding productive research and serious inquiry. I agree that it is not always serious, but that is exactly why it is so important.

There is one other reason for the changed academic atmosphere: counting lines on a CV is easy. And it is also "fair" in that it removes judgment. Absent mechanical procedures like counting CV lines, we can only evaluate one another's work using judgment. And so long as not everyone who wants one gets an academic job, we will have to evaluate each other and make choices. Judgment is messy. Life is easier if all we need to do is count. The slow prof notes that counting comes with its own costs, and serious thinking might be one of them.

Is there anything we can do? Not sure. It's easy to say slow down and take time to think. However, given the realities, this is not a luxury the untenured can afford. Nor, given the realities, is it something that the unemployed can afford. Moreover, it seems to be something that not even the comfortably tenured can enjoy.

Last point: when I first entered academic life many depts had individuals who were considered very valuable but who did not publish much. They talked to everyone, spurred debates, stirred the intellectual pots, read a lot, kibitzed and more. Moreover, and this is important, they were very highly valued academic citizens. Such people are today unhirable and certainly untenurable. Too bad.

Grad school?

Color me sheepish. I misspelled Rachael Tatman's name. I left out the  'a'. I am sorry. I will now correct it but this is an apology.

Rachael Tatman has written a very thoughtful post on going to grad school in linguistics (here). She is currently a ling grad student and her post considers the question of whether it is worth going to grad school in linguistics given the dim prospects of landing a tenure track job in an academic linguistics department. As she points out, the odds are stacked against this possibility, so going to grad school is akin to buying a lottery ticket if one’s hope is for a permanent academic appointment that can sustain a semi-decent standard of living.  I have a couple of comments on her piece, but I urge you to take a look. It is an excellent post. Some comments.

First, this horrid job market is a long-standing problem. It occurred when I was looking for work in the early 1980s as well. I recall that at that time grad schools would send out letters with acceptances noting the paucity of academic jobs while all the while noting the intellectual stimulation that grad school would provide. If anything, things have gotten worse. This is especially so given that there are many part time/adjunct jobs that often pay miserably, have no benefits and give the illusion that something better might crop up. For many this never pans out. So, things stink now in a different way than they were lousy in my time.

The paucity of jobs has a second effect, one that I think we might be able to mitigate somewhat. If you leave academia then you are generally also leaving the discipline. This need not be so, but it is. One could imagine dedicated non-academic linguists still enjoying a professional association with linguistics. For example, they could be affiliated with departments even if not paid by them, and they would be welcome at conferences and workshops etc. I don’t know how many would partake, but the possibility of not leaving linguistics when not getting a job might be attractive to some who really are doing linguistics because they love the work. For many, it’s the issues and research that are the most attractive feature, and this need not become impossible to do in the absence of a paid academic position. However, right now, it seems to me, there is really no place for the dedicated amateur (i.e. non paid professional).

Third, I cannot tell whether Rachael is suggesting this or not, but one way of making life less distressing is simply to not admit as many grad students to begin with. I personally do not like this option, though most university administrators do. I don’t like it because it ends up seeing graduate education as only instrumentally valuable. What’s good is what trains you for a job. But I don’t see education’s virtues in this way. The work is intellectually interesting and intrinsically rewarding. The possibility of doing it should be up to the individual, though grad admissions should make clear that while the work can be rewarding, the job prospects are tough.

But, the lousy situation does put more responsibility on grad depts. They (we) should do their best to prevent students from going deeply into hock, i.e. decent stipends should be routine. We should help students do the rewarding stuff well (write papers, go to conferences, provide feedback etc.). We should try to make the grad years really intellectually fulfilling.

We should also encourage MA+PhD degrees, where the MA might lead to employment. Rachael discusses this insightfully too. I agree with her.

Last point, again reiterating a point that Rachael makes: don’t go to grad school unless you really like doing the work. It is hard and frustrating and need not lead anywhere career wise. Go if you like the problems and you like doing research on problems for which there are, as yet, no explanations. Many very smart people don’t like the unsettled nature of basic research. They don’t like working on problems for which there is no back of the book to glance at to find the “right” answer. If this is not your cup of tea, don’t go to grad school. Rachael has made a strong case that going to grad school in linguistics is not a smart career choice. The only good reason to go is the intellectual allure. If this suits you, it’s a great 4-5 years whatever else happens. If not, don’t!

Thursday, May 12, 2016

Is the scientific sky falling?

It seems that everywhere you look science is collapsing. I hope my tongue was visibly in cheek as you read that first sentence. There is currently a fad deploring the irreplicability of experiments in various fields. Here’s a lamentation I read lately, the following being the money line:

The deeper problem is that much of cancer research in the lab—maybe even most of it—simply can’t be trusted. The data are corrupt. The findings are unstable. The science doesn’t work.

Why? Because there is “a replication crisis in biomedicine.”

I am actually skeptical about the claims that the scientific sky is falling. But before I get to that, I have to admit to a bit of schadenfreude. Compared to what we see in large parts of psychology and neuroscience and, if the above is correct, biomedicine, linguistic “data” is amazingly robust and stable. It is easy to get, easy to vet, and easy to replicate.[1] There is no “data” problem in linguistics analogous to what we are hearing exists in the other domains of inquiry. And it is worth thinking about why this is. Here’s my view.

First, FL is a robust mental organ. What I mean by this is that Gs tend to have a large effect on acceptability, and acceptability is something that native speakers are able (or can be trained) to judge reliably. This is a big deal. Linguists are lucky in this way. There are occasional problems inferring Gish properties from acceptability judgments, and we ought not to confuse grammaticality with acceptability. However, as a matter of fact, the two often swing in tandem and the contribution of grammaticality to acceptability is very often quite large. This need not have been true, but it appears that it is.

We should be appropriately amazed by this. Many things go into an acceptability judgment. However, it is hard to swamp the G factor. This is almost certainly a reflection of the modular nature of FL and Gish knowledge. Gishness doesn’t give a hoot for the many contextual factors involved in language use. Context matters little, coherence matters little, ease of processing matters little. What really matters is formal kashrut. So when contextual/performance factors do affect acceptability, as they do, they don’t wipe out the effects of G.

Some advice: When you are inclined to think otherwise repeat to yourself colorless green ideas sleep furiously or recall that instinctively eagles that fly swim puts the instinct with the obviously wrong eagle trait. Gs aren’t trumped by sense or pragmatic naturalness and because of this we linguists can use very cheap and dirty methods to get reliable data, in many domains of interest.[2]

So, we are lucky and we do not have a data problem. However, putting my gloating aside, let’s return to the data crises in science. Let me make three points.

First, experiments are always hard. They involve lots of tacit knowledge on the part of the experimenters. Much of this knowledge cannot be written down in notebooks and is part of what it is to get an experiment to run right (see here). It is not surprising that this knowledge gets easily lost and that redoing experiments from long ago becomes challenging (as the Slate piece makes clear). This need not imply sloppiness or methodological sloth or corruption. Lab notes do not (and likely cannot) record important intangibles, or, if they do, they don’t do so well. Experiments are performances and, as we all know, a score does not record every detail of how to perform a piece. So, even in the best case, experiments, at least complex ones, will be hard to replicate, especially after some time has passed.

Second, IMO, much of the brouhaha occurs in areas where we have confused experiments relevant to science with those relevant to engineering. Science experiments are aimed at isolating basic underlying causal factors. They are not designed to produce a useful product. In fact, they are not engineering at all, for they generally abstract from precisely those problems that are the most interesting engineering-wise. Here's a nice quote from Thornton Fry, once head of Bell Labs' math unit:

The mathematician tends to idealize any situation with which he is confronted. His gases are “ideal,” his conductors “perfect,” his surfaces “smooth.” He calls this “getting down to the essentials.” The engineer is likely to dub it “ignoring the facts.”

Science experiments generally investigate the properties of these ideal objects and do not worry about the fine details that the engineer would rightly worry about. This becomes a problem when the findings become interesting from an engineering point of view. Here’s the Slate piece:

When cancer research does get tested, it’s almost always by a private research lab. Pharmaceutical and biotech businesses have the money and incentive to proceed—but these companies mostly keep their findings to themselves. (That’s another break in the feedback loop of self-correction.) In 2012, the former head of cancer research at Amgen, Glenn Begley, brought wide attention to this issue when he decided to go public with his findings in a piece for Nature. Over a 10-year stretch, he said, Amgen’s scientists had tried to replicate the findings of 53 “landmark” studies in cancer biology. Just six of them came up with positive results.

I am not trying to suggest that being replicable is a bad idea, but I am suggesting that what counts as a good experiment for scientific purposes might not be one that suffices for engineering purposes. Thus, I would not be at all surprised that there is a much smaller replication crisis in molecular or cell biology than there is in biomedicine, the former further removed from the engineering “promise” of bioscience than the latter. If this is correct, then part of the problem we see might be attributed to the NIH and NSF’s insistence that science pay off (“bench to bed” requirements). Here, IMO, is one of the less positive consequences of the “wider impact” sections of contemporary grants.

Third, at least in some areas, the problem of replication really is a problem of ignorance. When you know very little, an experiment can be very fragile. We try to mitigate the fragility by statistical massaging, but ignorance makes it hard to know what to control for. IMO, domains where we find replicability problems look like domains where our knowledge of the true causal structures is very spotty. This is certainly true of large parts of psychology. It strikes me that the same might hold in biomedicine (medicine being as much an art as a science as anyone who has visited a doctor likely knows). To repeat Eddington’s dictum: never trust an experiment until it’s been verified by theory! Theory poor domains will also be experimentally fragile ones. This does not mean that science is in trouble. It means that not everything we call a science really is.

Let me repeat this point more vigorously: there is a tendency to identify science with certain techniques of investigation: experiments, stats, controls, design etc. But this does not science make. The real sciences are not distinguished by their techniques but are domains where, for some happy reason, we have identified the right idealizations to investigate. Real science arises when our idealizations gain empirical purchase, when they fit. Thinking these up, moreover, is very hard for any number of reasons. Here is one: idealizations rely on abstractions, and some domains lend themselves to abstraction more easily than others. Thus some domains will be more scientifically successful than others. Experiments work and are useful when we have some idea of where the causal joints are, and this comes from correctly conceiving of the problems our experiments are constructed to address. Sadly, in most domains of interest, we know little and it should be no surprise that when you don’t know much you can be easily misled even if you are careful.

Let me put this another way: there is a cargo cult conception of science that the end-of-science lamentations seem to presuppose. Do experiments, stats, controls, be careful etc. and knowledge will come. Science on this view is the careful accumulation and vetting of data. Get the data right and the science will take care of itself. It lives very comfortably with an Empiricist conception of knowledge. IMO, it is wrong. Science arises when we manage to get the problem right. Then these techniques (and they are important) gain traction. We then understand what experiments are telling us. The lamentations we are seeing routinely now about the collapse of science have less to do with the real thing than with our misguided conception of what the enterprise consists in. It is a reflection of the overwhelming dominance of Empiricist ideology, which, at bottom, comes down to the belief that insight is just a matter of more and more factual detail. The modern twist on this is that though one fact might not speak for itself, lots and lots of them do (hence the appeal of big data). What we are finding is that there is no real substitute for insight and thought. This might be unwelcome news to many, but that’s the way it is and always will be. The “crisis” is largely a product of the fact that for most domains of interest we have very little idea about what’s going on, and urging more careful attention to experimental detail will not be able to finesse this.

[1] Again see the work by Jon Sprouse, Diogo Almeida and colleagues on this. The take home message from their work is that what we always thought to be reliable data is in fact reliable data and that our methods of collecting it are largely fine.
[2] This is why stats for data collection is not generally required (or useful). I read a nice quote from Rutherford: “If your experiment needs statistics, you ought to have done a better experiment.” You draw the inference.

Sunday, May 8, 2016

Simplicity and Ockham

Minimalists are moved by simplicity. But what is it that moves us and are we right to be so moved? What makes a hypothesis simple and why is simpler better? What makes a svelte G or a minimal UG better than its more rococo cousins? Here is a little discussion by Elliot Sober, reprising some of the main themes of a new book on Ockham and his many Razors (here). It makes for some interesting reading. Here are a few comments.

Sober’s big point is that simplicity is not an everywhere virtue. We know this already when it comes to art, where complicated and ornate need not mean poor. However, as Sober notes, unless one is a theist, the scientific virtues of simplicity need defending (Newton, for example, defends simple theories by grounding them in the “perfection of God’s works,” not a form of argument that would be that popular today)[1]. As he puts it, how deep a razor cuts “depends on empirical assumptions about the problem.”

I mention this because “simplicity” and Ockham have been important in minimalist discussion, and this suggests that arguing for one or another position based on simplicity is ultimately an empirical argument. Therefore, identifying the (implicit) empirical assumptions that license various simplicity claims is important. Sober discusses three useful versions.

The first of Ockham’s Razors rests on the claim that simpler theories are often empirically more probable. Thus, for example, if you can attribute a phenomenon to a mundane cause rather than an exotic one, go for the common one. Why? Because common causes are common and hence more likely. Sober describes this as “avoid chasing zebras.”

This form of argument occurs quite a lot in linguistic practice.  Here’s one personal example. In my experience, linguists love to promote the distinctiveness of the non-English language they are expert in. One of the ways that this is done is by isolating novel looking phenomena and providing them with novel looking analyses. Here is an example.

There is a phenomenon of switch reference (SR) wherein the subject of an embedded or adjunct clause is (or is not) marked as coreferential with the matrix one. SR is generally found in what I would call more “exotic” languages.[2] Thus, for example, English is not generally analyzed as an SR language. But why not? We find cases where subjects of non-matrix clauses are either controlled or obviative wrt higher subjects (e.g. John1 left the party without PRO1/*him1 kissing Mary or John1 would prefer PRO1/for *him1 to leave). When there is a PRO, the non-matrix subject must be coreferential with the matrix subject, and if there is a pronoun, it must be obviative. The English data are typical instances of control. Control phenomena are well studied and common and, so, not particularly recondite. Ockham would suggest treating SR as an instance of control if possible, rather than something special in these “exotic” languages. However, historically, this is not how things have played out. Rather than reduce the “exotic” to the linguistically “common,” analyses have treated SR as a phenomenon apart. All things being equal, Ockham would argue against this move. Don’t go exotic unless absolutely forced to, and even then only very reluctantly.

Consider now a second razor: all the lights in the house go out. Two explanations: each light bulb burned out vs. the house lost power. Both explain why the lights are out. However, the single-cause account is preferable. Why? Here’s Sober (7): “Postulating a single common cause is more parsimonious than postulating a large number of independent, separate causes.”
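To see why the razor has an empirical backing here, the comparison can be made numerical. The sketch below uses invented numbers (the failure rates, the bulb count, and the independence assumption are all mine, chosen purely for illustration): even a rare power outage is overwhelmingly more probable than ten independent bulb failures coinciding.

```python
# Toy, invented numbers: per-night chance that a single bulb burns out,
# and per-night chance that the whole house loses power.
p_bulb = 0.01
p_outage = 0.001
n_bulbs = 10

# If bulb failures are independent, the many-causes story requires
# all ten failures to coincide, so the probabilities multiply.
p_all_bulbs = p_bulb ** n_bulbs  # roughly 1e-20

# The single common cause wins by many orders of magnitude.
print(p_outage, p_all_bulbs)
```

The exact numbers do not matter; the point is that a conjunction of many independent causes shrinks multiplicatively, which is why the single-cause hypothesis is the default until proven faulty.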

Again, this form of simplicity argument is applicable to linguistic cases. For example, this reasoning underlies Koster’s 1984 argument for unifying A-chains, binding and obligatory control. I have personally found this simplicity argument very compelling (so compelling that I stole the idea and built on it in slightly altered form). Of course it could be that the parallelisms are adventitious. But a single cause is clearly the simpler hypothesis as it would explain why the shared features are shared. Is the simpler account also true? Well who knows? We cannot conclude that the simplest hypothesis is also the true one. We can only conclude that it is the default story, favored until proven faulty, and that we need good reasons to abandon it for a multi-causal account, which, we can see, will have no explanation for the overlapping properties of the “different” constructions.

There is one last razor Sober discusses: “parsimony is relevant to discussing how accurately a model will predict new observations” (8). Put simply, simple hypotheses benefit from not overfitting data. Conversely, the more parameters a theory has, the easier it is for unrepresentative data to mislead it.

This is related to another way that simplicity can matter. Simple theories are useful because they are lead-footed. They make predictions. The more subtle or supple a theory is, the more adjustable parameters it has, the more leeway it provides, and the less it says. Simple theories are blunt (and brittle), and even if they are wrong, they may not be very wrong. So, theories that cover given empirical ground more successfully may be paying a high predictive/explanatory price for this success.

Here is another way of making this point. The more supple a theory, the more data it can fit. And this is the problem. We want our theories to be brittle, and simple theories have less wiggle room. This is what allows them to make relatively clear predictions.
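Sober’s point about prediction can be made concrete with a toy numerical sketch (mine, not his): fit the same noisy data with a 2-parameter line and a 10-parameter polynomial, then compare how each predicts fresh observations. The data, polynomial degrees, and function names here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
# The truth is linear; the observations carry a little noise
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, size=10)

x_test = np.linspace(0.05, 0.95, 50)
y_test = 2.0 * x_test  # noise-free "new observations"

def mse(coeffs, x, y):
    # mean squared prediction error of a fitted polynomial
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

line = np.polyfit(x_train, y_train, 1)    # 2 adjustable parameters
poly9 = np.polyfit(x_train, y_train, 9)   # 10 parameters: interpolates the noise

# The supple model fits the training data better (essentially perfectly)...
assert mse(poly9, x_train, y_train) <= mse(line, x_train, y_train)
# ...while the blunt model typically predicts the fresh points better
print(mse(line, x_test, y_test), mse(poly9, x_test, y_test))
```

The flexible model absorbs the training points exactly, which is precisely why stray noise leads it astray on new data; the lead-footed linear model is wrong in detail but not very wrong.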

Sober ends his short piece by noting that simplicity needs to be empirically grounded. Put another way, there is no a priori notion of simplicity; it is somewhat indexical. So when we talk simplicity, it is in a certain context of inquiry. I say this because Ockham has come to play an ever larger role in modern syntactic theory in the context of the minimalist program. However, unfortunately, it is not always clear in what way simplicity is to be understood in this context. Sometimes the claim seems to be that some stories should be favored because the concepts they deploy are “simpler” than those being opposed (e.g. sets are simpler than trees); sometimes the claim is that more general theories are to be preferred to those which assume the same mechanism but with some constraints (e.g. the most general conception of merge reduces E- and I-merge to the same basic operation, the implicit claim being that the most general is the simplest); and sometimes it is argued that the simplest operations are the computationally optimal ones (e.g. merge plus inclusiveness plus extension is simpler than any other conception of merge). Whatever the virtues of these claims, they do not appear to be of the standard Ockham’s Razor variety. Let me end with one example that has exercised me for a while.

Chomsky has argued that treating displacement as an instance of merge (I-merge) is simpler than treating it as the combination of merge plus copy. The argument seems to be that there is no “need” for the copy operation once one adopts the simplest conception of merge. The Ockham Razor argument might go as follows: everyone needs an operation that puts two separate expressions together. The simplest version of that operation also has the wherewithal to represent displacement. Hence a theory that assumes a copy operation in addition to this conception of merge is adding a superfluous operation. Put differently, Merge+Copy does no more than Merge alone, and so Ockham prefers the latter.

But do the two theories adopt the exact same merge operation? Not obviously, at least to me. Merge in the Copy+Merge theory can range over roots alone (call this Merge1). Merge in the “simpler” theory (call this Merge2) must range over roots and non-roots. Is one domain “simpler” than another? I have no idea. But it seems at least an open question whether having a larger domain makes an operation simpler than one that has a more restricted domain. Question: Is addition ranging over the integers more “complex” than addition ranging over the rationals? Beats me.
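The domain difference is easy to see in a toy encoding (an illustrative sketch only, assuming unordered sets as syntactic objects; the function names are mine, not anything in the literature):

```python
def merge(a, b):
    # Merge forms the unordered set {a, b}
    return frozenset([a, b])

def subterms(x):
    # x together with everything it contains, recursively
    out = {x}
    if isinstance(x, frozenset):
        for y in x:
            out |= subterms(y)
    return out

dp = merge("the", "dog")   # external merge: two roots (within Merge1's domain)
vp = merge("saw", dp)      # another external merge of two roots
# Internal merge: the second argument is a subterm of the first, so this
# step needs Merge2's wider domain (roots AND non-roots)
moved = merge(vp, dp)      # dp now "occurs twice": displacement
```

In a Merge1+Copy theory the last step would instead be something like merge(vp, copy(dp)), with Copy supplying the non-root argument; the question in the text is whether widening Merge’s domain is “simpler” than adding Copy.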

One might also argue that Merge2 should be preferred because it allows for one fewer operation in FL (i.e. it does not have or need the Copy operation). However, how serious an objection is this (putting aside whether Merge2 is simpler than Merge1)? Here’s what I mean.

Here is a line of argument: at bottom, simplicity in UG operations matters in minimalism because we assume that the evolutionary emergence of simple structures/operations is easier to explain than the emergence of complexity.  The latter requires selection and selection requires time, often lots of time. Thus, if we assume that Merge is the linguistically distinctive special sauce that allowed for the emergence of FL, then we want merge to be simple so that its emergence is explicable. We also want the emergence of FL to bring with it both structure building and displacement operations. So, the emergence of Merge should bring with it hierarchical structure building plus displacement. And postulating Merge2 as the evolutionary innovation suffices to deliver this.

How about if we understand merge along the lines of Merge1? Then to get displacement we need Copy in addition to Merge1. Doesn’t adding Copy as a basic operation add to the evolutionary problem of explaining the emergence of structured hierarchy with displacement? Not necessarily. It all depends on whether the copy operation is linguistically proprietary. If it is, then its emergence needs explanation. However, if Copy is a generic cognitive operation, one that our pre-linguistic ancestors had, then Copy comes for free and we do not need Merge2 to explain how displacement arose in FL. It should arise if we add Merge1, given that Copy is already an available operation. So, from the perspective of Darwin’s Problem, there is no obvious sense in which Merge2 is simpler than Merge1. It all really depends on the pre-linguistic cognitive background.[3]

So that’s it. Sober’s essay (and book that it advertises (and that I am now reading)) is useful and interesting for the minimalistically inclined. Take a look.

[1] And not only because we are no longer theistically inclined. After all, why does God prefer simple theories to complex ones? I love Rube Goldberg devices. Some even have a profound taste for the complicated. For example, Peter Gay says of Thomas Mann: “Mann did not like to be simple if it was at all possible to be complicated.” So, invoking the deity’s preferences can only get one so far (unless, perhaps, one is Newton).
[2] These are tongue in cheek quotes.
[3] I should add that I am a fan of Merge2, though I once argued for the combo of Merge1 + Copy. However, my reason for opting for Merge2 is that it might explain something that is a problem for the combo theory (viz. why “movement” is target-oriented). This is not the place to go into this, however.