Faculty of Language: Occam

I should have been warned off of Aeon because of the Evans’ fiasco, but the magazine is not really all that culpable. Its content just reflects the stuff that is “out there” and the editors seem to largely act to make topical stuff available for easy reading in one place. This is how I came to read this. It just popped into my mailbox recently one morning and I couldn’t resist. It’s about the relation between truth and beauty, or more accurately, how and whether aesthetic judgments about a theory are or should be taken to be marks of its truth. The author Philip Ball thinks not. I’m not sure that I agree, but it got me thinking, so I thought I would try and write something about it. I’ve done this before (here) and some of what I say rehearses earlier themes. Here it is.

Mr. Ball notes that many scientists (Einstein, Dirac, Weyl (three real biggies), Greene and Arkani-Hamed are mentioned by name) have believed that theoretical beauty is a mark of truth. Some can be quoted as saying that theoretical beauty even trumps empirical verification. Mr. Ball takes these quotes at face value. Greene suggests that what Einstein and Weyl (the two culprits) intended is that beauty should count as having some evaluative power re truth but that “Ultimately, theories are judged by how they fare when faced with cold, hard, experimental facts” (p. 3 Ball quoting Greene).[1] I’m not sure what Einstein or Weyl meant. But I personally am interested in how beauty is relevant in theory evaluation, if it is relevant at all. This is, after all, a theme talked up in early minimalism, which I personally found useful. So is beauty linked to truth? And if so, how?

Before approaching these questions, it is worth trying to pin down what beauty is: what are the marks of theoretical beauty? As Ball notes, scientists themselves are all over the place on this. For some, it’s like porn to Supreme Court justices (viz. they know it when they see it). But Ball’s essay contains at least one attempt at a specification. Arkani-Hamed (AH) (he is, btw, an important young hot shot (here)) associates theoretical beauty with a certain kind of inevitability:

“There are very few principles and there’s no other possible way they could work once you understand them deeply enough” (p. 4, Ball quoting Arkani-Hamed).

Note three things about AH’s proposal: beauty is the combination of (i) simplicity of the Occam variety (few principles) coupled with (ii) a certain kind of modality (no other possible way they could work) and (iii) it is not surface visible in that it takes intellectual effort to perceive (“once you understand them deeply enough”). The third is of particular interest wrt Ball’s many comments on the issue, for much of his discussion, I believe, revolves around the fact that beauty is hard to discern, that there can be lots of disagreement and that it is elusive. All fair points, but not really immediately relevant to AH’s point. AH’s view seems to be that finding beauty requires hard intellectual work. It may be in the eye of the beholder, but only an eye that’s put in the hours to train itself to see properly. So beauty need not be (and generally won’t be) immediately evident on inspection. Consequently, the fact that what’s beautiful is not immediately discernible or that it is sensitive to cultural mores does not seem all that relevant to AH’s point. That said, let’s concentrate on points (i) and (ii): simple and inevitable.

Occam’s razor is now part of the accepted methodological wisdom. All things being equal (ATE), simpler theories are better than more complex ones. And how is “simple” measured? By the number of axioms or assumptions (how to enumerate these is not trivial but I will put that to one side). So a theory with three axioms trumps one with four and one with four beats one with five ATE. As Ball notes, things are seldom equal, so applying Occam in the wild is never easy. But it is a principle that all are ready to accept in some form, and we should ask why. Why is simpler better, or, more specifically, truer?

In the good old 16th and 17th centuries (if not earlier) there was theological justification for this assumption. We live in God’s universe governed by his/her laws. God is an elegant thinker and would never do in a complicated manner what could be done simply (Why not? Is God lazy?).[2] Thus, God’s character guarantees a universe governed by simple laws.[3]

Many of us find this justification hard to accept nowadays, but luckily for us, there are other ways of buying into this viewpoint. Thus, simpler theories are generally more epistemologically solid than are profligate ones. Think of a stool carrying a weight. If three legged, each leg supports 1/3 the load. If four legged, each supports 1/4. Take weight as evidence and legs as axioms and you can see that the fewer axioms there are, the more evidence bears on each one. Bayesians can even formalize this insight, and they have tended to make a big deal out of it. So, something like Occam is something even the theologically fussy can sign onto as a virtue of theory.
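For the formally inclined, here is a minimal sketch of one way the Bayesian version of this point is often illustrated. The coin-flip setup, the function names and the uniform prior are my assumptions for illustration, not anything from Ball's essay. A "simple" theory with no free parameters (the coin is fair) is compared with a more flexible one (the bias could be anything) on the same sequence of flips. The flexible theory has to spread its bets over every possible bias, so when the data look like what the simple theory predicts, the simple theory ends up better supported.

```python
from fractions import Fraction
from math import factorial


def evidence_fixed(heads, flips, p=Fraction(1, 2)):
    """Probability of one specific flip sequence under a theory with no
    free parameters: the chance of heads is fixed at p (hypothetical example)."""
    return p ** heads * (1 - p) ** (flips - heads)


def evidence_free(heads, flips):
    """Marginal likelihood of the same sequence when the bias t is a free
    parameter with a uniform prior on [0, 1] (an assumption):
    integral of t^k (1 - t)^(n - k) dt = k! (n - k)! / (n + 1)!"""
    return Fraction(factorial(heads) * factorial(flips - heads),
                    factorial(flips + 1))


def bayes_factor(heads, flips):
    """How strongly the data favor the simple theory over the flexible one.
    Values above 1 favor the simple theory, values below 1 the flexible one."""
    return evidence_fixed(heads, flips) / evidence_free(heads, flips)


if __name__ == "__main__":
    # Balanced data: the simple theory is favored.
    print(float(bayes_factor(5, 10)))
    # Lopsided data: the extra parameter earns its keep.
    print(float(bayes_factor(10, 10)))
```

Note that the preference is not unconditional: with ten heads out of ten the flexible theory wins, which is the "things are seldom equal" caveat in miniature.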

Importantly, for my purposes, this epistemological virtue carries metaphysical kudos. If we assume that theories that are better empirically supported are more likely to be true, then simpler theories ATE are more likely to be true. I’m not sure why we think that better supported theories are more likely to be true, but we do think this and this provides at least one link between theoretical simplicity and truth.

Of course, things are much, much more complicated. Comparing different theories can be very hard. Indeed, there is no general recipe for how to compare different axioms, definitions, primitives etc. And there seems to be no absolute measure of “simplicity,” or at least none that garners general assent. But, though there is no general measure, there are lots of local ones and the principle can and does have teeth in many places. Examples: why assume an aether if all goes well without it? Why assume two conceptions of mass if you can get away with one? And, most relevant to us: Why assume a syntactic theory with internal levels if they are not required empirically? Why assume three rule types if one suffices? We all know the drill, and a good one it is, for this kind of simplicity ties elegance together with a notion of “independent evidence,” and every scientist worth her/his salt loves independent evidence (consilience rules!!!).

So, we value simple theories as pointing towards truth and if simplicity is part of beauty, this partly explains why we value beauty if it can be had (which, as Ball notes, is not always clear given that things are seldom equal).

There is a further reason to value simple theories: they are easier to explore. Simple theories are often readily intelligible (as things with fewer moving parts often are). So, methodologically, if one’s aim is to find out what’s what then using an instrument (theory) that is understandable (simple) to probe things is ATE better than one that is opaque. Simple ideas are generally more manageable, so they enjoy a kind of methodological or epistemological advantage. But note, this does not imply that they are more often true, though perhaps (weasel word!) a case can be made that they are good ways of getting to truth. If so, beauty may not mark truth but it may be the best road to it.

There is still another way to link simplicity and truth: via explanation. Here’s what I have in mind. What’s the “ugliest” possible theory of anything? I would say it’s a list. Why? Because it has zero explanatory value.[4] The virtue of simple theories is that they seem to carry more explanatory oomph. And what better mark of a theory’s truth than its explanatory power? What we want out of a theory is not only that it cover the data, but that it cover it in such a way as to explain why the data is the way it is.

This locution, “why the data is the way it is” has two related sub-parts with slightly different foci. It means explaining not only why we have the data we happen to have, but also the data we will have and could have. And it means explaining why we don’t have other than the data we have (viz. why the data we don’t see is missing).[5] The main problem with a list, then, is that it does not tell you how to expand itself so as to include what it must and exclude what it must. A simple theory, being general, does. Simple theories can explain and part of being beautiful is having explanatory oomph, which modally relates to what is possible.[6]

Note the explanatory property of simple theories already introduces the modal feature of beauty that AH noted. Why X? Because X is the only way things could have been. Theories explain the actual in terms of the possible. This suggests the following thought: what makes a theory truly beautiful is that it perfectly explains the actual in terms of the possible. Or, put another way, in the best case, the fit between what is possible and what is actual is perfect. All that can be observed has been and all that has been observed can be. There is no spillover.[7]

A joke I’ve told before (see here) illustrates the logic, I think: why are there 1-hump camels and 2-hump camels but not 3,4…N-humped camels? Because there are concave camels and convex camels and that’s all there is. The “joke” illustrates how changing our conception of humps (look at the curves not the humps) allows us to exhaust the kinds of humps+valley combos we expect to find. Looking at the number of humps invites the question of why there are at most 2. Why 2, and not 3,4,…N? The stopping point seems arbitrary. Looking at the curves and seeing them as concave and convex seems to exhaust the space of options (simple curves being one or the other). And this restricted space also coincides with what we find. The possible hump+valley options perfectly fit the attested realizations (viz. 2 and only 2). Thus, once we think in terms of curves, it seems clear that things could not have been otherwise hump/camel-wise (but see comments to above link for problems with this “joke”).

Now, before I get lots of comments taking the “joke” apart (again, see earlier post), let me try to illustrate what I have in mind with a more relevant syntactic example (there are several others in the earlier post). Chomsky has famously argued that properly understood Merge yields both phrase structure and movement. What is the proper way to understand it? Well, Merge is an operation that takes two linguistic items A and B and puts them together (forms a set). What As and Bs? Chomsky says that there are two possibilities: (i) neither A nor B contains the other or (ii) one of A or B contains the other. In the first case, Merge(A,B) yields a constituent like {A,B}. In the other it yields a constituent like {B,{_A… B}}. Thus, if (i) and (ii) exhaust the options then it looks like every possible instance of a very simple combination rule (Merge) results in just the two products that we find prominently in Gs (viz. products of PS rules and movement rules). What makes Chomsky’s story attractive is that it looks like the two principal properties of Gs (hierarchy and displacement) follow exhaustively from the possible ways two inputs can fall under the rule. Either the to-be-combined form a part-whole relation or they do not. Thus, a simple exhaustive account of the options yields (all and only?) the attested structural dependencies.[8]
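The two-case logic can be made concrete with a toy sketch. To be clear, this is my own illustration and not Chomsky's formalization: the names `contains` and `merge`, the use of Python frozensets for syntactic objects, plain strings for lexical items, and the "external"/"internal" labels for cases (i) and (ii) are all assumptions made for the example. The point is only that a single set-forming operation sorts every pair of inputs into exactly one of the two cases.

```python
def contains(whole, part):
    """True if `part` occurs anywhere inside the set `whole`.
    Lexical items (plain strings here, by assumption) contain nothing."""
    if not isinstance(whole, frozenset):
        return False
    return any(member == part or contains(member, part) for member in whole)


def merge(a, b):
    """Form the set {a, b} and report which of the two cases applies.
    Hypothetical toy model: one operation, two possible input relations."""
    if contains(a, b) or contains(b, a):
        kind = "internal"  # case (ii): one input contains the other (movement)
    else:
        kind = "external"  # case (i): neither contains the other (phrase structure)
    return frozenset([a, b]), kind


if __name__ == "__main__":
    # Case (i): two independent items are combined.
    vp, kind_vp = merge("eat", "what")
    print(kind_vp)

    # Build a bit more structure on top.
    tp, kind_tp = merge("will", vp)
    print(kind_tp)

    # Case (ii): "what" is already inside tp, so re-merging it is displacement.
    cp, kind_cp = merge("what", tp)
    print(kind_cp)
```

Nothing in `merge` mentions movement or phrase structure as separate rule types; the distinction falls out of whether the containment check succeeds, which is the exhaustiveness the argument trades on.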

I don’t know about you, but if Chomsky is right here, then I consider this to be a very nice kind of account. Why do Gs have both PS rules and movement rules? Because the simplest conception of the combination operation has these two “kinds” of rules (and only these) as consequences.

There are other examples of this kind of thinking in the syntax literature as I discussed here. Nor is this restricted to syntax. It is also a staple of phonological feature theories, where the aim is to adumbrate all and only the possible linguistic sounds and, if I understand Heinz and Idsardi’s Science paper, the range of possible phonological processes.

I think that there is another way of describing what lends such stories AH’s feeling of inevitability. When properly framed, a theory serves to close off further questions. Here’s what I mean. A really satisfying account brings a kind of question closure with it. The concave/vex theory of camels explains why the question of 3 humped camels is, in some sense, ill-formed. Chomsky’s account of structure dependence discussed here not only removes linear processes from the grammar but explains why they could not have been there. Properly understood, such rules cannot be stated within the theory and that’s why they don’t exist. Such rules don’t merely fail to exist, they cannot exist for they cannot be stated. Thus, they are not merely contingently absent, they are necessarily absent. Indeed, when the theory is properly understood, one sees that their possibility is actually inconceivable. It’s this sense of theoretical beauty that AH’s remark pointed to, I am suggesting.

So, is beauty a mark of truth? I think so. The problem is that which conception of beauty is the right one is generally what is theoretically up for grabs. The theorist’s challenge is to provide beautiful accounts; simple stories that exhaustively and completely adumbrate the options that completely describe what in fact happens. Such stories are simple and exhaustive and hence beautiful. So, does beauty count? Sure. Is it the only virtue? No. But IMO it is one often worth sacrificing a few data points to. But you knew I would say that, right?

[1] I am not sure what Greene meant here. “Ultimately” is a very long time. As Keynes noted, ultimately (in the long run) we are all dead. I suspect what Greene is doing here is CYAing a bit. Among themselves, scientists like to talk about the intangibles that count towards theory evaluation. To the public, they like to appear hard-headed so that they can crap on the artsy-fartsy emotive types, especially those with religious or spiritual inclinations. Talking about experimental facts serves this aim well. In truth, everyone wants new data to back up interesting claims and everyone wants theories that are not clunky. How these different features get weighted at any particular time is an art, not the product of an algorithm. So, here Greene, a well-known purveyor of beauty in theory, is trying to cover his public posterior.

[2] In this regard God is the anti-Thomas Mann, of whom Peter Gay said: “Mann did not like to be simple if it was at all possible to be complicated.”

[3] I never quite understood why these thinkers thought they knew God’s aesthetic preferences or work habits. But, it seems that they could and did.

[4] For the statistically inclined compare histograms with the statistics that describe them. The former represents actual data points. It does not explain them in any way. The latter carries some explanatory force as it not only “accounts” for the histograms but (in the best case) describes where a possible data point can and cannot fall.

[5] This has an obvious relationship to the issue of negative data near and dear to a linguist’s heart.

[6] The link between explanatoriness and truth gets us into deep waters very quickly: why after all should the universe be comprehensible? Damn if I know. Descartes’s benevolent deity? Darwin? Dumb luck?

[7] Needless to say, this cannot be understood as saying that every possible instance has been observed, but rather that an instance of every type has been observed and no instance of an impossible type has been.

[8] I am not sure whether this argument is in fact accurate. Prima facie, there are more kinds of grammatical rules (e.g. construal, agreement, deletion). My own view is that the right next step is to try and unify construal, etc. with movement in some way. Doing this would then derive all possible G operations from Merge as Chomsky wants to do. I believe, though, that my desires here are idiosyncratic outliers theoretically. Still, unless this is done, Chomsky’s “derivation” is not complete or exhaustive and so, not “perfect.”


Thursday, January 15, 2015

Truth and beauty
