Thursday, September 28, 2017

Physics envy and the dream of an interpretable theory

I have long believed that physics envy is an excellent foundation for linguistic inquiry (see here). Why? Because physics is the paradigmatic science. Hence, if it is ok to do something there, it’s ok to do it anywhere else in the sciences (including the cog-neuro (CN) sciences, linguistics among them), and if a suggested methodological precept fails for physics, then others (including CNers) have every right to treat it with disdain. Here’s a useful prophylactic against methodological sadists: Try your methodological dicta out on physics before you encumber the rest of us with them. Down with methodological dualism!

However, my envy goes further: I have often looked to (popular) discussions about hot topics in physical theory to fuel my own speculations. And recently I ran across a stimulating, suggestive piece about how some are trying to rebuild quantum theory from the ground up using simple physical principles (QTFSPP) (here). The discussion is interesting for me in that it leads to a plausible suggestion for how to enrich minimalist practice. Let me elaborate.

The consensus opinion among physicists is that nobody really understands quantum mechanics (QM). Feynman is alleged to have said that anyone who claims to understand it, doesn’t. And though he appears not to have said exactly this (see here, section 9), it's a widely shared sentiment. Nonetheless, QM (or the Standard Theory) is, apparently, the most empirically successful theory ever devised. So, we have a theory that works yet we have no real clarity as to why it works. Some (IMO, rightly) find this a challenge. In response they have decided to reconstitute QM on new foundations. Interestingly, what is described are efforts to recapture the main effects of QM within theories with more natural starting points/axioms. The aim, in other words, is reminiscent of the Minimalist Program (MP): construct theories that have the characteristic signature properties of QM but are grounded in more interpretable axioms. What’s this mean? First let’s take a peek at a couple of examples from the article and then return to MP.

A prominent contrast within physics is between QM and Relativity. The latter (the piece mentions special relativity) is based on two fundamental principles that are easy to understand and from which all the weird and wonderful effects of relativity follow. The two principles are: (1) the speed of light is constant and (2) the laws of physics are the same for two observers moving at constant speed relative to one another (or, no frame of reference is privileged when it comes to doing physics). Grant these two principles and the rest follows. As QTFSPP puts it: “Not only are the axioms simple, but we can see at once what they mean in physical terms” (my emphasis, NH) (5).

Standard theories of QM fail to be physically perspicuous, and the aim of the reconstructionists is to remedy this by finding principles to ground QM that are as natural and physically transparent as those Einstein found for special relativity. The proposals are fascinating. Here are a couple:

One theorist, Lucien Hardy, proposed focusing on “the probabilities that relate the possible states of a system with the chance of observing each state in a measurement” (6). The proposal consists of a set of probabilistic rules about “how systems can carry information and how they can be combined and interconverted” (7). The claim was that “the simplest possible theory to describe such systems is quantum mechanics, with all its characteristic phenomena such as wavelike interference and entanglement…” (8). Can any MPer fail to reverberate to the phrase “the simplest possible theory”? At any rate, on this approach, QM is fundamentally probabilistic, and the way probabilities mediate the conversion between states of the system is taken as the basis of the theory. I cannot say that I understand what this entails, but I think I get the general idea and how, if this were to work, it would serve to explain why QM has some of the odd properties it does.

Another reconstruction takes three basic principles to generate a theory of QM. Here’s QTFSPP quoting a physicist named Jacques Pienaar: “Loosely speaking, their principles state that information should be localized in space and time, that systems should be able to encode information about each other, and that every process should be in principle reversible, so that information is conserved.” Apparently, these assumptions, suitably formalized, lead to theories with “all the familiar quantum behaviors, such as superposition and entanglement.” Pienaar identifies what makes these axioms reasonable/interpretable: “They all pertain directly to the elements of human experience, namely what real experimenters ought to be able to do with systems in their laboratories…” So, specifying conditions on what experimenters can do in their labs leads to systems of data that look QMish. Again, the principles, if correct, rationalize the standard QM effects that we see. Good.

QTFSPP goes over other attempts to ground QM in interpretable axioms. Frankly, I can only follow this, if at all, impressionistically, as the details are all quite above my capacities. However, I like the idea. I like the idea of looking for basic axioms that are interpretable (i.e. whose (physical) meaning we can immediately grasp), not merely compact. I want my starting points to make sense too. I want axioms that make sense computationally, whose meaning I can immediately grasp in computational terms. Why? Because I think that our best theories have what Steven Weinberg described as a kind of inevitability, and they have this in virtue of having interpretable foundations. Here’s a quote (see here and links provided there):

…there are explanations and explanations. We should not be satisfied with a theory that explains the Standard Model in terms of something complicated and arbitrary…To qualify as an explanation, a fundamental theory has to be simple- not necessarily a few short equations, but equations that are based on a simple physical principle…And the theory has to be compelling- it has to give us the feeling that it could scarcely be different from what it is.

Sensible interpretable axioms are the source of this compulsion. We want first principles that meet the Wheeler T-shirt criteria (after John Wheeler): they make sense and are simple enough to be stated “in one simple sentence that the non sophisticate could understand” (or, more likely, a few simple sentences). So, with this in mind, what about fundamental starting points for MP accounts? What might these look like?

Well, first, they will not look like the principles of GB. IMO, these principles (more or less) “work,” but they are just too complicated to be fundamental. That’s why GB lacks Weinberg’s inevitability. In fact, it takes little effort to imagine how GB could “be different.” The central problem with GB principles is that they are ad hoc and have the shape they do precisely because the data happens to have the shape it does. Put differently, were the facts different we could rejigger the principles so that they would come to mirror those facts and not be in any other way the worse off for that. In this regard, GB shares the problem QTFSPP identifies with current QM: “It’s a complex framework, but it’s also an ad hoc patchwork, lacking any obvious physical interpretation or justification” (5).

So, GB can’t be fundamental because it is too much of a hodgepodge. But, as I noted, it works pretty well (IMO, very well actually, though no doubt others would disagree). This is precisely what makes the MP project to develop a simple natural theory with a specified kind of output (viz. a theory with the properties that GB describes) worthwhile.

Ok, given this kind of GB reconstruction project, what kinds of starting points would fit?  I am about to go out on a limb here (fortunately, the fall, when it happens, will not be from a great height!) and suggest a few that I find congenial.

First, the fundamental principle of grammar (FPG)[1]: There is no grammatical action at a distance. What this means is that for two expressions A and B to grammatically interact, they must form a unit. You can see where this is going, I bet: for A and B to G interact, they must Merge.[2]

Second, Merge is the simplest possible operation that unitizes expressions. One way of thinking of this is that all Merge does is make A and B, which were heretofore separate, into a unit. Negatively, this implies that it in no way changes A and B in making them a unit, and does nothing more than make them a unit (e.g. it imposes no order on A and B, as this would be doing more than unitizing them). One can represent this formally by saying that Merge takes A, B and forms the set {A,B}, but this is not because Merge is a set-forming operation, but because sets are the kinds of objects that do nothing more than unitize the objects that form the set. They don’t order the elements or change them in any way. Treating Merge(A,B) as creating the leaves of a Calder mobile would have the same effect, and so we can say that Merge forms C-mobiles just as well as we can say that it forms sets. At any rate, it is plausible that Merge so conceived is indeed as simple a unitizing operation as can be imagined.
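To make the unitizing point concrete, here is a minimal sketch (my own illustration, not anything proposed in the piece) of Merge as an operation that does nothing but group its two inputs. Python’s frozenset is used only because it is unordered and leaves its members untouched, which is exactly the Calder-mobile point; any object with those properties would do as well.

    # A minimal sketch (illustrative only) of Merge as a bare unitizing
    # operation: it groups A and B without ordering or altering them.

    def merge(a, b):
        """Unitize a and b: no order, no change to either element."""
        return frozenset({a, b})

    # Merging two lexical atoms yields an unordered unit:
    the_dog = merge("the", "dog")
    assert the_dog == merge("dog", "the")   # order plays no role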

Third, Merge is closed in the domain of its application (i.e. its domain and range are the same). Note that this implies that the outputs of Merge must be analogous to lexical atoms in some sense, given the ineluctable assumption that all Merges begin with lexical atoms. The problem is that unitized lexical atoms (the “set”-like outputs of Merge) are not themselves lexical atoms, and so unless we say something more, Merge is not closed. So, how to close it? By mapping the Merged unit back to one of the elements Merged in composing it. So if we map {A,B} back to A or to B we will have closed the operation in the domain of the primitive atoms. Note that by doing this, we will, in effect, have formed an equivalence class of expressions with the modulus being the lexical atoms. Note that this, in effect, gives us labels (oh nooooo!), or labeled units (aka constituents), and endorses an endocentric view of labels. Indeed, closing Merge via labeling in effect creates equivalence classes of expressions centered on the lexical atoms (and more abstract classes if the atoms themselves form higher order classes). Interestingly (at least to me), so closing Merge allows for labeled objects of unbounded hierarchical complexity.[3]
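Here is a hedged sketch of what closing Merge by labeling might look like. The Labeled record, the label_of helper, and the caller-supplied head are my own illustrative choices, not a proposal from the post; the only point is that once a merged unit is mapped back to one of its atoms, it falls into the same equivalence class as that atom and can feed further Merges, giving unbounded labeled hierarchy.

    from typing import NamedTuple, Union

    Atom = str

    class Labeled(NamedTuple):
        label: Atom          # the atom this unit is mapped back to
        parts: frozenset     # the bare, unordered output of Merge

    SynObj = Union[Atom, Labeled]

    def label_of(x: SynObj) -> Atom:
        """A lexical atom is its own label; a merged unit carries one."""
        return x if isinstance(x, str) else x.label

    def merge(a: SynObj, b: SynObj, head: SynObj) -> Labeled:
        """Unitize a and b, then close the operation by mapping the unit
        back to the label of one of its parts (the endocentric 'head')."""
        assert head == a or head == b
        return Labeled(label_of(head), frozenset({a, b}))

    # {the, dog} maps back to 'the'; that unit then merges with 'saw',
    # mapping back to 'saw', and so on without bound.
    dp = merge("the", "dog", head="the")
    vp = merge("saw", dp, head="saw")
    assert label_of(vp) == "saw"   # the complex unit behaves like its atom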

These three principles seem computationally natural. The first imposes a kind of strict locality condition on G interactions. E- and I-merge adhere to it (and do so strictly given labels). Merge is a simple, very simple, combination operation, and closure is a nice natural property for formal systems of (arbitrarily complex) “equations” to have. That they combine to yield unbounded hierarchically structured objects of the right kind (I’ve discussed this before, see here and here) is good, as this is what we have been aiming for. Are the principles natural and simple? I think so (at least from a kind of natural computation point of view), but I would, wouldn’t I? At any rate, here’s a stab at what interpretable axioms might look like. I doubt that they are unique, but I don’t really care if they aren’t. The goal is to add interpretability to the demands we make on theory, not to insist that there is only one way to understand things.

Nor do we have to stop here. Other simple computational principles include things like the following: (i) shorter dependencies are preferred to longer dependencies (minimality?), (ii) bounded computation is preferred to unbounded computation (phases?), (iii) all features are created equal (the way you discharge/check one is the way you discharge/check all). The idea is then to see how much you get starting from these simple, transparent, and computationally natural first principles. If one could derive GBish FLs from this, then it would, IMO, go some way towards providing a sense that the way FL is constructed and its myriad apparent complexities are not complexities at all but the unfolding of a simple system adhering to natural computational strictures (snowflakes anyone?). That, at least, is the dream.
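For (i), a toy illustration (again mine, not an analysis from the post): if a dependency-forming operation simply relates an element to the closest item bearing the relevant feature, then shorter dependencies win by construction, and minimality-style intervention effects fall out for free.

    # Toy sketch of principle (i): a dependency is formed with the closest
    # element bearing the relevant feature, so a shorter dependency always
    # pre-empts a longer one.

    def closest_dependent(feature, candidates):
        """candidates: (item, features) pairs ordered by increasing distance.
        Return the nearest item carrying the feature, or None."""
        for item, features in candidates:
            if feature in features:
                return item
        return None

    candidates = [("John", {"phi"}), ("Mary", {"phi"})]
    assert closest_dependent("phi", candidates) == "John"   # the nearer one wins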

I will end here. I am still in the middle of a pleasant reverie, having mesmerized myself with this picture. I doubt that others will be as enthralled, but that is not the real point. I think that looking for general interpretable principles on which to found grammatical theory makes sense, and that it should be part of any theoretical project. I think that trying to derive the “laws” of GB is the right kind of empirical target. Physics envy prompts this kind of search. Another good reason, IMO, to cultivate it.



[1] I could have said, the central dogma of syntax, but refrained. I have used FPG in talks to great (and hilarious) effect.
[2] Note that this has the pleasant effect of making AGREE (and probe-goal architectures in general) illicit G operations. Good!
[3] This is not the place to go into this, but the analogy to clock arithmetic is useful. Here too, via the notion of equivalence classes, it is possible to extend operations defined for some finite base of expressions (1-12) to any number. I would love to be able to say that this is the only feasible way of closing a finite domain, but I doubt that this is so. The other suspects, however, are clearly linguistically untenable (e.g. mapping any unit to a constant, mapping any unit randomly to some other atom). Maybe there is a nice principle (statable in one simple sentence) that would rule these out.
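For what it’s worth, here is a minimal sketch of the clock-arithmetic analogy as I read it: addition is defined over the finite base 1-12 and closed by folding every result back into that base, much as labeling folds a merged unit back onto one of the lexical atoms.

    # Clock-arithmetic analogy: the operation stays closed over the finite
    # base 1-12 by mapping every result back into that base (mod 12, with
    # 12 standing in for 0).

    def clock_add(a, b):
        """Add two clock values and fold the result back into 1-12."""
        r = (a + b) % 12
        return 12 if r == 0 else r

    assert clock_add(9, 5) == 2     # 14 o'clock is 2 o'clock
    assert clock_add(7, 5) == 12    # the result stays within the base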