Thursday, September 28, 2017

Physics envy and the dream of an interpretable theory

I have long believed that physics envy is an excellent foundation for linguistic inquiry (see here). Why? Because physics is the paradigmatic science. Hence, if it is ok to do something there, it's ok to do it anywhere else in the sciences (including the cog-neuro (CN) sciences, linguistics among them), and if a suggested methodological precept fails for physics, then others (including CNers) have every right to treat it with disdain. Here's a useful prophylactic against methodological sadists: try your methodological dicta out on physics before you encumber the rest of us with them. Down with methodological dualism!

However, my envy goes further: I have often looked to (popular) discussions about hot topics in physical theory to fuel my own speculations. And recently I ran across a stimulating suggestive piece about how some are trying to rebuild quantum theory from the ground up using simple physical principles (QTFSPP) (here). The discussion is interesting for me in that it leads to a plausible suggestion for how to enrich minimalist practice. Let me elaborate.

The consensus opinion among physicists is that nobody really understands quantum mechanics (QM). Feynman is alleged to have said that anyone who claims to understand it doesn't. And though he appears not to have said exactly this (see here, section 9), it's a widely shared sentiment. Nonetheless, QM (or the Standard Theory) is, apparently, the most empirically successful theory ever devised. So, we have a theory that works yet we have no real clarity as to why it works. Some (IMO, rightly) find this a challenge. In response they have decided to reconstitute QM on new foundations. Interestingly, what is described are efforts to recapture the main effects of QM within theories with more natural starting points/axioms. The aim, in other words, is reminiscent of the Minimalist Program (MP): construct theories that have the characteristic signature properties of QM but are grounded in more interpretable axioms. What's this mean? First let's take a peek at a couple of examples from the article and then return to MP.

A prominent contrast within physics is between QM and Relativity. The latter (the piece mentions special relativity) is based on two fundamental principles that are easy to understand and from which all the weird and wonderful effects of relativity follow. The two principles are: (1) the speed of light is constant and (2) the laws of physics are the same for two observers moving at constant speed relative to one another (or, no frame of reference is privileged when it comes to doing physics). Grant these two principles and the rest follows. As QTFSPP puts it: “Not only are the axioms simple, but we can see at once what they mean in physical terms” (my emphasis, NH) (5).

Standard theories of QM fail to be physically perspicuous, and the aim of reconstructionists is to remedy this by finding principles to ground QM that are as natural and physically transparent as those that Einstein found for special relativity. The proposals are fascinating. Here are a couple:

One theorist, Lucien Hardy, proposed focusing on “the probabilities that relate the possible states of a system with the chance of observing each state in a measurement” (6). The proposal consists of a set of probabilistic rules about “how systems can carry information and how they can be combined and interconverted” (7). The claim was that “the simplest possible theory to describe such systems is quantum mechanics, with all its characteristic phenomena such as wavelike interference and entanglement…” (8). Can any MPer fail to reverberate to the phrase “the simplest possible theory”? At any rate, on this approach, QM is fundamentally probabilistic, and the way probabilities mediate the conversion between states of the system is taken as the basis of the theory. I cannot say that I understand what this entails, but I think I get the general idea and how, if this were to work, it would serve to explain why QM has some of the odd properties it does.

Another reconstruction takes three basic principles to generate a theory of QM. Here’s QTFSPP quoting a physicist named Jacques Pienaar: “Loosely speaking, their principles state that information should be localized in space and time, that systems should be able to encode information about each other, and that every process should be in principle reversible, so that information is conserved.” Apparently, these assumptions, suitably formalized, lead to theories with “all the familiar quantum behaviors, such as superposition and entanglement.” Pienaar identifies what makes these axioms reasonable/interpretable: “They all pertain directly to the elements of human experience, namely what real experimenters ought to be able to do with systems in their laboratories…” So, specifying conditions on what experimenters can do in their labs leads to systems of data that look QMish. Again, the principles, if correct, rationalize the standard QM effects that we see. Good.

QTFSPP goes over other attempts to ground QM in interpretable axioms. Frankly, I can only follow this, if at all, impressionistically as the details are all quite above my capacities. However, I like the idea. I like the idea of looking for basic axioms that are interpretable (i.e. whose (physical) meaning we can immediately grasp) not merely compact. I want my starting points to make sense too. I want axioms that make sense computationally, whose meaning I can immediately grasp in computational terms. Why? Because, I think that our best theories have what Steven Weinberg described as a kind of inevitability and they have this in virtue of having interpretable foundations. Here’s a quote (see here and links provided there):

…there are explanations and explanations. We should not be satisfied with a theory that explains the Standard Model in terms of something complicated and arbitrary…To qualify as an explanation, a fundamental theory has to be simple- not necessarily a few short equations, but equations that are based on a simple physical principle…And the theory has to be compelling- it has to give us the feeling that it could scarcely be different from what it is.

Sensible interpretable axioms are the source of this compulsion. We want first principles that meet the Wheeler T-shirt criterion (after John Wheeler): they make sense and are simple enough to be stated “in one simple sentence that the non sophisticate could understand” (or, more likely, a few simple sentences). So, with this in mind, what about fundamental starting points for MP accounts? What might these look like?

Well, first, they will not look like the principles of GB. IMO, these principles (more or less) “work,” but they are just too complicated to be fundamental. That’s why GB lacks Weinberg’s inevitability. In fact, it takes little effort to imagine how GB could “be different.” The central problem with GB principles is that they are ad hoc and have the shape they do precisely because the data happens to have the shape it does. Put differently, were the facts different we could rejigger the principles so that they would come to mirror those facts and not be in any other way the worse off for that. In this regard, GB shares the problem QTFSPP identifies with current QM: “It’s a complex framework, but it’s also an ad hoc patchwork, lacking any obvious physical interpretation or justification” (5).

So, GB can’t be fundamental because it is too much of a hodgepodge. But, as I noted, it works pretty well (IMO, very well actually, though no doubt others would disagree). This is precisely what makes the MP project to develop a simple natural theory with a specified kind of output (viz. a theory with the properties that GB describes) worthwhile.

Ok, given this kind of GB reconstruction project, what kinds of starting points would fit?  I am about to go out on a limb here (fortunately, the fall, when it happens, will not be from a great height!) and suggest a few that I find congenial.

First, the fundamental principle of grammar (FPG)[1]: There is no grammatical action at a distance. What this means is that for two expressions A and B to grammatically interact, they must form a unit. You can see where this is going, I bet: for A and B to G interact, they must Merge.[2]

Second, Merge is the simplest possible operation that unitizes expressions. One way of thinking of this is that all Merge does is make A and B, which are heretofore separate, into a unit. Negatively, this implies that it in no way changes A and B in making them a unit, and does nothing more than make them a unit (e.g. negatively, it imposes no order on A and B as this would be doing more than unitizing them). One can represent this formally as saying that Merge takes A,B and forms the set {A,B}, but this is not because Merge is a set forming operation, but because sets are the kinds of objects that do nothing more than unitize the objects that form the set. They don’t order the elements or change them in any way. Treating Merge (A,B) as creating leaves of a Calder Mobile would have the same effect and so we can say that Merge forms C-mobiles just as well as we can say that it forms sets. At any rate, it is plausible that Merge so conceived is indeed as simple a unitizing operation as can be imagined.
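As an illustration (my own sketch, not anything from the post), Merge so conceived can be written in a few lines of Python, with frozensets standing in for the unordered, unchanged units:

```python
def merge(a, b):
    # Merge does nothing but unitize: it imposes no order on a and b
    # and changes neither of them.
    return frozenset((a, b))

ab = merge("A", "B")
assert ab == merge("B", "A")      # no order is imposed
assert "A" in ab and "B" in ab    # the parts are unchanged, merely unitized
```

Any representation that merely unitizes would do as well (sets, Calder mobiles); frozensets are just a convenient stand-in.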

Third, Merge is closed in the domain of its application (i.e. its domain and range are the same). Note that this implies that the outputs of Merge must be analogous to lexical atoms in some sense, given the ineluctable assumption that all Merges begin with lexical atoms. The problem is that unitized lexical atoms (the “set”-like outputs of Merge) are not themselves lexical atoms, and so unless we say something more, Merge is not closed. So, how to close it? By mapping the Merged unit back to one of the elements Merged in composing it. So if we map {A,B} back to A or to B we will have closed the operation in the domain of the primitive atoms. Note that by doing this, we will, in effect, have formed an equivalence class of expressions with the modulus being the lexical atoms. Note that this, in effect, gives us labels (oh nooooo!), or labeled units (aka constituents), and endorses an endocentric view of labels. Indeed, closing Merge via labeling in effect creates equivalence classes of expressions centered on the lexical atoms (and more abstract classes if the atoms themselves form higher order classes). Interestingly (at least to me), so closing Merge allows for labeled objects of unbounded hierarchical complexity.[3]
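The closure-via-labeling idea can also be sketched (again, my own illustration; `label_of`, `head`, and the tuple representation are hypothetical conveniences): the merged unit is mapped back to one of its parts, so every output lives in the same equivalence class as some lexical atom:

```python
def label_of(x):
    # An atom is its own label; a labeled phrase is identified with its label.
    return x if isinstance(x, str) else x[0]

def merge(a, b, head=0):
    # Form the unit {a, b}, then close the operation by projecting
    # (the label of) one of the two parts as the label of the whole.
    unit = frozenset((a, b))
    return (label_of((a, b)[head]), unit)

dp = merge("the", "cake")   # a unit labeled "the"
vp = merge("eat", dp)       # a unit labeled "eat": an endocentric phrase
assert label_of(vp) == "eat"
```

Because every output is equivalent to one of the atoms, Merge can reapply to its own outputs, which is what yields labeled objects of unbounded hierarchical complexity.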

These three principles seem computationally natural. The first imposes a kind of strict locality condition on G interactions. E- and I-merge adhere to it (and do so strictly given labels). Merge is a simple, very simple, combination operation, and closure is a nice natural property for formal systems of (arbitrarily complex) “equations” to have. That they combine to yield unbounded hierarchically structured objects of the right kind (I’ve discussed this before, see here and here) is good, as this is what we have been aiming for. Are the principles natural and simple? I think so (at least from a kind of natural computation point of view), but I would, wouldn’t I? At any rate, here’s a stab at what interpretable axioms might look like. I doubt that they are unique, but I don’t really care if they aren’t. The goal is to add interpretability to the demands we make on theory, not to insist that there is only one way to understand things.

Nor do we have to stop here. Other simple computational principles include things like the following: (i) shorter dependencies are preferred to longer dependencies (minimality?), (ii) bounded computation is preferred to unbounded computation (phases?), (iii) All features are created equal (the way you discharge/check one is the way you discharge/check all). The idea is then to see how much you get starting from these simple and transparent and computationally natural first principles. If one could derive GBish FLs from this then it would, IMO, go some way towards providing a sense that the way FL is constructed and its myriad apparent complexities are not complexities at all but the unfolding of a simple system adhering to natural computational strictures (snowflakes anyone?). That, at least, is the dream.

I will end here. I am still in the middle of pleasant reverie, having mesmerized myself by this picture. I doubt that others will be as enthralled, but that is not the real point. I think that looking for general interpretable principles on which to found grammatical theory makes sense and that it should be part of any theoretical project. I think that trying to derive the “laws” of GB is the right kind of empirical target. Physics envy prompts this kind of search. Another good reason, IMO, to cultivate it.

[1] I could have said, the central dogma of syntax, but refrained. I have used FPG in talks to great (and hilarious) effect.
[2] Note, that this has the pleasant effect of making AGREE (and probe-goal architectures in general) illicit G operations. Good!
[3] This is not the place to go into this, but the analogy to clock arithmetic is useful. Here too, via the notion of equivalence classes, it is possible to extend operations defined for some finite base of expressions (1-12) to any number. I would love to be able to say that this is the only feasible way of closing a finite domain, but I doubt that this is so. The other suspects, however, are clearly linguistically untenable (e.g. mapping any unit to a constant, mapping any unit randomly to some other atom). Maybe there is a nice principle (statable in one simple sentence) that would rule these out.
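The clock-arithmetic analogy in footnote 3 can be made concrete in a couple of lines (my own illustration, using Python):

```python
# Equivalence classes mod 12 close "clock" operations over the finite base 1-12.
def clock(n):
    # Map any integer to its representative in the base 1..12.
    return ((n - 1) % 12) + 1

assert clock(13) == 1       # 13 o'clock "is" 1 o'clock
assert clock(9 + 5) == 2    # addition extends to any number via the classes
```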


  1. Interesting ideas. They sound reasonable (or at least "not completely implausible", which is not bad for a starting point). A couple of comments: it can be hard to define simplicity. Are sets simpler than semi-groups or monoids? I have no idea. I am also not sure that searching for the simplest possible foundations necessarily clarifies a subject. As you've remarked many times about chemistry and physics, the grounding of chemistry in quantum physics left many of the core chemical ideas (e.g., valence) essentially unchanged. (Of course I'm neither a mathematician nor a chemist; I may be completely wrong here).

    1. People who are into category theory tend to think, I believe, that that is simpler than either sets or algebraic systems ... the notion of 'equivalent proof/identical arrow' seems to me to be a reasonable idea of what sentence structures are trying to be, a representation of all the different ways that a sentence has the meaning and other kinds of status (sociolinguistic register etc) that it is supposed to have.

    2. 'ways that' => 'ways of proving that'.

    3. @Mark/Avery: I agree with both points. It is hard. That is why I think that the best we can ask for are various kinds of natural starting points, with different implications. For now, that is fine with me. Re sets, one remark: I also thought that the idea behind Merge is that it made sets and that this was because sets, in some sense, were conceptually simplest. But I now think I was wrong in reading things this way (note the admission, a rare thing, so enjoy). I think that the idea was that we ask what properties a simple combination operation would have, and a very simple one would do nothing more than combine the elements combined. And this means change them in no way in combining them and doing nothing but combining them (i.e. impose no other properties on the combination). Now, sets respect these two ideas (I think). Other objects might as well. If they do, then there is nothing to choose between thinking that Merge yields sets rather than these other things (recall, I suggested Calder mobiles). At any rate, I think that this is now a better way to think of what we want from Merge; the search for the right math object may be a bit of a red herring.

      Last point (again): looking for simple foundations might mislead. We don't know till we try. But finding something that seems intuitively simple and that does get us some way into the problem (e.g., merge understood to imply no tampering does get one c-command for movement) is what endows an account with explanatory oomph (via inevitability). So yes, be cautious. But also be bold. This is what we want: natural interpretable axioms with wide empirical reach (i.e. result in theories with the kinds of properties we see).

    4. One problem with sets is that they seem to be loaded with some assumptions that seem a bit extraneous to linguistic purposes, such as extensionality, which can bite people on the bum. This actually happened to the LFG+glue community, when somebody realized (nobody knows who) that thanks to extensionality, if you think of the f-structures and their projections as sets, which is the 'standard' view, then all the empty structures on the 'semantic projection' would be the same object, the empty set = theoretical core meltdown. There are various easy solutions to this problem, so it is not in the end a disaster, but illustrates the danger of importing tools meant for a completely different purpose (making sense out of the foundations of calculus, to the limited extent that I can discern).

  2. Merge can be closed without anything like modular arithmetic provided we properly define its domain. In fact the standard definition of Merge is closed in the domain of syntactic objects (SOs) defined recursively as follows:

    X is an SO iff (a) X is a lexical item, or (b) X is a set of SOs (cf. Collins and Stabler, 2015)

    In this case, LIs are merely the base case of SOs (Just like 0 or 1 is the base case of the natural numbers)
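    That recursive definition can be sketched in a few lines of Python (my own illustration; `LEX` is a hypothetical toy lexicon):

```python
def is_SO(x, lexicon):
    # X is an SO iff X is a lexical item, or X is a set of SOs.
    if x in lexicon:
        return True
    return isinstance(x, frozenset) and all(is_SO(y, lexicon) for y in x)

LEX = {"the", "cat", "saw"}
dp = frozenset({"the", "cat"})
assert is_SO(dp, LEX)                       # a set of lexical items
assert is_SO(frozenset({"saw", dp}), LEX)   # a set of SOs: closed, no labels needed
```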

  3. Correct, it can be. The question is whether we can also define it the other way (I think we can) and what advantages, if any, might accrue. I believe that the biggest advantage is that it allows us to get recursion tied to the emergence of labelled constituents. In other words, we get endocentric phrases as part of the class of syntactic objects. I like this idea for several reasons. The two biggest are that it allows us to explain why phrases seem to be ubiquitous syntactic units, and that it allows for a simple account of why head-to-head selection is restricted as it is. There is also a third advantage, IMO: it permits us to dispense with non-local Agree operations whose properties are quite redundant with I-merge. If we could get these three properties out from thinking in modular arithmetic terms, and get recursion as a feature too, it seems to me it is worth exploring this path. So yes, we can define things recursively via syntactic objects, but if we also do it this other way then we can ask if there are advantages to the different axioms. That was my point.

  4. Because Merge takes two things, A and B, and makes them a unit, it always produces binary branching structures. Jackendoff and other CG types have observed that there are constructions that can't be adequately described by binary branching structures. (Jackendoff's prime example being the PNP construction.) He has also pointed out that in other domains like vision we have no problem processing sets that are ternary, quaternary, etc. I have yet to find a MP response to this argument - but I'd be glad to read one. A citation would do.


    2. Merge as defined is 2-place. But this does not follow from much. This is why we have no deep explanation for binary branching, if it exists. I personally find the evidence compelling (mainly starting with Kayne) but I can imagine there are puzzles. I am not familiar with J’s examples. What are they?

      The main point: sadly, minimalism does little more than stipulate binary branching, rather than explain it. There is no problem defining an N-merge rule where n-ary branching is possible. And this is a problem, but that is life as we are now living it.

    3. Something possibly relevant:

      “John Collins (p.c) has addressed some of this: ‘This is difficult, but at a first stab, I’d say that this [non-binary branching/Merge] would make merge non-uniform, since we know that binary branching is fine for say verbs and their objects, etc. Non-uniformity or a lack of symmetry is kind of an imperfection. Also, given that branching is required, binary is the most economical, as only two things at most need to be in the workspace, as it were. So, perfection is the most economical and uniform meeting of interface conditions by the most general operations we find throughout nature. The interfaces rule out singletons and the empty set, and general uniformity/economy considerations go for binary, given that there must be branching for composition (lack of composition would be an imperfection as set formation would need to be restricted). Thus, something other than binary would thus be an imperfection, as far as we can tell.’”

      Footnote #7 from "No Derivation without Representation", Chapter 14 of The Oxford Handbook of Linguistics Minimalism, C.Boeckx, ed.,2011, p.313.

    4. It's important not to conflate the binarity of Merge with the binarity of the output of Merge. Minimalist grammars, for instance, have Merge as a binary operation, but it is very easy to define a mapping from derivation trees to phrase structure trees that flattens some binary branching structures into n-ary branching ones, e.g. for coordination.

      This is a special case of a more general property: binary trees can be flattened, often without loss of information (e.g. when the trees are also ordered and structural prominence correlates with precedence). The same is also true in the other direction: every n-ary branching tree can be binarized without loss of information, and every n-ary function can be binarized without loss of information (currying/schönfinkelization). So one shouldn't attach too much importance to Jackendoff's argument against binarity, either.
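      Both directions can be illustrated concretely (my own sketch; trees are `(label, children)` tuples, and the primed-label convention for intermediate nodes is a hypothetical bookkeeping device, not anything from MGs or the LCA):

```python
def binarize(tree):
    # Right-binarize an n-ary node; lossless, since intermediate nodes
    # are marked with a primed copy of the parent label.
    if isinstance(tree, str):
        return tree
    label, kids = tree
    kids = [binarize(k) for k in kids]
    if len(kids) <= 2:
        return (label, kids)
    return (label, [kids[0], binarize((label + "'", kids[1:]))])

def flatten(tree):
    # Undo binarize by splicing primed-label daughters back into the parent.
    if isinstance(tree, str):
        return tree
    label, kids = tree
    out = []
    for k in (flatten(k) for k in kids):
        if isinstance(k, tuple) and k[0] == label + "'":
            out.extend(k[1])
        else:
            out.append(k)
    return (label, out)

coord = ("&P", ["tea", "coffee", "juice"])   # a flat ternary coordination
assert flatten(binarize(coord)) == coord     # the round trip loses nothing
```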

      What's at stake is not whether trees are binary, but how tree structure informs dependency formation. Merge by itself has nothing to say about that, it is only within a system of ancillary assumptions such as the LCA that binarity gets some bite. But then one can't single out binarity as the main failing of the system.

  5. Thanks, Rob, for remembering the exchange:) Indeed, I think Chomsky nowhere says what is supposed to be especially simple or basic about binary merge, as opposed to n-ary merge (he does mention Kayne’s work on paths in BEA and elsewhere, but I think that presupposes rather than explains the specialness of binarity). Still, binary merge does strike me as special, not from a set-theoretical perspective, of course, but from the perspective of Chomsky’s ‘super engineer’. In short, binary merge appears to be the only uniform combination principle that is necessary and sufficient for meeting interface demands. It is necessary because one sometimes does need to put two things together, as opposed to just one thing (as it were) or more than two things. It might also be sufficient – that is an empirical issue concerning how one thinks of self-merge, co-ordination, etc. Crucially, though, given that binary merge is necessary, if it proves insufficient, and n-ary merge (or self-merge as well) is adopted, then one forgoes uniformity, with some structures/applications of principles being unary or ternary, others binary, and so on. So, binary merge is unique in holding out the promise of necessity amounting to sufficiency – the way Leibniz thought God made reality, and God hates waste.

  6. Addendum: I talk about this issue in my Chomsky book and more fully in my Unity of Linguistic Meaning (OUP, 2011), wherein I respond to Jackendoff & Culicover. Apologies for the plug.