Comments

Monday, October 16, 2017

When Moocs bestrode the world like colossi

The Moocs, it seems, have come and gone. A short but bright life that likely made some people a lot of money, occupied the waking hours of countless deans and provosts and university presidents, and promised to radically redesign higher education. Some were skeptical. Some even said as much (and yes, I was one of these! Type ‘Moocs’ into the search box and you will see my little contribution). Here (by John Warner) is an obituary written for Inside Higher Ed. It gloats a little, and with good reason. Now that the owl of Minerva has spread her wings, we see that Moocs were just the latest attempt to mechanize education (click on the Audrey Watters timeline in the link). Like the others, it flamed for a brief period and disappeared. How do we know this one is over, at least until memories fade and someone else tries to shake the money tree (it seems that “personalized learning” is the next big thing, until it is not)? Because Udacity has thrown in the towel. As Warner writes about Udacity:

From transforming all of higher ed to targeted training in five short years.

It is worth thinking about the Mooc explosion for a while before we happily purge it from our memories. It is worth recalling all the awards offered for this new disruptive technology, all the hype about transforming education, all the frenzy to get in on the action, all the money thrown at those ready to redesign courses to fit the format, all the techno enthusiasm and the hailing of a new messianic age of education. Remember it all, because believe me it is coming again. In fact, it has already come even if we are not sure what it is. So keep this one in mind to inoculate yourself against the next big thing.

We live in the age of techno hucksters. It will come again.

Thursday, October 5, 2017

Pity the non-Chomskyan!

Pity the non-Chomskyans! They don’t value their work except in opposition to what Chomsky does (or doesn’t). The only glory they prize is reflected, and they will go to great lengths to sun themselves in it, even to the point of (knowingly?) distorting the mirror in which they reflect themselves. Imagine the irony that someone like me perceives. I am constantly remonstrated with for not sufficiently valuing non-GG work, and then when I look at some I find that the practitioners themselves only prize their research to the degree that it overturns some GG nostrum and thereby “revolutionizes” the study of language (never a brick in the wall for them, always a complete overturning of the basics). It would appear that for these investigators Chomsky has indeed defined the limits of the interesting in the study of language (a view I have some sympathy with, I would add) and that anything that does not directly address a point that he has (allegedly) made is of little value. Indeed, compared to them, my insistence that one can study language with interests orthogonal to GG’s must seem disingenuous. To non-Chomskyans, Chomsky is everywhere and always, and their research is nowhere and never unless it confronts his.

A recent addition to this literature of self-loathing is making quite a splash (here, here, here, here). Part of the splash can be traced to the PR-Academy complex that I mentioned in a previous post (here). Some of the co-authors have Max Planck affiliation and so the powerful PR Wurlitzer has been fully cranked up to spread the word.

However, part of the splash is due to the claim the paper makes that Chomsky is, once again, wrong: more specifically, that culture rather than biology is what drives language structure. Of course, as you all know, this is one fork in the intellectual road that any sane person should immediately take. Is it culture or biology? Well, depending on the linguistic feature of interest it could be either, neither, or both. Language, we all know, is a complex thing and the confluence of many different interacting causal forces. Everybody knows this, so it is not news (though it is often intoned as if it were a great discovery, like people noting, sagely, that any given cognitive capacity is the combination of learned and innate factors (duh!)). What is news is finding out which factor predominates for any given property of interest and how it does so. But finding this out in any given case will not (and I can guarantee this) discredit the causal efficacy of other factors in other cases. And, moreover, this can hardly be news. So, this is something that GG has acknowledged for a very very very …very long time. Even if you think, as I do, that biology (widely construed) plays the lead role in restricting the class of Generative Procedures (GP) available to human Gs, you need not think that culture (widely construed) plays no role in determining what a given G looks like. For example, why I have the G that I have is not exclusively due to my having a human FL/UG. I have an Englishy G because I grew up in an English speaking environment, was culturally exposed to Howdy Doody, Captain Kangaroo, and Rocky and Bullwinkle, and read Bill Shakespeare in high school. Many of my G’s idiosyncrasies are similarly cultural (e.g. I am a proud speaker of a dialect in which Topicalization (aka Yiddish movement) runs rampant). But I very much doubt that the fact that my Topicalization-forming G displays ECP effects has much to do with Rocky, Shakespeare, Bullwinkle or Sholem Aleichem. Here I look to biology to explain why my G obeys the ECP (and for the familiar Poverty of Stimulus reasons, which I could go on about for hours (and have)). So biology AND culture, with each playing a more prominent role depending on the phenomenon of interest.

Curiously, this most obvious position is tacitly denied by non-Chomskyans. They act as if Chomskyans must think that anything languagy must reflect innate features of the mind/brain, so that if anything is shown not to be such, then this shows that Chomsky was wrong. And their obsession with showing that Chomsky is wrong suggests that they believe that unless he is, then what they have shown about, say, the influence of cultural mechanisms on some languagy fact is inherently BORING, without any possible intrinsic interest. This, at least, would neatly explain why non-Chomskyans consistently assume that Chomsky’s position consists in the absurd claim that anything involving language in any way must be innate.

You probably think that I am exaggerating here. But I am not, really. Here is the authors’ summary of the Dunn, Greenhill, Levinson, and Gray (DGLG) paper published in Nature:

Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language (1, 2). In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases (3, 4, 5), rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework (6). First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that—at least with respect to word order—cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.

Let’s engage in some initial parsing. The paper aims to show that language change (in particular, word order changes in diachronically related languages) is path dependent, with different dependencies changing at different rates across different groupings of languages. DGLG concludes from this that the transitions between the languages are not driven by innate features of FL/UG, nor do they reflect systematic universal probabilistic biases. And they conclude from this that Chomsky and Greenberg must be wrong. I am not qualified to discuss Greenberg’s positions in any detail, but I would like to cast a very skeptical eye towards the claims made for Chomsky’s parameter views.

Let’s read the above précis a little more carefully.

First, DGLG focuses on “languages” and the diachronic changes between them. To be GG/Chomsky-relevant, we need to unpack this and relate it to grammars. With this translation we get the following:

Grammars (G) vary widely but not without limit. The central goal of linguistics is to describe the diversity of human Gs and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language…

Second, I disagree with even this reworked version of DGLG’s claims about the aim of linguistics, at least as GG and Chomsky understand it. The ultimate aim is to describe the structure of FL/UG. A way station towards this end is to understand the structure of human Gs used by speakers of different languages. Hence, describing these (their commonalities and differences) is a useful proximate goal towards the ultimate end. Linguists have traced some of the differences between human languages to differences in the Gs that native speakers use. This implies diverse Gs, and this further implies that FL must be capable of acquiring Gs at least as diverse as these (and maybe yet more diverse, given that it is unclear that 7,000 (the purported number of languages out there) marks the limit of G diversity). So yes, describing the variety of human languages, to the degree that it enables us to describe the variety of human Gs, is a useful step in exploring the structure of FL/UG. But the ultimate aim of linguistics is to understand the structure of FL/UG, not to describe the diversity of human Gs, or “the diversity of human languages and the constraints on that diversity”.[1]

Third, the phrase “constraints on that diversity” is ambiguous. One reading is anodyne and correct. One aim of GG has been to describe principles of grammar invariant across Gs, the idea being that these will reveal the design features of FL/UG.[2] This does not imply that the language products of these Gs will manifest invariant patterns. To miss this is to once again confuse Chomsky Universals with Greenberg Universals. Two Gs may embody the very same principle and yet the products of those Gs might differ greatly. Thus, for example, Rizzi’s early proposal concerning the Fixed Subject Effect is that the ECP (which, let us say, underlies the effect) holds in both Italian and English, but the two Gs derive subject A’-movement in different ways so that only English perceptibly falls under the scope of the ECP.[3] In other words, Italian manages to derivationally escape the purview of the G-invariant ECP and hence does not show Fixed Subject Effects. Note, crucially, that Italian does embody the ECP but does not show Fixed Subject Effects. The “languages” (English vs Italian) differ; the invariant principle (the ECP) is the same. So the move from invariant principles to invariant language effects is not one that any GGer can or should blithely license. In sum, if you mean that a goal of Chomskyan linguistics is to describe the properties of Gs that arise as products of the design features of FL/UG, then you would be correct. But this still leaves distance between this and invariant properties of languages.[4]

So, rightly understood, describing G invariants is a proximate goal of GG inquiry. But this must be distinguished from a different project: explaining the limits of diversity. It is entirely possible that Gs have invariant properties without it being the case that there is a limit on G diversity. Let me explain. One of the innovations of LGB is its Principles and Parameters (P&P) architecture. The idea is that FL/UG specifies not only the invariant properties of Gs but all of the ways that Gs could possibly differ. These differences are coded as a finite number of two-valued parameters, with a given G being a (vector) specification of these specific values. As P&P was understood to have a finite number of parameters, say N, and as each could only bear one of two values, this meant that there were at most 2^N distinct Gs that FL/UG permitted. On this LGB/P&P conception there is a reasonable sense in which FL/UG could explain the limits of G diversity. So, DGLG are correct in thinking that some version of GG, the LGB/P&P theory, aimed to place strong limits on G diversity.
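
For the concretely minded, here is a toy sketch in Python of this bit of arithmetic. The parameter names are invented purely for illustration; nothing here commits to any actual parameter inventory.

```python
from itertools import product

# Hypothetical parameter names, purely for illustration.
PARAMETERS = ["head-initial", "pro-drop", "wh-in-situ"]

# With N two-valued parameters, P&P permits at most 2^N distinct Gs:
# each G is just a vector of settings.
grammars = list(product([0, 1], repeat=len(PARAMETERS)))
assert len(grammars) == 2 ** len(PARAMETERS)  # 8 when N = 3

for g in grammars:
    print(dict(zip(PARAMETERS, g)))
```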

However, this theory did not address how parameters are set or how they change over time as Gs change. P&P theories must be supplemented with theories of learning/acquisition to provide a theory of language change. Or, even if you think that there are only a finite number of Gs because any G is simply a finite list of parameter values, you still need a theory of how parameters are fixed in order to explain how they change over time. Now, one theory of G change would be that it tracks some intrinsic structure/fault lines of the parameter space (e.g. parameter 1 links to 2 and to 4, so that if you change the value of 1 from A to a then you need to change the value of 2 from B to b and of 4 from C to c). This is one possible theory. Call it an endogenous theory of parameter setting (EnPS). EnPS accounts would provide FL/UG-internal paths along which G change would occur and would thereby provide a very strong implicit theory of the dynamics of G diversity. It would not only explain what the range of possible Gs is, but would also specify the range of possible changes between Gs. Note that this kind of view need not endorse the position that all G change is canalized by FL/UG. It is possible that some changes are and some are not. But the strongest view would aim to predict the dynamics of G change entirely from the endogenous structure of FL/UG.
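
To make the EnPS idea concrete, here is a minimal sketch; the linkage table is invented, and the point is only the logic: if parameters are linked, FL/UG itself fixes the paths along which Gs can change.

```python
# Invented linkage: "parameter 1 links to 2 and to 4", as in the text.
LINKS = {1: [2, 4]}

def flip(g, p, links=LINKS):
    """Return a new G (a tuple of 0/1 settings) with parameter p
    flipped, along with every parameter linked to p."""
    g = list(g)
    for q in [p] + links.get(p, []):
        g[q] = 1 - g[q]
    return tuple(g)

g1 = (0, 0, 0, 0, 0)
print(flip(g1, 1))  # (0, 1, 1, 0, 1): changes come in clusters, not singly
```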

To my knowledge nobody has ever proposed such a view. In fact, to my knowledge, such a view has been understood to be very problematic, the reason being that to the degree that the parameters are mutually dependent, the problem of incremental parameter setting gets harder. In fact, were all the parameters to speak to one another (i.e. the value of any parameter being conditional on the value of every other parameter), incremental parameter setting would become effectively impossible.

Dresher and Kaye first discussed this a while ago as regards stress systems, and Fodor and Sakas have explored it in detail as regards syntactic parameters. The solution has been to try to identify linguistic “triggers”: types of data that rely exclusively on the value of a single parameter. Triggers, in other words, are ways of trying to finesse the intractability of incremental parameter setting without denying that parameters are intertwined. The idea is that their intermingling need not appear everywhere in the PLD and that all that setting requires is that there be some PLD data that unambiguously reveals what value a given parameter has. In other words, the idea is that in some domains the parameters function as if independent of one another (do not interact), and this relieves the computational problem that intertwining presents.
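
Here is a minimal sketch of the trigger idea, with made-up pairings of datum types and parameter values; what matters is only that each trigger bears on exactly one parameter, so that parameter can be set without consulting the others.

```python
# Invented trigger inventory: each datum type settles one parameter.
TRIGGERS = {
    "V O":          ("head-initial", 1),
    "O V":          ("head-initial", 0),
    "null subject": ("pro-drop", 1),
}

def set_parameters(pld):
    """Incremental setting: each trigger in the PLD fixes a single
    parameter independently of the current values of the rest."""
    grammar = {}
    for datum in pld:
        if datum in TRIGGERS:
            param, value = TRIGGERS[datum]
            grammar[param] = value
    return grammar

print(set_parameters(["V O", "null subject"]))
# {'head-initial': 1, 'pro-drop': 1}
```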

Why do I mention this? Because the bulk of work on parameters has not aimed at limning their interdependencies, but at isolating them and rendering them relatively independent so that incremental parameter setting is possible. In other words, so far as I know, there has been precious little work on, or commitment to, an EnPS kind of theory within Chomskyan GG. Moreover, and this is the important bit, a Chomskyan theory of GG does not need such an account. In other words, if it is true, then it is very interesting, but there is nothing in the Chomskyan project that requires that something like this be available. It is entirely consistent with that project that there be no explicit or implicit dynamics coded in the parameter space. So asserting that the absence of such a theory of parameters challenges the Chomsky conception of FL/UG (which is what DGLG does) is just plain wrong. Or, to put this another way: claiming that there is a richly structured FL/UG is compatible with the claim that FL/UG does not determine how populations of speakers move from one G in the space of possible Gs to another.[5]

We can in fact go further. As those who have read some linguistics since LGB know, the idea that FL/UG contains a finite list of parameters that delimit the range of possible Gs has been under debate. We have even talked about this on FoL (e.g. here, here, here, here). What’s important here is that the parameters part of P&P is not an intrinsic part of the Chomskyan problematic. It might be true, but then it might not be. There are theories like GB that endorse a P&P architecture and there are accounts like those in Aspects or LSLT (and, the way I read them, current MP accounts) that do not. If FL/UG has a list of specified parameters, that would be an amazing and remarkable discovery. But the evidence is not overwhelming that this is the case (so far as I can tell as a non-expert in these matters), and if it is not the case, it does not mean that Chomsky is wrong in claiming that we have an innately specified FL/UG that limits the properties of human Gs. All it means is that there are some features of Gs about which FL/UG is mute. Happily, that would leave something for non-syntacticians to do (e.g. provide theories of learning that address how we go from PLD to Gs given FL/UG, as Yang and Lidz do).

In sum, DGLG’s claims about the implications of their results for the Chomsky enterprise are flatly wrong, and in two ways. First, nobody has proposed the kind of theory that DGLG’s data is meant to refute, and second, the Chomskyan conception does not need a theory of the kind that DGLG’s data is meant to refute. So, whether or not DGLG is correct is at right angles to Chomsky’s central claims. And, what is more, I suspect that DGLG knows this (or at least should). Let me say a word about this.

DGLG cites LGB and Baker’s book as the source of the ideas that the paper argues to be incorrect. However, DGLG cites no specific passages or pages for this claim. Why not? When Chomsky goes after someone critically he does so chapter and verse. He quotes exactly what his antagonist says before arguing that it is incorrect. This is not what Chomsky’s critics generally do. Rather, they assert in very broad brushstrokes what Chomsky’s views are and then go on to state that they are inadequate.[6] The problem is that what they criticize is often not his views. The fact that this happens so often leads me to think that it is not accidental. Either critics do not care what his views are (they only care to discredit them so as to discredit him) or they are too lazy to do serious criticism. I am not sure which is worse, but both are serious intellectual failings.

What I did not realize until recently is that Chomsky’s critics might well be motivated less by malice and sloth than by a deep intellectual insecurity. Many of Chomsky’s critics are upset by the possibility (fact?) that he does (might?) not care about what they are doing. What motivates some critics, then, is the suspicion (fear?) that what they are doing is of little value. To assuage this desperation, they orient their conclusions as rebuttals of Chomsky’s putative views. Why? Because they are sure that what Chomsky does is interesting, and so they reassure themselves that their work has value by arguing that it shows that his views are wrong. The implication is that if this were not the case (and very often it is in fact not the case, as the empirical conclusions are generally irrelevant to Chomsky’s claims (Everett is the poster child for this)) then their own work would be boring and of dubious interest.[7] And I thought that I was in thrall to Chomsky![8]

Let me end with one more point regarding the DGLG paper and then point you to a very good review.

DGLG focuses entirely on word order. What DGLG means by “linguistic structure” is word order properties of utterances/sentences. It says this very explicitly. But if this is the focus, then DGLG must know that it will be of dubious relevance to Chomsky’s central claims, which, as everyone knows by now, treat word order effects as at best second order (and maybe even less relevant) as regards the central features of FL/UG. Word order effects are, of course, centrally relevant to Greenberg’s conceptions, and there are GGers who have concentrated on them (e.g. Kayne), and we have even covered some of this on FoL (see Culbertson and Adger here). But as regards Chomsky’s views, word order effects are decidedly secondary. Indeed, from what I know of Chomsky’s views, he might agree that word order effects are entirely “cultural” in the sense of being driven by the properties of the child’s ambient linguistic environment.[9] So far as I can tell, nothing Chomsky has said in recent years (or before) would be inconsistent with this. So the fact that DGLG knowingly focuses on the kinds of effects that the authors (or at least some of the authors (I am looking at you, Mr Levinson)) know are at right angles to Chomsky’s concerns further buttresses my conclusion that DGLG thinks its results worthless unless they directly gainsay Chomsky’s views. Sadly, if this is DGLG’s position, then worthless it is. The paper, even if completely correct, scarcely bears on Chomsky’s central claims. Fortunately, the DGLG conclusions are worth thinking about, IMO, even if they do not bear on Chomsky’s views at all. Let me turn to this briefly.

I have spent a lot of time pooping on the DGLG paper’s claim that it overturns some central Chomskyan dogma. However, contrary to the authors, I am not as sure as they seem to be that its results are uninteresting unless they bear on Chomskyan/GG concerns. My tastes, then, are more catholic than DGLG’s. I believe that finding that G change is path dependent is potentially very interesting, even if not all that new. It is not a new idea, as it is already embodied in the position that language contact can affect how Gs change: how Gs change (and maybe even their rates of change) is likely a function of the specific properties of the Gs in contact. If so, change will be path dependent. Indeed, from what I know this is the standard view, which is why Bickerton’s contrary claim is so contentious.

Mark Liberman has an excellent discussion of the DGLG paper (here) that touches on this point as well as many others way above my pay grade (damn, I wish I knew more stats). There are also some interesting responses by Greenhill in the comments section of Mark’s post, though I personally think that he fails to engage the main point concerning the large number of ignored dimensions and the kind of structure they might contain. As Mark observes, there is lots of room in these ignored dimensions for an EnPS story, should one care to make one.

I also think that Liberman’s last point touches on something critical. As he points out, at least as regards GGers who work on G change (like Tony Kroch or David Lightfoot or Ian Roberts): “…features like “OBV” (the code for whether objects follow verbs) should be seen as superficial grammatical symptoms rather than atomic grammatical traits” (3). This points to a larger problem with the relevance of DGLG to GG research into diachrony: DGLG takes the project to be language change rather than G change (and, as I noted above, these are not the same thing). Greenhill responds that these categories are not his but ones that other people have identified; DGLG just aims to test them. The problem Mark is pointing to is that they are the wrong things to test, at least if one’s interests lie with the structure of Gs. Going from overt language to G rules/parameters is not straightforward (see Dresher and Kaye, and Fodor and Sakas). What is relevant to speakers qua Language Acquisition Devices are the features of the Gs, so abstracting away from these is, as Mark observes, a problem.

Ok, should you read the DGLG paper? Sure. It is very short and potentially interesting (though, IMO, inevitably overhyped) and, as Mark Liberman notes, the product of a lot of hard work. But it is also deeply misleading and, IMO, borders on (well, crosses the border into) the dishonest. The source of the dishonesty is likely overdetermined. I mentioned malice and sloth. But I suspect that intellectual insecurity is really what is driving the anti-GG, anti-Chomsky slant. Anti-Chomskyans do not have the courage of their stated interests. So, when you are done reading DGLG, spend a second mourning the sad plight of the non-Chomskyan. Only by being anti can they be at all. Sad, really. And I would be greatly sympathetic regarding this insecurity were they not sullying the intellectual landscape in trying to convince themselves of the value of their research.



[1] An analogy: the aim of astronomy is not to describe the motion of the planets but to describe the forces that make the planets move as they do. Of course, an excellent first step towards the latter is a decent description of the former. But the two goals are not identical. Moreover, as Newton discovered, describing the motion of non-planets here on earth is also a useful step in exposing the forces at work out there in the heavens.
[2] Though, IMO, looking for common detectable features across individual Gs is not as useful as generally supposed.
[3] Perceptible to the linguist, that is, not the LAD, as the evidence will be of the negative variety: fixed subject violations lead to unacceptability.
[4] My reading of Greenberg is that his project was to identify invariant properties of languages. From what I’ve seen (which is limited) my impression is that typologists are pretty skeptical that many of these exist. If this is right, then it seems as regards “surfacy” language properties, invariants are pretty hard to come by. This would be no surprise to a Chomskyan GGer.
[5] Which is not to say that this might not be a very interesting question to investigate, and GGers did so. See Berwick and Niyogi’s early work on this in a kind of Neo-Darwinian setting.
[6] I want to emphasize the broad brushstroke nature here. Like any interesting view, Chomsky’s has several interacting sub-parts and is based on several assumptions. It is entirely possible (probable?) that he may be right in some ways and wrong in others (as is true of most everyone’s views). The goal of a critic is to isolate how someone is wrong, and that means getting into details. What specific assumption is false? What particular inference would we like to challenge? Quoting chapter and verse forces one to zero in on the problem. Citing LGB as the source does not. So, not only is this lazy, but it lets a critic off the hook and allows him/her to avoid explaining in detail how someone is wrong. As criticism is valuable in allowing us to isolate troublesome assumptions, this kind of lazy citation promotes obfuscation. So what part of Chomsky’s assumptions is wrong? Well, you know, the LGB part. Really? Come on.
[7] I personally would not conclude this. But it provides a reasonable motive for the otherwise inexplicable regular incapacity of Chomsky’s critics to get his views right.
[8] It is interesting that Chomsky does not feel equally threatened by work different from his own. He just gets on with it. Yes, he responds to critics. But what he mainly does is define the project and get on with it. It would be nice if this were the norm.
[9] To be honest, I do not know what ‘cultural’ means. I am assuming it means to contrast with biological, memes and all that. In effect, fixed via something like learning rather than fixed by biological make-up.

Tuesday, October 3, 2017

A quick read on science PR

Here's a little piece by Yarden Katz on science writing and its PR function. It goes over the public relations functions of popular science writing (at least in the press) and, of late, its function as the marketing arm "for elite research centers" (2). Towards the end it goes over how the major journals, the large research centers and the press together further the commercialization of science. It is not a pretty picture. There is definitely an element of show business in today's big science. Prestige journals roll out hot new results the way movie studios roll out big-budget films. Universities tout their academic stars and hustle to patent anything remotely patentable. Faculty strive for high visibility and the kudos and big bucks that come with it. It is possible that out of this show business ethic good science will emerge (just as it is possible for Hollywood to produce decent films). But it is pretty clear that truth is not the main target here; truthy suffices.

Gelman notes (here) that it was always thus. I think that he is partially right. Indeed, Katz's piece traces the roots of modern science writing back to Edward Bernays, who wrote on how propaganda was required in a democratic society to keep the demos in line. He included within the purview of propaganda the dissemination of science to the public. So, the PR side goes way back. Nonetheless, I think things have changed, at least in degree. We now have an intimate alliance between business, basic science, and the university. Once there was lots of money around to do science and there was no need to hype the work to get enough of it. Things changed when university funding started drying up, big business became more interested in "monetizing" basic research, and government began to squeeze university funding. When this occurred there developed a market for science PR, for treating science as just another thing to sell. Before, the selling did not go deep into the research community. The latter was insulated from the demands of the market. Not now. Now everyone is pitching. And I suspect that one of the reasons that so many are wary of "experts" is that the idea that scientists are engaged in the disinterested pursuit of truth has been discredited. This piece gives one reason to think that this suspicion is not entirely incorrect.

Thursday, September 28, 2017

Physics envy and the dream of an interpretable theory

I have long believed that physics envy is an excellent foundation for linguistic inquiry (see here). Why? Because physics is the paradigmatic science. Hence, if it is ok to do something there, it’s ok to do it anywhere else in the sciences (e.g. in the cog-neuro (CN) sciences, linguistics included), and if a suggested methodological precept fails for physics, then others (including CNers) have every right to treat it with disdain. Here’s a useful prophylactic against methodological sadists: try your methodological dicta out on physics before you encumber the rest of us with them. Down with methodological dualism!

However, my envy goes further: I have often looked to (popular) discussions about hot topics in physical theory to fuel my own speculations. And recently I ran across a stimulating suggestive piece about how some are trying to rebuild quantum theory from the ground up using simple physical principles (QTFSPP) (here). The discussion is interesting for me in that it leads to a plausible suggestion for how to enrich minimalist practice. Let me elaborate.

The consensus opinion among physicists is that nobody really understands quantum mechanics (QM). Feynman is alleged to have said that anyone who claims to understand it, doesn’t. And though he appears not to have said exactly this (see here, section 9), it’s a widely shared sentiment. Nonetheless, QM (or the Standard Theory) is, apparently, the most empirically successful theory ever devised. So, we have a theory that works yet we have no real clarity as to why it works. Some (IMO, rightly) find this a challenge. In response they have decided to reconstitute QM on new foundations. Interestingly, what is described are efforts to recapture the main effects of QM within theories with more natural starting points/axioms. The aim, in other words, is reminiscent of the Minimalist Program (MP): construct theories that have the characteristic signature properties of QM but are grounded in more interpretable axioms. What’s this mean? First let’s take a peek at a couple of examples from the article and then return to MP.

A prominent contrast within physics is between QM and Relativity. The latter (the piece mentions special relativity) is based on two fundamental principles that are easy to understand and from which all the weird and wonderful effects of relativity follow. The two principles are: (1) the speed of light is constant and (2) the laws of physics are the same for two observers moving at constant speed relative to one another (or, no frame of reference is privileged when it comes to doing physics). Grant these two principles and the rest follows. As QTFSPP puts it: “Not only are the axioms simple, but we can see at once what they mean in physical terms” (my emphasis, NH) (5).

Standard theories of QM fail to be physically perspicuous, and the aim of the reconstructionists is to remedy this by finding principles on which to ground QM that are as natural and physically transparent as those Einstein found for special relativity. The proposals are fascinating. Here are a couple:

One theorist, Lucien Hardy, proposed focusing on “the probabilities that relate the possible states of a system with the chance of observing each state in a measurement” (6). The proposal consists of a set of probabilistic rules about “how systems can carry information and how they can be combined and interconverted” (7). The claim was that “the simplest possible theory to describe such systems is quantum mechanics, with all its characteristic phenomena such as wavelike interference and entanglement…” (8). Can any MPer fail to reverberate to the phrase “the simplest possible theory”? At any rate, on this approach, QM is fundamentally probabilistic, and how probabilities mediate the conversion between states of the system is taken as the basis of the theory. I cannot say that I understand what this entails, but I think I get the general idea and how, if this were to work, it would serve to explain why QM has some of the odd properties it does.

Another reconstruction takes three basic principles to generate a theory of QM. Here’s QTFSPP quoting a physicist named Jacques Pienaar: “Loosely speaking, their principles state that information should be localized in space and time, that systems should be able to encode information about each other, and that every process should be in principle reversible, so that information is conserved.” Apparently, these assumptions, suitably formalized, lead to theories with “all the familiar quantum behaviors, such as superposition and entanglement.” Pienaar identifies what makes these axioms reasonable/interpretable: “They all pertain directly to the elements of human experience, namely what real experimenters ought to be able to do with systems in their laboratories…” So, specifying conditions on what experimenters can do in their labs leads to systems of data that look QMish. Again, the principles, if correct, rationalize the standard QM effects that we see. Good.

QTFSPP goes over other attempts to ground QM in interpretable axioms. Frankly, I can only follow this, if at all, impressionistically, as the details are all quite above my capacities. However, I like the idea. I like the idea of looking for basic axioms that are interpretable (i.e. whose (physical) meaning we can immediately grasp), not merely compact. I want my starting points to make sense too. I want axioms that make sense computationally, whose meaning I can immediately grasp in computational terms. Why? Because I think that our best theories have what Steven Weinberg described as a kind of inevitability, and they have this in virtue of having interpretable foundations. Here’s a quote (see here and links provided there):

…there are explanations and explanations. We should not be satisfied with a theory that explains the Standard Model in terms of something complicated and arbitrary…To qualify as an explanation, a fundamental theory has to be simple - not necessarily a few short equations, but equations that are based on a simple physical principle…And the theory has to be compelling - it has to give us the feeling that it could scarcely be different from what it is.

Sensible, interpretable axioms are the source of this compulsion. We want first principles that meet the Wheeler T-shirt criteria (after John Wheeler): they make sense and are simple enough to be stated “in one simple sentence that the non-sophisticate could understand” (or, more likely, a few simple sentences). So, with this in mind, what about fundamental starting points for MP accounts? What might these look like?

Well, first, they will not look like the principles of GB. IMO, these principles (more or less) “work,” but they are just too complicated to be fundamental. That’s why GB lacks Weinberg’s inevitability. In fact, it takes little effort to imagine how GB could “be different.” The central problem with GB principles is that they are ad hoc and have the shape they do precisely because the data happens to have the shape it does. Put differently, were the facts different, we could rejigger the principles so that they would come to mirror those facts and be none the worse for it. In this regard, GB shares the problem QTFSPP identifies with current QM: “It’s a complex framework, but it’s also an ad hoc patchwork, lacking any obvious physical interpretation or justification” (5).

So, GB can’t be fundamental because it is too much of a hodgepodge. But, as I noted, it works pretty well (IMO, very well actually, though no doubt others would disagree). This is precisely what makes the MP project to develop a simple natural theory with a specified kind of output (viz. a theory with the properties that GB describes) worthwhile.

Ok, given this kind of GB reconstruction project, what kinds of starting points would fit?  I am about to go out on a limb here (fortunately, the fall, when it happens, will not be from a great height!) and suggest a few that I find congenial.

First, the fundamental principle of grammar (FPG)[1]: There is no grammatical action at a distance. What this means is that for two expressions A and B to grammatically interact, they must form a unit. You can see where this is going, I bet: for A and B to G interact, they must Merge.[2]

Second, Merge is the simplest possible operation that unitizes expressions. One way of thinking of this is that all Merge does is make A and B, which were heretofore separate, into a unit. Negatively, this implies that it in no way changes A and B in making them a unit, and does nothing more than make them a unit (e.g. it imposes no order on A and B, as this would be doing more than unitizing them). One can represent this formally by saying that Merge takes A, B and forms the set {A,B}, but this is not because Merge is a set-forming operation; rather, it is because sets are the kinds of objects that do nothing more than unitize the objects that form the set. They don’t order the elements or change them in any way. Treating Merge(A,B) as creating leaves of a Calder mobile would have the same effect, and so we can say that Merge forms C-mobiles just as well as we can say that it forms sets. At any rate, it is plausible that Merge so conceived is indeed as simple a unitizing operation as can be imagined.
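
For the programmatically inclined, here is a bare-bones rendering of this conception of Merge, as a sketch; Python’s frozenset is used only because sets happen to have exactly the properties just described (no order, no alteration of the parts), and any merely unitizing datatype would do as well.

```python
# Merge unitizes A and B and does nothing else: no order imposed,
# no change to A or B.
def merge(a, b):
    return frozenset([a, b])

print(merge("eat", merge("the", "apples")))
# frozenset({'eat', frozenset({'the', 'apples'})})
```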

Third, Merge is closed in the domain of its application (i.e. its domain and range are the same). Note that this implies that the outputs of Merge must be analogous to lexical atoms in some sense, given the ineluctable assumption that all Merges begin with lexical atoms. The problem is that unitized lexical atoms (the “set”-like outputs of Merge) are not themselves lexical atoms, and so unless we say something more, Merge is not closed. So, how to close it? By mapping the Merged unit back to one of the elements Merged in composing it. So if we map {A,B} back to A or to B we will have closed the operation in the domain of the primitive atoms. Note that by doing this, we will, in effect, have formed an equivalence class of expressions with the modulus being the lexical atoms. Note that this, in effect, gives us labels (oh nooooo!), or labeled units (aka constituents), and endorses an endocentric view of labels. Indeed, closing Merge via labeling in effect creates equivalence classes of expressions centered on the lexical atoms (and more abstract classes if the atoms themselves form higher order classes). Interestingly (at least to me), so closing Merge allows for labeled objects of unbounded hierarchical complexity.[3]
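
Again, a toy sketch may help fix ideas; which daughter projects is stipulated here by an extra argument, since nothing above fixes that mechanism, and the representation of atoms as strings is just a convenience.

```python
def label(x):
    """An atom's label is itself; a merged unit carries its label."""
    return x if isinstance(x, str) else x[0]

def merge(a, b, projects):
    # The unit itself is still just an unordered set; mapping it back to
    # one daughter's label is what closes the operation in the domain of
    # the atoms (equivalence classes with the atoms as moduli).
    return (label(a) if projects == "a" else label(b), frozenset([a, b]))

dp = merge("the", "apples", projects="a")  # a 'the'-labeled unit
vp = merge("eat", dp, projects="a")        # closure: a unit merges like an atom
print(label(vp))                           # 'eat': the complex object acts
                                           # like the atom that projects
```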

These three principles seem computationally natural. The first imposes a kind of strict locality condition on G interactions. E- and I-Merge adhere to it (and do so strictly, given labels). Merge is a simple, very simple, combination operation, and closure is a nice natural property for formal systems of (arbitrarily complex) “equations” to have. That they combine to yield unbounded hierarchically structured objects of the right kind (I’ve discussed this before, see here and here) is good, as this is what we have been aiming for. Are the principles natural and simple? I think so (at least from a kind of natural-computation point of view), but I would say that, wouldn’t I? At any rate, here’s a stab at what interpretable axioms might look like. I doubt that they are unique, but I don’t really care if they aren’t. The goal is to add interpretability to the demands we make on theory, not to insist that there is only one way to understand things.

Nor do we have to stop here. Other simple computational principles include things like the following: (i) shorter dependencies are preferred to longer dependencies (minimality?), (ii) bounded computation is preferred to unbounded computation (phases?), (iii) all features are created equal (the way you discharge/check one is the way you discharge/check all). The idea is then to see how much you get starting from these simple, transparent, and computationally natural first principles. If one could derive GBish FLs from them, then it would, IMO, go some way towards providing a sense that the way FL is constructed and its myriad apparent complexities are not complexities at all but the unfolding of a simple system adhering to natural computational strictures (snowflakes, anyone?). That, at least, is the dream.
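
For concreteness, here is one possible (and very crude) cashing-out of (i), as a sketch; the flat representation of positions and features is invented for illustration and stands in for no particular theory.

```python
# When a dependency could be resolved by more than one matching element,
# the closest one wins: a crude stand-in for minimality.
def closest_licensor(probe_pos, feature, items):
    """items: (position, feature) pairs; return the position of the
    nearest matching item preceding the probe, or None."""
    candidates = [pos for pos, f in items if f == feature and pos < probe_pos]
    return max(candidates, default=None)  # nearest = largest position below

items = [(0, "wh"), (3, "wh"), (5, "neg")]
print(closest_licensor(7, "wh", items))   # 3: the shorter dependency wins
```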

I will end here. I am still in the middle of a pleasant reverie, having mesmerized myself with this picture. I doubt that others will be as enthralled, but that is not the real point. I think that looking for general interpretable principles on which to found grammatical theory makes sense and that it should be part of any theoretical project. I think that trying to derive the “laws” of GB is the right kind of empirical target. Physics envy prompts this kind of search. Another good reason, IMO, to cultivate it.



[1] I could have said, the central dogma of syntax, but refrained. I have used FPG in talks to great (and hilarious) effect.
[2] Note, that this has the pleasant effect of making AGREE (and probe-goal architectures in general) illicit G operations. Good!
[3] This is not the place to go into this, but the analogy to clock arithmetic is useful. Here too, via the notion of equivalence classes, it is possible to extend operations defined for some finite base of expressions (1-12) to any number. I would love to be able to say that this is the only feasible way of closing a finite domain, but I doubt that this is so. The other suspects, however, are clearly linguistically untenable (e.g. mapping any unit to a constant, mapping any unit randomly to some other atom). Maybe there is a nice principle (statable in one simple sentence) that would rule these out.