I have long believed that physics envy is an excellent foundation
for linguistic inquiry (see here). Why?
Because physics is the paradigmatic
science. Hence, if it is ok to do something there, it’s ok to do it anywhere
else in the sciences (including the cog-neuro (CN) sciences, linguistics among
them), and if a suggested methodological precept fails for physics, then others
(including CNers) have every right to treat it with disdain. Here’s a useful
prophylactic against methodological sadists: Try your methodological dicta out
on physics before you encumber the rest of us with them. Down with
methodological dualism!
However, my envy goes further: I have often looked to (popular)
discussions about hot topics in physical theory to fuel my own speculations.
And recently I ran across a stimulating, suggestive piece about how some are
trying to rebuild quantum theory from the ground up using simple physical
principles (QTFSPP) (here).
The discussion is interesting for me in that it leads to a plausible suggestion
for how to enrich minimalist practice. Let me elaborate.
The consensus opinion among physicists is that nobody really
understands quantum mechanics (QM). Feynman is alleged to have said that anyone
who claims to understand it, doesn’t. And though he appears not to have said exactly this (see here
section 9), it's a widely shared sentiment. Nonetheless, QM (or the Standard Theory)
is, apparently, the most empirically successful theory ever devised. So, we
have a theory that works yet we have no real clarity as to why it works. Some
(IMO, rightly) find this a challenge. In response they have decided to
reconstitute QM on new foundations. Interestingly, what the piece describes are
efforts to recapture the main effects of QM within theories with more natural
starting points/axioms. The aim, in other words, is reminiscent of the
Minimalist Program (MP): construct theories that have the characteristic
signature properties of QM but are grounded in more interpretable axioms.
What does this mean? First let’s take a peek at a couple of examples from the
article and then return to MP.
A prominent contrast within physics is between QM and
Relativity. The latter (the piece mentions special relativity) is based on two
fundamental principles that are easy to understand and from which all the weird
and wonderful effects of relativity follow. The two principles are: (1) the
speed of light is constant and (2) the laws of physics are the same for two
observers moving at constant speed relative to one another (or, no frame of
reference is privileged when it comes to doing physics). Grant these two
principles and the rest follows. As QTFSPP puts it: “Not only are the axioms
simple, but we can see at once what they
mean in physical terms” (my emphasis, NH) (5).
Standard theories of QM fail to be physically perspicuous, and
the aim of the reconstructionists is to remedy this by finding principles to ground
QM that are as natural and physically transparent as those that Einstein found for
special relativity. The proposals are
fascinating. Here are a couple:
One theorist, Lucien Hardy, proposed focusing on “the
probabilities that relate the possible states of a system with the chance of
observing each state in a measurement” (6). The proposal consists of a set of
probabilistic rules about “how systems can carry information and how they can
be combined and interconverted” (7). The claim was that “the simplest possible
theory to describe such systems is quantum mechanics, with all its
characteristic phenomena such as wavelike interference and entanglement…” (8).
Can any MPer fail to reverberate to the phrase “the simplest possible theory”?
At any rate, on this approach, QM is fundamentally probabilistic, and how
probabilities mediate the conversion between states of the system is taken as
the basis of the theory. I cannot say
that I understand what this entails, but I think I get the general idea and how,
if this were to work, it would serve to explain why QM has some of the odd
properties it does.
Another reconstruction takes three basic principles to
generate a theory of QM. Here’s QTFSPP quoting a physicist named Jacques
Pienaar: “Loosely speaking, their principles state that information should be localized
in space and time, that systems should be able to encode information about each
other, and that every process should be in principle reversible, so that
information is conserved.” Apparently, these assumptions, suitably
formalized, lead to theories with “all the familiar quantum behaviors, such as
superposition and entanglement.” Pienaar identifies what makes these axioms
reasonable/interpretable: “They all pertain directly to the elements of human
experience, namely what real experimenters ought to be able to do with systems
in their laboratories…” So, specifying conditions on what experimenters can do
in their labs leads to systems of data that look QMish. Again, the principles,
if correct, rationalize the standard QM effects that we see. Good.
QTFSPP goes over other attempts to ground QM in
interpretable axioms. Frankly, I can only follow this, if at all,
impressionistically as the details are all quite above my capacities. However,
I like the idea. I like the idea of looking for basic axioms that are interpretable (i.e. whose (physical)
meaning we can immediately grasp) not merely compact. I want my starting points
to make sense too. I want axioms that make sense computationally, whose meaning
I can immediately grasp in computational terms. Why? Because I think that our
best theories have what Steven Weinberg described as a kind of inevitability and
they have this in virtue of having interpretable foundations. Here’s a quote
(see here
and links provided there):
…there are explanations and
explanations. We should not be
satisfied with a theory that explains the Standard Model in terms of something
complicated and arbitrary…To qualify as an explanation, a fundamental theory has
to be simple- not necessarily a few short equations, but equations that are
based on a simple physical principle…And the theory has to be compelling- it
has to give us the feeling that it could scarcely be different from what it is.
Sensible interpretable axioms are the source of this
compulsion. We want first principles that meet the Wheeler T-shirt criteria (after
John Wheeler): they make sense and are simple enough to be stated “in one
simple sentence that the non-sophisticate could understand” (or, more likely,
a few simple sentences). So, with this in mind, what about fundamental starting
points for MP accounts? What might these look like?
Well, first, they will not
look like the principles of GB. IMO, these principles (more or less) “work,”
but they are just too complicated to be fundamental. That’s why GB
lacks Weinberg’s inevitability. In fact, it takes little imagination to see
how GB could “be different.” The central problem with GB principles is that
they are ad hoc and have the shape
they do precisely because the data happens to have the shape it does. Put
differently, were the facts different we could rejigger the principles so that
they would come to mirror those facts and be
none the worse for it. In this regard, GB shares the
problem QTFSPP identifies with current QM: “It’s a complex framework, but it’s
also an ad hoc patchwork, lacking any obvious physical interpretation or
justification” (5).
So, GB can’t be fundamental because it is too much of a
hodgepodge. But, as I noted, it works pretty well (IMO, very well actually,
though no doubt others would disagree). This is precisely what makes the MP
project to develop a simple natural theory with a specified kind of output
(viz. a theory with the properties that GB describes) worthwhile.
Ok, given this kind of GB reconstruction project, what kinds
of starting points would fit? I am about
to go out on a limb here (fortunately, the fall, when it happens, will not be
from a great height!) and suggest a few that I find congenial.
First, the fundamental principle of grammar (FPG)[1]:
There is no grammatical action at a distance. What this means is that for two
expressions A and B to grammatically interact, they must form a unit. You can
see where this is going, I bet: for A and B to G interact, they must Merge.[2]
Second, Merge is the simplest possible operation that
unitizes expressions. One way of thinking of this is that all Merge does is make A and B, which were hitherto separate, into
a unit. Negatively, this implies that Merge in no way changes A and B in making
them a unit, and does nothing more than make them a unit (e.g. it
imposes no order on A and B, as this would be doing more than unitizing them).
One can represent this formally as saying that Merge takes A,B and forms the
set {A,B}, but this is not because
Merge is a set-forming operation, but because sets are the kinds of objects
that do nothing more than unitize the objects that form the set. They don’t
order the elements or change them in any way. Treating Merge(A,B) as creating
leaves of a Calder mobile would have the same effect, and so we can say that
Merge forms C-mobiles just as well as we can say that it forms sets. At any
rate, it is plausible that Merge so conceived is indeed as simple a unitizing
operation as can be imagined.
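To make this concrete, here is a minimal sketch in Python (my own illustration, not anything from the article or the MP literature), assuming frozensets as one possible stand-in for "units"; any object that neither orders nor alters its members would do just as well.

```python
# A toy sketch of Merge as pure unitization. Frozensets are used here only
# because they neither order nor alter their members; nothing hangs on the
# choice (Calder mobiles would serve as well).

def merge(a, b):
    """Make a and b into a unit, and do nothing else."""
    return frozenset({a, b})

the, dog = "the", "dog"
unit = merge(the, dog)

assert unit == merge(dog, the)        # no order is imposed on A and B
assert the in unit and dog in unit    # A and B are unchanged, merely unitized
assert merge(unit, "saw")             # outputs can themselves be merged
```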
Third, Merge is closed in the domain of its application
(i.e. its domain and range are the same). Note that this implies that the
outputs of Merge must be analogous to lexical atoms in some sense given the ineluctable
assumption that all Merges begin with lexical atoms. The problem is that
unitized lexical atoms (the “set”-like outputs of Merge) are not themselves lexical
atoms and so unless we say something more, Merge is not closed. So, how to close it? By mapping the Merged unit back to
one of the elements Merged in composing it. So if we map {A,B} back to A or to
B we will have closed the operation in the domain of the primitive atoms. Note
that by doing this, we will, in effect, have formed an equivalence class of
expressions with the modulus being the lexical atoms. Note that this, in
effect, gives us labels (oh nooooo!), or labeled units (aka constituents), and
endorses an endocentric view of labels. Indeed, closing Merge via labeling in
effect creates equivalence classes of expressions centered on the lexical atoms
(and more abstract classes if the atoms themselves form higher order classes).
Interestingly (at least to me) so closing Merge allows for labeled objects of
unbounded hierarchical complexity.[3]
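Here is one way the closure-by-labeling picture might be sketched, again purely as an illustration under assumptions of my own (the SO class and the stipulation that the caller picks which item projects are not part of the proposal): the merged unit is mapped back to one of its parts, so every output counts as some lexical atom for purposes of further Merge.

```python
# A sketch of closing Merge via labels: every merged unit is mapped back to one
# of the items merged, so outputs fall into equivalence classes keyed to the
# lexical atoms (endocentricity). The representation below is my own choice.

from dataclasses import dataclass

@dataclass(frozen=True)
class SO:
    """A syntactic object: a lexical atom or a labeled unit of two SOs."""
    label: str                          # the atom this object "counts as"
    parts: frozenset = frozenset()      # empty for lexical atoms

def atom(word: str) -> SO:
    return SO(label=word)

def merge(a: SO, b: SO, head: SO) -> SO:
    """Unitize a and b; close the operation by projecting head's label."""
    assert head is a or head is b
    return SO(label=head.label, parts=frozenset({a, b}))

the, dog, saw = atom("the"), atom("dog"), atom("saw")
dp = merge(the, dog, head=dog)          # {the, dog} counts as 'dog'
vp = merge(saw, dp, head=saw)           # {saw, {the, dog}} counts as 'saw'
assert dp.label == "dog" and vp.label == "saw"
```

Nothing forces this particular encoding; the point is only that labeled outputs stay within the equivalence classes of the atoms, which is what closure requires, while still allowing unboundedly deep structures.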
These three principles seem computationally natural. The
first imposes a kind of strict locality condition on G interactions. E- and I-merge
adhere to it (and do so strictly, given labels). Merge is a simple, very simple,
combination operation and closure is a nice natural property for formal systems
of (arbitrarily complex) “equations” to have. That they combine to yield
unbounded hierarchically structured objects of the right kind (I’ve discussed
this before, see here
and here)
is good as this is what we have been aiming for. Are the principles natural and
simple? I think so (at least from a kind of natural computation point of view),
but I would, wouldn’t I? At any rate,
here’s a stab at what interpretable axioms might
look like. I doubt that they are unique, but I don’t really care if they
aren’t. The goal is to add interpretability to the demands we make on theory,
not to insist that there is only one way to understand things.
Nor do we have to stop here. Other simple computational
principles include things like the following: (i) shorter dependencies are
preferred to longer dependencies (minimality?), (ii) bounded computation is
preferred to unbounded computation (phases?), (iii) all features are created
equal (the way you discharge/check one is the way you discharge/check all). The
idea is then to see how much you get starting from these simple and transparent
and computationally natural first principles. If one could derive GBish FLs
from this then it would, IMO, go some way towards providing a sense that the
way FL is constructed and its myriad apparent complexities are not complexities
at all but the unfolding of a simple system adhering to natural computational
strictures (snowflakes anyone?). That, at least, is the dream.
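Purely as an illustrative toy (nothing in the post specifies this machinery, and "distance" here is a crude stand-in for structural distance), principle (i) might be rendered as a preference for the closest feature-matching candidate when more than one could form the dependency:

```python
# A toy rendering of "shorter dependencies are preferred to longer ones": when
# several candidates bear the relevant feature, the nearest one wins. The flat
# list and feature sets are illustrative assumptions, not a serious formulation
# of minimality.

def closest_goal(probe_feature, candidates):
    """Return (position, item) of the first candidate bearing the probe's feature."""
    for position, (item, features) in enumerate(candidates):
        if probe_feature in features:
            return position, item
    return None

path = [("seems", {"finite"}), ("John", {"phi"}), ("Mary", {"phi"})]
assert closest_goal("phi", path) == (1, "John")   # the closer of the two DPs wins
```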
I will end here. I am still in the middle of a pleasant
reverie, having mesmerized myself with this picture. I doubt that others will be
as enthralled, but that is not the real point. I think that looking for general
interpretable principles on which to found grammatical theory makes sense and that
it should be part of any theoretical project. I think that trying to derive the
“laws” of GB is the right kind of empirical target. Physics envy prompts this
kind of search. Another good reason, IMO, to cultivate it.
[1]
I could have said “the central dogma of syntax,” but refrained. I have used FPG
in talks to great (and hilarious) effect.
[2]
Note that this has the pleasant effect of making AGREE (and probe-goal
architectures in general) illicit G operations. Good!
[3]
This is not the place to go into this, but the analogy to clock arithmetic is
useful. Here too, via the notion of equivalence classes, it is possible to extend
operations defined for some finite base of expressions (1-12) to any number. I
would love to be able to say that this is the only feasible way of closing a finite domain, but I doubt that this
is so. The other suspects, however, are clearly linguistically untenable (e.g.
mapping any unit to a constant, mapping any unit randomly to some other atom).
Maybe there is a nice principle (statable in one simple sentence) that would
rule these out.
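For concreteness, here is a toy rendering of the clock-arithmetic analogy only (my own illustration, not anything in the footnote): reducing every number to a representative among the hours 1-12 closes addition over a finite base, much as labeling closes Merge over the lexical atoms.

```python
# Operations defined on the finite base 1-12 extend to all positive integers
# once each integer is mapped to its equivalence-class representative (its
# "hour").

def hour(n: int) -> int:
    """Map any positive integer to its representative among 1..12."""
    return ((n - 1) % 12) + 1

def add_hours(a: int, b: int) -> int:
    """Addition closed over {1, ..., 12} by reducing the result to the base."""
    return hour(a + b)

assert hour(14) == 2            # 14 o'clock counts as 2 o'clock
assert add_hours(9, 5) == 2     # 9 + 5 = 14, which counts as 2
assert add_hours(12, 12) == 12  # the operation never leaves the base
```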
Interesting ideas. They sound reasonable (or at least "not completely implausible", which is not bad for a starting point). A couple of comments: it can be hard to define simplicity. Are sets simpler than semi-groups or monoids? I have no idea. I am also not sure that searching for the simplest possible foundations necessarily clarifies a subject. As you've remarked many times about chemistry and physics, the grounding of chemistry in quantum physics left many of the core chemical ideas (e.g., valence) essentially unchanged. (Of course I'm neither a mathematician nor a chemist; I may be completely wrong here).
People who are into category theory tend to think, I believe, that that is simpler than either sets or algebraic systems ... the notion of 'equivalent proof/identical arrow' seems to me to be a reasonable idea of what sentence structures are trying to be, a representation of all the different ways that a sentence has the meaning and other kinds of status (sociolinguistic register etc) that it is supposed to have.
'ways that' => 'ways of proving that'.
@Mark/Avery: I agree with both points. It is hard. That is why I think that the best we can ask for are various kinds of natural starting points, with different implications. For now, that is fine with me. Re sets, one remark: I also thought that the idea behind Merge is that it made sets and that this was because sets, in some sense, were conceptually simplest. But I now think I was wrong in reading things this way (note the admission, a rare thing so enjoy). I think that the idea was that we ask what properties a simple combination operation would have, and a very simple one would do nothing more than combine the elements combined. And this means change them in no way in combining them and doing nothing but combining them (i.e. impose no other properties on the combination). Now, sets respect these two ideas (I think). Other objects might as well. If they do, then there is nothing to choose between thinking that Merge yields sets rather than these other things (recall, I suggested Calder mobiles). At any rate, I think that this is now a better way to think of what we want from Merge; the search for the right math object may be a bit of a red herring.
Last point (again): looking for simple foundations might mislead. We don't know till we try. But finding something that seems intuitively simple and that does get us some way into the problem (e.g., merge understood to imply no tampering does get one c-command for movement) is what endows an account with explanatory oomph (via inevitability). So yes, be cautious. But also be bold. This is what we want: natural interpretable axioms with wide empirical reach (i.e. result in theories with the kinds of properties we see).
One problem with sets is that they seem to be loaded with some assumptions that seem a bit extraneous to linguistic purposes, such as extensionality, which can bite people on the bum. This actually happened to the LFG+glue community, when somebody realized (nobody knows who) that thanks to extensionality, if you think of the f-structures and their projections as sets, which is the 'standard' view, then all the empty structures on the 'semantic projection' would be the same object, the empty set = theoretical core meltdown. There are various easy solutions to this problem, so it is not in the end a disaster, but illustrates the danger of importing tools meant for a completely different purpose (making sense out of the foundations of calculus, to the limited extent that I can discern).
Merge can be closed without anything like modular arithmetic provided we properly define its domain. In fact the standard definition of Merge is closed in the domain of syntactic objects (SOs) defined recursively as follows:
X is an SO iff (a) X is a lexical item, or (b) X is a set of syntactic objects (cf. Collins and Stabler, 2015).
In this case, LIs are merely the base case of SOs (just like 0 or 1 is the base case of the natural numbers).
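A small sketch of the recursive domain described above (my own illustration; the predicate names are not from Collins and Stabler): lexical items are the base case, any set of SOs is again an SO, and binary Merge never leaves the domain.

```python
# Lexical items are the base case; any frozenset of SOs is again an SO; Merge
# is therefore trivially closed over the SOs so defined.

def is_so(x, lexicon) -> bool:
    """True if x is a lexical item or a frozenset whose members are all SOs."""
    if x in lexicon:                                    # base case
        return True
    return isinstance(x, frozenset) and all(is_so(y, lexicon) for y in x)

def merge(a, b):
    return frozenset({a, b})

lexicon = {"the", "dog", "saw"}
vp = merge("saw", merge("the", "dog"))
assert is_so(vp, lexicon)       # Merge never leaves the domain of SOs
```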
Correct, it can be. The question is whether we can also define it the other way (I think we can) and what advantages, if any, might accrue. I believe that the biggest advantage is that it allows us to get recursion tied to the emergence of labelled constituents. In other words, we get endocentric phrases as part of the class of syntactic objects. I like this idea for several reasons. The two biggest are that it allows us to explain why phrases seem to be ubiquitous syntactic units, and that it allows for a simple account of why head-to-head selection is restricted as it is. There is also a third advantage, IMO: it permits us to dispense with non-local Agree operations, whose properties are quite redundant with I-merge. If we could get these three properties out from thinking in modular arithmetic terms and get recursion as a feature too, it seems to me it is worth exploring this path. So yes, we can define things recursively via syntactic objects, but if we also do it this other way then we can ask if there are advantages to the different axioms. That was my point.
Because Merge takes two things, A and B, and makes them a unit, it always produces binary branching structures. Jackendoff and other CG types have observed that there are constructions that can't be adequately described by binary branching structures. (Jackendoff's prime example being the PNP construction.) He has also pointed out that in other domains like vision we have no problem processing sets that are ternary, quaternary, etc. I have yet to find an MP response to this argument - but I'd be glad to read one. A citation would do.
Merge as defined is 2-place. But this does not follow from much. This is why we have no deep explanation for binary branching, if it exists. I personally find the evidence compelling (mainly starting with Kayne) but I can imagine there are puzzles. I am not familiar with J’s examples. What are they?
DeleteThe main point; sadly minimalism does little more than stipulate binary branching, rather than explain it. There is not problem defining an N-merge rule where n-ry branching is possible. And this is a problem, but that is life as we are now living it.
Something possibly relevant:
Delete“John Collins (p.c) has addressed some of this: ‘This is difficult, but at a first stab, I’d say that this [non-binary branching/Merge] would make merge non-uniform, since we know that binary branching is fine for say verbs and their objects, etc. Non-uniformity or a lack of symmetry is kind of an imperfection. Also, given that branching is required, binary is the most economical, as only two things at most need to be in the workspace, as it were. So, perfection is the most economical and uniform meeting of interface conditions by the most general operations we find throughout nature. The interfaces rule out singletons and the empty set, and general uniformity/economy considerations go for binary, given that there must be branching for composition (lack of composition would be an imperfection as set formation would need to be restricted). Thus, something other than binary would thus be an imperfection, as far as we can tell.’”
Footnote #7 from "No Derivation without Representation", Chapter 14 of The Oxford Handbook of Linguistic Minimalism, C. Boeckx, ed., 2011, p. 313.
It's important not to conflate the binarity of Merge with the binarity of the output of Merge. Minimalist grammars, for instance, have Merge as a binary operation, but it is very easy to define a mapping from derivation trees to phrase structure trees that flattens some binary branching structures into n-ary branching ones, e.g. for coordination.
This is a special case of a more general property: binary trees can be flattened, often without loss of information (e.g. when the trees are also ordered and structural prominence correlates with precedence). The same is also true in the other direction: every n-ary branching tree can be binarized without loss of information, and every n-ary function can be binarized without loss of information (currying/schönfinkelization). So one shouldn't attach too much importance to Jackendoff's argument against binarity, either.
What's at stake is not whether trees are binary, but how tree structure informs dependency formation. Merge by itself has nothing to say about that, it is only within a system of ancillary assumptions such as the LCA that binarity gets some bite. But then one can't single out binarity as the main failing of the system.
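To make the two directions concrete, here is a toy sketch (my own, under illustrative assumptions, not the MG mapping itself): a right-branching binary coordination structure flattened into an n-ary one, and a ternary function recovered from curried applications.

```python
# A binary-branching structure can be flattened into an n-ary one without
# losing information, and an n-ary function can be binarized by currying.

def flatten(tree):
    """Flatten a right-branching binary structure (x, (y, (z, ...)))."""
    if isinstance(tree, tuple) and len(tree) == 2:
        return (tree[0],) + flatten(tree[1])
    return (tree,)

binary = ("tea", ("coffee", "juice"))          # built by repeated binary Merge
assert flatten(binary) == ("tea", "coffee", "juice")

def coordinate(a, b, c):                       # an ordinary ternary function
    return (a, b, c)

curried = lambda a: lambda b: lambda c: coordinate(a, b, c)
assert curried("tea")("coffee")("juice") == coordinate("tea", "coffee", "juice")
```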
Thanks, Rob, for remembering the exchange:) Indeed, I think Chomsky nowhere says what is supposed to be especially simple or basic about binary merge, as opposed to n-ary merge (he does mention Kayne’s work on paths in BEA and elsewhere, but I think that presupposes rather than explains the specialness of binarity). Still, binary merge does strike me as special, not from a set-theoretical perspective, of course, but from the perspective of Chomsky’s ‘super engineer’. In short, binary merge appears to be the only uniform combination principle that is necessary and sufficient for meeting interface demands. It is necessary because one sometimes does need to put two things together, as opposed to just one thing (as it were) or more than two things. It might also be sufficient – that is an empirical issue concerning how one thinks of self-merge, co-ordination, etc. Crucially, though, given that binary merge is necessary, if it proves insufficient, and n-ary merge (or self-merge as well) is adopted, then one forgoes uniformity, with some structures/applications of principles being unary or ternary, others binary, and so on. So, binary merge is unique in holding out the promise of necessity amounting to sufficiency – the way Leibniz thought God made reality, and God hates waste.
Addendum: I talk about this issue in my Chomsky book and more fully in my Unity of Linguistic Meaning (OUP, 2011), wherein I respond to Jackendoff & Culicover. Apologies for the plug.