Wednesday, May 1, 2013

Formal and Substantive Universals

This post will be pretty free form, involving more than a little thinking out loud (aka rambling). It will maunder a bit and end pretty inconclusively. If this sort of thing is not to your liking, here would be a good place to stop.

I’ve recently read an interesting paper by Epstein, Kitahara and Seely (EKS) (here) on a question that I’ve been thinking about off and on for about a decade (sounds longer than 10 years, eh?). The question: to what degree are the licit formal dependencies among interacting expressions functions of the substantive characteristics of the dependent elements? This is a mouthful of a sentence, but the idea is pretty simple: we have lots of grammatical dependencies; how much do they depend on the specific properties of the specific lexical/functional items involved?[1] Let me give a couple of illustrations to clarify what I’m trying to get at.

Take the original subjacency condition. It prohibited two expressions from interacting if one was within an island and the other was outside that island. So in (1), Y cannot move to X:

(1)  […X…[island…Y…]…]

Now, we can list islands by name (e.g. CNPC, WH-island, Subject Islands etc.) or we can try to unify them in some way. The first unification (due to Chomsky) had two parts: the first, a specification of how far is too far (at most one bounding node between X and Y); the second, an identification of the bounding nodes (BNs) (DP and CP, optionally TP and PP etc.). Now, the way I always understood things, the first part of the “definition” was formal (i.e. the same principle holds regardless of the BN inventory) and the second substantive (i.e. the attested dependencies depend on the actual choice of BNs). Indeed, Rizzi’s famous paper (actually the analysis limned in its footnotes, rather than the one in the text) was all about how to model typological differences via small changes in the inventory of BNs for a given grammar. So, the classical theory of subjacency comprises a formal part that does not care about the actual categories involved and a substantive part that cares a lot.
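
To make the formal/substantive split concrete, here is a minimal Python sketch. It is a toy, not anyone's official formalization: a movement path is assumed to be just the list of category labels of the nodes Y crosses on its way to X, and the two bounding-node inventories are illustrative settings in the spirit of the Rizzi-style parameterization, with TP and CP standing in for S and S′. All names are hypothetical.

```python
from __future__ import annotations

# Toy model: a path is the list of category labels of the nodes crossed
# when Y moves to X (an assumption made purely for illustration).


def subjacent(path: list[str], bounding_nodes: frozenset[str]) -> bool:
    """Formal part: at most one bounding node may intervene between X and Y.

    The function never looks at which labels are bounding nodes; that
    information (the substantive part) is passed in from outside.
    """
    crossed = sum(1 for label in path if label in bounding_nodes)
    return crossed <= 1


# Substantive part: two hypothetical bounding-node inventories.
INVENTORY_TP = frozenset({"DP", "TP"})
INVENTORY_CP = frozenset({"DP", "CP"})

# Extraction from an embedded wh-clause (its Spec-CP escape hatch assumed filled):
# crosses embedded TP, embedded CP, matrix TP.
WH_ISLAND = ["TP", "CP", "TP"]
# Extraction from a relative clause inside an object DP (a CNPC configuration).
COMPLEX_NP = ["TP", "CP", "DP", "TP"]

for name, inventory in [("DP/TP", INVENTORY_TP), ("DP/CP", INVENTORY_CP)]:
    print(name, subjacent(WH_ISLAND, inventory), subjacent(COMPLEX_NP, inventory))
# prints:
# DP/TP False False
# DP/CP True False
```

The same formal principle yields different island profiles depending only on the inventory passed in: only the DP/CP setting lets extraction out of the wh-island through, while both settings block the complex NP case.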

Later theories of islands cut things up a little differently. For example, one intriguing feature of Barriers was its ambition to eliminate the substantive part of subjacency theory. Rather than listing the BNs, Barriers tried to deduce the class of BNs from general formal properties of the phrase marker. Roughly speaking, complements are porous, while non-complements are barriers.[2] Complementation is itself an abstract formal dependency, largely independent of the contents of the interacting expressions. I say “largely independent” because in Barriers it was critical that there be some form of L-marking, which was itself dependent on theta marking. However, the L-marking relation was very generic and applied widely to many different kinds of expressions.
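
By contrast, a Barriers-style definition computes barrierhood from configuration alone. The sketch below is deliberately crude: it reduces L-marking to a single hypothetical boolean `is_complement` and ignores everything else in the Barriers system (inheritance, the role of theta marking, degrees of violation). The point is only that the category label plays no role.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str           # category label: never consulted below
    is_complement: bool  # crude stand-in for being L-marked (hypothetical simplification)


def barriers(path: list[Node]) -> list[Node]:
    """Return the nodes on a path that count as barriers.

    Rough idea from the text: complements are porous, non-complements block.
    Only the configurational flag matters, not the category label.
    """
    return [node for node in path if not node.is_complement]


# Extraction from an object DP (a complement) vs. from a subject DP (a non-complement).
from_object = [Node("DP", is_complement=True)]
from_subject = [Node("DP", is_complement=False)]

print(len(barriers(from_object)), len(barriers(from_subject)))  # 0 1

# Relabeling the node changes nothing: the definition is purely configurational.
relabeled = [Node("PP", is_complement=False)]
assert len(barriers(relabeled)) == len(barriers(from_subject))
```

No inventory of labels is ever consulted, which is exactly the sense in which this approach tried to dispense with the substantive part.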

Cut to the present and phases: phases have returned to the original conception of BNs. Of course, we now call them phase heads rather than BNs, and we include v as well as C (an inheritance from Barriers), but what is important is that we list them.[3] The grammar functions as it does because v and C are points of transfer, and they are points of transfer because they are phase heads. Thus, if you are not a phase head, you are not a point of transfer. However, theoretically, you are a phase head only because you have been so listed. BTW, as you all know, unless D is included in this inventory, we cannot code island effects in terms of phases. And as you also all know, the phase-based account of islands is no more principled than the older subjacency account.[4] However, this is not my topic here. All I want to observe is how substantive assumptions interact with formal ones to determine the class of licit dependencies, and how some accounts have a “larger” substantive component than others. I also want to register a minimalist observation (by no means original): the substantive assumption about the inventory of phases/BNs raises non-trivial minimalist queries, “why these?” being the obvious one.[5]

Let’s contrast this case with Minimality. This, so far as I can tell, is a purely formal restriction, even in its relativized form. It states that in a configuration like (2), where X, Y and Z are of the same type (i.e. share the same relevant features), Y cannot interact with X across the intervening Z. For (2), the actual feature specifications do not matter: whatever they are, minimality will block interaction in these cases. This is what I mean by treating it as a purely formal condition.

            (2) …X…Z…Y…
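
A minimal sketch of the point about (2), assuming elements are represented as bare feature sets ordered from X down to Y, and that “same type” means having identical relevant features. The function never inspects what the features are, which is what makes the condition purely formal; the feature names and helper names are placeholders.

```python
from __future__ import annotations


def feature_type(features: set[str], relevant: set[str]) -> frozenset[str]:
    """The 'type' of an element: its features restricted to the relevant ones."""
    return frozenset(features & relevant)


def can_interact(sequence: list[set[str]], i_x: int, i_y: int, relevant: set[str]) -> bool:
    """Relativized minimality over a top-down sequence ...X...Z...Y...

    X (at i_x) and Y (at i_y) can interact only if they share a non-empty
    relevant type and no element strictly between them has that same type.
    """
    t = feature_type(sequence[i_x], relevant)
    if not t or t != feature_type(sequence[i_y], relevant):
        return False  # no dependency of this type to begin with
    return not any(
        feature_type(sequence[k], relevant) == t for k in range(i_x + 1, i_y)
    )


# X ... Z ... Y with an intervener of the same type: blocked.
blocked = [{"wh"}, {"wh"}, {"wh"}]
# X ... Z ... Y where Z is of a different type: allowed.
allowed = [{"wh"}, {"D"}, {"wh"}]
print(can_interact(blocked, 0, 2, {"wh"}), can_interact(allowed, 0, 2, {"wh"}))  # False True
```

Renaming the features (say, “wh” to “foc” throughout) leaves every verdict unchanged, which is the sense in which the condition is symmetric with respect to the actual feature specifications.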

So, now that we have two different examples, let’s get back to the question posed in EKS. We all assume a universal base hypothesis with the rough structure C-T-v-V; to what degree does this hierarchical base order follow from formal principles? Note that the base theory names the relevant heads in their relevant hierarchical order; the question is to what degree formal principles force this order. EKS discuss this and argue that, given certain current assumptions about phases, we can derive the fact that theta domains are nested within case domains, and they suggest that the same reasoning can apply to the upper C-T part of the base structure. Like I said, the paper is interesting and I recommend it. However, I would like to ask EKS’s question in a slightly different way, stealing a trick from our friends in physics (recall, I am deep green with physics envy).

Among the symmetries physicists study is one in which different elements are swapped for one another. Thus, as Carroll (here) notes concerning nuclear structure: “In 1954, Chen Ning Yang and Robert Mills came up with the idea that this symmetry should be promoted to a local symmetry – i.e., that we should be allowed to “rotate” neutrons and protons into each other at every point in space” (154). They did this to consider whether the strong force really cared about the obvious differences between protons and neutrons. Let’s try a similar trick within the C-T-v-V domain, this time “rotating” theta and case markers into each other, to see whether the ordering of elements in the base really affects what kinds of formal dependencies we find.

More specifically, consider the basic form of the sentence in (3), restricting attention to the dependencies within TP:

            (3) [CP C [TP …T…[vP Subj v [VP V Obj]]]]

In (3), Subj gets theta from v and case from T; Obj gets theta from V and case from v. So there is a T-Subj relation, a Subj-v relation, a v-Obj relation and a V-Obj relation. Does it matter to these relations, and to further derivations, that the specific features noted are checked by the indicated heads? To get a handle on this, imagine that we systematically swapped the case and theta assignments above (i.e. rotated case and theta into each other, so that T assigns theta to Subj and v assigns it case, v assigns theta to Obj and V assigns it case, etc.). What would go wrong? If nothing goes wrong, then the actual labels here make no difference, and the formal properties do not determine the actual substantive order.
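
The rotation can be stated very explicitly. In this sketch (all names hypothetical), the clause in (3) is reduced to a list of (head, feature, argument) assignment relations, rotation swaps the labels case and theta, and `formally_well_formed` stands in for some purely formal requirement, here just that every argument receives exactly one case and exactly one theta role. Any check that treats the two features alike returns the same verdict before and after rotation; finding out what goes wrong requires conditions that mention specific heads.

```python
from __future__ import annotations

from collections import Counter

Relation = tuple[str, str, str]  # (assigning head, feature, argument)

# The assignments in (3): T-case-Subj, v-theta-Subj, v-case-Obj, V-theta-Obj.
BASE: list[Relation] = [
    ("T", "case", "Subj"),
    ("v", "theta", "Subj"),
    ("v", "case", "Obj"),
    ("V", "theta", "Obj"),
]

SWAP = {"case": "theta", "theta": "case"}


def rotate(relations: list[Relation]) -> list[Relation]:
    """'Rotate' case and theta into each other, keeping heads and arguments fixed."""
    return [(head, SWAP[feature], arg) for head, feature, arg in relations]


def formally_well_formed(relations: list[Relation]) -> bool:
    """Hypothetical formal check: each argument gets exactly one case and one theta."""
    counts = Counter((arg, feature) for _, feature, arg in relations)
    args = {arg for _, _, arg in relations}
    return all(counts[(arg, f)] == 1 for arg in args for f in ("case", "theta"))


ROTATED = rotate(BASE)
print(ROTATED)
print(formally_well_formed(BASE), formally_well_formed(ROTATED))  # True True
```

Rotating twice returns the original assignments, so the swap is a genuine symmetry operation on this toy representation.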

To sound puffed up and super scientific, we might say that the formal properties are symmetric wrt the substantive features of case and theta assignment. Note, btw, that we already think this way about theta and case values: the grammatical operations are symmetric with respect to these (i.e. they don’t care what the actual theta role or case value is). We are just extending this reasoning one step further, asking about assignment as well as values.

Observe that things can go “wrong” in various ways. We could get lots of decent-looking derivations that honor the formal restrictions but either under- or overgenerate. For example, if T assigns the external theta role, then transitive small clauses might be impossible if small clauses have no structure higher than v. This seems false. Or, if this is right, then we might expect English expletives always to sit below negation, as they could not move to Spec T, this now being a theta position. Again, this seems wrong. So, at least at first blush, there seem to be empirical consequences of making this rotation. However, the “look” of the system is not that different if this is the only kind of problem, i.e. the resulting system is language-like, if not exactly identical to what we actually find. In other words, it’s a possible UG, just not our UG. There is clearly a minimalist question lurking here.
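
The small-clause worry can be checked in the same toy terms. Assuming, as the text does for the sake of argument, that a small clause contains only v and V, we keep only the relations whose assigning head is present and ask whether the subject still gets a theta role. The relation lists repeat the base and rotated assignments for (3); the helper name is hypothetical.

```python
from __future__ import annotations

Relation = tuple[str, str, str]  # (assigning head, feature, argument)

BASE: list[Relation] = [
    ("T", "case", "Subj"), ("v", "theta", "Subj"),
    ("v", "case", "Obj"), ("V", "theta", "Obj"),
]
# Case and theta rotated into each other: T now assigns the external theta role.
ROTATED: list[Relation] = [
    ("T", "theta", "Subj"), ("v", "case", "Subj"),
    ("v", "theta", "Obj"), ("V", "case", "Obj"),
]


def subject_gets_theta(relations: list[Relation], heads: set[str]) -> bool:
    """True if some head actually present in the structure assigns theta to Subj."""
    return any(
        head in heads and feature == "theta" and arg == "Subj"
        for head, feature, arg in relations
    )


SMALL_CLAUSE = {"v", "V"}  # assumption: nothing above v
print(subject_gets_theta(BASE, SMALL_CLAUSE), subject_gets_theta(ROTATED, SMALL_CLAUSE))  # True False
```

Under rotation, the subject of a transitive small clause is left without a theta role: a case of undergeneration, while the formal machinery itself is untouched.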

A second way things could go wrong is that we do not in general get convergent derivations. EKS argue that certain phase-based accounts have this more expansive consequence. The problem is not a little over/undergeneration; the problem is that we can barely get a decent derivation at all. In our little thought experiment, this would mean that rotating case and theta assignment results in impossible UGs. This would be a fascinating result, with obvious minimalist interpretations.

Both kinds of “problems” would be interesting. The first would show that our UG deeply cares about the substantive heads having the specific properties they do. The second would suggest that there is a very strong tie between the basic structure of the clause and the formal universals we have.

I have no worked-out answer to the ‘what goes wrong?’ question (though if you get one, I would love to hear about it). Note that I have abstracted away from everything but what is assumed to be syntactically relevant: case and theta “features.” I have also assumed that these features are assigned symmetrically, i.e. both in the same way. If this is false, then the asymmetry might be the source of the substantive base order noted (e.g. if theta were assigned only under Merge while case could be assigned under Agree). However, right now I am satisfied to leave my version of the EKS question open.

Let me end with two further random observations:

First, whatever the answer turns out to be, I find the question really cool and novel and exciting! We may need to stipulate all sorts of things, but until we start asking the kinds of questions EKS pose, we won’t know what is principled and what is not.

Second, as noted, the answer to this question will have significant implications for the Minimalist Program (MP). To date, many minimalists (e.g. me) have concentrated on trying to unify the various dependencies as a prelude to explaining their properties in generic cognitive/computational terms. However, if C-T-v-V base structure is part of FL/UG, then it presents a different kind of challenge to MP, given the apparent linguistic specificity of such universals. Few notions are more linguistically parochial than BN or phase head or C-T-v-V. It would be nice if some of this followed from architectural features of the grammar as EKS suggest, or from the demands of the interface (e.g. think Heim’s tripartite semantic structures), or from something else still. The real challenge to MP arises if these kinds of substantive universals are brute. At any rate, substantive universals seem to me to pose a distinctive challenge for MP, and so they are worth thinking about very carefully.

That’s enough rambling.


[1] I know, all the rage lately has been to pack all interesting grammatical properties into functional heads, specific lexical content being restricted to roots. The question still arises: do we need to know the special properties of the involved heads, be they functional or not, to get the right class of grammatical dependencies?
[2] This idea was not original to Barriers.  Cattell and Cinque, I believe, had a similar theoretical intuition earlier.
[3] I think that we can all agree that Chomsky’s attempt to relate the selection of C and v as phase heads to semantic properties has not been a singular success. Moreover, his rationalization leaves out D, without which extending phases to cover island effects is impossible. If phases do not explain island phenomena, then their utility is somewhat circumscribed, given the view over the last 30 years that cyclicity and islandhood are tightly connected. Indeed, one might say that the central empirical prediction of the subjacency account was successive cyclic movement. All the other stuff was there to code Ross’s observations. Successive cyclicity was a novel (and verified) consequence.
[4] Boeckx and Grohmann (2007) have gone through this in detail and so far as I can tell, the theoretical landscape has stayed pretty much the same.
[5] There are attempts to give a “Barriers” version of phases (e.g. Den Dikken) and attempts to argue that virtually every “Max P” is a phase in order to finesse this minimalist problem.


  1. I think what's interesting is that even the structural definition of minimality is substantive, to the extent that it takes certain relations to be primitive. I'm not sure there's a principled reason why we could say that a particular set of categories (Cat = {C, D, ...}) is more substantive than a particular class of relations (Immediate Dominance, at the minimum). On purely mathematical grounds, a set of categories surely is simpler than a whole class of relations. But cognitively this probably isn't the case: I could imagine that a class of relations would result from specific cognitive components being used in particular ways, a sort of epiphenomenon. I think that kind of story is a lot harder to tell for sets of things. That's probably one of the bigger mysteries of language that we haven't begun to approach: why is it that these kinds of things are what language uses? Why are the "substantive" universals, in the usual sense, probably not the right answer?

  2. I think of Substantive Universals as being like constants in a physical theory. They have certain specific properties that look, from where we sit now, to be arbitrary. This seems less true (or I hope it is less true) for the formal universals, which (I hope) can be unified with other more generic cognitive principles/operations. If this analogy is roughly right, then the question is whether the specific values we have for these constants are derivable from the properties of the computational system (a project familiar from contemporary physics (which, btw, looks like it has failed)). Maybe some are, maybe others aren't. For those that are, great. For those that aren't, we should ask whether they are actually part of FL/UG or maybe just appropriated from the rest of cognition for linguistic purposes. There is a tendency to see grammatical features (e.g. animacy, phi-features, etc.) as internal to FL/UG. But is this necessary? I don't know.