
Wednesday, May 1, 2013

Formal and Substantive Universals


This post will be pretty free form, involving more than a little thinking out loud (aka rambling). It will maunder a bit and end pretty inconclusively. If this sort of thing is not to your liking, here would be a good place to stop.

I’ve recently read an interesting paper on a question that I’ve been thinking about off and on for about a decade (sounds longer than 10 years, eh?) by Epstein, Kitahara and Seely (EKS) (here).  The question: to what degree are licit formal dependencies of interacting expressions functions of the substantive characteristics of the dependent elements? This is a mouthful of a sentence, but the idea is pretty simple: we have lots of grammatical dependencies; how much do they depend on the specific properties of the specific lexical/functional items involved?[1]  Let me give a couple of illustrations to clarify what I’m trying to get at.

Take the original subjacency condition. It prohibited two expressions from interacting if one is within an island and the other is outside that island. So in (1) Y cannot move to X:

(1)  […X…[island…Y…]…]

Now, we can list islands by name (e.g. CNPC, WH-island, Subject Islands, etc.) or we can try to unify them in some way. The first unification (due to Chomsky) involved two parts: first, a specification of how far is too far (at most one bounding node between X and Y); second, an identification of the bounding nodes (BNs) (DP and CP, optionally TP and PP, etc.). Now, the way I always understood things is that the first part of the “definition” was formal (i.e. the same principle holds regardless of the BN inventory), the second substantive (i.e. the attested dependencies depend on the actual choice of BNs). Indeed, Rizzi’s famous paper (actually the one limned in the footnotes, rather than the one in the text) was all about how to model typological differences via small changes in the inventory of BNs for a given grammar.  So, the classical theory of subjacency comprises a formal part that does not care about the actual categories involved and a substantive part that cares a lot.
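To make the formal/substantive split vivid, here is a toy sketch (in Python; the encoding is my own illustration, not anyone’s official formalism). The formal clause of subjacency is a fixed counting condition; the substantive clause is nothing but a swappable inventory of BNs. The path representation, function name, and particular inventories below are all illustrative assumptions.

```python
# Toy illustration: subjacency = a fixed formal condition plus a
# substantive, grammar-particular inventory of bounding nodes (BNs).
# A "path" is here simplified to the list of category labels crossed
# on the way from the moved element Y up to its landing site X.

def subjacency_ok(path, bounding_nodes):
    """Formal part: movement is licit iff at most one bounding node
    is crossed on the path from Y to X. The substantive part is the
    `bounding_nodes` inventory passed in; the condition itself never
    cares which categories are in it."""
    crossed = [cat for cat in path if cat in bounding_nodes]
    return len(crossed) <= 1

# Two hypothetical inventories, in the spirit of Rizzi-style
# parameterization (labels are illustrative, not a typological claim):
BNS_A = {"DP", "TP"}
BNS_B = {"DP", "CP"}

# A wh-island configuration: the path from the gap to the matrix
# landing site crosses the embedded TP, the embedded CP, and the
# matrix TP.
wh_island_path = ["TP", "CP", "TP"]

print(subjacency_ok(wh_island_path, BNS_A))  # two TPs crossed: blocked
print(subjacency_ok(wh_island_path, BNS_B))  # one CP crossed: licit
```

Same formal principle, different substantive inventory, different grammar: that is the division of labor at issue.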

Later theories of islands cut things up a little differently. So, for example, one intriguing feature of Barriers was its ambition to eliminate the substantive part of subjacency theory.  Rather than actually listing the BNs, Barriers tried to deduce the class of BNs from general formal properties of the phrase marker.  Roughly speaking, complements are porous, while non-complements are barriers.[2] Complementation is itself an abstract formal dependency, largely independent of the contents of the interacting expressions.  I say “largely independent” because in Barriers it was critical that there be some form of L-marking, which was itself dependent on theta marking. However, the L-marking relation was very generic and applied widely to many different kinds of expressions.

Cut to the present and phases: phases have returned to the original conception of BNs. Of course we now call them phase heads rather than BNs, and we include v as well as C (an inheritance from Barriers), but what is important is that we list them.[3] The grammar functions as it does because v and C are points of transfer, and they are points of transfer because they are phase heads. Thus, if you are not a phase head you are not a point of transfer. However, theoretically, you are a phase head because you have been so listed. BTW, as you all know, unless D is included in this inventory, we cannot code island effects in terms of phases.  And as you also all know, the phase-based account of islands is no more principled than the older subjacency account.[4]  However, this is not my topic here.  All I want to observe is how substantive assumptions interact with formal ones to determine the class of licit dependencies, and how some accounts have a “larger” substantive component than others. I also want to register a minimalist observation (by no means original) that the substantive assumption about the inventory of Phases/BNs raises non-trivial minimalist queries: “why these?” being the obvious one.[5]

Let’s contrast this case with Minimality.  This, so far as I can tell, is a purely formal restriction, even in its relativized form. It states that in a configuration like (2), with X, Y, Z of the same type (i.e. sharing the same relevant features), Y cannot interact with X over an intervening Z. For (2) the actual feature specifications do not matter. Whatever they are, minimality will block interaction in these cases. This is what I mean by treating it as a purely formal condition.

            (2) …X…Z…Y…
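For contrast, here is the same kind of toy sketch for Minimality (Python again; the encoding is my own illustration). Notice that the check never inspects which features are involved, only whether they are shared; that is exactly the sense in which the condition is purely formal.

```python
def minimality_blocks(x, z, y):
    """Toy Relativized Minimality check. `x`, `z`, `y` are sets of
    feature labels (arbitrary strings) for the configuration ...X...Z...Y...
    Z blocks the X-Y dependency iff it bears the feature type that
    X and Y share. The actual labels never matter, only sameness."""
    shared = x & y           # the feature type of the X-Y dependency
    return bool(shared & z)  # Z intervenes iff it bears that same type

# Same-type intervener blocks; different-type intervener does not:
print(minimality_blocks({"wh"}, {"wh"}, {"wh"}))   # blocked
print(minimality_blocks({"wh"}, {"neg"}, {"wh"}))  # not blocked

# Relabel everything uniformly and the verdicts are unchanged,
# which is the "purely formal" point:
print(minimality_blocks({"F"}, {"F"}, {"F"}))      # still blocked
```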

So, we now have two different examples; let’s get back to the question posed by EKS. We all assume a universal base hypothesis with the rough structure C-T-v-V; to what degree does this base hierarchical order follow from formal principles?  Note, the base theory names the relevant heads in their relevant hierarchical order; the question is to what degree formal principles force this order. EKS discuss this and argue that, given certain current assumptions about phases, we can derive the fact that theta domains are nested within case domains, and they suggest that the same reasoning can apply to the upper C-T part of the base structure. Like I said, the paper is interesting and I recommend it.  However, I would like to ask EKS’s question in a slightly different way, stealing a trick from our friends in physics (recall, I am deep green with physics envy).

Among the symmetries physicists study is one in which different elements are swapped for one another. Thus, as Carroll (here) noted concerning nuclear structure: “In 1954, Chen Ning Yang and Robert Mills came up with the idea that this symmetry should be promoted to a local symmetry – i.e., that we should be allowed to “rotate” neutrons and protons into each other at every point in space (154).” They did this to consider whether the strong force really cared about the obvious differences between protons and neutrons. Let’s try a similar trick within the C-T-v-V domain, this time “rotating” theta and case markers into each other, to see whether the ordering of elements in the base really affects what kinds of formal dependencies we find.

More specifically: take the basic form of the sentence in (3), and let’s consider only the dependencies within TP:

            (3) [CP C [TP …T…[vP Subj v [VP V Obj]]]]

In (3) Subj gets theta from v and case from T. Obj gets theta from V and case from v. So there is a T-Subj relation, a Subj-v relation, a v-Obj relation and a V-Obj relation.  Does it matter to these relations and further derivations that the specific features noted are checked by the indicated heads? To get a handle on this, imagine that we systematically swapped case for theta assignments above (i.e. rotated case and theta into each other so that T assigns theta to Subj and v assigns case, v assigns theta to Obj and V assigns case, etc.): what would go wrong? If nothing goes wrong, then the actual labels here make no difference, and the formal properties do not determine the actual substantive order.
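The rotation can be pictured as a permutation on a toy assignment table (Python again; the table, names and encoding are my own illustrative assumptions, not EKS’s formalism). The swap leaves the formal skeleton of heads and dependents exactly as it was and permutes only the substantive features.

```python
# Each head maps to the set of (dependent, feature) assignments it
# makes; the STANDARD table transcribes the relations in (3).
STANDARD = {
    "T": {("Subj", "case")},
    "v": {("Subj", "theta"), ("Obj", "case")},
    "V": {("Obj", "theta")},
}

def rotate(assignments):
    """Swap case and theta in every assignment, in analogy with
    rotating protons and neutrons into each other. The heads and
    dependents (the formal skeleton) are untouched; only the
    substantive feature labels are permuted."""
    swap = {"case": "theta", "theta": "case"}
    return {head: {(dep, swap[feat]) for dep, feat in pairs}
            for head, pairs in assignments.items()}

ROTATED = rotate(STANDARD)
# After rotation, T assigns the external theta role and v assigns
# case to Subj: the configuration whose empirical consequences
# (small clauses, expletive placement) the text goes on to probe.
print(ROTATED["T"])
```

Note that the rotation is its own inverse, as a symmetry transformation should be: rotating twice gives back the standard table.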

To sound puffed up and super-scientific, we might say that the formal properties are symmetric wrt the substantive features of case and theta assignment. Note, btw, that we already think this way for theta and case values. The grammatical operations are symmetric with respect to these (i.e. they don’t care what the actual theta role or case value is).  We are just extending this reasoning one step further by asking about assignment as well as values.

Observe that things can go “wrong” in various ways: we could get lots of decent-looking derivations honoring the formal restrictions, but derivations that either under- or over-generate.  For example, if T assigns the external theta role, then transitive small clauses might be impossible if small clauses have no structure higher than v.  This seems false. Or, if this is right, then we might expect expletives always to sit under neg in English, as they cannot move to Spec T, this being a theta position. Again, this seems wrong. So there seem to be, at least at first blush, empirical consequences of making this rotation.  However, the “look” of the system is not that different if this is the only kind of problem, i.e. the resulting system is language-like if not exactly identical to what we actually find. In other words, it’s a possible UG, just not our UG. There is clearly a minimalist question lurking here.

A second way things could go wrong is that we do not in general get convergent derivations. EKS argue that certain phase-based accounts have this more expansive consequence. The problem is not a little over/under-generation; the problem is that we can barely get a decent derivation at all. In our little thought experiment this means that rotating the case and theta assignments results in impossible UGs. This would be a fascinating result, with obvious minimalist interpretations.

Both kinds of “problems” would be interesting. The first shows that our UG deeply cares about the substantive heads having the specific properties they do. The second suggests that there is a very strong tie between the basic structure of the clause and the formal universals we have.

I have no worked-out answer to the ‘what goes wrong?’ question (though if you get one I would love to hear about it). Note that I have abstracted away from everything but what is assumed to be syntactically relevant: case and theta “features.” I have also assumed that how these features are assigned is symmetrical: that both are assigned in the same way. If this is false, then it might be the source of the substantive base order noted (e.g. if theta were assigned only under Merge while case could be assigned under Agree). However, right now I am satisfied to leave my version of the EKS question open.

Let me end with two further random observations:

First, whatever answer we find, I find the question to be really cool and novel and exciting! We may need to stipulate all sorts of things, but until we start asking the kinds of questions EKS pose, we won’t know what is principled and what is not.

Second, as noted, the answer to this question will have significant implications for the Minimalist Program (MP).  To date, many minimalists (e.g. me) have concentrated on trying to unify the various dependencies as a prelude to explaining their properties in generic cognitive/computational terms.  However, if the C-T-v-V base structure is part of FL/UG, then it presents a different kind of challenge to MP, given the apparent linguistic specificity of the universals.  Few notions are more linguistically parochial than BN or phase head or C-T-v-V.  It would be nice if some of this followed from architectural features of the grammar as EKS suggest, or from the demands of the interface (e.g. think Heim’s tri-partite semantic structures), or something else still. The real challenge to MP arises if these kinds of substantive universals are brute. At any rate, it seems to me that substantive universals present a different kind of challenge to MP and so they are worth thinking about very carefully.

That’s enough rambling.

   


[1] I know, all the rage lately has been to pack all interesting grammatical properties into functional heads, specific lexical content being restricted to roots. The question still arises: do we need to know the special properties of the involved heads, be they functional or not, to get the right class of grammatical dependencies?
[2] This idea was not original to Barriers.  Cattell and Cinque, I believe, had a similar theoretical intuition earlier.
[3] I think that we can all agree that Chomsky’s attempt to relate the selection of C and v to semantic properties has not been a singular success. Moreover, his rationalization leaves out D, without which extending phases to cover island effects is impossible. If phases do not explain island phenomena, then their utility is somewhat circumscribed, given the view over the last 30 years that cyclicity and islandhood are tightly connected. Indeed, one might say that the central empirical prediction of the subjacency account was successive cyclic movement. All the other stuff was there to code Ross’s observations. Successive cyclicity was a novel (and verified) consequence.
[4] Boeckx and Grohmann (2007) have gone through this in detail and so far as I can tell, the theoretical landscape has stayed pretty much the same.
[5] There are attempts to give a “Barriers” version of phases (e.g. Den Dikken) and attempts to argue that virtually every “Max P” is a phase in order to finesse this minimalist problem.

Wednesday, July 24, 2013

LSA Summer Camp


The LSA summer institute just finished last week. Here are some impressions.

In many ways it was a wonderful experience and it brought back to me my life as a graduate student.  My apartment was “functional” (i.e. spare and tending towards the slovenly). As in my first grad student apartments, I had a mattress on the floor and an AC unit that I slept under. The main difference this time around was that the AC unit I had at U Mich was considerably smaller than the earlier industrial-strength machine that was able to turn my various abodes into a meat locker (I’m Canadian/Quebecois and ♪ “mon pays ce n’est pas un pays, c’est l’hiver…” ♪, i.e. “my country is not a country, it’s winter”!). In fact, this time around the AC was more like ten flies flapping vigorously. It was ok if I slept directly under the fan (hence the floor mattress).  The downside, something that I do not remember from my experience 40 years ago, was that getting up out of bed was more demanding this time around than it was back then.

I was at the LSA to teach intro to minimalist syntax.  It was a fun course to teach. Between 80 and 90 people attended regularly, about half taking the course for some kind of credit. To my delight, there was real enthusiasm for minimalist topics and the discussion in class was always lively.  The master narrative for the course was that the Minimalist Program (MP) aims to answer a “newish” question: what features of FL are peculiarly linguistic? The first lecture and a half consisted of a Whig history of Generative Grammar, which tried to locate the MP project historically. The main idea was that if one’s interest lies in distinguishing the cognitively general from the linguistically parochial within FL, there have to be candidate theories of FL to investigate. GB (for the first time) provided an articulated version of such a theory, with the sub-modules (i.e. Binding theory, control theory, movement, subjacency, the ECP, X’ theory, etc.) providing candidate “laws of grammar.” The goal of MP is to repackage these “laws” in such a way as to factor out those features that are peculiar to FL from those that are part of general cognition/computation.  I then suggested that this project could be advanced by unifying the various principles in the different modules in terms of Merge, in effect eliminating the modular structure of FL. In this frame of mind, I showed how various proposals within MP could be seen as doing just this: Phrase Structure and Movement as instances of Merge (E- and I-merge respectively), case theory as an instance of I-merge, and control and anaphoric binding as instances of I-merge (A-chain variety), etc.  It was fun. The last lectures were by far the most speculative (they involved seeing if we could model pronominal binding as an instance of A-to-A’-to-A movement (don’t ask)), but there was a lot of interesting ongoing discussion as we examined various approaches for possible unification.
We went over a lot of the standard technology, and I think we had a pretty good time with the material.

I also went on a personal crusade against AGREE.  I did this partly to be provocative (after all, most current approaches to non-local dependencies rely on AGREE in a probe-goal configuration to mediate I-merge) and partly because I believe that AGREE introduces a lot of redundancy into the theory, not a good thing; this allowed us to have a lively discussion of some of the more recondite evaluative considerations that MP elevates.[1]  At any rate, here the discussion was particularly lively (thanks Vicki) and fun. I would love to say that the class was a big hit, but this is an evaluation better left to the attendees than to me. Suffice it to say, I had a good time and the attrition rate seemed to be pretty low.

One of the perks of teaching at the institute is that one can sit in on one’s colleagues’ classes. I attended the class given by Sam Epstein, Hisa Kitahara and Dan Seely (EKS).  It was attended by about 60 people (like I said, minimalism did well at this LSA summer camp).  The material they covered required more background than the intro course I taught, and EKS walked us through some of their recent research. It was very interesting. The aim was to develop an account of why transfer applies when it does. The key idea was that cyclic transfer is forced in computations that produce multi-peaked structures, which themselves result from strict adherence to derivations that respect (an analogue of) Merge-Over-Move and feature lowering of the kind that Chomsky has recently proposed.  The technical details are non-trivial, so those interested should hunt down some of their recent papers.[2]

A second important benefit of EKS’s course was the careful way they went through some of Chomsky’s more demanding technical suggestions, sympathetically yet critically.  We had a great time discussing various conceptions of Merge and how/if labeling should be incorporated into core syntax. As many of you know, Chomsky has lately made noises that labeling should be dispensed with on simplicity grounds. Hisa (with kibitzing from Sam and Dan) walked us through some of his arguments (especially those outlined in “Problems of Projection”). I was not convinced, but I was enlightened.

Happily, in the third week, Chomsky himself came and discussed these issues in EKS’s class.  The idea he proposed was that phrases require labels, at least when transferred to the CI interface. Indeed, Chomsky proposed a labeling algorithm that incorporated Spec-Head agreement as a core component (yes, it’s back folks!!) to resolve labeling ambiguities.  To be slightly less opaque: in {X, YP} configurations the label is the most prominent (least embedded) lexical item (LI) (viz. X). In {XP, YP} configurations there are two least embedded LIs (viz., descriptively, the head of X and the head of Y). In these cases, agreement enters to resolve the ambiguity by identifying the two heads (i.e. thereby making them the same). Where agreement is possible, labeling is as well. Where it is not, one of the phrases must move to allow labeling to occur in transfer to CI.  Chomsky suggested that this requirement for unambiguous labeling (viz. the demand that labels be deterministically computed) underlies successive cyclic movement.
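As I understood it, the algorithm can be caricatured in a few lines (a toy Python sketch of my paraphrase above, emphatically not Chomsky’s own formalism; encoding heads as strings and phrases as tuples is an assumption of mine, as is the way agreement is passed in):

```python
def label(alpha, beta, agree=None):
    """Toy labeling in the spirit of the summary above. An element is
    a head (a string) or a phrase (a tuple). In {X, YP} the head
    labels; in {XP, YP} labeling succeeds only if agreement identifies
    the two heads, in which case the shared feature labels the set."""
    def is_head(x):
        return isinstance(x, str)

    if is_head(alpha) and not is_head(beta):
        return alpha                 # {X, YP}: X is the least embedded LI
    if is_head(beta) and not is_head(alpha):
        return beta                  # {XP, Y}: mirror-image case
    if not is_head(alpha) and not is_head(beta):
        # {XP, YP}: two least-embedded LIs, so the label is ambiguous
        # unless agreement identifies the two heads (e.g. shared phi
        # features in a Spec-Head configuration).
        if agree is not None:
            return agree             # label via the shared feature
        return None                  # no label: one phrase must move
    return None                      # {X, Y}: left open in this sketch

# {X, YP}: the head labels.
print(label("v", ("V", "DP")))                     # the head "v"
# {XP, YP} without agreement: no label, forcing movement, which on
# this view underlies successive cyclicity.
print(label(("D", "NP"), ("T", "vP")))             # None
# {XP, YP} with agreement: the shared feature labels the set.
print(label(("D", "NP"), ("T", "vP"), agree="phi"))
```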

To be honest, I am not sure that I yet fully understand the details enough to evaluate it (to be more honest, I think I get enough of it to be very skeptical). However, I can say that the class was a lot of fun and very thought provoking. As an added bonus, it brought me and Vicki Carstens together on a common squibbish project (currently under construction). For me it felt like being back in one of Chomsky’s Thursday lectures. It was great.

Chomsky gave two other less technical talks that were also very well attended. All in all, a great two days.

There were other highlights. I got to talk to Rick Lewis a lot. We “discussed” matters of great moment over excellent local beer and some very good single malt scotch. It was as part of one of these outings that I got him to allow me to post his two papers here. One particularly enlightening discussion involved the interpretation of the competence/performance distinction. He proposed that it be interpreted as analogous to the distinction between capacities and exercisings of capacities.  A performance is the exercise of a capacity. Capacities are never exhausted by their exercisings.  As he noted, on this version of the distinction one can have competence theories of grammars, of parsers, and of producers. On this view, it’s not that grammars are part of the theory of competence and parsers part of the theory of performance. Rather, the distinction marks the important point that the aim of cognitive theory is to understand capacities, not particular exercisings thereof. I’m not sure if this is exactly what Chomsky had in mind when he introduced the distinction, but I do think that it marks an important distinction that should be highlighted (one further discussed here).

Let me end with one last impression, maybe an inaccurate one, but one that I nonetheless left with.  Despite the evident interest in minimalist/biolinguistic themes at the institute, it struck me that this conception of linguistics is very much in the minority within the discipline at large. There really is a linguistics/languistics divide that is quite deep, with a very large part of the field focused on the proper description of language data in all of its vast complexity as the central object of study. Though there is no a priori reason why this endeavor should clash with the biolinguistic one, in practice it does.

The two pursuits are animated by very different aesthetics, and increasingly by different analytical techniques.  They endorse different conceptions of the role of idealization, and different attitudes towards variation and complexity. For biolinguists, the aim is to eliminate the variation, in effect to see through it and isolate the individual interacting sub-systems that combine to produce the surface complexity. The trick on this view is to find a way of ignoring a lot of the complex surface data and home in on the simple underlying mechanisms. This contrasts with a second conception, one that embraces the complexity and thinks that it needs to be understood as a whole. On this second view, abstracting from the complex variety manifested in the surface forms is to abstract away from the key features of language.  On this second view, language IS variation, whereas from the biolinguistic perspective a good deal of variation is noise.

This, of course, is a vast over-simplification. But I sense that it reflects two different approaches to the study of language, approaches that won’t (and can’t) fit comfortably together. If so, linguistics will (has) split into two disciplines, one closer to philology (albeit with fancy new statistical techniques to bolster the descriptive enterprise) and one closer to Chomsky’s original biolinguistic conception whose central object of inquiry is FL.

Last point: one thing I also discovered is how much work running one of these Institutes can be. The organizers at U Michigan did an outstanding job. I would like to thank Andries Coetzee, Robin Queen, Jennifer Nguyen and all their student helpers for all their efforts.  I can be very cranky (and I was on some days) and when I was, instead of hitting me upside the head, they calmly and graciously settled me down, solved my “very pressing” problem, and sent me on my merry way. Thanks for your efforts, forbearance and constant good cheer.



[1] I make this argument in chapter 6 here.
[2] See the three papers in 2010, 2011, and 2012 by EKS noted here.