Wednesday, July 23, 2014

Academia as business

One of the forces behind the "crapification" of academic life (see here) is the fact that it is increasingly being managed by "bidnessmen."  Part of this is because governments have decided to drop support for universities (the cold war is over and we beat the Russians) and the shortfall needs to be made up from somewhere. So universities have effectively turned themselves into fund-raising machines, and heading up a supercharged money-grubbing machine requires the unctuous talents that only big bucks can attract.  So, as Chomsky might put it, universities are no longer educational institutions with endowments but endowments with educational institutions.  As all good minimalists know, being after a with means being of lesser significance!

At any rate, (here) is a piece that discusses some of the current dynamics.  The ever increasing bureaucratization of the university is not a bug that can be eliminated, but a feature of how things are done now.  Given the current state of play, universities do need alternative sources of funding, and rich people and corporations (other "kinds" of people, at least in the USA) are the obvious source.  Catering to these sources of income requires work, and the people who are good at it, not surprisingly, are only tangentially interested in what universities do. The figures on the relative growth rates of faculty to administrators and the relevant salary comparisons are significant, IMO. It's not actually clear that much can be done, but it's worth knowing that this is no accident. It's just how things work now.

Monday, July 21, 2014

What's in a Category? [Part 2]

Last week I wondered about the notion of syntactic category, aka part of speech (POS). My worry is that we have no clear idea what kind of work POS are supposed to do for syntax. We have some criteria for assigning POS to lexical items (LI) --- morphology, distribution, semantics --- but there are no clear-cut rules for how these are weighed against each other. Even worse, we have no idea why these are relevant criteria while plausible candidates such as phonological weight and arity seem to be irrelevant.1 So what we have is an integral part of pretty much every syntactic formalism for which we cannot say
  • what exactly it encompasses,
  • why it is necessary,
  • why it shows certain properties but not others.
Okay, that's a pretty unsatisfying state of affairs. Actually, things are even more unsatisfying once you look at the issue from a formal perspective. But the formal perspective also suggests a way out of this mess.

Comments on lecture 4-I

I have just finished listening to Chomsky’s fourth lecture and so this will be the last series of posts on them (here). If you have not seen them, let me again suggest that you take the time to watch. They are very good and well worth the (not inconsiderable) time commitment.

In lecture 4, Chomsky does three things. First, he again tries to sell the style of investigation that the lectures as a whole illustrate. Second, he reviews the motivations and basic results of his way of approaching Darwin’s Problem.  Third, he proposes ways of tidying up some of the loose ends that the outline in lecture 3 generates (at least, they were loose ends that I did not understand).  Let me review each of these points in turn.

1. The central issues and the Strong Minimalist Thesis (SMT)

Chomsky, as is his wont, returns to the key issues as he sees them. There are two of particular importance.

First, he believes that we should be looking for simple theories.  He names this dictum Galileo’s Maxim (GM).  GM asserts (i) that nature is simple and (ii) that it is the task of the scientist to prove that it is. Chomsky notes that this is not merely good general methodological advice (which it is), but that in the particular context of the study of FL there are substantive domain specific reasons for adopting it. Namely: Darwin’s Problem (DP). Chomsky claims that DP rests on three observations: (i) that our linguistic competence is not learnable from simple data, (ii) that there is no analogue of our linguistic capacity anywhere else in the natural world, and (iii) that the capacity for language emerged recently (in the last 100k years or so), emerged suddenly, and has remained stable in its properties since its emergence.[1]  These three points together imply that we have a non-trivial FL, that it is species specific, and that it arose as a result of a very “simple” addition to our ancestors’ cognitive repertoire.  So, in addition to the general (i.e. external to the specific practice of linguistics) methodological virtues of looking for simple and elegant theories, DP provides a more substantive (i.e. internal to linguistics) incentive, as simple theories are just the sorts of things that could emerge rapidly in a lineage and remain stable after emerging.

I very much like this way of framing the central aims of the Minimalist Program (MP).  It reconciles two apparently contradictory themes that have motivated MP. The first theme is that looking for simple theories is just good methodology and so MP is nothing new.  On this reading, MP is just the rational extension of GG theorizing, just the application of general scientific principles/standards of rational inquiry to linguistic investigations.  On this view, MP concerns are nothing new and the standards MP applies to theory evaluation are just the same as they always were.  The second view, one that also seems to be a common theme, is that MP does add a new dimension to inquiry. DP, though always a concern, is now ripe for investigation. And thinking about DP motivates developing simple theories for substantive reasons internal to linguistic investigations, motivations in addition to the standard ones prompted by concerns of scientific hygiene.  On this view, raising DP to prominence changes the relevant standards for theoretical evaluation. Adding DP to Plato’s Problem, then, changes the nature of the problem to be addressed in interesting ways.

This combined view, I think, gets MP right.  It is both novel and old hat.  What Chomsky notes is that at some times, depending on how developed theory is, new questions can emerge or become accented and at those times the virtues of simplicity have a bite that goes beyond general methodological concerns.  Another way of saying this, perhaps, is that there are times (now being one in linguistics) where the value of theoretical simplicity is elevated and the task of finding simple non-trivial coherent theories is the central research project. The SMT is intended to respond to this way of viewing the current project (I comment on this below).

Chomsky makes a second very important point. He notes that our explanatory target should be the kinds of effects that GG has discovered over the last 60 years.  Thus, we should try to develop accounts as to why FL generates an unbounded number of structured linguistic objects (SLO), why it incorporates displacement operations, why it obeys locality restrictions (strict cyclicity, PIC), why there is overt morphology, why there are subject/object asymmetries (Fixed Subject Effects/ECP), why there are EPP effects, etc. So, Chomsky identifies both a method of inquiry (viz. Galileo’s Maxim) and a target of inquiry (viz. the discovered laws and effects of GG). Theory should aim to explain the second while taking DP very seriously.

The SMT, as Chomsky sees it, is an example of how to do this (actually, I don’t think he believes it is merely an example, but rather the only conceptually coherent way to proceed).  Here’s the guts of the SMT: look for the conceptually simplest computational procedures that generate SLOs and that are interpreted at CI and (secondarily) SM.  Embed these conceptually simple operations in a computationally efficient system (one that adheres to obvious and generic principles of efficient computation like minimal search, No Tampering, Inclusiveness, Memory load reduction) and show that from these optimal starting points one can derive a good chunk of the properties that GG has discovered natural language grammars to have.  And, when confronted with apparent counter-examples to the SMT, look harder for a solution that redeems the SMT.  This, Chomsky argues, is the right way, today, to do theoretical syntax.

I like almost all of this, as you might have guessed. IMO, the only caveat I would have is that the conceptually simple is often very hard to discern. Moreover, what Occam might endorse, DP might not. I have discussed before that what’s simple in a DP context might well depend on what was cognitively available to our ancestors prior to the emergence of FL. Thus, there may be many plausible simple starting points that lead to different kinds of theories of FL, all of which respond to Chomsky’s methodological and substantive vision of MP. For what it’s worth, contra Chomsky, I think (or, at least believe that it is rational to suggest) that Merge is not simple but complex, and that it is composed of a more cognitively primitive operation (viz. Iteration) and a novel part (viz. Labeling). For those who care about this, I discuss what I have in mind further here in part 4 (the finale) of my comments to lecture 3.[2] However, that said, I could not agree with Chomsky’s general approach more. An MP that respects DP should deify GM and target the laws of GG.  Right on.

[1] Chomsky has a nice riff where he notes that though it seems to him (and to any sane researcher) that (i)-(iii) are obviously correct, nonetheless these are highly controversial claims, if judged by the bulk of research on language. He particularly zeros in on big data statistical learning types and observes (correctly in my view) that not only have they not been able to deliver on even the simplest PoS problems (e.g. structure dependence in Y/N questions) but that they are currently incapable of delivering anything of interest given that they have misconstrued the problem to be solved. Chomsky develops this theme further, pointing out that to date, in his opinion, we have learned nothing of interest from these pursuits either in syntax or semantics. I completely agree and have said so here. Still, I get great pleasure in hearing Chomsky’s completely accurate dismissive comments. 
[2] I also discuss this in a chapter co-written with Bill Idsardi forthcoming in a collection edited by Peter Kosta from Benjamins.

Thursday, July 17, 2014

Big money, big science and brains

Gary Marcus here discusses a recent brouhaha taking place in the European neuroscience community. The kerfuffle, not surprisingly, is about how to study the brain. In other words, it's about money. The Europeans have decided to spend a lot of Euros (real money!) to try to find out how brains function. Rather than throw lots of it at many different projects haphazardly and see which gain traction, the science bureaucrats in the EU have decided to pick winners (an unlikely strategy for success given how little we know, but bureaucratic hubris really knows no bounds). And, here’s a surprise, many of those left behind are complaining.

Now, truth be told, in this case my sympathies lie with (at least some) of those cut out.  One of these is Stan Dehaene, who, IMO, is really one of the best cog-neuro people working today.  What makes him good is his understanding that good neuroscience requires good cognitive science (i.e. that trying to figure out how brains do things requires having some specification of what it is that they are doing). It seems that this, unfortunately, is a minority opinion. And this is not good. Marcus explains why.

His op-ed makes several important points concerning the current state of the neuro art, in addition to providing links to the aforementioned funding battle (I admit it: I can’t help enjoying watching others fight important “intellectual battles” that revolve around very large amounts of cash). His most important point is that, at this point in time, we really have no bridge between cognitive theories and neuro theories. Or as Marcus puts it:

What we are really looking for is a bridge, some way of connecting two separate scientific languages — those of neuroscience and psychology.

In fact, this is a nice and polite way of putting it. What we are really looking for is some recognition from the hard-core neuro community that their default psychological theories are deeply inadequate. You see, much of the neuro community consists of crude (as if there were another kind) associationists, and the neuro models they pursue reflect this. I have pointed to several critical discussions of this shortcoming in the past by Randy Gallistel and friends (here).  Marcus himself has usefully trashed the standard connectionist psycho models (here). However, they just refuse to die, and this has had the effect of diverting attention from the important problem that Marcus points to above: finding that bridge.

Actually, it’s worse than that. I doubt that Marcus’s point of view is widely shared in the neuro community. Why? They think that they already have the required bridge. Gallistel & King (here) review the current state of play: connectionist neural models combine with associationist psychology to provide a unified picture of how brains and minds interact.  The problem is not that neuroscience has no bridge, it’s that it has one and it’s a bridge to nowhere. That’s the real problem. You can’t find what you are not looking for and you won’t look for something if you think you already have it.

And this brings us back to the aforementioned battle in Europe.  Markram and colleagues have a project.  It is described here as attempting to “reverse engineer the mammalian brain by recreating the behavior of billions of neurons in a computer.” The game plan seems to be to mimic the behavior of real brains by building a fully connected brain within the computer. The idea seems to be that once we have this fully connected neural net of billions of “neurons” it will become evident how brains think and perceive. In other words, Markram and colleagues “know” how brains think: it’s just a big neural net.[1] What’s missing is not the basic concepts, but the details. From their point of view the problem is roughly to detail the fine structure of the net (i.e. what’s connected to what). This is a very complex problem, for brains are very complicated nets. However, nets they are. And once you buy this, then the problem of understanding the brain becomes, as Science put it (in the July 11, 2014 issue), “an information technology” issue.[2]

And that’s where Marcus and Dehaene and Gallistel and a few notable others disagree: they think that we still don’t know the most basic features of how the brain processes information. We don’t know how it stores info in memory, how it retrieves it from memory, how it calls functions, how it binds variables, how, in a word, it computes. And this is a very big thing not to know. It means that we don’t know how brains incarnate even the most basic computational operations.

In the op-ed, Marcus develops an analogy that Gallistel is also fond of pointing to between the state of current neuroscience and biology before Watson and Crick.[3]  Here’s Marcus on the cognition-neuro bridge again:

Such bridges don’t come easily or often, maybe once in a generation, but when they do arrive, they can change everything. An example is the discovery of DNA, which allowed us to understand how genetic information could be represented and replicated in a physical structure. In one stroke, this bridge transformed biology from a mystery — in which the physical basis of life was almost entirely unknown — into a tractable if challenging set of problems, such as sequencing genes, working out the proteins that they encode and discerning the circumstances that govern their distribution in the body.
Neuroscience awaits a similar breakthrough. We know that there must be some lawful relation between assemblies of neurons and the elements of thought, but we are currently at a loss to describe those laws. We don’t know, for example, whether our memories for individual words inhere in individual neurons or in sets of neurons, or in what way sets of neurons might underwrite our memories for words, if in fact they do.

The presence of money (indeed, even the whiff of lucre) has a way of sharpening intellectual disputes. This one is no different. The problem from my point of view is that the wrong ideas appear to be cashing in. Those controlling the resources do not seem (as Marcus puts it) “devoted to spanning the chasm.” I am pretty sure I know why too: they don’t see one. If your psychology is associationist (even if only tacitly so), then the problem is one of detail, not principle. The problem is getting the wiring diagram right (it is very complex, you know), getting the right probes to reveal the detailed connections that make up the full networks. The problem is not fundamental but practical; problems that we can be confident will advance if we throw lots of money at them.

And, as always, things are worse than this. Big money calls forth busy bureaucrats whose job it is to measure progress, write reports, and convene panels to manage the money and the science.  The basic problem is that fundamental science is impossible to manage due to its inherent unpredictability (as Popper noted long ago). So in place of basic fundamental research, big money begets big science, which begets the strategic pursuit of the manageable. This is not always a bad thing.  When questions are crisp and we understand roughly what's going on, big science can find us the Higgs field or W bosons. However, when we are awaiting our "breakthrough," the virtues of this kind of research are far more debatable. Why? Because in this process, sadly, the hard fundamental questions can easily get lost, for they are too hard (quirky, offbeat, novel) for the system to digest. Even more sadly, this kind of big money science follows a Gresham’s Law sort of logic, with Big (heavily monied) Science driving out small bore fundamental research. That’s what Marcus is pointing to, and he is right to be disappointed.

[1] I don’t understand why the failure of the full wiring diagram of the nematode (which we have) to explain nematode behavior has made so little impression on the leading figures in the field (Christof Koch is an exception here).  If the problem were just the details of the wiring diagram, then nematode “cognition” should be an open book, which it is most definitely not.
[2] And these large scale technology/Big Data projects are a bureaucrat's dream. Here there is lots of room to manage the project, set up indices of progress and success, and do all the pointless things that bureaucrats love to do. Sadly, this has nothing to do with real science.  Popper noted long ago that the problem with scientific progress is that it is inherently unpredictable. You cannot schedule the arrival of breakthrough ideas.  But this very unpredictability is what makes such research unpalatable to science managers, and why it is that they prefer big all-encompassing sciency projects to the real thing.
[3] Gallistel has made an interesting observation about this earlier period in molecular biology. Most of the biochemistry predating Watson and Crick has been thrown away.  The genetics that predates Watson and Crick has largely survived although elaborated.  The analogy in the cognitive neurosciences is that much of what we think of as cutting edge neuroscience might possibly disappear once Marcus’s bridge is built. Cognitive theory, however, will largely remain intact.  So, curiously, if the prior developments in molecular biology are any guide, the cognitive results in areas like linguistics, vision, face recognition etc. will prove to be far more robust when insight finally arrives than the stuff that most neuroscientists are currently invested in.  For a nice discussion of this earlier period in molecular biology read this. It’s a terrific book.

Monday, July 14, 2014

What's in a Category? [Part 1]

Norbert's most recent comments on Chomsky's lecture implicitly touched on an issue that I've been pondering ever since I realized how categories can be linked to constraints. Norbert's primary concern is the role of labels and how labeling may drive the syntactic machinery. Here's what caught my attention in his description of these ideas:
In effect, labels are how we create equivalence classes of expressions based on the basic atomic inventory. Another way of saying this is that Labeling maps a "complex" set {a,b} to either a or b, thereby putting it in the equivalence class of 'a' or 'b'. If Labels allow Select to apply to anything in the equivalence class of 'a' (and not just to 'a' alone), we can derive [structured linguistic objects] via Iteration.
Unless I'm vastly misconstruing Norbert's proposal, this is a generalization of the idea of labels as distribution classes. Linguists classify slept and killed Mary as VPs because they are interchangeable in all grammatical sentences of English. Now in Norbert's case the labels presumably aren't XPs but just lexical items, following standard ideas of Bare Phrase Structure. Let's ignore this complication for now (we'll come back to it later, I promise) and just focus on the issue that causes my panties to require some serious untwisting:
  • Are syntactic categories tied to distribution classes in some way?
  • If not, what is their contribution to the formalism?
  • What does it mean for a lexical item to be, say, a verb rather than a noun?
  • And why should we even care?
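Before tackling those questions, the labels-as-distribution-classes idea from Norbert's proposal can be made concrete with a toy sketch. This is purely my illustrative reconstruction, not a standard formalism: the mini-lexicon and the names `merge`, `label`, and `category` are assumptions of mine, and the head is stipulated by hand rather than computed by a labeling algorithm.

```python
# Toy sketch: Merge builds an unordered set and Labeling projects one
# member as the label of the whole, placing the complex object in the
# equivalence (distribution) class of its head. Illustrative only.

def merge(a, b, head):
    """Merge-plus-Label: build {a, b}, projecting `head` as its label."""
    assert head in (a, b)
    return (head, frozenset([a, b]))

def label(obj):
    """The label of a complex object; a bare lexical item labels itself."""
    return obj[0] if isinstance(obj, tuple) else obj

def category(obj, lexicon):
    """Distribution class: chase labels down to a lexical item's category."""
    l = label(obj)
    while isinstance(l, tuple):
        l = label(l)
    return lexicon[l]

lexicon = {"slept": "V", "killed": "V", "Mary": "N"}  # made-up mini-lexicon
vp1 = "slept"
vp2 = merge("killed", "Mary", head="killed")
# Both fall in the verbal equivalence class, so Select can treat them alike:
print(category(vp1, lexicon), category(vp2, lexicon))  # V V
```

On this toy picture, *slept* and *killed Mary* are interchangeable exactly because `category` maps both to the same class, which is what the distribution-class reading of labels amounts to.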

A question about feature valuation

I've been working on a "whig history" (WH) of generative grammar. A WH is a kind of rational reconstruction which, if doable, serves to reconstruct the logical development of a field of inquiry. WHs, then, are not “real” histories. Rather, they present the past as “an inevitable progression towards ever greater…enlightenment.” Real history is filled with dead ends, lucky breaks, misunderstandings, confusions, petty rivalries, and more.  WHs are not. They focus on the “successful chain of theories and experiments that led to the present-day science, while ignoring failed theories and dead ends” (see here). The value of WHs is that they expose the cumulative nature of a given trajectory of inquiry. One sign of a real science is that it has a cumulative structure, and given that many think that the history of Generative Grammar (GG) fails to have a cumulative structure, many conclude that this tells against the GG enterprise. However, the "many" are wrong: GG has a perfectly respectable WH, and both empirically and theoretically the development has been cumulative. In a word, we've made loads of progress.  But this is not the topic for this post. What is?

As I went about reconstructing the relation between current minimalist theory and earlier GB theory, I came to appreciate just how powerful the No Tampering Condition (NTC) really is (I know, I know, I should have understood this before, but dim bulb that I am, I didn't). I understand the NTC as follows: the inputs to a given grammatical operation must be preserved in the outputs of that operation. In effect, the NTC is a conservation principle that says that structure can be created but not destroyed. Replacing an expression with a trace of that expression destroys (i.e. fails to preserve) the input structure in the output, and so the GB conception of traces is theoretically inadmissible in a minimalist theory that assumes the NTC (which, let me remind you, is a very nice computational principle and part of most (all?) current minimalist proposals).

The NTC has many other virtues as well. For example, it derives the fact that movement rules cannot "lower" and that movement (at least within a single rooted sub-"tree") is always to a "c-commanding" position. Those of you who have listened to any of the Chomsky lectures I posted earlier will understand why I have used scare quotes above. If you don't know why and don't want to listen to the lectures, ask David Pesetsky. He can tell you.

At any rate, the NTC also suffices to derive the GB Projection Principle and the MP Extension Condition. In addition, it suffices to eliminate trace theory as a theoretical option (viz. co-indexed empty categories that are residues of movement: [e]1). Why? Because traces cannot exist in the input to the derivation, and so they cannot exist in the output given the NTC. Thus, given the NTC, the only way to implement movement is via the Copy Theory. This is all very satisfying theoretically for the usual minimalist reasons. However, it also raises a question in my mind, which I would like to ask here.
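The trace-vs-copy contrast under the NTC can be rendered as a toy sketch. The tuple encoding and the function names below are my own illustrative assumptions; the point is just that trace-style movement mutilates the input structure while copy-style movement leaves it recoverable intact.

```python
# Toy rendering of the NTC as a conservation principle: the input to an
# operation must survive, untouched, as a subpart of the output.
# Illustrative encoding only, not a standard implementation.

def move_with_trace(struct, target):
    """GB-style movement: replace the moved item with a trace 't'."""
    gutted = tuple("t" if x == target else x for x in struct)
    return (target, gutted)

def move_with_copy(struct, target):
    """Copy-theoretic movement: remerge target, leaving the input intact."""
    return (target, struct)

def ntc_satisfied(inp, out):
    """Is the input preserved, unaltered, inside the output?"""
    return inp in out

clause = ("John", "saw", "what")
print(ntc_satisfied(clause, move_with_copy(clause, "what")))   # True
print(ntc_satisfied(clause, move_with_trace(clause, "what")))  # False
```

The copy-based output contains the original clause as a subpart; the trace-based output contains only a gutted variant of it, which is exactly the sense in which traces "tamper."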

Why doesn't the NTC rule out feature valuation?  One of the current grammatical operations within MP grammars is AGREE. What it does is relate two expressions (heads actually) in a Probe/Goal configuration and the goal "values" the features of the probe.  Now, the way I've understood this is that the Probe is akin to a property, something like P(x) (maybe with a lambdaish binder, but who really cares) and the goal serves to turn that 'x' into some value, so turns P(x) into P(phi) for example (if you want, via something like lambda conversion, but again who really cares). At any rate, and here's my question: doesn't this violate the NTC? After all, the input to AGREE is P(x) and the output is, e.g. P(phi). Doesn't this violate a strict version of the NTC?

Note, interestingly, feature checking per se is consistent with the NTC, as no feature changing/valuing need go on to "check" if sets of features are "compatible."  However, if I understand the valuation idea, then it is thought to go beyond mere bookkeeping. It is intended to change the feature composition of a probe based on the feature composition of the goal.  Indeed, it is precisely for this reason that phases are required to strip off the valued yet uninterpretable features before Transfer. But if AGREE changes feature matrices then it seems incompatible with the NTC.
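The checking-vs-valuation contrast can be made concrete with a toy sketch. Encoding feature matrices as dicts, with None marking an unvalued slot, is my own illustrative choice:

```python
# Toy contrast between feature checking and feature valuation, to make
# the NTC worry concrete. Dict-based feature matrices are an
# illustrative assumption, not a standard formalization of AGREE.

def agree_check(probe, goal):
    """Checking: test compatibility of valued features; inputs untouched."""
    return all(goal.get(f) == v for f, v in probe.items() if v is not None)

def agree_value(probe, goal):
    """Valuation: fill the probe's unvalued slots from the goal, yielding
    an output probe that differs from the input probe."""
    return {f: (goal[f] if v is None else v) for f, v in probe.items()}

probe = {"phi": None}                 # P(x): an unvalued phi-feature
goal = {"phi": "3sg", "case": "nom"}

valued = agree_value(probe, goal)     # P(x) becomes P(phi)
print(valued)                         # {'phi': '3sg'}
print(valued == probe)                # False: the input is not preserved
print(agree_check({"phi": "3sg"}, goal))  # True, and nothing changed
```

Checking merely inspects the two matrices and leaves both intact, which is why it sits comfortably with the NTC; valuation produces an output probe distinct from the input probe, which is exactly the tampering the question is about.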

The same line of reasoning suggests that feature lowering is also incompatible with the NTC. To wit: if features really transfer from C to T or from v to V (either by being copied from the former to the latter or actually copied from the higher to the lower and deleted from the higher) then again the NTC in its strongest form seems to be violated. 

So, my question: are theories that adopt feature valuation and feature lowering inconsistent with the NTC or not? Note, we can massage the NTC so that it does not apply to such feature "checking" operations. But then we could massage the NTC so that it does not prohibit traces. We can, after all, do anything we wish. For example, current theory stipulates that pair merge, unlike set merge, is not subject to Extension, viz. the NTC (though I think that Chomsky is not happy with this given some oblique remarks he made in lecture 3). However, if the NTC is strictly speaking incompatible with these two operations, then it is worth knowing, as it would seem to be theoretically very consequential. For example, a good chunk of phase theory, as currently understood, depends on these operations, and were we to discover that they are incompatible with the NTC, then this might (IMO, likely does) have consequences for Darwin's Problem.

So, all you thoroughly modern minimalists out there: what say you?

Friday, July 11, 2014

Comments on lecture 3: the finale

This is the final part of my comments on lecture 3.  The first three parts are (here, here and here). I depart from explication mode in these last comments and turn instead to a critical evaluation of what I take to be Chomsky’s main line of argument (and it is NOT empirical).  His approach to labels emerges directly from his conception of the basic operation Merge. How so? Well, there are only two “places” that MPish approaches can look to in order to ground linguistic processes, the computational system (CS) or the interface conditions (Bare Output Conditions (BOC)).  Given Chomsky’s conceptually spare understanding of Merge, it is not surprising that labeling must be understood as a BOC. I here endorse this logic and conclude that Chomsky’s modus ponens is my modus tollens. If correct, this requires us to rethink the basic operation. Here’s what I believe we should be aiming for: a conception that traces the kind of recursion we find in FL to labeling. In other words, labeling is not a BOC but intrinsic to CS; indeed, the very operation that allows for the construction of SLOs. Thus, just as Merge now (though not in MP's early days) includes both phrase building and movement, the basic operation, when properly conceptualized, should also include labeling.

To motivate you to aim for such a conception, it’s worth recalling that in early MP it was considered conceptually obvious that Move and Merge were different kinds of things, that the latter was more basic, and that the former was an “imperfection.” Some (including me and Chris Collins) did not buy this dichotomy, suggesting that whatever process produced phrase structure (now E-Merge) should also suffice to give one Move (now I-Merge). In other words, that FL when it arose came fully equipped with both merge and move, neither being more basic than the other. On this view, Move is not an “imperfection” at all. Chomsky’s later work endorsed this conception. He derived the same conclusion from other (arguably (though I’m not sure I would so argue) simpler) premises. I derive one moral from this little history: what looks to be conceptually obvious is a lot clearer after the fact than ex ante. Here’s a good place to mention the owl of Minerva, but I will refrain. Thus, here’s a project: rethink the basic operation in CS so that labels are intrinsic consequences. I will suggest one way of doing this below, but it is only a suggestion. What I think there are arguments for is that Chomsky’s way of including labels in FL is very problematic (this is as close as I come to saying that it’s wrong!) and misdiagnoses the relevant issues. The logic is terrific, it just starts from the wrong place. Here goes.

1.     The logic revisited and another perspective on “merge”

There are probably other issues to address if one wants to pursue Chomsky’s proposal. IMO, right now his basic idea, though suggestive, is not that well articulated. There are many technical and empirical issues that need to be ironed out. However, I doubt that this will deter those convinced by Chomsky’s conceptual arguments. So before ending I want to discuss them. And I want to make two points: first that I think that there is something right about his argument. What I mean is that if you buy Chomsky’s conception of Merge, then adding something like a labeling algorithm in CS is conceptually inelegant, if not worse. In other words, Chomsky is right in thinking that adding labeling to his conception of Merge is not a good theoretical move conceptually. And second, I want to suggest that Chomsky’s idea that projection is effectively an interface requirement, a BOC in older terminology, has things backwards.  The interfaces do not require labeled structures to do what they do. At least CI doesn’t, so far as I can tell. The syntax needs them. The interfaces do not. The two points together point to the conclusion that we need to re-think Merge. I will very briefly suggest how we might do this.

Let’s start.  First, Chomsky is making exactly the right kind of argument. As noted at the outset, Chomsky is right to question labeling as part of CS given his view that Merge is the minimal syntactic operation. His version of Merge provides unboundedly many SLOs (plus movement) all by itself. One can add projection (i.e. labeling) considerations to the rule but this addition will necessarily go beyond the conceptual minimum. Thus, Merge cannot have a labeling sub-part (as earlier versions of Merge did).  In fact, the only theoretical place for labels is the interface as the only place for anything in an MP-style account is as an interface BOC or the CS. But as labels cannot be part of CS, they must be traced to properties of the CI/SM interface. And given Chomsky’s view that the CI interface is really where all the action is, this means that labeling is primarily required for CI interpretation.  That’s the logic and it strikes me as a very very nice argument. 

Let me also add, before I pick at some of the premises of Chomsky’s argument, that lecture 3 once again illustrates what minimalist theorizing should aim for: the derivation of deep properties of FL from simple assumptions. Lecture 3 continues the agenda from lecture 2 by aiming to explain three prominent effects: successive cyclicity, FSCs and EPP effects. As I have stressed before in other places, these discovered effects are the glory of GG, and we should evaluate any theoretical proposal by how many of them it can explain, and how well. Indeed, that’s what theory does in any scientific discipline. In linguistics, theory should explain the myriad effects we have discovered over the last 60 years of GG research. In sum, not surprisingly, and even though I am going to disagree with Chomsky’s proposal, I think that lecture 3 offers an excellent model of what theorists should be doing.

So which premise don’t I like? I am very unconvinced that labels reflect BOCs. I do not see why CI, for example, needs labeled structures to interpret SLOs. What is needed is structured objects (to provide compositional structure), but I don’t see that CI needs labeled SLOs. The primitives in standard accounts of semantic interpretation are things like arguments, predicates, events, propositions, operators, variables, scope, etc. Not agreement phrases, VPs or vPs or question phrases, etc. Thus, for example, though we need to identify the Q operator in questions to give the structure a question “meaning,” and we need to determine the scope of this operator (something like its c-command domain), it is not clear to me that we also need to identify a question phrase or an agreement phrase. At least in the standard semantic accounts I am familiar with, be it Heim and Kratzer or Neo-Davidsonian, we don’t really need to know anything about the labels to interpret SLOs at CI. It’s the branching that matters, not what labels sit on the nodes.[1]

I know little about SM (I grew up in a philo dept and have never taken a phonology course (though some of my best friends are phonologists)), but from what I can gather the same seems true on the SM side. There are differences in stress between some Ns and Vs, but at the higher levels the relevant units are XPs, not DPs vs VPs vs TPs vs CPs, etc. Indeed, the general procedure in getting to phrasal phonology involves erasing headedness information. In other words, the phonology does not seem to care about labels beyond the N vs V level (i.e. the level of phonological atoms).

If this impression is accurate (and Chomsky asserts, but does not show, that the interfaces care about labeled SLOs), then we can treat Chomsky’s proposal as a reductio: he is right about how the pieces must fit together given his starting assumptions, but those assumptions imply something clearly false (that labels are necessary for interface legibility); therefore there must be something wrong with Chomsky’s starting point, viz. that Merge as he understands it is the right basic operation.

I would go further. If labeling is largely irrelevant for interface interpretation (and so cannot be traced to BOCs) then labeling must be part of CS and this means that Chomsky’s conception of Merge needs reconsideration.[2] So let’s do that.[3]

What follows relies on some work I did (here). I apologize for the self-referential nature of what follows, but hey it’s the end of a very long post.

Here’s the idea: the basic CS operation consists of two parts, only one of which is language specific. The “unbounded” part is the product of a capacity for producing unboundedly big flat structures that is not peculiarly linguistic or unique to humans. Call this operation Iteration. Birds (and mice and bats and whales) do it with songs. Ants do it with path integration.  Iteration allows for the production of “beads on a string” kinds of structures and there is no limit in principle to how long/big these structures can be.

The distinctive feature of Iteration is that it couldn’t care less about bracketing. Consider an example: addition can iterate. (((a+b)+c)+d) is the same as (a+(b+(c+d))), which is the same as (a+b+c+d), etc. Brackets in iterative structures make no difference. The same is true in path integration. What the ant does is add up all the directional information, and the adding up needs no particular bracketing to succeed. So if the ant goes 2 ft N, then 3 ft W, then 6 ft S, then 4 ft E, it makes no difference to the calculation how these bits of directional information are added together. However you do it, the result is the same. Bracketing does not matter. The same is true for simple conjunction: (((a&b)&c)&d) is equivalent to (a&(b&(c&d))), which is the same as (a&b&c&d). Again, brackets don’t matter. Let’s assume, then, that iterative procedures do not bracket. So there are two basic features of Iteration: (i) there is no upper bound to the objects it can produce (i.e. there is no upper bound on the length of the beaded string), and (ii) bracketing is irrelevant, viz. Iteration does not bracket. It’s just like beads on a string.
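The ant example can be checked directly: however the displacement vectors are grouped, iterated addition yields the same net position. Here is a minimal sketch (the encoding of the compass directions as coordinate pairs is my own):

```python
# Toy check that iterated addition is bracketing-insensitive.
# The ant's steps from the text (2 ft N, 3 ft W, 6 ft S, 4 ft E),
# encoded as (east, north) displacement pairs.
from functools import reduce

steps = [(0, 2), (-3, 0), (0, -6), (4, 0)]

def add(p, q):
    """Componentwise vector addition."""
    return (p[0] + q[0], p[1] + q[1])

# Left-to-right grouping: (((a+b)+c)+d)
left = reduce(add, steps)

# Right-to-left grouping: (a+(b+(c+d)))
right = steps[-1]
for s in reversed(steps[:-1]):
    right = add(s, right)

# Both groupings land the ant in the same place.
```

Either way the ant ends up 1 ft E and 4 ft S of where it started; the bracketing drops out entirely.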

Here’s a little model. Assume that we treat the basic iterative operation as the set union operation, U. And assume that the capacity to iterate involves being able to map atoms (but only atoms) to their unit sets (e.g. a -> {a}). Call this Select. Select is an operation whose domain is the lexical atoms and whose range is the unit sets of those atoms. Then, given a lexicon, we can get arbitrarily big sets using U and Select.[4] For example: if ‘a’, ‘b’ and ‘c’ are atoms, then we can form {a} U {b} U {c} to give us {a,b,c}. And so forth. Arbitrarily big unstructured sets.

Clearly, what we have in FL cannot just be Iteration (i.e. U plus Select). After all, we get SLOs. Question: what, if added to Iteration, would yield SLOs? I suggest: the capacity to Select the outputs of Iteration. More particularly, take the little model above. How might we get structured sets? By allowing the output of Iteration to be the input to Select. So, if {a,b} has been formed (viz. {a} U {b} -> {a,b}) and Select applies to {a,b} (yielding {{a,b}}), then out comes the structured SLO {{a,b},c} (viz. {{a,b}} U {c} -> {{a,b},c}). One can also get an analogue of I-merge: Select {{a,b},c} (i.e. {{{a,b},c}}), Select c (i.e. {c}), Union the sets (i.e. {{{a,b},c}} U {c}), and out comes {c,{{a,b},c}}. So if we can extend the domain of Select to include outputs of the union operation, then we can use Iteration to deliver unboundedly many SLOs.
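The derivation just walked through can be mimicked with frozensets. This is only a sketch of the model under my own encoding (the function name select mirrors the post’s Select; Python’s | plays the role of U):

```python
# Toy model of Iteration = Select + Union, with Select's domain
# extended to the outputs of Union, as sketched in the text.
def select(x):
    """Map an object to its unit set: a -> {a}."""
    return frozenset([x])

# Flat iteration over atoms: {a} U {b} U {c} -> {a, b, c}
flat = select('a') | select('b') | select('c')

# Structured SLO: form {a, b}, Select it (-> {{a,b}}), Union with {c}
ab = select('a') | select('b')        # {a, b}
slo = select(ab) | select('c')        # {{a,b}, c}

# Analogue of I-merge: Select {{a,b},c}, Select c, Union the two
imerge = select(slo) | select('c')    # {c, {{a,b},c}}
```

The point of the sketch: as long as select only applies to atoms, U can only ever produce flat sets like flat; letting select apply to U’s outputs is exactly what introduces hierarchy.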

The important question, then, is what licenses extending the domain of Select to the outputs of Union? Labeling. Labeling is just the name we give for closing Iteration in the domain of the lexical atoms.[5] In effect, labels are how we create equivalence classes of expressions based on the basic atomic inventory. Another way of saying this is that Labeling maps a “complex” set {a,b} to either a or b, thereby putting it in the equivalence class of ‘a’ or ‘b’. If labels allow Select to apply to anything in the equivalence class of ‘a’ (and not just to ‘a’ alone), we can derive SLOs via Iteration.[6]

Ok, on this view, what’s the “miracle”? For Chomsky, the miracle is Merge. On the view above, the miracle is Label, the operation that closes Iteration in the domain of the lexical atoms. Label effectively maps any complex set into the equivalence class of one of its members (creating a modular structure) and then treats these as syntactically indistinguishable from the elements that head them (just as ‘1’ and ‘13’, or ‘12’ and ‘24’, are computationally identical in clock arithmetic). Effectively, the lexicon serves as the modulus, with labels mapping complexes of atoms to single atoms, bringing them within the purview of Select.[7]
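The clock analogy can be made concrete. In the hypothetical sketch below (the head-picking function is my own stipulation for illustration, not the post’s labeling algorithm), Label maps a complex set to one of its members, much as mod-12 arithmetic maps 13 into the class of 1:

```python
# Sketch of Label as a modulus-like map: a complex set comes to
# 'count as' one of its atomic members for further computation.
def clock(n):
    """Clock arithmetic: reduce an hour count modulo 12."""
    return n % 12

def label(complex_set, head):
    """Hypothetical Label: map a complex set into the equivalence
    class of its head, so Select can treat it as that atom."""
    assert head in complex_set
    return head

ab = frozenset(['a', 'b'])

# 1 and 13 fall into the same equivalence class mod 12; so do 12 and 24.
same_class = clock(13) == clock(1) and clock(24) == clock(12)

# {a, b} labeled 'a' behaves like the atom 'a' for further operations.
head = label(ab, 'a')
```

As with the clock, the modulus (here, the lexicon of atoms) is what keeps the system closed: however big the complex gets, labeling returns it to the atomic inventory.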

Note that this “story” presupposes that Iteration pre-exists the capacity to generate SLOs. The U(nion) operation is cognitively general, as is Select, which allows U to form arbitrarily large unstructured objects. Thus, Iteration is not species specific (which is why birds, ants and whales can do it). What is species specific is Label, the operation that closes U in the domain of the lexical atoms, and this is what leads to a modular combinatoric system (viz. it allows an operation defined over lexical atoms to also operate over non-atomic structures). Note that if this is right, then labels are intrinsic to CS; without them there are no SLOs, for without them U, the sole combination operation, cannot derive sets embedded within sets (i.e. hierarchy).

The toy account above has other pleasant features. For example, the operation that combines things is the very general U operation; there are few conceivably simpler operations. U produces objects that necessarily obey the NTC and Inclusiveness, and that produce copies under “I-merge.” Indeed, this proposal treats U as the main combinatoric operation (the operation that constructs sets containing more than one member). And if combination is effectively U, then phrases must be sets (i.e. U is a set-theoretic operation, so the objects it applies to must be sets). And that’s why the products of this combinatoric operation respect the NTC and Inclusiveness and produce “copies.”[8]

Let’s now get back to the main point: on this reconstruction, hierarchical recursion is the product of Iterate plus Label. To be “mergeable” you need a label for only then are you in the range of Select and U.  So, labels are a big deal and intrinsic to CS. Moreover, this makes labeling facts CS facts, not BOC facts.

This is not the place to argue that this conception is superior to Chomsky’s. My only point is that if my reservations above about treating labels as BOCs are correct, then we need to find a way of understanding labels as intrinsic to the syntax, which in turn requires reanalyzing the minimal basic operation (i.e. rethinking the “miracle”).

IMO, the situation regarding projection is not unlike what took place when minimalists rethought the Merge/Move distinction central to early Minimalism (see the “Black Book”). Movement was taken to be an “imperfection.” Rethinking the basic operation allowed for the unification of E- and I-merge (i.e. Gs with SLOs would also have displacement). I think we need to do the same thing for Labeling. We need to find a way to make labels intrinsic features of SLOs, labels being necessary both for building structures and for displacing them. Chomsky’s views on projection don’t do this. They start from the assumption that labels are BOCs. If this strikes you, as it does me, as unconvincing, then we need to rethink the basic minimal operation.

That’s it. These comments are way too long. But that’s what happens when you try and think about what Chomsky is up to. Agree or not, it’s endlessly fascinating.

[1] Edwin Williams once noted that syntactic categories cross-cut semantic ones. Predicative nominals have the same syntactic structure as argument nominals, though they differ a lot semantically. I think Edwin’s point is more generally correct. And if it is, then syntactic labels contribute very little (if anything) to CI interpretation.
[2] Though I won’t go into this here, there is plenty of apparent evidence that Gs care about labeled SLOs. So languages target different categories for movement and deletion. Moreover there are structure preservation principles that need explaining: XPs move to Max P positions, X’s don’t move and heads target heads.  In a non-labeling theory, it is still unclear why phrases move at all. And the Pied Piping mantra is getting a bit thin after 20 years.  So, not only is there little evidence that the interfaces care about labels, there is non-negligible evidence that CS does. If correct, this strengthens the argument against Chomsky’s approach to projection.
[3] One more aside: I am always wary of explanations that concentrate on interface requirements. We know next to nothing about the interfaces, especially CI, so stories that build on these requirements always seem to me to have a “just so” character. So, though the logic Chomsky deploys is fine, the premise he needs about BOCs will have little independent motivation. This does not make the claims wrong, but it does make the arguments weak.
[4] If we distinguish selections from the “lexicon” so that two selections of a are distinguished (a vs a’), we can get unboundedly big sets. Bags can be substituted for sets if you don’t like distinguishing different selections of atoms.
[5] Chomsky flirted with this idea in his earlier discussion of “edge features” (EF). Ask yourself where EFs came from. They were taken as endemic to lexical atoms. It is natural to assume that complexes of such atoms inherited EFs from their atomic parts. Sound familiar? EFs, Labels? Hmm. The cognoscenti know that Chomsky abandoned this way of looking at things. This is an attempt to revive the idea by putting it on what might be a more principled basis.
[6] For those who care, this is a grammatical analogue of “clock/modular arithmetic” (see here).
[7] Here’s how Wikipedia describes the process:
In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" upon reaching a certain value—the modulus. The modern approach to modular arithmetic was developed by Carl Friedrich Gauss in his book Disquisitiones Arithmeticae, published in 1801.
A familiar use of modular arithmetic is in the 12-hour clock, in which the day is divided into two 12-hour periods. If the time is 7:00 now, then 8 hours later it will be 3:00. Usual addition would suggest that the later time should be 7 + 8 = 15, but this is not the answer because clock time "wraps around" every 12 hours; in 12-hour time, there is no "15 o'clock". Likewise, if the clock starts at 12:00 (noon) and 21 hours elapse, then the time will be 9:00 the next day, rather than 33:00. Since the hour number starts over after it reaches 12, this is arithmetic modulo 12. 12 is congruent not only to 12 itself, but also to 0, so the time called "12:00" could also be called "0:00", since 12 is congruent to 0 modulo 12.
[8] Note that Labels allow one to dispense with Probe/Goal architectures, as heads are now visible in “Spec-head” configurations. Not that there is anything “special” about Specs (as opposed to complements or anything else). It’s just that, given labels, XPs can combine with YPs even after “first” merge and still allow their heads to “see” each other. This, in fact, is what endocentricity was made to do: put expressions that are not simple heads “next to” each other. And they will be adjacent whether the elements combined are complements or specifiers. Chomsky is right that there is nothing “special” about specifiers. But that’s just as true of complements.