Olaf K asks in the comments section of this
post why I am not impressed with ML accounts of Aux-to-C (AC) in English.
Here’s the short answer: proposed “solutions” have misconstrued the problem (both
the relevant data and its general shape) and so are largely irrelevant. As this
judgment will no doubt seem harsh and “unhelpful” (and probably offend the
sensibilities of many (I’m thinking of you GK and BB!!)) I would like to explain
why I think that the work as conducted heretofore is not worth the considerable
time and effort expended on it. IMO, there is nothing helpful to be said,
except maybe STOP!!! Here is the longer story. Readers be warned: this is a
long post. So if you want to read it, you might want to get comfortable first.[1]
It’s the best of tales and the worst of tales. What’s ‘it’?
The AC story that Chomsky told to explicate the logic of the Poverty of
Stimulus (POS) argument.[2]
What makes it a great example is its simplicity. Understanding it requires no
great technical knowledge, and so the AC version of the POS is accessible even
to those with the barest of abilities to diagram a sentence (a skill no longer
imparted in grade school with the demise of Latin).
BTW, I know this from personal experience for I have
effectively used AC to illustrate to many undergrads and high school students,
to family members and beer swilling companions how looking at the details of
English can lead to non-obvious insights into the structure of FL. Thus, AC is a
near perfect instrument for initiating curious tyros into the mysteries of
syntax.
Of course, the very simplicity of the argument has its down
sides. Jerry Fodor is reputed to have said that all the grief that Chomsky has gotten
from “empiricists” dedicated to overturning the POS argument has served him
right. That’s what you get (and deserve) for demonstrating the logic of the POS
with such a simple, straightforward, and easily comprehensible case. Of course,
what’s a good illustration of the logic of the POS is, at most, the first, not
last, word on the issue. And one might have expected professionals interested
in the problem to have worked on more than the simple toy presentation. But,
one would have been wrong. The toy case, perfectly suitable for illustration of
the logic, seems to have completely enchanted the professionals and this is
what critics have trained their powerful learning theories on. Moreover, treating
this simple example as constituting the “hard” case (rather than a simple
illustration), the professionals have repeatedly declared victory over the POS and
have confidently concluded that (at most) “simple” learning biases are all we
need to acquire Gs. In other words, the toy case that Chomsky used to illustrate
the logic of the POS to the uninitiated has become the hard case whose solution
would prove rationalist claims about the structure of FL intellectually
groundless (if not senseless and bankrupt).
That seems to be the state of play today (as, for example,
rehearsed in the comments section of
this). This despite the fact that there have been repeated attempts (see here)
to explicate the POS logic of the AC argument more fully. That said, let’s run
the course one more time. Why? Because, surprisingly, though the AC case is the
relatively simple tip of a really massive POS iceberg (cf. Colin Phillips’
comments here, March 19 at 3:47), even this toy case has NOT BEEN ADEQUATELY
ADDRESSED BY ITS CRITICS! (See, in particular, BPYC here for the inadequacies.)
Let me elaborate by considering what makes the simple story simple and how we
might want to round it out for professional consideration.
The AC story goes as follows. We note, first, that AC is a
rule of English G. It does not hold in all Gs. Thus we cannot assume that
AC is part of FL/UG, i.e. it must be learned. Ok, how would AC be learned, viz.
what is the relevant PLD? Here’s one obvious thing that comes to mind: kids learn
the rule by considering its sentential products.[3]
What are these? In the simplest case polar questions like those in (1) and
their relation to appropriate answers like (2):
(1) a. Can John run
    b. Will Mary sing
    c. Is Ruth going home

(2) a. John can run
    b. Mary will sing
    c. Ruth is going home
From these the following rule comes to mind:
(3) To form a polar question: Move the auxiliary to the front. The answer to a
polar question is the declarative sentence that results from undoing this
movement.[4]
The next step is to complicate matters a tad and ask how
well (3) generalizes to other cases, say like those in (4):
(4) John might say that Bill is leaving
The answer is “not that well.” Why? The pesky ‘the’ in (3).
In (4), there is a pair of potentially moveable Auxs and so (3) is inoperative
as written. The following fix is then considered:
(3’) Move the Aux closest to the front to the front.
This serves to disambiguate which Aux to target in (4) and
we can go on. As you all no doubt know, the next question is where the fun
begins: what does “closest” mean? How do we measure distance? It can have a
linear interpretation: the “leftmost” Aux and, with a little bit of grammatical
analysis, we see that it can have a hierarchical interpretation: the “highest”
Aux. And now the illustration of the POS logic begins: the data in (1), (2) and
(4) cannot choose between these options. If this is representative of what
there is in the PLD relevant to AC, then the data accessible to the child
cannot choose between (3’) where ‘closest’ means ‘leftmost’ and (3’) where
‘closest’ means ‘highest.’ And this, of course, raises the question of whether
there is any fact of the matter here.
There is, as the data in (5) shows:
(5) a. The man who is sleeping is happy
    b. Is the man who is sleeping happy
    c. *Is the man who sleeping is happy
The fact is that we cannot form a polar question like (5c)
to which (5a) is the answer and we can form one like (5b) to which (5a) is the
answer. This argues for ‘closest’ meaning ‘highest.’ And so, the rule of AC in
English is “structure” dependent (as opposed to “linear” dependent) in the
simple sense of ‘closest’ being stated in hierarchical, rather than linear,
terms.
Furthermore, choice of the hierarchical conception of (3’) is not and cannot be based on the
evidence if the examples above are characteristic of the PLD. More specifically,
unless examples like (5) are part of the PLD it is unclear how we might
distinguish the two options, and we have every reason to think (e.g. based on
CHILDES searches) that sentences like (5b,c) are not part of the PLD. And, if
this is all correct, then we have reason for thinking: (i) that a rule like AC
exists in English, whose properties are in part a product of the PLD we find in
English (as opposed to Brazilian Portuguese, say); (ii) that AC in English is
structure dependent; (iii) that English PLD includes examples like (1), (2) and
maybe (4) (though not if we are degree-0 learners) but not (5); and so we
conclude (iv) that if AC is structure dependent, then the fact that it is
structure dependent is not itself a fact derivable from inspecting the PLD.
That’s the simple POS argument.
Now some observations: First, the argument above supports the
claim that the right rule is structure dependent. It does not strongly support the conclusion that the right rule is (3’)
with ‘closest’ read as ‘highest.’ This is one structure dependent rule
among many possible alternatives. All we did above is compare one structure dependent rule and one non-structure dependent rule and argue
that the former is better than the latter given these PLD. However, to repeat, there are many structure dependent alternatives.[5]
For example, here’s another that bright undergrads often come up with:
(3’’) Move the Aux that is next to the matrix subject to the front

There are many others. Here’s the one that I suspect is closest to the truth:

(3’’’) Move Aux

(3’’’) moves the correct Aux to the right place using the very simple rule in
conjunction with general FL constraints. These constraints (e.g. minimality,
the Complex NP Constraint (viz. bounding/phase theory)) themselves exploit
hierarchical rather than linear structural relations, and so the broad
structure dependence conclusion of the simple argument follows as a very
special case.[6]
Note that if this is so, then AC effects are just a special case of Island and
Minimality effects. But, if this is correct, it completely changes what an
empiricist learning theory alternative to the standard rationalist story needs
to “learn.” Specifically, the problem is now one of getting the ML to derive
cyclicity and the minimality condition from the PLD, not just partition the
class of acceptable and unacceptable AC outputs (i.e. distinguish (5b) from
(5c)). I return to a little more discussion of this soon, but first one more
observation.
Second, the simple case above uses data like (5) to make the
case that the ‘leftmost’ aux cannot be the one that moves. Note that the application
of (3’)-‘leftmost’ here yields the unacceptable string (5c). This makes it easy to judge that (3’)-‘leftmost’
cannot be right, for the resulting string is clearly unacceptable regardless of what it is intended to mean.
However, using this sort of data is just a convenience for we could have
reached the exact same conclusion by considering sentences like (6):
(6) a. Eagles that can fly swim
    b. Eagles that fly can swim
    c. Can eagles that fly swim
(6c) can be answered using (6b), not (6a). The relevant
judgment here is not a simple one concerning a string property (i.e. it sounds
funny), as it is with (5c). It is rather unacceptability
under an interpretation (i.e. this can’t mean that, or, it sounds funny
with this meaning). This does not
change the logic of the example in any important way; it just uses different
data (viz. the kind of judgment relevant to reaching the conclusions is different).
Berwick, Pietroski, Yankama and Chomsky (BPYC) emphasize
that data like (6), what they dub constrained
homophony, best describes the kind of data linguists typically use and have
exploited since, as Chomsky likes to say, “the earliest days of generative
grammar.” Think: flying planes can be
dangerous, or I saw the woman with
the binoculars, and their disambiguating flying planes is/are dangerous and which binoculars did you see the woman with. At any rate, this implies that the more
general version of the AC phenomenon is really independent of string
acceptability, and so any derivation of the phenomenon in learning terms should
not obsess over cases like (5c). They are just not that interesting, for the POS
problem arises in exactly the same form
even in cases where string acceptability is not a factor.
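The point that the linear rule fails even where string acceptability is not at issue can also be made concrete with a small sketch (again, the encoding and names are my own toy assumptions, purely illustrative). Under the leftmost rule, the distinct declaratives (6a) and (6b) collapse into the very same question string, so the string itself carries no signal of the error; only the question-answer pairing does:

```python
# Leftmost-Aux fronting over bare word strings; names are illustrative.
AUX = {"can"}

def front_leftmost(words):
    """Front the first Aux in the word string (the linear rule)."""
    i = next(j for j, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

a = "eagles that can fly swim".split()  # (6a)
b = "eagles that fly can swim".split()  # (6b)

# Both declaratives yield the same, perfectly acceptable question string:
print(" ".join(front_leftmost(a)))  # can eagles that fly swim
print(" ".join(front_leftmost(b)))  # can eagles that fly swim
# Yet (6c) is a question only about (6b): the linear rule's failure lies in
# the pairing of question and answer, not in the string it produces.
```

A learner judged only on string outputs would score perfectly here, which is why constrained homophony, not string acceptability, is the right metric.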
Let’s return briefly to the first point and then wrap up.
The simple discussion concerning how to interpret (3’) is good for illustrating
the logic of POS. However, we know that there is something misleading about
this way of framing the question. How do we know this? Well, because the
pattern of the data in (5) and (6) is not unique to AC movement. Analogous
dependencies (i.e. where some X outside of the relative clause subject relates
to some Y inside it) are banned quite generally. Indeed, the basic fact, one,
moreover, that we have all known about for a very long time, is that nothing can
move out of a relative clause subject. For example, BPYC discuss sentences like (7):
(7) Instinctively, eagles that fly swim
(7) is unambiguous, with instinctively
necessarily modifying fly rather than
swim. This is the same restriction illustrated
in (6) with fronted can restricted in
its interpretation to the matrix clause. The same facts carry over to examples
like (8) and (9) involving Wh questions:
(8) a. Eagles that like to eat like to eat fish
    b. Eagles that like to eat fish like to eat
    c. What do eagles that like to eat like to eat

(9) a. Eagles that like to eat when they are hungry like to eat
    b. Eagles that like to eat like to eat when they are hungry
    c. When do eagles that like to eat like to eat
(8a) and (9a) are
appropriate answers to (8c) and (9c) but (8b) and (9b) are not. Once again this
is the same restriction as in (7) and (6) and (5), though in a slightly
different guise. If this is so, then the right answer as to why AC is structure
dependent has nothing to do with the rule
of AC per se (and so, plausibly,
nothing to do with the pattern of AC data). It is part of a far more general
motif, the AC data exemplifying a small sliver of a larger generalization.
Thus, any account that narrowly concentrates on AC phenomena is simply looking
at the wrong thing! To be within the ballpark of the plausible (more pointedly,
to be worthy of serious consideration at all), a proffered account must extend
to these other cases as well. That’s the problem in a nutshell.[7]
Why is this important? Because criticisms of the POS have
exclusively focused on the toy example that Chomsky originally put forward to
illustrate the logic of POS. As noted, Chomsky’s
original simple discussion more than suffices to motivate the conclusion that G
rules are structure dependent and that this structure dependence is very unlikely
to be a fact traceable to patterns in the PLD. But the proposal put forward was
not intended to be an analysis of ACs, but a demonstration of the logic of the
POS using ACs as an accessible database. It’s very clear that the pattern attested
in polar questions extends to many other constructions and a real account of
what is going on in ACs needs to explain these other data as well. Suffice it
to say, most critiques of the original Chomsky discussion completely miss this.
Consequently, they are of almost no interest.
Let me state this more baldly: even were some proposed ML
able to learn to distinguish (5c) from other sentences like it (which, btw,
seems currently not to be the case),
the problem is not just with (5c) but with sentences very much like it that are
string kosher (like (6)). And even were they able to accommodate (6) (which, so
far as I know, they currently cannot) there is still the far larger problem of
generalizing to cases like (7)-(9). Structure
dependence is pervasive, AC being just one illustration. What we want is
clearly an account where these phenomena swing together: AC, adjunct WH
movement, argument WH movement, adverb fronting, and much, much more.[8]
Given this, the standard empiricist learning proposals for AC are trying (and
failing) to solve the wrong problem, and this is why they are a waste of time.
What’s the right problem? Here’s one: show how to “learn” the minimality
principle or Subjacency/Barriers/Phase theory from PLD alone. Now, were that possible, that would be
interesting. Good luck.
Many will find my conclusion (and tone) harsh and overheated.
After all, isn’t it worth trying to see if some ML account can learn to
distinguish good from bad polar questions using string input? IMO, no. Or more
precisely, even were this done, it would not shed any light on how humans
acquire AC. The critics have simply misunderstood the problem: the relevant
data, the general structure of the phenomenon, and the kind of learning account
that is required. If I were in a charitable mood, I might blame this on Chomsky.
But really, it’s not his fault. Who would have thought that a simple
illustrative example aimed at a general audience should have so captured the
imagination of his professional critics! The most I am willing to say is that
maybe Fodor is right and that Chomsky should never have given a simple
illustration of the POS at all. Maybe he should in fact be banned from
addressing the uninitiated altogether, or allowed to do so only if proper
warning labels are placed on his popular works.
So, to end: why am I not impressed by empiricist discussions
of AC? Because I see no reason to think that this work has yielded or ever will
yield any interesting insights into the problems that Chomsky’s original informal
POS discussion was intended to highlight.[9]
The empiricist efforts have focused on the wrong data to solve the wrong
problem. I have a general methodological
principle, which I believe I have mentioned before: those things not worth
doing are not worth doing well. What POS’s empiricist critics have done up to
this point is not worth doing. Hence, I am, when in a good mood, not impressed.
You shouldn’t be either.
[1]
One point before getting down and dirty: what follows is not at all original
with me (though feel free to credit me exclusively). I am repeating in a less
polite way many of the things that have been said before. For my money, the
best current careful discussion of these issues is in Berwick, Pietroski,
Yankama and Chomsky (see link to this below). For an excellent sketch on the
history of the debate with some discussion of some recent purported problems
with the POS arguments, see this
handout by Howard Lasnik and Juan Uriagereka.
[2]
I believe (actually I know, thx Howard) that the case is first discussed in detail in Language and Mind (L&M) (1968:61-63). The argument form is briefly discussed in Aspects (55-56), but without attendant
examples. The first discussion with some relevant examples is L&M. The
argument gets further elaborated in Reflections
on Language (RL) and Rules and
Representations (RR) with the good and bad examples standardly discussed
making their way prominently into view. I think that it is fair to say that the
Chomsky “analysis” (btw, these are scare quotes) that has formed the basis of
all of the subsequent technical discussion and criticism is first mooted in L&M and then elaborated in his
other books aimed at popular audiences. Though the stuff in these popular books
is wonderful, it is not LGB, Aspects,
the Black Book, On Wh movement, or Conditions on transformations. The
arguments presented in L&M, RL and RR are intended as sketches to elucidate
central ideas. They are not fully developed analyses, nor, I believe, were they
intended to be. Keep this in mind as we proceed.
[3]
Of course, not sentences, but utterances thereof, but I abstract from this
nicety here.
[4]
Those who have gone through this know that the notion ‘Aux’ does not come
tripping off the tongue of the uninitiated. Maybe ‘helping verb,’ but often not
even this. Also, ‘move’ can be replaced
with ‘put,’ ‘reorder,’ etc. If one has an
inquisitive group, some smart ass will ask about sentences like ‘Did Bill eat
lunch’ and ask questions about where the ‘did’ came from. At this point, you
usually say (with an interior smile) to be patient and that all will be
revealed anon.
[5]
And many non-structure dependent alternatives, though I leave these aside here.
[6]
Minimality suffices to block (4) where the embedded Aux moves to the matrix C.
The CNPC suffices to block (5c). See below for much more discussion.
[7]
BTW, none of this is original with me here. This is part of BPYC’s general
critique.
[8]
Indeed, every case of A’-movement will swing the same way. For example, in It’s
fresh fish that eagles that like to eat like to eat, the focused fresh fish
is the complement of the matrix eat, not the one inside the RC.
[9]
Let me add one caveat: I am inclined to think that ML might be useful in
studying language acquisition combined with a theory of FL/UG. Chomsky’s
discussion in Chapter 1 of Aspects still looks to me very much like what a
modern Bayesian theory with rich priors and a delimited hypothesis space might
look like. Matching Gs to PLD, even given this, does not look to me like a
trivial task, and work by those like Yang, Fodor, and Berwick strikes me as
trying to address this problem. This, however, is very different from the kind
of work criticized here, where the aim has been to bury UG, not to use it. This
has been both a failure and, IMO, a waste of time.