Sunday, May 31, 2015

The road ahead from Athens; some scattered initial remarks

I again mis-spelled someone's name. I "called" Winnie, "Wini." It is now corrected and I am sorry. However, the misspelling allows me to once again thank Winnie for all his great work in getting the Athens gig going. Thx and sorry.


I am currently sitting on the 6th floor of the Athens Way Hotel, eating some fruit and yogurt and sipping a cup of tea. It’s a beautiful day. The Road Ahead Conference (see here) ended yesterday and I thought that I would jot down some quick comments. Before doing so, let me take this opportunity to thank the organizers (Winnie Lechner, Marcel den Dikken, Terje Lohndal, Artemis Alexiadou and Peter Svenonius) for a wonderful event. It’s been a blast and my most secure conclusion from the past three days is that if you ever get invited anywhere to do anything by any of these people GO! Thx, and I am sure that here I am not speaking just for myself but for all participants. That pleasant task out of the way, here are a few impressions. In this post I’ll talk about very general matters. In some follow-up posts I’ll remark on some of the socio-political issues raised and what we might do to address them. And in some yet later posts I’ll discuss some of the stimulating questions raised and ideas mooted. So let’s start.

My overall impression, one that runs quite counter to some of the pessimism I often hear from colleagues about the state of contemporary syntax, is that intellectually speaking, we are in a golden age of syntax (though politically and sociologically, not so much (I return to this in later posts)).  What do I mean? Well, the organizers invited a very talented group of people (present company excluded, of course) doing a pretty wide cross section of syntactic work. What is very clear is that there is a huge amount of excellent work being done on a wide variety of languages on a wide variety of topics. Typology and morpho-syntax are particularly hot areas of current interest, but syntacticians (in the wide sense meaning those with a good knowledge of syntactic theory and methods) are also heavily involved in “conjunctive” areas such as syntax + language acquisition, syntax + language impairment/disorders, syntax + processing, a.o. As a result, we now know more about the details and overall architectures of more grammars and have better models of how this grammatical knowledge arises and is put to use in more areas than ever before.  Don’t get me wrong: it is clear to all that there is a tremendous amount that we still do not know, even about fundamental issues, but to someone like me, who has been doing this stuff (or at least listening to people who do this stuff) for the last 40+ years, it is clear how much we have learned, and it is very very impressive.

Furthermore, it is also clear that we live in a time where all kinds of syntactic work can be fruitfully pursued. What do I mean by “all”? Well, there are roughly three kinds of syntactic investigations: (i) there is work on the structure of particular Gs, (ii) there is work on the structure of UGs based on work on particular Gs and (iii) there is work on the structure of FL/UG based on the particulars of UG. (i) aims at careful description of the Gs of a given language (e.g. how does agreement work in L, how are RCs formed? How does binding work?). (ii) aims to find the features that delimit the range of possible Gs in part by distilling out the common features of particular Gs. (iii) aims to simplify UG in part by unifying the principles discovered by (ii) and in part by relating/reducing/unifying these features with more general features of cognition and computation. All three kinds of work are important and valuable. And though I want to plead specially for (iii) towards the end, I want to be very very very clear that this is NOT because I disvalue (i) or (ii) or that I think that (iii)-like work is inherently better than the others. I don’t and never did. My main take-away from Athens is that there has never been a better time to do all three kinds of work. I will return to a discussion of (iii) below, for I believe the possibility of doing it fruitfully is somewhat of a novelty and the field has not entirely understood how to accommodate it. First the other two.

Let’s start with (i). If your heart gravitates towards descriptive endeavors, there are many many entirely unexplored languages out there waiting for your skills, and there are now well developed methods and paradigms ready for you to apply (and modify). Indeed, one of the obvious advances GG has made has been to provide a remarkably subtle and powerful set of descriptive techniques (based of course on plenty of earlier theory) for uncovering linguistically (i.e. grammatically) significant phenomena. Typologists who fail to avail themselves of this technology are simply going to mis-describe the individual languages they investigate,[1] let alone have nothing to say about the more general issues relating to the variety of structures that natural languages display.[2]

Similarly if your interests are typological you are also in luck. Though there are many more language families to investigate, many have been looked at in great detail and Generative linguists have made a very good start at limning the structural generalizations that cut across them. We now have mapped out more properties of the grammar of case and agreement (at both the specific and general levels) of more languages than ever before. We have even begun to articulate solid mid level generalizations that cut across wide swaths of these languages (and even some language families) enabling a more sophisticated exploration of the parametric options that UG makes available. To me, someone interested in these concerns but not an active participant in the process, the theoretical speculations seemed extremely exciting (especially those linking parameter theory to language change and language acquisition). And though the trading relation between micro vs macro variation has not yet been entirely sorted out (and this impression may be an optimistic appraisal of the discussions I (over)heard) it is pretty clear that there is a thoughtful research agenda about how to proceed to attack these big theoretical issues. So, Generative Syntax (including here morpho-syntax) in this domain is doing very very well. 

Concerning (ii): A very nice outcome of the Athens get-together was the wide consensus regarding what Amy Rose Deal dubbed “Mid-Level Generalizations” (MLG). I have frequently referred to these as the findings of GB, but I think that MLGs is a better moniker for it recognizes that these generalizations are not the exclusive property of any specific research tradition. So though I have tried to indicate that I consider the differences between GB and LFG and HPSG and TAG and… to have been more notational than notional, I adopt ARD’s suggestion that it is better to adopt a more neutral naming convention so as to be more inclusive (does this count as PC?). So from now on, MLGs it is!  At any rate, there is a pretty broad consensus among syntacticians about what these MLGs are (see here for a partial enumeration, in, ahem GB terms). And this consensus (based on the discovery of these MLGs) has made possible a third kind of syntactic investigation, what I would dub pure theoretical syntax (PTS).[3] Now, I don’t want to raise hackles with this term, I just need a way of distinguishing this kind of work from other things we call “theory.” I do not mean to praise this kind of “theory” and demean the other. I really want to just make room for the enterprise mentioned in (iii) by identifying it.

So what is PTS? It is directed at unifying the MLGs. The great example of this in the GB tradition is Chomsky’s unification of Ross’s island effects in “On Wh Movement” (OWM).[4] Chomsky’s project was made feasible by Ross’s discoveries of the MLGs we (after him) call “islands.” Chomsky showed how to treat these apparently disparate configurations as instances of the same underlying system and in the process removed the notion of construction from the fundamental inventory of UG. This theoretical achievement in unification led to the discovery of successive cyclicity effects, to the discovery that adjuncts were different from arguments (both in how they move and how porous they are) and to the discovery of novel locality effects (the ECP and CED in particular).

Someone at the conference (I cannot recall who) mentioned that Chomsky’s work here was apparently un-motivated by empirical concerns. There is a sense in which I believe this to be correct, and one in which it is not. It is incorrect in that Ross’s islands, which were the target of Chomsky’s efforts at unification, are MLGs and so based on a rich set of language specific data (e.g. *Who did you meet someone who likes) which hold in a variety of languages. However, in another sense it is correct. In particular, Chomsky did not aim to address novel particular data beyond Ross’s islands. In other words, the achievement in OWM was the unification itself. Chomsky did not further argue that this unification got us novel data in, say, Hungarian. Of course others went on to show that, whatever Chomsky wanted to do, the unification had empirical legs. Indeed, the whole successive cyclicity industry starting with Kayne and Pollock on stylistic inversion and proceeding through McCloskey’s work on Irish, Chung’s on Chamorro, Torrego’s on Spanish and many many others was based on this unification. However, Chomsky’s work was theoretical in that its main concern was to provide a theory of Ross’s MLGs and little (sic!) more.

Indeed, one can go a little further here. Chomsky’s unification had an odd consequence in the context of Ross’s work. It proposed a “new” island that Ross himself extensively argued against the existence of. I am talking, of course, about Wh-islands, which Ross found to be highly acceptable, especially if the clausal complement was non-finite. Chomsky’s theory had to include these in his inventory despite Ross’s quite reasonable empirical demurrals (voiced regularly ever since) because they followed from the unification.[5] So, it is arguable (or was at the time), that Chomsky’s unification was less empirically adequate than Ross’s description of islands. Nor was this the only empirical stumble that the unification arguably suffered. It also predicted (at least in its basic form) that who did you read a book about is ungrammatical, which flies in the face of its evident acceptability.  We now know that many of these stumbles led to interesting further research (Rizzi on parameters for example), but it is worth noting that the paper, now a deserved classic, did not emerge without evident empirical problems.

Why is it worth noting this? Because it demonstrates a typical feature of theoretical work; it makes novel connections, leading to new kinds of research/data but often fails to initially (or even ever) cover all of the previously “relevant” data. In other words, unification can incur an apparent empirical cost. Its virtue lies in the conceptual ties it makes, putting things together that look very different and this is a virtue even if some data might be initially (or even permanently) lost. And this historical lesson has a moral: theoretical work, work that in hindsight we consider invaluable, can start out its life empirically hobbled. And we need to understand this if we are to allow it to thrive. We need slightly different measures for evaluating such work. In particular, I suggest that we look to this work more for its ‘ahaa’ effects than for whether it covers the data points that we had presupposed, until its emergence, to be relevant.

Let me make this point a slightly different way. Work like (iii) aims to understand how something is possible, not how it is actually. Now, of course to be actual it is useful to be possible. However, many possible things are not actual. That said, it is often the case that we don’t really see how to unify things that look very different, and this is especially true when MLGs are being considered (e.g. case theory really doesn’t look much like binding theory and control has very different properties from raising). And here is where theory comes in: it aims to develop ways of seeing how two things that look very different might be the same; how might they possibly be connected. I want to further observe that this is not always easy to do. However, by the nature of the enterprise, the possible often only coarsely fits the actual, at least until empirical tailoring has had a chance to apply. This is why some empirical indulgence is condign.

So you may be wondering why I got off on this jag in a post about the wonders of the Athens conference. The reason is that one of the things that make this period of linguistics such a golden age is that the large budget of MLGs the field seems to recognize makes it ripe for the theoretically ambitious to play their unificational games. And for this work to survive (or even see the light of day) will require some indulgence on the part of my more empirically conscious colleagues. In particular, I believe that theoretical work will need to be evaluated differently (at least in the short and medium run) from the two other kinds of work that I alluded to above, where empirical coverage is reasonably seen as the primary evaluative hurdle.

More specifically, we all want our language particular descriptions and MLGs to be empirically tight (though even in fashioning MLGs some indulgence (i.e. tolerance for “exceptions”) is often advisable), elegance be damned. But we want our theories to be simple (i.e. elegant and conceptually integrated and natural), and it is important to recognize that this is a virtue even in the face of empirical leakage. Given that we have entered a period where good theory is possible and desirable, we need to be mindful of this or risk crushing theoretical initiative altogether.[6]

As you may have guessed, part of why I wrote the last section is that the one thing that I felt was missing at Athens was the realization that this kind of indulgence is now more urgent than ever. Yours truly tried to argue the virtues of making Plato’s Problem (PP) and Darwin’s Problem (DP) central to contemporary research. The reaction, as I saw it (and I might be wrong here), was that such thinking did not really get one very far, that it is possible to do all theoretical work without worrying about these inconclusive problems. To the degree that PP was acknowledged to be important, it struck me that the consensus was that we should offload acquisition concerns to the professionals in psych (though, of course, they should consult us closely in the process). The general tone seemed to be that eyeballing a proposal for PP compatibility was just self-indulgence, if not worse. The case of DP was even worse. It was taken as too unspecified to even worry about and anyhow general methodological concerns for simplicity and explanation should prove robust enough without having to indulge in pointless evolutionary speculation of the cursory variety available to us.

I actually agree with some version of each of these points. Linguists cannot explore the fine details of PP without indulging in some psychology requiring methods not typically part of the syntactician’s technical armamentarium.[7] And, I agree that right now what we know about evolution is very unlikely to play a substantive role in our theorizing. However, PP and DP serve to vividly bring before the mind two important projects: (i) that the object of study in GG is FL and its fine structure and (ii) that theoretical unification is a virtue in pursuing (i). PP and DP serve to highlight these two features, and as these are often, IMO, lost sight of, this is an excellent thing.

Moreover, at least with PP, it is not correct that we cannot eyeball a proposal to get a sense of whether it will pass platonic muster. In fact, in my experience many of the thoughtful professionals often get their cues regarding hard/interesting problems by first going through the simple PoS logic implicit in a standard syntactic analysis.[8] This simple PP analysis lays the groundwork for more refined considerations. IMO, one of the problems with current syntactic pedagogy is that we don’t teach our students how to deploy simple PoS reasoning. Why? Well, I think it’s because we don’t actually consider PP that important. Why? Well the most generous answer is that it is assumed that all the work we do already tacitly endorses PP’s basic ideals and so worrying PP to death will not get us very much bang for the buck. Maybe, but being explicit really does make a difference. Explicitly knowing what the big issues are really is useful. It helps you to think of your work at one remove from the all important details that consume you. And it can even serve, at times, to spur interesting kinds of more specific syntactic speculation. Lastly, it’s the route towards engaging with the larger intellectual community that the Athens conference indicated so many feel detached from.

Personally, I think that the same holds true with DP. I agree that we are not about to use the existing (non-existing) insights from the evolution of cognition and language to ground type (iii) thinking. But, I think that considering the Generative project in DP terms enlarges how we think about the problem. In fact, much more specifically, it was only with the rise of the Minimalist Program (MP) in which DP was highlighted and stressed as PP had been before that the virtues of the unification of the MLGs rose to become a central kind of research project. If you do not go “beyond” explanatory adequacy there is no pressing reason for worrying about how our UGs fit in with other domains of cognition or general features of computation (if indeed they do). PP shoves the learnability problem in our faces and doing so has led us to think constructively about Chomsky Universals and how they might be teased out of our study of Gs. Hopefully, DP will do the same for cleaning up FL/UG: it will, if thought about regularly, make it vivid to us that unifying the modules and seeing how these unified systems might relate to other domains of cognition and computation is something that we should try to tease out of our understanding of our versions of UG. 

I should add that doing this should encourage us to start forcefully re-engaging with the neuro-cognitive-computational sciences (something that syntacticians used to do regularly but not so much anymore). And if we do not do this, IMO, linguistics in general and syntax in particular will not have a bright future. As I said in Athens, if you want to know about the half life of a philological style of linguistics, just consider that the phrase “prospering classics department” is close to being an oxymoron. That way lies extinction. So we need to reengage with these “folks” (ah my first Obamaism) and both PP and DP can act as constant reminders of the links that syntax ought to have with these efforts.

Ok, enough. To conclude: Intellectually, we are in a golden age of linguistics (though we may need to manage this a bit so as to not discourage PTS). However, it also appears that politically things are not so hot. Many of the attendees felt that GG work is under threat of extinction in many places. There was animated discussion about what could be done about this and how to better advertise our accomplishments to both the lay and scientific public. We discussed some of this here, and similar concerns were raised in Athens. However, it is clear that in some places matters are really pretty horrid. This is particularly unfortunate given the intellectual vigor of generative syntax. I will try and say something about this feature in a following post.

[1] Jason Merchant made this point forcefully and amusingly in Athens.
[2] I suspect that some of the hostility that traditional typology shows towards GG lies in the inchoate appreciation that GG has significantly raised the empirical standards for typological research and that those not conversant with these methods are failing empirically. In other words, traditional typologists rightly fear that they, not their subject matter, are threatened with obsolescence. To my mind, this is all very positive, but, for obvious reasons, it is politically terrible. Typologists of a certain stripe might be literally fighting for their lives, and this naturally leads to a very hostile attitude towards GG based work.
[3] Yes, I also see the possibility of a PTSD syndrome (pure theoretical syntax disorder).
[4] Continuing a project begun in “Conditions on Transformations” and ending with Barriers.
[5] Though see Sprouse’s work which provides evidence that wh-islands show the same super additivity profile as other islands.
[6] I will likely blog on this again in the near future if all of this sounds kind of cryptic.
[7] Though this is changing rapidly, at least at places like the University of Maryland.
[8] I’d like to thank Jeff Lidz for showing me this in spades. I sat in on his terrific class, the kind of class that every syntactician (especially newly minted syntacticians) should take.

Thursday, May 21, 2015

Manufacturing facts; the case of Subject Advantage Effects

Real science data is not natural. It is artificial. It is rarely encountered in the wild and (as Nancy Cartwright has emphasized (see here for discussion)) it standardly takes a lot of careful work to create the conditions in which the facts are observable. The idea that science proceeds by looking carefully at the natural world is deeply misleading, unless, of course, the world you inhabit happens to be CERN. I mention this because one of the hallmarks of a progressive research program is that it supports the manufacture of such novel artificial data and their bundling into large scale “effects,” artifacts which then become the targets of theoretical speculation.[1] Indeed, one measure of how far a science has gotten is the degree to which the data it concerns itself with is factitious and the number of well-established effects it has managed to manufacture. Actually, I am tempted to go further: as a general rule only very immature scientific endeavors are based on naturally available/occurring facts.[2]

Why do I mention this? Well, first, by this measure, Generative Grammar (GG) has been a raging success. I have repeatedly pointed to the large number of impressive effects that GG has collected over the last 60 years and the interesting theories that GGers have developed trying to explain them (e.g. here). Island and ECP effects, binding effects and WCO effects do not arise naturally in language use. They need to be constructed, and in this they are like most facts of scientific interest.

Second, one nice way to get a sense of what is happening in a nearby domain is to zero in on the effects its practitioners are addressing. Actually, more pointedly, one quick and dirty way of seeing whether some area is worth spending time on is to canvass the variety and number of different effects it has manufactured. In what follows I would like to discuss one of these that has recently come to my attention and that has some interest for a GGer like me.

A recent paper (here) by Jiwon Yun, Zhong Chen, Tim Hunter, John Whitman and John Hale (YCHWH) discusses an interesting processing fact concerning relative clauses (RC) that seems to hold robustly cross linguistically. The effect is called the “Subject Advantage” (SA). What’s interesting about this effect is that it holds in languages where the head both precedes and follows the relative clause (i.e. for languages like English and those like Japanese). Why is this interesting? 

Well, first, this argues against the idea that the SA simply reflects increasing memory load as a function of linear distance between gap and filler (i.e. head). Linear distance cannot be the relevant variable. It could account for SA effects in languages like English, where the head precedes the RC (thus making the subject gap closer to the head than the object gap is). But in Japanese style RCs, where heads follow the clause, the object gap is linearly closer to the head than the subject gap is. A linear distance account hence predicts an object advantage there, contrary to experimental fact.

Second, and here let me quote John Hale (p.c.):

SA effects defy explanation in terms of "surprisal". The surprisal idea is that low probability words are harder, in context. But in relative clauses surprisal values from simple phrase structure grammars either predict effort on the wrong word (Hale 2001) or get it completely backwards --- an object advantage, rather than a subject advantage (Levy 2008, page 1164).

Thus, SA effects are interesting in that they appear to be stable over languages as diverse as English on the one hand and Japanese on the other and seem to be refractory to many of the usual processing explanations.
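For readers who have not met the surprisal measure mentioned in the quotation, here is a minimal Python sketch. It assumes the standard definition, namely the negative log probability of a word given its context. The function name and the probabilities are purely illustrative and are not taken from Hale (2001) or Levy (2008).

```python
import math


def surprisal(prob_in_context):
    """Surprisal, in bits, of a word whose conditional probability
    in its context is prob_in_context (illustrative helper)."""
    if not 0.0 < prob_in_context <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(prob_in_context)


# Hypothetical conditional probabilities, chosen only for illustration.
likely_word = surprisal(0.5)
unlikely_word = surprisal(0.125)

# The less probable word carries more surprisal, i.e. more predicted effort.
print(likely_word, unlikely_word)
```

The point of the sketch is only that surprisal is computed word by word from probabilities in context, which is why it can locate effort on the wrong word, or on the wrong kind of relative clause, if those probabilities pattern the wrong way.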

Furthermore, SA effects suggest that grammatical structure is important, or to put this in more provocative terms, that SA effects are structure dependent in some way. Note that this does not imply that SA effects are grammatical effects, only that G structure is implicated in their explanation. In this, SA effects are a little like Island Effects as understood (here).[3] Purely functional stories that ignore G structure (e.g. linearly dependent memory load or surprisal based on word-by-word processing difficulty) seem to be insufficient to explain these effects (see YCHWH 117-118).[4]

So how to explain the SA? YCHWH proposes an interesting idea: that what makes object relatives harder than subject relatives is that the two have different amounts of “sentence medial ambiguity” (the former more than the latter) and that resolving this ambiguity takes work that is reflected in processing difficulty. Or put more flatfootedly, finding an object gap requires getting rid of more grammatical ambiguity than finding a subject gap and getting rid of this ambiguity requires work, which is reflected in processing difficulty. That’s the basic idea. The work is in the details that YCHWH provides. And there are a lot of them. Here are some.

YCHWH defines a notion of “Entropy Reduction” based on the weighted possible continuations available at a given point in a parse. One feature of this is that the model provides a way of specifying how much work parsing is engaged in at a particular point. This contrasts with, for example, a structural measure of memory load. As note 4 observes, such a measure could explain a subject advantage but as John Hale (p.c.) has pointed out to me concerning this kind of story:

This general account is thus adequate but not very precise. It leaves open, for instance, the question of where exactly greater difficulty should start to accrue during incremental processing.

That said, whether to go for the YCHWH account or the less precise structural memory load account is ultimately an empirical matter.[5] One thing that YCHWH suggests is that it should be possible to obviate the SA effect given the right kind of corpus data. Here’s what I mean.

YCHWH defines entropy reduction by (i) specifying a G for a language that defines the possible G continuations in that language and (ii) assigning probabilistic weights to these continuations. Thus, YCHWH shows how to combine Gs and probabilities of use of these. Parsing, not surprisingly, relies on the details of a particular G and the details of the corpus of usages of those G possibilities. Thus, what options a particular G allows affects how much entropy reduction a given word licenses, as do the details of the corpus that is used to probabilize the G. This means that it is possible that SA might disappear given the right corpus details. Put differently, it allows us to ask what, if any, corpus details could wipe out SA effects. This, as Tim Hunter noted (p.c.), raises two possibilities. In his words:

An interesting (I think) question that arises is: what, if any, different patterns of corpus data would wipe out the subject advantage? If the answer were 'none', then that would mean that the grammar itself (i.e. the choice of rules) was the driving force. This is almost certainly not the case. But, at the other extreme, if the answer were 'any corpus data where SRCs are less frequent than ORCs', then one would be forgiven for wondering whether the grammar was doing anything at all, i.e. wondering whether this whole grammar-plus-entropy-reduction song and dance were just a very roundabout way of saying "SRCs are easier because you hear them more often".

One of the nice features of the YCHWH discussion is that it makes it possible to analytically approach this problem. It would be nice to know what the answer is both analytically as well as empirically.
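To make the entropy reduction idea concrete, here is a minimal Python sketch. It is not YCHWH's implementation. Instead of a probabilized Minimalist Grammar it assumes a hypothetical finite list of weighted analyses (TOY_ANALYSES, with made-up symbols and weights), and the function names are invented for illustration. It also assumes that the entropy reduction at a word is the drop in uncertainty about the complete analysis once that word is seen, floored at zero. Reweighting the same analyses, as a different corpus would, changes the word-by-word profile even though the set of analyses stays fixed.

```python
import math


def entropy(weights):
    """Shannon entropy, in bits, of the distribution obtained by
    normalizing a list of non-negative weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no analysis is compatible with this prefix")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def prefix_entropy(analyses, prefix):
    """Entropy over the complete analyses that begin with prefix."""
    compatible = [w for words, w in analyses if words[:len(prefix)] == prefix]
    return entropy(compatible)


def entropy_reductions(analyses, sentence):
    """Entropy reduction at each word of sentence (assumed definition:
    previous entropy minus current entropy, floored at zero)."""
    reductions = []
    previous = prefix_entropy(analyses, ())
    for i in range(1, len(sentence) + 1):
        current = prefix_entropy(analyses, sentence[:i])
        reductions.append(max(0.0, previous - current))
        previous = current
    return reductions


# Hypothetical toy "grammar": four two-word analyses with equal weights.
TOY_ANALYSES = [
    (("x", "a"), 1.0),
    (("x", "b"), 1.0),
    (("y", "a"), 1.0),
    (("y", "b"), 1.0),
]

# Same analyses, different (equally hypothetical) corpus weights.
RESKEWED_ANALYSES = [
    (("x", "a"), 7.0),
    (("x", "b"), 1.0),
    (("y", "a"), 4.0),
    (("y", "b"), 4.0),
]

print(entropy_reductions(TOY_ANALYSES, ("x", "a")))
print(entropy_reductions(RESKEWED_ANALYSES, ("x", "a")))
```

In this toy setting the analytic question becomes: holding the list of analyses constant, which reweightings flip the relative difficulty of two sentences? A real answer would of course require a real G and real corpus counts.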

Another one of the nice features of YCHWH is that it demonstrates how to probabilize MGs of the Stabler variety so that one can view parsing as a general kind of information processing problem. In such a context difficulties in language parsing are the natural result of general information processing demands. Thus, this conception of parsing locates it in a more general framework of information processing, parsing being one specific application where the problem is to determine the possible G compatible continuations of a sentence. Note that this provides a general model of how G knowledge can get used to perform some task.

Interestingly, on this view, parsing does not require a parser. Why? Because parsing just is information processing when the relevant information is fixed. It’s not like we do language parsing differently than we do, say, visual scene interpretation once we fix the relevant structures being manipulated. In other words, parsing on the YCHWH view is just information processing in the domain of language (i.e. there is nothing special about language processing except the fact that it is Gish structures that are being manipulated). Or, to say this another way, though we have lots of parsing, there is no parser that does it.

YCHWH is a nice example of a happy marriage of grammar and probabilities to explain an interesting parsing effect, the SA. The latter is a discovery about the ease of parsing RCs that suggests that G structure matters and that language independent functional considerations just won’t cut it. It also shows how easy it is to combine MGs with corpora to deliver probabilistic Gs that are plausibly useful in language use. All in all, fun stuff, and very instructive.

[1] This is all well discussed by Bogen and Woodward (here).
[2] This is one reason why I find admonitions to focus on natural speech as a source of linguistic data to be bad advice in general. There may be exceptions, but as a general rule such data should be treated very gingerly.
[3] See, for example, the discussion in the paper by Sprouse, Wagers and Phillips.
[4] A measure of distance based on structure could explain the SA. For example, there are more nodes separating the object trace and the head than separating the subject trace and the head. If memory load were a function of depth of separation, that could account for the SA, at least at the whole sentence level. However, until someone defines an incremental version of the Whole-Sentence structural memory load theory, it seems that only Entropy Reduction can account for the word-by-word SA effect across both English-type and Japanese-type languages.
[5] The following is based on some correspondence with Tim Hunter. Thus he is entirely responsible for whatever falsehoods creep into the discussion here.

Monday, May 18, 2015

Two things to read

1. Alex Drummond sent me this link to a nice little paper on what appears to be an old topic that still stumps physicists. The chestnut is the question of whether hot water freezes more quickly than cold. The standard answer is "you gotta be kidding" and then lots of aspersions are cast on those that think that they have proven the contrary empirically. Read this, but what's interesting is that nobody ever thought that the right answer was anything but the obvious one. However, experiments convinced many over centuries that the unintuitive view (i.e. that hot water does freeze faster) was correct. The paper reviews the history of what is now called the "Mpemba Effect," named after a high school student who had the courage of his experiments and was ridiculed for this by teachers and fellow students until bigger shots concluded that his report was not nuts. Not that it was correct, however. It turns out that the question is very complex, takes a lot of careful reasoning to make clear and turns out to be incredibly hard to test. It's worthwhile reading for linguists for it gives a good taste of how complex interaction effects stymie even advanced sciences. So, following the adage that if it's tough for physics don't be surprised if it's tough for linguistics, it's good to wallow in the hardships and subtleties of a millennia-old problem.

2. Here's a recent piece on how hard it is to think cleanly in the sciences. None of it is surprising. The bottom line is that there is lots of wiggle room even in the best sciences for developing theories that would enhance one were they true. So, there is a strong temptation to find them true and there are lots of ways of fudging the process so that what we would like to be the case has evidence in its favor. I personally find none of this surprising or disheartening.

Two points did strike me as curious.

First the suggestion that a success rate of 15% is something to worry about. Maybe it is, but what should we a priori believe the success rate should be? Maybe 15% is great for all we know. There is this presupposition that the scientific method (such as it is) should insulate us from publishing bad papers. But why think this? IMO, the real issue is not how many bad papers get out there but how many good ones. Maybe an 85% miss rate is required to generate the small number of good papers that drive a field forward.

Second, there is the suggestion that this is in part due to the exigencies of getting ahead in the academic game. The idea is that pressures today are such that there is lots to gain in painting rosy research pictures of ever-expanding revolutionary insight. Maybe. But do we really know if things were better in more relaxed times when these sorts of pressures were less common? I don't know. It would be nice to have a diachronic investigation to see that things have gotten worse. Personal anecdote: I once read through the proceedings of the Royal Society from the 17th and 18th centuries. It was a riot. Lots of the stuff was terrible. Of course, what survives to the present day is the gold, not the dross. So, how do we know that things have gotten worse and that the reason for this is contemporary pressures?

That's it. Science is hard. Gaining traction is difficult. Lots of useless work gets done and gets published. Contrary to scientific propaganda, there is no "method" for preventing this. Of course, we might be able to do better and we should if we can. But I for one am getting a little tired of this sky-is-falling stuff. The idea seems to be that if only we were more careful all problems could be solved. Why would anyone believe this? As the first paper outlines, even apparently simple problems are remarkably difficult, and this in areas we know a lot about.

Friday, May 15, 2015

David Adger talks to Thomas Graf about some very big issues

This David Adger post speaks for itself.


I’d intended this to be a response to Thomas’s comments but it got too long, and veered off in various directions.

Computational and other levels

Thomas makes the point that there’s too much work at the ‘implementational’ level, rather than at the proper Marrian computational level, and gives examples to do with overt vs covert movement, labelling etc. He makes an argument that all that stuff is known to be formally equivalent, and we essentially shouldn’t be wasting our time doing it. So ditch a lot of the work that goes on in syntax (sob!).

But I don’t think that’s right. Specification at the computational level for syntax is not answered fully by specifying the computational task as solving the problem of providing an infinite set of sound-meaning pairings; it’s solving the issue of why these pairings, and not some other thinkable set. So, almost all of that ‘implementation’ level work about labels or whatever is actually at the computational level. In fact, I don’t really think there is an algorithmic level for syntax in the classical Marrian sense: the computational level for syntax defines the set of pairings and sure that has a physical realization in terms of brain matter, but there isn’t an algorithm per se. The information in the syntax is accessed by other systems, and that probably is algorithmic in the sense that there’s a step-by-step process to transform information of one sort into another (to phonology, or thinking, or various other mental subsystems), but the syntax itself doesn’t undergo information transforming processes of this sort, it’s a static specification of legitimate structures (or derivations). I think that the fact that this isn’t appreciated sometimes within our field (and almost never beyond it) is actually a pretty big problem, perhaps connected with the hugely process oriented perspective of much cognitive psychology.

Back to the worry about the actual ‘implementational’ issue to do with Agree vs Move etc. I think that Thomas is right, and that some of it may be misguided, inasmuch as the different approaches under debate may have zero empirical consequences (that is, they don’t answer the question: why this pairing and not some other - derivations/representations is perhaps a paradigm case of this). In such cases the formal equivalence between grammars deploying these different devices is otiose and I agree that it would be useful to accept this for particular cases. But at least some of this ‘implementational’ work can be empirically sensitive: think of David Pesetsky’s arguments for covert phrasal as well as covert feature (=Agree) movement, or mine and Gillian’s work on using Agree vs overt movement to explain why Gaelic wh-phrases don’t reconstruct like English ones do but behave in a way that’s intermediate between bound pronouns and traces. The point here is that this is work at Marr’s computational level to try to get to what the correct computational characterization of the system is.

Here’s a concrete example. In my old paper on features in minimalism, I suggested that we should not allow feature recursion in the specification of lexical items (unlike HPSG). I still think that’s right, but not allowing it causes a bunch of empirical issues to arise: we can’t deal with tough constructions by just saying that a tough-predicate selects an XP/NP predicate, like you can in HPSG, so the structures that are legitimized (or derivations if you prefer) by such an approach are quite different from those legitimized by HPSG. On the other hand, there are a whole set of non-local selectional analyses that are available in HPSG that just aren’t in a minimalist view restricted in the way I suggested (a good thing). So the specification at the computational level about the richness of feature structure directly impacts on the possible analyses that are available. If you look at that paper, it looks very implementational, in Thomas’s sense, as it’s about whether embedding of feature structures should be specified inside lexical items or outside them in the functional sequence, but the work it’s actually doing is at the computational level and has direct empirical (or at least analytical) consequences. I think the same is true for other apparently ‘implementational’ issues, and that’s why syntacticians spend time arguing about them.

Casting the Net

Another worry about current syntax that’s raised, and this is a new worry to me so it’s very interesting, is that it’s too ‘tight’: that is, that particular proposals are overly specific, which is risky, because they’re almost always wrong, and ultimately a waste of energy. We syntacticians spend our time doing things that are just too falsifiable (tell that to Vyv Evans!). Thomas calls this net-syntax, as you try to cast a very particularly shaped net over the phenomena, and hence miss a bunch. There’s something to this, and I agree that sometimes insight can be gained by retracting a bit and proposing weaker generalizations (for example, the debate between Reinhart-style c-command for bound variable anaphora, and the alternative Higginbotham/Safir/Barker-style Scope Requirement looks settled, for the moment, in the latter’s favour, and the latter is a much weaker claim). But I think that the worry misses an important point about the to and fro between descriptive/empirical work and theoretical work. You only get to have the ‘that’s weird’ moment when you have a clear set of theoretical assumptions that allow you to build on-the-fly analyses for particular empirical phenomena, but you then need a lot of work on the empirical phenomenon in question before you can figure out what the analysis of that phenomenon is such that you can know whether your computational level principles can account for it. That analytical work methodologically requires you to go down the net-syntax type lines, as you need to come up with restrictive hypotheses about particularities, in order to explore the phenomenon in the first place. So specific encodings are required, at least methodologically, to make progress. I don’t disagree that you need to back off from those specific encodings, and not get too enraptured by them, but discovering high level generalisations about phenomena needs them, I think.
We can only say true things when we know what the empirical lay of the land is, and the vocabulary we can say those true things in very much depends on a historical to and fro between quite specific implementations until we reach a point where the generalizations are stable. On top of this, during that period, we might actually find that the phenomena don’t fall together in the way we expected (so syntactic anaphor binding, unlike bound variable anaphora, seems to require not scope but structural c-command, at least as far as we can tell at the moment). The difference between syntax and maths, which was the model that Thomas gave, is that we don’t know in syntax where the hell we are going much of the time and what the problems are really going to be, whereas we have a pretty good idea of what the problems are in maths.

Structure and Interpretation

I’ll (almost) end on a (semi-)note of agreement. Thomas asks why we care about structure. I agree with him that structures are not important for the theoretical aspects of syntax, except as what systems generate, and I’m wholly on board with Thomas’s notion of derivational specifications and their potential lexicalizations (in fact, that was sort of the idea behind my 2010 thing on trying to encode variability in single grammars by lexicalising subsequences of functional hierarchies, but doing it via derivations as Thomas has been suggesting is even better).  I agree that if you have, for example, a feature system of any kind of complexity, you probably can’t do the real work of testing grammars by hand as the possible number of options just explodes. I see this as an important growth area for syntax: what are the relevant features, what are their interpretations, how do they interact, and my hunch is that we’ll need fairly powerful computational techniques to explore different grammars within the domains defined by different hypotheses about these questions, along the lines Thomas indicates. 

So why do we have syntax papers filled with structures? I think the reason is that, as syntacticians, we are really interested in how sign/sound relates to meaning (back to why these pairings), and unless you have a completely directly compositional system like a lexicalized categorial grammar, you need structures to effect this pairing, as interpretation needs structure to create distinctions that it can hook onto. Even if you lexicalize it all, you still have lexical structures that you need a theory of. So although syntactic structures are a function of lexical items and their possible combinations, the structure just has to go somewhere.

But we do need to get more explicit about saying how these structures are interpreted semantically and phonologically. Outside our field, the ‘recursion-only’ hypothesis (which was never, imo, a hypothesis that was ever proposed or one that anyone in syntax took seriously), has become a caricature that is used to beat our backs (apologies for the mixed metaphor). We need to keep emphasizing the role of the principles of the interpretation of structure by the systems of use. That means we need to talk more to people who are interested in how language is used, which leads me to …

The future’s bright, the future’s pluralistic.

On the issue of whether the future is rosy or not, I actually think it is, but it requires theoretical syntacticians to work with people who don’t automatically share our assumptions and to respect what assumptions those guys bring, and see where compatibilities or rapprochements lie, and where there are real, empirically detectable, differences. Part of the sociological problem Thomas and others have mentioned is insularity and perceived arrogance. My own feeling is that younger syntacticians are not as insular as those of my generation (how depressing – since when was my generation a generation ;-( ), so I’m actually quite sanguine about the future of our field; there’s a lot of stellar work in pure syntax but those same people doing that work are engaging with neuroscientists, ALL people, sociolinguists, computational people, etc. But it will require more work on our (i.e. we theoretical syntacticians’) part: talking to non-syntacticians and non-linguists, overcoming the legacy of past insularity, and engaging in topics that might seem outside of our comfort zones. But there is a huge amount of potential here, not just in the more computational areas that Thomas mentioned, but also in areas that have not had as much input from generative syntax as they could have had: multilingualism, language of ageing, language shift in immigrant populations, etc. There are areas we can really contribute to, and there are many more. I agree with Thomas that we shouldn’t shirk ‘applied’ research: we should make it our own.