Monday, August 28, 2017

The normalization of science

The title for this post is meant to suggest Kuhn’s distinction between revolutionary and normal science. The post is prompted by an article in PNAS that Jeff Lidz sent me. It’s by the mathematical brothers Geman. The claim in the opinion piece (OP) is that contemporary science is decidedly small bore and lacks the theoretical and explanatory ambitions of earlier scientific inquiry. This despite the fact that there are more scientists doing science and more money spent on research today (I wish that were as obvious in linguistics!) than ever before. OP’s take is that, even so, today
…advances are mostly incremental, and largely focused on newer and faster ways to gather and store information, communicate, or be entertained. (9384)
Rather than aiming to deliver abstract “unifying theories” concerning basic “mechanisms”, the research challenge is taken to be “more about computation, simulation and “big data”-style empiricism” (9385).
FoLers may recognize that this somewhat jaundiced view fits with my own narrower pet peeves concerning theory within GG. I have complained more than once that theoretical speculation is currently held in low regard. In fact, I believe (and have said so before) that many (in fact, IMO, most) practitioners consider theory to be, at best, useless ornamentation and, at worst, little more than horse doodoo. What is prized is careful description, corralling recalcitrant data points, smoothing well-known generalizations. More general explanatory ambitions are treated with suspicion and held to extremely high standards if given any hearing at all. This, at least, is my view of the current scene (and, IMO, the general hostility towards Minimalist speculation reflects this). The Geman brothers think that the disdain for fundamental unifying theory is part of the larger current scientific ethos. Their note asks what mechanisms drive it.

Before going through their claims, let me point out that neither the Brothers Geman (BG) nor yours truly want to be understood as dissing the less theory driven empirical work that is being done.  Both BG and I appreciate how hard it is to do this work and we also appreciate its importance. That is not the point. Rather, the point is to observe that nowadays only this kind of work is valued and that the field strongly marginalizes theoretical work that has different ambitions (e.g. unification, reduction, conceptual clarification).

OP canvasses several reasons for why this might be so. It considers a few endogenous factors. For example, that the problems scientists tackle today are just harder in that they are “unsimplifiable,” not “amenable to abstraction” (9385). OP replies that “many natural phenomena seem mysterious and hopelessly complex before being truly understood.” I would add that the passion for description fits poorly with the readiness to idealize and, if not tempered, it will make the abstraction required for fruitful theorizing impossible. We need to elevate explanatory “oomph” as a virtue alongside data coverage if we are to get beyond “‘big data’-style empiricism.”

But, OP does not think that this is the main impetus behind small bore science. It thinks the problems are cultural. This comes in two parts, one of which is pretty standard by now (see here for some discussion and references), and one is more original (or at least I have never considered it). Let me begin with the first more standard observations.

OP believes that scientists today face an incentive system that rewards small bore projects. Fat CVs gain promotion, kudos, grants, and recognition. And fat CVs are best pursued by searching for the “minimal publishable unit” (I loved this term; MPUs should become a standard measure) and seeking the best venues for public exposure (viz. wide ranging exposure and publicity being the current coin of the scientific realm). So publish often and be as splashy as possible is what the incentive system encourages, and OP thinks that this promotes conservative research strategies that discourage doing something new and different and theoretically novel. Note that this assumes that theoretical novelty is risky in that its rewards are only evident in the longer term. This strikes me as a reasonable bet. However, I think that there is also a tension here: why splashiness encourages conservatism is unclear to me. Perhaps by splashy OP just means getting and remaining in the public eye, rather than doing something truly original and daring.

OP claims that the review process also functions as a conservative mechanism discouraging big ideas:

In academia, the two most important sources of feedback scientists receive about their performance are the written evaluations following the submission of papers for publication and proposals for research funding. Unfortunately, in both cases, the peer review process rarely supports pursuing paths that sharply diverge from the mainstream direction, or even from researchers’ own previously published work. (9386)

As I noted, these two observations are not novel (see here for example), even if they may be well placed. Frankly, I would love to hear from younger colleagues about whether this rings true for them. How deeply do these incentives work to encourage some styles of research and discourage others within linguistics? I think they obtain, but I would love to hear what my younger colleagues think.  I can say from my seat on tenure and promotion committees that CV size matters, though bulk alone is not sufficient. There is a hierarchy of journals and publishing in these is a pre-requisite for hiring and promotion, as are the all important letters. I will say a bit more about this at the end when I comment on OP’s one suggested fix.

OP makes two other cultural observations that I have not seen discussed before concerning how the internet may have changed the conduct of research in unfortunate ways. The first way strikes me as a bit fuddy-duddy in the sense that it sounds like the complaint an old person makes about youngsters. In fact we hear this claim daily in the popular press regarding the apparent inability of the under 30 to focus given their bad multi-tasking habits. OP carries this complaint over to young researchers who, by constantly being “on-line” and/or “messaging”, end up suffering from a kind of research ADHD. Here is OP (9385):

Less discussed is the possible effect on creativity: Finding organized explanations for the world around us, and solutions for our existential problems, is hard work and requires intense and sustained concentration. Constant external stimulation may inhibit deep thinking. In fact, is it even possible to think creatively while online? Perhaps “thinking out of the box” has become rare because the Internet is itself a box.

This may be true, though I am not sure that I believe it (though being old, I am inclined to believe it). My younger colleagues don’t seem to be distracted by these new communicative instruments nearly as much as I am. They seem used to it and treat it as just another useful tool. But again, I might be wrong and would love to know from younger colleagues if they think that there is any truth to this.
A second aspect of being connected that OP mentions rings more true to me. Here the issue is not “how we communicate” but “how much” we do so. OP identifies an “epidemic of communication” fed by “easy travel, many more meetings, relentless email and a low threshold for interaction” (9385). OP makes the interesting suggestion that this may be way too much of a good thing. Why so? Because it encourages “cognitive inbreeding.” Here is OP again (9385):

Communication is necessary, but, if there is too much communication, it starts to look like everyone is working in pretty much the same direction. A current example is the mass migration to “deep learning” in machine intelligence.

The first sentence is the useful point. I included the second one for spite because I don’t like Deep Learning and anything that takes a whack at its current fashionability is ok with me. But, the main point is interesting and worth considering. OP even provides a nice analogy with speciation in evolution. Evolution relies on diverse gene pools which requires some isolating of different populations. Too much interaction threatens to homogenize the gene pool and, by analogy, the set of acceptable ideas which in turn makes originality harder. The epidemic of communication also encourages team work, by making collaboration easier. In OP’s opinion, theory is largely a solitary matter and requires an iconoclastic bent of mind, something that is not fostered by an emphasis on team projects and too much collaboration. In place of “big ideas” the new technology fosters “big projects.” That’s the view.

I am not sure that I agree, but it is an intriguing suggestion. It fits with three things I have noticed.
First, that people would rather do anything than think. Thinking is really hard and frustrating. I know that when I am working on something I cannot get my head around I am always looking for something else to read (“can’t get down to the problem until I have mastered all the relevant literature”). I also tidy my office (well, a little). What I don’t do is stare at the problem and stare at the problem and think hard. It’s just too much work, and frustrating. So, I can believe that the ready availability of things to read makes it possible to avoid the hard work of thinking all the more.
Second, and now I am mainly focused on theoretical syntax, we have found no good replacement for Chomsky’s regular revolutions. Let me be careful here. There is a trope that suggests that GG undergoes a revolution every decade or so and this is used as an indication that GG has made no scientific progress. I think that this is bunk. As I have noted before, I think that our knowledge has accumulated and later theory has largely conserved earlier findings. Still, there is a grain of truth to the trope, though contrary to accepted wisdom the purported “revolutions” have been very good for the field, for they have made room for those not invested in the old ideas to advance new ones. In other words, Chomsky’s constantly pulling the rug out from under his old students and shifting attention to new problems, technology and subject matter acted to disrupt complacency and make room for new ways of thinking (to the irritation of the oldsters (and I speak from experience here)). And this was very healthy. In fact, the periods of high theory all started in roughly this way. Chomsky was not the only purveyor of new ideas but he was a reliable source of intellectual disruption. We have, IMO, far less of this now. Rather, we have much more careful filigree descriptive work, but less exciting theoretical novelty. We really need more fights, less consensus. At any rate, this is consonant with OP’s main point, and I find it congenial.

Third, I think that we as a field have come to prize looking busy. In my department for example, it is noticed if someone is not “participating” and we track how many conferences students present at and papers they publish (visible metrics of success that we share with our academic overlords). The idea that a grad student’s main job is to sit and think and play with ideas is considered a bit quaint. Everyone needs to be doing something. But sitting and thinking is not doing in quite the same way that participating in every research group is. It’s harder and more solitary and less valued nowadays, or so it seems. I doubt that this is just true of UMD.

OP makes one more important point, but this sadly is not something we can do much about right now. OP notes that once jobs were very plentiful, as were grants. This made it possible to explore different ideas that might not pan out, for your livelihood was not at stake if you swam against the tide or if it took some time before the ideas hit paydirt. I suspect that the loss of this cushion is a big cause of the current atmosphere of conformity and timidity that OP identifies. In situations where decisions are made by committees and openings are scarce, the aim is not to offend. Careful, conventional filigree work is safer and playing it safe is a good idea when options are few.

That’s more or less the OP analysis. It also has one suggestion for making things better. It is a small suggestion. Here it is (9386):

Change the criteria for measuring performance. In essence, go back in time. Discard numerical performance metrics, which many believe have negative impacts on scientific inquiry … Suppose, instead, every hiring and promotion decision were mainly based on reviewing a small number of publications chosen by the candidate. The rational reaction would be to spend more time on each project, be less inclined to join large teams in small roles, and spend less time taking professional selfies. Perhaps we can then return to a culture of great ideas and great discoveries.

I like this idea, but I am not sure it will fly. Why? Because it requires that institutions exercise judgment and trust their local colleagues’ considered opinions. It is easy to count CV entries (it is also “objective” (and hence less liable to abuse)). It’s much harder to evaluate actual research sympathetically and intelligently (and it is also necessarily personal and “subjective” (and so more liable to abuse)). And it is even harder to evaluate evaluations for sympathy and intelligence. At the very least, this is all very labor intensive. I don’t see it happening. What I can see happening is a cut down version of this. We should begin to institutionalize one question (I call it “Poeppel’s question” because he made it the lead off query in every one of his lab meetings) concerning work we read, review, listen to presentations of, advise: What’s the point? Why should anyone care? If a demand for an answer to this question becomes institutionalized it will force us all to think more expansively and will promote another less descriptive dimension of evaluation. It’s a small thing, a doable thing. I think it might have surprisingly positive effects. I for one intend to start right away.

Thursday, August 24, 2017

Another sad event

Bill Davies died on August 18. I knew him a bit as he worked some on control and had inquiries concerning some students who were looking for work at U of I. He was a decent and intelligent man. Here is a more extensive note that Rob Chametzky sent.


It is with profound sadness that we note the passing of Professor William Davies on August 18, 2017. After joining the faculty in 1986, Bill was at the heart of departmental life, serving for many years as Departmental Executive Officer, Director of Graduate Studies, Director of English as a Second Language Programs for thirteen years between 1990 and 2005, and as a mentor and advisor for countless PhD, MA, and BA students in the department. As ESL Director, Bill ensured that ESL Programs’ faculty and staff were treated as respected professionals and as full members of the Department, and that linguistics graduate students pursuing a focus in Teaching English as a Second Language (TESL) were provided teaching experience with structured support and supervision by ESL faculty—just one example, among many, of his strong commitment to student success.
Bill was a highly respected theoretical syntactician and a preeminent scholar of Austronesian languages, focusing extensively on the syntax of raising and control, and on the syntax and morphology of Javanese and Madurese. He received his PhD from the University of California, San Diego in 1981, writing his dissertation on Choctaw clause structure. Beginning with Choctaw, and continuing with other languages, much of Bill's research united his interests and training in syntactic theory with his passion for language documentation and preservation. From the early 1990s, his attraction to the languages of Indonesia drew him first to Javanese, and then to Madurese, a language that he worked on for some twenty years. His work on various linguistic phenomena in Madurese culminated in 2010 in the De Gruyter Mouton A Grammar of Madurese, the first (and only) comprehensive grammar of this language of 14 million speakers. Bill’s theoretical work traversed a broad range of phenomena, including wh-questions, reflexives and reciprocals, antipassives, causatives, and (especially) raising and control. However, his theoretical work was, invariably, coupled with efforts to give back to the people who so graciously allowed him into their space to do his research – studying the grammar of the Madurese while, at the same time, preserving and rendering accessible the rapidly disappearing folk story traditions for their next generation.   
Bill taught and conducted research both at Cornell University, where he held a Mellon postdoctoral fellowship, and at California State University, Sacramento, prior to joining the faculty at Iowa. In addition, he served on the faculty of two Linguistic Society of America Summer Institutes, at the Ohio State University and the University of Chicago, and co-coordinated (with Stan Dubinsky) National Science Foundation-funded workshops at two additional Summer Institutes. Bill made innumerable invited and conference presentations of his research in Indonesia, along with many other international, national, and local venues. He published extensively, both in singly-authored works and in collaborative work with students and colleagues—including, with long-time collaborator Stan Dubinsky, an edited volume on the theory of grammatical functions, two influential volumes on control and raising, and a forthcoming textbook on language conflict and language rights. His research was funded by a number of prestigious grants, including a National Science Foundation grant supporting his work on the grammar of Madurese, and, more recently, grants from the National Endowment for the Humanities, the NSF, and the Smithsonian Institution, as well as the American Institute for Indonesian Studies and the Fulbright Scholar program, for his project documenting the language of the Baduy Dalam. Much of Bill's recent work circled back to his early roots in anthropological linguistics, uniting his interests in culture and language preservation with careful descriptive work, theoretical analyses, and digital audio and video recordings and transcriptions of both Madurese and Baduy folk stories.

During his time on the Iowa faculty, Bill, a gifted and award-winning teacher, particularly enjoyed teaching Linguistic Field Methods and Linguistic Structures— two classes which allowed him to share his passion for Austronesian linguistics with generations of students, and which ultimately spawned many PhD dissertation and qualifying paper topics. To his PhD students, Bill was a tireless mentor and an impeccable role model; he was deeply proud of their accomplishments and cherished this work.

Bill was gentle, warm, compassionate, and funny (and occasionally (okay, often) sarcastic); friends, students, and colleagues will remember his invaluable contributions to the Department of Linguistics and to the University of Iowa, of course, but more so his fundamental decency, his sense of fairness, and his unceasing advocacy for his students and for the field of linguistics. He will be deeply missed. 

Monday, August 21, 2017

Language vs linguistics, again; the case of Christiansen and Chater

Morten Christiansen and Nick Chater have done us all a favor. They have written a manifesto (here, C&C) outlining what they take to be a fruitful way of studying language. To the degree that I understand it, it seems plausible enough given its apparent interests. It focuses on the fact that language as we encounter it on a daily basis is a massive interaction effect and the manifesto heroically affirms the truism (just love those papers that bravely go out on a limb (the academic version of exercise?)) that explaining interaction effects requires an “integrated approach.” Let me emphasize how truistic this is: if X is the result of many interacting parts then only a story that enumerates these parts, describes their properties and explains how they interact can explain an effect that is the result of interacting parts interacting. Thus a non-integrated account of an interaction effect is a logical non-starter. It is also worth pointing out the obvious: this is not a discovery, it is a tautology (and a rather superficial one at that), and not one that anyone (and here I include C&C’s bête noire Monsieur Chomsky (I am just back from vacationing in Quebec so excuse the francotropism)) can, should or would deny (in fact, we note below that he made just this observation oh so many (over 60) years ago!).

That said, C&C, from where I sit, does make two interesting moves that go beyond the truistic. The first is that it takes the central truism to be revolutionary and in need of defence (as if anyone in their right mind would ever deny it). The second noteworthy feature is that the transparent truth of the truism (note truisms need not be self-evident (think theorems) but this one is) seems to license a kind of faith based holism, one that goes some distance in thwarting the possibility of a non-trivial integrated approach of the kind C&C urges. 

Before elaborating these points in more detail, I need (the need here is pathetically psychological, sort of like a mental tic, so excuse me) to make one more overarching point: C&C appears to have no idea what GG is, what its aims are or what it has accomplished over the last 60 years. In other words, when C&C talks about GG (especially (but not uniquely) about the Chomsky program) it is dumb, dumb, dumb! And it is not even originally dumb. It is dumb in the old familiar way. It is boringly dumb. It is stale dumb. Dumb at one remove from other mind numbing dumbness. Boringly, predictably, pathetically dumb. It makes one wonder whether or not the authors ever read any GG material. I hope not. For having failed to read what it criticizes would be the only half decent excuse for the mountains of dumb s**t that C&C asserts. If I were one of the authors, I would opt for intellectual irresponsibility (bankruptcy (?)) over immeasurable cluelessness if hauled before a court of scientific inquiry. At any rate, not having ever read the material better explains the wayward claims confidently asserted than having read it and so misconstrued it. As I have addressed C&C’s second hand criticisms more than (ahem) once, I will allow the curious to scan the archives for relevant critical evisceration.[1]

Ok, the two main claims: It is a truism that language encountered “in the wild” is the product of many interacting parts. This observation was first made in the modern period in Syntactic Structures. In other words, a long time ago.[2] In other words, the observation is old, venerable and, by now, commonplace. In fact, the distinction between ‘grammatical’ and ‘acceptable’ first made over 60 years ago relies on the fact that a speaker’s phenomenology wrt utterances is not exclusively a function of an uttered sentence’s grammatical (G) status. Other things matter, a lot. In the early days of GG, factors such as memory load, attention, pragmatic suitability, semantic sensibility (among other factors) were highlighted in addition to grammaticality. So, it was understood early on that many many factors went into an acceptability judgment, with grammaticality being just one relevant feature. Indeed, this observation is what lies behind the competence/performance distinction (a point that C&C seems not to appreciate (see p. 3)), the distinction aiming to isolate the grammatical factors behind acceptability, thereby, among other things, leaving room for other factors to play a role.[3]

And this was not just hand waving or theory protecting talk (again contra C&C, boy is its discussion DUMB!!). A good deal of work was conducted early on trying to understand how grammatical structure could interact with these other factors to burden memory load and increase perceived unacceptability (just think of the non-trivial distinction between center and self embedding and its implications for memory architecture).[4] This kind of work proceeds apace even today, with grammaticality understood to be one of the many factors that go into making judgments gradiently acceptable.[5] Indeed, there is no grammatically informed psycho-linguistic work done today (or before) that doesn’t understand that G/UG capacities are but one factor among others needed to explain real time acquisition, parsing, production, etc. UG is one factor in accounting for G acquisition (as Jeff Lidz, Charles Yang, Lila Gleitman etc. have endlessly emphasized) and language particular Gs are just one factor in explaining parsability (which is, in turn, one factor in underlying acceptability) (as Colin Phillips, Rick Lewis, Shravan Vasishth, Janet Fodor, Bob Berwick, Lyn Frazier, Jon Sprouse, etc. etc. etc. have endlessly noted). Nobody denies the C&C truism that language use involves multiple interacting variables. Nobody is that stupid!

So, C&C is correct in noting that if one’s interest is in figuring out how language is deployed/acquired/produced/parsed/etc. then much more than a competence theory will be required. This is not news. This is not even an insight. The question is not if this is so, but how it is so. Given this, the relevant question is: what tree is C&C barking up by suggesting that this is contentious?

I have two hypotheses. Here they are.

1. C&C doesn’t take G features to be at all relevant to acceptability.
2. C&C favors a holistic rather than an analytic approach to explaining interaction effects in language.

Let’s discuss each in turn.

C&C is skeptical that grammaticality is a real feature of natural language expressions. In other words, C&C's beef with the traditional GG conception in which G/UG properties are one factor among many lies with assigning G/UG any role at all. This is not as original as it might sound. In fact, it is quite a traditional view, one that Associationists and Structuralists held about 70 years ago. It is the view that GG defenestrated, but apparently, did not manage to kill (next time from a higher floor please). The view amounts to the idea that G regularities (C&C is very skeptical that UG properties exist at all, I return to this presently) are just probabilistic generalizations over available linguistic inputs. This is the view embodied in Structuralist discovery procedures (and suggested in current Deep Learning approaches) wherein levels were simple generalizations over induced structures of a previous lower level. Thus, all there is to grammar is successively more abstract categories built up inductively from lower level less abstract categories. On this view, grammatical categories are classes of words, which are definable as classes of morphemes, which are definable as classes of phonemes, which are definable as classes of phones. The higher levels are, in effect, simple inductive generalizations over lower level entities. The basic thought is that higher-level categories are entirely reducible to lower level distributional patterns. Importantly, in this sort of analysis, there are no (and can be no) interesting theoretical entities, in the sense of real abstract constructs that have empirical consequences but are not reducible or definable in purely observational terms. In other words, on this view, syntax is an illusion and the idea that it makes an autonomous contribution to acceptability is a conceptual error.

Now, I am not sure whether C&C actually endorses this view, but it does make noises in that direction. For example, it endorses a particular conception of constructions and puts it “at the heart” of its “alternative framework” (4). The virtue of C&C constructions is that they are built up from smaller parts in a probabilistically guided manner. Here is C&C (4):

At the heart of this emerging alternative framework are constructions, which are learned pairings of form and meaning ranging from meaningful parts of words (such as word endings, for example, ‘-s’, ‘-ing’) and words themselves (for example, ‘penguin’) to multiword sequences (for example, ‘cup of tea’) to lexical patterns and schemas (such as, ‘the X-er, the Y-er’, for example, ‘the bigger, the better’). The quasi-regular nature of such construction grammars allows them to capture both the rule-like patterns as well as the myriad of exceptions that often are excluded by fiat from the old view built on abstract rules. From this point of view, learning a language is learning the skill of using constructions to understand and produce language. So, whereas the traditional perspective viewed the child as a mini-linguist with the daunting task of deducing a formal grammar from limited input, the construction-based framework sees the child as a developing language-user, gradually honing her language-processing skills. This requires no putative universal grammar but, instead, sensitivity to multiple sources of probabilistic information available in the linguistic input: from the sound of words to their co-occurrence patterns to information from semantic and pragmatic contexts.

This quote does not preclude a distinctive Gish contribution to acceptability, but its dismissal of any UG contribution to the process suggests that it is endorsing a very strong rejection of the autonomy of syntax thesis.[6] Let me repeat, a commitment to the centrality of constructions does not require this. However, the C&C version seems to endorse it. If this is correct, then C&C sees the central problem with modern GG as its commitment to the idea that syntactic structure is not reducible to either statistical distributional properties or semantic or pragmatic or phonological or phonetic properties of utterances. In other words, C&C rejects the GG idea that grammatical structure is real and makes any contribution to the observables we track through acceptability.

This view is indeed radical, and virtually certain to be incorrect.[7] If there is one thing that all linguists agree on (including constructionists like Jackendoff and Culicover) it’s that syntax is real. It is not reducible to other factors. And if this is so, then G structure exists independently of other factors. I also think that virtually all linguists believe that syntax is not the sum of statistical regularity in the PLD.[8] And there is good reason for this; it is morally certain that many of the grammatical factors that linguists have identified over the last 60 years have linguistically proprietary roots and leave few footmarks in the PLD. To argue that this standard picture is false requires a lot of work, none of which C&C does or points to. Of course, C&C cannot be held responsible for this failing, for C&C has no idea what this work argues because C&C’s authors appear never to have read any of it (or, if it has been read, it has not been understood, see above). But were C&C informed by any of this work, it would immediately appreciate that it is nuts to think that it is possible to eliminate G features as one factor in acceptability.[9]

In sum, one possible reading of C&C is that it endorses the old Structuralist idea of discovery procedures, denies the autonomy of syntax thesis (i.e. the thesis that syntax is “real”) and believes in (yes I got to say it) the old Empiricist/Associationist trope that language capacity is nothing but a reflection of tracked statistical regularities. It’s back folks. No idea ever really dies, no matter how unfounded and implausible and how many times it has been stabbed through the heart with sharp arguments.

Before going on to the second point, let me add a small digression concerning constructions. Look, anyone who works on the G of a particular language endorses some form of constructionism (see here for some discussion). Everyone assumes that morphemes have specific requirements, with specific selection restrictions. These are largely diacritical and part of the lexical entry of the morpheme. Gs are often conceived as checking these features in the course of a derivation and one of the aims of a theory of Gs (UG) is to specify the structural/derivational conditions that regulate this feature checking. Thus, everyone’s favorite language specific G has some kinds of constructions that encode information that is not reducible to FL or UG principles (or not so reducible as far as we can tell). 

Moreover, it is entirely consistent with this view that units larger than morphemes code this kind of information. The diacritics can be syncategorematic and might grace structures that are pretty large (though given something like an X’ syntax with heads or a G with feature percolation the locus of the diacritical information can often be localized on a “listable” linguistic object in the lexicon). So, the idea that C&C grabs with both hands and takes to be new and revolutionary is actually old hat. What distinguishes the kind of constructionism one finds in C&C from the more standard variety found in standard work is the idea central to GG that constructions are not “arbitrary.” Rather, constructions have a substructure regulated by more abstract principles of grammar (and UG). C&C seems to think that anything can be a construction. But we know that this is false.[10] Constructions obey standard principles of Grammar (e.g. no mirror image constructions, no constructions that violate the ECP or binding theory, etc.). So though there can be many kinds of constructions that compile all sorts of diverse information there are some pretty hard constraints regulating what a possible construction is.

Why do I mention this? Because I could not stop myself! Constructions lie at the heart of C&C’s “alternative framework” and nonetheless C&C has no idea what they are, that they are standard fare in much of standard GG (even minimalist Gs are knee deep in such diacritical features) and that they are not the arbitrary pairings that C&C takes them to be. In other words, once again C&C is mind numbingly ignorant (or, misinformed).

So that’s one possibility. C&C denies G properties are real. There is a second possible assumption, one that does not preclude this one and is often found in tandem with it, but is nonetheless different. The second problem C&C sees with the standard view lies with its analytical bent. Let me explain.

The standard view of performance within linguistics is that it involves contributions of many factors. Coupled with this is a methodology: The right way to study these is to identify the factors involved, figure out their particular features and see how they combine in complex cases. One of the problems with studying such phenomena is that the interacting factors don’t always nicely add up. In other words, we cannot just add the contributions of each component together to get a nice well-behaved sum at the end. That’s what makes some problems so hard to solve analytically (think turbulence). But, that’s still the standard way to go about matters.  GG endorsed this view from the get-go. To understand how language works in the wild, figure out what factors go into making, say, an utterance, and see how these factors interact. Linguists focused on one factor (G and UG) but understood that other factors also played a role (e.g. memory, attention, semantic/pragmatic suitability etc.). The idea was that in analyzing (and understanding) any bit of linguistic performance, grammar would be one part of the total equation, with its own distinctive contribution.[11]

Two things are noteworthy about this. First, it is hard, very hard. It requires understanding how (at least) two “components” function as well as understanding how they interact.  As interactions need not be additive, this can be a real pain, even under ideal conditions where we really know a lot (that’s why engineers need to do more than simply apply the known physics/chemistry/biology). Moreover, interaction effects can be idiosyncratic and localized, working differently in different circumstances (just ask your favorite psycho/neuro linguist about task effects). So, this kind of work is both very demanding and subject to quirky destabilizing effects. Recall Fodor’s observation: the more modular a problem is, the more likely it is solvable at all. This reflects the problems that interaction effects generate.[12]

At any rate, this is the standard way science proceeds when approaching complex phenomena. It factors it into its parts and then puts these parts back together. It is often called atomism or reductionism but it is really just analysis with synthesis and it has proven to be the only real game in town.[13] That said, many bridle at this approach and yearn for more holistic methods. Connectionists used to sing the praises of holism: only the whole system computes! You cannot factor a problem into its parts without destroying it. Holists often urge simulation in place of analysis (let’s see how the whole thing runs). People like me find this to be little more than the promotion of obscurantism (and not only me, see here for a nice take down in the domain of face perception).

Why do I mention this here? Because, there is a sense in which C&C seems to object not only to the idea that Grammar is real, but also to the idea that the right way to approach these interaction effects is analytically. C&C doesn’t actually say this, but it suggests it in its claims that the secret to understanding language in the wild lies with how all kinds of information are integrated quickly in the here and now. The system as a whole gives rise to structure (which “may [note the weasel word here, btw, NH] explain why language structure and processing is highly local in the linguistic signal” (5))[14] and the interaction of the various factors eases the interpretation problem (though as Gleitman and Trueswell and friends have shown, having too much information is itself a real problem (see here, here and here)). The prose in C&C suggests to me that only at the grain of the blooming buzzing interactive whole will linguistic structure emerge. If this is right, then the problem with the standard view is not merely that it endorses the reality of grammar, but that it takes the right approach to be analytic rather than holistic. Again, C&C does not expressly say this, but it does suggest it, and it makes sense of its dismissal of “fragmented” investigations of the complex phenomenon. In their view, we need to solve all the problems at once and together, rather than piecemeal and then fit them together. Of course, we all know that there is no “best” way to proceed in these complex matters; that sometimes a more focused view is better and sometimes a more expansive one. But the idea that an analytic approach is “doomed to fail” (1) surely bespeaks an antipathy towards the analytic approach to language.

An additional point: note that if one thinks that all there is to language is the statistical piecing together of diverse kinds of information then one is really against the idea that language in the wild is the result of interacting distinct modules with their own powers and properties. This, again, is an old idea. Again you all know who believed this (hint, starts with an E). So, if one were looking for an overarching unifying theme in C&C, one that is not trotted out explicitly but holds the paper together, then one could do worse than look to Associationism/Empiricism. This is the glue that holds the various parts together, from the hostility to the very idea that grammars are real to the conviction that the analytic approach (standard in the sciences) is doomed to failure.

There is a lot of other stuff in this paper that is also not very good (or convincing). But, I leave it as an exercise to the reader to find these and dispose of them (take a look at the discussion of language and cultural evolution for a real good time (on p.6). I am not sure, but it struck me as verging on the incoherent and mixing up the problem of language change with the problem of the emergence of a facility for language). Suffice it to say that C&C adds another layer to the pile of junk written on language perpetrated on the innocent public by the prestige journals. Let me end with a small rant on this.

C&C appeared in Nature. This is reputed to be a fancy journal with standards (don’t believe it for a moment. It’s all show business now).[15] I doubt that Nature believes that it publishes junk. Maybe it takes it to be impossible to evaluate opinion or “comment” pieces. Maybe it thinks that taste cannot be adjudicated. Maybe. But I doubt it. Rather, what we are witnessing here is another case of Chomsky bashing, with GG as collateral damage. It is not only this, but it is some of this. The other factor is the rise of big data science. I will have something to say about this in a later post. For now, take a look at C&C. It’s the latest junk installment of a story that doesn’t get better with repetition. But in this case all the arguments are stale as well as being dumb. Maybe their shelf expiration date will come soon. One can only hope even if such hope is irrational given the evidence.

[1] Type ‘Evans and Levinson’ (cited in C&C) or ‘Vyvyan Evans’ or ‘Everett’ in the search section for a bevy of replies to the old tired incorrect claims that C&C throws out like confetti at a victory parade.
[2] Actually, I assume that Chomsky’s observations are just another footnote to Plato or Aristotle, though I don’t know what text he might have been footnoting but, as you know the guy LOVES footnotes!
[3] The great sin of Generative Semantics was to conflate grammaticality and acceptability by, in effect, treating any hint of unacceptability as something demanding a grammatical remedy.
[4] I should add that the distinction between these two kinds of structures (center vs self embedding) is still often ignored or run together. At times, it makes one despair about whether there is any progress at all in the mental sciences.
[5] And which, among other things, led Chomsky to deny that there is a clean grammatical/ungrammatical distinction, insisting that there are degrees of grammaticality as well as the observed degrees of acceptability. Jon Sprouse is the contemporary go-to person on these issues.
[6] And recall, the autonomy of syntax thesis is a very weak claim. It states that syntactic structure is real and hence not reducible to observable features of the linguistic expression. So syntax is not just a reflex of meaning or sound or probabilistic distribution or pragmatic felicity or… Denying this weak claim is thus a very strong position.
[7] There is an excellent discussion of the autonomy of syntax and what it means and why it is important in the forthcoming anniversary volume on Syntactic Structures edited by Lasnik, Patel-Grosz, Yang et moi. It will make a great stocking stuffer for the holidays so shop early and often.
[8] Certainly Jackendoff, the daddy of constructionism, has written as much.
[9] Here is a good place to repeat sotto voce and reverentially: ‘Colorless green ideas sleep furiously’ and contrast it with ‘Furiously sleep ideas green colorless.’
[10] Indeed, if I am right about the Associationist/Empiricist subtext in C&C then C&C does not actually believe that there are inherent limits on possible constructions. On this reading of C&C the absence of mirror image constructions is actually just a fact about their absence in the relevant linguistic environment. They are fine potential constructions. They just happen not to occur.  One gets a feeling that this is indeed what C&C thinks by noting how impressed it is with “the awe-inspiring diversity of the world’s languages” (6). Clearly C&C favors theories that aim for flexibility to cover this diversity. Linguists, in contrast, often focus on “negative facts,” possible data that is regularly absent. These have proven to be reliable indicators of underlying universal principles/operations. The fact that C&C does not mention this kind of datum is, IMO, a pretty good indicator that it doesn’t take it seriously. Gaps in the data are accidents, a position that any Associationist/Empiricist would naturally gravitate towards. In fact, if you want a reliable indicator of A/E tendencies look for a discussion of negative data. If it does not occur, I would give better than even odds that you are reading the prose of a card carrying A/Eer.
[11] Linguists do differ on whether this is a viable project in general (i.e. likely to be successful). But this is a matter of taste, not argument. There is no way to know without trying.
[12] For example, take a look at this recent piece on the decline of the bee population and the factors behind it. It ends with a nice discussion of the (often) inscrutable complexity of interaction effects:

Let's add deer to the list of culprits, then. And kudzu. It's getting to be a long list. It's also an indication of what a complex system these bees are part of. Make one change that you don't think has anything to do with them -- develop a new pesticide, enact a biofuels subsidy, invent the motorized lawnmower -- and the bees turn out to feel it.

[13] Actually, it used to be the only game in town. There are some urging that scientific inquiry give up the aim of understanding. I will be writing a post on this anon.
[14] This btw, is not even descriptively correct given the myriad different kinds of locality that linguists have identified. Indeed, so far as I know, there is no linear bound between interacting morphemes anywhere in syntax (e.g. agreement, binding, antecedence, etc.).
[15] It’s part of the ethos of the age. See here for the theme song.

Monday, August 14, 2017

Grammars and functional explanations

One of the benefits of having good colleagues and a communal department printer is that you get to discover interesting stuff you would never have run across. The process (I call it “research,” btw) is easy: go get what you have just printed out for yourself and look at the papers that your colleagues have printed out for themselves that are lying in the pick-up tray. If you are unscrupulous you steal it from the printer tray and let your colleague print another for her/himself. If you are honest you make a copy of the paper and leave the original for your collegial benefactor (sort of a copy theory of mental movement).  In either case, whatever the moral calculus, there is a potential intellectual adventure waiting for you every time you go and get something you printed out. All of this is by way of introducing the current post topic. A couple of weeks ago I fortuitously ran into a paper that provoked some thought, and that contained one really pregnant phrase (viz. “encoding-induced load reduction”). Now, we can all agree that the phrase is not particularly poetic. But it points to a useful idea whose exploration was once quite common. I would like to say a few words about the paper and the general idea as a way of bringing both to your attention.

The paper is by Bonhage, Fiebach, Bahlmann and Mueller (BFBM). It has two main aims: (i) to describe how coding for the structural features of language unfolds over time and (ii) to identify the neural implementations of this process. The phenomenal probe into this process is the Sentence Superiority Effect (SSE). SSE is “that observation that sentences are remembered better than ungrammatical word strings” (1654). Anyone who has crammed for an exam where there is lots of memorization is directly acquainted with the SSE. It’s a well-known device for making otherwise disparate information available: concoct sentences/phrases as mnemonic devices. This is the fancy version of that. At any rate, it exists, is behaviorally robust and is a straightforward bit of evidence for online assignment of grammatical structure where possible. More exactly, it is well known that “chunking” enhances memory performance and it seems, not surprisingly, that linguistic structure affords chunking. Here is BFBM (1656):

Linguistically based chunking can also be described as an enriched encoding process because it entails, in addition to the simple sequence of items, semantic and syntactic relations between items… [W]e hypothesize that the WM benefit of sentence structure is to a large part because of enriched encoding. This enriched encoding in turn is hypothesized to result in reduced WM [working memory, NH] demands during the subsequent maintenance phase…

BFBM locates the benefit specifically in maintaining a structure in memory, though there is a cost for the encoding. This predicts increased activity in those parts of the brain wherein coding happens and reduced activity in parts of the brain responsible for maintaining the coded information. As BFBM puts it (1656):

With respect to functional neuroanatomy, we predict that enriched encoding should go along with increased activity in the fronto-temporal language network for semantic and syntactic sentence processing.

During the subsequent maintenance period, we expected to see reduced activity for sentence material in VWM systems responsible for phonological rehearsal because of the encoding-induced load reduction.

The paper makes several other interesting points concerning (i) the role of talking to oneself sotto voce in memory enhancement (answer: not an important factor in SSE), (ii) the degree to which the memory structures involved in the SSE are language specific or more domain general (answer: both language areas and more general brain areas are involved) and (iii) the relative contribution of syntactic vs semantic structure to the process (somewhat inconclusive IMO). At any rate, I enjoyed going through the details and I think you might as well.

But what I really liked is the program of linking linguistic architectures with more general cognitive processes. Here, again, is BFBM (1666):

But how does the involvement of the semantic system contribute to a performance advantage in the present working memory task? One possible account is chunking of information. From cognitive psychology, we know that chunking requires the encoding of at least two hierarchical levels: item level and chunk level (Feigenson & Halberda, 2004). The grammatical information contained in the word list makes it possible to integrate the words (i.e., items) into a larger unit (i.e., chunk) that is specified by grammatical relationships and a basic meaning representation, as outlined above. This constitutes not only a syntactically but also a semantically enriched unit that contains agents and patients characterized, for example, by specific semantic roles. Additional encoding of sentence-level meaning of this kind, triggered by syntactic structure, might facilitate the following stages (i.e., maintenance and retrieval) of the working memory process.

So, there is a (not surprising) functional interaction between grammatical coding and enhanced memory (through load reduction) through reduced maintenance costs in virtue of there existing an encoding of linguistic information above the morpheme/word level. Thus, the existence of G encodings fits well with the cognitive predilections of memory structure (in this case maintenance).

Like I said, this general idea is very nice and is one that some of my closest friends (and relatives) used to investigate extensively. So, for example, Berwick and Weinberg (B&W) tried to understand the Subjacency Condition in terms of its functional virtues wrt efficient left corner parsing (see, e.g. here). Insightful explorations of the “fit” between G structure and other aspects of cognition are rarish, if for no other reason than that it requires really knowing something about the “interfaces.” Thus, you need to know something about parsers and Gs to do what B&W attempted. Ditto with current work on the meaning of quantifiers and the resources of the analogue number system embedded in our perceptual systems (see here). Discovering functional fit requires really understanding properties of the interfaces in a non-trivial way. And this is hard!

That said, it is worth doing, especially if one’s interests lie in advancing minimalist aims. We expect to find these kinds of dependencies, and we expect that linguistic encodings should fit snugly with non-linguistic cognitive architecture if “well-designed.”[1] Moreover, it should help us to understand some of the conditions that we find regulate G interactions. So, for example, Carl De Marcken exhibited the virtues of headedness for unsupervised learners (see here for discussion and links). And, it seems quite reasonable to think that Minimality is intimately connected with the fact that biological memory is subject to similarity based interference effects. It is not a stretch, IMO, to see minimality requirements as allowing for “encoding-induced load reduction” by obviating (some of) the baleful effects of similarity based interference endemic to a content addressable memory system like ours. Or, to put this another way, one virtue of Gs that include a minimality restriction is that it will lessen the cognitive memory load on performance systems that use these Gs (most likely for acts of retrieval (vs maintenance)).

Minimalism invites these kinds of non-reductive functional investigations. It invites asking: how does the code matter when used? Good question, even if non-trivial answers are hard to come by.

[1] Yes, I know, there are various conceptions of “good design” only some of which bear on this concept. But one interesting thing to investigate is the concept mooted here for it should allow us to get a clearer picture of the structure of linguistic performance systems if we see them as fitting well with the G system. This assumption allows us to exploit G structure as a probe into the larger competence plus production system. This is what BFBM effectively does to great effect.