Monday, August 21, 2017

Language vs linguistics, again; the case of Christiansen and Chater

Morten Christiansen and Nick Chater have done us all a favor. They have written a manifesto (here, C&C) outlining what they take to be a fruitful way of studying language. To the degree that I understand it, it seems plausible enough given its apparent interests. It focuses on the fact that language as we encounter it on a daily basis is a massive interaction effect and the manifesto heroically affirms the truism (just love those papers that bravely go out on a limb (the academic version of exercise?)) that explaining interaction effects requires an “integrated approach.” Let me emphasize how truistic this is: if X is the result of many interacting parts then only a story that enumerates these parts, describes their properties and explains how they interact can explain an effect that is the result of interacting parts interacting. Thus a non-integrated account of an interaction effect is a logical non-starter. It is also worth pointing out the obvious: this is not a discovery, it is a tautology (and a rather superficial one at that), and not one that anyone (and here I include C&C’s bête noire Monsieur Chomsky (I am just back from vacationing in Quebec so excuse the francotropism)) can, should or would deny (in fact, we note below that he made just this observation oh so many (over 60) years ago!).

That said, C&C, from where I sit, does make two interesting moves that go beyond the truistic. The first is that it takes the central truism to be revolutionary and in need of defence (as if anyone in their right mind would ever deny it). The second noteworthy feature is that the transparent truth of the truism (note truisms need not be self-evident (think theorems) but this one is) seems to license a kind of faith based holism, one that goes some distance in thwarting the possibility of a non-trivial integrated approach of the kind C&C urges. 

Before elaborating these points in more detail, I need (the need here is pathetically psychological, sort of like a mental tic, so excuse me) to make one more overarching point: C&C appears to have no idea what GG is, what its aims are or what it has accomplished over the last 60 years. In other words, when C&C talks about GG (especially (but not uniquely) about the Chomsky program) it is dumb, dumb, dumb! And it is not even originally dumb. It is dumb in the old familiar way. It is boringly dumb. It is stale dumb. Dumb at one remove from other mind numbing dumbness. Boringly, predictably, pathetically dumb. It makes one wonder whether or not the authors ever read any GG material. I hope not. For having failed to read what it criticizes would be the only half decent excuse for the mountains of dumb s**t that C&C asserts. If I were one of the authors, I would opt for intellectual irresponsibility (bankruptcy (?)) over immeasurable cluelessness if hauled before a court of scientific inquiry. At any rate, not having ever read the material better explains the wayward claims confidently asserted than having read it and so misconstrued it. As I have addressed C&C’s second hand criticisms more than (ahem) once, I will allow the curious to scan the archives for relevant critical evisceration.[1]

Ok, the two main claims: It is a truism that language encountered “in the wild” is the product of many interacting parts. This observation was first made in the modern period in Syntactic Structures. In other words, a long time ago.[2] In other words, the observation is old, venerable and, by now, commonplace. In fact, the distinction between ‘grammatical’ and ‘acceptable’ first made over 60 years ago relies on the fact that a speaker’s phenomenology wrt utterances is not exclusively a function of an uttered sentence’s grammatical (G) status. Other things matter, a lot. In the early days of GG, factors such as memory load, attention, pragmatic suitability, semantic sensibility (among other factors) were highlighted in addition to grammaticality. So, it was understood early on that many many factors went into an acceptability judgment, with grammaticality being just one relevant feature. Indeed, this observation is what lies behind the competence/performance distinction (a point that C&C seems not to appreciate (see p. 3)), the distinction aiming to isolate the grammatical factors behind acceptability, thereby, among other things, leaving room for other factors to play a role.[3]

And this was not just hand waving or theory protecting talk (again contra C&C, boy is its discussion DUMB!!). A good deal of work was conducted early on trying to understand how grammatical structure could interact with these other factors to burden memory load and increase perceived unacceptability (just think of the non-trivial distinction between center and self embedding and its implications for memory architecture).[4] This kind of work proceeds apace even today, with grammaticality understood to be one of the many factors that go into making judgments gradiently acceptable.[5] Indeed, there is no grammatically informed psycho-linguistic work done today (or before) that doesn’t understand that G/UG capacities are but one factor among others needed to explain real time acquisition, parsing, production, etc. UG is one factor in accounting for G acquisition (as Jeff Lidz, Charles Yang, Lila Gleitman etc. have endlessly emphasized) and language particular Gs are just one factor in explaining parsability (which is, in turn, one factor in underlying acceptability) (as Colin Phillips, Rick Lewis, Shravan Vasishth, Janet Fodor, Bob Berwick, Lyn Frazier, Jon Sprouse, etc. etc. etc. have endlessly noted). Nobody denies the C&C truism that language use involves multiple interacting variables. Nobody is that stupid!

So, C&C is correct in noting that if one’s interest is in figuring out how language is deployed/acquired/produced/parsed/etc. then much more than a competence theory will be required. This is not news. This is not even an insight. The question is not if this is so, but how it is so. Given this, the relevant question is: what tree is C&C barking up by suggesting that this is contentious?

I have two hypotheses. Here they are.

1.     C&C doesn’t take G features to be at all relevant to acceptability.
2.     C&C favors a holistic rather than an analytic approach to explaining interaction effects in language.

Let’s discuss each in turn.

C&C is skeptical that grammaticality is a real feature of natural language expressions. In other words, C&C's beef with the traditional GG conception in which G/UG properties are one factor among many lies with assigning G/UG any role at all. This is not as original as it might sound. In fact, it is quite a traditional view, one that Associationists and Structuralists held about 70 years ago. It is the view that GG defenestrated, but apparently, did not manage to kill (next time from a higher floor please). The view amounts to the idea that G regularities (C&C is very skeptical that UG properties exist at all, I return to this presently) are just probabilistic generalizations over available linguistic inputs. This is the view embodied in Structuralist discovery procedures (and suggested in current Deep Learning approaches) wherein levels were simple generalizations over induced structures of a previous lower level. Thus, all there is to grammar is successively more abstract categories built up inductively from lower level less abstract categories. On this view, grammatical categories are classes of words, which are definable as classes of morphemes, which are definable as classes of phonemes, which are definable as classes of phones. The higher levels are, in effect, simple inductive generalizations over lower level entities. The basic thought is that higher-level categories are entirely reducible to lower level distributional patterns. Importantly, in this sort of analysis, there are no (and can be no) interesting theoretical entities, in the sense of real abstract constructs that have empirical consequences but are not reducible or definable in purely observational terms. In other words, on this view, syntax is an illusion and the idea that it makes an autonomous contribution to acceptability is a conceptual error.

Now, I am not sure whether C&C actually endorses this view, but it does make noises in that direction. For example, it endorses a particular conception of constructions and puts it “at the heart” of its “alternative framework” (4). The virtue of C&C constructions is that they are built up from smaller parts in a probabilistically guided manner. Here is C&C (4):

At the heart of this emerging alternative framework are constructions, which are learned pairings of form and meaning ranging from meaningful parts of words (such as word endings, for example, ‘-s’, ‘-ing’) and words themselves (for example, ‘penguin’) to multiword sequences (for example, ‘cup of tea’) to lexical patterns and schemas (such as, ‘the X-er, the Y-er’, for example, ‘the bigger, the better’). The quasi-regular nature of such construction grammars allows them to capture both the rule-like patterns as well as the myriad of exceptions that often are excluded by fiat from the old view built on abstract rules. From this point of view, learning a language is learning the skill of using constructions to understand and produce language. So, whereas the traditional perspective viewed the child as a mini-linguist with the daunting task of deducing a formal grammar from limited input, the construction-based framework sees the child as a developing language-user, gradually honing her language-processing skills. This requires no putative universal grammar but, instead, sensitivity to multiple sources of probabilistic information available in the linguistic input: from the sound of words to their co-occurrence patterns to information from semantic and pragmatic contexts.

This quote does not preclude a distinctive Gish contribution to acceptability, but its dismissal of any UG contribution to the process suggests that it is endorsing a very strong rejection of the autonomy of syntax thesis.[6] Let me repeat, a commitment to the centrality of constructions does not require this. However, the C&C version seems to endorse it. If this is correct, then C&C sees the central problem with modern GG as its commitment to the idea that syntactic structure is not reducible to either statistical distributional properties or semantic or pragmatic or phonological or phonetic properties of utterances. In other words, C&C rejects the GG idea that grammatical structure is real and makes any contribution to the observables we track through acceptability.

This view is indeed radical, and virtually certain to be incorrect.[7] If there is one thing that all linguists agree on (including constructionists like Jackendoff and Culicover) it’s that syntax is real. It is not reducible to other factors. And if this is so, then G structure exists independently of other factors. I also think that virtually all linguists believe that syntax is not the sum of statistical regularity in the PLD.[8] And there is good reason for this; it is morally certain that many of the grammatical factors that linguists have identified over the last 60 years have linguistically proprietary roots and leave few footmarks in the PLD. To argue that this standard picture is false requires a lot of work, none of which C&C does or points to. Of course, C&C cannot be held responsible for this failing, for C&C has no idea what this work argues because C&C’s authors appear never to have read any of it (or, if it has been read, it has not been understood, see above). But were C&C informed by any of this work, it would immediately appreciate that it is nuts to think that it is possible to eliminate G features as one factor in acceptability.[9]

In sum, one possible reading of C&C is that it endorses the old Structuralist idea of discovery procedures, denies the autonomy of syntax thesis (i.e. the thesis that syntax is “real”) and believes in (yes I got to say it) the old Empiricist/Associationist trope that language capacity is nothing but a reflection of tracked statistical regularities. It’s back folks. No idea ever really dies, no matter how unfounded and implausible and how many times it has been stabbed through the heart with sharp arguments.

Before going on to the second point, let me add a small digression concerning constructions. Look, anyone who works on the G of a particular language endorses some form of constructionism (see here for some discussion). Everyone assumes that morphemes have specific requirements, with specific selection restrictions. These are largely diacritical and part of the lexical entry of the morpheme. Gs are often conceived as checking these features in the course of a derivation and one of the aims of a theory of Gs (UG) is to specify the structural/derivational conditions that regulate this feature checking. Thus, everyone’s favorite language specific G has some kinds of constructions that encode information that is not reducible to FL or UG principles (or not so reducible as far as we can tell). 

Moreover, it is entirely consistent with this view that units larger than morphemes code this kind of information. The diacritics can be syncategorematic and might grace structures that are pretty large (though given something like an X’ syntax with heads or a G with feature percolation the locus of the diacritical information can often be localized on a “listable” linguistic object in the lexicon). So, the idea that C&C grabs with both hands and takes to be new and revolutionary is actually old hat. What distinguishes the kind of constructionism one finds in C&C from the more standard variety found in standard work is the idea central to GG that constructions are not “arbitrary.” Rather, constructions have a substructure regulated by more abstract principles of grammar (and UG). C&C seems to think that anything can be a construction. But we know that this is false.[10] Constructions obey standard principles of Grammar (e.g. no mirror image constructions, no constructions that violate the ECP or binding theory, etc.). So though there can be many kinds of constructions that compile all sorts of diverse information there are some pretty hard constraints regulating what a possible construction is.

Why do I mention this? Because I could not stop myself! Constructions lie at the heart of C&C’s “alternative framework” and nonetheless C&C has no idea what they are, that they are standard fare in much of standard GG (even minimalist Gs are knee deep in such diacritical features) and that they are not the arbitrary pairings that C&C takes them to be. In other words, once again C&C is mind numbingly ignorant (or, misinformed).

So that’s one possibility. C&C denies G properties are real. There is a second possible assumption, one that does not preclude this one and is often found in tandem with it, but is nonetheless different. The second problem C&C sees with the standard view lies with its analytical bent. Let me explain.

The standard view of performance within linguistics is that it involves contributions of many factors. Coupled with this is a methodology: The right way to study these is to identify the factors involved, figure out their particular features and see how they combine in complex cases. One of the problems with studying such phenomena is that the interacting factors don’t always nicely add up. In other words, we cannot just add the contributions of each component together to get a nice well-behaved sum at the end. That’s what makes some problems so hard to solve analytically (think turbulence). But, that’s still the standard way to go about matters.  GG endorsed this view from the get-go. To understand how language works in the wild, figure out what factors go into making, say, an utterance, and see how these factors interact. Linguists focused on one factor (G and UG) but understood that other factors also played a role (e.g. memory, attention, semantic/pragmatic suitability etc.). The idea was that in analyzing (and understanding) any bit of linguistic performance, grammar would be one part of the total equation, with its own distinctive contribution.[11]

Two things are noteworthy about this. First, it is hard, very hard. It requires understanding how (at least) two “components” function as well as understanding how they interact.  As interactions need not be additive, this can be a real pain, even under ideal conditions where we really know a lot (that’s why engineers need to do more than simply apply the known physics/chemistry/biology). Moreover, interaction effects can be idiosyncratic and localized, working differently in different circumstances (just ask your favorite psycho/neuro linguist about task effects). So, this kind of work is both very demanding and subject to quirky destabilizing effects. Recall Fodor’s observation: the more modular a problem is, the more likely it is solvable at all. This reflects the problems that interaction effects generate.[12]

At any rate, this is the standard way science proceeds when approaching complex phenomena. It factors them into their parts and then puts these parts back together. It is often called atomism or reductionism but it is really just analysis with synthesis and it has proven to be the only real game in town.[13] That said, many bridle at this approach and yearn for more holistic methods. Connectionists used to sing the praises of holism: only the whole system computes! You cannot factor a problem into its parts without destroying it. Holists often urge simulation in place of analysis (let’s see how the whole thing runs). People like me find this to be little more than the promotion of obscurantism (and not only me, see here for a nice take down in the domain of face perception).

Why do I mention this here? Because there is a sense in which C&C seems to object not only to the idea that Grammar is real, but also to the idea that the right way to approach these interaction effects is analytically. C&C doesn’t actually say this, but it suggests it in its claims that the secret to understanding language in the wild lies with how all kinds of information are integrated quickly in the here and now. The system as a whole gives rise to structure (which “may [note the weasel word here, btw, NH] explain why language structure and processing is highly local in the linguistic signal” (5))[14] and the interaction of the various factors eases the interpretation problem (though as Gleitman and Trueswell and friends have shown, having too much information is itself a real problem (see here, here and here)). The prose in C&C suggests to me that only at the grain of the blooming buzzing interactive whole will linguistic structure emerge. If this is right, then the problem with the standard view is not merely that it endorses the reality of grammar, but that it takes the right approach to be analytic rather than holistic. Again, C&C does not expressly say this, but it does suggest it, and it makes sense of its dismissal of “fragmented” investigations of the complex phenomenon. In their view, we need to solve all the problems at once and together, rather than piecemeal and then fit them together. Of course, we all know that there is no “best” way to proceed in these complex matters; that sometimes a more focused view is better and sometimes a more expansive one. But the idea that an analytic approach is “doomed to fail” (1) surely bespeaks an antipathy towards the analytic approach to language.

An additional point: note that if one thinks that all there is to language is the statistical piecing together of diverse kinds of information then one is really against the idea that language in the wild is the result of interacting distinct modules with their own powers and properties. This, again, is an old idea. Again you all know who believed this (hint, starts with an E). So, if one were looking for an overarching unifying theme in C&C, one that is not trotted out explicitly but holds the paper together, then one could do worse than look to Associationism/Empiricism. This is the glue that holds the various parts together, from the hostility to the very idea that grammars are real to the conviction that the analytic approach (standard in the sciences) is doomed to failure.

There is a lot of other stuff in this paper that is also not very good (or convincing). But, I leave it as an exercise to the reader to find these and dispose of them (take a look at the discussion of language and cultural evolution for a real good time (on p.6). I am not sure, but it struck me as verging on the incoherent and mixing up the problem of language change with the problem of the emergence of a facility for language). Suffice it to say that C&C adds another layer to the pile of junk written on language perpetrated on the innocent public by the prestige journals. Let me end with a small rant on this.

C&C appeared in Nature. This is reputed to be a fancy journal with standards (don’t believe it for a moment. It’s all show business now).[15] I doubt that Nature believes that it publishes junk. Maybe it takes it to be impossible to evaluate opinion or “comment” pieces. Maybe it thinks that taste cannot be adjudicated. Maybe. But I doubt it. Rather, what we are witnessing here is another case of Chomsky bashing, with GG as collateral damage. It is not only this, but it is some of this. The other factor is the rise of big data science. I will have something to say about this in a later post. For now, take a look at C&C. It’s the latest junk installment of a story that doesn’t get better with repetition. But in this case all the arguments are stale as well as being dumb. Maybe their shelf expiration date will come soon. One can only hope even if such hope is irrational given the evidence.


[1] Type ‘Evans and Levinson’ (cited in C&C) or ‘Vyvyan Evans’ or ‘Everett’ in the search section for a bevy of replies to the old tired incorrect claims that C&C throws out like confetti at a victory parade.
[2] Actually, I assume that Chomsky’s observations are just another footnote to Plato or Aristotle, though I don’t know what text he might have been footnoting but, as you know, the guy LOVES footnotes!
[3] The great sin of Generative Semantics was to conflate grammaticality and acceptability by, in effect, treating any hint of unacceptability as something demanding a grammatical remedy.
[4] I should add that the distinction between these two kinds of structures (center vs self embedding) is still often ignored or run together. At times, it makes one despair about whether there is any progress at all in the mental sciences.
[5] And which, among other things, led Chomsky to deny that there is a clean grammatical/ungrammatical distinction, insisting that there are degrees of grammaticality as well as the observed degrees of acceptability. Jon Sprouse is the contemporary go-to person on these issues.
[6] And recall, the autonomy of syntax thesis is a very weak claim. It states that syntactic structure is real and hence not reducible to observable features of the linguistic expression. So syntax is not just a reflex of meaning or sound or probabilistic distribution or pragmatic felicity or… Denying this weak claim is thus a very strong position.
[7] There is an excellent discussion of the autonomy of syntax and what it means and why it is important in the forthcoming anniversary volume on Syntactic Structures edited by Lasnik, Patel-Grosz, Yang et moi. It will make a great stocking stuffer for the holidays so shop early and often.
[8] Certainly Jackendoff, the daddy of constructionism, has written as much.
[9] Here is a good place to repeat sotto voce and reverentially: ‘Colorless green ideas sleep furiously’ and contrast it with ‘Furiously sleep ideas green colorless.’
[10] Indeed, if I am right about the Associationist/Empiricist subtext in C&C then C&C does not actually believe that there are inherent limits on possible constructions. On this reading of C&C the absence of mirror image constructions is actually just a fact about their absence in the relevant linguistic environment. They are fine potential constructions. They just happen not to occur.  One gets a feeling that this is indeed what C&C thinks by noting how impressed it is with “the awe-inspiring diversity of the world’s languages” (6). Clearly C&C favors theories that aim for flexibility to cover this diversity. Linguists, in contrast, often focus on “negative facts,” possible data that is regularly absent. These have proven to be reliable indicators of underlying universal principles/operations. The fact that C&C does not mention this kind of datum is, IMO, a pretty good indicator that it doesn’t take it seriously. Gaps in the data are accidents, a position that any Associationist/Empiricist would naturally gravitate towards. In fact, if you want a reliable indicator of A/E tendencies look for a discussion of negative data. If it does not occur, I would give better than even odds that you are reading the prose of a card carrying A/Eer.
[11] Linguists do differ on whether this is a viable project in general (i.e. likely to be successful). But this is a matter of taste, not argument. There is no way to know without trying.
[12] For example, take a look at this recent piece on the decline of the bee population and the factors behind it. It ends with a nice discussion of the (often) inscrutable complexity of interaction effects:

Let's add deer to the list of culprits, then. And kudzu. It's getting to be a long list. It's also an indication of what a complex system these bees are part of. Make one change that you don't think has anything to do with them -- develop a new pesticide, enact a biofuels subsidy, invent the motorized lawnmower -- and the bees turn out to feel it.

[13] Actually, it used to be the only game in town. There are some urging that scientific inquiry give up the aim of understanding. I will be writing a post on this anon.
[14] This btw, is not even descriptively correct given the myriad different kinds of locality that linguists have identified. Indeed, so far as I know, there is no linear bound between interacting morphemes anywhere in syntax (e.g. agreement, binding, antecedence, etc.).
[15] It’s part of the ethos of the age. See here for the theme song.

5 comments:

  1. George Miller and Philip Johnson-Laird began their tome Language and Perception (1977) by complaining: "A repeated lament of those who would understand human nature is that everything is related to everything else", and ended it in an optimistic mood, BECAUSE they broke down language and perception into manageable pieces: "One exciting aspect of the study of language is the feeling, lacking in some areas of psychology, the problems are tractable, that the work has a cumulative effect, and that even mistakes can ultimately lead to advances in knowledge".

    C&C? Classic bullshit (http://facultyoflanguage.blogspot.com/2013/03/science-bullshit.html): everything is related to everything else, heads I win, tails you lose.

  3. The piece discussed here doesn't appear in Nature but in Nature Human Behaviour, a fairly new journal from the Nature group.

    It seems unfair to call Jackendoff "the daddy of constructionism" (in your footnote 8). If you want to identify a founding figure, it would be more reasonable to pick Adele Goldberg. If you insist on a male founding figure, Fillmore would be the obvious choice over Jackendoff.

    Replies
    1. Actually, grandad is Chomsky in LSLT and SS where rules are construction based. The idea that these were really fundamental units goes back to Emmon Bach, I believe. But you are right, Goldberg is also a player, as is Culicover. Jackendoff was useful to me as he clearly believes in UG and so is someone that both likes constructions and buys grammar as real. I have no idea if Goldberg buys this.
