Faculty of Language: April 2017

Thursday, April 27, 2017

How biological is biolinguistics?

My answer: very, and getting more so all the time. This view will strike many as controversial. For example Cedric Boeckx (here and here) and David Berlinsky (here) (and most linguistics in discussions over beer) contend that linguistics is a BINO (biology in name only). After all, there is little biochemistry, genetics, or cellular biology in current linguistics, even of the Minimalist variety. Even the evolang dimension is largely speculative (though, IMO, this does not distinguish it from most of the “seripous” stuff in the field). And, as this is what biology is/does nowadays, then, the argument goes, linguistic pronouncements cannot have biological significance and so the “bio” in biolinguistics is false advertising. That’s the common wisdom as best as I can tell, and I believe it to be deeply (actually, shallowly) misguided. How so?

A domain of inquiry, on this view, is defined by its tools and methods rather than its questions. Further, as the tools and methods of GG are not similar to those found in your favorite domain of biology then there cannot be much bio in biolinguistics. This is a very bad line of reasoning, even if some very smart people are pushing it. In my view, it rests on pernicious dualist assumptions which, had they been allowed to infect earlier work in biology, would have left it far poorer than it is today. Let me explain.

First, the data linguists use is biological data: we study patterns which would be considered contenders for Nobel Prizes in Medicine and Physiology (i.e. bio Nobels) were they emitted by non humans. Wait, would be? No, actually were. Unraveling the bee waggle dance was Nobel worthy. And what’s the waggle dance? It’s the way a bee “articulates” (in a sign language sort of way, but less sophisticated) how far and in what direction honey lies. In other words, it is a way for bees to map AP expressions onto CI structures that convey a specific kind of message. It’s quite complicated (see here), and describing it’s figure 8 patterns (direction and size) and how they related to the position of the sun and the food source is what won von Frisch the prize in Physiology and Medicine. In other words, von Frisch won a bio Nobel for describing a grammar of the bee dance.

And it really was “just” a G, with very little “physiology” or “medicine” implicated. Even at the present time, we appear to know very little about either the neural or genetic basis of the dance or its evolutionary history (or at least Wikipedia and a Google search seems to reveal little beyond anodyne speculations like “Ancestors to modern honeybees most likely performed excitatory movements to encourage other nestmates to forage” or “The waggle dance is thought to have evolved to aid in communicating information about a new nest site, rather than spatial information about foraging sites” (Wikipedia)). Nonetheless, despite the dearth of bee neurophysiology, genetics or evo-bee-dance evolutionary history, the bio worthies granted it a bio Nobel! Now here is my possibly contentious claim: describing kinds of patterns humans use to link articulations to meanings is no less a biological project than is describing waggle dance patterns. Or, to paraphrase my good and great friend Elan Dresher: if describing how a bunch of bees dance is biology so too is describing how a bunch of Parisians speak French.

Second, it’s not only bees! If you work on bird songs or whale songs or other forms of vocalization or vervet monkey calls you are described as doing biology (look at the journals that publish this stuff)! And you are doing biology even if you are largely describing the patterns of these songs/calls. Of course, you can also add a sprinkle of psychology to the mix and tentatively describe how these calls/songs are acquired to cement your biological bona fides. But, if you study non human vocalizations and their acquisition then (apparently) you are doing biology, but if you do the same thing in humans apparently you are not. Or, to be more precise, describing work on human language as biolinguistics is taken to be wildly inappropriate while doing much the same thing with mockingbirds is biology. Bees, yes. Whales and birds, sure. Monkey calls, definitely. Italian or Inuit; not on your life! Dualism anyone?

As may be evident, I think that this line of reasoning is junk best reserved for academic bureaucrats interested in figuring out how to demarcate the faculty of Arts from that of Science. There is every reason to think that there is a biological basis for human linguistic capacity and so studying manifestations of this capacity and trying to figure out its limits (which is what GG has been doing for well over 60 years) is biology even if it fails to make contact with other questions and methods that are currently central in biology. To repeat, we still don’t know the neural basis or evolutionary etiology of the waggle dance but nobody is lobbying for rescinding von Frisch’s Nobel.

One can go further: Comparing modern work in GG and early work in genetics leads to a similar conclusion. I take it as evident that Mendel was doing biology when he sussed out the genetic basis for the phenotypic patterns in his pea plant experiments. In other words, Mendel was doing biogenetics (though this may sound redundant to the modern ear). But note, this was biogenetics without much bio beyond the objects of interest being pea plants and the patterns you observe arising when you cross breed them. Mendel’s work involved no biochemistry, no evolutionary theory, no plant neuro-anatomy or plant neuro-physiology. There were observed phenotypic patterns and a proposed very abstract underlying mechanism (whose physical basis was a complete mystery) that described how these might arise. As we know, it took the rest of biology a very long time to catch up with Mendel’s genetics. It took about 65 years for evolution to integrate these findings in the Modern Synthesis and almost 90 years until biology (with the main work carried out by itinerant physicists) figured out how to biochemically ground it in DNA. Of course, Mendel’s genetics laid the groundwork for Watson and Crick and was critical to making Darwinian evolution conceptually respectable. But, and this is the important point here, when first proposed, its relation to other domains of biology was quite remote. My point: if you think Mendel was doing biology then there is little reason to think GGers aren’t. Just as Mendel identified what later biology figured out how to embody, GG is identifying operations and structures that the neurosciences should aim to incarnate. Moreover, as I discuss below, this melding of GG with cog-neuro is currently enjoying a happy interaction somewhat analogous to what happened with Mendel before.

Before saying more, let me make clear that of course biolinguists would love to make more robust contact with current work in biology. Indeed, I think that this is happening and that Minimalism is one of the reasons for this. But I will get to that. For now let’s stipulate that the more interaction between apparent disparate domains of research the better. However, absence of apparent contact and the presence of different methods does not mean that subject matters differ. Human linguistic capacity is biologically grounded. As such inquiry into linguistic patterns is reasonably considered a biological inquiry about the cognitive capacities of a very specific animal; humans. It appears that dualism is still with us enough to make this obvious claim contentious.

The point of all of this? I actually have two: (i) to note that the standard criticism of GG as not real biolinguistics at best rests on unjustified dualist premises (ii) to note that one of the more interesting features of modern Minimalist work has been to instigate tighter ties with conventional biology, at least in the neuro realm. I ranted about (i) above. I now want to focus on (ii), in particular a recent very interesting paper by the group around Stan Dehaene. But first a little segue.

I have blogged before on Embick and Poeppel’s worries about the conceptual mismatch between the core concepts in cog-neuro and those of linguistics (here for some discussion). I have also suggested that one of the nice features of Minimalism is that it has a neat way of bringing the basic concepts closer together so that G structure and its bio substructure might be more closely related. In particular, a Merge based conception of G structure goes a long way towards reanimating a complexity measure with real biological teeth. In fact, it is effectively a recycled version of the DTC, which, it appears, has biological street cred once again.[1] The cred is coming from work showing that one can take the neural complexity of a structure as roughly indexed by the number of Merge operations required to construct it (see here). A recent paper goes the earlier paper one better by embedding the discussion in a reasonable parsing model based on a Merge based G. The PNAS paper (Henceforth Dehaene-PNAS) (here) has a formidable cast of authors, including two linguists (Hilda Koopman and John Hale) orchestrated by Stan Dehaene. Here is the abstract:

Although sentences unfold sequentially, one word at a time, most linguistic theories propose that their underlying syntactic structure involves a tree of nested phrases rather than a linear sequence of words. Whether and how the brain builds such structures, however, remains largely unknown. Here, we used human intracranial recordings and visual word-by-word presentation of sentences and word lists to investigate how left-hemispheric brain activity varies during the formation of phrase structures. In a broad set of language-related areas, comprising multiple superior temporal and inferior frontal sites, high-gamma power increased with each successive word in a sentence but decreased suddenly whenever words could be merged into a phrase. Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity. More superficial models of language, based solely on sequential transition probability over lexical and syntactic categories, only captured activity in the posterior middle temporal gyrus. Formal model comparison indicated that the model of multiword phrase construction provided a better fit than probability- based models at most sites in superior temporal and inferior frontal cortices. Activity in those regions was consistent with a neural implementation of a bottom-up or left-corner parser of the incoming language stream. Our results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.

A few comments, starting with a point of disagreement: Whether the brain builds hierarchical structures is not really an open question. We have tons of evidence that it does, evidence that linguists a.o. have amassed over the last 60 years. How quickly the brain builds such structure (on line, or in some delayed fashion) and how the brain parses incoming strings in order to build such structure is still opaque. So it is misleading to say that what Dehaene-PNAS shows is both that the brain does this and how. Putting things this way suggests that until we had such neural data these issues were in doubt. What the paper does is provide neural measures of this structure building processes and provides a nice piece of cog-neuro inquiry where the cog is provided by contemporary Minimalism in the context of a parser and the neuro is provided by brain activity in the gamma range.

Second, the paper demonstrates a nice connection between a Merge based syntax and measures of brain activity. Here is the interesting bit (for me, my emphasis):

Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity.

Merged based Gs treat all combinations as equal regardless of the complexity of the combinations or differences among the items being combined. If Merge is the only operation, then it is easy to sum the operations that provide the linguistic complexity. It’s just the same thing happening again and again and on the (reasonable) assumption that doing the same thing incurs the same cost we can (reasonably) surmise that we can index the complexity of the task by adding up the required Merges. Moreover, this hunch seems to have paid off in this case. The merges seem to map linearly onto brain activity as expected if complexity generated by Merge were a good index of the brain activity required to create such structures. To put this another way: A virtue of Merge (maybe the main virtue for the cog-neuro types) is that it simplifies the mapping from syntactic structure to brain activity by providing a common combinatory operation that underlies all syntactic complexity.[2] Here is Dehaene-PNAS paper (4):

A parsimonious explanation of the activation profiles in these left temporal regions is that brain activity following each word is a monotonic function of the current number of open nodes at that point in the sentence (i.e., the number of words or phrases that remain to be merged).

This makes for a limpid trading relation between complexity as measured cognitively and as measured brain-wise transparent when implemented in a simple parser (note the weight carried by “parsimonious” in the quote above). What the paper argues is that this simple transparent mapping has surprising empirical virtues and part of what makes it simple is the simplicity of Merge as the basic combinatoric operation.

There is lots more in this paper. Here are a few things I found most intriguing.

A key assumption of the model is that combining the words into phrases occurs after the word at the left edge of the constituent boundary (2-3):

…we reasoned that a merge operation should occur shortly after the last word of each syntactic constituent (i.e., each phrase). When this occurs, all of the unmerged nodes in the tree comprising a phrase (which we refer to as “open nodes”) should be reduced to a single hierarchically higher node, which becomes available for future merges into more complex phrases.

This assumption drives the empirical results. Note that it indicates that structure is being built bottom-up. And this assumption is a key feature of a Merge based G that assumes something like Extension. As Dehaene-PNAS puts it (4):

The above regressions, using “total number of open nodes” as an independent variable, were motivated by our hypothesis that a single word and a multiword phrase, once merged, contribute the same amount to total brain activity. This hypothesis is in line with the notion of a single merge operation that applies recursively to linguistic objects of arbitrary complexity, from words to phrases, thus accounting for the generative power of language

If the parsing respects the G principle of Extension then it will have to build structure in this bottom up fashion. This means holding the “open” nodes on a stack/memory until this bottom up building can occur. The Dehaene-PNAS paper provides evidence that this is indeed what happens.

What kind of evidence? The following (3) (my emphasis):

We expected the items available to be merged (open nodes) to be actively maintained in working memory. Populations of neurons coding for the open nodes should therefore have an activation profile that builds up for successive words, dips following each merge, and rises again as new words are presented. Such an activation profile could follow if words and phrases in a sentence are encoded by sparse overlapping vectors of activity over a population of neurons (27, 28). Populations of neurons involved in enacting the merge operation would be expected to show activation at the end of constituents, proportional to the number of nodes being merged. Thus, we searched for systematic increases and decreases in brain activity as a function of the number of words inside phrases and at phrasal boundaries.

So, a Merge based parser that encodes Extension should show a certain brain activity rhythm indexed to the number of open nodes in memory and the number of Merge operations executed. And this is what the paper found.

Last, and this is very important: the paper notes that Gs can be implemented in different kinds of parsers and tries to see which one best fits the data in their study. There is no confusion here between G and parser. Rather, it is recognized that the effects of a G in the context of a parser can be investigated, as can the details of the parser itself. It seems that for this particular linguistic task, the results are consistent either a bottom-up or left corner parser, with the latter being a better fit for this data (7):

Model comparison supported bottom-up and left-corner parsing as significantly superior to top-down parsing in fitting activation in most regions in this left-hemisphere language network…

Those findings support bottom-up and/or left-corner parsing as tentative models of how human subjects process the simple sentence structures used here, with some evidence in favor of bottom-up over left-corner parsing. Indeed, the open-node model that we proposed here, where phrase structures are closed at the moment when the last word of a phrase is received, closely par- allels the operation of a bottom-up parser.

This should not be that surprising a result given the data that the paper investigates. The sentences of interest contain no visible examples where left context might be useful for downstream parsing (e.g. Wh element on the left edge (see Berwick and Weinberg for discussion of this)). We have here standard right branching phrase structure and for these kinds of sentences non-local left context will be largely irrelevant. As the paper notes (8), the results do “not question the notion that predictability effects play a major role in language processing” and as it further notes there are various kinds of parsers that can implement a Merge based model, including those where “prediction” plays a more important role (e.g. left-corner parsers).
That said, the interest of Dehaene-PNAS lies not only in the conclusion (or maybe not even mainly there), but in the fact that it provides a useful and usable model for how to investigate these computational models in neuro terms. That’s the big payoff, or IMO, the one that will pay dividends in the future. In this, it joins the earlier Pallier et al and the Ding et al papers. They are providing templates for how to integrate linguistic work with neuro work fruitfully. And in doing so, they indicate the utility of Minimalist thinking.
Let me say a word about this: what cog-neuro types want are simple usable models that have accessible testable implications. This is what Minimalism provides. We have noted the simplicity that Merge based models afford to the investigations above; a simple linear index of complexity. Simple models are what cog-neuro types want, and for the right reasons. Happily, this is what Minimalism is providing and we are seeing its effects in this kind of work.
An aside: let’s hear it for stacks! The paper revives classical theories of parsing and revives the idea that brains have stacks important for the parsing of hierarchical structures. This idea has been out of favor for a long time. One of the major contributions of the Dehaene-PNAS paper is to show that dumping it was a bad idea, at least for language, and, most likely, other domain where hierarchical organization is essential.
Let me end: there is a lot more in the Dehaene-PNAS paper. There are localization issues (where the operations happen) and arguments showing that simple probability based models cannot survive the data reviewed. But for current purposes there is a further important message: Minimalism is making it easier to put a lot more run of the mill everyday bio into biolinguistics. The skepticism about the biological relevance of GG and Minimalism for more bio investigation is being put paid by the efflorescence of intriguing work that combines them. This is what we should have expected. It is happening. Don’t let anyone tell you that linguistics is biologically inert. At least in the brain sciences, it’s coming into its own, at last![3]

[1] Alec Marantz argued that the DTC is really the only game in town. Here’s a quote:

…the more complex a representation- the longer and more complex the linguistic computations necessary to generate the representation- the longer it should take for a subject to perform any task involving the representation and the more activity should be observed in the subject’s brain in areas associated with creating or accessing the representation or performing the task.

For discussion see here.

[2] Note that this does not say that only a Merge based syntax would do this. It’s just that Merge systems are particularly svelt systems and so using them is easy. Of course many Gs will have Mergish properties and so will also serve to ground the results.

[3] IMO, it is also the only game in town when it comes to evolang. This is also the conclusion of Tatersall in his review of Berwick and Chomsky’s book. So, yes, there is more than enough run of the mill bio to license the biolinguistics honorific.

Thursday, April 20, 2017

Cedric Boeckx replies to some remarks by Hornstein on Berwick and Chomsky's "Why Only Us"

Norbert Hornstein commented on my "quite negative" review of Berwick and Chomsky's book _Why Only Us_ published in Inference (http://inference-review.com/article/not-only-us). Here is the link to his comments, posted here on the Faculty of Language blog: http://facultyoflanguage.blogspot.com.es/2017/04/inference-to-some-great-provocations.html

I want to begin by thanking him for reading the review and sharing his thoughts. The gist of his remarks is, if I read him right, that he is "not sure" what I find wrong with _Why Only Us_. He concludes by saying that "Why [Boeckx] doesn’t like (or doesn’t appear to like) this kind of story escapes me." I found the paragraphs he devotes to my review extremely useful, as they articulate, with greater clarity than what I could find in _Why Only Us_, some of the intuitions that animate the kind of evolutionary narrative advocated in the book I reviewed: The "All you need is Merge" approach. Hornstein's points give me the opportunity to stress some of the reasons why I think this kind of approach is misguided. This is what I want to express here.

In so far as I can see, Hornstein makes the following claims (which are, indeed, at the heart of _Why Only Us_) [the claims appear in roughly the order found on Hornstein's original post]

1. Hornstein seems to endorse Tattersall's oft-repeated claim, used in Why Only Us, that there is a link between linguistic abilities and the sort of symbolic activities that have been claimed to be specific to humans. This is important because the fossil evidence for these activities is used to date the emergence of the modern linguistic mind; specifically, to argue for a very recent emergence of the modern language faculty. Here is the relevant passage:

"in contrast to CB [Boeckx], IT [Tattersall] thinks it pretty clear that this “symbolic activity” is of “rather recent origin” and that, “as far as can be told, it was only our lineage that achieved symbolic intelligence with all of its (unintended) consequences” (1). If we read “symbolic” here to mean “linguistic” (which I think is a fair reading), it appears that IT is asking for exactly the kind of inquiry that CB thinks misconceived."

Perhaps it is too strong to say that Hornstein endorses it, but clearly he does not buy my skepticism towards this kind of argument, expressed in my review (and backed up with references to works questioning Tattersall that unfortunately Hornstein does not discuss, delegating to Tattersall as the unique expert.)

2. Hornstein grants me "several worthwhile points"; specifically, my claims that "there is more to language evolution than the emergence of the Basic Property (i.e. Merge and discretely infinite hierarchically structured objects) and that there may be more time available for selection to work its magic than is presupposed". Hornstein writes that "many would be happy to agree that though BP is a distinctive property of human language it may not be the only distinctive linguistic property." He continues "CB is right to observe that if there are others (sometimes grouped together as FLW vs FLN) then these need to be biologically fixed and that, to date, MP has had little to say about these. One might go further; to date it is not clear that we have identified many properties of FLW at all. Are there any?" Later on, he writes "CB is quite right that it behooves us to start identifying distinctive linguistic properties beyond the Basic Property and asking how they might have become fixed. And CB is also right that this is a domain in which comparative cognition/biology would be very useful", but stresses that "It is less clear that any of this applies to explaining the evolution of the Basic Property itself."

3. Hornstein seems to think that my problem is that I "think that B[erwick]&C[homsky] are too obsessed with" recursion (or Merge). He goes on: "But this seems to me an odd criticism. Why? Because B&C’s way into the ling-evo issues is exactly the right way to study the evolution of any trait: First identify the trait of interest. Second, explain how it could have emerged. B&C identify the trait (viz. hierarchical recursion) and explain that it arose via the one time (non-gradual) emergence of a recursive operation like Merge. The problem with lots of evo of lang work is that it fails to take the first step of identifying the trait at issue. ... If one concedes that a basic feature of FL is the Basic Property, then obsessing about how it could have emerged is exactly the right way to proceed"

4. He thinks that my "discussion is off the mark" (specifically, my insistence on descent with modification and bottom-up approaches in the review) because Merge "is not going to be all that amenable to any thing but a “top-down, all-or-nothing” account". "What I mean", Hornstein says, "is that recursion is not something that takes place in steps"; "there is no such thing as “half recursion” and so there will be no very interesting “descent with modification” account of this property. Something special happened in humans. Among other things this led to hierarchical recursion. And this thing, whatever it was, likely came in one fell swoop. This might not be all there is to say about language, but this is one big thing about it and I don’t see why CB is resistant to this point."

5. Hornstein stresses that he "doubt[s] that hierarchical recursion is the whole story (and have even suggested that something other than Merge is the secret sauce that got things going), I do think that it is a big part of it and that downplaying its distinctiveness is not useful." He goes on: we "can agree that evolution involves descent with modification. The question is how big a role to attribute to descent and how much to modification (as well as how much modification is permitted). The MP idea can be seen as saying that much of FL is there before Merge got added. Merge is the “modification” all else the “descent.” "No mystery about the outline of such an analysis, though the details can be very hard to develop"... "it is hard for me to see what would go wrong if one assumed that Merge (like the third color neuron involved in trichromatic vision (thx Bill for this)) is a novel circuit and that FL does what it does by combining the powers of this new operation with those cognitive/computational powers inherited from our ancestors. That would be descent with modification"

6. He sums up: "The view Chomsky (and Berwick and Dawkins and Tattersall) favor is that there is something qualitatively different between language capable brains and ones that are not. This does not mean that they don’t also greatly overlap. It just means that they are not capacity congruent. But if there is a qualitative difference (e.g. a novel kind of circuit) then the emphasis will be on the modifications, not the descent in accounting for the distinctiveness. B&C is happy enough with the idea that FL properties are largely shared with our ancestors. But there is something different, and that difference is a big deal. And we have a pretty good idea about (some of) the fine structure of that difference and that is what Minimalist linguistics should aim to explain"

All of these are interesting points, although I think they miss the target, for reasons worth making explicit (again), if only because that way we can know what is likely to be productive and what is not. After all, I could be wrong, and Hornstein (and Berwick/Chomsky in Why Only Us) could be wrong. I'll tackle Hornstein's points in a somewhat different order from the one he used, but I don't think that doing so introduces any misrepresentation.

Let's begin with points of (apparent) agreement: Hornstein is willing to concede that we need a bit more than Merge, although if I read him well, he is not as clear about it as I would like. Why do I say so? On the one hand, he writes that "many would be happy to agree that though BP is a distinctive property of human language it may not be the only distinctive linguistic property. CB is right to observe that if there are others (sometimes grouped together as FLW vs FLN) then these need to be biologically fixed and that, to date, MP has had little to say about these. One might go further; to date it is not clear that we have identified many properties of FLW at all. Are there any?" On the other hand, he writes "CB is also right that this is a domain in which comparative cognition/biology would be very useful (and has already been started [FN:There has been quite a lot of interesting comparative work done, most prominently by Berwick, on relating human phonology with bird song").

I won't comment much on the references provided by Hornstein in that footnote, but I must say that I think it reveals too much of a bias towards work done by linguists. In my opinion, the great comparative work that exists has not been done by linguists (in the narrow sense of the term). Hornstein's is not a lovely bias to display in the context of interdisciplinarity (indeed, it's not good to show this bias on a blog that likes to stress so much that people in other disciplines ignores the work of linguists. Don't do unto others ...) In the case of birdsong, this kind of work goes several decades back, and detailed studies like Jarvis 2004(!), or Samuels 2011 (cited in my review) hardly justify the "has already been started" claim about comparative cognition. But let's get to the main point: we can't just ask "are there any? (shared features)" and at the same time cite work that shows that there is a lot of it. But there is something worse, in light of my review: Hornstein seems to have no problem with the usefulness of comparative cognition ("a domain in which comparative cognition/biology would be very useful") so long as it applies to everything except Merge ("there will be no very interesting “descent with modification” account of this property"; "It is less clear that any of this applies to explaining the evolution of the Basic Property itself." "this property is not going to be all that amenable to any thing but a “top-down, all-or-nothing” account") This is one of the issues I intended to bring up in the review, and what I called "exceptional nativism". I'll return to this below, but for now, let me stress that even if Hornstein grants me that there is more than Merge, Merge is still special, in a way that is different from "other distinctive linguistic properties".

It's not the case that I object to _Why Only Us_ because I think Berwick and Chomsky are "too obsessed with Merge". I object to it because I think they obsess about it in the wrong way: they (and Hornstein) are obsessed in making it not only special, but distinct-in-a-way-that-other-distinct-things-are-not: it takes it out of the Darwinian domain of descent with modification.

Hornstein discusses Descent With Modification, but his prose reveals that he and I understand it differently. Indeed, he appears to understand it in a way that I warned against in my review. Here is the key passage: "the MP idea can be seen as saying that much of FL is there before Merge got added. Merge is the “modification” all else the “descent.” " I think this is wrong. It takes the phrase descent with modification pretty much like most linguists understood the FLN/FLB distinction of Hauser, Chomsky and Fitch 2002: there is FLN and there is FLB. There is descent and there is modification. But I don't think this is the core of the Darwinian logic: "Descent with modification" ought to be understood as "modified descent" (tinkering), not as "descent" put side by side with/distinct from modification. Because modification is modification of something shared; it's inextricably linked to descent. Descent with modification is not "this is shared" and "this is different" and when you put these 2 things you get "descent and modification", because the different is to be rooted in the shared. We can't just say Merge is the 'modification' bit unless we say what it is a modification of. (Needless to say, if, as Hornstein writes, we replace Merge by some other "secret sauce that got things going", my point still applies. The problem is with thinking in terms of some secret sauce, unless we break it down in no so secret ingredients, ingredients that can be studied, in isolation, in other species. That's the message in my review.)

The Darwinian logic is basically that the apple can only fall so far from the tree. The apple is not the tree. But it is to be traced back to the tree. As I put it in the review: there's got to be a way from there (them) to here (us). And Merge should not be any exception to this. I'll put it this way: if the way we define Merge makes it look like an exception to this kind of thinking, then we are looking (obsessing?) at Merge in the wrong way. If we find Merge interesting (and I do!), then we've got to find a way (or better, several ways, that we can then test) to make it amenable to "descent with modification"/"Bottom-up (as opposed to all-or-nothing/top-down) approaches. Of course, we could say, well, tough luck: we can't choose what nature gave us. It gave us Merge/recursion, and you can't understand this gradually. It's there or not. It's discontinuous but in a way different from the kind of discontinuity that biologists understand (significantly modified descent). But if we do so, then, tough luck indeed: we are confining Darwin's problem to a mystery. A fact, but a mysterious one ("Innate knowledge is a mystery, though a fact", as McGinn once put it.) It's fine by me, but then I can't understand how people can write about how "All you need is Merge" accounts shedding light on Darwin's problem. They must mean Darwin's mystery. To use Hume's phrase, such accounts restore Merge to that obscurity, in which it ever did and ever will remain”.

Don't get me wrong. It's a perfectly coherent position. Indeed, in a slightly different context, referenced in my review, David Poeppel talks about the "incommensurability problem" of linking mind and brain. Maybe we can't link both, just like we can't understand the evolution of Merge. Note here that I say the evolution of Merge. At times, Hornstein, like Berwick and Chomsky, gives me the impression that he thinks Merge is the solution. But it's the problem. It's that which needs to be explained. And I think that one way to proceed is to understand its neural basis and trace back the evolution of that (that is, engage with Poeppel's granularity mismatch problem, not endorse his incommensurability problem), because perhaps Merge described at the computational level (in Marr's sense) is mysterious from a descent-with-modification perspective, but not so at the algorithmic and implementational levels. And I think it's the task of linguists too to get down to those levels (jointly with others), as opposed to lecturing to biologists about how Merge is the solution, and it's their problem if they don't get it. (Incidentally, it's a bit ironic that Hornstein praises Lobina's discussion of recursion in his blog post, but does not mention the fact that Lobina took issue with Hornstein's take on recursion in some of his publications.)

Hornstein writes that "The problem with lots of evo of lang work is that it fails to take the first step of identifying the trait at issue". He does not give references, so I cannot judge what he means by "lots". I like to think I read a lot, and my assessment doesn't match Hornstein's at all. I think a lot of work in evo of lang promotes a Darwinian feeling for the phenotype. This is quite different from, say, a Darcy-Thompsonian feeling for the phenotype. I see in lots of evo of lang work a desire to make talk about evo of language amenable to conventional evolutionary discourse. Why Only Us ("this property is not going to be all that amenable to any thing but a “top-down, all-or-nothing” account") is the exact opposite.

Perhaps a bit of an empirical discussion would help (I always find it helpful, but I don't know if Hornstein would agree here). Let's take the example of vocal learning, a key component of our language-faculty-mosaic. Not a cool as Merge for some, but still pretty neat. In fact, the way many talks and papers about vocal learning begin is quite like the way linguists like to talk about Merge. Let me spell this out. It's often pointed out that vocal learning (the ability to control one's vocal apparatus to reproduce sounds one can hear, typically from con-specifics) is a fairly sparsely distributed trait in the animal kingdom. It may not be as unique as Merge ("Only us"), but it's close: it's "only" for a selected few. (I stress that I am talking about the classic presentation of this trait; ideas of a vocal learning continuum, which I like, would only make the point I am about to make stronger. See work by Petkov and Jarvis on this.) Like Merge, Vocal learning is an all or nothing affair. You have it, or you don't. It looks like an all or nothing thing. But unlike Merge, people have been able to gain insight into its neural structure, and break it down to component pieces. Among these, there is a critical cortico-laryngeal connection that appears to qualify for the "new circuit" that underlies vocal learning (see Fitch's book on evolution of language for references). And people have been able to get down to the molecular details for birds (Erich Jarvis, Constance Scharff, Stephanie White, and many others), bats (Sonja Vernes), and link it to language/speech (Simon Fisher and lots of people working on FOXP2). Erich Jarvis in particular has been able to show that most likely this new circuit has a motor origin, and "proto" aspects of it may be found in non-vocal learning birds (suboscines). All of this is quite rich in terms of insight. And this richness (testability, use of comparative method, etc.) makes the Merge solution to Darwin's problem pale in comparison. It's true, as Hornstein points out, linguists know a lot about Merge, but they only know it from one perspective (the "Cartesian" perspective, the one that leads to "Why Only Us"), and this may not be the most useful perspective from a Darwinian point of view. The main message of my review of Why Only Us was that.

Final point, on timing, and Hornstein's appeal to Tattersal's review of the fossil evidence for the emergence of symbolic behavior. No one can know everything (where would one put it?, as they say), but in my experience it's always good to rely on more than one expert (I cite some in the review). My main point about this in the review was not so much to question the dating of said emergence, but rather to ask for an account of how symbolic behavior is linked to linguistic behavior. I can see why it's plausible to think these are linked. But if the argument concerning the timing of the emergence of the language faculty rests on very few things, and it's one of them, we want more than a plausibility argument. (Norbert's blog loves to say "show me the money" when non-generativists make claims about language acquisition based on what strikes them as plausible. I guess, it's only fair to say "show me the money", or else I'll start selling Hornstein bridges.) It's plausible to think "how could they have done it without language". So, symbol, ergo language. But then again, I can't remember how I lived without my iphone. Poverty of Imagination arguments can be very weak. I know of very few attempts to mechanistically link symbolic and linguistic behavior. One attempt, by Juan Uriagereka, was about "knots" and the Chomsky hierarchy. David Lobina showed how this failed in a paper entitled "much ado about knotting". Victor Longa tried to link blombos style marks to crossing dependencies in language, but it's fair to say the argument for crossing dependencies there is nowhere near as neat as what Chomsky did in Syntactic Structures. Apart from that, I am not aware of any explanatory attempt. I once asked Tattersal after a talk he gave, and he told me something that amounted to "ask Chomsky". When I ask linguists, they tell me ask the experts like Tattersall. And so I begin to be suspicious...

But there is something more I could say about the timing issue that came to my mind when reading Hornstein's comments: if Merge is such an all-or-nothing thing, not subject to the logic of descent, then why should we care if it's recent or not? It could have emerged, in one fell swoop, millions of years ago, and remain "unattached" to "speech devices" until recently. And so why do we want to insist about a very recent origin? The timing issue is only very important if issues about gradual evolution matters. But if we stress that gradual evolution is out of the question, then, the timing issue is a red herring. So, why does Hornstein insist on it?

Let me close by thanking Norbert again for his comments. They led to this long post. But they often made me feel like Alice when she tells the Red Queen that one can't believe impossible things. Hornstein, like the Red Queen, disagrees. But it's not Wonderland, it's Darwinland.

Tuesday, April 18, 2017

Inference to some great provocations

David Berlinsky is editing a newish online magazine Inference that articles in which I have mentioned in several previous posts. The latest issue is full of fun for linguists as there are four articles of immediate relevance. Here’s the link for the issue. Let me say a word or two about the pieces.

The first is an essay by Chomsky that goes over familiar ground regarding the distinctive nature of human linguistic capacity. He observes that this observation has a Cartesian pedigree and that language was recognized as distinctive (all and only humans have it) and wondrous (it was free and capable of expressing unboundedly many thoughts) and demanding of some kind of explanation (it really didn’t fit in well with what was understood to be the causal structure of the physical world) as early as it was noticed.

As Chomsky notes, Cartesians had relatively little of substance to say about the underpinnings of this wondrous capacity, mainly because the 17^th century lacked the mathematical tools for the project. They had no way of describing how it was possible to “make infinite use of finite means” as von Humboldt put it (2). This changed in the 20^th century with Church, Godel, Post and Turing laying the foundations of computation theory. This work “demonstrated how a finite object like the brain could generate an infinite variety of expressions.” And as a result, “[i]t became possible, for the first time, to address part of” the problem that the Cartesians identified (2).

Note the ‘part of’ hedge. As Chomsky emphasizes, the problem the Cartesians identified has two parts. The first, and for them the most important feature, is the distinction between “inclined” vs “impelled” behavior (3). Machines are impelled to act, never “inclined.” Humans, being free agents, are most often “inclined” (though they can be “compelled” as well). Use of language is the poster child for inclined behavior. Cartesians had no good understanding of the mechanics of inclination. As Chomsky observes, more than 300 years later, neither do we. As he puts it, language’s “free creative use remains a mystery,” as does free action in general (e.g. raising one’s hand) (3).

The second part, one that computation theory has given us a modest handle on, is the unbounded nature of the thoughts we can express. This feature very much impressed Galileo and Arnauld & Lancelot and von Humboldt, and it should impress you too! The “infinite variety” of meaningfully distinct expressions characteristic of human language “surpasse[s] all stupendous inventions” (1). Chomsky has redubbed this feature of language “the Basic Property” (BP). BP refers to a property of the human brain, “the language faculty,” and its capacity to “construct a digitally infinite array of structured expressions” each of which “is semantically interpreted as expressing a thought, and each can be externalized by some sensory modality such as speech” (2). BP is what GG has been investigating for the last 60 years or so. Quite a lot has been discovered about it (and yes, there is still lots that we don’t know!).

Chomsky emphasizes something that is worth reemphasizing: these facts about language are not news. That humans have linguistic creativity in the two senses above should not really be a matter of dispute. That humans do language like no other animal does should also be uncontroversial. How we do this is a very tough question, only a small part (very small part) of which we have managed to illuminate. It is sad that much debate still circulates around the whether question rather than the how. It is wasted time.

An important theme in Chomsky’s essay turns on how the world looks when we have no idea what’s up. Here is a quote that I believe all good scientifically inclined GGers should have tattooed to themselves (preferably in some discrete place) (3):

When understanding is thin, we expect to see extreme variety and complexity.

Absolutely! Variety and complexity are hallmarks of ignorance. And this is why progress and simplicity go hand in hand. And this is why I have clasped to my heart Dresher’s apposite dictum: There should be only two kinds of papers in linguistics: (i) papers that show that two things that look completely different are roughly the same and (ii) papers that show that two things that are roughly the same are in fact identical. These are the papers that highlight our progressively deeper understanding. Complication is often necessary, but it is progressive just in case it paves the way for greater simplicity.

The unification and simplicity is, thus, a leading indicator of scientific insight. Within linguistics it has a second function. It allows one to start addressing the issue of how FL might have evolved. Here’s Chomsky:

In the analysis of the Basic Property, we are bound to seek the simplest computational procedure consistent with the data of language. Simplicity is implicit in the basic goals of scientific inquiry. It has long been recognized that only simple theories can attain a rich explanatory depth. “Nature never doth that by many things, which may be done by a few,” Galileo remarked, and this maxim has guided the sciences since their modern origins. It is the task of the scientist to demonstrate this, from the motion of the planets, to an eagle’s flight, to the inner workings of a cell, to the growth of language in the mind of a child. Linguistics seeks the simplest theory for an additional reason: it must face the problem of evolvability. Not a great deal is known about the evolution of modern humans. The few facts that are well established, and others that have recently been coming to light, are rather suggestive. They conform to the conclusion that the language faculty is very simple; it may, perhaps, even be computationally optimal, precisely what is suggested on methodological grounds.

Unless FL is simpler than we have considered it to be up till now (e.g. far simpler than say GBish models make it out to be) then there is little chance that we will be able to explain its etiology. So there are both general methodological grounds for wanting simple theories of FL and linguistic internal reasons for hoping that much of the apparent complexity of FL is just apparent.

Chomsky’s piece proceeds by rehearsing in short form the basic minimalist trope concerning evolvability. First, that we know little about it and that we will likely not know very much about it ever. Second, that FL is a true species property as the Cartesians surmised. Third, that FL has not evolved much since humans separated. Fourth, that FL is a pretty recent biological innovation. The third and fourth points are taken to imply that the Basic Property aspect of FL must be pretty simple in the sense that what we see today pretty well reflects the original evo innovation and so its properties are physically simple in that they have not been shaped by the forces of selection. In other words, what we see in BP is pretty much undistorted by the shaping effects of evolution and so largely reflect the physical constraints that allowed it to emerge.

All of this is by now pretty standard stuff, but Chomsky tells it well here. He goes on to do what any such story requires. He tries to illustrate how a simple system of the kind he envisions will have those features that GG has discovered to be characteristic of FL (e.g. structure dependence, unboundedly many discrete structures capable of supporting semantic interpretation etc.). This second step is what makes MP really interesting. We have a pretty good idea what kinds of things FL concerns itself with. That’s what 60 years of GG research has provided. MP’s goal is to show how to derive these properties from simpler starting points, the simpler the better. The target of explanation (the explanadum) are the “laws” of GB. MP theories are interesting to the degree that they can derive these “laws” from simpler more principled starting points. And, that, Chomsky argues, is what what makes Merge based accounts interesting, they derive features that we have every reason to believe characterize FL.[1]

Two other papers in the issue address these minimalist themes. The first is a review of the recent Berwick & Chomsky (B&C) book Why only us. The second is a review of a book on the origins of symbolic artifacts. Cederic Boeckx (CB) reviews B&C. Ian Tatersall (IT) reviews the second. The reviews are in interesting conflict.

The Boeckx review is quite negative, the heart of the criticism being that asking ‘why only humans have language’ is the wrong question. What makes it wrong? Well, frankly, I am not sure. But I think that the CB review thinks that asking it endorses a form of “exceptional nativism” (7) that fails to recognize “the mosaic character of language,” which, if I get the point, implies eschewing “descent with modification” models of evolution (the gold standard according to CB) in favor of “top-down, all-or-nothing” perspectives that reject comparative cognition models (or any animal models), dismiss cultural transmission as playing any role in explaining “linguistic complexity” and generally take a jaundiced view of any evolutionary accounts of language (7-8). I actually am skeptical regarding any of this.

Before addressing these points, however, it is interesting that IT appears to take the position that CB finds wrong-headed. He thinks that human symbolic capacities are biologically quite distinctive (indeed “unique”) and very much in need of some explanation. Moreover, in contrast to CB, IT thinks it pretty clear that this “symbolic activity” is of “rather recent origin” and that, “as far as can be told, it was only our lineage that achieved symbolic intelligence with all of its (unintended) consequences” (1). If we read “symbolic” here to mean “linguistic” (which I think is a fair reading), it appears that IT is asking for exactly the kind of inquiry that CB thinks misconceived.

That said, let’s return to CB’s worries. The review makes several worthwhile points. IMO, the two most useful are the observation that there is more to language evolution than the emergence of the Basic Property (i.e. Merge and discretely infinite hierarchically structured objects) and that there may be more time available for selection to work its magic than is presupposed. Let’s consider these points in turn.

I think that many would be happy to agree that though BP is a distinctive property of human language it may not be the only distinctive linguistic property. CB is right to observe that if there are others (sometimes grouped together as FLW vs FLN) then these need to be biologically fixed and that, to date, MP has had little to say about these. One might go further; to date it is not clear that we have identified many properties of FLW at all. Are there any?

One plausible candidate involves those faculties recruited for externalization. It is reasonable to think that once FLN was fixed in the species, that linking its products to the AP interface required some (possibly extensive) distinctive biological retrofitting. Indeed, one might imagine that all of phonology is such a biological kludge and that human phonology has no close biological analogues outside of humans.[2] If this is so, then the question of how much time this retrofitting required and how fast the mechanisms of evolution (e.g. selection) operate is an important one. Indeed, if there was special retrofitting for FLW linguistic properties then these must have all taken place before the time that humans went their separate ways for precisely the reasons that Chomsky likes to (rightly) emphasize: not only can any human acquire the recursive properties of any G, s/he can also acquire the FLW properties of any G (e.g. any phonology, morphology, metrical system etc.).[3] If acquiring any of these requires a special distinctive biology, then this must have been fixed before we went our separate ways or we would expect, contrary to apparent fact, that e.g. some “accents” would be inaccessible to some kids. CB is quite right that it behooves us to start identifying distinctive linguistic properties beyond the Basic Property and asking how they might have become fixed. And CB is also right that this is a domain in which comparative cognition/biology would be very useful (and has already been started (see note 2). It is less clear that any of this applies to explaining the evolution of the Basic Property itself.

If this is right, it is hard for me to understand CB’s criticism of B&C’s identification of hierarchical recursion as a very central distinctive feature of FL and asking how it could have emerged. CB seems to accept this point at times (“such a property unquestionably exists” (3)) but thinks that B&C are too obsessed with it. But this seems to me an odd criticism. Why? Because B&C’s way into the ling-evo issues is exactly the right way to study the evolution of any trait: First identify the trait of interest. Second, explain how it could have emerged. B&C identify the trait (viz. hierarchical recursion) and explain that it arose via the one time (non-gradual) emergence of a recursive operation like Merge. The problem with lots of evo of lang work is that it fails to take the first step of identifying the trait at issue. But absent this any further evolutionary speculation is idle. If one concedes that a basic feature of FL is the Basic Property, then obsessing about how it could have emerged is exactly the right way to proceed.

Furthermore, and here I think that CB’s discussion is off the mark, it seems pretty clear that this property is not going to be all that amenable to any thing but a “top-down, all-or-nothing” account. What I mean is that recursion is not something that takes place in steps, a point that Dawkins made succinctly in support of Chomsky’s proposal (see here). As he notes, there is no such thing as “half recursion” and so there will be no very interesting “descent with modification” account of this property. Something special happened in humans. Among other things this led to hierarchical recursion. And this thing, whatever it was, likely came in one fell swoop. This might not be all there is to say about language, but this is one big thing about it and I don’t see why CB is resistant to this point. Or, put another way, even if CB is right about many other features of language being distinctive and amenable to more conventional evo analysis, it does not gainsay the fact that the Basic Property is not one of these.

There is actually a more exorbitant possibility that perhaps CB is reacting to. As the review notes (7): “Language is special, but not all that special; all creatures have special abilities.” I don’t want to over-read this, but one way of taking it is that different “abilities” supervene on common capacities. This amounts to a warning not to confuse apparent expressions of capacities for fundamental differences in capacities. This is a version of the standard continuity thesis (that Lenneberg, among others, argued is very misleading (i.e. false) wrt language). On this view, there is nothing much different in the capacities of the “language ready” brain from the “language capable” brain. They are the same thing. In effect, we need add nothing to an ape brain to get ours, though some reorganization might be required (i.e no new circuits). I personally don’t think this is so. Why? For the traditional reasons that Chomsky and IT note, namely that nothing else looks like it does language like we do, even remotely. And though I doubt that hierarchical recursion is the whole story (and have even suggested that something other than Merge is the secret sauce that got things going), I do think that it is a big part of it and that downplaying its distinctiveness is not useful.

Let me put this another way. All can agree that evolution involves descent with modification. The question is how big a role to attribute to descent and how much to modification (as well as how much modification is permitted). The MP idea can be seen as saying that much of FL is there before Merge got added. Merge is the “modification” all else the “descent.” There will fe features of FL continuous with what came before and some not continuous. No mystery about the outline of such an analysis, though the details can be very hard to develop. At any rate, it is hard for me to see what would go wrong if one assumed that Merge (like the third color neuron involved in trichromatic vision (thx Bill for this)) is a novel circuit and that FL does what it does by combining the powers of this new operation with those cognitive/computational powers inherited from our ancestors. That would be descent with modification. And, so far as I can tell, that is what a standard MP story like that in B&C aims to deliver. Why CB doesn’t like (or doesn’t appear to like) this kind of story escapes me.

Observe that how one falls on the distinctiveness of BC issue relates to what one thinks of the short time span observation (i.e. language is of recent vintage so there is little time for natural selection or descent with modification to work its magic). The view Chomsky (and Berwick and Dawkins and Tatersall) favor is that there is something qualitatively different between language capable brains and ones that are not. This does not mean that they don’t also greatly overlap. It just means that they are not capacity congruent. But if there is a qualitative difference (e.g. a novel kind of circuit) then the emphasis will be on the modifications, not the descent in accounting for the distinctiveness. B&C is happy enough with the idea that FL properties are largely shared with our ancestors. But there is something different, and that difference is a big deal. And we have a pretty good idea about (some of) the fine structure of that difference and that is what Minimalist linguistics should aim to explain.[4] Indeed, I have argued and would continue to argue that the name of the Minimalist game is to explain these very properties in a simple way. But I’ve said that already here, so I won’t belabor the point (though I encourage you to do so).

A few more random remarks and I am done. The IT piece provides a quick overview of how distinctive human symbolic (linguistic?) capacities are. In IT’s view, very. In IT’s view, the difference also emerged very recently, and understanding that is critical to understanding modern humans. And he is not alone. The reviewee Genevieve von Petziger appears to take a similar view, dating the start of the modern human mind to about 80kya (2). All this fits in with the dates that Chomsky generally assumes. It is nice to see that (some) people expert in this area find these datings and the idea that the capacity of interest is unique to us credible. Of course, to the degree that this dating is credible and to the degree that this is not a long time for evolution to exercise its powers the harder the evolutionary problem becomes. And, of course, that’s what makes the problem interesting. At any rate, what the IT review makes clear is that the way Chomsky has framed the problem is not without reasonable expert support. Whether this view is correct, is, of course, an empirical matter (and hence beyond my domain to competently judge).

Ok, let me mention two more intellectual confections of interest and we are done. I will be short.

The first is a review of Wolfe’s book by David Lobina and Mark Brenchley. It is really good and I cannot recommend it highly enough. I urge you in particular to read the discussion on recursion as self-reference vs self-embedding and the very enlightening discussion of how Post’s original formalism (might have) led to some confusion on these issues. I particularly liked the discussion of how Merge de-confuses them, in effect by dumping the string based conception of recursion that Post’s formalism used (and which invited a view of recursion as self-embedding) and implementing the recursive idea more cleanly in a Merge like system in which linguistic structures are directly embedded in one another without transiting through strings at all. This cleanly distinguishes the (misleading) idea that the recursion lies with embedding clauses within clauses from the more fundamental idea that recursion requires some kind of inductive self-reference. Like I said, the discussion is terrific and very useful.

And now for desert: read David Adger’s fun review of Arrival. I confess that I did not really like the movie that much, but after reading David’s review, I intend to re-see it with a more open mind.

That’s it. Take a look at the issue of Inference. It’s nice to see serious linguistic issues intelligently discussed in a non-specialist’s venue. It can be done and done well. We need more of it.

[1] Chomsky also mentions that how lexical items have very distinctive properties and that we understand very little about them. This ahs become a standard trope in his essays, and a welcome one. It seems that lexical items are unlike animal signs in that the latter are really “referential” in ways that the former are not. The how and whys behind this, however, is completely opaque.

[2] There has been quite a lot of interesting comparative work done, most prominently by Berwick, on relating human phonology with bird song. See here and here for some discussion and links.

[3] There is another possibility: once FLN is in place there is only one way to retrofit all the components of FLW. If so, then there is no selection going on here and so the fact that all those endowed with FLNs share common FLWs would not require a common ancestor for the FLWs. Though I know nothing about these things, this option strikes me as far-fetched. If it is, then the logic that Chomsky has deployed for arguing that FLN was in place before humans went their separate ways would hold for FLW as well.

[4] CB makes a claim that is often mooted in discussions about biology. It is Dobzhansky’s dictum that nothing in biology makes sense except in the light of evolution. I think that this is overstated. Lots of biology “makes sense” without worrying about origin. We can understand how hearts work or eyes see or vocal tracts produce sounds without knowing anything at all about how they emerged. This is not to diss the inquiry: we all want to know how things came to be what they are. But the idea that natural selection is the only thing that makes sense of what we see is often overblown, especially so when Dobzhansky quotes are marshaled. For some interesting discussion of this see this.

Faculty of Language

Comments