My answer: very, and getting more so all the time. This view will strike many as controversial. For example, Cedric Boeckx (here and here) and David Berlinski (here) (and most linguists in discussions over beer) contend that linguistics is a BINO (biology in name only). After all, there is little biochemistry, genetics, or cellular biology in current linguistics, even of the Minimalist variety. Even the evolang dimension is largely speculative (though, IMO, this does not distinguish it from most of the “serious” stuff in the field). And since this is what biology is/does nowadays, the argument goes, linguistic pronouncements cannot have biological significance, and so the “bio” in biolinguistics is false advertising. That’s the common wisdom as best as I can tell, and I believe it to be deeply (actually, shallowly) misguided. How so?
A domain of inquiry, on this view, is defined by its tools and methods rather than its questions. Further, because the tools and methods of GG are not similar to those found in your favorite domain of biology, there cannot be much bio in biolinguistics. This is a very bad line of reasoning, even if some very smart people are pushing it. In my view, it rests on pernicious dualist assumptions which, had they been allowed to infect earlier work in biology, would have left it far poorer than it is today. Let me explain.
First, the data linguists use is biological data: we study patterns which would be considered contenders for Nobel Prizes in Physiology or Medicine (i.e. bio Nobels) were they emitted by non humans. Wait, would be? No, actually were. Unraveling the bee waggle dance was Nobel worthy. And what’s the waggle dance? It’s the way a bee “articulates” (in a sign language sort of way, but less sophisticated) how far and in what direction honey lies. In other words, it is a way for bees to map AP expressions onto CI structures that convey a specific kind of message. It’s quite complicated (see here), and describing its figure-8 patterns (direction and size) and how they relate to the position of the sun and the food source is what won von Frisch the prize in Physiology or Medicine. In other words, von Frisch won a bio Nobel for describing a grammar of the bee dance.
And it really was “just” a G, with very little “physiology” or “medicine” implicated. Even at the present time, we appear to know very little about either the neural or genetic basis of the dance or its evolutionary history (or at least Wikipedia and a Google search seem to reveal little beyond anodyne speculations like “Ancestors to modern honeybees most likely performed excitatory movements to encourage other nestmates to forage” or “The waggle dance is thought to have evolved to aid in communicating information about a new nest site, rather than spatial information about foraging sites” (Wikipedia)). Nonetheless, despite the dearth of bee neurophysiology, genetics or evo-bee-dance evolutionary history, the bio worthies granted it a bio Nobel! Now here is my possibly contentious claim: describing the kinds of patterns humans use to link articulations to meanings is no less a biological project than is describing waggle dance patterns. Or, to paraphrase my good and great friend Elan Dresher: if describing how a bunch of bees dance is biology, so too is describing how a bunch of Parisians speak French.
Second, it’s not only bees! If you work on bird songs or whale songs or other forms of vocalization or vervet monkey calls, you are described as doing biology (look at the journals that publish this stuff)! And you are doing biology even if you are largely describing the patterns of these songs/calls. Of course, you can also add a sprinkle of psychology to the mix and tentatively describe how these calls/songs are acquired to cement your biological bona fides. But if you study non human vocalizations and their acquisition, then (apparently) you are doing biology, while if you do the same thing in humans, apparently you are not. Or, to be more precise, describing work on human language as biolinguistics is taken to be wildly inappropriate, while doing much the same thing with mockingbirds is biology. Bees, yes. Whales and birds, sure. Monkey calls, definitely. Italian or Inuit? Not on your life! Dualism anyone?
As may be evident, I think that this line of reasoning is junk best reserved for academic bureaucrats interested in figuring out how to demarcate the faculty of Arts from that of Science. There is every reason to think that there is a biological basis for human linguistic capacity and so studying manifestations of this capacity and trying to figure out its limits (which is what GG has been doing for well over 60 years) is biology even if it fails to make contact with other questions and methods that are currently central in biology. To repeat, we still don’t know the neural basis or evolutionary etiology of the waggle dance but nobody is lobbying for rescinding von Frisch’s Nobel.
One can go further: comparing modern work in GG and early work in genetics leads to a similar conclusion. I take it as evident that Mendel was doing biology when he sussed out the genetic basis for the phenotypic patterns in his pea plant experiments. In other words, Mendel was doing biogenetics (though this may sound redundant to the modern ear). But note: this was biogenetics without much bio beyond the objects of interest being pea plants and the patterns you observe when you cross breed them. Mendel’s work involved no biochemistry, no evolutionary theory, no plant anatomy or plant physiology. There were observed phenotypic patterns and a proposed very abstract underlying mechanism (whose physical basis was a complete mystery) that described how these might arise. As we know, it took the rest of biology a very long time to catch up with Mendel’s genetics. It took about 65 years for evolution to integrate these findings in the Modern Synthesis and almost 90 years until biology (with the main work carried out by itinerant physicists) figured out how to biochemically ground it in DNA. Of course, Mendel’s genetics laid the groundwork for Watson and Crick and was critical to making Darwinian evolution conceptually respectable. But, and this is the important point here, when first proposed, its relation to other domains of biology was quite remote. My point: if you think Mendel was doing biology then there is little reason to think GGers aren’t. Just as Mendel identified mechanisms that later biology figured out how to embody, GG is identifying operations and structures that the neurosciences should aim to incarnate. Moreover, as I discuss below, this melding of GG with cog-neuro is currently producing happy interactions somewhat analogous to those Mendel’s work produced before.
Before saying more, let me make clear that of course biolinguists would love to make more robust contact with current work in biology. Indeed, I think that this is happening and that Minimalism is one of the reasons for it. But I will get to that. For now let’s stipulate that the more interaction between apparently disparate domains of research, the better. However, absence of apparent contact and the presence of different methods do not mean that subject matters differ. Human linguistic capacity is biologically grounded. As such, inquiry into linguistic patterns is reasonably considered biological inquiry into the cognitive capacities of a very specific animal: humans. It appears that dualism is still with us enough to make this obvious claim contentious.
The point of all of this? I actually have two: (i) to note that the standard criticism of GG as not real biolinguistics at best rests on unjustified dualist premises, and (ii) to note that one of the more interesting features of modern Minimalist work has been to instigate tighter ties with conventional biology, at least in the neuro realm. I ranted about (i) above. I now want to focus on (ii), in particular a recent very interesting paper by the group around Stan Dehaene. But first a little segue.
I have blogged before on Embick and Poeppel’s worries about the conceptual mismatch between the core concepts in cog-neuro and those of linguistics (here for some discussion). I have also suggested that one of the nice features of Minimalism is that it has a neat way of bringing the basic concepts closer together so that G structure and its bio substructure might be more closely related. In particular, a Merge based conception of G structure goes a long way towards reanimating a complexity measure with real biological teeth. In fact, it is effectively a recycled version of the DTC (the Derivational Theory of Complexity), which, it appears, has biological street cred once again. The cred comes from work showing that the brain activity involved in building a structure can be roughly indexed by the number of Merge operations required to construct it (see here). A recent paper goes the earlier paper one better by embedding the discussion in a reasonable parsing model based on a Merge based G. The PNAS paper (henceforth Dehaene-PNAS) (here) has a formidable cast of authors, including two linguists (Hilda Koopman and John Hale), orchestrated by Stan Dehaene. Here is the abstract:
Although sentences unfold sequentially, one word at a time, most linguistic theories propose that their underlying syntactic structure involves a tree of nested phrases rather than a linear sequence of words. Whether and how the brain builds such structures, however, remains largely unknown. Here, we used human intracranial recordings and visual word-by-word presentation of sentences and word lists to investigate how left-hemispheric brain activity varies during the formation of phrase structures. In a broad set of language-related areas, comprising multiple superior temporal and inferior frontal sites, high-gamma power increased with each successive word in a sentence but decreased suddenly whenever words could be merged into a phrase. Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity. More superficial models of language, based solely on sequential transition probability over lexical and syntactic categories, only captured activity in the posterior middle temporal gyrus. Formal model comparison indicated that the model of multiword phrase construction provided a better fit than probability-based models at most sites in superior temporal and inferior frontal cortices. Activity in those regions was consistent with a neural implementation of a bottom-up or left-corner parser of the incoming language stream. Our results provide initial intracranial evidence for the neurophysiological reality of the merge operation postulated by linguists and suggest that the brain compresses syntactically well-formed sequences of words into a hierarchy of nested phrases.
A few comments, starting with a point of disagreement: whether the brain builds hierarchical structures is not really an open question. We have tons of evidence that it does, evidence that linguists, among others, have amassed over the last 60 years. How quickly the brain builds such structure (on line, or in some delayed fashion) and how the brain parses incoming strings in order to build such structure is still opaque. So it is misleading to say that what Dehaene-PNAS shows is both that the brain does this and how. Putting things this way suggests that until we had such neural data these issues were in doubt. What the paper does is provide neural measures of this structure building process, and it provides a nice piece of cog-neuro inquiry in which the cog is provided by contemporary Minimalism in the context of a parser and the neuro is provided by brain activity in the gamma range.
Second, the paper demonstrates a nice connection between a Merge based syntax and measures of brain activity. Here is the interesting bit (for me, my emphasis):
Regression analyses showed that each additional word or multiword phrase contributed a similar amount of additional brain activity, providing evidence for a merge operation that applies equally to linguistic objects of arbitrary complexity.
Merge based Gs treat all combinations as equal, regardless of the complexity of the combinations or differences among the items being combined. If Merge is the only operation, then it is easy to sum the operations that provide the linguistic complexity. It’s just the same thing happening again and again, and on the (reasonable) assumption that doing the same thing incurs the same cost, we can (reasonably) surmise that we can index the complexity of the task by adding up the required Merges. Moreover, this hunch seems to have paid off in this case. The merges seem to map linearly onto brain activity, as expected if complexity generated by Merge were a good index of the brain activity required to create such structures. To put this another way: a virtue of Merge (maybe the main virtue for the cog-neuro types) is that it simplifies the mapping from syntactic structure to brain activity by providing a common combinatory operation that underlies all syntactic complexity. Here is the Dehaene-PNAS paper (4):
A parsimonious explanation of the activation profiles in these left temporal regions is that brain activity following each word is a monotonic function of the current number of open nodes at that point in the sentence (i.e., the number of words or phrases that remain to be merged).
This makes the trading relation between complexity as measured cognitively and complexity as measured brain-wise transparent when implemented in a simple parser (note the weight carried by “parsimonious” in the quote above). What the paper argues is that this simple transparent mapping has surprising empirical virtues, and part of what makes it simple is the simplicity of Merge as the basic combinatoric operation.
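To make the arithmetic concrete, here is a minimal sketch of this counting logic in Python (my illustration, not the paper’s code; the tree encoding and function names are invented for exposition): a parse is a nested pair, every pair is one application of Merge, and predicted activity is linear in the Merge count.

```python
# A minimal sketch of the Merge-count complexity index (illustrative only).
# A tree is either a word (str) or a pair (left, right); each pair
# corresponds to one application of Merge.

def merge_count(tree):
    """Number of Merge operations needed to build `tree`."""
    if isinstance(tree, str):       # a lexical item: no Merge needed
        return 0
    left, right = tree
    return 1 + merge_count(left) + merge_count(right)

def predicted_activity(tree, unit_cost=1.0, baseline=0.0):
    """If every Merge incurs the same cost, predicted total activity
    is a linear function of the number of Merges."""
    return baseline + unit_cost * merge_count(tree)

# [[the cat] [chased [the mouse]]]: 4 Merges, regardless of how complex
# the pieces being combined are.
tree = (("the", "cat"), ("chased", ("the", "mouse")))
print(merge_count(tree))         # -> 4
print(predicted_activity(tree))  # -> 4.0
```

The point of the toy is just that a single combinatory operation yields a single additive cost parameter, which is what makes a linear regression of brain activity on structural complexity well defined in the first place.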
There is lots more in this paper. Here are a few things I found most intriguing.
A key assumption of the model is that combining words into phrases occurs right after the word at the right edge of a constituent, i.e. after the constituent’s last word (2-3):
…we reasoned that a merge operation should occur shortly after the last word of each syntactic constituent (i.e., each phrase). When this occurs, all of the unmerged nodes in the tree comprising a phrase (which we refer to as “open nodes”) should be reduced to a single hierarchically higher node, which becomes available for future merges into more complex phrases.
This assumption drives the empirical results. Note that it indicates that structure is being built bottom-up. And this assumption is a key feature of a Merge based G that assumes something like Extension. As Dehaene-PNAS puts it (4):
The above regressions, using “total number of open nodes” as an independent variable, were motivated by our hypothesis that a single word and a multiword phrase, once merged, contribute the same amount to total brain activity. This hypothesis is in line with the notion of a single merge operation that applies recursively to linguistic objects of arbitrary complexity, from words to phrases, thus accounting for the generative power of language
If the parsing respects the G principle of Extension then it will have to build structure in this bottom up fashion. This means holding the “open” nodes on a stack/memory until this bottom up building can occur. The Dehaene-PNAS paper provides evidence that this is indeed what happens.
What kind of evidence? The following (3) (my emphasis):
We expected the items available to be merged (open nodes) to be actively maintained in working memory. Populations of neurons coding for the open nodes should therefore have an activation profile that builds up for successive words, dips following each merge, and rises again as new words are presented. Such an activation profile could follow if words and phrases in a sentence are encoded by sparse overlapping vectors of activity over a population of neurons (27, 28). Populations of neurons involved in enacting the merge operation would be expected to show activation at the end of constituents, proportional to the number of nodes being merged. Thus, we searched for systematic increases and decreases in brain activity as a function of the number of words inside phrases and at phrasal boundaries.
So, a Merge based parser that encodes Extension should show a certain brain activity rhythm indexed to the number of open nodes in memory and the number of Merge operations executed. And this is what the paper found.
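The predicted rhythm is easy to simulate. Below is a small sketch (again mine, with invented names, not the paper’s analysis code) that tracks open nodes word by word under bottom-up merging: each incoming word adds an open node, and each completed constituent merges two open nodes into one, producing exactly the build-up-and-dip profile described in the quote.

```python
# Sketch of the open-node profile under bottom-up parsing (illustrative).
# Trees are nested pairs, as before: a constituent merges exactly when
# its last word has been read (respecting Extension).

def closures(tree):
    """Return (word, k) pairs left to right, where k is the number of
    constituents whose last word is this word."""
    if isinstance(tree, str):
        return [(tree, 0)]
    left, right = tree
    items = closures(left) + closures(right)
    word, k = items[-1]
    items[-1] = (word, k + 1)   # this constituent closes at its last word
    return items

def open_node_trace(tree):
    """Open-node count after each word: +1 for the incoming word,
    -1 per merge (two open nodes become one)."""
    open_nodes, trace = 0, []
    for word, k in closures(tree):
        open_nodes += 1 - k
        trace.append((word, open_nodes))
    return trace

tree = (("the", "cat"), ("chased", ("the", "mouse")))
print(open_node_trace(tree))
# -> [('the', 1), ('cat', 1), ('chased', 2), ('the', 3), ('mouse', 1)]
# The count builds over "chased the" and collapses at "mouse", where three
# merges fire at once: [the mouse], [chased [the mouse]], and the full S.
```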
Last, and this is very important: the paper notes that Gs can be implemented in different kinds of parsers and tries to see which one best fits the data in their study. There is no confusion here between G and parser. Rather, it is recognized that the effects of a G in the context of a parser can be investigated, as can the details of the parser itself. It seems that for this particular linguistic task the results are consistent with either a bottom-up or a left-corner parser, with the former being a somewhat better fit for these data (7):
Model comparison supported bottom-up and left-corner parsing as significantly superior to top-down parsing in fitting activation in most regions in this left-hemisphere language network…
Those findings support bottom-up and/or left-corner parsing as tentative models of how human subjects process the simple sentence structures used here, with some evidence in favor of bottom-up over left-corner parsing. Indeed, the open-node model that we proposed here, where phrase structures are closed at the moment when the last word of a phrase is received, closely parallels the operation of a bottom-up parser.

This should not be that surprising a result given the data that the paper investigates. The sentences of interest contain no visible examples where left context might be useful for downstream parsing (e.g. a Wh element on the left edge (see Berwick and Weinberg for discussion of this)). We have here standard right branching phrase structure, and for these kinds of sentences non-local left context will be largely irrelevant. As the paper notes (8), the results do “not question the notion that predictability effects play a major role in language processing,” and, as it further notes, there are various kinds of parsers that can implement a Merge based model, including those where “prediction” plays a more important role (e.g. left-corner parsers).
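The difference between the three strategies can also be made concrete in the toy terms used above. On textbook characterizations, they differ in when a phrase node is postulated: top-down at the node’s first word (prediction), bottom-up at its last word (confirmation), and left-corner once the node’s first daughter is complete. Here is a rough sketch of those timing predictions (my construction under those assumptions, not the paper’s model comparison):

```python
# Sketch: when does each parsing strategy postulate each phrase node?
# (Textbook characterizations; illustrative, not the paper's models.)
from collections import Counter

def node_times(tree, start=0):
    """Return (end, events): for each internal node, the word index at
    which each strategy postulates that node."""
    if isinstance(tree, str):
        return start + 1, []
    left, right = tree
    mid, ev_left = node_times(left, start)
    end, ev_right = node_times(right, mid)
    this = {"top_down": start,        # predicted at the node's first word
            "left_corner": mid - 1,   # announced once left daughter is done
            "bottom_up": end - 1}     # built only at the node's last word
    return end, ev_left + ev_right + [this]

def per_word_counts(tree, strategy):
    n_words, events = node_times(tree)
    counts = Counter(ev[strategy] for ev in events)
    return [counts.get(i, 0) for i in range(n_words)]

tree = (("the", "cat"), ("chased", ("the", "mouse")))
for s in ("top_down", "left_corner", "bottom_up"):
    print(s, per_word_counts(tree, s))
# top_down    [2, 0, 1, 1, 0]  -- guesses nodes before the evidence is in
# left_corner [1, 1, 1, 1, 0]  -- spreads structure building evenly
# bottom_up   [0, 1, 0, 0, 3]  -- builds nodes only once they are complete
```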
That said, the interest of Dehaene-PNAS lies not only in the conclusion (or maybe not even mainly there), but in the fact that it provides a useful and usable model for how to investigate these computational models in neuro terms. That’s the big payoff, or IMO, the one that will pay dividends in the future. In this, it joins the earlier Pallier et al and the Ding et al papers. They are providing templates for how to integrate linguistic work with neuro work fruitfully. And in doing so, they indicate the utility of Minimalist thinking.
Let me say a word about this: what cog-neuro types want are simple usable models that have accessible testable implications. This is what Minimalism provides. We have noted the simplicity that Merge based models afford to the investigations above: a simple linear index of complexity. Simple models are what cog-neuro types want, and for the right reasons. Happily, this is what Minimalism is providing, and we are seeing its effects in this kind of work.
An aside: let’s hear it for stacks! The paper revives classical theories of parsing, and with them the idea that brains have stacks important for the parsing of hierarchical structures. This idea has been out of favor for a long time. One of the major contributions of the Dehaene-PNAS paper is to show that dumping it was a bad idea, at least for language and, most likely, other domains where hierarchical organization is essential.
Let me end: there is a lot more in the Dehaene-PNAS paper. There are localization issues (where the operations happen) and arguments showing that simple probability based models cannot account for the data reviewed. But for current purposes there is a further important message: Minimalism is making it easier to put a lot more run of the mill everyday bio into biolinguistics. The skepticism about the biological relevance of GG and Minimalism for more bio investigation is being put paid to by the efflorescence of intriguing work that combines them. This is what we should have expected. It is happening. Don’t let anyone tell you that linguistics is biologically inert. At least in the brain sciences, it’s coming into its own, at last!
 Alec Marantz argued that the DTC is really the only game in town. Here’s a quote:
…the more complex a representation - the longer and more complex the linguistic computations necessary to generate the representation - the longer it should take for a subject to perform any task involving the representation and the more activity should be observed in the subject’s brain in areas associated with creating or accessing the representation or performing the task.
For discussion see here.
 Note that this does not say that only a Merge based syntax would do this. It’s just that Merge systems are particularly svelte systems and so using them is easy. Of course, many Gs will have Mergish properties and so will also serve to ground the results.
 IMO, it is also the only game in town when it comes to evolang. This is also the conclusion of Tattersall in his review of Berwick and Chomsky’s book. So, yes, there is more than enough run of the mill bio to license the biolinguistics honorific.