Faculty of Language: Berwick Post: Gene Jockeys

Tuesday, June 18, 2013

Berwick Post: Gene Jockeys

Norbert here: Few people know as much about both linguistics and genetics as Bob. As described in my last post, he gave a great overview of some exciting new research comparing human and Neanderthal genomes and finding that they barely differ at all, at least in areas where we understand a little about what's happening. After a modicum of bugging, Bob reprises his little talk here. He rightly notes that the paucity of difference has several interpretations with respect to the emergence of the language faculty. The two main contenders are that Neanderthals are indistinguishable from us linguistically, and that they are very different. The latter, if correct, has interesting implications for the emergence of FL, as Bob discusses at the end. The main reason for taking this second position is the absence of markers of cultural complexity (no big bang anthro evidence for cultural complexity until roughly 50,000 years ago). This kind of evidence is not dispositive, but it is very suggestive and anthropologists have regularly linked the emergence of cultural complexity to the emergence of language. So, enjoy the post; it is very, very intriguing.

Gene Jockeys

Robert C. Berwick

Unless you’ve been marooned on a desert island for the past few decades, you probably already know that scientists have been able to sequence the entire human genome (several, in fact). And you’ve probably also heard that, using rather remarkable technology, scientists have been able to do the same with the ancient DNA from bone samples of (extinct) Neandertals and their (extinct) relatives, the Denisovans (with a soon-to-be-released ‘high resolution’ Neandertal genome courtesy of my friends David Reich and company at the Broad and elsewhere)[1]; see Science here and here. So that leads to the obvious question: what’s the genomic difference between us and these two extinct Homo species? What Reich and company did back in 2010 (and what they’ll tell us more precisely very soon in 2013) is simply line up the genomes for humans and Neandertals and then count up changes. So, what’s the diff? It turns out to be very, very small, with some fascinating implications for language.

You may recall that the human genome consists of a sequence of approximately 4.5 billion DNA ‘letters’ – each letter one of 4 nucleotides A(denine), G(uanine), C(ytosine), and T(hymine). These ‘spell out’ all the proteins a human cell can make: every triplet of nucleotides codes for one of 20 possible amino acids, along with ‘start’ and ‘stop’ signals, so, e.g., the DNA nucleotide triple CAG codes for the amino acid Glutamine, while CAC codes for Histidine. In this way the DNA can code for arbitrary sequences of amino acids, with different strings of amino acids constituting different proteins. (A simplification of course; the process that ‘reads out’ or transcribes the DNA and then translates the resulting transcribed code into amino acid sequences is an astonishingly complex bit of nanobiology that is still not fully understood.)
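To make the triplet code concrete, here is a minimal Python sketch of reading a DNA string three letters at a time. The lookup table is only a tiny subset of the standard genetic code (the two codons mentioned above, plus the usual start codon and the three stop codons), and the function name `translate` and the example sequence are invented for illustration; the real cellular machinery is, as noted, far more involved.

```python
# Tiny, illustrative subset of the standard genetic code (DNA coding strand).
# A full table has 64 entries; only the codons needed for the example are listed.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual 'start' signal
    "CAG": "Gln",   # glutamine
    "CAC": "His",   # histidine
    "TAA": "Stop",
    "TAG": "Stop",
    "TGA": "Stop",
}


def translate(dna):
    """Read a DNA string codon by codon, stopping at the first stop codon."""
    residues = []
    usable_length = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = dna[i:i + 3].upper()
        residue = CODON_TABLE.get(codon, "???")  # '???' = not in this toy table
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


# Hypothetical 12-letter sequence: start, Gln, His, stop.
print(translate("ATGCAGCACTAA"))  # ['Met', 'Gln', 'His']
```

Swapping the single letter G for C in the second codon (CAG to CAC) changes Glutamine to Histidine, which is the kind of one-letter change that shows up as an amino acid difference when two genomes are compared.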

In fact, out of these 4.5 billion DNA nucleotides, only a very small portion, 1-2%, codes for proteins. Some of the rest is clearly important since it is involved in regulating DNA itself, and has played a role in what makes us different, but so far we know much less about these elements, so we’ll stick to protein-coding DNA for now.[2] So, 2% leaves us with something like 90 million DNA nucleotides that code for proteins, or about 30 million amino acids, assuming 1 amino acid for every nucleotide triplet. Now we can play the game: line up the human and Neandertal sequences, and count up the number of amino acid differences between ‘us’ and ‘them,’ in particular, differences that have become fixed (constant) in humans as compared to the same amino acids in Neandertals. What do you think that number is? 100,000? 10,000? 1000? 100? 1? If you guessed 100, you’re not far off – the answer is 88 out of 30 million, or 0.000293%. Talk about a needle in a haystack! These 88 amino acid differences have all accumulated since the time that humans and Neandertals diverged, roughly 400-600 thousand years ago. But more importantly, nearly all these amino acid differences are not something you’d write home to your cognitively attuned grandmother about – not much there that’s obviously about language, cognition, or the brain. So for instance, we differ from Neandertals in such exciting categories as our olfactory receptor proteins, reproductive system, skin and sweat (no more hair, remember?), various immune system details, the ability to digest milk after weaning, and the like – in fact, exactly what we expect to find for any two otherwise close, but diverging lineages. (See the Green article cited earlier, Table 2, pp. 714 and 715, which has the whole amino acid difference list, starting with RPTN, an ‘epidermal matrix protein’; GREB1, a response gene in the estrogen pathway; and so on, all the way to PROM2, a ‘plasma membrane protrusion’ protein.[3])
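For readers who want to check the back-of-the-envelope numbers, the following Python sketch redoes the arithmetic using the figures quoted in this post (4.5 billion nucleotides, the 2% upper bound for coding DNA, 88 fixed differences). The helper `count_differences` and the two short peptide strings are made up purely to illustrate the "line up and count" step; they are not real human or Neandertal sequences.

```python
# Figures as quoted in the post (integers, to keep the arithmetic exact).
GENOME_NUCLEOTIDES = 4_500_000_000
CODING_PERCENT = 2            # upper end of the quoted 1-2% range
FIXED_DIFFERENCES = 88

coding_nucleotides = GENOME_NUCLEOTIDES * CODING_PERCENT // 100  # 90 million
amino_acids = coding_nucleotides // 3                            # 30 million
percent_different = 100 * FIXED_DIFFERENCES / amino_acids


def count_differences(seq_a, seq_b):
    """Count positions at which two already-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


print(f"{coding_nucleotides:,} coding nucleotides")    # 90,000,000
print(f"{amino_acids:,} amino acids")                  # 30,000,000
print(f"{percent_different:.6f}% differ")              # 0.000293%

# Hypothetical aligned peptides in one-letter amino acid code: one mismatch.
print(count_differences("MKTAYIAK", "MKTSYIAK"))       # 1
```

The real comparison is harder than this toy count suggests, since ancient DNA is fragmentary and the sequences must first be aligned, but the final tally is conceptually just this position-by-position comparison.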

In particular you might remember that the much-hyped FOXP2 gene does not differ between ‘us’ and ‘them,’ at least as far as the original storyline went. However, more recently, it has been claimed that human FOXP2 does differ from Neandertal FOXP2 in a non-protein coding, regulatory region, but it remains to be seen whether this is a difference that really makes a (functional) difference. (See Maricic et al., 2013, A recent evolutionary change affects a regulatory element in the human FOXP2 gene, Molecular Biology & Evolution, 271, doi:10.1093/molbev/mss271.) In fact, there are so few differences between ‘us’ and ‘them’ that some venturesome souls have recently taken this as compelling evidence that humans and Neandertals were just one and the same species and both must have had full-blown modern language. Now, I don’t buy this in part because there’s actually nothing in the archaeological record to back it up besides highly inferential evidence that one can argue either way – and further, the most parsimonious explanation, since everyone believes that not having language is the primitive state and that only we currently do have language, is that language appeared in just a single lineage – us. The symbolic proxies associated with us but not them are just too apparent, along with the glaringly obvious fact that wherever modern ‘we’ appeared, whatever other Homo species that happened to be around there before us disappeared, leaving us as the sole survivors.[4]

You can check this all out for yourself – load up the UCSC human genome browser, along with special ‘tracks’ for the Neandertal and Denisovan results, as per this and you’ll pull up the actual nucleotide DNA sequence for one tiny, tiny part of the end of chromosome 8, which I’ve already set to be contrasted with the Neandertal and Denisovan sequence. If you look, you’ll see that the Neandertal and Denisovan sequences are just presented as long gray or black bars, which simply means that for DNA letter after DNA letter, human, Denisovan, and Neandertal are the same. If you squint very, very hard, you can make out two tiny letters, one a “G”, and the other a “T” that are different between ‘us’ and ‘them.’ I picked this region because it also contains one of the touted differences between us and Neandertals, marked by a tiny red bar at the far right, the next to last DNA letter in view, which is associated with a variant of the gene Microcephalin, involved in brain development. (This single DNA letter evidently fluctuates in modern humans in different proportions, either G or C, but Neandertals have only G in this position.)

So what’s the moral for linguistics? Well, for one thing, it means that Norbert and Cedric are right when they say that alongside Plato’s problem and Orwell’s problem, there’s a third problem for linguistics, Darwin’s problem. There has been simply too little evolutionary time and too little evolutionary distance between DNA-sans-language and DNA-with-language for the change that brought us language to have been all that great. To be sure, there could be other DNA differences – regulatory transcription factors and developmental DNA, promoters, enhancers, (micro)RNAi’s, intergenic RNA – that have made the real difference, like FOXP2, a transcription factor, that is, a gene that makes a protein that in turn up- or down-regulates the production of other proteins. To take another example, the delayed brain growth so characteristic of humans may in part be due to myocyte enhancer factor 2A (MEF2A), a region that some suggest was selected for very recently in humans, but not in Neandertals (see Somel et al. Nature Rev. Neuro, 2013 here for a good review). Even so, it doesn’t appear that there was enough time for the invisible hand of selection to build a finely tuned, extremely modular system of the principles-and-parameters sort. Rather, as Norbert has suggested, it seems more likely that evolution worked as it so often has, by opportunistic bricolage, throwing together a pastiche of pre-existing bricks and mortar. That’s right in line with any approach that tosses out as much ‘language specific’ machinery as possible, leaving behind just that little bit of ‘special sauce’ that makes for human language and a world of difference between ‘us’ and ‘them.’ I’ll leave it for readers to decide what the special sauce might be.

[1]This is possible only because DNA is one of the most inert biological molecules known – just as you’d expect for something used for information storage, far better than magnetic tape. (So DNA doesn’t “replicate itself” – it doesn’t do anything by itself; it just sits there to be read like a blueprint. It’s the rest of the cell machinery that carries out this job.) No Jurassic Park though, so don’t expect dinosaurs anytime soon – after 100,000 years or so, water and other stuff degrades DNA so much that nothing’s really recoverable. So far.

[2]The rest of the non-protein coding human DNA consists of regulatory elements of various kinds; degraded relic genes that are no longer functional; transposons; repetitive sequences whose functional role is not yet clear; and so forth. As a concrete example of how these regions play a role, Sabeti and colleagues have published a recent article in Cell (Feb. 2013), demonstrating that an inter-genic region which seems to be under positive selection has dialed down the human response to bacterial infection as compared to other animals, apparently to avoid septic shock.

[3]Such differences in reproductive traits, immune system, and so forth are seen over and over again in other animals as the locus of divergence between closely related species.

[4]See Tattersall, 2012, Masters of the Planet. Yes, yes, Reich & company also found evidence of interbreeding between Neandertals and us and Denisovans and us – enough so that a non-negligible proportion of our DNA comes from Neandertals. Isn’t this a violation of ‘reproductive isolation,’ the litmus test for what counts as a ‘good species’? Not any more; as any student of modern biology appreciates, this in and of itself is not reason enough to count us and Neandertals as one and the same species, and the amount of interbreeding required to get the empirically observed level of admixture here isn’t very great, just 1 individual every 70-80 generations. See Jerry Coyne and Allen Orr’s magisterial Speciation (2004, Sinauer Associates) for more.

13 comments:

Chris, June 18, 2013 at 10:27 AM
Conveniently timed to counter this bit of heresy...
Robert Berwick, June 18, 2013 at 12:31 PM
Yes, indeed. But in my view, it's not really 'heresy'. As far as I can make out, it's simply a collection of untestable assertions, which is to say, a story. See if you can find _one_, as in _one_, assertion in the article pointed to by Chris that is empirically testable. (It claims, e.g., that we ought to be able to find remnants of Neandertal language in modern human languages, just like we can find Neandertal genes in the modern human genome; it claims that Neandertals had full human language -- actually, even further back, that 1 million years ago, the ancestor of both modern humans and Neandertals, Homo ergaster, had fully modern language, and that Neandertal language was most probably tonal.) For every such _empirically testable_ assertion that anybody finds, I will pledge $100 to the charity of their choice. (Duplicates don't count; your offer may vary; members of the National Academy of Sciences are prohibited from entry.)
karthik durvasula, June 18, 2013 at 2:48 PM
I'm afraid there is a strongly reductionist feel to the discussion about how the data connects back to linguistic theory. If the argument against neural reductionism is that we know precious little about the neuronal processes and how they result in complex behaviour ("we presently know next to nothing about the physical principles underlying mental phenomena"), then why isn't the same true of inferring too much (or even anything) from the current genetic evidence? [Note: the anti-reductionist stance makes sense to me.]

The observed similarity is in 1-2% of the genome. From the little I understand of genetics, there seems to be a tremendous amount we don't know about these "regulatory" gene sequences and even the "junk" DNA. If so, shouldn't we be pretty skeptical of any evidence such data provides for/against linguistic theories?

I can't help but feel that there is a "reductionist thought is OK, when it serves our purpose" flavor to this that I find very unsettling. Furthermore, it puts linguists in a bad light when our reasoning becomes so opportunistic.

Perhaps, there are genuine differences between the two cases. It would be nice to know why arguing based on known information in a realm is not OK (to the point of stupidity) in one case, but OK in another.
Robert Berwick, June 18, 2013 at 3:37 PM
You're spot on Karthik. I share your view. We have _almost no_ idea how anybody's genotype connects to almost any phenotype you'd care to name - let alone a complex cognitive/behavioral phenotype like language. If there was any impression to the contrary, please let me be the first to dispel it. In fact, I was trying to get across an _anti_reductionist point: there is very little genomic difference (much much smaller than 1% BTW -- 100 times smaller), and yet, apparently, a large phenotypic difference, between 'us' and 'them'. What conclusions are to be drawn from this - well, you'll get different answers from different people.
Alex Clark, June 20, 2013 at 1:08 AM
Hi Bob, fascinating stuff.

A clarification question: one thing that puzzles me is that the variation between humans is apparently 0.1% to 0.4%. So how can the difference between human and Neandertal be 0.000293%? Should the 88 be divided by a much smaller number than 30 million, namely the number of amino acids that are fixed in humans?
Unknown, June 20, 2013 at 4:14 AM
Excellent that an expert who knows more than most people about genetics has joined the discussion on Norbert's blog. You pose a challenge:

For every such _empirically testable_ assertion that anybody finds, I will pledge $100 to the charity of their choice.

I admit this challenge reminds me of a question Cedric Boeckx faced at a recent conference in Lisbon [I do this from memory, so please correct me if I am wrong, Cedric]. During the Q&A of a talk reporting results from early acquisition studies on 4 [or was it 5?] languages, one person asked: Why do you guys [generativists] always just work on a few well-known European languages and then make claims about all languages? Cedric agreed that data from more languages would be desirable but, unfortunately, at the moment there are no [no large enough??] databases for early acquisition in most languages. The questioner was rather unimpressed and asked: So why don't you [Cedric + his students] go out and create databases for additional languages instead of waiting for others to do it for you? I thought it was a bit unfair to direct that question just at Cedric but, that aside, it seemed like a reasonable request directed at the field.

So, in this spirit, instead of paying us to locate _empirically testable_ assertions why don't you tell us what your empirically testable hypothesis is? And while you're at it: instead of letting us decide what that "little bit of ‘special sauce’ that makes for human language and a world of difference between ‘us’ and ‘them'" is - why don't you tell us what you think it is? Empirically testable proposals are especially appreciated.
Benjamin, July 16, 2013 at 3:22 PM
I'd like to comment on this:
" The main reason for taking this second position is the absence of markers of cultural complexity (no big bang anthro evidence for cultural complexity until roughly 50,000 years ago). This kind of evidence is not dispositive, but it is very suggestive and anthropologists have regularly linked the emergence of cultural complexity to the emergence of language."
Dediu and Levinson (the paper mentioned above) claim that this is *not* currently the standard view of anthropologists. And they provide convincing arguments (in my view) against the inference from absence of evidence of cultural complexity before a certain date to absence of fully human language before that date. For instance, there are attested human groups (hunter-gatherers) whose cultural practices can leave no detectable trace. And yet undoubtedly they have the same language capacity as other humans. So it seems to me that this idea that fully human language is not older than (about) 50,000 years has very little basis. On top of that, evidence for symbolic activity before 50,000 years ago seems to have been found (the paper mentions this). I haven't seen anything convincing that would rule out the possibility that a (non-trivial) precursor of human language existed 1 million years ago. In fact, precisely because of Darwin's problem, it is reasonable to speculate that there was such a precursor. But of course this is just speculation, and I agree that not much can be known in this domain at this point (if anything at all).