Norbert Here: Few people know as much about both linguistics and genetics as Bob. As described in my last post, he gave a great overview of some exciting new research comparing human and Neanderthal genomes and finding that they differ barely at all, at least in areas where we understand a little about what's happening. After a modicum of bugging, Bob reprises his little talk here. He rightly notes that the paucity of difference has several interpretations wrt the emergence of the language faculty. There are two main contenders: one is that Neanderthals are indistinguishable from us linguistically, the other is that they are very different. The latter, if correct, has interesting implications for the emergence of FL, as Bob discusses at the end. The main reason for taking this second position is the absence of markers of cultural complexity (no big bang anthro evidence for cultural complexity until roughly 50,000 years ago). This kind of evidence is not dispositive, but it is very suggestive and anthropologists have regularly linked the emergence of cultural complexity to the emergence of language. So, enjoy the post; it is very, very intriguing.
Gene Jockeys
Robert C. Berwick
Unless you’ve been marooned on a desert
island for the past few decades, you probably already know that scientists have
been able to sequence the entire human genome (several, in fact). And probably you’ve
also heard that using rather remarkable technology scientists have also been
able to do the same using the ancient DNA from bone samples of (extinct)
Neandertals and their (extinct) relatives, the Denisovans (with a
soon-to-be-released ‘high resolution’ Neandertal genome courtesy of my friends
David Paige and company at the Broad and elsewhere)[1];
see Science here and
here. So that leads to the obvious question: what’s the genomic
difference between us and these two extinct Homo
species? What Reich and company did back
in 2010 (and what they’ll tell us more precisely very, very soon in 2013)
is simply line up the genomes for humans and Neandertals and then count up
changes. So, what’s the diff? It turns
out to be very, very small, with some fascinating implications for language.
You may recall that the human genome
consists of a sequence of approximately 4.5 billion DNA ‘letters’ – each letter
one of 4 nucleotides A(denine), G(uanine), C(ytosine), and T(hymine).
These ‘spell out’ all the proteins a
human cell can make: every triplet of nucleotides codes for one of 20 possible
amino acids, along with ‘start’ and ‘stop’ signals, so, e.g., the DNA nucleotide
triple CAG codes for the amino acid Glutamine, while CAC codes for Histidine. In this way the DNA can actually code for arbitrary
sequences of amino acids, with different strings of amino acids constituting
different proteins. (A simplification of course; the process that ‘reads out’ or
transcribes the DNA and then translates the resulting transcribed code into
amino acid sequences is an astonishingly complex bit of nanobiology that still
is not yet fully understood.)
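To make the triplet idea concrete, here is a minimal sketch in Python of reading DNA three letters at a time. It is only an illustration: the table lists a handful of the 64 codons of the standard genetic code, and the function name `translate` and the example string are invented for this sketch, not taken from any real gene.

```python
# A small excerpt of the standard genetic code (DNA codons, coding strand).
# Only a few of the 64 codons are listed; this is an illustrative subset.
CODON_TABLE = {
    "ATG": "Met",   # Methionine, also the usual 'start' signal
    "CAG": "Gln",   # Glutamine
    "CAC": "His",   # Histidine
    "TGG": "Trp",   # Tryptophan
    "GGC": "Gly",   # Glycine
    "TAA": "Stop",
    "TAG": "Stop",
    "TGA": "Stop",
}


def translate(dna):
    """Read a DNA string three letters at a time and return the amino acids.

    Translation halts at the first 'stop' codon. Codons missing from the
    excerpt above raise a KeyError, since this table is deliberately partial.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


# Made-up sequence: start, Gln, His, Trp, stop.
print(translate("ATGCAGCACTGGTAA"))
```

Changing a single letter can change the amino acid: swapping the final G of CAG for a C turns Glutamine into Histidine, which is the kind of difference being counted in what follows.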
Actually, out of these 4.5 billion DNA
nucleotides, only a very small portion, 1-2%, actually codes for proteins. Some
of the rest is clearly important since it is involved in regulating DNA itself,
and has played a role in what makes us different, but so far we know much less
about these elements, so we’ll stick to protein-coding DNA for now.[2] So, 2% leaves us with something like 90
million DNA nucleotides that code for proteins, or about 30 million amino
acids, assuming 1 amino acid for every nucleotide triplet. Now we can play the game: line up the human
and Neandertal sequences, and count up the number of amino acid differences
between ‘us’ and ‘them,’ in particular, differences that have become fixed
(constant) in humans as compared to the same amino acids in Neandertals. What
do you think that number is? 100,000?
10,000? 1000? 100?
1? If you guessed 100, you’re not
far off – the answer is 88 out of 30 million, or 0.000293%. Talk about a needle in a haystack! These 88
amino acid differences have all been accumulated since the time that humans and
Neandertals diverged, roughly 400-600 thousand years ago. But more importantly,
nearly all these amino acid differences are not something you’d write home to
your cognitively attuned grandmother about – not much there that’s obviously
about language, cognition, or the brain. So for instance, we differ from
Neandertals in such exciting categories as our olfactory receptor proteins, reproductive
system, skin and sweat (no more hair, remember?), various immune system details,
the ability to digest milk after weaning, and the like – in fact, exactly what
we expect to find for any two otherwise close, but diverging lineages. (See
Green article cited earlier, Table 2, pp. 714 and 715, which has the whole
amino acid difference list, starting with RPTN, an ‘epidermal matrix protein’;
GREB1, a response gene in the estrogen pathway; and so on, all the way to
PROM2, a ‘plasma membrane protrusion’ protein.[3])
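For anyone who wants to check the back-of-the-envelope arithmetic, here it is spelled out in Python. The inputs are just the round figures used in this post (genome size, a 2% coding fraction, 88 fixed differences); the variable names are made up for the sketch, and nothing here is a new measurement.

```python
# Round figures as used in the post; these are inputs, not new data.
genome_nucleotides = 4_500_000_000   # DNA 'letters' in the genome, per the post
coding_percent = 2                   # upper end of the 1-2% that codes for protein
fixed_differences = 88               # fixed amino acid differences, human vs Neandertal

# 2% of the genome codes for protein.
coding_nucleotides = genome_nucleotides * coding_percent // 100

# One amino acid per nucleotide triplet.
amino_acids = coding_nucleotides // 3

# Share of amino acids that differ, expressed as a percentage.
percent_different = 100 * fixed_differences / amino_acids

print(f"coding nucleotides: {coding_nucleotides:,}")
print(f"amino acids:        {amino_acids:,}")
print(f"fixed differences:  {percent_different:.6f}%")
```

The result is sensitive to the inputs, of course: a smaller genome size or coding fraction shrinks the denominator, but the answer stays a tiny fraction of one percent either way.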
In particular you might remember that the
much-hyped FOXP2 gene does not differ between ‘us’ and ‘them,’ at
least as far as the original storyline went. However, more recently, it has
been claimed that human FOXP2 does
differ from Neandertal FOXP2 in a
non-protein coding, regulatory region, but it remains to be seen whether this
is a difference that really makes a (functional) difference. (See Maricic et al., 2013, A recent evolutionary
change affects a regulatory element in the human FOXP2 gene, Molecular Biology
& Evolution, 271, doi:10.1093/molbev/mss271.) In fact, there are so few differences between ‘us’ and
‘them’ that some venturesome souls have recently taken this as compelling evidence
that humans and Neandertals were just one and the same species and both must have had full-blown modern
language. Now, I don’t buy this in part because
there’s actually nothing in the archaeological record to back it up besides
highly inferential evidence that one can argue either way – and further, the
most parsimonious explanation, since everyone believes that not having language
is the primitive state and that only we currently do have language, is that
language appeared in just a single lineage – us. The symbolic proxies associated with us but
not them are just too apparent, along with the glaringly obvious fact that
wherever modern ‘we’ appeared, whatever other Homo species happened to be around there before us
disappeared, leaving us as the sole survivors.[4]
You can check this all out for yourself –
load up the UCSC human genome browser, along with special ‘tracks’ for the
Neandertal and Denisovan results, as per this and you’ll pull up the actual nucleotide DNA sequence for
one tiny, tiny part of the end of chromosome 8, which I’ve already set to be
contrasted with the Neandertal and Denisovan sequence. If you look, you’ll see
that the Neandertal and Denisovan sequences are just presented as long gray or
black bars, which simply means that for DNA letter after DNA letter, human,
Denisovan, and Neandertal are the same.
If you squint very, very hard, you can make out two tiny letters, one a
“G”, and the other a “T” that are different
between ‘us’ and ‘them.’ I picked this
region because it also contains one of the touted differences between us and
Neandertals, marked by a tiny red bar at the far right, the next to last DNA
letter in view, which is associated with a variant of the gene Microcephalin, involved in brain
development. (This single DNA letter evidently fluctuates in modern humans in
different proportions, either G or C, but Neandertals have only G in this
position.)
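What the browser is showing can be mimicked in a few lines of Python: take two sequences that have already been lined up, walk along them letter by letter, and report the positions where they disagree. This is only a toy sketch. The two strings below are invented for illustration (they are not the real chromosome 8 sequence), the function name `count_differences` is hypothetical, and real comparisons first have to align the sequences and cope with gaps and damaged ancient DNA.

```python
def count_differences(seq_a, seq_b):
    """Return (position, letter_a, letter_b) for every mismatch.

    Assumes the two sequences are already aligned and of equal length;
    no gaps or insertions are handled in this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Invented toy sequences, identical except for one letter.
human = "ACGTTGCAAG"
neandertal = "ACGTTGCAGG"

differences = count_differences(human, neandertal)
print(differences)
print(f"{len(differences)} difference(s) in {len(human)} letters")
```

The long gray bars in the browser correspond to stretches where this list would simply be empty.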
So what’s the moral for linguistics? Well, for one thing, it means that Norbert
and Cedric are right when they say that alongside Plato’s problem and Orwell’s
problem, there’s a third problem for linguistics, Darwin’s problem. There has been simply too little evolutionary
time and too little evolutionary distance between DNA-sans-language and DNA-with-language for the change that brought us
language to have been all that great. To
be sure, there could be other DNA differences – regulatory transcription
factors and developmental DNA, promoters, enhancers, (micro)RNAi’s,
intergenic RNA that have made the real difference – like FOXP2,
a transcription factor, that is, a gene that makes a protein that in turn
up- or down-regulates the production of other proteins. To take another example,
the delayed brain growth so characteristic of humans may in part be due to
myocyte enhancer factor 2A (MEF2A), a
region that some suggest was selected for very recently in humans, but not in
Neandertals (see Somel et al. Nature Rev. Neuro, 2013 here for a good review).
Even so, it doesn’t appear that there was enough time for the invisible
hand of selection to build a finely tuned, extremely modular system of the
principles-and-parameters sort. Rather,
as Norbert has suggested, it seems more likely that evolution worked as it so
often has, by opportunistic bricolage, throwing together a pastiche of
pre-existing bricks and mortar. That’s
right in line with any approach that tosses out as much ‘language specific’ machinery
as possible, leaving behind just that little bit of ‘special sauce’ that makes
for human language and a world of difference between ‘us’ and ‘them.’ I’ll leave it for readers to decide what the
special sauce might be.
[1]This is possible only
because DNA is one of the most inert biological molecules known – just as you’d
expect for something used for information storage, far better than magnetic
tape. (So DNA doesn’t “replicate itself” – it doesn’t do anything by itself; it just sits there to be read like a
blueprint. It’s the rest of the cell machinery that carries out this job.) No Jurassic
Park though, so don’t expect dinosaurs anytime soon – after 100,000 years
or so, water and other stuff degrades DNA so much that nothing’s really
recoverable. So far.
[2]The rest of the
non-protein coding human DNA consists of regulatory elements of various kinds;
degraded relic genes that are no longer functional; transposons; repetitive
sequences whose functional role is not yet clear; and so forth. As a concrete
example of how these regions play a role, Sabeti and colleagues have published
a recent article in Cell (Feb. 2013),
demonstrating that an inter-genic region which seems to be under positive
selection has dialed down the human response to bacterial infection as compared
to other animals apparently to avoid septic shock.
[3]Such differences in
reproductive traits, immune system, and so forth are seen over and over again in
other animals as the locus of divergence between closely related species.
[4]See Tattersall, 2012, Masters of the Planet. Yes, yes, Reich
& company also found evidence of interbreeding between Neandertals and us
and Denisovans and us – enough so that a non-negligible proportion of our DNA
comes from Neandertals. Isn’t this a violation of ‘reproductive isolation,’ the
litmus test for what counts as a ‘good species’? Not any more; as any student
of modern biology appreciates, this in and of itself is not reason enough to
count us and Neandertals as one and the same species, and the amount of
interbreeding required to get the empirically observed level of admixture here
isn’t very great, just 1 individual every 70-80 generations. See Jerry Coyne and H. Allen Orr’s magisterial Speciation (2004, Sinauer Associates)
for more.
Conveniently timed to counter this bit of heresy...
Yes, indeed. But in my view, it's not really 'heresy'. As far as I can make out, it's simply a collection of untestable assertions, which is to say, a story. See if you can find _one_, as in _one_, assertion in the article pointed to by Chris that is empirically testable. (It claims, e.g., that we ought to be able to find remnants of Neandertal language in modern human languages, just like we can find Neandertal genes in the modern human genome; it claims that Neandertals had full human language -- actually, even further back, that 1 million years ago, the ancestor of both modern humans and Neandertals, Homo ergaster, had fully modern language, and that Neandertal language was most probably tonal.) For every such _empirically testable_ assertion that anybody finds, I will pledge $100 to the charity of their choice. (Duplicates don't count; your offer may vary; members of the National Academy of Sciences are prohibited from entry.)
I'm afraid there is a strongly reductionist feel to the discussion about how the data connects back to linguistic theory. If the argument against neural reductionism is that we know precious little about the neuronal processes and how they result in complex behaviour ("we presently know next to nothing about the physical principles underlying mental phenomena"), then why isn't the same true of inferring too much (or even anything) from the current genetic evidence? [Note: the anti-reductionist stance makes sense to me.]
The observed similarity is in 1-2% of the genome. From the little I understand of genetics, there seems to be a tremendous amount we don't know about these "regulatory" gene sequences and even the "junk" DNA. If so, shouldn't we be pretty skeptical of any evidence such data provides for/against linguistic theories?
I can't help but feel that there is a "reductionist thought is OK, when it serves our purpose" flair to this that I find very unsettling. Furthermore, it puts linguists in bad light, when our reasoning becomes so opportunistic.
Perhaps, there are genuine differences between the two cases. It would be nice to know why arguing based on known information in a realm is not OK (to the point of stupidity) in one case, but OK in another.
You're spot on Karthik. I share your view. We have _almost no_ idea how anybody's genotype connects to almost any phenotype you'd care to name - let alone a complex cognitive/behavioral phenotype like language. If there was any impression to the contrary, please let me be the first to dispel it. In fact, I was trying to get across an _anti_reductionist point: there is very little genomic difference (much much smaller than 1% BTW -- 100 times smaller), and yet, apparently, a large phenotypic difference, between 'us' and 'them'. What conclusions are to be drawn from this - well, you'll get different answers from different people.
Nice! I guess I misunderstood the post a bit, then.
I went back and took a look at it, and I think this is what led me to infer what I did in the post: "There has been simply too little evolutionary time and too little evolutionary distance between DNA-sans-language and DNA-with-language for the change that brought us language to have been all that great....That’s right in line with any approach that tosses out as much ‘language specific’ machinery as possible, leaving behind just that little bit of ‘special sauce’ that makes for human language and a world of difference between ‘us’ and ‘them.’ "
Are we talking about "language-specific machinery" at the genetic level or at the neurobiological and cognitive levels? Isn't it possible that there could be small changes at the DNA level that could still have big consequences at the neurobiological and cognitive levels? I feel that we will be in agreement here too, but I figured it was wise to clarify.
Btw, thanks a lot for the clarification!
Hi Bob, fascinating stuff.
A clarification question: one thing that puzzles me is that the variation between humans is apparently 0.1% to 0.4%.
So how can the difference between human and neandertal be
0.000293%? Should the 88 be divided by a much smaller number than 30 million, namely the number of amino acids that is fixed in human?
Good question, Alex. The 88 amino acid differences noted above are differences that have become _fixed_ in the human population - that is, as far as we know, humans don't differ at all wrt these. Obviously there _are_ differences in genomes from person to person (or else we'd all be genomic clones of one another, like identical twins). If one looks at single DNA nucleotide differences ('single nucleotide polymorphisms', or SNPs), then on average you will find 1 SNP for every 1000 DNA nucleotides. But there are other variations too. In the blog I mentioned one, in Microcephalin, where there seem to be at least two main variants in human populations. So, yes, the number is probably less than 30 million, but not so much less that it changes the back-of-the-envelope calculation all that much. It will be interesting to see what David Reich and the rest of the Neandertal genome people come up with in their high-resolution analysis later this year.
Excellent that an expert who knows more than most people about genetics has joined the discussion on Norbert's blog. You pose a challenge:
For every such _empirically testable_ assertion that anybody finds, I will pledge $100 to the charity of their choice.
I admit this challenge reminds me of a question Cedric Boeckx faced at recent conference in Lisbon [I do this from memory, so please correct me if I am wrong, Cedric]. During Q&A of a talk reporting results from early acquisition studies on 4 [or was it 5?] languages one person asked: Why do you guys [generativists] always just work on a few well known European languages and then make claims about all languages. Cedric agreed that data from more languages would be desirable but, unfortunately, at the moment there are no [no large enough??] data bases for early acquisition in most languages. The questioner was rather unimpressed and asked: So why don't you [Cedric + his students] go out and create data bases for additional languages instead of waiting for others to do it for you? I thought it was a bit unfair to direct that question just at Cedric but, that aside, it seemed like a reasonable request directed at the field.
So, in this spirit, instead of paying us to locate _empirically testable_ assertions why don't you tell us what your empirically testable hypothesis is? And while you're at it: instead of letting us decide what that "little bit of ‘special sauce’ that makes for human language and a world of difference between ‘us’ and ‘them'" is - why don't you tell us what you think it is? Empirically testable proposals are especially appreciated.
Thanks for your note, Christina. I think I was a bit unclear when I referred to empirically testable assertions. I was referring to the paper that Chris (in the first blog comment) linked to - it claims all sorts of things about Neandertal and Homo ergaster 'language' (namely, that these lineages all had fully modern language, and that Neandertals were the same species as us). So, the idea was to see whether one could find any testable assertions in that paper. It's hard to come up with testable assertions about any of these events in the past about a cognitive ability (language) which doesn't leave any fossil record. As for the 'special sauce', I have written elsewhere that one obvious candidate is whatever it is that gave rise to 'Merge', or, along the lines of what Norbert says, 'label'. (See my chapters, "Syntax Facit Saltum" and "All you need is Merge" in the book edited by Di Sciullo and Cedric.) I haven't the foggiest idea about how to figure out what really happened, though - first we'd need a description of the 'phenotype' for merge or label - something like its neural realization would be good. This seems out of reach at the moment, at least to me.
I'd like to comment on this:
"The main reason for taking this second position is the absence of markers of cultural complexity (no big bang anthro evidence for cultural complexity until roughly 50,000 years ago). This kind of evidence is not dispositive, but it is very suggestive and anthropologists have regularly linked the emergence of cultural complexity to the emergence of language."
Dediu and Levinson (the paper mentioned above) claim that this is *not* currently the standard view of anthropologists. And they provide convincing arguments (in my view) against the inference from absence of evidence of cultural complexity before a certain date to absence of fully human language before that date. For instance, there are attested human groups (hunter-gatherers) whose cultural practices can leave no detectable trace. And yet undoubtedly they have the same language capacity as other humans. So it seems to me that this idea that fully human language is not older than (about) 50000 years old has basically very little basis. On top of that, evidence for symbolic activity before 50000 years ago seems to have been found (the paper mentions this). I haven't seen anything convincing that would rule out the possibility that a (non-trivial) precursor of human language existed 1 million years ago. In fact, precisely because of Darwin's problem, it is reasonable to speculate that there was such a precursor. But of course this is just speculation, and I agree that basically not much can be known in this domain at this point (if anything at all).
The part you quote is from my intro, not Berwick's post. So let me say that I rely entirely on experts. The main source for my views is Ian Tattersall's "Masters of the Planet", cf. Chapters 13 and 14. Ian Tattersall is no goomba, as you can see if you look at this link (http://en.wikipedia.org/wiki/Ian_Tattersall).
So at the very least the claim that language is a relatively recent innovation is hardly without scholarly support by mainstream anthropologists. Of course, as I noted, the evidence is indirect and so not dispositive. That said, as these things go it's not bad and I am happy to assume that something like the indicated 80-100,000 year time frame is on the right track.
Last point: it's never been clear to me that this matters all that much. If there is a qualitative difference between what humans do linguistically and what everything else does then the change is quite dramatic. If the time span re the emergence is short it only emphasizes that the change was small. However, whether large or small it resulted in a qualitative change and the goal is to find out how this might have happened. So, take whatever starting points you want and give yourself as much time as you want, how does this help? Recursion does not arise by increments and as this is the source of the big bang (at least for people like me) it's not clear what elongating the time line will buy you. That said, my current view is that Tattersall's discussion seems pretty good to me given the standards of the field.