Monday, December 21, 2015
Holidays
I am going to be offline for most of the next two weeks. I don't swear not to put up an occasional post, but I am pretty sure that I won't. So, enjoy the time off. See you again in January.
A view from the front row
For those who have not seen it, here is a view of Chomsky by Bev Stohl, his long-time personal assistant. For what it is worth, I recognize the person that Bev describes, more or less as she describes him. It is not a view that is universally shared, though it is hardly unique (see here).
A long time ago I learned something that has proven to be invaluable (even if it is hard to implement). The unity of the virtues is a myth. What is that? It's the view that people who are virtuous are also smart and that the beautiful are also moral and kind and that the kind are also smart etc. In other words, the virtues come as a package. This view, a natural one I believe, comes with the corollary that one's friends are smart and kind and beautiful and that people whose views you don't agree with are lesser beings along some dimension. As I said, this is a myth, or so I believe. Smart people can be very unpleasant (if not worse) and nice people can be wrong (if not worse). For this reason I try to distinguish people's views from their personalities and try to evaluate them separately. But, like everyone else, I like it when those I admire intellectually are also people I admire personally. Chomsky is one such person. Bev's portrait accurately represents why.
Wednesday, December 16, 2015
Three readables
Here are some readables.
First, those interested in some background reading on the DMZTP paper (Nai Ding, Lucia Melloni, Hang Zhang, Xing Tian and David Poeppel) discussed here can look at Embick & Poeppel (E&P) (part deux) here. This 2015 paper is a sequel to a much earlier 2005 paper outlining the problems of relating work on the brain bases of linguistic behavior to linguistic research (discussed here). The paper situates the discussion in DMZTP against a larger research program in the cog-neuro of language.
E&P identify three neuroling projects of varying degrees of depth and difficulty.
1. Correlational neuroling
2. Integrated neuroling
3. Explanatory neuroling
DMZTP fits neatly into the first project and makes tentative conjectures relevant to the second and third. What are these three projects? Here's how E&P describe each (CR = computational/representational and NB = neurobiological) (360):
Correlational neurolinguistics: CR theories of language are used to investigate the NB foundations of language. Knowledge of how the brain computes is gained by capitalising on CR knowledge of language.

Integrated neurolinguistics: Correlational neurolinguistics plus the NB perspective provides crucial evidence that adjudicates among different CR theories. That is, brain data enrich our understanding of language at the CR level.

Explanatory neurolinguistics: (Correlational + Integrated neurolinguistics) plus something about NB structure/function explains why the CR theory of language involves particular computations and representations (and not others).
The whole paper is a great read (nothing surprising here) and does a good job of identifying the kinds of questions worth answering. Its greatest virtue, IMO, is that it treats results both in linguistics and in cog-neuro respectfully and asks how their respective insights can be integrated. This is not a program of mindless reduction, something that is unfortunately characteristic of too much current NB work on language.
Second, here's a piece on some big methodological goings-on in physics. The question, relevant to our little part of the scientific universe, is what makes a theory scientific. It seems that many don't like string theory or multiverses and consider them, and the theories that make use of them, unscientific. Philosophers are called in to help clear the muddle (something that physicists hate even the idea of, but times are desperate it seems) and the philosophers note that the muddle partly arises from mistaking the hallmarks of what makes something science. Popperian falsificationism is now understood by everyone to be a simplification at best and a severe distortion with very bad consequences at worst.
Chomsky once observed that big methodological questions about what makes something science are usefully focused on the areas of our greatest scientific success. I think that this is right. However, I also think that listening in on these discussions is instructive. Here's a place where eavesdropping might be fun and instructive.
Third, here's a Tech Review article on some recent work by Tenenbaum and colleagues on handwriting recognition (funny, just as cursive is on the cusp of disappearing, we show how to get machines to recognize it; the curious twists and turns of history). The research described is quite self-consciously opposed to deep learning approaches to similar problems. Where does the difference lie? Effectively, the program uses a generative procedure based on "strokes of an imaginary pen" to match the incoming letters. Bayes is then used to refine these generated objects. In other words, given a set of generative procedures for constructing letters, we can generate better and better matches to the input through an iterative process in a Bayes-like framework. And there is real payoff. By putting this kind of generative procedure into the Bayesian system, you can learn to recognize novel "letters" from a very small number of examples. Sound familiar? It should. Think Aspects! So, it looks like the tech world is coming to appreciate the power of "innate" knowledge, i.e. how given information can be used and extended. Good. This is just the kind of story GGers should delight in.
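To make the contrast with pattern-matching approaches concrete, here is a minimal toy sketch of the idea (mine, not Tenenbaum and colleagues' actual Bayesian Program Learning model): candidate characters are generated from a tiny repertoire of stroke "programs" and an observed image is classified by Bayesian scoring against what each program generates, so a single noisy example suffices. The stroke programs, grid size and noise model are all invented for illustration.

```python
import numpy as np

# Toy "stroke grammar": each (hypothetical) character is a list of strokes,
# and a stroke is a pair of endpoints rendered on a small binary grid. This
# is a deliberately simplified stand-in for a generative model of handwriting.
GRID = 16

def render(strokes, jitter=0.0, rng=None):
    """Draw ((x0, y0), (x1, y1)) strokes onto a GRID x GRID binary image."""
    rng = rng if rng is not None else np.random.default_rng(0)
    img = np.zeros((GRID, GRID))
    for (x0, y0), (x1, y1) in strokes:
        for t in np.linspace(0.0, 1.0, 50):
            x = x0 + t * (x1 - x0) + jitter * rng.normal()
            y = y0 + t * (y1 - y0) + jitter * rng.normal()
            img[int(np.clip(y, 0, GRID - 1)), int(np.clip(x, 0, GRID - 1))] = 1.0
    return img

# The hypothesis space: a handful of made-up stroke "programs".
PROGRAMS = {
    "T": [((2, 2), (13, 2)), ((8, 2), (8, 13))],
    "L": [((3, 2), (3, 13)), ((3, 13), (12, 13))],
    "X": [((2, 2), (13, 13)), ((13, 2), (2, 13))],
}

def log_likelihood(observed, strokes, noise=0.2):
    """Pixelwise score of an observed image under one stroke program."""
    template = render(strokes)
    p_on = np.where(template > 0, 1.0 - noise, noise)  # P(pixel lit | program)
    return np.sum(observed * np.log(p_on) + (1 - observed) * np.log(1 - p_on))

def classify(observed):
    """Posterior over programs, uniform prior, Bayes by enumeration."""
    logs = {name: log_likelihood(observed, s) for name, s in PROGRAMS.items()}
    m = max(logs.values())
    unnorm = {k: np.exp(v - m) for k, v in logs.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

# A single noisy example suffices, because the heavy lifting is done by the
# generative stroke programs rather than by masses of training data.
example = render(PROGRAMS["T"], jitter=0.7, rng=np.random.default_rng(1))
print(classify(example))
```

The point of the sketch is just that the hypothesis space is given by the generative procedures; the data serve to pick among them rather than to build the space from scratch.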
How's this different from the deep learning/big data (DL/BD) stuff? Well, by packing in prior info you can "learn" from a small number of examples, which simplifies the inductive problem. Hinton, one of the mucky-mucks in the DL/BD world, notes that this stuff is "compatible with deep learning." Yup. Nonetheless, it fits ill with the general ethos behind the enterprise. Why? Because it exploits an entirely different intuition concerning how to approach "learning." From the few discussions I have seen, DL/BD starts from the idea that if you get enough data, learning will take care of itself. Why? Because learning consists in extracting the generalizations in the data. If the relevant generalizations are there in the data to be gleaned (even if lots of data is needed to glean them, as there is often A LOT of noise obscuring the signal), then given enough data (hence the 'big' in Big Data) learning will occur. The method described here questions the utility of this premise. As Tenenbaum notes:
“The key thing about probabilistic
programming—and rather different from the way most of the deep-learning stuff
is working—is that it starts with a program that describes the causal processes
in the world,” says Tenenbaum. “What we’re trying to learn is not a signature
of features, or a pattern of features. We’re trying to learn a program that
generates those characters.”
Does this make the two approaches irreconcilable? Yes and no. No, in that one can always combine two points of view that are not logically contradictory, and these views are not. But yes, in that what one takes to be central to solving a learning problem (e.g. a pre-packaged program describing the causal process) is absent from the other. It's once again the question of where one thinks the hard problem lies: in describing the hypothesis space or in the movements around it. DL/BD downplays the former and bets on the latter. In this case, Tenenbaum does a Chomsky and bets on the former.
As many of you know, I have had my reservations concerning many of Tenenbaum's projects, but I think he is making the right moves in this case. It always pays to recognize common ground. DL/BD is the current home of unreconstructed empiricism. In this particular case, Tenenbaum is making points that challenge this worldview. Good for him, and me.
Monday, December 14, 2015
Brains do linguistic hierarchy
I was going to do something grand in praise of the paper I mentioned in an earlier post by Nai Ding, Lucia Melloni, Hang Zhang, Xing Tian and David Poeppel (DMZTP) in Nature Neuroscience (here). However, wiser heads have beaten me to the punch (see the comments section here). Still, as Morris Halle once noted, we here discuss not the news but the truth, and with a mitigated version of this dictum in mind, I want to throw in my 2 cents (which in Canada, where I am writing this now, would amount to exactly 0 cents, given the recent abandonment of the penny (all amounts are rounded to the nearest 0)). So here is my summary judgment (recall, I AM NO EXPERT IN THESE MATTERS!!!). It is the best neurolinguistics paper I have ever read. IMO, it goes one step beyond even the best neuro-ling papers in outlining a possible (as in 'potential') mechanism for a linguistically relevant phenomenon. Let me explain.
The standard good neuroling paper takes linguistically motivated categories and tries to localize them in brain geography. We saw an example of this in the Frankland and Greene paper wrt "theta roles" (see here and here) and in the Pallier et al. paper for Merge (see here). There are many other fine examples of this kind of work (see the comment section here for many other good references)[1]. However, at least to me, these papers generally don't show (and often don't even aim to show) how brains accomplish some cognitive task, but rather try to locate where in the brain the task is being discharged. DMZTP also plays the brain geography game, but aims for more. Let me elaborate.
DMZTP accomplishes several things.
First, it uncovers brain indices of hierarchy building. How does it do this? It isolates a brain measure of on-line sentence parsing, a measure that "entrains" to (correlates with) linguistically relevant hierarchy independently of prosodic and statistical properties of the input. DMZTP assume, as any sane person would, that if brains entrain to G-relevant categories during comprehension then these brains contain knowledge of the relevant categories and structures. In other words, one cannot use knowledge that one does not have (cannot entrain to data structures that are not contained in the brain). So, the paper provides evidence that brains can track linguistically significant categories and rationally concludes that the brain does so whenever confronted with linguistic input (i.e. not only in the artificial experimental conditions required to prove the claim; the brain does this reflexively whenever linguistic material is presented to it).
Showing this is no easy matter. It requires controlling for
all other sorts of factors. The two prominent ones that DMZTP controls for are
prosodic features of speech and the statistical properties of sub-sentential
inputs. Now, there is little doubt that speech comprehension exploits both
prosodic and statistical factors in parsing incoming linguistic input. The majority
opinion in the cog-neuro of language is that such features are all that the
brain uses. Indeed, many assume that brains are structurally incompatible with
grammatical rules (you know, neural nets don’t do representations) that build
hierarchical structures of the kind that GGers have been developing over the
last 60 years. Of course, such skepticism is ridiculous. We have scads of
behavioral evidence that linguistic objects are hierarchically organized and
that speakers know this and use this on line.[2]
And if dualism is false (and neuro types love to rail against silly Cartesians
who don’t understand that there are no ghosts (at least in brains)), then this
immediately and immaculately implies that brains
code for such hierarchical dependencies as well.[3]
DMZTP recognizes this (and does not interpret its results Frankland-&-Greene-ishly, i.e. as finally establishing some weak-kneed, hare-brained linguists' conjecture). If so, the relevant question
is not whether this is so, but how it is, and this resolves into a series of
other related questions: (i) What are the neural indices of brain sensitivity
to hierarchy? (ii) What parts of the brain generate these neural markers? (iii)
How is this hierarchical information coded in neural tissue? and (iv) How do
brains coordinate the various kinds of linguistic hierarchical information in
online activities? These are hard questions. How does DMZTP contribute to
answering them?
DMZTP shows that different brain frequencies track three
different linguistically relevant levels: syllables, phrases and sentences. In
particular, DMZTP shows
that cortical dynamics emerge at all timescales
required for the processing of different linguistic levels, including the
timescales corresponding to larger linguistic structures such as phrases and
sentences, and that the neural representation of each linguistic level
corresponds to timescales matching the timescales of the respective linguistic
level (1).
Not surprisingly, the relevant timescales go from shorter to longer (and the corresponding frequencies from higher to lower). Moreover, the paper shows that the frequency responses can only be
accounted for by assuming that the brain
exploits “lexical, semantic and syntactic knowledge” and cannot be explained in terms of the
brain’s simply tracking prosodic or statistical information in the signal.
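As a way of visualizing what "neural responses at timescales matching the respective linguistic levels" amounts to, here is a toy simulation (my own illustration, not DMZTP's analysis pipeline; the 4/2/1 Hz rates, amplitudes and noise level are invented for the example): a response that tracks syllables, two-syllable phrases and four-syllable sentences shows spectral peaks at all three rates, while the acoustic rhythm of the syllable stream peaks only at the syllable rate.

```python
import numpy as np

# Toy illustration: a simulated "neural" response locked to syllables (4 Hz),
# two-syllable phrases (2 Hz) and four-syllable sentences (1 Hz) shows
# spectral peaks at all three rates, while the isochronous "acoustic" rhythm
# itself only contains the syllable rate. All numbers are invented.
fs = 100                              # sampling rate (Hz)
t = np.arange(0, 40, 1 / fs)          # 40 s of simulated recording

syllable, phrase, sentence = 4.0, 2.0, 1.0
rng = np.random.default_rng(0)

acoustic = np.cos(2 * np.pi * syllable * t)                 # stimulus rhythm only
neural = (np.cos(2 * np.pi * syllable * t)                  # syllable tracking
          + 0.6 * np.cos(2 * np.pi * phrase * t)            # phrase tracking
          + 0.4 * np.cos(2 * np.pi * sentence * t)          # sentence tracking
          + 0.5 * rng.normal(size=t.size))                  # measurement noise

def amplitude_spectrum(x):
    """Frequency axis (Hz) and normalized amplitude spectrum of a signal."""
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    amps = np.abs(np.fft.rfft(x)) / x.size
    return freqs, amps

def peak_near(freqs, amps, f, tol=0.05):
    """Largest amplitude within a narrow band around frequency f."""
    band = (freqs >= f - tol) & (freqs <= f + tol)
    return amps[band].max()

freqs, a = amplitude_spectrum(acoustic)
_, n = amplitude_spectrum(neural)
for f in (sentence, phrase, syllable):
    print(f"{f:.0f} Hz   acoustic peak {peak_near(freqs, a, f):.3f}"
          f"   neural peak {peak_near(freqs, n, f):.3f}")
# Only the neural trace shows 1 Hz and 2 Hz peaks: phrase- and sentence-level
# tracking is not recoverable from the acoustics of the syllable stream alone.
```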
The tracking is actually very sensitive. One of the nicest
features of DMZTP is that it shows how “cortical responses” change as phrasal
structure changes. Bigger sentences and phrases provide different (yet similar)
profiles to shorter ones (see figure 4). In other words, DMZTP identifies
neural correlates that track sentence and phrase structure size as well as
type.
Second, DMZTP identifies the brain areas that generate the
neural “entrainment” activity they identified. I am no expert in these matters,
but the method used seems different from what I have seen before in such
papers. They used intracranial electrodes (i.e. inside brains!) to localize the generators of the activity. Using
this technique (btw, don’t try this at home, you need hospitals with consenting
brain patients (epileptics in DMZTP’s case) who are ready to allow brain
invasions), DMZTP shows that the areas that generate the syllable, phrase and
sentence “waves” spatially dissociate.
Furthermore, they show that some areas of the brain that
respond to phrasal and sentential structure “showed no significant syllabic
rate response” (5). In the words of the authors:
In other words, there are cortical circuits
specifically encoding larger, abstract linguistic structures without responding
to syllabic-level acoustic features of speech. (5)
The invited conclusion (and I am more than willing to accept
the invitation) is that there are neural circuits tuned to tracking this kind
of abstract linguistic information. Note: This does not imply that these circuits are specifically tuned to exclusively
tracking this kind of information. The linguistic specificity of these brain
circuits has not been established. Nor has it been established that these kinds
of brain circuits are unique to humans. However, as DMZTP clearly knows, this
is a good first (and necessary) step towards studying these questions in more
detail (see the DMZTP discussion section). This, IMO, is a very exciting
prospect.
The last important contribution of the DMZTP paper lies in a
speculation. Here it is:
Concurrent neural tracking of hierarchical
linguistic structures provides a plausible functional mechanism for temporally
integrating smaller linguistic units into larger structures. In this form of
concurrent neural tracking, the neural representation of smaller linguistic
units is embedded at different phases of the neural activity tracking a higher
level structure. Thus, it provides a
possible mechanism to transform the hierarchical embedding of linguistic
structures into hierarchical embedding of neural dynamics, which may
facilitate information integration in time.
(5) [My emphasis, NH]
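To make the phase-embedding idea slightly more concrete, here is a toy illustration (my own, not DMZTP's model; the rates, slot counts and gating scheme are invented): a slow "sentence-rate" oscillation acts as a phase clock, and each syllable-sized unit's activity is confined to its own phase bin of that slow cycle, so a unit's position within the larger structure can be read off the phase at which its activity occurs.

```python
import numpy as np

# Toy "phase embedding": four syllable-sized slots per one-second sentence
# cycle. Each unit's activity is confined to its own quarter of the slow
# cycle, so the phase of the sentence-rate oscillation encodes where in the
# larger structure a smaller unit sits. All parameters are invented.
fs = 1000                              # samples per second
t = np.arange(0, 1, 1 / fs)            # one sentence cycle at 1 Hz
sentence_phase = (2 * np.pi * 1.0 * t) % (2 * np.pi)

n_slots = 4                            # syllable-sized positions per sentence
activity = np.zeros((n_slots, t.size))
for k in range(n_slots):
    lo, hi = 2 * np.pi * k / n_slots, 2 * np.pi * (k + 1) / n_slots
    in_slot = (sentence_phase >= lo) & (sentence_phase < hi)
    # fast syllable-scale activity gated by the slow sentence-rate phase
    activity[k] = in_slot * np.abs(np.sin(2 * np.pi * 4.0 * t))

def slot_from_phase(sample_index):
    """Read a unit's position in the sentence off the slow oscillation's phase."""
    return int(sentence_phase[sample_index] // (2 * np.pi / n_slots))

for idx in (100, 350, 600, 850):       # moments spread across the cycle
    print(f"t = {t[idx]:.2f} s  ->  slot {slot_from_phase(idx)}")
```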
DMZTP relates this kind of brain wave embedding to
mechanisms proposed in other parts of cog-neuro to account for how brains
integrate top-down and bottom-up information and allows for the former to
predict properties of the latter. Here's DMZTP:
For language processing, it is likely that
concurrent neural tracking of hierarchical linguistic structures provides
mechanisms to generate predictions on multiple linguistic levels and allow
interactions across linguistic levels….
Furthermore, coherent synchronization to the
correlated linguistic structures in different representational networks, for
example, syntactic, semantic and phonological, provides a way to integrate
multi-dimensional linguistic representations into a coherent language percept
just as temporal synchronization between cortical networks provides a possible
solution to the binding problem in sensory processing.
(5-6)
So, the DMZTP results are
theoretically suggestive and fit well with other current theoretical speculations
in the neural literature for addressing the binding problem and for providing a
mechanism that allows for different kinds of information to talk to one
another, and thereby influence online computation.
More particularly, the low
frequency responses to which sentences entrain are
… more distributed than high-gamma activity [which
entrain to syllables, NH], possibly reflecting the fact that the neural
representations of different levels of linguistic structures serve as inputs to
broad cortical areas. (5)
And
this is intriguing for it provides a plausible way for the brain to use high
level information to make useful predictions about the incoming input (i.e. a
mechanism for how the brain uses higher level information to make useful
top-down predictions).[4]
There is one last really wonderful speculation: that the oscillations DMZTP has identified are “related to intrinsic, ongoing neural oscillations” (6). If they are, then
this would ground this speech processing system in some fundamental properties
of brain dynamics. In other words, and this is way over the top, (some of) the
system’s cog-neuro properties might reflect the most general features of brain
architecture and dynamics (“the timescales of larger linguistic structures fall
in the timescales, or temporal receptive windows that the relevant cortical
networks are sensitive to”). Wouldn’t that be amazing![5] Here is DMZTP again:
A long-lasting controversy concerns how the neural
responses to sensory stimuli are related to intrinsic, ongoing neural oscillations.
This question is heavily debated for the neural response entrained to the
syllabic rhythm of speech and can also be asked
for neural activity entrained to the time courses of larger linguistic
structures. Our experiment was not designed to answer this question; however,
we clearly found that cortical speech processing networks have the capacity to
generate activity on very long timescales corresponding to larger linguistic
structures, such as phrases and sentences. In other words, the timescales of
larger linguistic structures fall in the timescales, or temporal receptive
windows that the relevant cortical networks are sensitive to. Whether the
capacity of generating low-frequency activity during speech processing is the
same as the mechanisms generating low-frequency spontaneous neural oscillations
will need to be addressed in the future. (6)
Let
me end this encomium with two more points.
First,
a challenge: Norbert, why aren’t you critical of the hype that has been
associated with this paper, as you were of the PR surrounding the Frankland
& Greene (F&G) piece (see here and here)? The relevant text
for this question is the NYU press release (here). The reason is that,
so far as I can tell, the authors of DMZTP did not inflate their results the
way F&G did. Most importantly, they did not suggest that their work
vindicates Chomsky’s insights. So, in the paper, the authors note that their
work “underscore the undeniable existence of hierarchical structure building
operations in language comprehension” (5). These remarks then footnote standard
papers in linguistics. Note the adjective ‘undeniable.’
Moreover,
the press release is largely accurate. It describes DMZTP as “new support” for
the “decades old” Chomsky theory that we possess an “internal grammar.” It
rightly notes that “psychologists and neuroscientists predominantly reject this viewpoint” and believe that linguistic knowledge is “based on both statistical calculations between words and sound cues to structure.” This, sadly, is the received wisdom in the cog-neuro and psych world, and we know why (filthy Empiricism!!!). So, the release does not misdescribe the state of play and does
not suggest that neuroscience has finally provided real evidence for a heretofore airy-fairy speculation. In fact, it
seems more or less accurate, hence no criticism from me. What is sad is the noted
state of play in psych and cog-neuro, and this IS sad, very very sad.
Second,
the paper provides evidence for a useful methodological point: that one can do
excellent brain science using G theory that is not at the cutting edge. The G
knowledge explored is of Syntactic
Structures (SS) vintage. No Minimalism here. And that’s fine. Minimalism
does not gainsay that sentences have the kinds of structures that SS
postulated. It suggests different generative mechanisms, but not ones that
result in wildly different structures. So, you out there in cog-neuro land:
it’s ok to use G properties that are not at the theoretical cutting edge. Of
course, there is nothing wrong with hunting for Merge (go ahead), but many
questions clearly do not need to exploit the latest theoretical insight. So no more excuses about how ling theory is always changing and so is hard to use and so complicated, yada yada yada.
That’s
it. My 2 cents. Go read the paper. It is very good, very suggestive and, oddly
for a technical piece, very accessible. Also, please comment. Others may feel
less enthralled than I have been. Tell us why.
[1]
I would include some recent papers by Liina Pylkkänen on adjectival modification
in this group as well.
[2]
These are two different claims: it could be that the linguistic knowledge
exists but is not used online. However, we have excellent evidence for both the
existence of grammatical knowledge and its on-line usage. DMZTP provides yet more evidence that such
knowledge exists and is used online.
[3]
P.S. Most who rail against dualism really don’t seem to understand what the
doctrine is. But, for current purposes, this really does not matter.
[4]
Note, the paper does not claim to explain how
hierarchical information is coded in the brain. It might be that it is actually
coded in neural oscillations. But
DMZTP does not claim this. It claims that these oscillations reflect the
encoding (however that is done) and that they can be used to possibly convey
the relevant information. David Adger makes this point in the comments section
of the earlier post on the DMZTP paper. So far as I can tell, DMZTP gives no hostages as to how the G information is coded in brains. It is, for example, entirely consistent with the possibility that a Gallistel-like DNA coding of this info is correct. All the paper does is note that these oscillations are
excellent indices of such structure, not that they are the neural bases of this knowledge.
[5]
Here’s a completely wild thought: imagine if we could relate phases to the
structure of these intrinsic oscillations? So the reason for the phases we have
is that they correspond to the size of the natural oscillations which subvene
language use. Now that would be something. Of course, at present there is zero reason
to believe anything like this. But then again, why exactly phases exist and are
the ones there are is theoretically ungrounded even within linguistics. That
suggests that wild speculation is apposite.
Tuesday, December 8, 2015
Once we are on the topic of grad education
Here is a recent piece on grad education in Nature. The "problem" it points to is that academia is producing more PhDs than there are academic positions to fill. This is true, it appears, in virtually every domain, including hard sciences like physics and biology. I am pretty sure that this applies to linguistics as well. I put 'problem' in scare quotes for it is an interesting question what kind of problem this is. Let's stipulate that in the best of all possible worlds PhDs only go to the deserving and all the deserving get the jobs they desire. What is the best policy when this is not what we have? One reaction is that we should not produce more PhDs than there are academic positions for. Another is that we train PhDs for several kinds of career tracks (with several kinds of PhDs), only some of which are academic. And another is that we train PhDs as we have always done but are up front about the dangers of employment disappointment. All of these options are reviewed in the Nature piece.
For what it is worth, I am not sure what the right thing to do is. I do know that given this situation we should really be making sure that PhDs do not go into debt in order to get a degree. This requires that grad programs find ways to fully fund their students. As for the rest, I really don't know. There is something paternalistic about putting quotas on PhDs just because landing a job at the end is problematic. We must be clear about the prospects; we should never lie to students or mislead them. But once the employment facts are made clear, are there further responsibilities? Maybe offering courses that make career options more flexible in the worst case? I don't know. However, this is, sadly, once again, a very important problem that could use some discussion.
When I graduated, jobs were not thick on the ground. Grad students were told that landing a job was chancy, even if one graduated from a good place with a good PhD. This did not serve to deter many of us. We all thought that we would be ok. Some were. Some weren't. Looking back, I am not sure that I would endorse a system that pre-culled us. Why? Because doing grad work was rewarding in itself. It was fun doing this stuff and, moreover, it turns out, in retrospect, that figuring out who would be the successful and who the less successful was not antecedently obvious. There may not be a better system.
This is an important problem. How should we proceed? Grad students out there are especially invited to chime in.
The brain does linguistic hierarchy
I hope to blog on this paper by Poeppel and friends (the authors are Ding, Melloni, Zhang, Tian and Poeppel) more in the near future. However, for now, I wanted to bring it to your attention. This is an important addition to the growing number of papers that are beginning to use linguistic notions seriously to map brain behavior. There have been earlier papers by Dehaene and Friederici and Frankland and Greene and, no doubt, others. But it looks to me like the number of these kinds of papers is increasing, and I take this to be a very good thing. Let's hope it soon puts to sleep the idea, still prevalent among influential people, that hierarchy is tough or unnatural for brains to do, despite the overwhelming behavioral evidence that brains/minds indeed do it (see here for a monumentally dumb paper).
One warning: some of the hype around the Poeppel & Co paper reports it as final vindication of Chomsky's views (see here). From a linguist's point of view, it is rather the reverse: this is a vindication of the idea that standard neuro methods can be of utility in investigating human cog-neuro capacities. In fact, the title of the Poeppel & Co paper indicates that this is how they are thinking of it as well. However, the hype does in fact respond to a standing prejudice in the brain sciences, and so advertising the results in this way makes some rhetorical sense. As the Medical Press release accurately notes:
Neuroscientists and psychologists predominantly reject this viewpoint, contending that our comprehension does not result from an internal grammar; rather, it is based on both statistical calculations between words and sound cues to structure. That is, we know from experience how sentences should be properly constructed—a reservoir of information we employ upon hearing words and phrases. Many linguists, in contrast, argue that hierarchical structure building is a central feature of language processing.
And given this background, a little corrective hype might be forgivable. By the way, it is important to understand that the result is linguistically modest. It shows that hierarchical dependencies are something the brain tracks and that stats cannot be the explanation for the results discovered. It does not tell us what specific hierarchical structures are being observed and which linguistic structures they might point to. That said, take a look. The paper is sure to be important.
Monday, December 7, 2015
How deep are typological differences?
Linguists like to put languages into groups. Some of these, as in biology, are groupings based on historical descent (Germanic vs Romance), some of long standing (Indo-European vs Ural-Altaic vs Micronesian). Some categorizations are more sensitive to morpho-syntactic form (analytic vs agglutinative) and some are tied to whether they got to where they are spoken by tough guys who rode little horses over long distances (Finno-Ugric (and Basque?)). There is a tacit agreement
that these groupings are significant typologically and hence linguistically significant as well. In what follows, I want
to query the ‘hence.’ I would like to offer a line of argument that concludes
that typological differences tell us nothing about FL. Or, to put this another
way, the structure of FL in no way reflects the typological differences that
linguists have uncovered. Or, to put this in reverse, typological distinctions
among languages have no FL import. If this is correct, then typology is not a
good probe into the structure of FL. And so if your interest is the structure
of FL (i.e. if limning the fine structure of FL is how you measure linguistic
significance), you might be well advised to study something other than
typology.
Before proceeding let me confess that I am not all that
confident about the argument that follows. There are several reasons for this. First,
I am unsure that the premises are as solid as I would like them to be. As you
will see, it relies on some semi-evolutionary speculation (and we all know how
great that is, not!). Second, even given the premises, I am unsure that the
logic is airtight. However, I think that the argument form is interesting and
it rests on widely held minimalist premises (based on a relatively new and,
IMO, very important observation regarding the evolutionary stability of FL), so
even if the argument fails it might tell us something about these
premises. So, with these caveats, cavils,
hedges and CYAs out of the way, here is the argument.
Big fact 1: the stability
of FL. Chomsky has emphasized this point recently. It is the observation that
whatever change (genetic, epi-genetic, angelic) led to the re-wiring of the
human brain thus supporting the distinctive species specific nature of human
linguistic facility, whatever change that was, it has remained intact and
unchanged in the species since its biological entrance. How do we know?
We know because of Big
Fact 2: any kid can learn any language and any kid learning any language
does so in essentially the same way. Thus, for example, a kid from NYC raised
in Papua New Guinea (PNG) will acquire the local argot just like a native (and
in the same way, with the same stages, making the same kinds of mistakes etc.).
And vice versa for a PNGer in NYC, despite the relative biological isolation of
PNGers for a pretty long period of time. If you don’t like this pair, plug in
any you would like, say Piraha speakers and German speakers or Hebrew Speakers
and Japanese. A child’s biological background seems irrelevant to which Gs it
can acquire and how it acquires them. Thus, since humans separated about 100kya
(trek out of Africa and all that), FL has remained biologically stable in the
species. It has not changed. That’s the big fact of interest.
Now, observe that 100k years is more than enough time for
evolution to work its magic. Think of Darwin’s finches. As soon as a niche
opened up, these little critters evolved to exploit it. And quickly filling
niches is not reserved just for finches. Humans do the same thing. Think of
lactase persistence (here).
The capacity to usefully digest milk products arose with the spread of cattle
domestication (i.e. roughly 5-10kya).[1]
So, humans also evolutionarily track novel “environmental” options and change
to exploit them at a relatively rapid rate. If 5-10k years is enough for the
evolution of the digestive system, then 100k years should be enough for FL to
“evolve” should there be something there to evolve. But, as we saw above, this
seems to be false. Or, more accurately, Big Fact 2 implies Big Fact 1 and Big
Fact 1 denies that FL has evolved in the last 100k years. In sum, it seems that
once the change allowing FL to emerge occurred nothing else happened evolution
wise to differentially affect this capacity across humans. So far as we can
tell, all human FLs are the same.
We can add to this a third “observation,” or, more
accurately, something I believe that linguists think is likely to be true
though we probably only have anecdotal evidence for it. Let’s call this Big Fact 3 (understanding the slight
tendentiousness of the “fact” part): kids can learn multiple first languages simultaneously and do so in the same way regardless of the languages involved.[2]
So, LADs can acquire English and Hebrew (a Germanic and Semitic language) as
easily as German and Swedish (two Germanic languages), or Navajo and French or
Basque and Spanish as easily as French and Spanish or… In fact, kids will
acquire any two languages no matter how typologically distinct in effectively
the same way. In short, typological difference has no discernable impact on the
course of acquisition of two first languages. So, not only is there no
ethnically-biologically based genetic pre-disposition among FLs for some Gs
over others, there is not even a cognitive preference for acquiring Gs of the
same type over Gs that are typologically radically different.
If these “facts” are indeed facts, the conclusion seems
obvious: to the degree that we understand FL as that cognitive-neural feature
of humans that underlies our capacity to acquire Gs then it is the same across
all humans (in the same sense that hearts or kidneys are, (i.e. abstracting
from normal variation)) and this implies that it has not evolved despite apparently
sufficient time for doing so.[3]
This raises an obvious question: why not? Why does the
process of language acquisition not care about typological differences? Or, if
typological differences run deep then why have they had no impact on the FLs of
people who have lived in distinct linguistic eco-niches?
Here’s one obvious answer: typological differences are
irrelevant to FL. However big these differences may seem to linguists, FL sees
these typologically “different” languages as all of a piece. In other words, from
the point of view of FL, typological variation is just surface fluff.
Same thing, said differently: there is a difference between
variation and typology. Variation is a fact, languages appear on the surface to
have different properties. Typology is a mid-level theoretical construct. It is
the supposition that variation comes in family types, or, that variation is (at
least in part) grammatically principled. The argument above does not question
the fact of variation. It calls into question whether this variation is in any
FL sense principled, whether the mid level construct is FL significant. It
argues it isn’t.
Let me put this last point more positively. Variation
establishes a puzzle for linguists in that kids acquire Gs that result in
different surface features. So, FL plus a learning theory must be able to
accommodate variation. However, if the above is on the right track, then this
is not because typological cleavages reflect structural fault lines (or
G-attractors) in FL or the learning theory. How exactly FL and learning
theories yield distinctive Gs is currently unknown. We have good case studies
of how experience can fix different Gs with different surface properties but I
think it is fair to say that there is still lots more fundamental work to be
done.[4]
Nonetheless, even without knowing how
this happens, the argument above suggests that it does not happen in virtue of
a typologically differentiated FL.
Let me end with one last observation. Say that the above is
correct, it seems to me that a likely corollary is that FL has no internal
parameters. What I mean is that FL does not determine a finite space of possible
Gs, as GB envisioned. Why not?
Well say that acquisition consisted in fixing the values of
a finite series of FL internal open parameters. Then why wouldn’t evolution
have fixed the FL of speakers of typologically isolated languages so that the
relevant typological parameters were no longer open? On the assumption that
“closing” such a parameter would yield an acquisition advantage (fixing
parameters would reduce the size of the parameter space, so the more fixed parameters
the better as this would simplify the acquisition problem), why wouldn’t
evolution take advantage of the eco-niche to speed up G acquisition? Thus, why
wouldn’t humans be like finches with FLs quickly specializing to their
typological eco-niches? Doesn’t this suggest that parameters are not internal
properties of FL?
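The arithmetic behind the presumed acquisition advantage is straightforward: n open binary parameters define a space of 2^n candidate Gs, and innately fixing k of them shrinks the space to 2^(n-k). A few-line illustration (the parameter counts are made up for the example):

```python
# n open binary parameters define 2**n candidate grammars; innately fixing k
# of them leaves 2**(n - k). The numbers below are purely illustrative.
n_parameters = 30
for k_fixed in (0, 5, 10, 20):
    remaining = 2 ** (n_parameters - k_fixed)
    print(f"{k_fixed:2d} parameters fixed -> {remaining:,} candidate Gs")
```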
I am pretty sure that readers will find much to disagree
with here. That’s great. I think that the line of reasoning above is reasonable
and hangs together. Moreover, if correct, I believe that it is important for
pretty obvious reasons. But, there are sure to be counter-arguments and other
ways of understanding the “facts.” Can’t wait.
[1]
The example provided by Bill Idsardi. Thanks.
[2]
By two “first” languages I intend to signal the fact that this is a different
process from second language acquisition. Form the little I know about this
process, there is no strcit upper bound on how many first languages one can
simultaneously acquire, though I am willing to bet that past 3 or 4 the process
gets pretty hairy.
[3]
It also strongly casts doubt on the idea that FL itself is the product of an
evolutionary process. If it is, the question becomes why did it stop when it
did and not continue after humans separated? Why no apparent changes in the
last 100k years?
[4]
Charles Yang has a forthcoming book on this topic (which I heartily recommend)
and Jeff Lidz has done some exemplary work showing how to think of FL and
learning theory together to deliver integrated accounts of real time language acquisition. I am sure that there is other work
of this kind. Feel free to mention them in comments.