I want to say a couple of words about David Poeppel’s second
lecture (see here), the one in which he argues that brains entrain to G-like structures in online
speech comprehension. In earlier posts (e.g. here),
I evinced some skepticism about whether it will be possible for linguistics to
make much contact with brain work (BW) until BW starts finding the neural
analogues of classical CS notions like register, stack, index, buffer, etc. Here
I want to both elaborate on and mitigate this skepticism in light of
David’s lectures.
Whence my skepticism? The reason lies with BW methodology. Most
BW focuses on what the brain does while
being subjected to a particular stimulus. So for example, how does the brain
“react” when peppered with sentences that vary in grammaticality (note: I mean
‘grammaticality’ here and not ‘acceptability’).[1]
So, while a brain is processing an
actual linguistic object, what does it do? By its very nature, this kind of
experiment can only indirectly make contact with the basic claims of GG. Why?
Because, GG is a theory of competence (i.e. what speakers know), and what BW probes is not knowledge (linguistic or
otherwise) per se but how this
knowledge is deployed in real time (e.g. how it is used to perform the task of analyzing the incoming speech stream). I strongly believe that what you
know is implicated in what you do. But what you do (and how you do it) does not
rely exclusively on what you know.
That’s why it is useful to make the competence/performance distinction. And,
back to the main point, BW experiments are all based on online performance
tasks where the details of the performance system matter to the empirical
outcomes (or so I would suppose).[2]
But if these details matter, then in order to even see the contribution of Gs
(or FL/UG) it is imperative to have performance systems of the kind that we
believe are cognitively viable, and the ones that we think are such are largely
combinations of Gs embedded in classical computing devices with Turing-like
architectures.[3]
Nor is this mix an accident if people like Randy Gallistel
(and Jerry Fodor and Zenon Pylyshyn and Gary Marcus) are to be believed (and I
for one make it a cognitive policy to believe most of what Randy (and Jerry)
says). Turing architectures are just what cog-neuro (CN) needs to handle
representations, and representations are something that any theory hoping to
deal with basic cognition will need. As I have gone over these arguments before,
I will not further worry this point. But, rest assured, I take this very
seriously and because of this I draw the obvious conclusion: until BW finds the
neural analogues of basic Turing architecture the possibility of BW and
cognition (including linguistics) making vigorous contact will be pretty low.
That said, one of the interesting features of David’s second
lecture is that he showed that it is not impossible. In fact, the general
program he outlines is very exciting and worth thinking about from a linguistic
point of view. This is what I will try to do here (with all the appropriate
caveats concerning my low skill set etc.).
The lecture focuses on what David describes as “an
interesting alignment between…systems neuroscience….physics…(and) linguistics”
(slide 21). More particularly, he argues that the brain has natural “theta rhythms” of 4-8 Hz, that, physically speaking, the modulation spectrum of speech is 4-5 Hz, and that the mean syllable duration cross-linguistically is 150-300 ms, which is the right size to fit into these theta bands. So, if speech processing requires chunking into brain-sized units, then we might expect the stream to be chopped into theta-band-sized chunks for the brain to examine in order to
extract further linguistic information necessary to get one to the relevant
interpretation. And what we expect, David claims, we in fact find. The brain
seems to entrain to syllables, phrases and sentences. Or, more exactly, we can
find neural measures that seem to correlate with each such linguistic unit (see
slides 29-34).
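Just to make the claimed alignment concrete: an oscillation at f Hz completes a cycle every 1/f seconds, so the figures above already do the work. The snippet below is only a sketch of that arithmetic, using the numbers cited in the lecture:

```python
# Back-of-the-envelope check: do theta cycles "fit" syllables?
# Figures are those cited in the lecture; the math is just period = 1 / frequency.
theta_low_hz, theta_high_hz = 4.0, 8.0
period_long_ms = 1000.0 / theta_low_hz    # 250 ms per cycle at 4 Hz
period_short_ms = 1000.0 / theta_high_hz  # 125 ms per cycle at 8 Hz
syllable_ms = (150.0, 300.0)              # cross-linguistic mean syllable duration

print(f"theta cycle: {period_short_ms:.0f}-{period_long_ms:.0f} ms")
print(f"syllable:    {syllable_ms[0]:.0f}-{syllable_ms[1]:.0f} ms")
# The two ranges overlap substantially; that overlap is the "fit" being claimed.
```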
Now showing this is not at all trivial, and therein lies the
beauty of the experiments that David so lovingly reported (it was actually a
bit embarrassing to see him caressing those findings so intimately in such a
public setting (over 150 people watching!!)).
Here’s a post with a discussion of (and a link to) the paper that David discussed.
Suffice it to say that what was required to make this convincing was
controlling for the many factors that likely correlate with the syntactic
structure. So the Ding et al (David being the last al) paper stripped out
prosodic information and statistical transitional probability information so
that only linguistic information concerning phrase structure and sentence
structure remained. What the paper showed is that even in the absence of these cues in the occurrent stimulus the brain entrained to phrases and sentences in addition to syllables. So, the conclusion: brains of native speakers can track the G-like structures of their native languages online. In other words, it looks like brains use Gs in online
performance.
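To see why stripping out prosody and transition statistics matters, here is a toy sketch of the frequency-tagging logic as I understand it. It is my own illustration, not the authors’ analysis code, and the particular rates (4 syllables per second, two-syllable phrases, four-syllable sentences) are assumptions chosen for the example. The acoustics of such a stimulus only contain energy at the syllable rate, so any extra spectral peaks at the phrase and sentence rates have to come from structure the listener’s brain is building:

```python
# Toy illustration of frequency tagging (a sketch, not the Ding et al analysis).
# Syllables arrive at a steady 4/s; phrases and sentences exist only as groupings.
import numpy as np

fs = 100                      # samples per second for the simulated response
t = np.arange(0, 60, 1 / fs)  # one minute of "listening"

def boundary_train(rate_hz):
    """Unit impulses at the onsets of units occurring rate_hz times per second."""
    sig = np.zeros_like(t)
    sig[np.arange(len(t)) % int(fs / rate_hz) == 0] = 1.0
    return sig

syllables = boundary_train(4.0)   # present in the stimulus itself
phrases   = boundary_train(2.0)   # "present" only if the brain builds 2-syllable phrases
sentences = boundary_train(1.0)   # ditto for 4-syllable sentences

# A response that marks syllable, phrase, and sentence onsets.
response = syllables + 0.5 * phrases + 0.5 * sentences
spectrum = np.abs(np.fft.rfft(response - response.mean()))
freqs = np.fft.rfftfreq(len(response), 1 / fs)

for f_target in (1.0, 2.0, 4.0):
    idx = np.argmin(np.abs(freqs - f_target))
    print(f"power near {f_target:.0f} Hz: {spectrum[idx]:.1f}")
# The syllable train alone contributes power only at 4 Hz and its harmonics;
# the 1 Hz and 2 Hz peaks appear only because phrase and sentence boundaries
# are being marked, i.e. because larger units are being built.
```

The point of the toy is just this: with prosodic and statistical cues removed, the slower peaks cannot be read off the signal, which is why they count as evidence that the brain is imposing phrasal and sentential structure.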
Now, gorgeous as this is, it is important to understand that
the result is very unsurprising from
a linguistic point of view. The conventional position is that speakers use their knowledge to parse the incoming signal. On the assumption that humans do this in virtue of some feature of their brains (rather than, say, their hamstrings or
pituitary glands) then it is not surprising to find a neural correlate of this
performance. Nor do David and friends believe otherwise. Oddly, the finding caused something of a buzz in the lecture hall, and this only makes sense if the
idea that people use their Gs in performance is considered a way-out-there kind
of proposition.
I should add that nothing about these results tells us where this Gish knowledge comes from. None of the entrainment results implicate piles
of innate brain structure or a genetic basis for language or any of the other
“sexy” (viz. conceptually anodyne) stuff that gets a psychologist or
neuroscientist hopping madly around flailing his/her arms and/or pulling
his/her hair. All it implicates is the existence of something like our old
recognizable linguistic representations (representations, btw, which are
completely uncontroversial linguistically in that they have analogues in
virtually every theory of syntax that I know of). They imply that sentences
have hierarchical structures of various kinds (syllables, phrases, sentences)
and that these are causally efficacious in online processing. How could this
conclusion possibly cause a stir?
It shouldn’t, but it did. That we can find a neural correlate of what we
know to be happening is interesting and noteworthy, but it is not something
that we did not expect to exist, at least absent a commitment to Cartesian Dualism
of the most vulgar kind (and yes, there are sophisticated forms of dualism).
What structures do the neural signals correlate with? Many
have noted that the relevant level may not be syntactic. In particular, what we
are tracking might be phonological phrases, rather than syntactic ones (I said
as much at Nijmegen and Ellen Lau noted this in a comment here).
However, this does not really matter for the main conclusion. Why? Because the
phonological phrases that might be being tracked are products of an underlying syntax by which they are determined. Here’s what I mean.
Say that what is being tracked are phonological phrases of
some sort. What the experiment shows is that brains are not tracking these objects in virtue of their phonological
structure. The relevant phonological information has been weeded out of the
stimuli, as has the statistical information. So, if the brain is entraining to phonological phrases, then it is
doing so in virtue of tracking syntactic information. Now, again, every theory
I know of has a mapping between syntactic and intonational structure so the
fact that one is tracking the latter by analyzing structure according to the
former is not a surprise and it remains evidence that the brain can use (and
does use) G information in parsing the incoming speech string. So the general
conclusion that David draws (slide 57) seems to me perfectly correct:
1. There are cortical circuits in the brain that “generate slow rhythms matching the time scales of larger linguistic structures, even when such rhythms are not present in the speech input” and this “provides a plausible mechanism for online building of large linguistic structures.”
2. Such tracking is rule/grammar based.
So what is exciting is not the conclusion (that brains use G
information in performance) but the fact that we now have a brain measure of
it.
There is also an exciting suggestion: that what holds for
the syllable also holds for phrases and sentences. Here’s what I mean. There
is ample evidence, which David reviews in lecture 2, that the physics of speech results in chunks that have a nice physical description, that fit into nice neural bands, and that find a linguistic analogue: the syllable. So there is a kind of physical/neural basis
for this.
The question is whether this analogy extends to larger units.[4] What David shows is that the brain entrains to these units, and that it does so using the theta and delta band oscillations. However, is it plausible
that phrases and sentences come (at least on average) in certain physical
“sizes” that neatly fit into natural brain cyclic bands so that we can argue
that there is a physical/neural natural size that phrases and sentences conform
to? This seems like a stretch. Is it?
I have no idea. But, maybe we shouldn’t dismiss the analogy
that David is pushing too quickly. Say that it is phono phrases that the Ding
et al paper is tracking. The question then is what kind of inventory of phono phrases we find. We know that these don’t perfectly track syntactic phrases
(remember: this is the cat, that ate the rat, that stole the cheese, that…).
There are “readjustments” that map constituents to their phonological
expressions. The question then is whether it is too far-fetched to think that something like this can hold for phrases and sentences as well as for syllables.
Might there be a standard “average” size of a phrase (or a phase that contains
a phrase) say? One that is neatly packaged in a brain wave with the right bandwidth?
This doesn’t seem too far-fetched to me, but how would I know (remember, I know
squat about these matters). At any rate, this is the big idea that David’s
second and third lectures hint at. We are looking for natural neural envelopes within which interesting syntactic processing takes place. We are not looking at syntactic processing itself, or at least neither David’s lectures nor the Ding et al paper suggest that we are, but at the containers the brain wraps that processing in when analyzing speech online.
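Whether such neural envelopes exist at the phrase scale is an empirical question, but the raw arithmetic does not obviously rule it out. The delta-band range and the phrase length in syllables below are my own ballpark assumptions, not figures from the lectures:

```python
# Same back-of-the-envelope game as for syllables, now for short phrases.
# The delta range (~0.5-4 Hz) and the 2-4 syllable "typical" phrase are my guesses.
delta_low_hz, delta_high_hz = 0.5, 4.0
delta_cycle_ms = (1000.0 / delta_high_hz, 1000.0 / delta_low_hz)   # 250-2000 ms

syllable_ms = (150.0, 300.0)
phrase_syllables = (2, 4)
phrase_ms = (phrase_syllables[0] * syllable_ms[0],
             phrase_syllables[1] * syllable_ms[1])                 # 300-1200 ms

print(f"delta cycle:         {delta_cycle_ms[0]:.0f}-{delta_cycle_ms[1]:.0f} ms")
print(f"2-4 syllable phrase: {phrase_ms[0]:.0f}-{phrase_ms[1]:.0f} ms")
# On these (made-up) numbers a short phrase sits comfortably inside a delta
# cycle, which is the kind of packaging the envelope idea would require.
```

Nothing in this guarantees that phrases actually come in such packages, of course; it only says the analogy is not numerically hopeless.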
That said, the envelope thesis would be interesting were it
true, even for a syntactician like me. Why? Because this could plausibly
constrain the mapping between (let’s call them) phonological phrases and syntactic phrases, and this might in turn tell us something about syntactic
phrases. Of course, it might not. But it is a useful speculation worth further
investigation, which, I am pretty certain, is what David is going to do.
To wrap up: I am pretty skeptical that current neuro methods
can make contact with much of what linguistics does but not because of any
problems with linguistics. The problem lies, in the main, with the fact that BW
has stopped looking for the structures required to tell any kind of performance
story. In a word, it has severed its ties to classical CS as Gallistel has
cogently argued. I believe that once we
find classical architectures in the brain (and they are there as the cognitive
evidence overwhelmingly shows us) then contact will be rampant and progress will be made in understanding how brains do language. This, however, will still
mainly tell us about how Gs get used and so only indirectly shine a light on
what kinds of representations Gs have. Of course, you can find lost keys even
in indirect light, so this is nothing to sneeze at, or pooh-pooh, or sneer at, or cavalierly dismiss, or… What David’s work has shown is that it might be possible
to find interesting linguistically significant stuff even short of cracking the
Turing architecture/representation problem. David’s work relies on the idea (truism?)
that brains chunk information and the bold hypothesis that this chunking is
both based in brain architecture and might (at least indirectly) correlate with
significant linguistic units. He has made this conjecture solid when it comes
to syllables. It would be a really big deal if he could extend this to phrases
and sentences. I hope he is right and he can do it. It would be amazing and
wonderful.
One last comment: say that David is wrong. What would it
mean for the relation between linguistics and BW? Not much. It would leave us
roughly where we are today. The big problem, IMO, is still the one that
Gallistel has identified. This is true regardless of whether David’s hope is
realizable. Neural envelopes for packaging representations are not themselves representations.
They are what you stuff representations into. The big goal of the cog-neuro
game should be to understand the neural bases of these representations. If
David is right, we will have a narrower place to look, but just as you cannot
tell a book by its cover (although, personally I often try to) so you cannot
deduce the (full) structure of the representations from the envelopes they are
delivered in. So should David fail, it would be really too bad (the idea is a
great one) but it would not alter much how we should understand the general
contours of the enterprise. On the other hand, if David wins, we all do. If his
conjecture stops at the syllable, this has no implications about the neural
reality of phrases and sentences. They are very real, whether or not our
current technology assigns them stable neural correlates.
[1]
A HW problem: what’s the difference?
[2]
In fact, the whole point is that these are online tasks that allow you to look into what is being done as it is being done. This contrasts with offline tasks, say, acceptability judgment tasks, where all we look at is the final step of what
all concede to be a long and complicated process. Interestingly, it appears
that taking a look at this last step allows for a better window into the
overall system of knowledge than does looking at how these steps are arrived
at. This is not that surprising when one thinks about it, despite the air of
paradox. Think forests and trees.
[3]
Think Marcus parsers or the Berwick and Weinberg left corner parser (a favorite
of mine for personal reasons).
[4]
This is what Ellen was skeptical of.