Tuesday, February 9, 2016

David Poeppel's second provocation

I want to say a couple of words about David Poeppel’s second lecture (see here), the one in which he argues that brains entrain to G-like structures in online speech comprehension. In earlier posts (e.g. here), I evinced some skepticism about whether it will be possible for linguistics to make much contact with brain work (BW) until BW starts finding the neural analogues of classical CS notions like register, stack, index, buffer, etc. Here I want both to elaborate on and to mitigate this skepticism in light of David’s lectures.

Whence my skepticism? The reason lies with BW methodology. Most BW focuses on what the brain does while being subjected to a particular stimulus. So, for example, how does the brain “react” when peppered with sentences that vary in grammaticality (note: I mean ‘grammaticality’ here, not ‘acceptability’)?[1] In other words: while a brain is processing an actual linguistic object, what does it do? By its very nature, this kind of experiment can only indirectly make contact with the basic claims of GG. Why? Because GG is a theory of competence (i.e. what speakers know), and what BW probes is not knowledge (linguistic or otherwise) per se, but how this knowledge is deployed in real time (e.g. how it is used to analyze the incoming speech stream). I strongly believe that what you know is implicated in what you do. But what you do (and how you do it) does not rely exclusively on what you know. That’s why it is useful to make the competence/performance distinction. And, back to the main point, BW experiments are all based on online performance tasks in which the details of the performance system matter to the empirical outcomes (or so I would suppose).[2] But if these details matter, then in order to even see the contribution of Gs (or FL/UG) it is imperative to have performance systems of the kind that we believe are cognitively viable, and the ones that we think are such are largely combinations of Gs embedded in classical computing devices with Turing-like architectures.[3]

Nor is this mix an accident if people like Randy Gallistel (and Jerry Fodor and Zenon Pylyshyn and Gary Marcus) are to be believed (and I for one make it a cognitive policy to believe most of what Randy (and Jerry) say). Turing architectures are just what cog-neuro (CN) needs to handle representations, and representations are something that any theory hoping to deal with basic cognition will need. As I have gone over these arguments before, I will not belabor this point further. But, rest assured, I take it very seriously, and because of this I draw the obvious conclusion: until BW finds the neural analogues of basic Turing architecture, the possibility of BW and cognition (including linguistics) making vigorous contact will be pretty low.

That said, one of the interesting features of David’s second lecture is that he showed that it is not impossible. In fact, the general program he outlines is very exciting and worth thinking about from a linguistic point of view. This is what I will try to do here (with all the appropriate caveats concerning my low skill set etc.).

The lecture focuses on what David describes as “an interesting alignment between…systems neuroscience….physics…(and) linguistics” (slide 21). More particularly, he argues that the brain has natural “theta rhythms” of 4-8 Hz, that, physically speaking, the modulation spectrum of speech is 4-5 Hz, and that the mean syllable duration cross-linguistically is 150-300 ms, which is the right size to fit into these theta bands. So, if speech processing requires chunking into brain-sized units, then we might expect the stream to be chopped into theta-band-sized pieces for the brain to examine, in order to extract the further linguistic information needed to get to the relevant interpretation. And what we expect, David claims, we in fact find. The brain seems to entrain to syllables, phrases and sentences. Or, more exactly, we can find neural measures that seem to correlate with each such linguistic unit (see slides 29-34).
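The arithmetic behind this alignment is simple enough to sketch. The little script below is my own illustration (not from the lecture): it converts a band of oscillation frequencies into the window of cycle durations that band makes available, and checks that the syllable durations David cites fall inside the theta window.

```python
# Convert an oscillation band (in Hz) into the window of cycle durations
# (in ms) it could naturally package: period = 1000 / frequency.
def band_to_window_ms(lo_hz, hi_hz):
    # Higher frequency -> shorter period, so the bounds swap.
    return (1000.0 / hi_hz, 1000.0 / lo_hz)

theta_window = band_to_window_ms(4, 8)  # theta band: 4-8 Hz
print(theta_window)  # (125.0, 250.0) ms per cycle

# Mean cross-linguistic syllable duration cited in the lecture: 150-300 ms.
syllable_lo, syllable_hi = 150, 300
overlap = max(syllable_lo, theta_window[0]) < min(syllable_hi, theta_window[1])
print(overlap)  # True: syllable durations overlap the theta cycle window
```

So a theta cycle lasts 125-250 ms, which substantially overlaps the 150-300 ms syllable range: one syllable fits comfortably into one theta cycle, which is the sense in which the syllable is “the right size” for the band.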

Now showing this is not at all trivial, and therein lies the beauty of the experiments that David so lovingly reported (it was actually a bit embarrassing to see him caressing those findings so intimately in such a public setting (over 150 people watching!!)). Here’s a pointer to a discussion of (and link to) the paper that David discussed. Suffice it to say that making this convincing required controlling for the many factors that likely correlate with syntactic structure. So the Ding et al. (David being the last al) paper stripped out prosodic information and statistical transitional-probability information, so that only linguistic information concerning phrase structure and sentence structure remained. What the paper showed is that even in the absence of these cues in the occurrent stimulus, the brain entrained to phrases and sentences in addition to syllables. So, the conclusion: brains of native speakers can track G-like structures of their native languages online. In other words, it looks like brains use Gs in online performance.
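The logic of the design can be illustrated with a toy simulation (my own sketch, not the paper’s actual analysis; the numbers are schematic). Suppose syllables arrive at a fixed 4 Hz rate, two syllables make a phrase (2 Hz), and two phrases make a sentence (1 Hz). If an internal tracker marks each unit boundary, the spectrum of the tracking response shows peaks at all three rates, even though only the syllable rhythm is present in the input:

```python
import numpy as np

# Toy frequency-tagging sketch: a response that pulses at every unit
# boundary. Syllables at 4 Hz; phrases at 2 Hz; sentences at 1 Hz.
fs = 100           # samples per second
dur = 16           # seconds of simulated listening
n = fs * dur
response = np.zeros(n)
for t in range(0, n, fs // 4):   # a pulse per syllable (4 Hz)
    response[t] += 1.0
for t in range(0, n, fs // 2):   # an extra pulse per phrase (2 Hz)
    response[t] += 1.0
for t in range(0, n, fs):        # an extra pulse per sentence (1 Hz)
    response[t] += 1.0

spectrum = np.abs(np.fft.rfft(response))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# The three linguistic rates stand out against a control frequency
# (1.5 Hz) where no unit boundary falls.
control = spectrum[np.argmin(np.abs(freqs - 1.5))]
for hz in (1.0, 2.0, 4.0):
    peak = spectrum[np.argmin(np.abs(freqs - hz))]
    print(hz, bool(peak > control))  # True at each linguistic rate
```

The acoustic signal itself only carries the 4 Hz syllable rhythm; the 1 Hz and 2 Hz peaks can only come from something internal grouping syllables into larger units. That is the shape of the inference the paper makes from the MEG spectra.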

Now, gorgeous as this is, it is important to understand that the result is very unsurprising from a linguistic point of view. The conventional position is that speakers use their knowledge to parse the incoming signal. On the assumption that humans do this in virtue of some feature of their brains (rather than, say, their hamstrings or pituitary glands), it is not surprising to find a neural correlate of this performance. Nor do David and friends believe otherwise. Oddly, the finding caused something of a buzz in the lecture hall, and this only makes sense if the idea that people use their Gs in performance is considered a way-out-there kind of proposition.

I should add that nothing about these results tells us where this Gish knowledge comes from. None of the entrainment results implicates piles of innate brain structure or a genetic basis for language or any of the other “sexy” (viz. conceptually anodyne) stuff that gets a psychologist or neuroscientist hopping madly around, flailing his/her arms and/or pulling his/her hair. All they implicate is the existence of something like our old recognizable linguistic representations (representations, btw, which are completely uncontroversial linguistically in that they have analogues in virtually every theory of syntax that I know of). They imply that sentences have hierarchical structures of various kinds (syllables, phrases, sentences) and that these are causally efficacious in online processing. How could this conclusion possibly cause a stir? It shouldn’t, but it did. That we can find a neural correlate of what we know to be happening is interesting and noteworthy, but it is not something that we did not expect to exist, at least absent a commitment to Cartesian Dualism of the most vulgar kind (and yes, there are sophisticated forms of dualism).

What structures do the neural signals correlate with? Many have noted that the relevant level may not be syntactic. In particular, what we are tracking might be phonological phrases, rather than syntactic ones (I said as much at Nijmegen, and Ellen Lau noted this in a comment here). However, this does not really matter for the main conclusion. Why? Because the phonological phrases that might be being tracked are products of an underlying syntax that determines them. Here’s what I mean.

Say that what is being tracked are phonological phrases of some sort. What the experiment shows is that the brain is not tracking these objects in virtue of their phonological structure. The relevant phonological information has been weeded out of the stimuli, as has the statistical information. So, if the brain is entraining to phonological phrases, then it is doing so in virtue of tracking syntactic information. Now, again, every theory I know of has a mapping between syntactic and intonational structure, so the fact that one tracks the latter by analyzing structure according to the former is not a surprise, and it remains evidence that the brain can (and does) use G information in parsing the incoming speech string. So the general conclusion that David draws (slide 57) seems to me perfectly correct:

1. There are cortical circuits in the brain that “generate slow rhythms matching the time scales of larger linguistic structures, even when such rhythms are not present in the speech input,” and this “provides a plausible mechanism for online building of large linguistic structures.”
2. Such tracking is rule/grammar based.

So what is exciting is not the conclusion (that brains use G information in performance) but the fact that we now have a brain measure of it.

There is also an exciting suggestion: that what holds for the syllable also holds for phrases and sentences. Here’s what I mean. There is ample evidence, which David reviews in lecture 2, that the physics of speech results in chunks that have a nice physical description, that fit into nice neural bands, and that have a linguistic analogue: the syllable. So there is a kind of physical/neural basis for this unit.

The question is whether this analogy extends to larger units.[4] What David shows is that the brain entrains to these units, using theta and delta band oscillations to do so. However, is it plausible that phrases and sentences come (at least on average) in certain physical “sizes” that neatly fit into natural brain cyclic bands, so that we can argue that there is a physical/neural natural size to which phrases and sentences conform? This seems like a stretch. Is it?

I have no idea. But maybe we shouldn’t dismiss the analogy that David is pushing too quickly. Say that it is phono phrases that the Ding et al. paper is tracking. The question then is what kind of inventory of phono phrases we find. We know that these don’t perfectly track syntactic phrases (remember: this is the cat, that ate the rat, that stole the cheese, that…). There are “readjustments” that map constituents to their phonological expressions. The question then is whether it is too far-fetched to think that something like this can hold for phrases and sentences as well as syllables. Might there be a standard “average” size of a phrase (or a phase that contains a phrase), say? One that is neatly packaged in a brain wave with the right bandwidth? This doesn’t seem too far-fetched to me, but how would I know (remember, I know squat about these matters)? At any rate, this is the big idea that David’s second and third lectures hint at. We are looking for natural neural envelopes within which interesting syntactic processing takes place. We are not looking at syntactic processing itself (or at least neither David’s lectures nor the Ding et al. paper suggest that we are), but at the containers the brain wraps it in when analyzing it online.

That said, the envelope thesis would be interesting were it true, even for a syntactician like me. Why? Because it could plausibly constrain the mapping between (let’s call them) phonological phrases and syntactic phrases, and this might in turn tell us something about syntactic phrases. Of course, it might not. But it is a useful speculation worth further investigation, which, I am pretty certain, is what David is going to do.

To wrap up: I am pretty skeptical that current neuro methods can make contact with much of what linguistics does, but not because of any problems with linguistics. The problem lies, in the main, with the fact that BW has stopped looking for the structures required to tell any kind of performance story. In a word, it has severed its ties to classical CS, as Gallistel has cogently argued. I believe that once we find classical architectures in the brain (and they are there, as the cognitive evidence overwhelmingly shows us), contact will be rampant and progress will be made in understanding how brains do language. This, however, will still mainly tell us about how Gs get used, and so only indirectly shine a light on what kinds of representations Gs have. Of course, you can find lost keys even in indirect light, so this is nothing to sneeze at, or pooh-pooh, or sneer at, or cavalierly dismiss, or… What David’s work has shown is that it might be possible to find interesting, linguistically significant stuff even short of cracking the Turing architecture/representation problem. David’s work relies on the idea (truism?) that brains chunk information and on the bold hypothesis that this chunking is both based in brain architecture and might (at least indirectly) correlate with significant linguistic units. He has made this conjecture solid when it comes to syllables. It would be a really big deal if he could extend it to phrases and sentences. I hope he is right and can do it. It would be amazing and wonderful.

One last comment: say that David is wrong. What would it mean for the relation between linguistics and BW? Not much. It would leave us roughly where we are today. The big problem, IMO, is still the one that Gallistel has identified. This is true regardless of whether David’s hope is realizable. Neural envelopes for packaging representations are not themselves representations. They are what you stuff representations into. The big goal of the cog-neuro game should be to understand the neural bases of these representations. If David is right, we will have a narrower place to look, but just as you cannot tell a book by its cover (although, personally, I often try to), so you cannot deduce the (full) structure of the representations from the envelopes they are delivered in. So should David fail, it would be really too bad (the idea is a great one), but it would not alter much how we should understand the general contours of the enterprise. On the other side, if David wins, we all do. And if his conjecture stops at the syllable, this has no implications for the neural reality of phrases and sentences. They are very real, whether or not our current technology assigns them stable neural correlates.

[1] A HW problem: what’s the difference?
[2] In fact, the whole point is that these are online tasks that allow you to look into what is being done as it is being done. This contrasts with offline tasks, say, acceptability-judgment tasks, where all we look at is the final step of what all concede to be a long and complicated process. Interestingly, it appears that looking at this last step allows a better window into the overall system of knowledge than does looking at how these steps are arrived at. This is not that surprising when one thinks about it, despite the air of paradox. Think forests and trees.
[3] Think Marcus parsers or the Berwick and Weinberg left corner parser (a favorite of mine for personal reasons).
[4] This is what Ellen was skeptical of.
