Tuesday, February 9, 2016

David Poeppel's second provocation

I want to say a couple of words about David Poeppel’s second lecture (see here); the one in which he argues that brains entrain to G-like structures in online speech comprehension. In earlier posts (e.g. here), I evinced some skepticism about whether it will be possible for linguistics to make much contact with brain work (BW) until BW starts finding the neural analogues of classical CS notions like register, stack, index, buffer, etc. Here I want to both elaborate and mitigate this skepticism in light of David’s lectures.

Whence my skepticism? The reason lies with BW methodology. Most BW focuses on what the brain does while being subjected to a particular stimulus. So, for example, how does the brain “react” when peppered with sentences that vary in grammaticality (note: I mean ‘grammaticality’ here and not ‘acceptability’)?[1] So, while a brain is processing an actual linguistic object, what does it do? By its very nature, this kind of experiment can only indirectly make contact with the basic claims of GG. Why? Because GG is a theory of competence (i.e. what speakers know), and what BW probes is not knowledge (linguistic or otherwise) per se but how this knowledge is deployed in real time (e.g. how it is used to perform the task of analyzing the incoming speech stream). I strongly believe that what you know is implicated in what you do. But what you do (and how you do it) does not rely exclusively on what you know. That’s why it is useful to make the competence/performance distinction. And, back to the main point, BW experiments are all based on online performance tasks where the details of the performance system matter to the empirical outcomes (or so I would suppose).[2] But if these details matter, then in order to even see the contribution of Gs (or FL/UG) it is imperative to have performance systems of the kind that we believe are cognitively viable, and the ones that we think are such are largely combinations of Gs embedded in classical computing devices with Turing-like architectures.[3]

Nor is this mix an accident if people like Randy Gallistel (and Jerry Fodor and Zenon Pylyshyn and Gary Marcus) are to be believed (and I for one make it a cognitive policy to believe most of what Randy (and Jerry) says). Turing architectures are just what cog-neuro (CN) needs to handle representations, and representations are something that any theory hoping to deal with basic cognition will need. As I have gone over these arguments before, I will not further worry this point. But, rest assured, I take this very seriously and because of this I draw the obvious conclusion: until BW finds the neural analogues of basic Turing architecture the possibility of BW and cognition (including linguistics) making vigorous contact will be pretty low.

That said, one of the interesting features of David’s second lecture is that he showed that it is not impossible. In fact, the general program he outlines is very exciting and worth thinking about from a linguistic point of view. This is what I will try to do here (with all the appropriate caveats concerning my low skill set etc.).

The lecture focuses on what David describes as “an interesting alignment between…systems neuroscience….physics…(and) linguistics” (slide 21). More particularly, he argues that the brain has natural “theta rhythms” of 4-8 Hz, that, physically speaking, the modulation spectrum of speech is 4-5 Hz, and that the mean syllable duration cross-linguistically is 150-300 ms, which is the right size to fit into these theta bands. So, if speech processing requires chunking into brain-sized units, then we might expect the stream to be chopped into theta-band-sized pieces for the brain to examine in order to extract the further linguistic information necessary to get one to the relevant interpretation. And what we expect, David claims, we in fact find. The brain seems to entrain to syllables, phrases and sentences. Or, more exactly, we can find neural measures that seem to correlate with each such linguistic unit (see slides 29-34).
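
For the arithmetic behind that alignment, here is a back-of-the-envelope check (in Python, my own toy, using just the numbers reported above; none of this is David’s code or data):

```python
# Back-of-the-envelope check of the timescale alignment described above.
# The numbers are the ones reported from the slides; treat them as approximate.

theta_band_hz = (4.0, 8.0)             # cortical theta rhythms
modulation_spectrum_hz = (4.0, 5.0)    # dominant modulation rate of speech
syllable_duration_s = (0.150, 0.300)   # mean syllable duration, cross-linguistically

# A syllable lasting 150-300 ms corresponds to a syllable *rate* of:
syllable_rate_hz = (1.0 / syllable_duration_s[1], 1.0 / syllable_duration_s[0])
print(f"syllable rate: {syllable_rate_hz[0]:.1f}-{syllable_rate_hz[1]:.1f} Hz")
# -> roughly 3.3-6.7 Hz, which overlaps the 4-8 Hz theta band and brackets the
#    4-5 Hz modulation spectrum. That overlap is the "alignment" in question.
```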

Now showing this is not at all trivial, and therein lies the beauty of the experiments that David so lovingly reported (it was actually a bit embarrassing to see him caressing those findings so intimately in such a public setting (over 150 people watching!!)). Here’s a post with a discussion of (and link to) the paper that David discussed. Suffice it to say that what was required to make this convincing was controlling for the many factors that likely correlate with the syntactic structure. So the Ding et al (David being the last al) paper stripped out prosodic information and statistical transitional probability information so that only linguistic information concerning phrase structure and sentence structure remained. What the paper showed is that even in the absence of these cues in the occurrent stimulus the brain entrained to phrases and sentences in addition to syllables. So, the conclusion: brains of native speakers can track the G-like structures of their native languages online. In other words, it looks like brains use Gs in online performance.
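
Purely to make the logic of the design concrete, here is a cartoon of the frequency-tagging reasoning. The particular rates (4 Hz syllables, 2 Hz phrases, 1 Hz sentences) and the simulated “response” are my own illustrative assumptions, not the actual stimuli or data:

```python
# A cartoon of the frequency-tagging logic behind the entrainment result. The
# idea: isochronous syllables are presented at a fixed rate, and phrases and
# sentences are built out of fixed numbers of syllables, so each linguistic
# level has its own "tag" frequency. Only the syllable rate is physically
# present in the acoustics; peaks at the phrase and sentence rates must come
# from the listener's grammar. Everything below is simulated for illustration.
import numpy as np

fs = 200.0                      # sampling rate (Hz), assumed
t = np.arange(0, 40, 1 / fs)    # 40 s of fake "recording"

rng = np.random.default_rng(0)
response = (1.0 * np.sin(2 * np.pi * 4 * t)     # syllable tracking
            + 0.6 * np.sin(2 * np.pi * 2 * t)   # phrase tracking
            + 0.4 * np.sin(2 * np.pi * 1 * t)   # sentence tracking
            + rng.normal(0, 1.0, t.size))       # noise

spectrum = np.abs(np.fft.rfft(response)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

for tag, label in [(4.0, "syllable"), (2.0, "phrase"), (1.0, "sentence")]:
    idx = np.argmin(np.abs(freqs - tag))
    baseline = spectrum[max(idx - 5, 1):idx].mean()  # crude local noise floor
    print(f"{label:8s} rate {tag} Hz: peak/baseline = {spectrum[idx] / baseline:.1f}")
# Peaks at the phrase and sentence rates, with no acoustic or statistical cues
# at those rates in the input, are what license the inference that the brain
# is tracking G-like structure.
```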

Now, gorgeous as this is, it is important to understand that the result is very unsurprising from a linguistic point of view. The conventional position is that speakers use their knowledge to parse the incoming signal. On the assumption that humans do this in virtue of some feature of their brains (rather than, say, their hamstrings or pituitary glands), it is not surprising to find a neural correlate of this performance. Nor do David and friends believe otherwise. Oddly, the finding caused something of a buzz in the lecture hall, and this only makes sense if the idea that people use their Gs in performance is considered a way-out-there kind of proposition.

I should add that nothing about these results tells us where this Gish knowledge comes from. None of the entrainment results implicate piles of innate brain structure or a genetic basis for language or any of the other “sexy” (viz. conceptually anodyne) stuff that gets a psychologist or neuroscientist hopping madly around flailing his/her arms and/or pulling his/her hair. All they implicate is the existence of something like our old recognizable linguistic representations (representations, btw, which are completely uncontroversial linguistically in that they have analogues in virtually every theory of syntax that I know of). They imply that sentences have hierarchical structures of various kinds (syllables, phrases, sentences) and that these are causally efficacious in online processing. How could this conclusion possibly cause a stir? It shouldn’t, but it did. That we can find a neural correlate of what we know to be happening is interesting and noteworthy, but it is not something that we did not expect to exist, at least absent a commitment to Cartesian Dualism of the most vulgar kind (and yes, there are sophisticated forms of dualism).

What structures do the neural signals correlate with? Many have noted that the relevant level may not be syntactic. In particular, what we are tracking might be phonological phrases, rather than syntactic ones (I said as much at Nijmegen and Ellen Lau noted this in a comment here). However, this does not really matter for the main conclusion. Why? Because the phonological phrases that might be being tracked are products of an underlying syntax from which they are derived. Here’s what I mean.

Say that what is being tracked are phonological phrases of some sort. What the experiment shows is that the brain is not tracking these objects in virtue of their phonological structure. The relevant phonological information has been weeded out of the stimuli, as has the statistical information. So, if the brain is entraining to phonological phrases, then it is doing so in virtue of tracking syntactic information. Now, again, every theory I know of has a mapping between syntactic and intonational structure, so the fact that one is tracking the latter by analyzing structure according to the former is not a surprise, and it remains evidence that the brain can use (and does use) G information in parsing the incoming speech string. So the general conclusion that David draws (slide 57) seems to me perfectly correct:

1.     There are cortical circuits in the brain that “generate slow rhythms matching the time scales of larger linguistic structures, even when such rhythms are not present in the speech input” and this “provides a plausible mechanism for online building of large linguistic structures.”
2.     Such tracking is rule/grammar based.

So what is exciting is not the conclusion (that brains use G information in performance) but the fact that we now have a brain measure of it.

There is also an exciting suggestion: that what holds for the syllable also holds for phrases and sentences. Here’s what I mean. There is ample evidence, which David reviews in lecture 2, that the physics of speech results in chunks that have a nice physical description, that fit into natural neural bands, and that have a linguistic analogue: the syllable. So there is a kind of physical/neural basis for this unit.

The question is whether this analogy extends to larger units.[4] What David shows is that the brain entrains to these units and that it does so using theta and delta band oscillations. However, is it plausible that phrases and sentences come (at least on average) in certain physical “sizes” that neatly fit into natural brain cyclic bands, so that we can argue that there is a physical/neural natural size that phrases and sentences conform to? This seems like a stretch. Is it?

I have no idea. But maybe we shouldn’t dismiss the analogy that David is pushing too quickly. Say that it is phono phrases that the Ding et al paper is tracking. The question then is what kind of inventory of phono phrases we find. We know that these don’t perfectly track syntactic phrases (remember: this is the cat, that ate the rat, that stole the cheese, that…). There are “readjustments” that map constituents to their phonological expressions. The question then is whether it is too far-fetched to think that something like this can hold for phrases and sentences as well as syllables. Might there be a standard “average” size of a phrase (or a phase that contains a phrase), say? One that is neatly packaged in a brain wave with the right bandwidth? This doesn’t seem too far-fetched to me, but how would I know (remember, I know squat about these matters)? At any rate, this is the big idea that David’s second and third lectures hint at. We are looking for natural neural envelopes within which interesting syntactic processing takes place. We are not looking at syntactic processing itself, or at least neither David’s lectures nor the Ding et al paper suggests that we are, but at the containers the brain wraps phrases and sentences in when analyzing them online.

That said, the envelope thesis would be interesting were it true, even for a syntactician like me. Why? Because this could plausibly constrain the mapping between (let’s call them) phonological phrases and syntactic phrases, and this might in turn tell us something about syntactic phrases. Of course, it might not. But it is a useful speculation worth further investigation, which, I am pretty certain, is what David is going to do.

To wrap up: I am pretty skeptical that current neuro methods can make contact with much of what linguistics does, but not because of any problems with linguistics. The problem lies, in the main, with the fact that BW has stopped looking for the structures required to tell any kind of performance story. In a word, it has severed its ties to classical CS, as Gallistel has cogently argued. I believe that once we find classical architectures in the brain (and they are there, as the cognitive evidence overwhelmingly shows us), contact will be rampant and progress will be made in understanding how brains do language. This, however, will still mainly tell us about how Gs get used and so only indirectly shine a light on what kinds of representations Gs have. Of course, you can find lost keys even in indirect light, so this is nothing to sneeze at, or pooh-pooh, or sneer at, or cavalierly dismiss, or… What David’s work has shown is that it might be possible to find interesting linguistically significant stuff even short of cracking the Turing architecture/representation problem. David’s work relies on the idea (truism?) that brains chunk information and the bold hypothesis that this chunking is both based in brain architecture and might (at least indirectly) correlate with significant linguistic units. He has made this conjecture solid when it comes to syllables. It would be a really big deal if he could extend this to phrases and sentences. I hope he is right and he can do it. It would be amazing and wonderful.

One last comment: say that David is wrong. What would it mean for the relation between linguistics and BW? Not much. It would leave us roughly where we are today. The big problem, IMO, is still the one that Gallistel has identified. This is true regardless of whether David’s hope is realizable. Neural envelopes for packaging representations are not themselves representations. They are what you stuff representations into. The big goal of the cog-neuro game should be to understand the neural bases of these representations. If David is right, we will have a narrower place to look, but just as you cannot tell a book by its cover (although, personally, I often try to), so you cannot deduce the (full) structure of the representations from the envelopes they are delivered in. So should David fail, it would be really too bad (the idea is a great one), but it would not alter much how we should understand the general contours of the enterprise. On the other side, if David wins, we all do. And if his conjecture stops at the syllable, this has no implications for the neural reality of phrases and sentences. They are very real, whether or not our current technology assigns them stable neural correlates.





[1] A HW problem: what’s the difference?
[2] In fact, the whole point is that these are online tasks that allow you to look into what is being done as it is being done. This contrasts with offline, say, acceptability judgment tasks, where all we look at is the final step of what all concede to be a long and complicated process. Interestingly, it appears that taking a look at this last step allows for a better window into the overall system of knowledge than does looking at how these steps are arrived at. This is not that surprising when one thinks about it, despite the air of paradox. Think forests and trees.
[3] Think Marcus parsers or the Berwick and Weinberg left corner parser (a favorite of mine for personal reasons).
[4] This is what Ellen was skeptical of.

Sunday, February 7, 2016

A reaction to my Nijmegen provocation

Bill Idsardi sent me a link to some useful comments by Sean Roberts on David's three lectures (see here). As Bill noted, we seem to agree that the barn owl is a great paradigm of what we should all be aiming for. However, I think we agree on little else. But this is not what I wanted to comment on. What I found interesting is that Sean got exactly the wrong message from my comments (and that is almost certainly my fault). Let me elaborate a little bit.

First, I did not mean to imply (nor did I say) that the only questions of linguistic interest were syntactic. So, I agree with Sean that "there is more to language than syntax." But, and this is what I wanted to get across, there is at least syntax, and this is not something that neuroscientists seem happy to concede. In fact, it has been my experience that brain types have no idea what modern syntax has discovered, and when they do know something they seem worried that there are many apparently different theories out there that don't agree (some also worry about the empirical basis of the discoveries). One aim I had was to assure the brain types in the audience that the standard theories make most of the same distinctions and identify effectively the same kinds of dependencies as syntactically relevant. So, once one gets by the surface orthography, many of the theories are basically saying the same thing. Thus, there is, contrary to appearances, a general consensus about some of the key syntactic features of Gs. See slides 9 and 10 (here).

Second, I argued that brain types should hope that something like Minimalism is on the right track, as it will make it easier to make contact between linguistic ideas and neural ones. I mentioned bird song and phonology as an example. If they are the same kind of system (as some have argued), then we might learn something about the neural basis of phonology by studying (ahem, torturing) birds. Similarly, if large parts of FL are not linguistically specific, as Minimalism hopes, then we can study these parts in non-linguistic systems. So, for example, we could learn a lot about FL by studying other cognitive systems in humans and animals. What we will not be able to do is learn anything about those features of FL that are linguistically proprietary, precisely because they are specific to linguistic capacity. But, if Minimalism is right, the set of things that are properly only linguistic may be small.

Third, I did not provide a parts list. I suggested that giving one will be easier if Minimalism is right, for it reduces the number of moving parts to a minimum and identifies those that are purely linguistic. That should be useful, for in place of looking for very complex "circuits" (e.g. passive, raising, question formation) we would be hunting for more basic ones, like ones for agree or merge/combine. This, I suggested, looks like the right kind of grain, unlike the GB notions that came before. Of course, I could be wrong, but…

Fourth, I did say that Chomsky is always right. And he is, ABOUT THE BIG ISSUES. I personally also think that he is often right about the details, but many have disagreed with him on these, including moi. But on the big issues he is completely correct. There is no doubt that humans have Gs that they use to produce and understand language. There is no doubt that there is something special about humans that makes them linguistically capable, and that involves being able to acquire Gs. These strike me as truisms and it is really a waste of time pretending that these views are controversial. They may generate controversy (I have no idea if Sean thinks them suspect) but they really aren't. They are obviously true. Of course, how they are true and what this means for cognition and brains is NOT trivial or obvious. However, we have learned some things over the last 60 years about the mental mechanisms implicated. If we have learned less about how the brain does these things, then the problem lies not with what we have learned about FL from a cognitive perspective.

Fifth, where does the problem lie? Here I channel Randy Gallistel. I think that brain people will not make contact with most of cognition until they give up their neural net ways. Randy makes this point concerning dead reckoning in ants and caching behavior in birds. It applies pari passu to language and many other domains of cognition. Brain people need to start wondering about how brains represent and how they use these representations in information processing. They need to discover the analogues of standard CS notions like registers, indices, reading from, writing to, stacks and all the stuff that CS types brought to cognition (and linguistics too) in the 70s and 80s. Read Mitch Marcus, Bob Berwick etc. When brain types find these mechanisms it will be much easier to relate findings in cognition (and linguistics) to findings about brains. Why? Because neuro investigations are about how language is being used. We have no idea how brains store information (as David P noted). We can look to see how the brain does language processing given a linguistic input. But this means that we are looking at how Gs are being deployed, and for this we need the technology that CS brings to the table. We have plenty of models of parsing. However, none of these make neural sense, as they are all stated in classical CS idiom.

Last point: Sean took me to be dogmatic. I think I am. There are some issues that I strongly believe are simply not up for grabs, and keeping an open mind concerning these is bad for your intellectual health (think flat earth or global warming denialism). At Nijmegen, I pointed out what I thought these were. I outlined them and defended them (up to a point) and, given that I had 20 minutes, I exercised my judgement and told the assembled brain worthies what I thought the uncontroversial parts were and tried to clear up some misconceptions. That's about all you can do in 20 minutes. However, what I think really irks Sean is not that I gave no arguments, but that I made a lot of claims that he didn't like (such as that GG has stuff to tell us and that these results are not controversial or really debatable). And for that I am happy. As Keynes once said: "I personally despair of results from anything except violent and ruthless truth telling--that will work in the end even if slowly." I sure hope Keynes was right. And as I know some small parts of the truth as regards language, I intend to make these known in venues like Nijmegen whenever I can, as forcefully as I can. And you should too, for believe me, there is a lot of ignorance out there.

Thursday, February 4, 2016

David Adger; Baggett Lecture 3

Well, he did it! Three excellent and provocative talks on syntactic theory. Here is the third set of slides. In this talk, David went after sidewards movement (SWM), a favorite idea of mine, and, though you might not believe this, I sat quite demurely through the whole talk and basically agreed with much of what David had to say. I, not surprisingly, did not buy the conclusion, but I did buy the way that he set up the problem and the way that he approached a solution given his judgments about the empirical viability of SWM. How so?

David makes two important points (i.e. points that I completely agree with).

First, that contrary to what is sometimes said, the current definition(s) of Merge does not by itself rule out SWM as an instance of Merge. It is sometimes claimed that whereas Merge applied to E- and I-instances is a binary operation, when extended to SWM it becomes a 3-place operation. What is correct is that one can define SWM to be 3-place and thereby invidiously distinguish it from the other applications, BUT this is not a definition forced by any notions of conceptual simplicity or computational elegance. It is just something one can do if one wants to rule out SWM.

Moreover, as David seemed to concede, there is no really non-ad-hoc way of ruling SWM out by simplifying the definition of Merge. All require further (ahem) refinements (what I would dub extrinsic machinery designed to rule out a perfectly well defined option). As he noted in the talk, and as I have been at pains to emphasize in conversation over the years, SWM is what you get when you leave the simple definition alone. To repeat: it is possible to merge two unconnected expressions together (E-merge) and to merge a subpart of one expression to that expression (I-merge). So why is it not also possible to merge a subpart of one expression to another expression of which it is not a subpart (SWM)? In other words, you can "look inside" a constituent and you can have multiple constituents in a "workspace," and this is all you need to allow SWM unless you make things more complicated. And that's why I have always thought that SWM is a natural consequence of a very simple definition of Merge and that preventing it requires either complicating the definition or arguing that more goes into Merge than the simple operation.
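
To make this concrete, here is a toy formalization (mine, purely illustrative, not David's or anyone's official definition) of Merge over a workspace. The point is simply that E-merge, I-merge and sideward merge are the very same call applied to different choices of arguments; blocking the third requires machinery beyond Merge itself.

```python
# A toy rendering of the argument above: if Merge just takes two available
# syntactic objects and forms {A, B}, then External Merge, Internal Merge, and
# sideward movement are the *same* operation applied to different argument
# choices. Ruling out the sideward case needs something beyond Merge.

def accessible(workspace):
    """Every object reachable in the workspace: the roots and their subparts."""
    found = []
    def walk(obj):
        found.append(obj)
        if isinstance(obj, frozenset):
            for part in obj:
                walk(part)
    for root in workspace:
        walk(root)
    return found

def merge(a, b):
    """The simplest Merge: form the set {a, b}. Nothing more."""
    return frozenset({a, b})

# A workspace with two independent roots built from lexical items (strings).
vp = merge("eat", "what")          # {eat, what}
cp = merge("C", merge("you", vp))  # {C, {you, {eat, what}}}
dp = merge("the", "pizza")         # a second, unconnected root
workspace = [cp, dp]

objs = accessible(workspace)
assert all(x in objs for x in ("what", cp, dp))

e_merged = merge(cp, dp)       # E-merge: both arguments are roots
i_merged = merge(cp, "what")   # I-merge: "what" is a subpart of cp itself
sw_merged = merge(dp, "what")  # sideward: "what" comes from *inside the other root*

print("all three are just merge(a, b):", e_merged, i_merged, sw_merged, sep="\n")
```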

Moreover, how to complicate matters is not a mystery. The idea that I-merge requires AGREE (i.e. move = Agree + EPP) suffices to block SWM, as ex ante the target does not c-command the mover. Needless to say, someone's modus ponens can be someone else's modus tollens, and one might conclude from this that AGREE is a suspect operation (e.g. someone like moi) that should be forcefully thrown out of our minimalist Eden. But, if SWM proves to be empirically unpalatable, well, this is one way to get rid of it.

Side note: of course, if E-merge and I-merge are actually the very same operation, then why I-merge needs to be licensed by AGREE while E-merge does not (indeed cannot) becomes a bit of a conceptual mystery. And please don't tell me about having to identify the inputs to Merge, as if finding an expression inside a constituent is particularly computationally demanding (indeed, more demanding than finding an element in the lexicon or the numeration).

Second, David thinks that it is important to start thinking about how operations like Merge are computationally realized algorithmically. In fact, his talk presents an algorithm that makes SWM unavailable. In other words, Merge the rule allows it, but the computational implementation of Merge inside an architecture with certain kinds of memory restrictions prevents it. So why no SWM on this view? It's the structure of linguistic memory, stupid!

I liked the ambitions behind this a lot. I've argued before that this line of thinking is what MP should be endorsing (e.g. see here and here, and search for SMT on the site for others). It seems to me that David is on board with this now. In particular, the SMT should concern itself not merely with restrictions imposed on FL by the interfaces AP and CI but also with those imposed by the kinds of memory structures that we think are necessary to use Gs generated by FL. We know a little about these things now and it is worth speculating as to what this would mean for FL. David's talk can be viewed as an exercise in this kind of thinking. Great.

So, I loved the lecture, but did not buy the conclusion. Let me note why very briefly.

One thing to look for in a new research program is things that are different or novel from the perspective of older research programs. SWM, should it exist, is a novel kind of operation that we would not have thought reasonable within GB, say. However, if the above is right, then in an MP Merge-centric context this is a kind of operation we might expect to find. So, if we do find it, it constitutes an empirical argument in favor of the new way of looking at things. Thus, if SWM exists, it is very interesting. Of course, this does not mean that it is right. It probably isn't. But it is very interesting and we should not try to get rid of it because it looks novel. Just the opposite. We should look for cases and see how they fare empirically. IMO, SWM analyses have been pretty insightful (e.g. Nunes on parasitic gaps, Uriagereka and Bobaljik and Brown on head movement, moi on adjunct control, and some new stuff on double object constructions whose authors must remain nameless for now). This stuff might all be wrong, but I find the analyses very interesting.

So, thx to David for 3 great lectures. High theory indeed! And also lots of fun. As I noted before, when the lectures become available on video I will link to them.

Wednesday, February 3, 2016

David Adger; Baggett Lecture 2

We are two thirds of the way through this year's lectures and David continued his Sherman-like march through the thicket of Minimalist mechanisms. Here are the slides. The second lecture had three main points: to expose the inadequacies of standard roll-up analyses of certain word order effects (mainly attempts to derive Greenberg's Universal 20), to show that what made the standard roll-up approaches successful were exogenous constraints added to an otherwise much too powerful roll-up technology, and to argue for a theory that endogenously prevents the unwanted structures from being generated. The latter is a revised (and improved) version of the theory presented in David's book (here).

The second lecture started with David's recasting of the meta-principle that he assumed in lecture 1 and continued to assume here: that movement operations that have no effect on CI are to be discouraged. His entering wedge into the discussion of roll-up was the observation that roll-up is mainly interested in deriving word order effects from hierarchy (in an LCAish context a la Kayne). This violates the meta-principle and so it must go! However, as became clear in the question period of lecture 1, the adequacy of the meta-principle was far from clear (recall Lasnik's objections, also pursued by Alexander Williams). So David began lecture 2 by revising his version of the principle. It too ran into some heavy questioning, as it seems to detach semantic interpretation operations from the syntactic operations that generate the relevant syntactic structures. This time the semanticists (Valentine Hacquard, Paul Pietroski and Alexander Williams) pushed David very hard on whether so de-tethering CI interpretation from Merge didn't amount to allowing the possibility of what I dubbed "construction semantics." David was appalled at the possibility, but IMO the semanticists were able to make the case that his revised assumptions seemed to lead in that direction. The question period was taped and I will put a link up to the 3 lectures when the videos are posted.

So today is the final lecture. I expect to be quite challenged, for David is gunning for sidewards movement today (a favorite of mine). I'll post the slides tomorrow after I get them. So far the lectures have been great fun.

Tuesday, February 2, 2016

David Poeppel: The Nijmegen Lectures where "only the best are good enough"

I’m on the train leaving Nijmegen where David Poeppel just finished giving three terrific lectures (slides here) on language and the brain. The lectures were very well attended/received and generated lots of discussion. Lecture 3 was especially animated. It was on the more general topic of how to investigate the cognitive properties of the brain. For those interested in reviewing the slides of the lectures (do it!), I would recommend starting with the slides from the third, as it provides a general setting for the other two talks. Though I cannot hope to do them justice, let me discuss them a bit, starting here with 3.

Lecture 3 makes three important general points.

First, it identifies two different kinds of cog-neuro problems (btw, David (with Dave Embick) has made these important points before; see here and here for links and discussion). The first is the Maps Problem (MP), the second the Mapping Problem (MingP). MP is what most cog-neuro (CN) of language practice today addresses. It asks a “where” question. It takes cognitively identified processes and tries to correlate them with activity in various parts of the brain (e.g. Broca’s does syntax and STS does speech). MingP is more ambitious. It aims to answer the “how” question: how do brains do cognitive computation? It does this by relating the primitives and causal processes identified in a cognitive domain with brain primitives and operations that compute the functions cognition identifies.

A small digression, as this last point seems to generate misunderstanding in neuro types. As it goes without saying, let me say it: this does not mean that neuro is the subservient handmaiden of cognition (CNLers need not bow down to linguists (especially syntacticians), though should you feel so moved, don’t let me stand in your way). Of course there should be plenty of “adjusting” of cog proposals on the basis of neuro insight. The traffic is, and must be, two-way in principle. However, at any given period of research some parts of the whole story might require more adjusting than others, as what we know in some areas might be better grounded than what we know in others. And right now, IMO (and I am not here speaking for David), our understanding of how the brain does many things we think important (e.g. how knowledge is represented in brains) is (ahem) somewhat unclear. More honestly, IMO, we really know next to nothing about how cognition gets neurally realized. David’s excellent slide 3:22 on C. elegans illustrates the general gap between neural structure and understanding of behavior/cognition. In this worm, despite knowing everything there is to know about the neurons, their connectivity, and the genetics of C. elegans, we still know next to nothing at all about what it does, except, as David noted, how it poops (a point made by many others, including Christof Koch (see here)). This is an important result, for it belies the claim that we understand the basics of how thought lives in brains but for the details; given all the details, we still don’t know squat about how and why the worm does what it does. In short, by common agreement, there’s quite a lot we do not yet understand how to understand. So, yes, it is a two-way street and we should aim to make the traffic patterns richly interactive (hope for accidents?), but as a matter of fact, at this moment in time it is unlikely that neuro stuff can really speak with much authority to cog stuff (though see below for some tantalizing possibilities).

Back to the main point. David rightly warned against confusing where with how. He pointedly emphasized that saying where something happens is not at all the same as explaining how it happens, though a plausible and useful first step in addressing the how question is finding where in the brain the how is happening. Who could disagree?

David trotted out Marr (and Aristotle and Tinbergen (slide 3:67)) to make his relevant conceptual distinctions/points richer and gave a great illustration of a successful realization of the whole shebang.[1]  I strongly recommend looking at this example for it is a magnificent accessible illustration of what success looks like in real cog-neuro life (see slides 3:24-32). The case of interest involves localizing the position of something based on auditory information. The particular protagonists are the barn owl and the rodent. Barn owls hunt at night and use sounds to find prey. Rodents are out at night and want to avoid being found. Part of the finding/hiding is to locate where the protagonist is based on the sounds it makes. Question: how do they do this?

Well, consider a Marr-like decomposition of the problem each faces. The computational problem is locating the position of the “other” based on auditory info. The algorithm exploits the fact that we gather such info through the ears. Importantly, there are two of them and they sit on opposite sides of the head (they are separated). This means that sounds that are not directly ahead or behind arrive at each ear at slightly different times. Calculating the difference locates the source of the sound. There is a nice algorithm for this based on coincidence detectors and delay lines that an engineer called Jeffress put together in 1948, and precisely the neural circuits that support this algorithm were discovered in birds in 1990 by Carr and Konishi. So, we have the problem (place location based on auditory signals differentially hitting the two ears), we have the algorithm due to Jeffress, and we have the brain circuits that realize this algorithm due to Carr and Konishi. So, for this problem we have it all. What more could you ask for?
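
For the curious, here is a minimal, discretized sketch of the Jeffress delay-line/coincidence-detector idea (my toy rendering, not the Carr and Konishi circuit itself): delay lines from the two ears feed a row of coincidence detectors, and the detector whose internal delay exactly compensates the interaural time difference gets the most coincident input, so its position encodes the sound's direction.

```python
# A toy, discretized sketch of the Jeffress-style algorithm described above.
# An array of "detectors" each compares the left-ear signal, delayed by some
# internal amount, with the right-ear signal; the detector whose delay cancels
# the true interaural time difference (ITD) responds most, and its index reads
# out the ITD (and hence the azimuth of the source). Illustrative only.
import numpy as np

fs = 100_000                       # sample rate (Hz): 10 microsecond steps, assumed
t = np.arange(0, 0.02, 1 / fs)     # 20 ms of signal

rng = np.random.default_rng(1)
source = rng.normal(size=t.size)   # broadband "rustle" made by the prey
true_itd = 12                      # sound reaches the far ear 12 samples (120 us) later
left = source
right = np.roll(source, true_itd)  # delayed copy at the far ear

max_delay = 30                     # detector array spans ITDs of -300..+300 us
best_score, best_delay = -np.inf, None
for k in range(-max_delay, max_delay + 1):
    # Each detector sees the left input delayed by k samples; the summed
    # product with the right input stands in for its coincidence count.
    score = np.dot(np.roll(left, k), right)
    if score > best_score:
        best_score, best_delay = score, k

print(f"winning detector's internal delay: {best_delay} samples "
      f"({best_delay / fs * 1e6:.0f} us); true ITD: {true_itd / fs * 1e6:.0f} us")
```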

Well, you might want to know if this is the only way to solve the problem and the answer is no. The barn owl implements the algorithm in one kind of circuit and, as it turns out, the rodent uses another. But they both solve the same problem, albeit in slightly different ways. Wow!!! In this case, then, we know everything we could really hope for. What is being done, what computation is being executed and which brain circuits are doing it. Need I say that this is less so in the language case?

Where are we in the latter? Well, IMO, we know quite a bit about the problem being solved. Let me elaborate a bit.

In a speech situation someone utters a sentence. The hearer’s problem is to break down the continuous waveform coming at her and extract an interpretation. The minimum required to do this is to segment the sound, identify the various phonetic parts (e.g. phonemes), use these to access the relevant lexical entries (e.g. morphemes), and assemble these morphemes to extract a meaning. We further know that doing this requires a G with various moving and interacting parts (phonetics, phonology, morphology, syntax, semantics). We know that the G will have certain properties (e.g. generate recursive hierarchies). I also think that we now know that human parsing routines extract these G features online and very quickly. This we know because of the work carried out in the last 60 years. And for particular languages we have pretty good specifications of the actual G rules that best generate the relevant mappings. So, we have a very good description of the computational-level problem, a pretty good idea of the representational vocabulary required to “solve” the problem, and some idea of the ways these representations are deployed in real-time algorithms. What we don’t know a lot about is the wetware that realizes these representations or the circuits that subserve the computations of these algorithms. Though we do have some idea of where these computations are conducted (or at least of places whose activity correlates with this information processing). Not bad, but no barn owl or rodent. What are the current most important obstacles to progress?
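
To make the decomposition vivid, here is a deliberately cartoonish sketch (my own toy, not anyone's actual parsing model) of two of those steps: lexicon-driven segmentation of an unsegmented phoneme string into morphemes, and a trivial assembly of the morphemes into a hierarchical structure. Everything hard about real parsing is, of course, abstracted away.

```python
# A cartoon of the hearer's problem as decomposed above: an unsegmented
# phoneme string is segmented against a (tiny, made-up) lexicon, and the
# resulting morphemes are wrapped into a hierarchical bracketing. The lexicon,
# the greedy segmentation, and the right-branching assembly are all stand-ins.

LEXICON = {"the": "D", "dog": "N", "chase": "V", "d": "PAST", "cat": "N"}

def segment(phonemes):
    """Greedy longest-match segmentation of an unsegmented input string."""
    out, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            if phonemes[i:j] in LEXICON:
                out.append(phonemes[i:j])
                i = j
                break
        else:
            raise ValueError(f"no lexical entry starting at {phonemes[i:]!r}")
    return out

def assemble(morphemes):
    """Wrap morphemes into a right-branching bracketing (a stand-in for the
    real G-driven structure building)."""
    if len(morphemes) == 1:
        return morphemes[0]
    return (morphemes[0], assemble(morphemes[1:]))

# Pretend the phonetic front end has already delivered this phoneme string.
phoneme_string = "thedogchasedthecat"
morphs = segment(phoneme_string)   # ['the', 'dog', 'chase', 'd', 'the', 'cat']
print(morphs)
print(assemble(morphs))
```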

In lecture 3, David identifies two bottlenecks. First, we don’t really have plausible “parts-lists” of the relevant primitives and operations in the G or the brain domain for language. Second, we are making our inquiry harder by not pursuing radical decomposition in cog-neuro. A word about each.

David has discussed the parts-list problem before under the dual heading of the granularity mismatch problem (GMP) and the ontological incommensurability problem (OIP). He does so again. In my comments after David’s lecture 3 (see here), I noted that one of the nice features of Minimalism is that it is trying to address GMP. In particular, if it is possible to actually derive the complexity of Gs as described by GG in, say, its GB form, in terms of much simpler operations like Merge and natural concepts like Extension, then we will have identified a plausible set of basic operations that it would make sense to look for neural analogues of (circuits that track Merge, say, as Pallier et al. and Friederici and her group have been trying to do). So Minimalism is trying to get a neurally useful “parts list” of specifically linguistic primitive operations that it is reasonable to hope that (relatively transparent) parsers use in analyzing sentences in real time.

However, it is (or, IMO, should be) trying to do more. The Minimalist conceit is the idea that Gs use only a small number of linguistically special operations, and that most of FL (and the Gs it produces) uses cognitively off-the-shelf elements. What kind? Well, operations like feature checking. The features language tracks may be different, but the operations for tracking them are cognitively general. IMO, this also holds true of a primitive operation of putting two things together that is an essential part of Merge. At any rate, if this is right, then a nice cog-neuro payoff, if Minimalism is on the right track, is that it is possible to study some of these primitive operations that apply within FL in other animals. We have seen this being seriously considered for sound, where it has been proposed that birds are a model species for studying the human sound system (see here for discussion). Well, if Merge involves the put-together operation and this operation exists in non-human animals, then we can partially study Merge by looking at what they and their brains do. That’s the idea, and that’s how contemporary linguistics might be useful to modern cog-neuro.

BTW, there is something amusing about this if it is true. In my experience, cog-neuro of language types hate Minimalism. You know, too abstract, not languagy enough, not contextually situated, etc. But, if what I say above is on the right track, then this is exactly what should make it so appealing, which brings me to David’s second bottleneck.

As David noted (with the hope of being provocative (and it was)), there is lots to be said for radically ignoring lots of what goes on when we actually process speech. There is nothing wrong with ignoring intonation, statistics, speaker intentions, turn-taking behavior and much much more. In fact, science progresses by ignoring most things and trying to decompose a problem into its interacting sub-parts and then putting them back together again. This last step, even when one has the sub-parts and the operations that manipulate them, is almost always extremely complicated. Interaction effects are a huge pain, even in domains where most everything is known (think turbulence). However, this is how progress is made: by ignoring most of what you “see” and exploring causal structures in non-naturalistic settings. David urges us to remember this and implement it within the cog-neuro of language. So screw context and its complexities! Focus in on what we might call the cog-neuro of the ideal speaker-hearer. I could not agree more, so I won’t bore you with my enthusiasm.

I’ve focused here on lecture 3. I will post a couple of remarks on the other two soon. But do yourself a favor and take a look. They were great, the greatest thing being the reasonable, hopeful ambition they are brimming with.



[1] Bill Idsardi has long encouraged me to look at this case in detail for it perfectly embodied the Marrian ideal. He was right. It’s a great case.

David Adger; Baggett Lecture 1

This year UMD is lucky to have David Adger as the Baggett lecturer. I here post his slides from the first lecture. The series aims to clean up the flora and fauna that have grown up within minimalist theory and muddied one of the key desiderata of the program: cleaning up the messiness of earlier GB theory. In other words, David's aims are theoretical in the best sense. The goal, of course, is to clear the underbrush (thereby enhancing explanation) without sacrificing empirical coverage, or at least not too much empirical coverage. In the nature of the enterprise, tightening the theoretical screws will necessarily be accompanied by some apparent loss on the data side. Respect for theory, therefore, entails being ready to live with a few "puzzles," some of them quite challenging, for at least a little while. The inability to live with an uncovered data point is the distinctive mark of a lack of interest in theoretical explanation.

David's first lecture starts off with a bang. He argues that we should do away with head movement. In subsequent lectures he aims to dump roll-up derivations and (a favorite of mine) sidewards movement, as well as parallel merge, under-merge (sounds like a cartoon character) and many other forms of merge. Here he settles for doing away with head movement.

The particular empirical application is V movement up the extended projection. He offers two main arguments against it, and he bases the argumentation on a meta-principle featured in the earliest days of MP: that movement should have an effect on interpretation at the interfaces. As David buys Chomsky's idea that externalization is an afterthought, he takes this to mean that movement should have CI effects. His first argument against V-to-T-to-C is that it has no such effects (or very few, as he argues against the usual suspects). His second argument involves word formation and how V movement doesn't deliver up what it promises, at least in one interesting G (that of Kiowa).

The question period was interesting and several people challenged several of David's arguments. The most interesting question, IMO, came from Howard Lasnik, who noted that there are lots of apparent cases of A-movement that have nary an effect on interpretation. The two he noted were raising of expletives and of idioms (e.g. there seem to be men in the garden; the shit seems to have hit the fan). It is not clear how raising in either case can affect CI. A similar observation can be made for all cases where we find obligatory reconstruction (e.g. fronting of predicates, as in VP fronting in English, where it has been shown that the VP must reconstruct to its base position at CI). The challenge is clear: it suggests the meta-principle that David relies on is a bit too strong.

I confess to being partial to David's conclusion (viz. that V movement is not a rule of G). In some of my own work, I noted that under reasonable assumptions concerning minimality, heads should not move (see here for discussion). I had to really wriggle to find a way to let it in, but there are ways. The little world sketched there, however, would be a nicer place were there no head movement at all. So, I hope that David is right.

There are two cases, however, that need further analysis. Clitic movement and ellipsis.

It is well known that clitic doubling obviates minimality effects. In other words, what appears to be a species of head movement obviates intervention effects that regulate syntactic commerce. How?

Second, there appears to be a correlation between head movement and VP ellipsis. So languages that move verbs high don't allow VP ellipsis unless the VP elided is moved out to a position higher than where the V sits. Takahashi has a great discussion of this in his old general's paper. I discuss it here in more detail.

David believes he can handle both cases. See his appendix for the first.

At any rate, a very nice and provocative first lecture. Can't wait for number 2 today.

Oh yes. These were taped. If I ever get these tapes, I will post them. For now, you have the slides.

Monday, February 1, 2016

On string-acceptability vs. the availability of interpretations, and the "this is the reading therefore this is the structure" gambit

This post is intended as an intellectual provocation. It is the strongest version of a thought I've had knocking around in my head for quite a few years now, but not necessarily a version that I'd be willing to formally defend. Therefore, I urge readers not to lose sight of the fact that this is written in a blog; it is as much an attempt at thinking "out loud" and engaging in conversation as it is an attempt to convince anyone of anything. [Norbert has helped me think through some of these things, but I vehemently absolve him of any responsibility for them, and certainly of any implication that he agrees with me.]

My point of departure for this discussion is the following statement: were the mapping from phonetics to phonology to morphology to syntax to semantics to pragmatics isomorphic – or even 100% reliable – there would be little to no need for linguists. Much of the action, for the practicing linguist, lies precisely in those instances where the mapping breaks down, or is at least imperfect. That doesn't mean, of course, that the assumption that the mapping is isomorphic isn't a valid null hypothesis; it probably is. But an assumption is not the same as a substantive argument.

If you disagree with any of this, I'd be interested to hear it; in what follows, though, I will be taking this as a given.

So here goes...

––––––––––––––––––––

The last 15-20 years or so have seen a trend in syntactic argumentation, within what we may broadly characterize as the GB/Principles-and-Parameters/minimalism community, of treating facts about the interpretation of an utterance as dispositive in arguments about syntactic theory.

One response that I've received in the past when conveying this impression to colleagues is that all syntactic evidence is inextricably tied to interpretation, because (i) string-acceptability is just the question of whether utterance A is acceptable under at least one interpretation, and so (ii) string-acceptability is not different in kind from asking whether A is acceptable under interpretation X versus under interpretation Y. In fact, this reasoning goes, there really isn't such a thing as string-acceptability per se, since the task of testing string-acceptability amounts to asking a person, "Can you envision at least one context in which at least one of the interpretations of A is appropriate?"

I think this is too simplistic, since as we all know, there is still a contrast between Colorless green ideas sleep furiously and *Furiously sleep ideas green colorless. But even setting that aside for now, I don't think that the fact that an utterance A has at least one interpretation should be treated (by syntacticians) on a par with the fact that it has interpretation X but not interpretation Y. The reason is that the isomorphic mapping from syntax to semantics (or vice versa, for the purposes of this discussion) is a methodological heuristic, not a substantive argument (see above).

Let's illustrate using an example from locality. Evidence about locality can be gleaned in some instances from string-acceptability alone. That (1) is unacceptable does not depend on a particular interpretation – nor does it even depend on a particular theory of what an interpretation is (i.e., what the primitives of meaning are), for that matter.

(1) *What do you know the delivery guy that just brought us?

I therefore consider the unacceptability of (1) dispositive in syntactic argumentation (well, modulo the usual caveats about acceptability vs. grammaticality, I should say). On the other hand, the fact that (2) can only be interpreted as a question about reasons for knowing, not as a question about reasons for bringing, is not the same type of evidence.

(2) Why do you know the delivery guy that just brought us pizza?

To be clear, they are both evidence for the same thing. But they are not evidence of the same kind. And the provocation offered in this post is that they should not be afforded the same status in distinguishing between syntactic theories.

For the sake of argument, suppose we lived in a world where (2) did have both interpretations, but (1) was still bad. I, as a syntactician, would first try to find a syntactic reason for this. Failing that, however, I would be content with leaving that puzzle for semanticists to worry about. (Perhaps, in this counterfactual world, my semanticist friends would conclude that elements like why can participate in the same kind of semantic relationships that regulate the interaction between the logophoric centers of various clauses? I don't know if that makes any sense. Anyway, I won't try too hard to reason about what other people might do to explain something in a hypothetical world.) More importantly, I'd keep the theory of locality exactly as it is in our world. Obviously the other world would be a less pleasing world to live in. The theory of locality would enjoy less support in this hypothetical world than it does in our world. But the support lost in this counterfactual scenario would be circumstantial, not direct; it is the loss of semantic support for a syntactic theory.

There are (at least) two things you might be asking at this juncture. First, is this distinction real? Aren't we all linguists? Aren't we all after the same thing, at the end of the day? I think the answer depends on granularity. At one level, yes, we're all after the same thing: the nature and properties of that part of our mind that facilitates language. But insofar as we believe that the mechanism behind language is not a monolith; that syntax constitutes a part of it that is separate from interpretation; and that the mapping between the two is not guaranteed a priori to be perfect, then no: the syntactician is interested in a different part of the machine than the semanticist is.

Second, you might be asking this: even if these distinctions are real, why are they important? Why should we bother with them? My answer here is that losing sight of these distinctions risks palpable damage to the health of syntactic theory. Above, I noted that in research on syntax, evidence from interpretation should take a back seat to evidence from string-acceptability. But it feels to me like way too many people are content to posit movement-to-spec-of-TargetInterpretationP (or -ScopeP) without the understanding that, as long as the evidence provided is purely from interpretation, this is really just a semantic theory expressed in syntactic terms. (One might even say it is an 'abuse' of syntactic vocabulary, if one's point were to try and provoke.) This will end up being a valid syntactic theory only to the extent that the aforementioned syntax-semantics (or semantics-syntax) mapping turns out to be transparent. But – and this is the crux of my point – we already know that the mapping between the two isn't always transparent. (As an example, think of semantic vs. syntactic reconstruction.) And so such argumentation should be treated with skepticism, and its results should not be treated as "accepted truths" about syntax unless they can be corroborated using syntactic evidence proper, i.e., string-acceptability.