Friday, January 29, 2016

How I spent (some of) my winter vacation in Nijmegen

I was asked how it went in Nijmegen. Let me say a word or two, with the understanding that what I say here is very much a one sided perspective.

When I left for David’s lectures I was pretty confident that the GG linguistic worldview, which I take to be hardly worth contesting, is treated with great skepticism (if not worse, ‘naivety’ and ‘contempt’ are adjectives that come to mind) in the cog-neuro (CN) of language world. One prominent voice of this skepticism is Peter Hagoort, whose views concerning GG I critically discussed here. Before leaving, I was pretty confident that neither my views nor his had changed much so I was looking forward to a good vigorous back and forth. Here are the slides I presented. They were intended to provoke, though not because I thought that they were anything but anodyne intellectually speaking. The provocation would come from the fact that the truisms I was defending are barely understood by many in the “opposition.”

The presentation had four objectives:

1.     To note that Chomsky’s views are always worth taking seriously, the main reason being that he is very often right
2.     To explain why it is virtually apodictic that
a.     Part of human linguistic capacity involves having a mind/brain-internal G
b.     FL exists and has some linguistically specific structure (aka UG is not null)
3.     That cog-neuro of language investigators should hope like hell that something like the Minimalist program is viable
4.     To dispel some common misconceptions about GG widespread in the CNosphere

This, IMO, went well. I led with (1) in order to capture the audience’s attention (which, I believe, I did). However, I really wanted to make points (2) and (3) in a way that a non-linguist could appreciate. To do this I tried to make a distinction between two claims: whether mental Gs and FL/UG exist in human minds/brains and what they actually look like. The first, I argued, cannot be contentious (i.e. that humans have mental grammars in the brain is a trivial truth). The second I noted must be (e.g. whether FL/UG has bounding nodes and a subjacency principle is an empirical issue). In other words, that humans have internal Gs and that humans have an FL with some linguistically specific properties is a virtual truism. What Gs and FL/UG look like is an empirical question that both is and should be very contentious, as are theories in any domain.

And here was the main point: one should not confuse these two claims. We can argue about what human Gs look like and how FL is constituted. We cannot seriously argue about whether they exist incarnated in a region roughly between our ears.

Why are the points in (2) virtual truisms? For the reasons that Chomsky long ago noted (as I noted in (1), he is very often right). They are simple consequences of two obvious facts.

First, the fact of linguistic creativity: it is obvious that a native speaker can produce and understand an effective infinity of linguistic structures. Most of these structures are novel in the sense that speakers have never encountered them before. Nonetheless, these sentences/phrases etc. are easily produced and understood. This can only be explained if we at least assume that speakers who do this have an internalized set of rules that are able to generate the structures produced/heard. These rules (aka Gs) must be recursive to allow for the obvious fact of linguistic creativity (the only way to specify an infinite set is recursively). So given that humans display linguistic creativity and given that this evident capacity requires something like a G, we effortlessly conclude that humans have internal Gs. And assuming we are not dualists, then these Gs are coded somehow in human brains. The question is not whether this is so, but what these Gs look like and how brains code them.
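To make the point about recursion concrete, here is a minimal sketch in Python. The two-rule grammar and the names `NAMES`, `sentences` and `count` are invented for illustration; nothing here is a claim about what human Gs actually look like. It only shows how a finite rule set with one recursive rule specifies an unbounded set of sentences.

```python
# Toy grammar (hypothetical, for illustration only). One recursive rule
# lets a sentence contain another sentence, so the set is unbounded:
#   S  -> NP VP
#   VP -> "left" | "thinks that" S
NAMES = ["Ann", "Bo"]

def sentences(depth):
    """Yield every sentence with exactly `depth` levels of embedding."""
    for subject in NAMES:
        if depth == 0:
            yield f"{subject} left"
        else:
            # Recursive step: embed a shallower sentence under "thinks that".
            for inner in sentences(depth - 1):
                yield f"{subject} thinks that {inner}"

def count(depth):
    """Number of distinct sentences at a given embedding depth."""
    return sum(1 for _ in sentences(depth))
```

Each extra level of embedding doubles the number of sentences here, and no finite list could stand in for the rule: whatever depth you stop at, `sentences(depth + 1)` yields strings that were never listed.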

The second evident fact is that humans have FLs. Why? Because 'FL' is the name we give to the obvious capacity that humans have to acquire Gs in the reflexive effortless way they do. To repeat the mantra: nothing does language like humans do language. So, unless you are a dualist, this must mean that there is something special about us that allows for this. As this is a cognitive capacity, then the likely locus of difference between us and them lies in our brain (though, were it the kidney, liver or left toe that would be fine with me). ‘FL’ is the name of this something special. Moreover, it’s a pretty good bet that at least some of FL is cognitively specific to language because, as anyone can see (repeat mantra here) nothing does language like we do language. Ergo, we have something special that they do not. And this something special is reflected in our brains/minds. What that special thing is and how brains embody it remain difficult empirical questions. That said, that humans have FLs with linguistically specific UG features is a truism.

I believe that these two points got across, though I have no idea if the morals were internalized. Some remarks in the question period led me to think that people very often confuse the whether and what questions. Many seem to think that accepting the trivial truth of (2) means that you have to believe everything that Chomsky has to say about the structure of Gs and FL. I assured the audience that this was not so, although I also mentioned that given Chomsky’s track record on the details it is often a good idea to listen carefully to what he has to say about these empirical matters. I believe that this surprised some who truly believe that GGers are mindlessly in thrall to Chomsky’s every word and accept it as gospel. I pleaded guilty. Thus, I assured them that though it was true that, as a matter of cognitive policy, I always try my hardest to believe what Chomsky does, my attitudes were not widely shared and are not considered prerequisites for good standing in the GG community. Moreover, sadly, even I have trouble keeping to my methodological commitment of intellectual subservience all the time.

I next argued that Minimalism (M) is not the bogeyman that so many non-linguists (and even linguists) think it is. In fact, I noted that cog-neuro types should hope like hell that some version of the program succeeds. Why? If it does it will make studying language easier in some ways. How so?

Well if M works then there are many parts of FL, the non-linguistically proprietary ones, that can be studied in animals other than humans. After all, M is the position that FL incorporates operations that are cognitively and/or computationally general, which means that they are not exclusive to humans. This is very different from earlier views of FL where a very large part of FL consisted of what looked like language specific (and hence human specific) structure. As it is both illegal and rude to do to us what we regularly do to mice, if most of FL resides in us but not in them then standard methods of cog-neuro inquiry will be unavailable. If, however, large parts of FL are recycled operations and principles of a-linguistic cognition and/or computation (which is what M is betting) then we can, in principle, learn a lot about FL by studying non-human brains. What we cannot learn much about are the UG parts, for, by assumption, these are special to us. However, if UG is a small part of FL, this leaves many things to potentially investigate.

Second, I noted that if M gets anywhere then it promises to address what David Poeppel describes as the parts-list problem: it provides a list of basic properties whose incarnation in brains is worth looking for. In other words, it breaks linguistic competence down to manageable units. In fact, the fecundity of this way of looking at things has been exploited by some cog-neuro types already (e.g. Pallier et al. and Friederici & Co) in their efforts to localize language function in the brain. It turns out that looking for Merge may be more tractable than looking for Raising. So, two nice consequences for cog-neuro of language should M prove successful.

I do not think that this line of argument proved to be that persuasive, but not really because of the M-ishness of the ideas per se. I think that the main resistance comes from another idea. There is a view out there that brains cannot track the kinds of abstract structures that linguists posit (btw, this is what made David's second lecture so important). Peter Hagoort in his presentation noted that brains do not truck in “linguaforms.” He takes the Kosslyn-Pylyshyn debate over imagery to be decisive in showing that brains don’t do propositions. And if they don’t then how can they manipulate the kinds of ling structures that GG postulates? I still find Hagoort’s point to be a complete non-sequitur. Even if imagery is non-propositional (a view that I do not accept actually) it does not follow that language is. It only follows that it is different. However, the accepted view as Hagoort renders it is that brain mechanisms in humans are not in any way different in kind from those in other animals and so if their brains don’t use linguaforms then neither can ours. I am very confident that our brains do manipulate linguaforms, and I suspect that theirs do to some extent as well.

What makes a brain inimical to linguaforms? Well basically it is assumed that brains have a neural net/connectionist architecture. IMO, this is the main stumbling block: CNers take all brain function to be a species of signal detection. This is what neural nets are pretty good at doing. There is a signal in the data, it is noisy and the brain's job is to extract that signal from the data. GGers don’t doubt that brains do some signal processing, but we believe that the brain also does information processing in Gallistel’s sense. However, as Gallistel has noted, CNers are not looking for the neural correlates required to make information processing possible. The whole view of the brain as a classical computing device is unpopular in the CN world, and this will make it almost impossible to deal with most of cognition (as Randy has argued), language being just a very clear hard example of the cognitive general case.

I was asked what kind of neuro experiment we could do to detect the kinds of ling structure I believe to exist. Note, neuro experiments, not behavioral ones. I responded that if CNers told us the neural equivalent, say, of a stack or of a buffer or of embedding I could devise an experiment or two. So I asked: what are the neural analogues of these notions? There was silence. No idea.

Moreover, it became pretty clear that this question never arises. Gallistel, it seems, is entirely correct. The CN community has given up on the project of trying to find how general computational properties are incarnated. But given that every theory of parsing/production that I know of is cast in a classical computational idiom, it is not surprising that GG stuff and brain stuff have problems making contact. CN studies brains in action. It cannot yet study what brains contain (i.e. what kinds of hard disks the brain contains and how info is coded on them). Until we can study this (and don’t hold your breath) CN can study language to the degree that it can study how linguistic knowledge is used. But all theories of how ling knowledge is used require the arsenal of general computational concepts that Gallistel has identified. Unfortunately, current CN is simply not looking for how the brain embodies these, and so it is no surprise that making language and the brain sciences fruitfully meet is very hard. However, it's not language that's the problem! It is hard for CN to give the neural correlates of the mechanisms that explain how ants find their way home so the problem is not a problem of language and the brain, but cognition and the brain.
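For readers unfamiliar with what a "classical computational idiom" for parsing looks like, here is a minimal sketch of a shift-reduce recognizer in Python. The rules, the lexicon and the function name `parse` are made up for illustration and are not taken from any particular parsing theory. The point is only that even this toy presupposes a stack, a buffer, and read/write operations on both, which are exactly the notions whose neural analogues are unknown.

```python
# Toy binary rules (hypothetical): a pair of categories reduces to one.
RULES = {
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

def parse(words):
    """Greedy shift-reduce recognizer; returns the final stack and a trace."""
    buffer = list(words)   # input that has not been read yet
    stack = []             # partial constituents held in memory
    trace = []
    while buffer or len(stack) > 1:
        top_two = tuple(stack[-2:])
        if len(stack) >= 2 and top_two in RULES:
            # Pop two entries and write their mother node back.
            stack[-2:] = [RULES[top_two]]
            trace.append("reduce")
        elif buffer:
            # Read the next word and push its category.
            stack.append(LEXICON[buffer.pop(0)])
            trace.append("shift")
        else:
            break          # stuck: no rule applies and nothing left to read
    return stack, trace
```

Calling `parse("the dog saw the cat".split())` ends with the single symbol `S` on the stack after five shifts and four reduces. A greedy strategy like this one is far too weak for real language; it is meant only to show which memory structures a performance theory takes for granted.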

So, how did it go? I believe that I got some CNers to understand what GG does and dispelled some myths. Yes, our data is fine; no, we do believe in meaning; yes, Gs exist, as does FL with some UG touches; no, everything is not in the signal… However, I also came away thinking that Gallistel’s critique is much more serious than I had believed before. The problem is that CN has put aside the idea that brains are information processing systems and sees them as fancy signal detection devices. And, until this signal-detection view is put aside and CN finds the neural analogues of classical computational concepts, mapping ling structure to neural mechanisms will be virtually impossible, not because they are linguistic but because they are cognitive. There is no current way to link linguistic concepts to brain primitives because brain primitives cannot do any kind of cognition at all (sensation yes, perception, partly, but cognition, nada).

Where does that leave us? We can still look for parts of the brain that correlate with doing languagy things (what David Poeppel calls the map problem (see next post)), but if the aim is to relate brain and linguistic mechanisms, this is a long way off if we cannot find the kinds of computational structures and operations that Gallistel has been urging CN to look for.

So how did it go? Well, not bad. Nijmegen is nice. The weather was good. The food served was delicious and, let me say this loud and clear, I really enjoyed the time I spent talking with the CNers, especially Peter Hagoort. He likes a good argument and is really fun to disagree with (and that is easy to do given how wrong he is about things linguistic). So, it was fun. It may even have been productive. However, I doubt the lectures, excellent though they were, will mark a sea-change in ling-neuro interactions. I hope I am wrong, but I doubt it. We shall see.


  1. I agree that it's evidently true that the brain has infinite generative capacity, and that generative linguistics has discovered a huge array of facts that any theory of linguistic competence would need to explain. On the other hand, I either disagree with or don't understand the following part:

    Well basically it is assumed that brains have a neural net/connectionist architecture. IMO, this is the main stumbling block: CNers take all brain function to be a species of signal detection.

    It's still not clear to me why a neural network wouldn't be able to implement what we like to describe in symbolic terms (except perhaps for some very reductive architectures, cf Gary Marcus' work). On some level it's trivially true that that needs to be the case, because the brain is made up of neurons, not symbols, no? Sorry for stating the obvious - perhaps I'm not getting the distinction you're making between signal detection and information processing. I do agree, though, that much more work needs to be done to understand how what we characterize as symbol processing is implemented in the brain.

    1. There are (good) arguments out there that neural net/connectionist architectures have trouble with representations. Marcus, Fodor&Pylyshyn make these for cog architectures and Gallistel extends this to neural architecture. This is what I had in mind. Nets are fancy feature detectors: given a fixed inventory of features that nodes respond to we can get them to respond to the actual features in the environment by various "reward" schemes. There are no representations, nor the computational infrastructure to support such. Gallistel's point is that some computational architectures can support representations and some cannot and the problem right now is that neural net/connectionist architectures, which are the only ones being adopted, cannot. So, nobody is looking for the neural equivalent of an address or a stack or a pointer because it is assumed that these are not the kinds of notions a brain can support because that's not what neural nets can do. And this is a problem for us as linguists, I believe. How do brains embody these ideas? Once we know this, we can do a lot of CN of language. Until then, we cannot.

    2. I don't know enough to discuss the neuro question (not sure anyone does), but as far as computational neural nets, I think you have a particular PDP-like minimal neural net architecture in mind. Gary Marcus was quite clear that his argument was about that particular type of neural net, but people have proposed a range of more structured architectures that combine basic neural nets with other elements. For example, this paper from Facebook AI Research proposes an architecture that adds stacks to a recurrent neural network. I don't know where this architecture would fall in your typology - it's clearly not a symbol manipulator in the traditional sense, but it's also not a "pure" pattern detector (I'm not actually sure that a recurrent neural network would count as a pattern detector even without the stacks).

      I don't know that we have evidence that this type of model couldn't learn to do what we take to be symbolic about language - indeed, the paper I linked to shows that that model can learn various formal languages pretty well. Granted, there's a large gap between those miniature languages and natural languages, but my point is that it's still an empirical question which architectures can and can't implement the kind of symbolic processing that we associate with syntax (for example).

    3. Maybe SRNs as discussed by Christiansen and Chater here might be a useful initial target ... they are claimed to be able to learn certain recursive patterns, but I suspect that there's very little chance that they would tend to get the right typology or generalizations, such as that when fully center embedded structures occur, they tend to have the same overall structure as nonembedded ones (except for a very strong tendency not to contain further center embeddings of the same thing), and when there are differences between center-embedded and not-embedded at all, as in German subordinate clauses, they apply to all subordinate structures, including the edge-embedded ones.

      This is pretty old, but it might help clarify the issues, such as at least some of what a connectionist network would actually have to do to satisfy linguists. And also clarify why flat models are hopeless.

    4. @Avery, I agree - "what a connectionist network would actually have to do to satisfy linguists" is exactly the question that needs to be asked. Of course, the answer could also be "no connectionist network could possibly satisfy linguists", but I'm not sure what the arguments for that position are. To even state that answer we would probably need some conceptual clarity about what counts as a connectionist network, what counts as an "information processing system", and whether these two exhaustively cover the space of possible models.

    5. Neuroscience I know nothing about but I was a connectionist before a linguist. One of the fascinating aspects of general purpose neural nets is that they work like a black box: given enough tweaking, they can approximate the input-output behavior very well--now better with Deep Learning. But that's also what makes these systems ultimately unsatisfying: because it's a black box, we don't really understand what's going on when we want to understand how humans deal with languages. (And this is not to even mention that most of the NN models of language have not worked very well.) My understanding is that even when DL works much better than previous methods (for certain tasks), people currently don't understand why it works better. One may say that's not very satisfying either--if he/she is interested in understanding something.

    6. It is precisely the I/O nature of these NNs that Gallistel points to as the problem. The issue is not matching behavior but strong representations that are then accessible and callable from memory when required. He argues that this is precisely what NNs cannot do, and that is the problem for a system that can handle cognition (even non-human cognition as in ant navigation or bird foraging). He also notes that the NN fixation is stopping CNers from thinking about how to neurally implement classical computational notions like 'address' 'write to' 'stack' 'buffer' etc. If Nijmegen was any indication, this seems correct. Nobody knows what the neural equivalents of these things are. But worse, nobody is even looking. It's just not on the radar. And this has implications for ling.

      The state of the art now in CN allows us to look at the brain performing some action, say parsing a sentence. We cannot look into the brain and see how knowledge is stored, only how it might be being used. But our computational theories of G performance really exploit classical computer architectures to implement them. Gs are in look up tables etc but we only "see" these through the activities of the performer. But if CN has no idea how these non-ling computational properties work in brains, then there is little hope that we can make serious contact at the level of mechanism. We can still look where things happen, but not how they do. Moreover, and this was my take away, this is not really a LANGUAGE and the brain problem, but a COGNITION and the brain problem. This is, of course, Randy's point. After Nijmegen, I believe that he is absolutely right.

    7. Well I would take notice of a neural net which did some combination of the following things (even one would be interesting):

      a) picked up some grammatical generalizations of the kind linguists have been interested in for some time (eg the basics of Icelandic, Kayardild or Lardil, and, say Dinka) from a 'reasonable' amount of data (nature, amount and quality of data required is subject to negotiation)

      b) did not easily do things such as learn grammars that provided different possibilities for edge-embedded vs center-embedded structures (the problem I suspect that SRNs might have), different ordering rules for NPs with different grammatical genders, and similar nonexistent oddities.

      c) can do two way transduction between sentences and 'knowledge state updates', that is meanings. EG given a desired information update, produce a sentence that an addressee can process to apply that update to their own information (dynamic semantics as in DRT and friends). There is some work on semantics for neural nets, although I have no idea what it does.

      The problem I see with Norbert's general view above is that I've never noticed that people paid any attention to high-level criticisms of what they are doing that emerge from a somewhat different perspective, but 'your system doesn't do X but it should' and/or 'does do Y but it shouldn't', where the behavior you're recommending is basically an adjustment of what the system is actually doing (but perhaps requiring drastic changes in how the system works), is more likely to have an effect.

    8. I have to side with Tal on the technical side, but I am also very sympathetic to Gallistel's methodological point.

      To the best of my knowledge (which is very lacking in this area), some types of recurrent neural networks are Turing complete. See for instance this paper. Any computable representation format can be reencoded into a string-based format manipulated by Turing machines, so the assertion that neural networks cannot support representations is false in the standard technical sense.

      But that does not undermine Gallistel's basic point that neural networks have misled an entire field into believing that they can talk about cognition without talking about computation. And that simply doesn't work: hardware tells you precious little about software (if you want to argue otherwise, you have to give a very convincing argument why everything we have learned since Turing does not apply to brains). So in that sense of support neural networks are indeed a failure, and their prominence in various fields has been very unhealthy.

    9. Yes, I think we agree on the technical vs methodological/sociological distinction. Our desiderata for a theory of language are framed in symbolic terms (e.g. Avery's points), so whoever proposes a neural network as an implementation of linguistic competence had better show that the network's behavior is consistent with those symbolic generalizations. And it may be the case that neural networks mislead people into assuming without empirical demonstration that the network is powerful enough to learn whatever it needs to learn, given "enough" data.

    10. Thomas: These properties have been known for a long time but have had very little impact one way or another. Kurt Hornik showed in 1991 that backprop networks are universal approximators, and even earlier results in the 1980s showed that training them is NP-hard. The same can be said about numerous other computing devices--e.g., cellular automata--but it's hard to see how knowing that changes anything about anything, even on the technical side. We just have bigger machines now.

    12. I have a suspicion that formulating the symbolically stated desiderata that we find self-evidently worthy of explanation in such a way that people like Morten Christiansen would find them even worth thinking about at all is a bit of a challenge, since an assertion such as "XPs fully embedded within XPs tend to have pretty much the same structure other than a reluctance to contain yet another fully embedded XP" really does not seem to me to be meaningful to them ... they don't really accept grammaticality judgements, they're mostly interested in various kinds of quantitative data, so, how do you set the problem up in such a way that they can't ignore it?

      Repeating the same kind of stuff that linguists have been saying for the last 50 years doesn't seem to have had much effect.

    13. If those symbolic generalizations are going to be relevant to cognitive science, they'll ultimately need to make predictions about neural activity, or about behavior of whatever sort - reading times, syntactic priming, acceptability judgments, etc. Those are the desiderata. And if someone came up with a neural network that derived all of those facts, well, it might not be an interesting theory of the language faculty, but it would certainly suggest that you don't absolutely need symbols as part of your architecture.

      Arbitrarily excluding from consideration an entire class of human behavior, e.g. acceptability judgments, is an odd scientific move (do you remember the paper where Christiansen made that argument?). But that question seems orthogonal to the symbolic vs. nonsymbolic issue.

  2. I just want to add to the discussion by pointing towards some recent work that tries to tackle David's "mapping problem" in the domain of syntax. John Hale and I have been trying to match parser states with neural signals. See this CMCL proceedings paper for an initial report (journal paper in the pipeline). This work evaluates parsers defined over a few different grammars, including a Minimalist Grammar (in Stabler's sense).

    Leila Wehbe and the group at CMU (she is now at Cal) have also been doing related work that uses machine learning to tease out what sorts of parse state information is encoded in different areas of the cortex during reading.

    No, this work doesn't say how the brain pops an entry off of a stack (or even if the human parser is best modeled with a stack-based memory), but it points towards an approach that I think is pretty promising: a family of computational models quantifies how a specific parser operation, such as identifying a syntactic constituent, might be carried out by a neural circuit word-by-word and model predictions are tested for their statistical fit with measured brain data (currently macro data like BOLD or neuromagnetic flux, one day micro-level signals??)
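The general shape of that approach can be sketched in a few lines of Python. This is an illustrative assumption, not the method of the papers mentioned above: the bracket format, the choice of "constituents closed at each word" as the parser-derived predictor, and the function names `node_counts` and `pearson` are all invented here. The idea is to turn a parse into one number per word and then ask how well that number tracks a word-by-word signal.

```python
import math

def node_counts(bracketed):
    """For each word, count the constituents that close right after it.

    Input is a space-separated bracketing, e.g. "[ [ the dog ] barked ]".
    """
    counts = []
    for token in bracketed.split():
        if token == "]":
            counts[-1] += 1    # one more node completed at this word
        elif token != "[":
            counts.append(0)   # a new word starts with zero closed nodes
    return counts

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For the bracketing `"[ [ the dog ] [ saw [ the cat ] ] ]"`, `node_counts` gives `[0, 1, 0, 0, 3]`. In an actual study the second argument to `pearson` would be a measured per-word signal, and a simple correlation would be replaced by a proper regression with control predictors; different grammars or parsing strategies yield different predictors, which is what makes the comparison informative.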