I was asked how it went in Nijmegen. Let me say a word or two, with the understanding that what I say here is very much a one-sided perspective.
When I left for David's lectures I was pretty confident that the GG linguistic worldview, which I take to be hardly worth contesting, is treated with great skepticism (if not worse; 'naivety' and 'contempt' are words that come to mind) in the cog-neuro (CN) of language world. One prominent voice of this skepticism is Peter Hagoort, whose views concerning GG I critically discussed here.
Before leaving, I was pretty confident that neither my views nor his had changed much, so I was looking forward to a good, vigorous back and forth. Here are the slides I presented. They were intended to provoke, though not because I thought they were anything but anodyne, intellectually speaking. The provocation would come from the fact that the truisms I was defending are barely understood by many in the "opposition."
The presentation had four objectives:
1. To note that Chomsky's views are always worth taking seriously, the main reason being that he is very often right.
2. To explain why it is virtually apodictic that
   a. Part of human linguistic capacity involves having a mind/brain-internal G
   b. FL exists and has some linguistically specific structure (aka UG is not null)
3. To argue that cog-neuro of language investigators should hope like hell that something like the Minimalist program is viable.
4. To dispel some common misconceptions about GG widespread in the CNosphere.
This, IMO, went well. I led with (1) in order to capture the audience's attention (which, I believe, I did). However, I really wanted to make points (2) and (3) in a way that a non-linguist could appreciate. To do this I tried to draw a distinction between two claims: whether mental Gs and FL/UG exist in human minds/brains, and what they actually look like. The first, I argued, cannot be contentious (i.e. that humans have mental grammars in the brain is a trivial truth). The second, I noted, must be (e.g. whether FL/UG has bounding nodes and a subjacency principle is an empirical issue). In other words, that humans have internal Gs and that humans have an FL with some linguistically specific properties is a virtual truism. What Gs and FL/UG look like is an empirical question that both is and should be very contentious, as are theories in any domain.
And here was the main point: one should not confuse these
two claims. We can argue about what human Gs look like and how FL is
constituted. We cannot seriously argue about whether they exist incarnated in a
region roughly between our ears.
Why are the points in (2) virtual truisms? For the reasons
that Chomsky long ago noted (as I noted in (1), he is very often right). They
are simple consequences of two obvious facts.
First, the fact of linguistic creativity: it is obvious that a native speaker can produce and understand an effective infinity of linguistic structures. Most of these structures are novel in the sense that speakers have never encountered them before. Nonetheless, these sentences/phrases etc. are easily produced and understood. This can only be explained if we at least assume that speakers who do this have an internalized set of rules able to generate the structures produced/heard. These rules (aka Gs) must be recursive to allow for the obvious fact of linguistic creativity (the only way to finitely specify an infinite set is recursively). So given that humans display linguistic creativity and given that this evident capacity requires something like a G, we effortlessly conclude that humans have internal Gs. And assuming we are not dualists, these Gs are coded somehow in human brains. The question is not whether this is so, but what these Gs look like and how brains code them.
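To make the point concrete, here is a toy sketch (in Python, with invented category names and vocabulary, so nothing hangs on the details): a finite rule set containing one recursive rule suffices to generate an unbounded set of distinct structures.

```python
# Toy sketch only: a finite rule set with one recursive rule (NP can contain S)
# generates an unbounded set of structures. The categories and words are
# invented for illustration; no claim about what an actual mental G looks like.
import random

RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "S"]],   # recursion lives here
    "VP": [["V", "NP"], ["V"]],
    "N":  [["linguist"], ["claim"], ["reviewer"]],
    "V":  [["doubted"], ["slept"], ["annoyed"]],
}

def generate(symbol="S", depth=0, max_depth=4):
    """Expand a symbol into a bracketed structure. The depth bound is only so
    the demo halts; the rule set itself imposes no upper limit on embedding."""
    if symbol not in RULES:
        return symbol                                   # terminal word
    options = RULES[symbol]
    if depth >= max_depth:                              # keep the demo finite
        options = [o for o in options if "S" not in o] or options
    expansion = random.choice(options)
    return [symbol] + [generate(s, depth + 1, max_depth) for s in expansion]

if __name__ == "__main__":
    for _ in range(3):
        print(generate())
```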
The second evident fact is that humans have FLs. Why? Because 'FL' is the name we give to the obvious capacity that humans have to acquire Gs in the reflexive, effortless way they do. To repeat the mantra: nothing does language like humans do language. So, unless you are a dualist, this must mean that there is something special about us that allows for this. As this is a cognitive capacity, the likely locus of difference between us and them lies in our brain (though, were it the kidney, liver or left toe, that would be fine with me). 'FL' is the name of this something special. Moreover, it's a pretty good bet that at least some of FL is cognitively specific to language because, as anyone can see (repeat mantra here), nothing does language like we do language. Ergo, we have something special that they do not. And this something special is reflected in our brains/minds. What that special thing is and how brains embody it remain difficult empirical questions. That said, that humans have FLs with linguistically specific UG features is a truism.
I believe that these two points got across, though I have no idea if the morals were internalized. Some remarks in the question period led me to think that people very often confuse the whether and the what questions. Many seem to think that accepting the trivial truth of (2) means that you have to believe everything that Chomsky has to say about the structure of Gs and FL. I assured the audience that this was not so, although I also mentioned that given Chomsky's track record on the details it is often a good idea to listen carefully to what he has to say about these empirical matters. I believe that this surprised some who truly believe that GGers are mindlessly in thrall to Chomsky's every word and accept it as gospel. I pleaded guilty. Thus, I assured them that though it is true that, as a matter of cognitive policy, I always try my hardest to believe what Chomsky does, my attitudes are not widely shared and are not considered prerequisites for good standing in the GG community. Moreover, sadly, even I have trouble keeping to my methodological commitment of intellectual subservience all the time.
I next argued that Minimalism (M) is not the bogeyman that so many non-linguists (and even linguists) think it is. In fact, I noted that cog-neuro types should hope like hell that some version of the program succeeds. Why? If it does, it will make studying language easier in some ways. How so? Well, if M works then there are many parts of FL, the non-linguistically proprietary ones, that can be studied in animals other than humans. After all, M is the position that FL incorporates operations that are cognitively and/or computationally general, which means that they are not exclusive to humans. This is very different from earlier views of FL on which a very large part of FL consisted of what looked like language-specific (and hence human-specific) structure. As it is both illegal and rude to do to us what we regularly do to mice, if most of FL resides in us but not in them, then standard methods of cog-neuro inquiry will be unavailable. If, however, large parts of FL are recycled operations and principles of a-linguistic cognition and/or computation (which is what M is betting), then we can, in principle, learn a lot about FL by studying non-human brains. What we cannot learn much about are the UG parts, for, by assumption, these are special to us. However, if UG is a small part of FL, this leaves many things to potentially investigate.
Second, I noted that if M gets anywhere then it promises to address what David Poeppel describes as the parts-list problem: it provides a list of basic properties whose incarnation in brains is worth looking for. In other words, it breaks linguistic competence down into manageable units. In fact, the fecundity of this way of looking at things has already been exploited by some cog-neuro types (e.g. Pallier et al. and Friederici & Co) in their efforts to localize language function in the brain. It turns out that looking for Merge may be more tractable than looking for Raising. So, two nice consequences for cog-neuro of language should M prove successful.
I do not think that this line of argument proved to be that persuasive, but not really because of the M-ishness of the ideas per se. I think that the main resistance comes from another idea. There is a view out there that brains cannot track the kinds of abstract structures that linguists posit (btw, this is what made David's second lecture so important). Peter Hagoort in his presentation noted that brains do not truck in "linguaforms." He takes the Kosslyn-Pylyshyn debate over imagery to be decisive in showing that brains don't do propositions. And if they don't, then how can they manipulate the kinds of ling structures that GG postulates? I still find Hagoort's point to be a complete non sequitur. Even if imagery is non-propositional (a view that I do not actually accept), it does not follow that language is. It only follows that it is different. However, the accepted view as Hagoort renders it is that brain mechanisms in humans are not in any way different in kind from those in other animals, and so if their brains don't use linguaforms then neither can ours. I am very confident that our brains do manipulate linguaforms, and I suspect that theirs do to some extent as well.
What makes a brain inimical to linguaforms? Well, basically it is assumed that brains have a neural net/connectionist architecture. IMO, this is the main stumbling block: CNers take all brain function to be a species of signal detection. This is what neural nets are pretty good at doing. There is a signal in the data, it is noisy, and the brain's job is to extract that signal from the data. GGers don't doubt that brains do some signal processing, but we also believe that the brain does information processing in Gallistel's sense. However, as Gallistel has noted, CNers are not looking for the neural correlates required to make information processing possible. The whole view of the brain as a classical computing device is unpopular in the CN world, and this will make it almost impossible to deal with most of cognition (as Randy has argued), language being just a particularly clear and hard example of the general cognitive case.
I was asked what kind of neuro experiment we could do to detect the kinds of ling structure I believe to exist. Note, neuro experiments, not behavioral ones. I responded that if CNers told us the neural equivalent of, say, a stack or a buffer or embedding, I could devise an experiment or two. So I asked: what are the neural analogues of these notions? There was silence. No idea.
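For concreteness, here is a minimal sketch (Python; the toy bracket "language" is invented for the example) of the sort of classical machinery the question is about: an explicit stack that is written to and read from as embedding is tracked. The question put to the audience was, in effect, what the neural analogue of this kind of read/write memory would be.

```python
# Minimal sketch of the classical machinery at issue: an explicit stack that
# tracks embedding. The toy bracket "language" is invented; the point is only
# that parsing models of this sort presuppose addressable, updatable memory.
def well_nested(tokens):
    """Recognize well-nested open/close brackets with an explicit stack."""
    stack = []                    # the read/write memory whose neural analogue
                                  # the question asks about
    for tok in tokens:
        if tok == "[":
            stack.append(tok)     # write: push on entering an embedding
        elif tok == "]":
            if not stack:
                return False      # nothing to pop: ill-formed
            stack.pop()           # read and erase: pop on closing
    return not stack              # well-formed iff nothing is left open

print(well_nested(list("[[][]]")))   # True
print(well_nested(list("[[]")))      # False
```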
Moreover, it became pretty clear that this question never arises. Gallistel, it seems, is entirely correct. The CN community has given up on the project of trying to find out how general computational properties are incarnated. But given that every theory of parsing/production that I know of is cast in a classical computational idiom, it is not surprising that GG stuff and brain stuff have problems making contact. CN studies brains in action. It cannot yet study what brains contain (i.e. what kinds of hard disks the brain contains and how info is coded on them). Until we can study this (and don't hold your breath), CN can study language only to the degree that it can study how linguistic knowledge is used. But all theories of how ling knowledge is used require the arsenal of general computational concepts that Gallistel has identified. Unfortunately, current CN is simply not looking for how the brain embodies these, and so it is no surprise that making language and the brain sciences fruitfully meet is very hard. However, it's not language that's the problem! It is hard for CN to give the neural correlates of the mechanisms that explain how ants find their way home, so the problem is not a problem of language and the brain, but of cognition and the brain.
So, how did it go? I believe that I got some CNers to understand what GG does and dispelled some myths. Yes, our data are fine; yes, we believe in meaning; yes, Gs exist, as does FL with some UG touches; no, everything is not in the signal… However, I also came away thinking that Gallistel's critique is much more serious than I had believed before. The problem is that CN has put aside the idea that brains are information processing systems and sees them as fancy signal detection devices. And, until this view is abandoned and CN finds the neural analogues of classical computational concepts, mapping ling structure to neural mechanisms will be virtually impossible, not because the structures are linguistic but because they are cognitive. There is no current way to link linguistic concepts to brain primitives because the brain primitives on offer cannot do any kind of cognition at all (sensation, yes; perception, partly; but cognition, nada).
Where does that leave us? We can still look for parts of
the brain that correlate with doing languagy things (what David Poeppel calls
the map problem (see next post)), but if the aim is to relate brain and
linguistic mechanisms, this is a long way off if we cannot find the kinds of
computational structures and operations that Gallistel has been urging CN to
look for.
So how did it go? Well, not bad. Nijmegen is nice. The
weather was good. The food served was delicious and, let me say this loud and
clear, I really enjoyed the time I spent talking with the CNers, especially Peter
Hagoort. He likes a good argument and is really fun to disagree with (and that
is easy to do given how wrong he is about things linguistic). So, it was fun.
It may even have been productive. However, I doubt the lectures, excellent
though they were, will mark a sea-change in ling-neuro interactions. I hope I
am wrong, but I doubt it. We shall see.
I agree that it's evidently true that the brain has infinite generative capacity, and that generative linguistics has discovered a huge array of facts that any theory of linguistic competence would need to explain. On the other hand, I either disagree with or don't understand the following part:
"Well basically it is assumed that brains have a neural net/connectionist architecture. IMO, this is the main stumbling block: CNers take all brain function to be a species of signal detection."
It's still not clear to me why a neural network wouldn't be able to implement what we like to describe in symbolic terms (except perhaps for some very reductive architectures, cf. Gary Marcus' work). On some level it's trivially true that that needs to be the case, because the brain is made up of neurons, not symbols, no? Sorry for stating the obvious - perhaps I'm not getting the distinction you're making between signal detection and information processing. I do agree, though, that much more work needs to be done to understand how what we characterize as symbol processing is implemented in the brain.
There are (good) arguments out there that neural net/connectionist architectures have trouble with representations. Marcus and Fodor & Pylyshyn make these for cog architectures, and Gallistel extends this to neural architecture. This is what I had in mind. Nets are fancy feature detectors: given a fixed inventory of features that nodes respond to, we can get them to respond to the actual features in the environment by various "reward" schemes. There are no representations, nor the computational infrastructure to support such. Gallistel's point is that some computational architectures can support representations and some cannot, and the problem right now is that neural net/connectionist architectures, which are the only ones being adopted, cannot. So, nobody is looking for the neural equivalent of an address or a stack or a pointer because it is assumed that these are not the kinds of notions a brain can support, because that's not what neural nets can do. And this is a problem for us as linguists, I believe. How do brains embody these ideas? Once we know this, we can do a lot of CN of language. Until then, we cannot.
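As a toy illustration of "nets as feature detectors" (everything here is invented for the example: features, data, learning rate), a single unit with a fixed feature inventory can be driven by an error/reward signal to respond to one feature in its input, without any stored, addressable representation ever being involved.

```python
# Toy sketch of a net as a feature detector: a single unit learns, via an
# error-driven ("reward"-like) update, to respond when one particular feature
# is present. All numbers and the feature inventory are invented.
import numpy as np

rng = np.random.default_rng(1)
n_features = 4
w = np.zeros(n_features)

def target(x):
    """The 'feature in the environment' to be detected: feature 2 is present."""
    return 1.0 if x[2] > 0 else 0.0

for _ in range(200):
    x = rng.integers(0, 2, size=n_features).astype(float)   # binary feature vector
    y = 1.0 if w @ x > 0.5 else 0.0                          # unit's response
    w += 0.1 * (target(x) - y) * x                           # error-driven weight update

print("weights after training:", w)   # weight on feature 2 should carry most of the load
```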
I don't know enough to discuss the neuro question (not sure anyone does), but as far as computational neural nets go, I think you have a particular PDP-like minimal neural net architecture in mind. Gary Marcus was quite clear that his argument was about that particular type of neural net, but people have proposed a range of more structured architectures that combine basic neural nets with other elements. For example, this paper from Facebook AI Research proposes an architecture that adds stacks to a recurrent neural network. I don't know where this architecture would fall in your typology - it's clearly not a symbol manipulator in the traditional sense, but it's also not a "pure" pattern detector (I'm not actually sure that a recurrent neural network would count as a pattern detector even without the stacks).
I don't know that we have evidence that this type of model couldn't learn to do what we take to be symbolic about language - indeed, the paper I linked to shows that that model can learn various formal languages pretty well. Granted, there's a large gap between those miniature languages and natural languages, but my point is that it's still an empirical question which architectures can and can't implement the kind of symbolic processing that we associate with syntax (for example).
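For readers who haven't seen this kind of model, here is a very rough toy sketch of the general idea behind such stack-augmented recurrent nets (it is not the linked paper's actual model or code; the sizes, random weights, and two-action inventory are invented for illustration): the hidden state softly chooses between pushing and popping on a continuous external stack, and the stack top feeds back into the next hidden update.

```python
# Rough sketch of a stack-augmented recurrent net (in the spirit of the linked
# paper, but not its model or equations): the hidden state controls
# differentiable push/pop actions on an external stack, whose top is read back
# into the next hidden state. Untrained; weights are random, sizes are toy.
import numpy as np

rng = np.random.default_rng(0)
H, K = 8, 6                                         # hidden size, stack depth
W_x = rng.normal(size=(H, H))                       # input weights
W_h = rng.normal(size=(H, H))                       # recurrent weights
W_s = rng.normal(size=(H, 1))                       # weights reading the stack top
W_act = rng.normal(size=(2, H))                     # hidden state -> push/pop scores
W_val = rng.normal(size=(1, H))                     # hidden state -> value to push

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def step(x, h, stack):
    # hidden update reads the top of the stack as well as the current input
    h = np.tanh(W_x @ x + W_h @ h + W_s @ stack[:1])
    push, pop = softmax(W_act @ h)                  # soft choice of stack action
    value = np.tanh(W_val @ h)                      # value written on a push
    pushed = np.concatenate([value, stack[:-1]])    # shift down, write on top
    popped = np.concatenate([stack[1:], np.zeros(1)])  # shift up, drop the top
    stack = push * pushed + pop * popped            # differentiable stack update
    return h, stack

h, stack = np.zeros(H), np.zeros(K)
for x in rng.normal(size=(5, H)):                   # five toy input vectors
    h, stack = step(x, h, stack)
print("top of stack after 5 steps:", stack[0])
```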
Maybe SRNs as discussed by Christiansen and Chater here (http://cnl.psych.cornell.edu/pubs/1999-cc-CogSci.pdf) might be a useful initial target ... they are claimed to be able to learn certain recursive patterns, but I suspect that there's very little chance that they would tend to get the right typology or generalizations, such as that when fully center embedded structures occur, they tend to have the same overall structure as nonembedded ones (except for a very strong tendency not to contain further center embeddings of the same thing), and when there are differences between center-embedded and non-embedded at all, as in German subordinate clauses, they apply to all subordinate structures, including the edge-embedded ones.
This is pretty old, but it might help clarify the issues, such as at least some of what a connectionist network would actually have to do to satisfy linguists. And also clarify why flat models are hopeless.
@Avery, I agree - "what a connectionist network would actually have to do to satisfy linguists" is exactly the question that needs to be asked. Of course, the answer could also be "no connectionist network could possibly satisfy linguists", but I'm not sure what the arguments for that position are. To even state that answer we would probably need some conceptual clarity about what counts as a connectionist network, what counts as an "information processing system", and whether these two exhaustively cover the space of possible models.
Neuroscience I know nothing about, but I was a connectionist before a linguist. One of the fascinating aspects of general purpose neural nets is that they work like black boxes: given enough tweaking, they can approximate input-output behavior very well--now better with Deep Learning. But that's also what makes these systems ultimately unsatisfying: because they are black boxes, we don't really understand what's going on when we want to understand how humans deal with language. (And this is not to even mention that most of the NN models of language have not worked very well.) My understanding is that even when DL works much better than previous methods (for certain tasks), people currently don't understand why it works better. One may say that's not very satisfying either--if he/she is interested in understanding something.
It is precisely the I/O nature of these NNs that Gallistel points to as the problem. The issue is not matching behavior but storing representations that are then accessible and callable from memory when required. He argues that this is precisely what NNs cannot do, and that this is the problem for a system that needs to handle cognition (even non-human cognition, as in ant navigation or bird foraging). He also notes that the NN fixation is stopping CNers from thinking about how to neurally implement classical computational notions like 'address', 'write to', 'stack', 'buffer', etc. If Nijmegen was any indication, this seems correct. Nobody knows what the neural equivalents of these things are. But worse, nobody is even looking. It's just not on the radar. And this has implications for ling.
The state of the art now in CN allows us to look at the brain performing some action, say parsing a sentence. We cannot look into the brain and see how knowledge is stored, only how it might be being used. But our computational theories of G performance really exploit classical computer architectures to implement them. Gs are in look-up tables etc., but we only "see" these through the activities of the performer. But if CN has no idea how these non-ling computational properties work in brains, then there is little hope that we can make serious contact at the level of mechanism. We can still look at where things happen, but not how they do. Moreover, and this was my take away, this is not really a LANGUAGE and the brain problem, but a COGNITION and the brain problem. This is, of course, Randy's point. After Nijmegen, I believe that he is absolutely right.
Well I would take notice of a neural net which did some combination of the following things (even one would be interesting):
a) picked up some grammatical generalizations of the kind linguists have been interested in for some time (e.g. the basics of Icelandic, Kayardild or Lardil, and, say, Dinka) from a 'reasonable' amount of data (nature, amount and quality of data required is subject to negotiation)
b) did not easily do things such as learn grammars that provided different possibilities for edge-embedded vs center-embedded structures (the problem I suspect that SRNs might have), different ordering rules for NPs with different grammatical genders, and similar nonexistent oddities.
c) can do two-way transduction between sentences and 'knowledge state updates', that is, meanings. E.g. given a desired information update, produce a sentence that an addressee can process to apply that update to their own information (dynamic semantics as in DRT and friends). There is some work on semantics for neural nets, although I have no idea what it does.
The problem I see with Norbert's general view above is that I've never noticed people paying any attention to high-level criticisms of what they are doing that emerge from a somewhat different perspective; 'your system doesn't do X but it should' and/or 'does do Y but it shouldn't', where the behavior you're recommending is basically an adjustment of what the system is actually doing (but perhaps requiring drastic changes in how the system works), is more likely to have an effect.
I have to side with Tal on the technical side, but I am also very sympathetic to Gallistel's methodological point.
To the best of my knowledge (which is very lacking in this area), some types of recurrent neural networks are Turing complete. See for instance this paper. Any computable representation format can be reencoded into a string-based format manipulated by Turing machines, so the assertion that neural networks cannot support representations is false in the standard technical sense.
But that does not undermine Gallistel's basic point that neural networks have misled an entire field into believing that they can talk about cognition without talking about computation. And that simply doesn't work: hardware tells you precious little about software (if you want to argue otherwise, you have to give a very convincing argument why everything we have learned since Turing does not apply to brains). So in that sense of support, neural networks are indeed a failure, and their prominence in various fields has been very unhealthy.
Yes, I think we agree on the technical vs methodological/sociological distinction. Our desiderata for a theory of language are framed in symbolic terms (e.g. Avery's points), so whoever proposes a neural network as an implementation of linguistic competence had better show that the network's behavior is consistent with those symbolic generalizations. And it may be the case that neural networks mislead people into assuming, without empirical demonstration, that the network is powerful enough to learn whatever it needs to learn, given "enough" data.
Thomas: These properties have been known for a long time but have had very little impact one way or another. Kurt Hornik showed in 1991 that backprop networks are universal approximators, and even earlier results in the 1980s showed that training them is NP-hard. The same can be said about numerous other computing devices--e.g., cellular automata--but it's hard to see how knowing that changes anything about anything, even on the technical side. We just have bigger machines now.
I have a suspicion that formulating the symbolically stated desiderata that we find self-evidently worthy of explanation in such a way that people like Morten Christiansen would find them even worth thinking about at all is a bit of a challenge, since an assertion such as "XPs fully embedded within XPs tend to have pretty much the same structure other than a reluctance to contain yet another fully embedded XP" really does not seem to me to be meaningful to them ... they don't really accept grammaticality judgements, they're mostly interested in various kinds of quantitative data, so, how do you set the problem up in such a way that they can't ignore it?
DeleteRepeating the same kind of stuff that linguists have been saying for the last 50 years doesn't seem to have had much effect.
If those symbolic generalizations are going to be relevant to cognitive science, they'll ultimately need to make predictions about neural activity, or about behavior of whatever sort - reading times, syntactic priming, acceptability judgments, etc. Those are the desiderata. And if someone came up with a neural network that derived all of those facts, well, it might not be an interesting theory of the language faculty, but it would certainly suggest that you don't absolutely need symbols as part of your architecture.
Arbitrarily excluding from consideration an entire class of human behavior, e.g. acceptability judgments, is an odd scientific move (do you remember the paper where Christiansen made that argument?). But that question seems orthogonal to the symbolic vs. nonsymbolic issue.
I just want to add to the discussion by pointing towards some recent work that tries to tackle David's "mapping problem" in the domain of syntax. John Hale and I have been trying to match parser states with neural signals. See this CMCL proceedings paper (http://www.aclweb.org/anthology/W15-1110) for an initial report (journal paper in the pipeline). This work evaluates parsers defined over a few different grammars, including a Minimalist Grammar (in Stabler's sense).
Leila Wehbe and the group at CMU (she is now at Cal) have also been doing related work that uses machine learning to tease out what sorts of parse state information are encoded in different areas of the cortex during reading (http://dx.doi.org/10.1371/journal.pone.0112575).
No, this work doesn't say how the brain pops an entry off of a stack (or even if the human parser is best modeled with a stack-based memory), but it points towards an approach that I think is pretty promising: a family of computational models quantifies how a specific parser operation, such as identifying a syntactic constituent, might be carried out by a neural circuit word-by-word, and model predictions are tested for their statistical fit with measured brain data (currently macro data like BOLD or neuromagnetic flux, one day micro-level signals??).
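By way of illustration of the general recipe only (this is not the pipeline from either paper; the predictor name, numbers, and signal are all made up): derive a word-by-word predictor from a parsing model, such as a count of parser operations per word, and ask whether it explains variance in a measured signal aligned to the same words.

```python
# Hedged sketch of the general model-to-brain-data approach, not the cited
# papers' pipeline. A word-by-word predictor from a parsing model (here an
# invented count of parser operations per word) is regressed against a toy
# "brain signal" aligned to the same words (in reality BOLD or MEG, sampled on
# its own clock and aligned to word onsets).
import numpy as np

parser_load = np.array([1, 3, 2, 5, 2, 4, 1, 2, 3, 1], dtype=float)    # toy values
signal = np.array([0.2, 0.9, 0.6, 1.4, 0.7, 1.1, 0.3, 0.5, 0.8, 0.4])  # toy values

# Ordinary least squares: does the parser-derived predictor explain variance
# in the signal beyond an intercept?
X = np.column_stack([np.ones_like(parser_load), parser_load])
beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
residuals = signal - X @ beta
r_squared = 1 - residuals.var() / signal.var()
print(f"slope for parser load: {beta[1]:.3f}, R^2 = {r_squared:.3f}")
```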