Sunday, February 10, 2013

Another Impossibility Argument

Things that are not possible don’t happen. Sounds innocuous, huh? Perhaps, but that’s the basis of POS (poverty of the stimulus) arguments. If the evidence cannot dictate the outcome of acquisition, and nonetheless the outcome is not random, then there must be something other than the evidence guiding the process. Substitute ‘PLD’ (primary linguistic data) for ‘evidence’ and ‘UG’ for ‘something other than the evidence’ and you get your standard POS argument. There have been and can be legitimate fights about how rich the data/PLD is and how rich UG is, but the form of the argument is dispositive and that’s why it’s so pretty and so useful. A good argument form is worth its weight in theoretical gold.

That this is what I believe is not news for those of you who have been following this blog.  And don’t worry, at least for today, I am not going to present another linguistic POS argument (although I am tempted to generate some new examples so that we can get off of the Yes/No question data).  Rather, what I want to do is publicize another application of the same argument form, the one deployed in Gallistel/King (G/K) in favor of the conclusion that connectionist architectures are biologically impossible.

The argument that they provide is quite simple: connectionism requires too many neurons to code various competences. They dub the problematic fact over which connectionist models necessarily stumble and fall ‘the infinitude of the possible’ (IoP). The problem as they understand it is endemic; computations in a connectionist/neural net architecture (C/NN) cannot be “implemented by compact procedures.” This means that such a C/NN cannot “produce as an output an answer that the maker of the system did not hard wire into the look-up tables” (261). In effect, C/NNs are fancy lists (aka look-up tables) where all possibilities are computed out rather than being implicit in more compact form in a generative procedure. And this leads to a fundamental problem: the brain is just not big enough to house the required C/NNs. Big as brains are, they are still too small to explicitly code all the required possible realizable cognitive states.

G/K’s argument is all in service of arguing that neuroscientists must assume that brains are effectively Turing-von Neumann (TvN) machines with addressable, symbolic, read/write memories. In a nutshell:

 … a critical distinction between procedures implemented by means of look-up tables and … compact procedures …is that the specification of the physical structure of a look-up table requires more information than will ever be extracted by the use of that table.  By contrast, the information required to specify the structure of a mechanism that implements a compact procedure may be hundreds of orders of magnitude less than the information that can be extracted using that mechanism (xi).
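The contrast G/K draw here can be made concrete with a toy illustration (my sketch, not theirs): a look-up table for addition must pre-store an answer for every input pair, while a compact procedure is a finite specification that covers unboundedly many inputs, including ones its maker never anticipated.

```python
# Look-up table for addition: every answer is pre-stored.
# Covering inputs up to N requires ~N^2 explicit entries.
N = 100
lookup_table = {(a, b): a + b for a in range(N) for b in range(N)}

# Compact procedure: a finite specification covering unboundedly many
# inputs, including ones the "maker of the system" never wired in.
def add(a, b):
    return a + b

# The table answers only the questions anticipated ahead of time...
assert lookup_table[(7, 5)] == 12
assert (200, 300) not in lookup_table  # outside the pre-stored range

# ...while the procedure handles novel inputs with no extra storage.
assert add(200, 300) == 500
```

The information needed to specify `add` is constant, while the table grows with the range of inputs it must anticipate; that is the asymmetry the quoted passage describes.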

What’s the argument? Like POS arguments, it starts with a rich description of various animal competences. The three that play starring roles in the book are dead reckoning in ants, bee dancing and food caching behavior in jays. Those of you who like Animal Planet will love these sections. It is literally unbelievable what these bugs and birds can do. Here is a glimpse of jay behavior as reported in G/K (cf. 213-217).

In summer, when times are good, scrub jays collect and store food in different locations for later winter feasting. They cache this food in as many as 10,000 different locations. Doing this involves remembering what they hid, where they hid it, when they hid it, whether they emptied it, if the morsel was tasty, how quickly the morsel goes bad, and who was watching them when they hid it. This is a lot of information. Moreover, it is very specific information, sensitive to six different parameters. Moreover, the values for these parameters are indeterminate and thus the number of possible memories these jays can produce and access is potentially unbounded. Though there is an upper bound on the actual memories stored, the number of potentially storable memories is effectively unbounded (aka infinite). This is the big fact and it has a big implication. In order to store these memories the jays need some sort of template that roughly says ‘stored X at Y at time Z, X goes bad in W days, X is +/- retrieved, X is +/- tasty, storing was +/- observed.’ This template requires a brain/mind that can link variables, value variables, write to memory and retrieve from memory so as to store useful information and access it when necessary. Note, we can treat this template as a large sentence frame, much like ‘X weighs Y pounds’ and, like the latter, there is no upper bound on the number of possible realizations of this frame (e.g. John weighs 200 pounds, Mary weighs 90 pounds, Trigger weighs 1000 pounds, etc.). These templates, combined with the capacity to substitute actual food-type/time/place etc. values for the variables, constitute “compact procedures” for coding the relevant actual information required. Notice how “small” the template is relative to the number of actual instances of such templates (finite specification versus unbounded number of instances).
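As a rough sketch (mine, not G/K's formalism), the caching template can be rendered as a record type with slots for the six variables. The compact specification is tiny; the space of possible instantiations it covers grows multiplicatively with the values each slot can take.

```python
from dataclasses import dataclass

# One finite template: 'stored X at Y at time Z, X goes bad in W days,
# X is +/- retrieved, X is +/- tasty, storing was +/- observed.'
@dataclass
class CacheRecord:
    food: str           # what was hidden (X)
    location: int       # where it was hidden (Y)
    day_cached: int     # when it was hidden (Z)
    decay_days: int     # how quickly the morsel goes bad (W)
    retrieved: bool     # +/- emptied
    tasty: bool         # +/- tasty
    observed: bool      # +/- watched while caching

# Writing to memory amounts to binding values to the template's
# variables; reading amounts to retrieving records by their content.
memory = [CacheRecord("wax worm", 4211, 3, 2, False, True, False),
          CacheRecord("peanut", 17, 1, 90, False, True, True)]

def still_edible(record, today):
    return (not record.retrieved
            and today - record.day_cached < record.decay_days)

# By day 5 the fast-decaying wax worm is gone; the peanut remains.
fresh = [r.food for r in memory if still_edible(r, today=5)]

# A look-up table would instead need a pre-allocated slot for every
# possible combination of values (illustrative counts, not G/K's):
possible = 50 * 10_000 * 100 * 10 * 2 * 2 * 2  # four billion combinations
```

Six bound variables suffice to pick out any of the four billion combinations; that finite-template-versus-unbounded-instances asymmetry is what "compact procedure" amounts to here.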

If this description is correct (G/K review the evidence extensively), here is what is neuronally impossible: to list all the potential instantiations of this kind of proposition and simply choose the ones that are actual. Why?

… the infinitude of the possible looms. There are many possible locations, many possible kinds of food, many possible rates of decay, many possible delays between caching and recovery – and no restrictions on the possible combinations of these possibilities. No architecture with finite resources can cope with this infinitude by allocating resources in advance to every possible combination. (217)

However, current neuroscience takes read/write memory (a necessary feature of a system able to code the above information) to be neurobiologically implausible. Thus, current neuroscience only investigates systems (viz. C/NNs) that cannot in principle handle these kinds of behavioral data. What’s the principled reason for this inadequacy? Computation in C/NNs is not implemented by compact procedures. Rather, C/NNs are effectively elaborate look-up tables and so cannot “output an answer that the maker of the system did not hard wire into one of its look-up tables” (261).

That’s the impossibility argument. If we assume that brains mediate cognition then it cannot be the case that animal brains are C/NN devices.

I strongly recommend reading these sections of G/K’s book. There is terrific detailed discussion of how many neurons it would take to realize a C/NN capable of dead reckoning. By G/K’s estimates (261) it would take all the neurons in an ant’s brain (ants are terrific dead reckoners) to realize a more or less adequate system of dead reckoning.

This is fun stuff, but I am no expert in these matters and though I tend to trust Gallistel when he tells me something, I am in no position to independently verify his calculations regarding the number of neurons required to implement a reasonable C/NN device.[1] However, G/K’s conclusion should resonate with generativists. Grammars are compact procedures for coding what is essentially an infinite number of possible sentences and UG is (part of) a compact procedure for coding what is (at the very least) a very large number of possible Gs.[2] Thus, whatever might be true of other animals, human brains clearly capable of language cannot be C/NNs.[3] Why do I mention this? For two reasons:

First, there is still a large cohort of neuroscientists, psychologists and computationalists who try to analyze linguistic phenomena in C/NN terms. They look at pictures of brains, see interconnected neurons, and conclude that our brains are C/NNs and that our linguistic competence must be analyzed to fit these “data.” G/K argue that this is exactly the wrong conclusion to draw. Note that the IoP is trivially obvious in the domain of language, so the G/K argument is very potent here. And that is cause for celebration, as these same net-enchanted types are also rather grubby empiricists.

G/K discuss the unholy affinities between associationism and C/NN infatuation (cf. chapter 11) (the slogan “what fires together wires together” reveals all). Putting a stake through the C/NN worldview also serves to weaken the empiricist-associationist learning conception of acquisition.[4] I doubt it will kill it. Nothing can, it appears. But perhaps it can intellectually wound it yet again (though only if the G/K material is taken seriously, which I suspect it won’t be, either because the net-nuts won’t get it or because they will simply ignore it). So an attack on C/NNs is also a welcome attack on empiricist/associationist conceptions and that’s always a valuable public service.

Second, this is very good news for the minimalistically inclined. Here’s why. Minimalists are counting on animal brains being similar enough to human brains for there to be features of the former that can be used to explain some of the properties of FL/UG. Recall the conceit: take the cognitive capacities of our ancestors as given and ask what we need to add to get linguistically capable minds/brains. However, were animal brains C/NNs while ours clearly are not (recall how easy it is to launch the IoP considerations in the domain of language), then it is very hard to see how something along these lines could be true. Is it really plausible that the shift to language brought with it an entirely new kind of brain architecture? The question answers itself. So G/K’s conclusions regarding animal brain architectures are very good news.

Note, we can turn this argument around: if, as minimalism requires (and seems independently reasonable), human brains are continuous with non-human ones, and if the IoP requires that brains have TvN architectures, then language in humans provides a strong prima facie argument for TvNs in animals. For the reasons that G/K offer (not unlike those Chomsky originally deployed), human linguistic competence cannot supervene on C/NN systems. And as G/K note, this has profound implications for current work in neuroscience, viz. the bulk of the work is looking in the wrong places for answers that cannot possibly serve. Quite a conclusion, but that’s what a good impossibility argument delivers.

Let me end with one last observation: there is a tendency to think that neuroscientists hold the high intellectual ground and that cognitive science/linguistics must accommodate itself to their conclusions. G/K demonstrate that this is nonsense. Cognition supervenes on brains. If some kind of brain cannot support what we know are the cognitive facts, then this view of the brain must be wrong. Anything else would be old-fashioned dualism. Hmm, wouldn’t it be amusing if, to defend their actual practice, neuroscientists had to endorse hard-core Cartesian dualism? Yet, if G/K are right, that’s what they are effectively doing right now.

[1] Note that the G/K argument is about physical implementation and so is additional (though related) to the arguments presented by Fodor and Pylyshyn or Marcus against connectionism. Not only does C/NN get the cognition wrong, it is also physically impossible given how many neurons it would take to implement a C/NN device to do even a half-assed job. 
[2] If UG is parametric with a finite number of parameters then the number of possible Gs will also be finite. However, it is virtually certain that the number of parameters (if there are parameters) is very high (see here for discussion) and so the space of possibilities is very very large. Of course, if there are no parameters, then there need be no upper bound on the number of possible Gs.
[3] Which does not mean that for some kinds of pattern recognition the brain cannot use C/NNs. The point is that these are not sufficient.
[4] The cognoscenti will notice that I am here carrying on my effort to change our terminology so that we use ‘learning’ to denote one species of acquisition, the species tightly tied to empiricism/associationism.


  1. :-) Hope Christina won’t write a new Potpourri.

    We can’t be sure whether G&K’s arguments against connectionism are valid. They think like physicists. I tried to find some math proof (or disproof) on the Web recently but didn’t succeed. What G&K know for sure is that the Turing machine will do it. I just wonder whether connectionists know if neural nets can do it.

  2. I agree. The discussion on what it would take, however, is pretty detailed and interesting. We heard someone come and argue that the numbers were off, but I was not entirely convinced. But that's for the experts, and among these I have confidence that Gallistel can hold his own.

  3. No need for me to write another potpourri.

    Jerry Katz provided arguments of this type more than 30 years ago and it is encouraging to see that Chomsky and Norbert and others finally agree that we can't have UG [or other architectures] that are biologically impossible. The problem is of course that this was only the first step of the unfinished Chomskyan revolution [Katz, 1996]; now we still wait for the second: minimalism faces impossibility arguments as well, as nicely demonstrated by Paul Postal here: - Unless of course some minimalist finally puts a refutation of these arguments in print...

  4. I also can hardly claim to be an expert on these matters, but I think lumping neural nets and "empiricist" views on language together is oversimplifying too much. There are "connectionists" who openly endorse "nativism", and while I haven't read G/K myself, what I got from this post doesn't seem to speak too strongly against some of their proposals (as far as I understand them). I'm thinking particularly of Paul Smolensky's work --- in particular his take on how recursive neural nets can represent tree structures strikes me as worthy of being discussed explicitly, rather than simply being dismissed in passing (e.g., see the first chapters of his and Geraldine Legendre's "Harmonic Mind, vol. 1").

  5. Yes, there is some discussion of this point on a blog here:
    Piccinini also provides a link to a paper on this from that blog.
    Can't do links in comments, I am afraid.

    1. Thanks for that link; in particular the paper linked to in the article is very interesting. Here are two easy-to-click-on links, one to the article Alex linked to and one directly to the paper by Piccinini:
      blog post by Piccinini
      Some Neural Networks Compute, others don't (pdf)

  6. I want to again emphasize that neural networks by themselves are not inherently limited, and in fact are not in principle hugely different from the computers we're using right now. The primary distinction is in how they come to have the designs they have. You could hand-configure a classical neural network and get out your favorite CPU architecture (I'll take MIPS, thank you very much), or you could train with the fancy training techniques people have looked into over the years. The latter is the only aspect of neural networks that could be limited in the relevant ways.

    I think it's also important to realize that the brain itself might still be one such neural network, provided that the genome can pre-specify some amount of architecture. That is to say, maybe genome + learning algorithms are sufficient. Most neural network designs I know of are rather uninteresting in the initial layout of the neural net. It might instead be that 100% reliance on learning algorithms is insufficient but that the few hundred million years of evolution using _other_ learning techniques like survival of the fittest was sufficient. I don't know.

    1. BTW, I should add, the G/K hypothesis that brains are symbolic computing machines would be 100% true in the scenario I mention, so they could still be right even without going so far as saying we use some sort of DNA-based computing in neurons.

      Also, of course, real neurons are big complex beasts and we know that they do a whole bunch of chemical computation internally, some of which _is_ DNA-related. So I think the picture is really way more complicated than connectionists want to say it is, but I don't think it's so complicated as to make it valid to say the boring model of neurons is insufficient.

  7. There is an interesting methodological point, which Christina alludes to, which is the extent to which we should take our current knowledge of neurobiology as a boundary condition on our models of linguistics.

    G & K put themselves firmly in the 'no' camp, on the grounds that there must be a really important component there that current knowledge is missing.
    Norbert seems to have a more complex position since he just posted very favourably in his 'Grammars and Brains' post on the Poeppel paper which was equally firmly in the 'yes' camp.

    I think it is quite hard for generative grammarians to really buy into the neural realism stuff, given how idealised Merge and the architecture of minimalist competence grammars are as computational systems (i.e. they don't correspond to the real processes of comprehension and production).

  8. 1. In fact, Poeppel is one of those neuroscientists who don't just scan brains. Methodologically, he starts with what is developed at the computational level, thinks of algorithms, and for them he develops neuroscientific theories. He is primarily interested in speech perception, a process where Marr's levels make sense. For the language competence, the levels may be different but there have to be some there, too.

    2. Postal's main point, if I got it right, is that "natural languages (NLs), which like numbers, propositions, etc. are abstract objects, hence things not located in space and time, indeed not located anywhere". (p. 248).

    There's nothing wrong with such a platonic view - provided we have, as J. Peregrin put it, our platonism under control. It's useful in many applications in linguistics (I refuse to say languistics on principle, though it'd function here).

    Postal's "view is that not only is there no such thing as biolinguistics as this term is understood in the work Chomsky has influenced, there cannot be such a thing. The reasons are the same as those precluding there being a biomathematics or a biologic understood in the same way." (ibid.)

    This is absurd, for math is built on (1) the ability to count small quantities, compare amounts etc., which we share with animals (and which surely has a biological base), and (2) human language (which has it too ;). Sure, the number 2, the set of natural numbers, topological spaces etc. are abstractions. Now, of course, we are at a different level. GO TO 1.

  9. 1. It seems that you only understand part of Postal's point about the impossibility of biolinguistics and biomathematics.

      Let's take an analogy that should be uncontroversial. When we do geography we acquire knowledge about the location of geographical objects [rivers, oceans, mountains etc.]; we do not have a geographical faculty that literally generates small rivers, oceans or mountains in our heads and arrives at a 'steady state' once we know, say, the geography of the United States. It is fairly easy to see why such a view would be incoherent: rivers, oceans and mountains are physical objects, nothing 'mental' or 'abstract'.
      Now when it comes to numbers you have a choice:
      [1] you accept that they are abstract objects. Then all of them are no matter how 'small' - so 1, 2, 3 are just as much abstract objects as 4,444,444,444,444,444 is. We can have knowledge about these objects and this knowledge is represented somehow in our brains but abstract objects are not in our brains [or, as you cite above, anywhere in space and time].
      [2] you reject realism about abstract objects and adopt some kind of fictionalism. Again, this applies to all numbers, not just the large ones. That would require, among other things, that animals as well use 'fictions' to perceive small quantities. Maybe they do, but you need some evidence for it...
      [3] You adopt Chomsky's position that numbers and languages are literally products of human brains, part of biology as he never tires of claiming. In that case they would be physical objects just as the mountains and rivers of a putative biogeography...

      All these views may have problems but to me [3] seems the most absurd. For some more details why Chomsky is committed to [3] I recommend the excellent Katz&Postal paper: and also Postal's paper on the foundation of linguistics:

    2. I didn’t say a word of my own about biomathematics. Referring to Postal’s statement that there cannot be ”such thing as biolinguistics” for the same reasons ”precluding there being a biomathematics”, I pointed out that math is (not entirely but to a tremendous extent) built *above* language. Language can exist without math (see Piraha) but not the other way round. Hence the statement is odd at best. Who would insist that there cannot be a biology of oak trees for the same reasons as there is none for oak tables?

  10. I am sorry that I missed where your confusion was. Postal [and Katz as well] makes a claim about ontology [what exists] while you seem to be mostly concerned with epistemology [what we know and how we come to know it]. While sometimes there is a causal connection between ontology and epistemology [the sharp pain I feel when I run into a table lets me know there IS a table in front of me] no such connection needs to exist. In fact, Chomsky has been arguing for decades against this kind of "billiard ball" view of physical causation, when he questions that we know what matter is.

    Now usually Platonism is less controversial for mathematical objects [e.g., numbers, sets] than for language because many people believe that we cannot have languages without humans. You seem to believe we cannot have math without language [which would mean we cannot have math without humans] but again that is a claim about knowledge of math - which probably really is not possible without [knowledge of] language.

    Most mathematicians either explicitly or implicitly accept Platonism because the set of natural numbers is infinite [so of course are the set of real numbers, the set of squares, of even numbers etc. etc.]. If these were physical objects of some kind they would need to exist somewhere in time and space, they could be destroyed etc. etc. This seems implausible. Abstract objects on the other hand do not exist in time and space, cannot be destroyed etc.
    Note these are all ontological claims.

    Now Katz and Postal believe natural languages are objects of the same kind as natural numbers [so in your analogy it would be oak trees vs. maple trees - the same kinds of objects]. One of the reasons they believe this is Chomsky's celebrated insight that there is no 'longest sentence' and that languages consist of infinite sets of sentences [he now talks about expressions but that is just a terminology issue]. Obviously the argument is more complex than my reader's digest version here and I really recommend you read where Katz and Postal go into some detail to defend Platonism. Or if you have access to it, try: Katz, J. (1996). The unfinished Chomskyan revolution. Mind and Language 11, 270-294 - that is really quite brilliant.

    For now keep in mind that math is not built any more *above language* than geography is - the rivers and mountains exist whether we have names for them or not. Just as the numbers exist whether we have names for them or not. Our knowledge of mathematics and geography may depend on knowledge of language but, to use one of Chomsky's favourite intuition pumps: for a Martian scientist knowledge of mathematics could depend on something entirely different from language. I hope this clears up the confusion.

    1. It’s an interesting discussion. But I’m no philosopher so I’m going to make it as simple and concrete as possible.

      Imagine a universe void of observers (e.g. an earlier stage of our universe). Such a universe will presumably have certain topological properties, but there can be no infinite sets, no complex analysis, no general topology, that is, no math - unless, of course, we assume there is an empire of ideas in there.

      What we try to answer is (1) what kind of thing language is, biological or abstract, and (2) whether its bases are the same as those of math. The answer depends on whether you consider language a biological entity or an abstract object. We can do both, as both can be useful. What we have to keep in mind, however, is that the empire of ideas is not a product of evolution; it’s just a useful cognitive construct. So my answers are as follows:

      Ad (1) It’s an empirical fact that language is a product of evolution and that it is as universal and natural in human species as is grasping by hands or bipedal locomotion.

      Ad (2) On the contrary, math is (except for its very rudimentary part) a product of language-based human cognition, which still is, like reading, far from being universal in humans.

  11. The point of Platonism is that abstract objects do not depend on any observer [human or otherwise]. Realists believe these abstract objects exist outside time and space, so they cannot come into existence or be destroyed. They exist in your earlier universe just as they exist now and they will exist once our universe has ceased to exist. Calling it an empire of ideas is a bit misleading because usually we need some intelligent creature who has these ideas. That is not the position Postal and Katz defend, so Chomsky's comment about 'Platonic heaven' was highly misleading.

    Now the position you adopt is a mixture of [F] fictionalism [the empire of ideas is a useful collective construct] and [B] biolinguistics. You also adopt what we call anthropocentrism: seeing the world exclusively from the human perspective.

    The mixture of F and B is common because it seemingly allows you to escape the incoherence Postal points out for B: that a biological organ cannot have an output that contains infinite sets. Many people think that if these sets are just fictions and not part of biology there is no incoherence. Katz and Postal have pointed out problems with that strategy. If there are no abstract objects [in the above sense, because your empire of ideas is generated by human brains and hence ultimately consists of physical or concrete objects], then the type/token distinction collapses. The sentence "Chomsky is a famous linguist" does not exist except in tokens like the one I just typed and you now read. There may be representations of this sentence in your brain and in my brain but these have to be short-lived or our brains would "overflow" with sentences after a very short time. You seem to like imagining counterfactuals: try to imagine how your brain might encode this one sentence. Then what is the likelihood that it is encoded in exactly the same way in my brain? Or in the brains of any of the other billion or so English speakers? If tokens are all there is, what ensures they are all tokens of the same sentence?

    This is of course only one of the problems with the B+F view. The more serious problem is Chomsky's postulation of infinite sets. Chomsky admitted in Science of Language that this is an unresolved problem for his view:
    There are a lot of promissory notes there when you talk about a generative grammar as being based on an operation of Merge that forms sets, and so on and so forth. That’s something metaphorical, and the metaphor has to be spelled out someday...if we want a productive theory-constructive [effort], we’re going to have to relax our stringent criteria and accept things that we know don’t make any sense, and hope that some day somebody will make some sense out of them – like sets. (Chomsky, 2012, p. 91)

    So if someone as brilliant as Chomsky does not know how to make sense of sets in the biolinguistic framework, I am very skeptical that anyone does.

    One last point: Postal is agnostic about evolutionary issues. But he nowhere denies that our ability to acquire knowledge about language is based on our biology [brains]. And since our brains have been shaped by evolution he would not deny that this ability has been shaped by evolution.

    1. The passage that you cite in the Conversations book has, I believe, been misinterpreted. What Chomsky intends is a quite general point about how to understand the attribution of mathematical properties to physical systems. This is no less true for space-time being a 12-dimensional manifold, or your desktop being a Turing machine, or Bohr atoms being miniature solar systems. What a biolinguistically inclined linguist provides is an abstract account of some as yet unknown biological system. It is intended to be a description of this object at the right level of abstraction. Of course, as all know, the details currently elude us, which is what all the fuss about the right implementation for cognitive proposals is all about.

      Is there something particularly troubling about this? Does it make the Chomsky program incoherent? Not at all. We understand how this could be shown to be true. For example, we know how to argue that a given hand calculator instantiates 4 abstract arithmetical operations (addition, subtraction, multiplication, division). Similarly, there is no problem encoding operations like merge in a computer program run on a real live computer. The problem is that we don't know what the analogues of this are in brains. As G/K repeatedly stress, we don't have the analogues of notions like 'address' or 'read from' and 'write to' in neurological terms. Thus, we don't know how to do in wetware what we do in hardware. But this is not only true for language; it is true for virtually everything: bee dances, dead reckoning, object recognition, theory of mind, everything. As such, this is NOT a problem for biolinguistics in particular. It is a problem of figuring out how minds and brains relate, quite obviously a very big research topic. That said, unless you are a dualist (not currently a fashionable position) there is no reason to think that such an inter-field mapping won't one day be possible.

      So, is merge a biological proposal? Yes. Is there evidence in its favor? Yes. Is the evidence largely behavioral? Yes. Does this mean there is something wrong with the biolinguistic position? No. Conceptually, we are in a position analogous to where genetics was before Watson and Crick and where chemistry was before Bohr. This is what Chomsky had in mind.

      BTW, I am pretty sure of this. Why? I asked him.

    2. The review by Gualtiero Piccinini that I mentioned earlier contains the following objection to the Watson-Crick analogy.

      "But the analogy breaks down at a crucial place. Before Watson and Crick (and collaborators) figured out how to investigate the structure of DNA, such structure was beyond the observational power of available experimental techniques. Not so for the nervous system. The basic structure of the nervous system was figured out a century ago, and more and more details about its working have been progressively figured out. Before Watson and Crick, genes could be treated like black boxes because no one knew how to study their molecular structure. But to suggest that today nervous systems should be treated like black boxes is neither here nor there.

      Neuroscience has produced a huge amount of empirical evidence over the last decades. Any serious attempt at a theory of cognition must take this evidence seriously. Unfortunately, Gallistel and King do not. "

    3. I believe that this is very unfair to G&K. They do take the neuroscience very seriously, they just find it wanting. Moreover, from where we sit, the results have shed very little light on what we know to be the behavioral properties of brains. At least in the language domain, there is nothing of interest coming from the neurosciences in terms of mechanisms underlying the observed phenomena. As such, it is time to think anew. Happily, since G&K wrote their book, it seems that the neuroscience community is thinking along the same lines. It is too early to know how this will pan out, but it is interesting to see that looking at molecular computation is no longer the province of whackos. GP seems to me unduly pessimistic. What's he afraid of?