Monday, November 26, 2018

What's innate?

Johan Bolhuis sent me a copy of a recent comment in TiCS (Priors in animal and artificial intelligence (henceforth Priors)) on the utility of rich innate priors in cognition, both in actual animals and artificially in machines. Following Pinker, Priors frames the issue in terms of the blank slate hypothesis (BSH) (tabula rasa for you Latin lovers). It puts the issue as follows (963):

Empiricists and nativists have clashed for centuries in understanding the architecture of the mind: the former as a tabula rasa, and the latter as a system designed prior to experience… The question, summarized in the debate between the nativist Gary Marcus and the pioneer of machine learning, Yann LeCun, is the following: shall we search for a unitary general learning principle able to flexibly adapt to all conditions, including novel ones, or structure artificial minds with driving assumptions, or priors, that orient learning and improve acquisition speed by imposing limiting biases?

Marcus’ paper (here) (whose philosophical framework Priors uses as backdrop for its more particular discussion) relates BSH to the old innateness question, which it contends revolves around “trying to reduce the amount of innate machinery in a given system” (1). I want to discuss this way of putting things, and I will be a bit critical. But before diving in, I want to say that I really enjoyed both papers and I believe that they are very useful additions to the current discussion. They both make excellent points and I agree with almost all of their content. However, I think that the way they frame the relevant issue, in terms of innateness and blank slates, is misleading and concedes too much to the Empiricist (E) side of the debate.

My point will be a simple one: the relevant question is not how much innate machinery, but what kind of innate machinery. As Chomsky and Quine observed a long time ago, everyone who discusses learning and cognition is waste deep in a lot of innate machinery. The reason is that learning without a learning mechanism is impossible. And if one has a learning mechanism in terms of which learning occurs, then that learning mechanism is not itself learned. And if it is not learned, then it is innate. Or, to put this more simply, the mechanism that allows for learning is a precondition for learning, and preconditions are fixed prior to that which they precondition. Hence all features of the learning mechanism are innate in the simple sense of not themselves being learned. This is a simple logical point, and all who discuss these issues are aware of it. So the question is not, never has been, and never could have been is there innate structure? Rather, the question is, always has been, and always will be what structure is innate?

Why is putting things in this way important? Because arguing about the amount of innate structure gives Eists the argumentative edge. Ockham-like considerations will always favor using less machinery rather than more, all things being equal. So putting things as the Marcus paper and Priors do is to say that the Eist position is methodologically preferable to the Rationalist (R) one. Putting things in terms of what kinds of innate machinery are required (to solve a given learning problem), rather than how much, considerably levels the methodological playing field. If both E and R conceptions require boatloads of innate machinery to get anywhere, then the question moves from whether innate structure is needed (as the BSH slyly implicates) to what sort is needed (which is the serious empirical question).

This said, let’s zero in on some specifics. What makes an approach Eist? There are two basic ingredients. The first is associationism (Aism). This is the glue that holds “ideas” together. However, this is not all. There is a second important ingredient: perceptualism (Pism). Pism is the idea that all mental contents are effectively reducible to perceptual contents, which are themselves effectively reducible to sensory contents (sensationalism (Sism)).

This pair of claims lies at the center of Eist theories of mind, and the notion of the blank slate emphasizes the second. We find this reflected in a famous Eish slogan: “there is nothing in the mind that is not first in the senses.” The Eish conception unites P/Sism with Aism to reach the conclusion that all mental contents are either primitive sensory/perceptual “ideas” or constructed out of sensory/perceptual input via association. The problems with Eism arise from both sources and revolve around two claims: the denial that mental contents interrelate other than by association (they have no further interesting logical structure), and the claim that all ideas are congeries of sensory perceptions. These two assumptions combine to provide a strongly environmentalist approach to cognition, wherein the structure of the environment largely shapes the contents of the mind through the probabilistic distributions of sensory/perceptual inputs. Rism denies both claims. It argues that association is not the fundamental conceptual glue that relates mental contents and denies that all complex mental contents are combinations of sensory/perceptual inputs. To wax metaphorical: for Eists, only sensation can write on our mental blank slates, and the greater the sensations, the more vivid the images that appear. Rists think this is empirical bunk.

Note that this combination of cognitive assumptions has a third property. Given Eist assumptions, cognition is general purpose. If cognition is nothing but tracking the frequencies of sensory inputs, then all cognition is of a piece, the only difference being the sensations/perceptions being tracked. There is no modularity or domain specificity beyond that afforded by the different sensory mechanisms, nor rules of “combination” beyond those tracking the differential exposure to some sensations over others. Thus for Eists, the domain generality of cognition is not an additional assumption. It is the consequence of Eism’s two foundational premises.

Now, we actually know today that Eism will not work (actually, we knew this way back when). In particular, Pism/Sism was very thoroughly explored at the turn of the 20th century and shown to be hopeless. There were vigorous attempts to reduce our conceptual contents to sense data. And these efforts completely failed! Pism/Sism, in other words, is a hopeless position. So hopeless, in fact, that the only place it still survives is in AI and certain parts of psychology. Deep Learning (DL), it seems, is the latest incarnation of P/Sism+Aism. Both Priors and Marcus elegantly debunk DL’s inflated pretensions by showing both that the assumptions are biologically untenable and that they are adhered to more in the PR discussions than in the practice of the parade cases meant to illustrate successful AI learners.[1] I refer you to their useful discussions. See especially their excellent points concerning how much actual learning in humans and animals is based on very little input (i.e. a very limited number of examples). DL requires Big Data (BD) to be even remotely plausible. And this data must be quite carefully curated (i.e. supervised) to be of use. Both papers make the obvious point that much biological learning is done from very few example cases (sparse data) and is unsupervised (hence not curated). This makes most of what DLers have “discovered” largely irrelevant as models for biologically plausible theories of cognition. Sadly, the two papers do not come right out and say this, though they hint at it furiously. It seems that the political power of DL is such that frankly saying that this emperor is hardly clothed will not be well rewarded.[2] Hence, though the papers make this point, it is largely done in a way that bends over backwards to emphasize the virtues of DL and not appear critically shrill. IMO, there is a cost to this politeness.

One last point and I stop. Priors makes a cute observation, at least one that I never considered. Eists of the DL and connectionist variety love plasticity. They want flexible minds/brains because these are what the combination of Aism and P/Sism entails. Priors makes the nice observation that if flexibility is understood as plasticity, then plasticity is something that biology values only in small doses. Brains cease being plastic after a shortish critical period. This, Priors notes, implies that there is a biological cost to being relentlessly open minded. You can see why I might positively reverberate to this observation.

Ok, nuff said. The two papers are very good and are shortish as well. Priors is perfect for anyone wanting a non-human case to illustrate Rish themes in a class on language and mind. The Marcus piece is part of a series of excellent papers he has been putting out reviewing the hype behind DL and taking it down several pegs (though, again, I wish he were less charitable). From these papers and the references they cite, it strikes me that the hype that has surrounded DL is starting to wear thin. Call me a hopeless romantic, but maybe when the overheated PR dies down and it becomes clear that the problems the latest round of Eish accounts solved were not the central problems in cognition, we can return to some serious science.

[1] An aside: there is more than a passing similarity between the old attempts to reduce mental contents to sense data and the current fad in DL of trying to understand everything in terms of pixel distributional properties. History seems to constantly repeat; the first time as insight, the second time as a long con. Not surprisingly, the attempt to extract the notion “object” or “cat” from pixel distributions is no more successful today than were prior attempts to squeeze such notions from sense data. Ditto with algebraic structure from associations. It is really useful to appreciate how long we have known that Eism cannot be a serious basis for cognition. The failures Priors and Marcus observe are not new ones, just the same old failures gussied up in technically spiffier garb.

[2] Some influential voices are becoming far more critical. Shalizi (here) notes that much of DL is simply a repackaging of perceptrons (“extracting features from the environment which work in that environment to make a behaviorally-relevant classification or prediction or immediate action”) and will have roughly the same limitations that perceptrons had (viz. “This sort of perception is fast, automatic, and tuned to very, very particular features of the environment… They generalize to more data from their training environment, but not to new environments…”). Shalizi, like Marcus and Priors, locates the problems with these systems in their lack of the “abstract, compositional, combinatorial understanding we (and other animals) show in manipulating our environment, in planning, in social interaction, and in the structure of language.”
            In other words, DL is basically the same old stuff repackaged for the credulous “smart” technophilic shopper. You cannot keep selling perceptrons, so repackage them and sell them as Deep Learning (the ‘deep’ here is, no doubt, the contribution of the marketing department). The fact is that the same stuff that was problematic before is problematic still. There is no way to “abstract” compositional and combinatorial principles and structures out of devices aimed at tracking “particular features of the environment.”
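To make the perceptron limitation concrete, here is a minimal sketch (my own illustration, not from Shalizi or the papers under discussion) of the textbook result: the perceptron learning rule converges on linearly separable data such as AND, but no setting of a perceptron's weights can ever classify XOR, the simplest pattern whose value depends on the combination of inputs rather than on any weighted sum of them.

```python
# Toy sketch (my illustration): the classic perceptron learning rule
# succeeds on linearly separable data (AND) and necessarily fails on XOR.

def train_perceptron(samples, epochs=100):
    """Train a two-input perceptron with a bias; return True if it ever
    classifies every sample correctly (i.e. finds a separating line)."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        errors = 0
        for (x0, x1), target in samples:
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            err = target - pred
            if err:
                errors += 1
                # Standard perceptron update rule.
                w0 += err * x0
                w1 += err * x1
                b += err
        if errors == 0:
            return True  # converged on a separating line
    return False  # no linear separator found

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND))  # True
print(train_perceptron(XOR))  # False
```

The perceptron convergence theorem guarantees success on AND; for XOR no separating line exists, so the error count can never reach zero no matter how long training runs.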


  1. I think that's "waist deep" (unless this is a subtle comment about cleaning the Augean stables here).

  2. Dear Dr Hornstein,
    I am a postdoc working on memory processing in monkeys and a regular reader of your blog. I find the content compelling and an intellectual treat, as I am a complete novice about linguistics but like the approach of Chomsky and aspire to work on the brain mechanisms of language.
    Now coming to your current post, I have two specific queries -
    Firstly, I was wondering about your (and other distinguished contributors') thoughts on what one really means by "LEARNING". As indicated by this article -, learning is defined in various mutually incoherent ways by different sub-disciplines in psychology and affiliated sciences. Do you have a particular definition in mind?
    Secondly, is there a way to connect what you expressed in the post above (regarding how anyone who is a tiny bit serious about explaining complex cognition and, maybe, one day building machines with these abilities has to start with the assumption that there are innate learning mechanisms or priors) to the burgeoning literature on predictive coding and the free energy principle enunciated by Karl Friston, and its psychological and philosophical implications explored by Andy Clark?

  3. The stuff on plasticity reminded me of the controversy over the Baldwin Effect, which is where members of a species initially acquire an ability through experience but are subsequently able to inherit it as instinct, if retaining the plasticity of learning it is more costly than being innately disposed to it.

    As early as 1873, the biologist, Douglas Spalding, was doing experiments on the instincts of chickens (his "little victims of human curiosity") and arguing against the claims of Empiricists that such inheritances are impossible. One interesting line of thought he has is that Eists are effectively Cartesian dualists in disguise:

    "The position of psychologists of the too purely analytical school [i.e. Eists] ... is not that the facts of instinct are inexplicable; but that they are incredible ... and it is held, that all the supposed examples of instinct may be ... nothing more than cases of rapid learning, imitation, or instruction.


    [And] the reason is not far to seek. Educated men, even materialists ... have not yet quite escaped from the habit of regarding [the] mind as independent of bodily organization. Hence it is, that while familiar with the idea of physical peculiarities passing by inheritance from one generation to another, they find it difficult to conceive how anything so impalpable as [e.g.] fear at the sight of a bee should be transmitted in the same way. Obviously, this difficulty is not consistent with a thorough belief in the intimate and invariable dependence of all kinds of mental facts on nervous organization.


    The facts of mind that make up the stream of an individual life differ from material things in [only] this important respect, that whereas the latter can be stored up, volitions, thoughts, and feelings, as such, cannot ... They have to be for ever produced, created, one after another; and when gone they are out of existence. Whatever associations may be formed among these, must depend for their permanence on the corresponding impress given to the nervous organism; and why should not this, which is purely physical, be subject to the law of heredity?"

    To paraphrase, Spalding's point is that, setting aside individual experiences, any persistent mental content that we acquire through learning must, because of its persistence, correspond to some physical property of the nervous system and, being physical, it should be as accessible to evolutionary inheritance as any other bodily phenomenon. To deny this, he says, is to deny that the mind can be wholly reduced to the brain.

    1. Spalding's statement seems to me right now to be the best and clearest on this point that I've ever encountered.

  4. Is it always the case that Ockham-like considerations will favor an Empiricist theory? I agree that the framing of 'what is the structure that is innate' is better than 'is there innate structure', but I just wonder if you've conceded too much in saying that Ockham-like considerations will always favor an Empiricist account. In chapter 10, Gallistel & King (2010) make the argument that "[t]here is little appreciation of the radical nativism implicit in [connectionism]: the necessity of providing structurally in advance for every possible behaviorally relevant state of the world" (p. 177). The argument was based on what they call the "infinitude of the possible" and the fact that connectionist networks don't have a read/write memory. I'm sure I won't be able to do justice to the argument in trying to reproduce the reasoning (so readers of FoL should read it themselves if they're interested (the argument draws heavily on chapter 8)), but, briefly, the argument was that because you don't have read/write memory in a connectionist network, the only way you can encode a value for use in a future computation is via state memory, and if you have to encode a lot of things via state memory, you're quickly going to need an enormous finite-state automaton.
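    Gallistel & King's blow-up can be put in back-of-the-envelope form (my own toy arithmetic; the function names are invented for exposition): a device with read/write memory stores k values of n bits each in n·k bits, while a pure finite-state device must dedicate a distinct machine state to every combination of values it might later need to recall, giving (2^n)^k states.

```python
# Toy arithmetic (my illustration): the cost of remembering k independent
# n-bit values with vs. without read/write memory.

def states_needed_state_memory(n_bits_per_value, n_values):
    """Without read/write memory, a finite-state device needs a distinct
    state for each combination of values it may have to recall later,
    so the state count grows exponentially in the number of values."""
    return (2 ** n_bits_per_value) ** n_values

def bits_needed_rw_memory(n_bits_per_value, n_values):
    """With addressable read/write memory, each value occupies a register,
    so the cost grows linearly in the number of values."""
    return n_bits_per_value * n_values

# Remembering eight 16-bit values:
print(bits_needed_rw_memory(16, 8))       # 128 bits
print(states_needed_state_memory(16, 8))  # 2**128 states: the "infinitude of the possible"
```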

    Anyway, not that I have any idea how you would quantify relative "amounts" of innate machinery, but, if Gallistel & King are right, this seems like one Empiricist theory where Ockham's razor plausibly cuts the other way (depending, of course, on what the alternative Rationalist theory is).