Johan Bolhuis sent me a copy of a recent comment in TiCS (Priors in animal and artificial intelligence (henceforth Priors)) on the utility of rich innate priors in cognition, both in actual animals and artificially in machines. Following Pinker, Priors frames the issue in terms of the blank slate hypothesis (BSH) (tabula rasa for you Latin lovers). It puts the issue as follows (963):
Empiricists and nativists have clashed for centuries in understanding the architecture of the mind: the former as a tabula rasa, and the latter as a system designed prior to experience…The question, summarized in the debate between the nativist Gary Marcus and the pioneer of machine learning, Yann LeCun, is the following: shall we search for a unitary general learning principle able to flexibly adapt to all conditions, including novel ones, or structure artificial minds with driving assumptions, or priors, that orient learning and improve acquisition speed by imposing limiting biases?
Marcus’ paper (here) (whose philosophical framework Priors uses as backdrop for its more particular discussion) relates BSH to the old innateness question, which it contends revolves around “trying to reduce the amount of innate machinery in a given system” (1). I want to discuss this way of putting things, and I will be a bit critical. But before diving in, I want to say that I really enjoyed both papers and I believe that they are very useful additions to the current discussion. They both make excellent points and I agree with almost all their content. However, I think that the way they frame the relevant issue, in terms of innateness and blank slates, is misleading and concedes too much to the Empiricist (E) side of the debate.
My point will be a simple one: the relevant question is not how much innate machinery, but what kind of innate machinery. As Chomsky and Quine observed a long time ago, everyone who discusses learning and cognition is waist deep in a lot of innate machinery. The reason is that learning without a learning mechanism is impossible. And if one has a learning mechanism in terms of which learning occurs, then that learning mechanism is not itself learned. And if it is not learned then it is innate. Or, to put this more simply, the mechanism that allows for learning is a precondition for learning, and preconditions are fixed prior to that which they precondition. Hence all features of the learning mechanism are innate in the simple sense of not themselves being learned. This is a simple logical point, and all who discuss these issues are aware of it. So the question is not, never has been, and never could have been: is there innate structure? Rather, the question is, always has been, and always will be: what structure is innate?
Why is putting things in this way important? Because arguing about the amount of innate structure gives Eists the argumentative edge. Ockham-like considerations will always favor using less machinery rather than more, all things being equal. So putting things as the Marcus paper and Priors do is to say that the Eist position is methodologically preferable to the Rationalist (R) one. Putting things in terms of what kinds of innate machinery are required (to solve a given learning problem), rather than how much, considerably levels the methodological playing field. If both E and R conceptions require boatloads of innate machinery to get anywhere, then the question moves from whether innate structure is needed (as the BSH slyly implicates) to what sort is needed (which is the serious empirical question).
This said, let’s zero in on some specifics. What makes an approach Eist? There are two basic ingredients. The first important ingredient is associationism (Aism). This is the glue that holds “ideas” together. However, this is not all. There is a second important ingredient: perceptualism (Pism). Pism is the idea that all mental contents are effectively reducible to perceptual contents, which are themselves effectively reducible to sensory concepts (sensationalism (Sism)).
This pair of claims lies at the center of Eist theories of mind. And the notion of the blank slate emphasizes the second. We find this reflected in a famous Eish slogan: “there is nothing in the mind that is not first in the senses.” The Eish conception unites P/Sism with Aism to get to the conclusion that all mental contents are either primitive sensory/perceptual “ideas” or constructed out of sensory/perceptual input via association. The problems with Eism arise from both sources and revolve around two claims: the denial that mental concepts interrelate other than by association (they have no further interesting logical structure) and that all ideas are congeries of sensory perceptions. These two assumptions combine to provide a strong environmentalist approach to cognition wherein the structure of the environment largely shapes the contents of the mind through the probabilistic distributions of sensory/perceptual inputs. Rism denies both claims. It argues that association is not the fundamental conceptual glue that relates mental contents and denies that all complex mental contents are combinations of sensory/perceptual inputs. To wax metaphorical, for Eists, only sensation can write on our mental blank slates and the greater the sensations the more vivid the images that appear. Rists think this is empirical bunk.
Note that this combination of cognitive assumptions has a third property. Given Eist assumptions, cognition is general purpose. If cognition is nothing but tracking the frequencies of sensory inputs then all cognition is of a piece, the only difference being the sensations/perceptions being tracked. There is no modularity or domain specificity beyond that afforded by the different sensory mechanisms, nor rules of “combination” beyond those tracking the differential exposure to some sensations over others. Thus for Eists, the domain generality of cognition is not an additional assumption. It is the consequence of Eism’s two foundational premises.
Now, we actually know today that Eism will not work (actually, we knew this way back when). In particular, Pism/Sism was very thoroughly explored at the turn of the 20th century and shown to be hopeless. There were vigorous attempts to reduce our conceptual contents to sense data. And these efforts completely failed! Pism/Sism, in other words, is a hopeless position. So hopeless, in fact, that the only place it still survives is in AI and certain parts of psychology. Deep Learning (DL), it seems, is the latest incarnation of P/Sism+Aism. Both Priors and Marcus elegantly debunk DL’s inflated pretensions by showing both that the assumptions are biologically untenable and that they are adhered to more in the PR discussions than in the practice of the parade cases meant to illustrate successful AI learners. I refer you to their useful discussions. See especially their excellent points concerning how much actual learning in humans and animals is based on very little input (i.e. from a very limited number of examples). DL requires Big Data (BD) to be even remotely plausible. And this data must be quite carefully curated (i.e. supervised) to be of use. Both papers make the obvious point that much biological learning is done from very few example cases (sparse data) and is unsupervised (hence not curated). This makes most of what DLers have “discovered” largely irrelevant as models for biologically plausible theories of cognition. Sadly, the two papers do not come right out and say this, though they hint at it furiously. It seems that the political power of DL is such that frankly saying that this emperor is hardly clothed will not be well rewarded. Hence, though the papers make this point, it is largely done in a way that bends over backwards to emphasize the virtues of DL and not appear to be critically shrill. IMO, there is a cost to this politeness.
One last point and I stop. Priors makes a cute observation, at least one that I never considered. Eists of the DL and connectionist variety love plasticity. They want flexible minds/brains because these are what the combination of Aism and P/Sism entails. Priors makes the nice observation that if flexibility is understood as plasticity then plasticity is something that biology only values in small doses. Brains cease being plastic after a shortish critical period. This, Priors notes, implies that there is a biological cost to being relentlessly open minded. You can see why I might positively reverberate to this observation.
Ok, nuff said. The two papers are very good and are shortish as well. Priors is perfect for anyone wanting to have a non-human case to illustrate Rish themes in a class on language and mind. The Marcus piece is part of a series of excellent papers he has been putting out reviewing the hype behind DL and taking it down several pegs (though, again, I wish he were less charitable). From these papers and the references they cite, it strikes me that the hype that has surrounded DL is starting to wear thin. Call me a hopeless romantic, but maybe when the overheated PR dies down and it becomes clear that the problems the latest round of Eish accounts solved were not the central problems in cognition, we can return to some serious science.
An aside: there is more than a passing similarity between the old attempts to reduce mental contents to sense data and the current fad in DL of trying to understand everything in terms of pixel distributional properties. History seems to constantly repeat; the first time as insight, the second time as a long con. Not surprisingly, the attempt to extract the notion “object” or “cat” from pixel distributions is no more successful today than were prior attempts to squeeze such notions from sense data. Ditto with algebraic structure from associations. It is really useful to appreciate how long we have known that Eism cannot be a serious basis for cognition. The failures Priors and Marcus observe are not new ones, just the same old failures gussied up in technically spiffier garb.
Some influential voices are becoming far more critical. Shalizi (here) notes that much of DL is simply a repackaging of perceptrons (“extracting features from the environment which work in that environment to make a behaviorally-relevant classification or prediction or immediate action”) and will have roughly the same limitations that perceptrons had (viz. “This sort of perception is fast, automatic, and tuned to very, very particular features of the environment… They generalize to more data from their training environment, but not to new environments…”). Shalizi, like Marcus and Priors, locates the problems with these systems in their lack of “abstract, compositional, combinatorial understanding we (and other animals) show in manipulating our environment, in planning, in social interaction, and in the structure of language.”
In other words, DL is basically the same old stuff repackaged for the credulous “smart” technophilic shopper. You cannot keep selling perceptrons, so you repackage them and sell them as Deep Learning (the ‘deep’ here is, no doubt, the contribution of the marketing department). The fact is that the same stuff that was problematic before is problematic still. There is no way to “abstract” out compositional and combinatorial principles and structures from devices aimed to track “particular features of the environment.”