An “unknown” commentator left links to two very interesting Gary Marcus (GM) pieces (here1 and here2) on the current state of Deep Learning (DL) research. His two pieces make the points that I tried to make in a previous post (here), but do so much more efficiently and insightfully than I did. They are MUCH better. I strongly recommend that you take a look if you are interested in the topics.
Here are, FWIW, a couple of reactions to the excellent discussion these papers provide.
Consider first here1.
1. GM observes that the main critiques of DL contend not that DL is useless or uninteresting, but (i) that it leaves out a lot if one’s research interests lie with biological cognition, and (ii) that what DL leaves out is precisely what theories promoting symbolic computation have always focused on. In other words, what is up for grabs is the idea that DL suffices as a framework for serious cognition, not whether it is necessary. Recall that Rs are comfortable with the kinds of mechanisms DLers favor. The E mistake is to think that this is all there is. It isn’t. As GM puts it (here1:4): DL is “not a universal…solvent, but simply…one tool among many…”
I am tempted to go a bit farther (something that Lake et al. (see here) moot as well). I suspect that if one’s goal is to understand cognitive processes, then DL will play a decidedly secondary explanatory role. The hard problem is figuring out the right representational format (the kinds of generalizations it licenses and the categorizations it encourages). Once these are fixed, DL can work its magic. Without them, DL will be relatively idle. These facts can be obscured by DLers who do not seem to appreciate the kinds of Rish debts their own programs actually incur (a point that GM makes eloquently in here2). However, as we all know, a truly blank slate generalizes not at all. We all need built-ins to do anything. The only relevant questions are which ones and how much, not whether. DLers (almost always of an Eish persuasion) seem to have a hard time understanding this, or drawing the appropriate conclusions from this uncontentious fact.
2. GM makes clear (here1:5) in what sense DL is bad at hierarchy. The piece contrasts “feature-wise hierarchy” with systems that “can make explicit reference to the parts of larger wholes.” GM describes the former as a species of “hierarchical feature detection; you build lines out of pixels, letters out of lines, words out of letters and so forth.” DL is very good at this (GM: “the best ever”). But it cannot do the second at all well, and the second is the kind of hierarchy we need to describe, say, linguistic objects with constituents that are computationally active. Note that what GM calls “hierarchical feature detection” corresponds quite well with the kind of discovery procedures that earlier structuralism advocated and whose limitations Chomsky exposed over 60 years ago. As GM notes, pure DL does not handle at all well the kinds of structures GGers regularly make use of to explain the simplest linguistic facts. Moreover, DL fails for roughly the reasons that Chomsky originally laid out: it does not appreciate the particular computational challenges that constituency highlights.
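To make the second kind of hierarchy concrete, here is a toy sketch of my own (not from GM’s papers): a constituent tree in which parts of larger wholes are units the system can explicitly refer to and operate on, rather than merely co-occurring features. All class and method names here are illustrative inventions.

```python
# A constituent is a labeled whole built out of smaller constituents.
# Crucially, each part can be picked out and computed over AS A UNIT,
# which is what "explicit reference to parts of larger wholes" demands.

class Constituent:
    def __init__(self, label, children=None, word=None):
        self.label = label               # e.g. "NP", "VP", "S"
        self.children = children or []   # sub-constituents (parts)
        self.word = word                 # terminal string, if a leaf

    def yield_words(self):
        """The string this constituent dominates, computed from its parts."""
        if self.word is not None:
            return [self.word]
        words = []
        for child in self.children:
            words.extend(child.yield_words())
        return words

# "the dog barked": the subject NP is a part the grammar can refer to.
np = Constituent("NP", [Constituent("D", word="the"),
                        Constituent("N", word="dog")])
vp = Constituent("VP", [Constituent("V", word="barked")])
s = Constituent("S", [np, vp])

print(np.yield_words())  # the NP behaves as a single computational unit
print(s.yield_words())   # the whole is composed from its constituents
```

Feature detection alone gives you no analogue of `np` here: no object that *is* “the dog” as a unit, available for movement, binding, or agreement computations.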
3. GM has a very nice discussion of where and how exactly DL fails. It relates to “extrapolation” (see the discussion of question 9, 10ff). And why? Because DL networks “don’t have a way of incorporating prior knowledge” that involves “operations over variables.” For these kinds of “extrapolations” we need standard symbolic representations, and this is something that DL eschews (for typically anti-nativist/rationalist motives). So DL systems fail to do what humans find trivially easy (viz. to “learn from examples the function you want and extrapolate it”). Can one build DL systems that employ operations over variables? GM notes that one can. But in doing so they will not be pure DL devices; they will have to allow for symbolic computations and the innate (i.e. given) principles and operations that DLers regularly deny are needed.
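The extrapolation point can be made with a deliberately simple sketch (my own illustration, not GM’s code): a memory-based learner that generalizes by similarity to stored training examples gets stuck at the edge of its experience, while a rule stated over a variable (“for any x, return x”) extrapolates trivially. The function names are hypothetical.

```python
# Contrast two ways of "learning" the identity function f(x) = x.

def nearest_neighbor_learner(training_pairs):
    """Generalize by similarity: copy the output of the closest stored input.
    This stands in (crudely) for interpolation-style generalization."""
    def predict(x):
        nearest_x, nearest_y = min(training_pairs,
                                   key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return predict

def symbolic_rule(x):
    """An operation over a variable: f(x) = x, for ANY x whatsoever."""
    return x

# Both are "trained" on the identity function over a small range.
training = [(x, x) for x in range(10)]   # examples of f(x) = x, x in 0..9
memorizer = nearest_neighbor_learner(training)

print(memorizer(5))         # correct inside the training range
print(memorizer(1000))      # answers 9: stuck at the boundary of experience
print(symbolic_rule(1000))  # the variable-based rule extrapolates freely
```

Real networks generalize far more gracefully than a nearest-neighbor lookup, of course; the sketch only dramatizes the shape of GM’s claim about extrapolation beyond the training distribution.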
4. GM’s second paper also makes for very useful reading. It specifically discusses the AlphaGo programs recently in the news for doing for Go what other programs did for chess (beating the human champions). GM asks whether the success of these programs supports the anti-R conclusions that their makers have bruited about. The short answer is ‘NO!’. The reason, as GM shows, is that there is lots of specialized pre-packaged machinery that allows these programs to succeed. In other words, they are elbow-deep in very specific “innate” architectural assumptions without which the programs would not function.
Nor should this be surprising, for it is precisely what one should expect. The discussion is very good, and anyone interested in a good short discussion of innateness and why it is important should take a look.
5. One point struck me as particularly useful. If what GM says is right, then it appears that the non-nativist Es don’t really understand what their own machines are doing. If GM is right, then they don’t seem to see how to approach the E/R debate because they have no idea what the debate is about. The issue is not whether machines can cognize. The issue is what needs to be in a machine that cognizes. I have a glimmer of a suspicion that DLers (and maybe other Eish AIers) confuse two different questions: (a) Is cognition mechanizable (i.e., does cognition require a kind of mentalistic vitalism)? versus (b) What goes into a cognitively capable mind: how rasa can a cognitively competent tabula be?
These are two very different questions. The first takes mentalism to be opposed to physicalism, the suggestion being that mental life requires something above and beyond the standard computational apparatus to explain how we cognize as we do. The second is a question within physicalism and asks how much “innate” (i.e. given) knowledge is required to get a computational system to cognize as we do. The E answer to the second question is that not much given structure is needed. The Rs beg to differ. However, Rs are not committed to operations and mechanisms that transcend the standard-variety computational mechanisms we are all familiar with. No ghosts or special mental stuff required. If DLers do indeed confuse these two questions, that would explain why they consider whatever program they produce, no matter how jam-packed with specialized “given” structures (of the kind GM notes in AlphaGo), as justifying Eism. But as this is not what the debate is about, the conclusion is a non sequitur. AlphaGo is very Rish precisely because it is very non-rasa tabularly.
To end: These two pieces are very good and important. DL has been massively oversold. We need papers that keep yelling about how little cloth surrounds the emperor. If your interests are in human (or even animal) cognition, then DL cannot be the whole answer. Indeed, it may not even be much of the answer, or its most important part. But for now, if we can get it agreed that DL requires serious supplementation to get off the ground, that will be a good result. GM’s papers are very good at getting us to this conclusion.
I should add that there are serious mental mysteries that we don’t know how to account for computationally. Chomsky describes these as the free use of our capacities, and Fodor discusses them under the heading of central systems. We have no decent handle on how we freely exercise our capacities or how complex judgments work. These are mysteries, but they are not what the E/R debate is mostly about.