Wednesday, January 24, 2018

Gary Marcus on deep learning

An “unknown” commentator left links to two very interesting Gary Marcus (GM) pieces (here1 and here2) on the current state of Deep Learning (DL) research. His two pieces make the points that I tried to make in a previous post (here), but do so much more efficiently and insightfully than I did. They are MUCH better. I strongly recommend that you take a look if you are interested in the topics.

Here are, FWIW, a couple of reactions to the excellent discussion these papers provide.

Consider first here1.

1. GM observes that the main critiques of DL contend not that DL is useless or uninteresting, but (i) that it leaves out a lot if one’s research interests lie with biological cognition, and (ii) that the part that DL leaves out is precisely what theories promoting symbolic computation have always focused on. In other words, what is up for grabs is whether DL suffices as a framework for serious cognition, not whether it is necessary. Recall, Rationalists (Rs) are comfortable with the kinds of mechanisms DLers favor. The Empiricist (E) mistake is to think that this is all there is. It isn’t. As GM puts it (here1:4): DL is “not a universal…solvent, but simply…one tool among many…”

I am tempted to go a bit farther (something that Lake et al. (see here) moot as well). I suspect that if one’s goal is to understand cognitive processes, then DL will play a decidedly secondary explanatory role. The hard problem is figuring out the right representational format (the kinds of generalizations it licenses and categorizations it encourages). Once these are fixed, DL can work its magic. Without them, DL will be relatively idle. These facts can be obscured by DLers who do not seem to appreciate the kinds of Rish debts their own programs actually incur (a point that GM makes eloquently in here2). However, as we all know, a truly blank slate generalizes not at all. We all need built-ins to do anything. The only relevant question is which ones and how much, not whether. DLers (almost always of an Eish persuasion) seem to have a hard time understanding this or drawing the appropriate conclusions from this uncontentious fact.

2. GM makes clear (here1:5) in what sense DL is bad at hierarchy. The piece contrasts “feature-wise hierarchy” with systems that “can make explicit reference to the parts of larger wholes.” GM describes the former as a species of “hierarchical feature detection; you build lines out of pixels, letters out of lines, words out of letters and so forth.” DL is very good at this (GM: “the best ever”). But it cannot do the second at all well, and that is the kind of hierarchy we need to describe, say, linguistic objects with constituents that are computationally active. Note that what GM calls “hierarchical feature detection” corresponds quite well with the kind of discovery procedures that earlier structuralism advocated and whose limitations Chomsky exposed over 60 years ago. As GM notes, pure DL does not handle at all well the kinds of structures GGers regularly make use of to explain the simplest linguistic facts. Moreover, DL fails for roughly the reasons that Chomsky originally laid out; it does not appreciate the particular computational challenges that constituency highlights.
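To make "constituents that are computationally active" a bit more concrete, here is a toy Python sketch. It is my own illustration, not something taken from GM's papers: it uses the textbook auxiliary-fronting example, and the `Node` class, the labels, and both function names are hypothetical. One rule sees only a flat string of words; the other can refer to a part of the larger whole (the Aux that is an immediate daughter of S).

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """A constituent: a label plus an ordered list of sub-constituents or words."""
    label: str
    children: List[Union["Node", str]]


def words(node: Union[Node, str]) -> List[str]:
    """Flatten a constituent into its left-to-right sequence of words."""
    if isinstance(node, str):
        return [node]
    out: List[str] = []
    for child in node.children:
        out.extend(words(child))
    return out


# "the man who is tall is happy", with the relative clause inside the subject NP.
sentence = Node("S", [
    Node("NP", ["the", "man",
                Node("RelClause", ["who", Node("Aux", ["is"]), "tall"])]),
    Node("Aux", ["is"]),
    Node("Pred", ["happy"]),
])


def front_first_aux_linear(tokens: List[str]) -> List[str]:
    """Structure-blind rule: move the linearly first 'is' to the front."""
    i = tokens.index("is")
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def front_main_aux_structural(s: Node) -> List[str]:
    """Structure-dependent rule: move the Aux that is a daughter of S."""
    aux = next(c for c in s.children
               if isinstance(c, Node) and c.label == "Aux")
    out = words(aux)
    for child in s.children:
        if child is not aux:
            out.extend(words(child))
    return out


print(" ".join(front_first_aux_linear(words(sentence))))
print(" ".join(front_main_aux_structural(sentence)))
```

The flat rule produces the ill-formed "is the man who tall is happy", while the structural rule produces "is the man who is tall happy". The only difference is that the second rule is allowed to mention a constituent by name.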

3. GM has a very nice discussion of exactly where and how DL systems fail. It relates to “extrapolation” (see discussion of question 9, 10ff). And why? Because DL networks “don’t have a way of incorporating prior knowledge” that involves “operations over variables.” For these kinds of “extrapolations” we need standard symbolic representations, and this is something that DL eschews (typically for anti-nativist/anti-rationalist motives). So they fail to do what humans find trivially easy (viz. to “learn from examples the function you want and extrapolate it”). Can one build DL systems that employ operations over variables? GM notes that one can. But in doing so they will not be pure DL devices and will have to allow for symbolic computations and the innate (i.e. given) principles and operations that DLers regularly deny are needed.
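The extrapolation point can be sketched with a deliberately tiny Python example. To be clear about the assumptions: this is my own toy, not GM's experiment and not a neural network. The "feature-wise" learner below is a hypothetical stand-in that learns each output position separately from the input values it has seen, and all names (`fit_per_position`, `predict_per_position`, `identity_rule`) are made up for illustration. The task is the identity function on 4-bit numbers, trained only on even numbers.

```python
from collections import Counter

WIDTH = 4


def to_bits(n, width=WIDTH):
    """Encode n as a tuple of bits, most significant bit first."""
    return tuple((n >> i) & 1 for i in reversed(range(width)))


def fit_per_position(pairs):
    """Toy learner: for each bit position, memorize input-bit -> output-bit,
    plus the most common output bit as a fallback for unseen input values."""
    table = [dict() for _ in range(WIDTH)]
    fallback = []
    for pos in range(WIDTH):
        for x, y in pairs:
            table[pos][x[pos]] = y[pos]
        fallback.append(Counter(y[pos] for _, y in pairs).most_common(1)[0][0])
    return table, fallback


def predict_per_position(model, x):
    """Predict each output bit from what was seen at that position in training."""
    table, fallback = model
    return tuple(table[pos].get(x[pos], fallback[pos]) for pos in range(WIDTH))


def identity_rule(x):
    """An operation over a variable: f(x) = x, whatever value x happens to take."""
    return x


# Training data: the identity function, but only on even numbers,
# so the last bit is always 0 in both input and output.
train = [(to_bits(n), to_bits(n)) for n in range(0, 16, 2)]
model = fit_per_position(train)

seven = to_bits(7)
print(predict_per_position(model, seven))  # last bit comes out as 0
print(identity_rule(seven))                # last bit is preserved
```

Inside the training space the toy learner is perfect. On 7 it returns the bits for 6, because it has never seen a 1 in the last position and has nothing but its training statistics to fall back on. The rule stated over a variable extrapolates for free, which is the sense of “operations over variables” at issue.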

4. GM’s second paper also makes for very useful reading. It specifically discusses the AlphaGo programs recently in the news for doing for Go what other programs did for chess (beating the human champions). GM asks whether the success of these programs supports the anti-R conclusions that their makers have bruited about. The short answer is “NO!”. The reason, as GM shows, is that there is lots of specialized pre-packaged machinery that allows these programs to succeed. In other words, they are elbow deep in very specific “innate” architectural assumptions without which the programs would not function.

Nor should this be surprising, for this is precisely what one should expect. The discussion is very good, and anyone interested in a good short discussion of innateness and why it is important should take a look.

5. One point struck me as particularly useful. If what GM says is right, then it appears that the non-nativist Es don’t really understand what their own machines are doing. They don’t seem to see how to approach the E/R debate because they have no idea what the debate is about. The issue is not whether machines can cognize. The issue is what needs to be in a machine that cognizes. I have a glimmer of a suspicion that DLers (and maybe other Eish AIers) confuse two different questions: (a) Is cognition mechanizable (i.e. does cognition require a kind of mentalistic vitalism)? versus (b) What goes into a cognitively capable mind: how rasa can a cognitively competent tabula be?
These are two very different questions. The first takes mentalism to be opposed to physicalism, the suggestion being that mental life requires something above and beyond the standard computational apparatus to explain how we cognize as we do. The second is a question within physicalism and asks how much “innate” (i.e. given) knowledge is required to get a computational system to cognize as we do. The E answer to the second question is that not much given structure is needed. The Rs beg to differ. However, Rs are not committed to operations and mechanisms that transcend the standard-variety computational mechanisms we are all familiar with. No ghosts or special mental stuff required. If indeed DLers confuse these two questions, then it explains why they consider whatever program they produce (no matter how jam-packed with specialized “given” structures of the kind that GM notes in AlphaGo) as justifying Eism. But as this is not what the debate is about, the conclusion is a non sequitur. AlphaGo is very Rish precisely because it is very non rasa tabularly.[1]

To end: These two pieces are very good and important. DL has been massively oversold. We need papers that keep yelling about how little cloth surrounds the emperor. If your interests are in human (or even animal) cognition, then DL cannot be the whole answer. Indeed, it may not even be much of the answer, let alone the most important part. But for now, if we can get it agreed that DL requires serious supplementation to get off the ground, that will be a good result. GM’s papers are very good at getting us to this conclusion.

[1] I should add that there are serious mental mysteries that we don’t know how to account for computationally. These are what Chomsky describes as the free use of our capacities and what Fodor discusses under the heading of central systems. We have no decent handle on how we freely exercise our capacities or how complex judgments work. These are mysteries, but these mysteries are not what the E/R debate is mostly about.


  1. What are the big success stories of AI based on an R-ish approach? What is a good positive example to contrast with the Deepmind type of approach?

    1. Actually that is poorly phrased since GM is arguing that the Go programs are very R-ish.

      GM says "An alternative approach might start from the top down, examining properties of (e.g.) adult or
      child cognitive systems, seeking clues as to what might be innate (based on behavioral grounds),
      or conceptually necessary."

      What I am asking is what are the good or best examples of this approach actually working on some actual task?

    2. My understanding is that GM himself is pursuing such a combined cog-AI strategy, but I don't have many details. Moreover, given the techno interests of most AI (they want to build usable systems), it is unclear why the cog issue should interest them. This has been true, IMO, since the inception of AI, where narrow technical concerns have tended to dominate. This is what Dresher and I argued long ago in our critique of early AI that was in Cognition. I think that things have gotten yet more product oriented. And if that is your interest, then why build a system up from GENERAL principles rather than build in what you need for the specific task at hand? That is what GM argues AlphaGo does. That is fine, but it precludes arguing that the success of AlphaGo implies anything about the Eish nature of DL systems. As you note, GM's argument is that it is very Rish, with lots of GOish specifications built into the system.

      Does this mean that GM's proposal is untenable as a research program in neuro-cog? No, I don't think so. This is what Lake et al. were suggesting, and I am pretty sure that most of the very Rish people I know would have no problem with DL as part of a larger system that includes symbol manipulations with lots of special properties. The Deep Mind people suggested something like this earlier this year and I think I blogged about it. That said, I know of no DL-plus systems, given the lack of interest that DLers seem to have in the cog questions unsullied by applications.

    3. @Alex I guess it would depend on how you define AI.

      Optical Character Recognition, I think, tends to use R-ish methods; ditto computer algebra systems.

      This might be a stretch, but mathematical models of perspective date back at least to the Renaissance, and I would say they have an R-ish flavour. They certainly don't depend on neural nets or deep learning.

    5. Does MNIST count as OCR?

      I think GM is right to push on the fact that DL has been overhyped at the moment.
      We are starting to understand some of the limitations, and see how some of the success is using some low level features that are different to what humans use.

      But what are the worked out alternatives? GM is pushing for some integration of symbolic GOFAI techniques with statistical learning. I don't have a feel for what that would look like.

  2. > An “unknown” commentator

    That looks mysterious ;). I did not have a Blogger profile, but I have now configured a display name.
