An “unknown” commentator left links to two very interesting Gary
Marcus (GM) pieces (here1
and here2) on the current state
of Deep Learning (DL) research. His two pieces make the points that I tried to
make in a previous post (here),
but do so much more efficiently and insightfully than I did. They are MUCH
better. I strongly recommend that you take a look if you are interested in the
topics.
Here are, FWIW, a couple of reactions to the excellent
discussion these papers provide.
Consider first here1.
1. GM observes that the main critiques of DL contend not
that DL is useless or uninteresting, but (i) that it leaves out a lot if one’s
research interests lie with biological cognition, and (ii) that the part that
DL leaves out is precisely what theories promoting symbolic computation have
always focused on. In other words, the idea that DL suffices as a framework for serious cognition is what is up for
grabs not whether it is necessary. Recall, Rs are comfortable with the kinds of
mechanisms DLers favor. The E mistake is to think that this is all there is. It isn’t. As GM puts it
(here1:4): DL is “not a universal…solvent, but simply…one tool among many…”
I am tempted to go a bit farther (something that Lake et al. (see here)
moot as well). I suspect that if one’s goal is to understand cognitive
processes then DL will play a decidedly secondary explanatory role. The hard
problem is figuring out the right representational format (the kinds of
generalizations it licenses and categorizations it encourages). These fixed, DL
can work its magic. Without these, DL will be relatively idle. These facts can
be obscured by DLers who do not seem to appreciate the kinds of Rish debts
their own programs actually incur (a point that GM makes eloquently in here2).
However, as we all know a truly blank slate generalizes not at all. We all need
built-ins to do anything. The only relevant question is which ones and how
much, not whether. DLers (almost always of an Eish persuasion) seem to have a
hard time understanding this or drawing the appropriate conclusions from this
uncontentious fact.
2. GM makes clear (here1:5) in what sense DL is bad at
hierarchy. The piece contrasts “feature-wise hierarchy” with systems that “can
make explicit reference to the parts of larger wholes.” GM describes the former
as a species of “hierarchical feature detection; you build lines out of pixels,
letters out of lines, words out of letters and so forth.” DL is very good at
this (GM: “the best ever”). But it cannot do the second at all well, which is
the kind of hierarchy we need to describe, say, linguistic objects with
constituents that are computationally active. Note that what GM calls
“hierarchical feature detection” corresponds quite well with the kind of
discovery procedures earlier structuralism advocated and whose limitations
Chomsky exposed over 60 years ago. As GM notes, pure DL does not handle at all
well the kinds of structures GGers regularly make use of to explain the
simplest linguistic facts. Moreover, DL fails for roughly the reasons that
Chomsky originally laid out; it does not appreciate the particular
computational challenges that constituency highlights.
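The contrast between the two kinds of hierarchy can be made concrete. Here is a minimal sketch (mine, not GM's; the sentence, node names, and flat NP structure are simplifying assumptions for illustration) of what it means for a rule to "make explicit reference to the parts of larger wholes": a constituency tree in which a rule can pick out the subject NP as a single unit, however many words it spans, which is exactly what a pipeline of feature detectors (pixels to lines to letters to words) gives no handle on.

```python
# Illustrative sketch of computationally active constituents.
# The tree is deliberately simplified (the relative clause is not nested).
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        # A leaf node's label is a word; otherwise flatten the children.
        if not self.children:
            return [self.label]
        return [w for c in self.children for w in c.leaves()]

# "the dog that barked chased the cat"
subject = Node("NP", [Node("the"), Node("dog"), Node("that"), Node("barked")])
vp = Node("VP", [Node("chased"), Node("NP", [Node("the"), Node("cat")])])
sentence = Node("S", [subject, vp])

def subject_of(s):
    # A rule stated over constituents: return the subject NP as a unit,
    # regardless of its internal complexity.
    return next(c for c in s.children if c.label == "NP")

print(" ".join(subject_of(sentence).leaves()))  # the dog that barked
```

The point of the sketch is only that the rule `subject_of` refers to a part of a larger whole directly; nothing in a purely feature-wise hierarchy corresponds to that operation.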
3. GM has a very nice discussion of where/how exactly DL systems
fail. It relates to “extrapolation” (see discussion of question 9, 10ff). And
why? Because DL networks “don’t have a way of incorporating prior knowledge”
that involves “operations over variables.” For these kinds of “extrapolations”
we need standard symbolic representations, and this is something that DL
eschews (for typically anti-nativist/rationalist motives). So they fail to do
what humans find trivially easy (viz. to “learn from examples the function you
want and extrapolate it”). Can one build DL systems that employ operations
over variables? GM notes that one can. But in doing so they will not be pure
DL devices and will have to allow for symbolic computations and the innate
(i.e. given) principles and operations that DLers regularly deny are needed.
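The extrapolation failure can be caricatured in a few lines. This is my own analogy, not GM's actual experiment: a learner that only interpolates among stored training examples (a crude stand-in for generalizing only within the training distribution) is contrasted with a rule stated over a variable, "output = x, for any x", which extrapolates trivially.

```python
# Illustrative sketch: interpolation-only learning vs. an operation
# over a variable. Training data: the identity function on 0..9.
train = [(x, x) for x in range(10)]

def interpolator(x):
    # Predict the output of the nearest stored training input.
    # Within 0..9 this looks like it has "learned" f(x) = x;
    # outside that range it is stuck at the edge of its experience.
    nearest_in, nearest_out = min(train, key=lambda p: abs(p[0] - x))
    return nearest_out

def symbolic_rule(x):
    # An operation over a variable: return x, whatever x is.
    return x

print(interpolator(5))      # 5   -- fine inside the training range
print(interpolator(100))    # 9   -- fails to extrapolate
print(symbolic_rule(100))   # 100 -- extrapolates for free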
4. GM’s second paper also has makes for very useful reading.
It specifically discusses the AlphaGO programs recently in the news for doing
for Go what other programs did for chess (beat the human champions). GM asks
whether the success of these programs supports the anti-R conclusions that their
makers have bruited about. The short answer is “NO!”. The reason, as GM shows, is
that there is lots of specialized pre-packaged machinery that allows these programs
to succeed. In other words, they are elbow deep into very specific “innate”
architectural assumptions without which the programs would not function.
Nor should this be surprising for this is precisely what one
should expect. The discussion is very good and anyone interested in a good
short discussion of innateness and why it is important should take a look.
5. One point struck me as particularly useful. If what GM
says is right then it appears that the non-nativist Es don’t really understand
what their own machines are doing. If GM is right, then they don’t seem to see
how to approach the E/R debate because they have no idea what the debate is
about. The issue is not whether machines can cognize. The issue is what needs
to be in a machine that cognizes. I have a glimmer of a suspicion that DLers
(and maybe other Eish AIers) confuse two different questions: (a) Is cognition
mechanizable (i.e. does cognition require a kind of mentalistic vitalism)? versus
(b) What goes into a cognitively capable mind: how rasa can a cognitively
competent tabula be?
These are two very different questions. The first takes
mentalism to be opposed to physicalism, the suggestion being that mental life
requires something above and beyond the standard computational apparatus to
explain how we cognize as we do. The second is a question within physicalism
and asks how much “innate” (i.e. given) knowledge is required to get a
computational system to cognize as we do. The E answer to the second question
is that not much given structure is needed. The Rs beg to differ. However, Rs
are not committed to operations and mechanisms that transcend the standard
variety computational mechanisms we are all familiar with. No ghosts or special
mental stuff required. If DLers do indeed confuse these two questions, then that
explains why they consider whatever program they produce, no matter how
jam-packed with specialized “given” structures of the kind that GM notes in
AlphaGO, as justifying Eism. But as this is not what the debate is about, the conclusion is a non sequitur.
AlphaGo is very Rish precisely because it is very non rasa tabularly.[1]
To end: These two pieces are very good and important. DL has
been massively oversold. We need papers that keep yelling about how little cloth
surrounds the emperor. If your interests are in human (or even animal)
cognition then DL cannot be the whole answer. Indeed, it may not even be much
of the answer, or the most important part of it. But for now if we can get it agreed
that DL requires serious supplementation to get off the ground, that will be a
good result. GM’s papers are very good at getting us to this conclusion.
[1]
I should add that there are serious mental mysteries that we don’t know how to
account for computationally. Chomsky describes these as the free use of our
capacities and what Fodor discusses under the heading central systems. We have
no decent handle on how we freely exercise our capacities or how the complex
judgments work. These are mysteries, but these mysteries are not what the E/R debate is mostly about.
What are the big success stories of AI based on an R-ish approach? What is a good positive example to contrast with the Deepmind type of approach?
Actually that is poorly phrased since GM is arguing that the Go programs are very R-ish.
GM says "An alternative approach might start from the top down, examining properties of (e.g.) adult or
child cognitive systems, seeking clues as to what might be innate (based on behavioral grounds),
or conceptually necessary."
What I am asking is what are the good or best examples of this approach actually working on some actual task?
My understanding is that GM himself is pursuing such a combined cog-AI strategy, but I don't have many details. Moreover, given the techno interests of most AI (they want to build usable systems) it is unclear why the cog issue should interest them. This has been true, IMO, since the inception of AI, where narrow technical concerns have tended to dominate. This is what Dresher and I argued long ago in our critique of early AI that appeared in Cognition. I think that things have gotten yet more product oriented. And if that is your interest then why build a system up from GENERAL principles rather than build in what you need for the specific task at hand? That is what GM argues AlphaGO does. That is fine, but it precludes arguing that the success of AlphaGO implies anything about the Eish nature of DL systems. As you note, GM's argument is that it is very Rish, with lots of GOish specifications built into the system.
Does this mean that GM's proposal is untenable as a research program in neuro-cog? No, I don't think so. This is what Lake et al were suggesting, and I am pretty sure that most of those that I know that are very Rish would have no problem with DL as part of a larger system that includes symbol manipulations with lots of special properties. The Deep Mind people suggested something like this earlier this year and I think I blogged about it. So, I know of no DL-plus systems, given the lack of interest that DLers seem to have for the cog questions unsullied by applications.
@Alex I guess it would depend on how you define AI.
Optical Character Recognition, I think, tends to use R-ish methods, ditto computer algebra systems.
This might be a stretch, but mathematical models of perspective date back at least to the Renaissance, and I would say they have an R-ish flavour. They certainly don't depend on neural nets or deep learning.
Does MNIST count as OCR?
I think GM is right to push on the fact that DL is overhyped at the moment.
We are starting to understand some of the limitations, and to see that some of the success comes from using low-level features different from those humans use.
But what are the worked out alternatives? GM is pushing for some integration of symbolic GOFAI techniques with statistical learning. I don't have a feel for what that would look like.
> An “unknown” commentator
That looks mysterious ;). I did not have a Blogger profile, but I have now configured a display name.