All papers involve at least two levels of discourse. The
first comprises the thesis: what is
being argued? What evidence is there for what is being argued? What’s the point
of what is being argued? What follows from what is being argued? The second is
less a matter of content than of style: how
does the paper say what it says? This latter dimension most often reveals the
(often more obscure) political dimensions of the discipline, and relates to
topics like the following: What parts of one’s butt is it necessary to cover so
as to avoid annoying the powers that be? Which powers is it necessary to
flatter in order to get a hearing or to avoid the ruinous “high standards of
scrutiny” that can always be deployed to block publication? Whose work must get
cited and whose can be safely ignored? What are the parameters of the
debate? If you’ve ever been lucky enough to promote an unfashionable
position, you have developed a sensitivity to these “howish” concerns. And if
you have not, you should. They are very important. Nothing reveals the core
shibboleths (and concomitant power structures) of a discipline more usefully than
a forensic inquiry into the nature of the eggshells upon which a paper is treading.
In what follows I would like to do a little
eggshell inspection in discussing a pretty good paper from a (for me)
unexpected source (a bunch of Bayesians). The authors are Lake, Ullman,
Tenenbaum and Gershman (LUTG), all from the BCS group at MIT. The paper (here) is about the
next steps forward for an AI that aspires to be cognitively interesting (like
the AI of old), and maybe even technologically cutting edge (though LUTG makes
this second point very gingerly).
But that is not all the paper is about. LUTG is also a
useful addition to the discussions about Empiricism (E) vs Rationalism (R) in
the study of mind, though LUTG does not put matters in quite this way (IMO, for
quasi-political reasons; recall the “howish” concerns!). To locate LUTG on this
axis will require some work on my part. Here goes.
As I’ve noted before, E and R have both a metaphysical and
an epistemological side. Epistemologically, E takes minds to be initially
(relatively) unstructured, with mental contour forged over time as a by-product
of environmental input as sampled by the senses.
Minds on this view come to represent their
environments more or less by copying their structures as manifest in the
sensory input. Successful minds are ones
that effectively (faithfully and efficiently) copy the sensory input. As
Gallistel & Matzel put it, for Es minds are “recapitulative,” their
function being to reliably take “an input that is part of the training input, or
similar to it, [to] evoke[s] the trained output, or an output similar to it”
(see here for discussion and links). Another way of putting this point is that E minds
are pattern matchers, whose job is to track the patternings evident in the
input (see here).
Pattern matching is recapitulative and E relies on the idea that cognition
amounts to tracking the patternings in the sensory input to yield reliable
I/O associations. Coupled with this E theory of mind comes a metaphysics.
Reality has a relatively flat causal structure. There is no rich hierarchy of
causal mechanisms whose complex interactions lie behind what we experience. Rather,
what you sense is all there is. Therefore, effectively tracking and cataloguing
this experience and setting up the right I/O associations suffices to get at a
decent representation of what exists. There is no hidden causal iceberg of
which this is the sensory/perceptual tip. The I/O relations are all there is. Not
surprisingly (this is what philosophers are paid for), the epistemology and the
metaphysics fit snugly together.
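To make the “recapitulative” picture concrete, here is a minimal toy sketch (my own illustration, not Gallistel & Matzel’s or anyone’s actual model): a 1-nearest-neighbor matcher, in which an input similar to a trained input simply evokes the trained output. All names and data below are invented for the sketch.

```python
# Minimal sketch of an E-style "recapitulative" mind: a 1-nearest-neighbor
# matcher. An input similar to a trained input evokes the trained output.

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def recapitulate(training_pairs, new_input):
    """Return the output paired with the most similar trained input."""
    _, output = min(training_pairs, key=lambda pair: distance(pair[0], new_input))
    return output

# Trained I/O associations: feature vectors -> responses.
training = [
    ((0.0, 0.0), "rest"),
    ((1.0, 0.0), "approach"),
    ((0.0, 1.0), "avoid"),
]

print(recapitulate(training, (0.9, 0.1)))  # near (1, 0), so it evokes "approach"
```

Note what the sketch does *not* contain: any representation of what lies behind the inputs. On the E picture, this is the whole story; on the R picture, it is at best a useful subsystem.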
R combines its epistemology and its metaphysics differently.
Epistemologically, R regards sensorily innocent minds as highly structured.
The structure is there to allow minds to use sensory/perceptual information to
suss out the (largely) hidden causal structures that produce these
sensations/perceptions. As Gallistel & Matzel put it R minds are more like
information processing systems structured to enable minds to extract
information about the unperceived causal structures of the environment that
generate the observed patterns of sensation/perception. On the R view, minds
sample the available sensory input to construct causal models of the world
which generate the perceived sensory patterns. And in order to do this, minds
come richly stocked with the cognitive wherewithal necessary to build such
models. R epistemology takes it for granted that what one perceives vastly
underdetermines what there is and takes it to be impossible to generate models
of what there is from sensory perception without a big boost from given
(i.e. unlearned) mental structures
that make possible the relevant induction to the underlying causal mechanisms/models.
R metaphysics complements this epistemological view by
assuming that the world is richly structured causally and that what we
sense/perceive is a complex interaction effect of these more basic complexly
interacting causal mechanisms. There are hidden powers etc. that lie behind
what we have sensory access to and that in no way “resemble” (or
“recapitulate”) the observables (see here).
Why do I rehearse these points yet again? Because I want to
highlight a key feature of the E/R dynamic: what distinguishes E from R is not
whether pattern matching is a legit
mental operation (both views agree that it is (or at least, can be)). What
distinguishes them is that Es think that this is all
there is (and all there needs to be), while Rs reserve an
important place for another kind of mental process, one that builds models of
the underlying, complex, non-visible causal systems that generate these patterns.
In other words, what distinguishes E from R is the belief that there is more to
mental life than tracking the statistical patterns of the inputs. R doesn’t
deny that this is operative. R denies that this suffices. There needs to be
more, a lot more.
Given this backdrop, it is clear that GG is very much in the
R tradition. The basic observation is that human linguistic facility requires
knowledge of a G (a set of recursive rules). Gs are not perceptually visible
(though their products often are). Further, cursory inspection of natural
languages indicates that human Gs are quite complex and cannot
be acquired solely by induction (even by sophisticated discovery
procedures that allow for inductions over inductions over inductions…).
Acquiring Gs requires some mental pre-packaging (aka, UG) that enables an LAD
to construct a G on the basis of simple haphazard bits of G output (aka, PLD).
Once acquired, Gs can be used to execute many different kinds of linguistic
behaviors, including parsing and producing novel sentences and phrases. That’s
the GG conceit in a small nutshell: human linguistic facility implicates Gs,
which in turn implicate UG, both G and UG being systems of knowledge that can be put
to multiple uses, including G acquisition, and the production and comprehension
of an unbounded number of very different novel linguistic expressions within a
given natural language. G, UG, competence vs performance: the hallmarks of a R
theory of linguistic minds.
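To see why a finite G buys unbounded novelty, here is a toy illustration (the grammar and code are invented for this post, not a claim about any actual G or UG): a handful of recursive rewrite rules yields ever more, and ever more deeply nested, novel sentences as deeper derivations are allowed.

```python
# A toy G: finite recursive rewrite rules generating unboundedly many sentences.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # recursion: NP embeds VP embeds NP
    "VP": [["V", "NP"], ["slept"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["chased"], ["saw"]],
}

def expand(symbol, depth):
    """Enumerate all token sequences derivable from `symbol` within a depth bound."""
    if symbol not in GRAMMAR:
        return [[symbol]]                 # terminal word
    if depth == 0:
        return []                         # depth bound exhausted
    results = []
    for rule in GRAMMAR[symbol]:
        partials = [[]]
        for sym in rule:                  # compose expansions of each rule symbol
            partials = [p + rest for p in partials for rest in expand(sym, depth - 1)]
        results.extend(partials)
    return results

sentences = {" ".join(s) for s in expand("S", 6)}
print(len(sentences), "distinct sentences within depth 6; deeper bounds yield strictly more")
```

The point is the shape of the output, not its content: raising the depth bound always adds new sentences never before derived, which is the productivity that pure pattern matching over a finite corpus cannot deliver.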
LUTG makes effectively these points but without much
discussing the language case (it does at the end but mainly to say that it
won’t discuss it). Its
critical claim is that human cognition relies on more than pattern matching.
There is, in addition, model building, which relies on two kinds of given
mental contents to guide learning. The first kind, what LUTG
calls “core” (and an R would call “given”), involves substantive
domain-specific content (e.g. naïve physics and psychology). The second
involves formal structure (e.g. compositionality and certain kinds of formal
analogy (what LUTG calls “learning-to-learn”)).
LUTG notes that these two features pre-condition further learning and are
indispensable. LUTG further notes that the two mental powers (i.e. model
building and pattern recognition) can likely be fruitfully combined, though it
(rightly, IMO) insists that model building is the central cognitive operation.
In other words, LUTG recapitulates the observations above that R theories can
incorporate E mechanisms and that E mechanisms alone
are insufficient to model human cognition, and that without
the part left out, E pattern matching models are poor fits with what we have
tons of evidence are central features of cognitive life. Here’s how the
abstract puts it:
We review progress in cognitive science suggesting that truly human-like learning
and thinking machines will have to reach beyond current engineering trends in
both what they learn and how they learn it. Specifically, we argue that these
machines should (1) build causal models of the world that support explanation
and understanding, rather than merely solving pattern recognition problems; (2)
ground learning in intuitive theories of physics and psychology to support and
enrich the knowledge that is learned; and (3) harness compositionality and
learning-to-learn to rapidly acquire and generalize knowledge to new tasks and
situations. We suggest concrete challenges and promising routes toward these
goals that can combine the strengths of recent neural network advances with
more structured cognitive models.
The paper usefully goes over these points in some detail.
It further notes that this conception of the
learning (I prefer the term “acquisition” myself) problem in cognition challenges
associationist assumptions which, LUTG observes (again rightly IMO), are characteristic
of most work in connectionism, machine learning and deep learning. LUTG also
points to ways that “model free methods” (aka pattern matching algorithms) might
usefully supplement the model building cognitive basics to improve performance
and implementation of model based knowledge (see 1.2 and 4.3.2).
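One way to picture this division of labor (a toy of my own devising, not LUTG’s proposal): a model-based planner searches an internal world model, and a model-free cache stores the resulting input-output policy, so later queries are answered by fast, habit-like lookup rather than fresh deliberation.

```python
from collections import deque

# Toy world model: a state graph the agent can search (model-based knowledge).
WORLD = {
    "home":   ["street"],
    "street": ["home", "cafe", "office"],
    "cafe":   ["street"],
    "office": ["street"],
}

policy_cache = {}  # model-free store: (state, goal) -> next move, filled by planning

def next_step(state, goal):
    """Return the next state on a shortest path, caching results for cheap reuse."""
    if (state, goal) in policy_cache:           # fast, habit-like lookup
        return policy_cache[(state, goal)]
    # Slow, deliberate search over the internal model (breadth-first search).
    frontier = deque([[state]])
    seen = {state}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            for s, nxt in zip(path, path[1:]):  # cache every step along the path
                policy_cache[(s, goal)] = nxt
            return policy_cache.get((state, goal), state)
        for nbr in WORLD[path[-1]]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(path + [nbr])
    return None  # goal unreachable in the model

print(next_step("home", "cafe"))    # planned over the model: "street"
print(next_step("street", "cafe"))  # answered from the cache: "cafe"
```

The cache is recapitulative in exactly the E sense; but it is filled by, and parasitic on, the model. That asymmetry is the LUTG (and R) point.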
Section 3 is the heart of the critique of contemporary AI,
which largely ignores the (R) model building ethos that LUTG champions. As it
observes, the main fact about humans when compared with current AI models is
that humans “learn a lot more from a lot less” (6). How much more? Well, human
learning is very flexible and rarely tied to the specifics of the learning situation.
LUTG provides a nice discussion of this in the domain of learning written
characters. Further, human learning generally requires very little “input.” Often,
a couple of minutes, or a couple of exposures to the relevant stimulus, is more than enough. In
fact, in contrast to current machine learning or DL systems humans do not need
to have the input curated and organized (or, as Geoff Hinton recently put it (here),
humans, as opposed to DLers, “clearly don’t need all the labeled data”). And
why not? Because unlike what DLers and connectionists and associationists have
forever assumed, human minds (and hence brains) are not E devices but R ones.
Or as LUTG puts it (9):
[People] never start completely from scratch, or even close to “from scratch,” and that
is the secret to their success. The challenge of building models of human
learning and thinking then becomes: How do we bring to bear rich prior
knowledge to learn new tasks and solve new problems so quickly? What form does
that prior knowledge take, and how is it constructed, from some combination of
inbuilt capacities and previous experience?
In short, the
blank tablet assumption endemic to E conceptions of mind is fundamentally
misguided and a (the?) central problem of cognitive psychology is to figure out
what is given and how this given stuff is used. Amen (and about time)!
Section 4 goes
over what LUTG takes to be some core powers of human minds. This includes naïve
theories of how the physical world functions and how animate agents operate. In
addition, with a nod to Fodor, it outlines the critical role of
compositionality in allowing for human cognitive productivity.
It is a nice discussion and makes useful comments on causal models and their
relation to generative ones (section 4.2.2).
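The relation between causal and generative models can be made concrete with a toy Bayesian sketch (my own illustration, with invented numbers, not the paper’s model): a hidden cause generates an observation, and inference runs the generative model backwards, from the sensory “tip” to the hidden cause beneath it.

```python
# Toy causal/generative model: a hidden cause ("rain" vs "sprinkler")
# generates an observation ("wet grass"). Perception inverts the model.
# All probabilities are invented for illustration.

prior = {"rain": 0.3, "sprinkler": 0.7}  # P(cause)
likelihood = {"rain": 0.9, "sprinkler": 0.5}  # P(wet grass | cause)

def posterior(observed_wet=True):
    """P(cause | observation) via Bayes' rule over the generative model."""
    unnorm = {
        c: prior[c] * (likelihood[c] if observed_wet else 1 - likelihood[c])
        for c in prior
    }
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

# The observed "tip" (wet grass) underdetermines the hidden cause; the
# generative model plus the priors is what licenses the inference.
print(posterior(True))
```

Nothing here “resembles” the input: the machinery is a model of unobserved causes, which is precisely what distinguishes this from a pattern matcher over wet-grass statistics.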
Now let’s note
a few eggshells. A central recurring feature of the LUTG discussion is the
observation that it is unclear how or whether current DL approaches might
integrate these necessary mechanisms. The paper does not come right out and say
that DL models will have a hard time with these without radically changing their
sub-symbolic associationist pattern matching monomania, but it strongly
suggests this. Here’s a taste of this recurring theme (and please note the
hedged phrasing):
As in the case of intuitive physics, the success that generic networks will have
in capturing intuitive psychological reasoning will depend in part on the
representations humans use. Although deep networks have not yet been applied to
scenarios involving theory of mind and intuitive psychology, they could
probably learn visual cues, heuristics, and summary statistics of a scene that
happens to involve agents. If that is all that underlies human psychological
reasoning, a data-driven deep learning approach can likely find success in this
domain. However, it seems to us that any full formal account of intuitive psychological reasoning
needs to include representations of agency, goals, efficiency, and reciprocal
relations. As with objects and forces, it is unclear whether a complete
representation of these concepts (agents, goals, etc.) could emerge from deep
neural networks trained in a purely predictive capacity. Similar to the
intuitive physics domain, it is possible that with a tremendous number of
training trajectories in a variety of scenarios, deep learning techniques could
approximate the reasoning found in infancy even without learning anything about
goal-directed or socially directed behavior more generally. But this is also
unlikely to resemble how humans learn, understand, and apply intuitive
psychology unless the concepts are genuine. In the same way that altering the
setting of a scene or the target of inference in a physics-related task may be
difficult to generalize without an understanding of objects, altering the
setting of an agent or their goals and beliefs is difficult to reason about
without understanding intuitive psychology.
Yup. Right on.
But why so tentative? “It is unclear”? Nope, it is very clear, and has been for
about 300 years since these issues were first extensively discussed. And we all
know why. Because concepts like “agent,” “goal,” “cause,” “force,” are not
observables and not reducible to observables. So, if they play central roles in
our cognitive models then pattern matching algorithms won’t suffice. But as
this is all that E DL systems countenance, then DL is not enough. This is the
correct point, and LUTG makes it. But note the hesitancy with which it does so.
Is it too much to think that this likely reflects the issue mooted at the
outset about the implicit politics that one’s dance over the eggshells reveals?
It is not hard
to see from how LUTG makes its very reasonable case that it is a bit nervous
about DL (the current star of AI). LUTG is rhetorically covering its posterior
while (correctly) noting that unreconstructed DL will never make the grade. The
same wariness makes it impossible for LUTG to acknowledge its great debt to R
thinking. As LUTG states, its “goal is to build on their [neural networks, NH] successes
rather than dwell on their shortcomings” (2). But those that always look
forward and never back won’t move forward particularly well either (think Obama
and the financial crisis). Understanding that E is deeply inadequate is a
prerequisite for moving forward. It is no service to be mealy-mouthed about
this. One does not finesse one’s way around very influential bad ideas.
Ok, so I have a
few reservations about how LUTG makes its basic points. That said, this is a
very useful paper. It is nice to see this coming out of the very influential
Bayesian group at MIT and in a prominent place like BBS. I am hoping that
it indicates that the pendulum is swinging away from E and towards a more
reasonable R conception of minds. As I’ve noted, the analogies with standard GG
practice are hard to miss. In addition, LUTG rightly points to the shortcomings
with connectionist/deep learning/neural net approaches to mental life. This is
good. It may not be news to many of us, but if this signals a return to R
conceptions of mind, it is a very positive step in the right direction.