All papers involve at least two levels of discourse. The
first comprises the thesis: what is
being argued? What evidence is there for what is being argued? What’s the point
of what is being argued? What follows from what is being argued? The second is
less a matter of content than of style: how
does the paper say what it says? This latter dimension most often reveals the
(often more obscure) political dimensions of the discipline, and relates to
topics like the following: What parts of one’s butt is it necessary to cover so
as to avoid annoying the powers that be? Which powers is it necessary to
flatter in order to get a hearing or to avoid the ruinous “high standards of
scrutiny” that can always be deployed to block publication? Whose work must get
cited and whose can be safely ignored? What are the parameters of the
discipline’s Overton
Window? If you’ve ever been lucky enough to promote an unfashionable
position, you have developed a sensitivity to these “howish” concerns. And if
you have not, you should. They are very important. Nothing reveals the core
shibboleths (and concomitant power structures) of a discipline more usefully than
a forensic inquiry into the nature of the eggshells upon which a paper is treading. In what follows I would like to do a little
eggshell inspection in discussing a pretty good paper from a (for me)
unexpected source (a bunch of Bayesians). The authors are Lake, Ullman,
Tenenbaum, and Gershman (LUTG), all from the BCS group at MIT. The paper (here) is about the next steps forward for an AI that aspires to be cognitively interesting (like the AI of old), and maybe even technologically cutting edge (though LUTG makes this second point very gingerly).
But that is not all the paper is about. LUTG is also a
useful addition to the discussions about Empiricism (E) vs Rationalism (R) in
the study of mind, though LUTG does not put matters in quite this way (IMO, for
quasi-political reasons; recall the “howish” concerns!). To locate LUTG on this axis will require some work on my part. Here goes.
As I’ve noted before, E and R have both a metaphysical and
an epistemological side. Epistemologically, E takes minds to be initially
(relatively) unstructured, with mental contour forged over time as a by-product
of environmental input as sampled by the senses. Minds on this view come to represent their
environments more or less by copying their structures as manifest in the
sensory input. Successful minds are ones
that effectively (faithfully and efficiently) copy the sensory input. As
Gallistel & Matzel put it, for Es minds are “recapitulative,” their function being to reliably ensure that “an input that is part of the training input, or similar to it, evoke[s] the trained output, or an output similar to it”
(see here
for discussion and links). Another way of putting this point is that E minds
are pattern matchers, whose job is to track the patternings evident in the
input (see here).
Pattern matching is recapitulative: E relies on the idea that cognition amounts to tracking the patternings in the sensory input and reliably reproducing them.
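To make the recapitulative picture concrete, here is a toy sketch of my own (not anything from Gallistel & Matzel or LUTG): a pure pattern matcher that stores training pairs and, given an input similar to a stored one, simply evokes the stored output. The data and labels are made up.

```python
import numpy as np

# A purely "recapitulative" learner: it stores the training pairs and, for a new
# input, evokes the output associated with the most similar stored input.
# Nothing is inferred about what generated the data.

train_inputs = np.array([[0.0, 0.1], [0.9, 1.0], [0.5, 0.5]])   # hypothetical sensory inputs
train_outputs = ["rest", "flee", "approach"]                     # hypothetical trained responses

def recall(new_input):
    """Return the trained output whose stored input most resembles new_input."""
    distances = np.linalg.norm(train_inputs - np.asarray(new_input), axis=1)
    return train_outputs[int(np.argmin(distances))]

print(recall([0.85, 0.95]))   # -> "flee": an input similar to a trained one evokes the trained output
```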
Coupled with this E theory of mind comes a metaphysics.
Reality has a relatively flat causal structure. There is no rich hierarchy of
causal mechanisms whose complex interactions lie behind what we experience. Rather,
what you sense is all there is. Therefore, effectively tracking and cataloguing
this experience and setting up the right I/O associations suffices to get at a
decent representation of what exists. There is no hidden causal iceberg of
which this is the sensory/perceptual tip. The I/O relations are all there is. Not
surprisingly (this is what philosophers are paid for), the epistemology and the
metaphysics fit snugly together.
R combines its epistemology and its metaphysics differently.
Epistemologically, R regards sensorily innocent minds as highly structured. The structure is there to allow minds to use sensory/perceptual information to suss out the (largely) hidden causal structures that produce these sensations/perceptions. As Gallistel & Matzel put it, R minds are more like information-processing systems, structured to extract information about the unperceived causal structures of the environment that generate the observed patterns of sensation/perception. On the R view, minds
sample the available sensory input to construct causal models of the world
which generate the perceived sensory patterns. And in order to do this, minds
come richly stocked with the cognitive wherewithal necessary to build such
models. R epistemology takes it for granted that what one perceives vastly
underdetermines what there is and takes it to be impossible to generate models
of what there is from sensory perception without a big boost from given (i.e. unlearned) mental structures
that make possible the relevant induction to the underlying causal mechanisms/models.
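Since model building is the heart of the R story (and since the LUTG authors are Bayesians), here is a toy sketch of my own of what “sussing out hidden causes” amounts to in the simplest case: posterior inference over two hypothetical unobserved mechanisms, where the prior over candidate mechanisms is the given (unlearned) structure doing the inductive work. Nothing here is from the paper; the two mechanisms and their parameters are invented.

```python
from math import comb

# Two hidden causal hypotheses about a process we never observe directly,
# only its outputs: a "fair" mechanism vs a "biased" one.
hypotheses = {"fair mechanism": 0.5, "biased mechanism": 0.8}
prior = {"fair mechanism": 0.5, "biased mechanism": 0.5}   # given (unlearned) structure

def posterior(heads, tosses):
    """Posterior over the hidden mechanisms, given the observed output pattern."""
    likelihood = {h: comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)
                  for h, p in hypotheses.items()}
    evidence = sum(prior[h] * likelihood[h] for h in hypotheses)
    return {h: prior[h] * likelihood[h] / evidence for h in hypotheses}

print(posterior(heads=8, tosses=10))   # the data shift belief toward the hidden biased mechanism
```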
R metaphysics complements this epistemological view by
assuming that the world is richly structured causally and that what we
sense/perceive is a complex interaction effect of these more basic complexly
interacting causal mechanisms. There are hidden powers, etc., that lie behind what we have sensory access to and that in no way “resemble” (or “recapitulate”) the observables (see here for discussion).
Why do I rehearse these points yet again? Because I want to
highlight a key feature of the E/R dynamic: what distinguishes E from R is not whether pattern matching is a legit
mental operation (both views agree that it is (or at least, can be)). What
distinguishes them is that Es think this is all there is (and all there needs to be), while Rs reserve an important place for another kind of mental process, one that builds models of the underlying complex, non-visible causal systems that generate these patterns. In other words, what distinguishes R from E is the belief that there is more to mental life than tracking the statistical patterns of the inputs. R doesn’t deny that this is operative. R denies that it suffices. There needs to be more, a lot more.[1]
Given this backdrop, it is clear that GG is very much in the
R tradition. The basic observation is that human linguistic facility requires
knowledge of a G (a set of recursive rules). Gs are not perceptually visible (though their products often are). Further, cursory inspection of natural languages indicates that human Gs are quite complex and cannot be acquired solely by induction (even by sophisticated discovery procedures that allow for inductions over inductions over inductions…).
Acquiring Gs requires some mental pre-packaging (aka, UG) that enables an LAD
to construct a G on the basis of simple haphazard bits of G output (aka, PLD).
Once acquired, Gs can be used to execute many different kinds of linguistic
behaviors, including parsing and producing novel sentences and phrases. That’s
the GG conceit in a small nutshell: human linguistic facility implicates Gs, which in turn implicate UG, both G and UG being systems of knowledge that can be put to multiple uses, including G acquisition and the production and comprehension
of an unbounded number of very different novel linguistic expressions within a
given natural language. G, UG, competence vs performance: the hallmarks of an R
theory of linguistic minds.
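For the productivity point, here is a toy sketch of my own (a made-up three-symbol grammar, not anyone’s actual G): a finite set of recursive rules generates an unbounded set of novel expressions, none of which need to have occurred in the PLD.

```python
import random

# A toy recursive grammar: finitely many rules, unboundedly many sentences.
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["the cat"], ["the dog"], ["the friend of", "NP"]],   # recursion: an NP inside an NP
    "VP": [["slept"], ["saw", "NP"]],
}

def generate(symbol="S"):
    """Expand a symbol by randomly choosing one of its rules; terminals are returned as-is."""
    if symbol not in grammar:
        return symbol
    rule = random.choice(grammar[symbol])
    return " ".join(generate(s) for s in rule)

for _ in range(3):
    print(generate())   # e.g. "the friend of the friend of the dog saw the cat"
```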
LUTG effectively makes these points, but without much discussion of the language case (it does turn to language at the end, but mainly to say that it won’t discuss it). LUTG’s central
critical claim is that human cognition relies on more than pattern matching.
There is, in addition, model building, which relies on two kinds of given mental contents to guide
learning. The first kind, what LUTG
calls “core” (and an R would call “given”), involves specific substantive content (e.g., naïve physics and psychology). The second involves formal properties (e.g., compositionality and certain kinds of formal analogy, what LUTG calls “learning-to-learn”).[2]
LUTG notes that these two features precondition further learning and are indispensable. LUTG further notes that the two mental powers (i.e., model building and pattern recognition) can likely be fruitfully combined, though it (rightly, IMO) insists that model building is the central cognitive operation. In other words, LUTG recapitulates the observations above: R theories can incorporate E mechanisms, E mechanisms alone are insufficient to model human cognition, and without the part they leave out, E pattern-matching models are poor fits with what we have tons of evidence to be central features of cognitive life. Here’s how the abstract puts it:
We
review progress in cognitive science suggesting that truly human-like learning
and thinking machines will have to reach beyond current engineering trends in
both what they learn and how they learn it. Specifically, we argue that these
machines should (1) build causal models of the world that support explanation
and understanding, rather than merely solving pattern recognition problems; (2)
ground learning in intuitive theories of physics and psychology to support and
enrich the knowledge that is learned; and (3) harness compositionality and
learning-to-learn to rapidly acquire and generalize knowledge to new tasks and
situations. We suggest concrete challenges and promising routes toward these
goals that can combine the strengths of recent neural network advances with
more structured cognitive models.
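The “learning-to-learn” idea can be given a very crude illustration (mine, not LUTG’s, which uses proper hierarchical Bayesian models): experience with previous tasks is pooled into a prior that lets a single observation from a new task go a long way. The numbers and “tasks” below are invented.

```python
# Hypothetical past "tasks": outcomes from previously learned categories (1 = feature present).
past_tasks = [[1, 1, 1, 0, 1], [1, 1, 0, 1, 1], [1, 1, 1, 1, 0]]

# A crude form of learning-to-learn: pool past tasks into prior pseudo-counts.
prior_heads = sum(sum(t) for t in past_tasks)
prior_tails = sum(len(t) - sum(t) for t in past_tasks)

def posterior_mean(heads, tails, a, b):
    """Posterior mean of a Beta(a, b)-Bernoulli model after (heads, tails) observations."""
    return (a + heads) / (a + b + heads + tails)

# A new task with a single observation:
one_obs = 1
print(posterior_mean(one_obs, 0, 1, 1))                       # flat prior: weak guess (about 0.67)
print(posterior_mean(one_obs, 0, prior_heads, prior_tails))   # learned prior: already near 0.8
```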
The paper usefully goes over these points in some detail. It further notes that this conception of the
learning (I prefer the term “acquisition” myself) problem in cognition challenges
associationist assumptions, which LUTG observes (again rightly, IMO) are characteristic of most work in connectionism, machine learning, and deep learning. LUTG also
points to ways that “model free methods” (aka pattern matching algorithms) might
usefully supplement the model building cognitive basics to improve performance
and implementation of model based knowledge (see 1.2 and 4.3.2).[3]
Section 3 is the heart of the critique of contemporary AI,
which largely ignores the (R) model building ethos that LUTG champions. As it
observes, the main fact about humans when compared with current AI models is
that humans “learn a lot more from a lot less” (6). How much more? Well, human
learning is very flexible and rarely tied to the specifics of the learning situation.
LUTG provides a nice discussion of this in the domain of learning written
characters. Further, human learning generally requires very little “input.” Often, a couple of exposures to the relevant stimulus, or a few minutes of it, is more than enough. In fact, in contrast to current machine learning or DL systems, humans do not need to have the input curated and organized (or, as Geoff Hinton recently put it (here): humans, as opposed to DLers, “clearly don’t need all the labeled data.”). And
why not? Because unlike what DLers and connectionists and associationists have
forever assumed, human minds (and hence brains) are not E devices but R ones.
Or as LUTG puts it (9):
People
never start completely from scratch, or even close to “from scratch,” and that
is the secret to their success. The challenge of building models of human
learning and thinking then becomes: How do we bring to bear rich prior
knowledge to learn new tasks and solve new problems so quickly? What form does
that prior knowledge take, and how is it constructed, from some combination of
inbuilt capacities and previous experience?
Section 4 goes over what LUTG takes to be some core powers of human minds. This includes naïve theories of how the physical world functions and how animate agents operate. In addition, with a nod to Fodor, it outlines the critical role of compositionality in allowing for human cognitive productivity.[4] It is a nice discussion and makes useful comments on causal models and their relation to generative ones (section 4.2.2).
Now let’s note a few eggshells. A central recurring feature of the LUTG discussion is the observation that it is unclear how or whether current DL approaches might integrate these necessary mechanisms. The paper does not come right out and say that DL will have a hard time with these unless it radically changes its sub-symbolic, associationist, pattern-matching monomania, but it strongly suggests this. Here’s a taste of this recurring theme (and please note the R overtones).
As
in the case of intuitive physics, the success that generic networks will have
in capturing intuitive psychological reasoning will depend in part on the
representations humans use. Although deep networks have not yet been applied to
scenarios involving theory of mind and intuitive psychology, they could
probably learn visual cues, heuristics, and summary statistics of a scene that
happens to involve agents. If that is all that underlies human psychological
reasoning, a data-driven deep learning approach can likely find success in this
domain.
However,
it seems to us that any full formal account of intuitive psychological reasoning
needs to include representations of agency, goals, efficiency, and reciprocal
relations. As with objects and forces, it is unclear whether a complete
representation of these concepts (agents, goals, etc.) could emerge from deep
neural networks trained in a purely predictive capacity. Similar to the
intuitive physics domain, it is possible that with a tremendous number of
training trajectories in a variety of scenarios, deep learning techniques could
approximate the reasoning found in infancy even without learning anything about
goal-directed or socially directed behavior more generally. But this is also
unlikely to resemble how humans learn, understand, and apply intuitive
psychology unless the concepts are genuine. In the same way that altering the
setting of a scene or the target of inference in a physics-related task may be
difficult to generalize without an understanding of objects, altering the
setting of an agent or their goals and beliefs is difficult to reason about
without understanding intuitive psychology.
It is not hard to see from how LUTG makes its very reasonable case that it is a bit nervous about DL (the current star of AI). LUTG is rhetorically covering its posterior while (correctly) noting that unreconstructed DL will never make the grade. The same wariness makes it impossible for LUTG to acknowledge its great debt to R predecessors.[5] As LUTG states, its “goal is to build on their [neural networks, NH] successes rather than dwell on their shortcomings” (2). But those who always look forward and never back won’t move forward particularly well either (think Obama and the financial crisis). Understanding that E is deeply inadequate is a prerequisite for moving forward. It is no service to be mealy-mouthed about this. One does not finesse one’s way around very influential bad ideas.
Ok, so I have a few reservations about how LUTG makes its basic points. That said, this is a very useful paper. It is nice to see this coming out of the very influential Bayesian group at MIT and in a prominent place like B&BS. I am hoping that it indicates that the pendulum is swinging away from E and towards a more reasonable R conception of minds. As I’ve noted, the analogies with standard GG practice are hard to miss. In addition, LUTG rightly points to the shortcomings of connectionist/deep learning/neural net approaches to mental life. This is good. It may not be news to many of us, but if this signals a return to R conceptions of mind, it is a very positive step in the right direction.[6]
[1]
R often goes farther, claiming that even tracking the relevant perceptual regularities requires lots of given mental baggage. The world does not come pre-labeled. So zeroing in on the relevant dimensions for inductive generalization itself requires lots of pre-packaged “knowledge.”
Geoff Hinton (the daddy of deep learning) seems to have come to a similar view
of late concerning how hard it is to get things off the ground without curated
data. See below for a reference.
[2]
GGers should find this reminiscent of Chomsky’s discussion in Aspects of substantive and formal
universals and how these interact to create Gs (models) of the ambient
degenerate and deficient linguistic input available to the child (PLD).
[3]
Or to put this in GG-friendly terms: LUTG resurrects something very like a competence/performance distinction, model building being analogous to the former and model-free methods applied to these models being analogous to the latter. An idea I found interesting is that given models of competence provide a useful domain for performance models wherein sophisticated pattern-matching algorithms do their work. Conceptually, this idea seems very similar to what Charles Yang has been advocating as well (see here).
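To make this footnote’s competence/performance analogy concrete, here is a rough sketch of my own (not LUTG’s or Yang’s proposal): a model-based planner (value iteration over a known toy world) plays the “competence” role, and a model-free learner (tabular Q-learning run on experience generated within that same world) caches fast “performance” estimates of the same values. The world, rewards, and parameters are all made up.

```python
import random

# Toy deterministic chain world: states 0,1,2; actions move left/right; reward for being in state 2.
STATES, ACTIONS, GAMMA = [0, 1, 2], ["left", "right"], 0.9

def step(s, a):
    s2 = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 2 else 0.0)

# Model-based "competence": value iteration using the known model.
V = {s: 0.0 for s in STATES}
for _ in range(200):
    V = {s: max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS) for s in STATES}

# Model-free "performance": tabular Q-learning from experience sampled via the same model.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(20000):
    s, a = random.choice(STATES), random.choice(ACTIONS)
    s2, r = step(s, a)
    Q[(s, a)] += 0.1 * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])

print({s: round(V[s], 2) for s in STATES})                               # model-based values
print({s: round(max(Q[(s, a)] for a in ACTIONS), 2) for s in STATES})    # model-free estimates approach them
```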
[4]
Actually, though compositionality is critical to productivity, it does not suffice for generating an “infinite number of thoughts” or using an “infinite number of sentences” “from a finite set of primitives” (14). For this we need more than compositionality; we need recursion as well.
[5]
It is amazing, IMO, how small a role Chomsky plays in LUTG’s discussion. So far
as I can tell, all of its major
points were developed in the modern period by him. I am pretty sure that some
of the authors know this, and that highlighting the fact would hurt the paper politically by offending the relevant leading DL lights.
[6]
BTW, LUTG has a nice discussion of the biological “plausibility” of neural net
models of the brain. The short point is that, beyond being pictorially suggestive, there is no real reason for thinking that brains are like connectionist nets. As LUTG puts it (20):
Many seemingly well-accepted
ideas concerning neural computation are in fact biologically dubious, or
uncertain at best…
For example, most neural networks use some form of gradient-based (e.g. backpropagation) or Hebbian learning. It has long been argued, however, that backpropagation is not biologically plausible. As Crick (1989) famously pointed out, backpropagation seems to require that information be transmitted backward along the axon, which does not fit with realistic models of neuronal function…
As LUTG observes, this should not in and of itself stop people from investigating neural nets as possible models of brain computation, but it should put an end to the prejudice that brains are nets because they look net-like. Sheesh!
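To see the contrast Crick was pointing at, here is a one-unit sketch of my own: a Hebbian update uses only locally available pre- and post-synaptic activity, whereas a gradient-style (delta-rule) update, the one-layer case of backpropagation, needs an error signal carried back to each synapse. The numbers are arbitrary.

```python
import numpy as np

x = np.array([1.0, 0.5])        # presynaptic activity (hypothetical)
w = np.array([0.2, -0.1])       # synaptic weights
target = 1.0                    # desired output for this input
lr = 0.1

y = w @ x                       # postsynaptic activity

# Hebbian: purely local -- the change depends only on pre- and post-synaptic activity.
w_hebb = w + lr * y * x

# Gradient-style (delta rule, the one-layer case of backprop): the update needs
# an error signal (target - y) carried back to each synapse.
w_delta = w + lr * (target - y) * x

print(y, w_hebb, w_delta)
```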
A smallish question about your taxonomizing/analysis of E v R. You link to your discussion of Nancy Cartwright's "Dappled World", which you recommend for her discussion of E v R science. Question is this: Cartwright, so far as I know, considers herself an "empiricist", and I seem not to be alone in this view (this from Wikipedia, quoting Carl Hoefer):
ReplyDelete"Nancy Cartwright’s philosophy of science is, in her view, a form of empiricism but empiricism in the style of Neurath and Mill, rather than of Hume or Carnap. Her concerns are not with the problems of skepticism, induction, or demarcation; she is concerned with how actual science achieves the successes it does, and what sort of metaphysical and epistemological presuppositions are needed to understand that success.
Cartwright, like many working scientists themselves, takes a rather pragmatic/realist stance toward observations and interventions made by scientists and engineers and particularly toward their connections to causality: Scientists see impurities causing signal loss in a cable, and they stimulate an inverted population, causing it to lase. Given these starting points, there can be no question of a skeptical attitude toward causation, in either singular or generic form. The fundamental role (or better, roles) played by causation in scientific practice is undeniable; what Cartwright does, then, is reconfigure empiricism from the ground up based on this insight. In the reconfiguration process, many mainstays of the received view of science take a beating; especially [...] the fundamentality of laws of nature."
So, if she's an empiricist (and, I suppose Ian Hacking too?), then what by your lights is empiricist about this "reconfigured empiricism"?
--RC
In case you hadn't seen these yet, some interesting recent thoughts from Gary Marcus on AI, deep learning and innateness:
- Innateness, AlphaZero, and Artificial Intelligence (https://arxiv.org/abs/1801.05667)
- In defense of skepticism about deep learning (https://medium.com/@GaryMarcus/in-defense-of-skepticism-about-deep-learning-6e8bfd5ae0f1)