Here are some readables.
First, those interested in some background reading on the DMZTP paper (Nai Ding, Lucia Melloni, Hang Zhang, Xiang Tian and David Poeppel) discussed here can look at Embick & Poeppel (E&P) (part deux) here. This 2015 paper is a sequel to a much earlier 2005 paper outlining the problems of relating work on the brain bases of linguistic behavior to linguistic research (discussed here). The paper situates the discussion in DMZTP against a larger research program in the cog-neuro of language.
E&P identifies three neuroling projects of varying
degrees of depth and difficulty.
1. Correlational neuroling
2. Integrated neuroling
3. Explanatory neuroling
DMZTP fits neatly into the first project and makes tentative conjectures relevant to the second and third. What are these three projects? Here's how E&P describes each (CR = computational/representational and NB = neurobiological) (360):
Correlational neurolinguistics: CR theories of language are used to investigate the NB foundations of language. Knowledge of how the brain computes is gained by capitalising on CR knowledge of language.

Integrated neurolinguistics: Correlational neurolinguistics plus the NB perspective provides crucial evidence that adjudicates among different CR theories. That is, brain data enrich our understanding of language at the CR level.

Explanatory neurolinguistics: (Correlational + Integrated neurolinguistics) plus something about NB structure/function explains why the CR theory of language involves particular computations and representations (and not others).
The whole paper is a great read (nothing surprising here) and does a good job of identifying the kinds of questions worth answering. Its greatest virtue, IMO, is that it treats results both in linguistics and in cog-neuro respectfully and asks how their respective insights can be integrated. This is not a program of mindless reduction, something that is unfortunately characteristic of too much current NB work on language.
Second, here’s
a piece on some big methodological goings-on in physics. The question,
relevant to our little part of the scientific universe, is what makes a theory
scientific. It seems that many don’t like string theory or multiverses and
think them and the thoeires that make use of them unscientific. Philosophers
are called in to help clear the muddle (something that physicists hate even the
idea of, but times are desperate it seems) and philosophers note that the
muddle partly arises from mistaking the hallmarks of what makes something
science. Popper and falsificationism is now understood by everyone to be a
simplification at best and a sever distortion with very bad consequences at
worst.
Chomsky once observed that big methodological questions about what makes something science are usefully focused on those areas of our greatest scientific success. I think that this is right. However, I also think there is something to be learned from listening in on these discussions. Here's a place where eavesdropping might be fun and instructive.
Third, here’s
a Tech Review article on some recent work by Tenenbaum and colleagues on
handwriting recognition (funny, just as cursive is on the cusp of disappearing,
we show how to get machines to recognize it. The curious twists and turns of
history). The research described is quite self-consciously opposed to deep
learning approaches to similar problems. Where does the difference lie?
Effectively the program uses a generative procedure using “strokes of an
imaginary pen” to match the incoming letters. Bayes is then used to refine
these generated objects. In other words, given a set of generative procedures
for constructing letters, we can generate better and better matches to input
through an iterative process in a Bayes like framework. And there is real
payoff. In putting this kind of generative procedure into the Bayes system, you
can learn to recognize novel “letters” from a very small number of examples.
Sound familiar? It should. Think Aspects! So, it looks like the tech world is coming to appreciate
the power of “innate” knowledge, i.e. how given
information can be used and extended.
Good. This is just the kind of stories GGers should delight in.
How's this different from the deep learning/big data (DL/BD) stuff? Well, by packing in prior info you can "learn" from a small number of examples; this simplifies the inductive problem. Hinton, one of the mucky-mucks in the DL/BD world, notes that this stuff is "compatible with deep learning." Yup. Nonetheless, it fits ill with the general ethos behind the enterprise. Why? Because it exploits an entirely different intuition concerning how to approach "learning." From the few discussions I have seen, DL/BD starts from the idea that if you get enough data, learning will take care of itself. Why? Because learning consists in extracting the generalizations in the data. If the relevant generalizations are there in the data to be gleaned (even if lots of data is needed to glean them, as there is often A LOT of noise obscuring the signal), then, given enough data (hence the 'big' in Big Data), learning will occur. The method described here questions the utility of this premise. As Tenenbaum notes:
"The key thing about probabilistic programming—and rather different from the way most of the deep-learning stuff is working—is that it starts with a program that describes the causal processes in the world," says Tenenbaum. "What we're trying to learn is not a signature of features, or a pattern of features. We're trying to learn a program that generates those characters."
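To make the contrast concrete, here is a minimal sketch of the idea in Python. It is my toy, not Lake, Salakhutdinov and Tenenbaum's actual model (theirs works with real pen trajectories and rendered images); the stroke vocabulary and all the function names are invented for illustration. A "character" here is just a noisy sequence of stroke tokens, and recognition is Bayesian inference over the space of stroke programs that could have generated it.

```python
# Toy sketch of "generative program + Bayes" character recognition.
# Everything here is hypothetical: a real system (e.g. Lake et al.'s
# Bayesian Program Learning) uses pen trajectories and image rendering;
# here a "character" is a noisy sequence of stroke tokens.
import math
import random
from itertools import product

STROKES = ["down", "up", "arc_l", "arc_r", "dot"]  # assumed stroke vocabulary

def log_prior(program):
    # Simplicity bias: shorter stroke programs are a priori more probable.
    return len(program) * math.log(1.0 / (2 * len(STROKES)))

def log_likelihood(observed, program, noise=0.1):
    # Each observed token matches its programmed stroke with probability
    # 1 - noise; otherwise it is a uniform "slip" to another stroke.
    if len(observed) != len(program):
        return float("-inf")
    return sum(math.log(1 - noise) if o == p
               else math.log(noise / (len(STROKES) - 1))
               for o, p in zip(observed, program))

def infer_program(example, max_len=4):
    # MAP inference by brute force: score every candidate stroke program.
    candidates = (p for n in range(1, max_len + 1)
                  for p in product(STROKES, repeat=n))
    return max(candidates,
               key=lambda p: log_prior(p) + log_likelihood(example, p))

def render(program, noise=0.1):
    # Run the generative model forward: emit a noisy token sequence.
    return [s if random.random() > noise
            else random.choice([t for t in STROKES if t != s])
            for s in program]

random.seed(0)
true_program = ["down", "arc_r", "dot"]   # the latent "letter"
one_example = render(true_program)        # a single observation
guess = infer_program(one_example)
print("inferred program:", guess)         # recovers the stroke program
print("new instance:", render(list(guess)))  # and generates fresh tokens
```

The moral of the toy is the same as Tenenbaum's quote: what gets learned from the single example is a generative program, not a signature of features, and that program can then be used to recognize or produce fresh instances with no large labeled corpus in sight.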
Does this make the two approaches irreconcilable? Yes and no. No, in that one can always combine two points of view that are not logically contradictory, and these views are not. But yes, in that what one takes to be central to solving a learning problem (e.g. a pre-packaged program describing the causal process) is absent from the other. It's once again the question of where one thinks the hard problem lies: describing the hypothesis space or the movements around it. DL/BD downplays the former and bets on the latter. In this case, Tenenbaum does a Chomsky and bets on the former.
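Here is a hypothetical back-of-the-envelope version of that bet, in the spirit of the "number game" from Tenenbaum's thesis (the numbers, hypothesis names and code are mine, purely for illustration). A Bayesian learner with a small, structured hypothesis space locks onto the target rule from three examples; the very same inference rule run over a flat space of arbitrary subsets stays spread thin on the same data.

```python
# Hypothetical sketch (not from any of the papers discussed): identical
# Bayesian inference over a structured vs. an unstructured hypothesis space.
import random

DOMAIN = range(1, 101)

# Structured space: a handful of rule-like hypotheses.
structured = {
    "even":        {n for n in DOMAIN if n % 2 == 0},
    "odd":         {n for n in DOMAIN if n % 2 == 1},
    "mult_of_10":  {n for n in DOMAIN if n % 10 == 0},
    "powers_of_2": {n for n in DOMAIN if (n & (n - 1)) == 0},
    "squares":     {n * n for n in range(1, 11)},
}

# Unstructured space: thousands of arbitrary 50-element subsets.
random.seed(1)
unstructured = {f"subset_{i}": set(random.sample(list(DOMAIN), 50))
                for i in range(2000)}

def posterior(hypotheses, data):
    # Size principle: a consistent hypothesis h gets likelihood
    # (1/|h|)^n for n examples; inconsistent hypotheses get zero.
    scores = {name: (1.0 / len(h)) ** len(data)
              for name, h in hypotheses.items()
              if all(x in h for x in data)}
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

data = [2, 8, 64]  # three positive examples of "powers of 2"

for label, space in [("structured", structured),
                     ("unstructured", unstructured)]:
    post = posterior(space, data)
    best = max(post, key=post.get)
    print(f"{label}: best = {best} (p = {post[best]:.3f}), "
          f"{len(post)} hypotheses still in play")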
As many of you know, I have had my reservations concerning many of Tenenbaum's projects, but I think he is making the right moves in this case. It always pays to recognize common
ground. DL/BD is the current home of unreconstructed empiricism. In this
particular case, Tenenbaum is making points that challenge this worldview. Good
for him, and me.
I don't think Lake, Salakhutdinov and Tenenbaum would argue that the tendency to decompose handwritten characters into pen strokes is ultimately innate - the next step in this research program would be to figure out how machines (and humans) acquire this ability using more fundamental principles. A handwritten character recognition system is nice, but the goal is presumably a general purpose artificial intelligence system that can also recognize animals, faces, phonemes, etc.
As for the sociological point (unreconstructed empiricists etc), it's maybe worth noting that Ruslan Salakhutdinov works almost exclusively on deep learning, and that Brendan Lake has done both symbolic and "deep learning" work (see e.g. this paper). I'm not saying that people in AI never have their ideological commitments and pet frameworks, but many people in that world are happy to experiment with multiple frameworks and go with whatever works best.
@Tal: Maybe they would, maybe not. You are likely right. However, this is not relevant to the point I was trying to make (badly, it appears). What the Tenenbaum et al paper does that goes against the DL/BD conception is seed the hypothesis space with structure. What the paper notes is that doing this, if one chooses the structures right, makes the "learning" problem less data demanding (i.e. small data gets the system to recognize letters). This form of argument is the one that Rationalists have embraced with delight, and it is one that DL/BD resists. Now if you are Rationalistically inclined you will find favor with Tenenbaum et al's proposal, for it mirrors what we do all the time. What their work as reported notes is what we try to emphasize all the time: seeding the mind/brain with structure (i.e. structuring the hypothesis space) has some very nice virtues when it comes to learning. Not a big point, at least not for us, but now a point of agreement.
I am not sure what to make of your sociological point. Ideology matters when we are dealing with psycho-bio problems, not engineering. When it comes to the latter, let a thousand flowers bloom. When it comes to the former, IMO, we know that we need highly structured hypothesis spaces and so we should be trying to figure out what these look like, not trying to finesse their investigation by assuming that learners have access to billions of data points. This said, thx for the little sociology of AI lesson. How nice for them.
As in other threads, I was just trying to take issue with the generative linguists vs. empiricists (=everyone else) narrative, which doesn't really fit with my own impression of neighboring fields. But you're right that I may have belabored the point.
Hi Norbert, on the recent work by Tenenbaum, the conceptual part of this that you're commending him for is not something new. It's the same shtick he's had all along: roughly, rapid learning on the basis of very limited evidence is only possible if you have both highly structured generative models and powerful statistical inference mechanisms. The first two chapters of his 1999 Ph.D. thesis made this case beautifully, and there is a clear (and acknowledged) debt to Chomsky in this. The perspective is highly compatible with generative linguistics. But like Tal I don't really see what it has to do with nativism... no one is proposing an innate module for handwriting recognition. Instead, the idea is that we have very powerful mechanisms for hypothesis formation, which are basically grammars, and that this explains how people can learn new concepts with much less evidence than typical deep learning methods require. This proposal is bound to nativism only in the very limited sense that everyone is (including deep learning enthusiasts): as machine learning researchers frequently note, assumption-free learning is not possible. But that only means that people have _some_ kind of innate knowledge – it doesn't tell you how much, or what kind. (Or did you have something else in mind with the comment about _Aspects_?)
Yes, I agree that this is nothing new for T and that it is based very much on earlier ideas of Chomsky's. As Morris Halle used to say, I was not delivering the news here (or not intending to), but rather agreeing with the basic idea behind his approach and contrasting it with the Deep Learning/Big Data view of the world. I take the latter as eschewing the kinds of given structures that T exploits to make this work. BTW, the idea that the paper actually deploys wrt handwriting is very similar to much earlier work by Murray Eden that Bob Berwick brought to my attention (in Transactions of Information Theory, Feb 1962).
What does this have to do with nativism? This particular account of T's, not much so far as I can tell. It is an argument for looking for generative procedures rather than patterns in the data, and these procedures might be quite opaque wrt the patterns they generate. This insight can be applied in many ways in many contexts. In the context of acquisition, it leads very quickly to nativist (rich nativist, as everyone is a nativist in some way, as I (following Chomsky) and you in your comment have noted) conclusions. So the style of argument, one that zeros in on generative procedures as key, is what I was commending T for, and suggesting, as he did, that DL/BD types have a very different take on things. But, I would agree, though I did not intend to say this in the post, that the route (from generative procedures to rich nativism) is a short one. How so?
Once one zeros in on generative procedures, then the relevant acquisition question becomes not how to match a pattern but how to get to rules that generate it. And, in my experience, doing this requires a rich and structured space to get from input to the rules. This is where the rich nativism comes in: getting from inputs to RULES that generate them.
So, I agree, you need assumptions to get learning. But if you think that learning involves getting to rules rather than sussing out patterns, you will look for different things and be more comfortable with a richer nativism. Maybe a way of putting this is that we need to stop thinking of acquisition/learning as essentially pattern matching. That's a big mistake (and a popular one, if exemplar theories are any indication). That was my point. I hope it is clearer now. Thx for the comment.
This all seems reasonable, Norbert. I think structured Bayesians are committed to a kind of weak nativism. Amy Perfors makes this point very clearly in her 2012 Phil. Compass paper. They could probably go either way on whether there are innate hypothesis-generating mechanisms that are specific to language, or whether this ability is a reflection of a broader ability to construct generative models and evaluate them against data in many domains. (I found Kemp & Tenenbaum's 2009 Psych Review paper "Structured statistical models of inductive reasoning" quite enlightening in this regard.)
Probably there is a tendency among the Bayesians to default to domain-generality, but this may just be sociological, or maybe motivated by Ockham's razor considerations. In any case, I think there has been a tendency on both sides of the discussion to focus on these small differences in perspective, when the commonalities are really much more fundamental.
Agree with last paragraph. At the end of the day, the issue of domain generality is an empirical one. There is no principled reason to prefer one or the other. It really depends what the gacts are. This is what we should be arguing about.
Forget the gacts. The facts, ma'am, just the facts. Sorry.