Wednesday, December 16, 2015

Three readables

Here are some readables.

First, those interested in some background reading on the DMZTP paper (Nai Ding, Lucia Melloni, Hang Zhang, Xiang Tian and David Poeppel) discussed here can look at Embick & Poeppel (E&P) (part deux) here. This 2015 paper is a sequel to a much earlier 2005 paper outlining the problems of relating work on the brain bases of linguistic behavior to linguistic research (discussed here). The paper situates the discussion in DMZTP against a larger research program in the cog-neuro of language.

E&P identifies three neuroling projects of varying degrees of depth and difficulty.

1.     Correlational neuroling
2.     Integrated neuroling
3.     Explanatory neuroling

DMZTP fits neatly into the first project and makes tentative conjectures relevant to the second and third. What are these three projects? Here’s how E&P describes each (CR=computational/representational and NB=Neurobiological) (360):

Correlational neurolinguistics: CR theories of language
are used to investigate the NB foundations of language.
Knowledge of how the brain computes is gained by
capitalising on CR knowledge of language.

Integrated neurolinguistics: CR neurolinguistics plus the
NB perspective provides crucial evidence that adjudicates
among different CR theories. That is, brain data enrich our
understanding of language at the CR level.

Explanatory neurolinguistics: (Correlational + Integrated
neurolinguistics) plus something about NB structure/
function explains why the CR theory of language involves
particular computations and representations (and not
others).
The whole paper is a great read (nothing surprising here) and does a good job of identifying the kinds of questions worth answering. Its greatest virtue, IMO, is that it treats results both in linguistics and in cog-neuro respectfully and asks how their respective insights can be integrated. This is not a program of mindless reduction, something that is unfortunately characteristic of too much current NB work on language.

Second, here’s a piece on some big methodological goings-on in physics. The question, relevant to our little part of the scientific universe, is what makes a theory scientific. It seems that many don’t like string theory or multiverses and consider them, and the theories that make use of them, unscientific. Philosophers are called in to help clear the muddle (something that physicists hate even the idea of, but times are desperate, it seems), and the philosophers note that the muddle partly arises from mistaking the hallmarks of what makes something science. Popper’s falsificationism is now understood by everyone to be a simplification at best and a severe distortion with very bad consequences at worst.

Chomsky once observed that big methodological questions about what makes something science are usefully focused on the areas of our greatest scientific success. I think that this is right. However, I also think that listening in on these discussions is instructive. Here’s a place where eavesdropping might be both fun and enlightening.

Third, here’s a Tech Review article on some recent work by Tenenbaum and colleagues on handwriting recognition (funny that just as cursive is on the cusp of disappearing, we show how to get machines to recognize it; the curious twists and turns of history). The research described is quite self-consciously opposed to deep learning approaches to similar problems. Where does the difference lie? Effectively, the program uses a generative procedure involving “strokes of an imaginary pen” to match the incoming letters. Bayes is then used to refine these generated objects. In other words, given a set of generative procedures for constructing letters, we can generate better and better matches to input through an iterative process in a Bayes-like framework. And there is a real payoff. By putting this kind of generative procedure into the Bayes system, you can learn to recognize novel “letters” from a very small number of examples.
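To make the core move concrete, here is a toy sketch of the idea, not the Lake et al. model itself: a small hypothesis space of “stroke programs” on a 3x3 pixel grid, plus Bayesian scoring of a noisy image against each program. The stroke primitives, program names, and noise rate are all invented for illustration.

```python
import math

# Hypothetical stroke primitives: each stroke deterministically inks
# a set of (row, col) cells on a 3x3 grid.
STROKES = {
    "vbar": {(0, 1), (1, 1), (2, 1)},   # vertical bar
    "hbar": {(1, 0), (1, 1), (1, 2)},   # horizontal bar
    "ltop": {(0, 0), (0, 1), (0, 2)},   # top edge
}

# Candidate generative programs (the structured hypothesis space):
# a "character" is the composition of a few strokes.
PROGRAMS = {
    "plus": ["vbar", "hbar"],
    "T":    ["ltop", "vbar"],
    "bar":  ["vbar"],
}

def render(name):
    """Run a stroke program: the union of the cells its strokes ink."""
    cells = set()
    for stroke in PROGRAMS[name]:
        cells |= STROKES[stroke]
    return cells

def log_likelihood(image, name, noise=0.1):
    """P(image | program): each of the 9 cells flips with prob `noise`."""
    ideal = render(name)
    ll = 0.0
    for r in range(3):
        for c in range(3):
            match = ((r, c) in image) == ((r, c) in ideal)
            ll += math.log(1 - noise if match else noise)
    return ll

def classify(image):
    """MAP program under a uniform prior over PROGRAMS."""
    scores = {name: log_likelihood(image, name) for name in PROGRAMS}
    return max(scores, key=scores.get)

# A single noisy example of a "plus" (one extra inked cell) is still
# recognized, with no training data at all.
noisy_plus = render("plus") | {(0, 0)}
print(classify(noisy_plus))  # prints "plus"
```

The point of the sketch is that the structured hypothesis space, not data volume, is doing almost all of the work: one corrupted example suffices because recognition is a search over generative programs rather than an extraction of statistical regularities from many samples.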

Sound familiar? It should. Think Aspects! So, it looks like the tech world is coming to appreciate the power of “innate” knowledge, i.e. how given information can be used and extended. Good. This is just the kind of story GGers should delight in.

How’s this different from the deep learning/big data (DL/BD) stuff? Well, by packing in prior info you can “learn” from a small number of examples, which simplifies the inductive problem. Hinton, one of the mucky-mucks in the DL/BD world, notes that this stuff is “compatible with deep learning.” Yup. Nonetheless, it fits ill with the general ethos behind the enterprise. Why? Because it exploits an entirely different intuition concerning how to approach “learning.” From the few discussions I have seen, DL/BD starts from the idea that if you get enough data, learning will take care of itself. Why? Because learning consists in extracting the generalizations in the data. If the relevant generalizations are there in the data to be gleaned (even if lots of data is needed to glean them, as there is often A LOT of noise obscuring the signal), then given enough data (hence the ‘big’ in Big Data) learning will occur. The method described here questions the utility of this premise. As Tenenbaum notes:

“The key thing about probabilistic programming—and rather different from the way most of the deep-learning stuff is working—is that it starts with a program that describes the causal processes in the world,” says Tenenbaum. “What we’re trying to learn is not a signature of features, or a pattern of features. We’re trying to learn a program that generates those characters.”

Does this make the two approaches irreconcilable? Yes and no. No, in that one can always combine two points of view that are not logically contradictory, and these views are not. But yes, in that what one takes to be central to solving a learning problem (e.g. a pre-packaged program describing the causal process) is absent from the other. It’s once again the question of where one thinks the hard problem lies: in describing the hypothesis space or in the movements around it. DL/BD downplays the former and bets on the latter. In this case, Tenenbaum does a Chomsky and bets on the former.

As many of you know, I have had my reservations concerning many of Tenenbaum’s projects, but I think he is making the right moves in this case. It always pays to recognize common ground. DL/BD is the current home of unreconstructed empiricism. In this particular case, Tenenbaum is making points that challenge this worldview. Good for him, and me.


  1. I don't think Lake, Salakhutdinov and Tenenbaum would argue that the tendency to decompose handwritten characters into pen strokes is ultimately innate - the next step in this research program would be to figure out how machines (and humans) acquire this ability using more fundamental principles. A handwritten character recognition system is nice, but the goal is presumably a general purpose artificial intelligence system that can also recognize animals, faces, phonemes, etc.

  2. As for the sociological point (unreconstructed empiricists etc), it's maybe worth noting that Ruslan Salakhutdinov works almost exclusively on deep learning, and that Brendan Lake has done both symbolic and "deep learning" work (see e.g. this paper). I'm not saying that people in AI never have their ideological commitments and pet frameworks, but many people in that world are happy to experiment with multiple frameworks and go with whatever works best.

  3. @Tal: Maybe they would, maybe not. You are likely right. However, this is not relevant to the point I was trying to make (badly, it appears). What the Tenenbaum et al paper does that goes against the DL/BD conception is seed the hypothesis space with structure. What the paper notes is that doing this, if one chooses the structures right, makes the "learning" problem less data-demanding (i.e. small data gets the system to recognize letters). This form of argument is one that Rationalists have embraced with delight, and it is one that DL/BD resists. Now, if you are Rationalistically inclined you will find favor with Tenenbaum et al's proposal, for it mirrors what we do all the time. What their work as reported shows is what we try to emphasize all the time: seeding the mind/brain with structure (i.e. structuring the hypothesis space) has some very nice virtues when it comes to learning. Not a big point, at least not for us, but now a point of agreement.

    I am not sure what to make of your sociological point. Ideology matters when we are dealing with psycho-bio problems, not engineering. When it comes to the latter, let a thousand flowers bloom. When it comes to the former, IMO, we know that we need highly structured hypothesis spaces, and so we should be trying to figure out what these look like, not trying to finesse their investigation by assuming that learners have access to billions of data points. This said, thx for the little sociology of AI lesson. How nice for them.

    1. As in other threads, I was just trying to take issue with the generative linguists vs. empiricists (=everyone else) narrative, which doesn't really fit with my own impression of neighboring fields. But you're right that I may have belabored the point.

  4. Hi Norbert, on the recent work by Tenenbaum, the conceptual part of this that you're commending him for is not something new. It's the same shtick he's had all along: roughly, that rapid learning on the basis of very limited evidence is only possible if you have both highly structured generative models and powerful statistical inference mechanisms. The first two chapters of his 1999 Ph.D. thesis made this case beautifully, and there is a clear (and acknowledged) debt to Chomsky in this. The perspective is highly compatible with generative linguistics. But like Tal I don't really see what it has to do with nativism... no one is proposing an innate module for handwriting recognition. Instead, the idea is that we have very powerful mechanisms for hypothesis formation, which are basically grammars, and that this explains how people can learn new concepts with much less evidence than typical deep learning methods require. This proposal is bound to nativism only in the very limited sense that everyone is (including deep learning enthusiasts): as machine learning researchers frequently note, assumption-free learning is not possible. But that only means that people have _some_ kind of innate knowledge – it doesn't tell you how much, or what kind. (Or did you have something else in mind with the comment about _Aspects_?)

    1. Yes, I agree that this is nothing new for T and that it is based very much on earlier ideas of Chomsky's. As Morris Halle used to say, I was not delivering the news here (or not intending to), but agreeing with the basic idea behind his approach and contrasting it with the Deep Learning/Big Data view of the world. I take the latter as eschewing the kinds of given structures that T exploits to make this work. BTW, the idea that the paper actually deploys wrt handwriting is very similar to much earlier work by Murray Eden that Bob Berwick brought to my attention (in Transactions on Information Theory, Feb 1962).

      What does this have to do with nativism? This particular account of T's, not much, so far as I can tell. It is an argument for looking for generative procedures rather than patterns in the data, and these procedures might be quite opaque wrt the patterns they generate. This insight can be applied in many ways in many contexts. In the context of acquisition, it leads very quickly to nativist (rich nativist, as everyone is a nativist in some way, as I (following Chomsky) and you in your comment have noted) conclusions. So the style of argument, one that zeros in on generative procedures as key, is what I was commending T for, and suggesting, as he did, that DL/BD types have a very different take on things. But I would agree, though I did not intend to say this in the post, that the route (from generative procedures to rich nativism) is a short one. How so?

      Once one zeros in on generative procedures, the relevant acquisition question becomes not how to match a pattern but how to get to the rules that generate it. And, in my experience, doing this requires a rich and structured space to get from input to the rules. This is where the rich nativism comes in: getting from inputs to RULES that generate them.

      So, I agree, you need assumptions to get learning. But if you think that learning involves getting to rules rather than sussing out patterns, you will look for different things and be more comfortable with a richer nativism. Maybe a way of putting this is that we need to stop thinking of acquisition/learning as essentially pattern matching. That's a big mistake (and a popular one, if exemplar theories are any indication). That was my point. I hope it is clearer now. Thx for the comment.

  5. This all seems reasonable, Norbert. I think structured Bayesians are committed to a kind of weak nativism. Amy Perfors makes this point very clearly in her 2012 Phil. Compass paper. They could probably go either way on whether there are innate hypothesis-generating mechanisms that are specific to language, or whether this ability is a reflection of a broader ability to construct generative models and evaluate them against data in many domains. (I found Kemp & Tenenbaum's 2009 Psych Review paper "Structured statistical models of inductive reasoning" quite enlightening in this regard.)

    Probably there is a tendency among the Bayesians to default to domain-generality, but this may just be sociological, or maybe motivated by Ockham's razor considerations. In any case, I think there has been a tendency on both sides of the discussion to focus on these small differences in perspective, when the commonalities are really much more fundamental.

    1. Agree with last paragraph. At the end of the day, the issue of domain generality is an empirical one. There is no principled reason to prefer one or the other. It really depends what the gacts are. This is what we should be arguing about.

    2. Forget the gacts. The facts ma'm just the facts. Sorry.