I want to pour some oil on the flames. Which flames? The ones that I had hoped that my two recent posts on Yang’s critique of Bayes (here and here) would engender. There has been some mild pushback (from Ewan, Tal, Alex and Avery). But the comments section has been pretty quiet. I want to restate what I take to be the heart of the critique because, if correct, it is very important. If correct, it suggests that there is nothing worth salvaging from the Bayes “revolution” for there is no there there. Let me repeat this. If Yang is right, then Bayes is a dead end with no redeeming scientific (as opposed to orthographic) value. This does not mean that specific Bayes proposals are worthless. They may not be. What it means is that Bayes per se not only adds nothing to the discussion, but that taking its tenets to heart will mislead inquiry. How so? It endorses the wrong idealization of how stats are relevant to cognition. And misidealizations are as big a mistake as one can make, scientifically speaking. Here’s the bare bones of the argument.
1. Everyone agrees that data matters for hypothesis choice
2. Everyone agrees that stats matter in making this choice
3. Bayes makes 4 specific claims about how stats matter for hypothesis choice:
a. The hypothesis space is cast very wide. In the limit, all possible hypotheses are in the space of options
b. All potentially relevant data is considered, i.e. any data that could decide between competing hypotheses is used to adjudicate among the hypotheses in the space
c. All hypotheses are evaluated wrt all of the data. So, as data is considered, every hypothesis’ chance of being true is evaluated wrt every data point considered
d. When all data has been considered the rule is to choose that hypothesis in the space with the highest score
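To make (3a-d) concrete, here is a minimal sketch of the idealized procedure. It is only an illustration under stated assumptions: the function names (`bayes_choose`, `coin_likelihood`), the toy coin-bias hypotheses and the uniform prior are all made up for the example and are not taken from Yang or from any particular Bayesian model.

```python
from math import log

def bayes_choose(hypotheses, prior, likelihood, data):
    """Idealized hypothesis choice in the spirit of (3a-d).

    hypotheses: hashable labels for every option in the space (3a)
    prior:      dict mapping each hypothesis to a prior probability > 0
    likelihood: function (datum, hypothesis) -> P(datum | hypothesis)
    data:       all of the observations (3b)
    """
    # Scores are log posteriors up to a shared constant; logs avoid underflow.
    score = {h: log(prior[h]) for h in hypotheses}
    for d in data:            # (3b) every datum is considered
        for h in score:       # (3c) every hypothesis is rescored on every datum
            p = likelihood(d, h)
            score[h] += log(p) if p > 0 else float("-inf")
    best = max(score, key=score.get)   # (3d) pick the top scorer
    return best, score

# Hypothetical toy problem: which coin bias generated these flips?
def coin_likelihood(datum, bias):
    return bias if datum == "H" else 1.0 - bias

biases = [0.2, 0.5, 0.8]                          # a deliberately tiny space
uniform = {b: 1.0 / len(biases) for b in biases}  # assumed flat prior
best, scores = bayes_choose(biases, uniform, coin_likelihood, "HHTHHHTH")
# best is 0.8 for this toy data
```

Note the nested loop: the work grows with the number of hypotheses times the number of data points, which is why the size of the space in (3a) is not an innocent detail.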
Two things are worth noting about the above.
First, that (3) provides serious content to a Bayesian theory, unlike (1) and (2). The latter are trivial in that nobody has ever thought otherwise. Nobody. Ever. So if this is the point of Bayes, then this ain’t no revolution!
Second, (3) has serious normative motivation. It is a good analysis of what kind of inference an inference to the best explanation might be. Normatively, an explanation is best if it is better than all other possible explanations and accounts for all of the possibly relevant data. Ideally, this implies evaluating all alternatives wrt all the data and choosing the best. This gives us (3a-d). Cognitive Bayes (CB) is the hypothesis that normative Bayes (NB) is a reasonable idealization of what people actually do when they learn/acquire something. And we should appreciate that this could be the case. Let’s consider how for a moment.
The idealization would make sense for the following kind of case (let’s restrict ourselves to language). Say that the hypothesis space of potential Gs was quite big. For concreteness, say that we were always considering about 50 different candidate Gs. This is not all possible Gs, but 50 is a pretty big number computationally speaking. So say 50 or more alternatives is the norm. Then Bayes (3a) would function a lot like the standard linguistic assumption that the set of well-formed syntactic objects in a given language is effectively infinite. Let me unpack the analogy.
This infinity assumption need not be accurate to be a good idealization. Say it turns out that the number of well-formed sentences a native speaker of English is competent wrt is “only” 10^1000. Wouldn’t this invalidate the infinity assumption? No, it would show that it is false, but not that it is a bad idealization. Why? Because the idealization focuses attention on the right problem. Which one? The Projection Problem: how do native speakers go from a part of the language to all of it? How, given exposure to only a subset of the language, does a LAD get mastery over a whole language? The answer: you acquire recursive rules, a G, that’s how. And this is true whether the “language” is infinite or just very big. The problem, going from a subset to its containing superset, will transit via a specification of rules whether or not the set is actually infinite. All the infinite idealization does is concentrate the mind on the projection problem by making the tempting alternative idea (learning by listing) silly. This is what Chomsky means when he says in Current Issues: “once we have mastered a language, the class of sentences with which we can operate fluently and without difficulty or hesitation is so vast that for all practical purposes (and, obviously, for all theoretical purposes), we may regard it as infinite” (7, my emphasis NH). See: the idealization is reasonable because it does not materially change the problem to be solved (i.e. how to go from the part of the language you are exposed to, to the whole language that you have mastery over).
A similar claim could be true of Bayes. Yes, the domain of Gs a LAD considers is in fact big. Maybe not thousands or millions of alternatives, but big enough to be worth idealizing to a big hypothesis space in the same way that it is worth assuming that the class of sentences a native speaker is competent wrt is infinite. Is this so? Probably not. Why not? Because even moderately large hypothesis spaces (say with over 5 competing alternatives) turn out to be very hard to manage. So the standard practice is to use really truncated spaces, really small SWSs. But when you so radically truncate the space, there is no reason to think that the inductive problem remains the same. Just think if the number of sentences we actually knew was about 5 (roughly what happens in animal communication systems). Would the necessity of rules really be obvious? Might we not reject the idealization Chomsky argues for (and note that I emphasize ‘argue’)? So, rejecting (3a) means rejecting part of the Bayes idealization.
What of the other parts, (3b-d)? Well, as I noted in my posts, Charles argues that each and every one is wrong in such a way as to be not worth making. Each gets the shape of the problem wrong. He may be right. He may be wrong (not really, IMO), but he makes an argument. And if he is right, then what’s at stake is the utility of NB as an idealization for cognitive purposes. And, if you accept this, we are left with (1-2), which is methodological pablum.
I noted one other thing: the normative idealization above was once considered as a cognitive option within linguistics. It was known as the child-as-little-linguist theory. And it had exactly the same problems that Bayes has. It suggests that what kids do is what linguists do. But it is not the same thing at all. And realizing this helped focus on what the problem the LAD faces is. Bayes is not unique in misidealizing a problem.
Three more points and I end today’s diatribe.
First, one can pick and choose among the four features above. In other words, there is no law saying that one must choose the various assumptions as a package. One can adopt an SWS assumption (rejecting 3a) while adopting a panoramic view of the updating function (keeping 3c, i.e. assuming that every hypothesis in the space is updated wrt every new data point) and rejecting choice optimization (3d). In other words, mixing and matching is fine and worth exploring. But what gives Bayes content, and makes it more than one of many bookkeeping notations, is the idealization implicit in CB as NB.
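For what it is worth, here is a rough sketch of one such mix: a tiny hypothesis space (no 3a), every hypothesis updated on every datum, and a final choice that does not maximize (no 3d). Everything in it is an assumption made for illustration: the names `update_all` and `sample_choice`, the coin-bias hypotheses, and above all the decision to replace "pick the best" with sampling in proportion to the current weights, which is just one of many possible non-optimizing rules.

```python
import random

def update_all(weights, likelihood, datum):
    """Panoramic update: rescale every hypothesis on the datum, then renormalize."""
    rescaled = {h: w * likelihood(datum, h) for h, w in weights.items()}
    total = sum(rescaled.values())
    if total == 0:
        raise ValueError("datum has zero probability under every hypothesis")
    return {h: w / total for h, w in rescaled.items()}

def sample_choice(weights, rng):
    """Non-optimizing choice (an assumed rule): sample a hypothesis in
    proportion to its weight instead of taking the highest-weighted one."""
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[h] for h in candidates], k=1)[0]

# Hypothetical toy likelihood: a hypothesis is a coin's bias toward heads.
def coin_likelihood(datum, bias):
    return bias if datum == "H" else 1.0 - bias

weights = {0.5: 0.5, 0.8: 0.5}        # a two-member space, flat start
for datum in "HHHH":
    weights = update_all(weights, coin_likelihood, datum)
pick = sample_choice(weights, random.Random(0))  # may be either hypothesis
```

The point of the sketch is only that the pieces come apart cleanly in code, just as they do conceptually. Whether any particular combination is a good idealization is an empirical matter, not something the notation settles.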
Second, what makes Bayes scientifically interesting is the idealization implicit in it. I mention this because, as Tal notes in a comment (here), it seems that current Bayesians are promoting their views as just a “set of modeling practices.” The ‘just’ is mine, but this seems to me what Tal is indicating about the paper he links to. But the ‘just’ matters. Modeling practices are scientifically interesting to the degree that they embody ideas about the problem being modeled. The good ones are ones that embody a good idealization. So, either these practices are based on substantive assumptions or they are “mere” practices. If the latter, then the Bayes modeling is in itself of zero scientific interest. Does anyone really want to defend Bayes in this way? I confess that if this is the intent then there is nothing much to argue about given how modest (how really modest, how really really modest) the Bayes claim is.
Last, there is a tendency to insulate one’s work from criticism. One way of doing this is to refuse to defend the idealizations implicit in one’s technology. But technology is never innocent. It always embodies assumptions about the way the world is, such that the technology used is a good one: it allows one to see/do things that other technologies do not permit or, at least, it does not distort how the basic problems of interest are to be investigated. But researchers hate having to defend their technology, more often favoring the view that how it runs is its own defense. I have been arguing that this is incorrect. The idealization does matter. So, if it turns out that Bayesians now are urging us to use the technology but are backing away from the idealizations implicit in it, that is good to know. This was not how it was initially sold. It was sold as a good way of developing level 1 cognitive theories. But if Bayes has no content then this is false. It cannot be the source of level 1 theories, for on the revised version of Bayes as a “set of modeling practices,” Bayes per se has no content, so Bayes is not and cannot be a level 1 theory of anything. It is vacuous. Good to know. I would be happy if this is now widely conceded by our most eminent Bayesians. If this is now the current view of things, then there is nothing to argue about. If only Bayes had told us this sooner.