Faculty of Language: Yay!! I'm not the only one with a Reverand Bayes problem

Friday, August 30, 2013

Yay!! I'm not the only one with a Reverand Bayes problem

Bob Berwick is planning to write something erudite about Bayes in a forthcoming post. I cannot do this, for obvious reasons. But I can throw oil on the fires. The following is my reaction to a paper that Ewan suggested that I read the critiques of. I doubt that his advice had the intended consequence. But, as Yogi Berra observed, it’s always hard to accurately predict how things will turn out, especially in the future. So here goes.

In an effort to calm my disquiet about Bayes and his contemporary acolytes, Ewan was kind enough to suggest that I read the comments to this B&BS (is it just me, or does the acronym suggest something about the contents?) target article. Of course, being a supremely complaisant personality, I immediately did as bid, trolling through the commentaries and even reading the main piece and the response of the authors to the critics’ remarks so that I could appreciate the subtleties of the various parries and thrusts. Before moving forward, I would love to thank Ewan for this tip. The paper is a lark, the comments are terrific and, all in all, it’s just the kind of heated debate that warms my cold cynical heart. Let me provide some of my personal highlights. But, please, read this for yourself. It’s a page-turner, and though I cannot endorse their views due to my incompetence, I would be lying if I denied being comforted by their misgivings. It’s always nice to know you are not alone in the intellectual universe (see here).

The main point that the authors Jones and Love (J&L) make is that, as practiced, there’s not much to current Bayesian analyses, though they are hopeful that this is repairable (in contrast to many of the commentators who believe them to be overly sanguine e.g. see Glymour’s comments. BTW, no fool he!). Indeed, so far as I can tell, they suggest that this is not a surprise for Bayes as such is little more than a pretty simple weighted voting scheme to determine which among a set of given alternatives best fits the data (see L&J’s 3). There is some brouhaha over this characterization by the law firm of Chater, Goodman, Griffiths, Kemp, Oaksford and Tenebaum (they charge well over $1000 per probable hour I hear), but J&L stick to their guns and characterization (see p. 219) claiming that the sophisticated procedures that Chater et. al. advert to “introduces little added complexity” once the mathematical fog is cleared (219).

So, their view is that Bayesianism per se is pretty weak stuff. Let me explain what I take them to mean. J&L note (section 3 again) that there are two parts to any Bayesian model, the voting/counting scheme and the structure of the hypothesis space. The latter provides the alternatives voted on and a weighting of the votes (some alternatives are given head starts). Now, Bayes’ Rule (BR) is a specification of how votes should be allocated as data comes in. The hypothesis space is where the real heavy lifting is done. In effect, in J&L’s view (and they are by no means the most extreme voices here as the comment sections show) BR, and modern souped up versions thereof, add very little of explanatory significance to the mix. If so, J&L observe, then most of the psychological interest of Bayesian models resides in the structure of the assumed hypotheses spaces, i.e. whatever interesting results emerge from a Bayesian model, stem not from the counting scheme but the structure of the hypothesis space. That’s where the empirical meat lies:

All a Bayesian model does is determine which of the patterns or classes of patterns it is endowed with is most consistent with the data it is given. Thus, there is no explanation of where those patterns (i.e. hypotheses) come from. (220)

This is what I meant by saying that, in J&L’s view, Bayes, in and of itself, amounts to little more than the view that “people use past experience to decide what to do or expect in the future” (217). In and of itself Bayes does not specify or bound the class of possible or plausible hypothesis spaces and so in and of itself it fails to make much of a contribution to our understanding of mental life. Rather, in and of itself, Bayesian precepts are anodyne: who doesn’t think that experience matters to our mental life?

This view is, needless to say, heartily contested. Or so it appears on the surface. So Chater et. al. assert that:

By adopting appropriate representations of a problem in terms of random variables and probabilistic dependencies between them, probability theory and its decision theoretic extensions offer a unifying framework for understanding all aspects of cognition that can be properly understood as inference under uncertainty: perception, learning, reasoning, language comprehension and production, social cognition, action planning, and motor control, as well as innumerable real world tasks that require the integration of these capacities. (194)

Wow! Seems really opposed to J&L, right? Well maybe not. Note the first seven words of the quote that I have conveniently highlighted. Take the right representation of the problem add a dash of BR and out pops all of human psychology. Hmm. Is this antithetical to J&L’s claims? Not until we factor out how much of the explanation in these domains comes from the “appropriate representations” and how much from the probability add on. Nobody (or at least nobody I know) has any problem adding probabilities to mentalist theories, at least not in principle (one always wants to see the payoff). However, if we were to ask where the hard work comes in, J&L argue that it’s in choosing the right hypothesis space and not in probabilizing up a given such space. Or this is the way it looks to J&L and many many many, if not most, of the other commentators.

Let me mote one more thing before ending. J&L also pick up on something that bothered me in my earlier post. They observe more than a passing resemblance between modern Bayesians and earlier Behaviorism (see their section 4). They assert that in many cases, the hypotheses that populate Bayesian spaces “are not psychological constructs …but instead reflect characteristics of the environment. The set of hypotheses, together with their prior probabilities, constitute a description of the environment by specifying the likelihood of all possible patterns of empirical observations (e.g. sense data)” (175). J&L go further and claim that in many cases, modern Bayesians are mainly interested in just covering the observed behavior, no matter how it is done. Glymour dubs this “Osiander’s Psychology,” the aim being to “provide a calculus consistent with the observations” and nothing more. At any rate, it appears that there is a general perception out there that in practice Bayesians have looked to “environmental regularities” rather than “accounts of how information is represented and manipulated in the head” as the correct bases of optimal inference.

Chater et. al. object to this characterization and allow “mental states over which such [Bayesian] computations exist…” (195). This need not invalidate J&L’s main point however. The problem with Behaviorism was not merely that it eschewed mental states, but that it endorsed a radical form of associationism. Behaviorism is the natural end point of radical associationism for the view postulates that mental structures largely reflect the properties of environmental regularities. If this is correct, then it is not clear what adding mental representations buys you. Why not go directly from regularities in the environment to regularities in behavior and skip the isomorphic middle man.

It is worth noting that Chater et. al. seem to endorse a rough vision of this environmentalist project, at least in the domains of vision and language. As they note “Bayesian approaches to vision essentially involve careful analysis of the structure of the visual environment” and “in the context of language acquisition” Bayesians have focused on “how learning depends on the details of the “linguistic environment,” which determines the linguistic structures to be acquired” (195).

Not much talk here of structured hypothesis spaces for vision or language, no mention of Ullman-like Rigidity Principles or principles of UG. Just a nod towards structured environments and how they drive mental processing. Nothing prevents Bayesians from including these, but there seems to be a predisposition to focus on environmental influences. Why? Well, if you believe that the overarching “framework” question is how data (i.e. environmental input) moves you around a hypothesis space, then maybe you’ll be more inclined to downplay the role of the structure of that space and highlight how input (environmental input) moves you around it. Indeed, a strong environmentalism will be attractive to you if you believe this. Why? Well, given this assumption then mental structures are just reflections of environmental regularities and, if so, the name of the psychological game will focus on explaining how data is processed to identify these regularities. No need to worry about the structure of hypothesis spaces for they are simple reflections of environmental regularities, i.e. regularities in the data.

Of course, this is not a logically necessary move. Nothing in Bayes requires that one downplay the importance of hypothesis spaces, but one can see, without too much effort, why these views will live comfortably together. And it seems that Chater et. al., the leading Young Bayesians, have no trouble seeing the utility of structured environments to the Bayesian project. Need I add that this is the source for the expressed unease in my previous post on Bayes (here).

Let me reitterate one more point and then stop. There is no reason to think that the practice that J&L describe, even if it is accurate, is endemic to Bayesian modeling. It is not. Clearly, it is possible to choose hypothesis spaces that are more psychologically grounded and then investigate the properties of Bayesian models that incorporate these. However, if the more critical of the commentators are correct (see Glymour, Rehder, Anderson a.o.) then the real problem lies with the fact that Bayesians have hyped their contributions by confusing a useful tool with a theory, and a pretty simple tool at that. Here are two quotes expressing this:

Rehder [i.e. in his comment, NH] goes as far as to suggest viewing the Bayesian framework as a programming language, in which Bayes’ rule is universal but fairly trivial, and all of the explanatory power lies in the assumed goals and hypotheses. (218)

…that all viable approaches ultimately reduce to Bayesian methods does not imply that Byesian inference encompasses their explanatory contribution. Such an argument is akin to concluding that, because the dynamics of all macroscopic physical systems can be modeled using Newton’s calculus, or because all cognitive models can be programmed in Python, calculus or python constitutes a complete and correct theory of cognition. (217)

So, in conclusion: go read the paper, the commentaries and the replies. It’s loads of fun. At the very least it comforts me to know that there is a large swath of people out there (some of them prodigiously smart) that have problems not dissimilar to mine with the old Reverend’s modern day followers. I suspect that were the revolutionary swagger toned down and replaced with the observation that Bayes provides one possibly useful way for exploring how to incorporate probabilities into the mental sciences, nobody would bat an eye. I’m pretty sure that I wouldn’t. All that we would ask is what one should always ask: what does doing this buy us?

19 comments:

ewanAugust 30, 2013 at 2:43 PM
I still think there is genuine theoretical claimery in Bayes, although I admit Glymour's stern piece gives even me a bit of the willies about the prospects. But as I've said, almost no one actually does this (properly) - and that's a topic for elsewhere. I would just say, no small thing, a tool, and as I've said before, I think that is what Bayesian models are principally used for - a standard, off-the-shelf tool for plotting out the map of what a set of model assumptions (as you say, a particular hypothesis space) is likely to shake out to when the rubber meets the data road, after teeing off on a few dots and eyeing a few crosses. Whether that set of model assumptions is concocted for scientific or engineering reasons depends on the source (of course I am as loathe as J&L are and the law firm profess to be when the model is put forward for the reason "to show there exists a model"). But even to play this convenient shoe goo role is really nothing to sneeze at. Pick a useful tool - calculus, Python, telescopes, shoes - well, for Bayes (really the deal is about hierarchical Bayesian models)... time will tell if it reaches this height, or if it winds up merely being shoe goo: a reasonable enough invention, but those little nails were holding it together pretty well before, so it doesn't exactly make the world go round. Still, a tool - nothing to sneeze at.
ReplyDelete
Replies
Noah MotionAugust 31, 2013 at 7:33 AM
If you enjoyed Jones & Love, you'll also enjoy Bowers & Davis:

http://www.ncbi.nlm.nih.gov/pubmed/22545686

pdf here:

http://www.clm.utexas.edu/compjclub/papers/Bowers2012.pdf
ReplyDelete
Replies
NorbertAugust 31, 2013 at 8:15 AM
Yes, I did. Thx for putting up the reference.
ReplyDelete
Replies
Alex ClarkAugust 31, 2013 at 9:00 AM
I am curious about the relationship between the Bayesians' claim that the mind is optimally rational in some sense, and the claim in the MP that language is an optimal solution to certain interface conditions -- and the related ideas of "third factor principles" -- general principles of computation etc. Both of them hover on the verge of being empirical claims, without ever really being testable/falsifiable. But in both cases the goal is to derive nontrivial predictions about human behaviour from some deeper assumption, while reducing the amount of stipulation.
ReplyDelete
Replies
AnonymousSeptember 1, 2013 at 8:21 AM
From the perspective of the Minimalist sleeper agent that I am :) the "what does this buy us" has some political value. The challenge in "justifying" "Chomskyan" linguistic theory in certain quarters is made easier by being able to say that there is this sophisticated way to explore the hypothesis space that allows the inclusion of both a rich linguistic environment and a richly structured formal "environment." Or there may plausibly be at some point in time. One may wonder why this matters, but the value of being able to show a working model of a learner shouldn't be discounted.
ReplyDelete
Replies

Add comment