Bob Berwick is planning to write something erudite about
Bayes in a forthcoming post. I cannot do this, for obvious reasons. But I can
throw oil on the fires. The following is my reaction to a paper whose critiques Ewan
suggested I read. I doubt that his advice had the
intended consequence. But, as Yogi Berra observed, it’s always hard to accurately
predict how things will turn out, especially in the future. So here goes.
In an effort to calm my disquiet about Bayes and his
contemporary acolytes, Ewan was kind enough to suggest that I read the comments
to this
B&BS (is it just me, or does the acronym suggest something about the
contents?) target article. Of course,
being a supremely complaisant personality, I immediately did as bid, trolling
through the commentaries and even reading the main piece and the response of
the authors to the critics’ remarks so that I could appreciate the subtleties
of the various parries and thrusts.
Before moving forward, I would love to thank Ewan for this tip. The paper is a lark, the comments are
terrific and, all in all, it’s just the kind of heated debate that warms my cold
cynical heart. Let me provide some of my
personal highlights. But, please, read this for yourself. It’s a page-turner,
and though I cannot endorse their views due to my incompetence, I would be lying if I denied being comforted by their misgivings. It’s always
nice to know you are not alone in the intellectual universe (see here).
The main point that the authors Jones and Love (J&L)
make is that, as practiced, there’s not much to current Bayesian analyses,
though they are hopeful that this is repairable (in contrast to many of the
commentators who believe them to be overly sanguine, e.g. Glymour's comments. BTW, no
fool he!). Indeed, so far as I can tell, they suggest that this is no surprise, for Bayes as such is little more than a pretty simple weighted voting scheme to determine which among a set of given alternatives best fits the data (see J&L's section 3). There is some
brouhaha over this characterization by the law firm of Chater, Goodman,
Griffiths, Kemp, Oaksford and Tenenbaum (they charge well over $1000 per probable
hour I hear), but J&L stick to their guns and characterization (see p. 219)
claiming that the sophisticated procedures that Chater et al. advert to
“introduces little added complexity” once the mathematical fog is cleared (219).
So, their view is that Bayesianism per se is pretty weak stuff. Let me explain what I take them to
mean. J&L note (section 3 again) that there are two parts to any Bayesian
model, the voting/counting scheme and the structure of the hypothesis space.
The latter provides the alternatives voted on and a weighting of the votes
(some alternatives are given head starts). Now, Bayes’ Rule (BR) is a
specification of how votes should be allocated as data comes in. The hypothesis space is where the real heavy
lifting is done. In effect, in J&L’s view (and they are by no means the
most extreme voices here as the comment sections show) BR, and modern souped up
versions thereof, add very little of explanatory significance to the mix. If so,
J&L observe, then most of the psychological interest of Bayesian models
resides in the structure of the assumed hypotheses spaces, i.e. whatever
interesting results emerge from a Bayesian model, stem not from the counting
scheme but the structure of the hypothesis space. That’s where the empirical meat lies:
All a Bayesian model does is
determine which of the patterns or classes of patterns it is endowed with is
most consistent with the data it is given. Thus, there is no explanation of
where those patterns (i.e. hypotheses) come from. (220)
This is what I meant by saying that, in J&L’s view,
Bayes, in and of itself, amounts to
little more than the view that “people use past experience to decide what to do
or expect in the future” (217). In and of
itself Bayes does not specify or bound the class of possible or plausible
hypothesis spaces and so in and of itself
it fails to make much of a contribution to our understanding of mental life.
Rather, in and of itself, Bayesian
precepts are anodyne: who doesn’t think that experience matters to our mental
life?
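J&L's two-part decomposition is easy to see in miniature. The sketch below is my own illustration, not anything from the target article: the coin-flip hypothesis space and all its numbers are invented. It shows how Bayes' Rule merely reallocates "votes" over whatever alternatives the hypothesis space supplies:

```python
# A toy sketch of J&L's two-part decomposition (my illustration; the
# hypothesis space and likelihoods below are invented).

def bayes_update(priors, likelihoods, datum):
    """Reallocate 'votes' (probabilities) over hypotheses given one datum."""
    scores = {h: priors[h] * likelihoods[h](datum) for h in priors}
    total = sum(scores.values())        # renormalize so beliefs sum to one
    return {h: s / total for h, s in scores.items()}

# The hypothesis space: two toy views of a coin, fair vs. biased toward heads.
likelihoods = {
    "fair":   lambda d: 0.5,
    "biased": lambda d: 0.8 if d == "H" else 0.2,
}

beliefs = {"fair": 0.5, "biased": 0.5}  # flat prior: no head starts
for datum in ["H", "H", "T", "H"]:      # as data come in, votes get reallocated
    beliefs = bayes_update(beliefs, likelihoods, datum)

print(beliefs)  # most of the belief now sits on "biased"
```

The update rule is two lines of arithmetic; everything that could count as psychology lives in the `likelihoods` dictionary, which is the shape of J&L's complaint.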
This view is, needless to say, heartily contested. Or so it
appears on the surface. Chater et al. assert that:
By adopting appropriate representations of a problem
in terms of random variables and probabilistic dependencies between them,
probability theory and its decision theoretic extensions offer a unifying framework
for understanding all aspects of cognition that can be properly understood as
inference under uncertainty: perception, learning, reasoning, language
comprehension and production, social cognition, action planning, and motor
control, as well as innumerable real world tasks that require the integration
of these capacities. (194)
Wow! Seems really opposed to J&L, right? Well maybe not. Note the first seven words of
the quote that I have conveniently highlighted. Take the right representation
of the problem, add a dash of BR, and out pops all of human psychology. Hmm. Is this antithetical to J&L's
claims? Not until we factor out how much
of the explanation in these domains comes from the “appropriate
representations" and how much from the probability add-on. Nobody (or at least nobody I know) has any
problem adding probabilities to mentalist theories, at least not in principle
(one always wants to see the payoff). However, if we were to ask where the hard
work comes in, J&L argue that it’s in choosing the right hypothesis space
and not in probabilizing up a given such space.
Or this is the way it looks to J&L and many many many, if not most,
of the other commentators.
Let me note one more thing before ending. J&L also pick
up on something that bothered me in my earlier post. They observe more than a
passing resemblance between modern Bayesians and earlier Behaviorism (see their
section 4). They assert that in many
cases, the hypotheses that populate Bayesian spaces “are not psychological
constructs …but instead reflect characteristics of the environment. The set of
hypotheses, together with their prior probabilities, constitute a description
of the environment by specifying the likelihood of all possible patterns of
empirical observations (e.g. sense data)” (175). J&L go further and claim
that in many cases, modern Bayesians are mainly interested in just covering the
observed behavior, no matter how it is done. Glymour dubs this “Osiander’s
Psychology,” the aim being to “provide a calculus consistent with the
observations” and nothing more. At any
rate, it appears that there is a general perception out there that in practice Bayesians have looked to
“environmental regularities” rather than “accounts of how information is
represented and manipulated in the head” as the correct bases of optimal
inference.
Chater et al. object to this characterization and allow "mental states over which such [Bayesian] computations exist…" (195). This need not invalidate J&L's main point, however. The problem with Behaviorism was not merely that it eschewed mental states, but that it endorsed a radical form of associationism. Behaviorism is the natural end point of radical associationism, for the view postulates that mental structures largely reflect the properties of environmental regularities. If this is correct, then it is not clear what adding mental representations buys you. Why not go directly from regularities in the environment to regularities in behavior and skip the isomorphic middle man?
It is worth noting that Chater et al. seem to endorse a
rough vision of this environmentalist project, at least in the domains of vision
and language. As they note “Bayesian approaches to vision essentially involve
careful analysis of the structure of the visual environment” and “in the
context of language acquisition” Bayesians have focused on “how learning
depends on the details of the “linguistic environment,” which determines the
linguistic structures to be acquired” (195).
Not much talk here of structured hypothesis spaces for
vision or language, no mention of Ullman-like Rigidity Principles or principles
of UG. Just a nod towards structured environments and how they drive mental
processing. Nothing prevents Bayesians from including these, but there seems to be a predisposition to focus on environmental influences. Why?
Well, if you believe that the overarching “framework” question is how
data (i.e. environmental input) moves you around a hypothesis space, then maybe
you’ll be more inclined to downplay the role of the structure of that space and
highlight how input (environmental input) moves you around it. Indeed, a
strong environmentalism will be attractive to you if you believe this. Why? Because, given this assumption, mental structures are just reflections of environmental regularities and, if so, the name of the psychological game becomes explaining how data is processed to identify these regularities.
No need to worry about the structure of hypothesis spaces for they are
simple reflections of environmental regularities, i.e. regularities in the
data.
Of course, this is not a logically
necessary move. Nothing in Bayes requires that one downplay the importance of hypothesis
spaces, but one can see, without too much effort, why these views will live
comfortably together. And it seems that Chater et al., the leading Young
Bayesians, have no trouble seeing the utility of structured environments to the
Bayesian project. Need I add that this is the source for the expressed unease
in my previous post on Bayes (here).
Let me reiterate one more point and then stop. There is no reason to think that the practice
that J&L describe, even if it is accurate, is endemic to Bayesian modeling.
It is not. Clearly, it is possible to choose hypothesis spaces that are more
psychologically grounded and then investigate the properties of Bayesian models
that incorporate these. However, if the
more critical of the commentators are correct (see Glymour, Rehder, Anderson
a.o.) then the real problem lies with the fact that Bayesians have hyped their
contributions by confusing a useful tool with a theory, and a pretty simple
tool at that. Here are two quotes
expressing this:
Rehder [i.e. in his comment, NH] goes as far as to suggest viewing
the Bayesian framework as a programming language, in which Bayes’ rule is
universal but fairly trivial, and all of the explanatory power lies in the
assumed goals and hypotheses. (218)
…that all viable approaches
ultimately reduce to Bayesian methods does not imply that Bayesian inference
encompasses their explanatory contribution. Such an argument is akin to
concluding that, because the dynamics of all macroscopic physical systems can
be modeled using Newton’s calculus, or because all cognitive models can be
programmed in Python, calculus or Python constitutes a complete and correct
theory of cognition. (217)
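Rehder's programming-language analogy can be made concrete. In this sketch (my own illustration; both toy hypothesis spaces are invented, not drawn from the commentaries), one trivial, universal update routine is run over two different hypothesis spaces, and every difference in what gets "explained" comes from the spaces, none from the rule:

```python
# Rehder's analogy made concrete (my toy illustration; both hypothesis
# spaces below are invented). One trivial, universal update routine...
from math import prod

def posterior(space, data):
    """space maps hypothesis -> (prior, likelihood function)."""
    scores = {h: prior * prod(lik(d) for d in data)
              for h, (prior, lik) in space.items()}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

data = [1, 2, 3, 4]

# ...run over two different "programs" (hypothesis spaces).
# Space A: the data are uniform draws from an interval 1..n.
space_a = {n: (1 / 3, lambda d, n=n: 1 / n if 1 <= d <= n else 0.0)
           for n in (4, 6, 10)}

# Space B: a different carving of the same world.
space_b = {
    "even numbers": (0.5, lambda d: 0.2 if d in (2, 4, 6, 8, 10) else 0.0),
    "numbers < 5":  (0.5, lambda d: 0.25 if d in (1, 2, 3, 4) else 0.0),
}

print(posterior(space_a, data))  # the tightest interval containing the data wins
print(posterior(space_b, data))  # "even numbers" is killed by the odd observations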
So, in conclusion: go read the paper, the commentaries and
the replies. It’s loads of fun. At the
very least it comforts me to know that there is a large swath of people out
there (some of them prodigiously smart) that have problems not dissimilar to
mine with the old Reverend’s modern day followers. I suspect that were the revolutionary swagger
toned down and replaced with the observation that Bayes provides one possibly useful
way for exploring how to incorporate probabilities into the mental sciences,
nobody would bat an eye. I’m pretty sure
that I wouldn’t. All that we would ask is what one should always ask: what does
doing this buy us?
I still think there is genuine theoretical claimery in Bayes, although I admit Glymour's stern piece gives even me a bit of the willies about the prospects. But as I've said, almost no one actually does this (properly) - and that's a topic for elsewhere. I would just say, no small thing, a tool, and as I've said before, I think that is what Bayesian models are principally used for - a standard, off-the-shelf tool for plotting out the map of what a set of model assumptions (as you say, a particular hypothesis space) is likely to shake out to when the rubber meets the data road, after teeing off on a few dots and eyeing a few crosses. Whether that set of model assumptions is concocted for scientific or engineering reasons depends on the source (of course I am as loath as J&L are and the law firm profess to be when the model is put forward for the reason "to show there exists a model"). But even to play this convenient shoe goo role is really nothing to sneeze at. Pick a useful tool - calculus, Python, telescopes, shoes - well, for Bayes (really the deal is about hierarchical Bayesian models)... time will tell if it reaches this height, or if it winds up merely being shoe goo: a reasonable enough invention, but those little nails were holding it together pretty well before, so it doesn't exactly make the world go round. Still, a tool - nothing to sneeze at.
I agree, good tools are very useful. But like all tools it is also useful to know their limitations. Glymour has a useful paper reviewing what he takes to be some of the problems with the Bayesian "tools" when applied to psychological problems (I should add that he is no fan of SOAR or ACT-R or neural nets either, and for the same reasons). As he sees it they all have "an unlimited supply of parameters, adequate to account for any finite behavior, that is, for behavior subjects can exhibit." (see his "Bayesian Ptolemaic Psychology" http://www.error06.econ.vt.edu/Glymourp.pdf)
This suggests that the tool can be useful but is very, very flexible. A Bayesian analysis is guaranteed to succeed, as are the other formalisms Glymour identifies. This means that *by themselves* they are empirically inert. All that matters are the particular substantive assumptions packed into the model. Adding Bayes is more like putting these substantive proposals in Normal Form for easier viewing than making additional substantive claims, if he is correct. So, Ewan, is he? Unlike me, he seems professionally entitled to his very strong opinions.
All Glymour's points in the BBS reply are not only valid but also on point and not only valid and on point but damaging to the thoughtless rampaging Bayesian fundamentalist. But there's a slight corrective on the empirical vacuity thing. There are simplicity effects that come about from requiring that beliefs be a unit measure (i.e. sum to one). One can replicate the induced behavior with another inference tool but can't explain it. So I think that counts as a meaningful empirical claim. But as I said, I think although it's gotten some attention the "laws of inference" are still not the centrepiece of the Bayesian claim the way they should be.
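The unit-measure point can be shown with a toy example (mine, with invented numbers): two hypotheses, both perfectly consistent with the data, but because each must spread a total likelihood of one over everything it allows, the narrower one is automatically favored.

```python
# A toy version of the unit-measure simplicity effect (my numbers, invented
# for illustration). "narrow" allows outcomes 1..5, "broad" allows 1..10;
# each must spread a total likelihood of one over whatever it allows.

def likelihood(h, d):
    allowed = range(1, 6) if h == "narrow" else range(1, 11)
    return 1 / len(allowed) if d in allowed else 0.0

data = [2, 4, 3, 5]                     # consistent with BOTH hypotheses
score = {"narrow": 0.5, "broad": 0.5}   # equal priors
for d in data:
    score = {h: s * likelihood(h, d) for h, s in score.items()}

total = sum(score.values())
posterior = {h: s / total for h, s in score.items()}
print(posterior)  # "narrow" is favored 16:1, purely from normalization
```

An unnormalized scorer that merely checked consistency would call the two hypotheses tied; the 16:1 preference exists only because beliefs are forced to sum to one, which is the sense in which that constraint carries empirical content.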
If you enjoyed Jones & Love, you'll also enjoy Bowers & Davis:
http://www.ncbi.nlm.nih.gov/pubmed/22545686
pdf here:
http://www.clm.utexas.edu/compjclub/papers/Bowers2012.pdf
Yes, I did. Thx for putting up the reference.
I am curious about the relationship between the Bayesians' claim that the mind is optimally rational in some sense, and the claim in the MP that language is an optimal solution to certain interface conditions -- and the related ideas of "third factor principles" -- general principles of computation etc. Both of them hover on the verge of being empirical claims, without ever really being testable/falsifiable. But in both cases the goal is to derive nontrivial predictions about human behaviour from some deeper assumption, while reducing the amount of stipulation.
Yes, it's only a matter of time before ordinary linguists are made aware of the fact that Bayesians, at least the non-fundamentalist kind, are trying to do MP (and then I daresay God help us all; in fact, stay tuned...). The Bayesian claim has a leg up on MP, which is to have at least said something formally about what it would mean to be an optimal solution. It could be instructive for linguists: here is a perfectly good way of operationalizing "optimal", and, as you can see, it still has a godawful number of free parameters, so you had better start getting a bit more specific if you want to ever hope to be making empirical claims by asserting the Minimalist claim. With enough detail filled in about the prior or the background assumptions about other cognitive systems and how they work, one can get past the point of vacuity in either case, and I do think that J&L are trying to sketch out "Enlightened Bayesians" who look just like that. We had a back and forth a few months ago here about whether the numerical cognition paper by Justin Halberda, Tim Hunter, Jeff Lidz, and the rest was an instance of filling in enough detail to give the Strong Minimalist Thesis empirical teeth. Jeff and I concluded that it was not (or maybe that it was even at odds with it, I can't remember), countering Norbert's initial claim that it was. I believe Norbert stuck to his guns.
I think that Alex's question is a good one, though I am not sure that Bayesians are trying to do MP, contra Ewan. The way I see it, optimality in MP is about the computational properties of the grammatical code. Is the code well designed or not. Now, as you no doubt have noticed, I have been trying to explain what this might mean by pointing to potential incarnations of the idea, in, e.g., the work on 'most' by Pietroski, Lidz, Halberda and Hunter. Here, the system that uses it has certain predilections and different codings of the meaning of 'most' fit more or less nicely with these. I think a similar idea can be extended to "explain" why grammatical dependencies obey something like relativized minimality, given the kinds of memory mammals have. At any rate, this is one way of understanding optimality and it seems, at least to me, different from what Bayesians are up to. But then I might be wrong. We need to wait and see the candidates put forward. However, I should add that nobody I know will be upset if Bayesians manage to pull this off. I eagerly await the triumphs and will personally throw rose petals in front of the victorious chariot.
Well, I merely meant that Bayesians live in the Greater Minimalist Metropolitan Area. And those in the downtown core would be wise to take the lessons of those getting beaten up in the suburbs. The fact that the optimality is in a slightly different domain (ontogeny of inference rather than phylogeny of hypothesis space) and may (or may not) be subject to slightly different notions of optimality doesn't change the set of caveats that apply in any way that I can see...
Delete"The way I see it, optimality in MP is about the computational properties of the grammatical code. Is the code well designed or not."
As Alex has been pointing out a couple of times, asking this question requires a prior (no pun intended) specification of what precise notion of computation is assumed. Nothing wrong with saying "we don't have a precise specification as of now, but some hunches which allow us to make first stabs", but complaining about the Bayesian notion of optimality being relative to their rather arbitrary models seems unfair --- there is no non-relative notion of optimality, and at least Bayesians make themselves easy targets by spelling out every little assumption. Which is not to say that spelling out obviously false assumptions is to be encouraged (why bother with the obviously false stuff to begin with), but not all Bayesians are doing this kind of work. Yes, there is a hype, and you should not believe the hype. But as you point out yourself, Bayes and Empiricism are logically independent, so what exactly _are_ we discussing? That within every framework, there's work of questionable quality (and that, of course, these judgements differ depending on whom you ask)? That hardly seems controversial, and I don't see what the poor Reverend has to do with that.
As for actual pros of Bayesian as opposed to "simpler" approaches, let me quote Partha Niyogi, for whom Bayesian models would have qualified as mathematical ("computational" really is a horribly overloaded word):
"In the first [i.e. mathematical model], one constructs idealized and simplified models but one can now reason precisely about the behavior of such models and therefore be very sure of one’s conclusions. In the second [i.e. computational model], one constructs more realistic models but because of the complexity, one will need to resort to heuristic arguments and simulations. In summary, for mathematical models the assumptions are more questionable but the conclusions are more reliable - for computational models, the assumptions are more believable but the conclusions more suspect." (2006, p.39)
Perhaps it's due to my philosophical (mis)education, but I do like reliable conclusions. (Which is also why I bought into Generative Linguistics, as opposed to any of the Cognitive Linguistics theories taught almost exclusively in Heidelberg when I was doing my undergrad)
Finally, Bayesians of course don't have the monopoly on mathematical models and I'm always happy to see other people getting their hands "dirty" (Bob was co-author on a paper along those lines, presented at this year's ACL).
Yeah, let me just put the point bluntly: MP needs to put up some details about why such-and-such is optimal and such-and-such is not or else simply can it. If you buy the Glymour "Bayesian cognitive science [BCS] is empirically vacuous" criticism, you can't fail to accept this conclusion and remain consistent. MP is engaged in the exact same enterprise in a different domain, but has up to now been vague exactly where BCS has been precise. Yet BCS gets hit for being vacuous precisely BECAUSE there are so many free parameters one can fiddle with in specifying the basic axioms against which optimality is computed. Now, C tells me he thinks it's an "oddity" that so many linguists are so insistent on having criteria for optimality/simplicity that are better nailed down than vague intuitions, while in the rest of the sciences no one cares. But the other sciences do not make optimality do this same kind of heavy lifting, and I think the debate over whether BCS is empirically vacuous indicates strongly that this is a major problem.
Two points, one sociological, the other contentful.
First, the problem with Bayes is not ONLY that it seems to deliver far less than it promises, but that it is believed to be delivering far, far more. Thus it plays a very outsized role in the intellectual universe. Anyone who thinks that Bayes and Minimalism enjoy the same standing has been smoking some very expensive stuff. Moreover, the real problem with Bayes in this regard is not that it is a very weak theory, but that the weakness has often been filled by associationist garbage. So what we have seen is a very weak theory sneak in very strong FALSE ideas all the while being feted as groundbreaking, game-changing, revolutionary innovation. So, my real objection is not that it is less wonderful than meets the eye, but that B's rep has allowed for associationism to revive yet again. This was why I also objected to connectionism (which also was a stalking horse for an associationist revival). Were Bayesian work clearly anti-associationist, I would barely give a damn. So, were I looking for a slogan, it would be, "It's the associationism, Stupid!"
Second, Minimalism's "optimality" problem. Yup. I agree. It needs A LOT of elaboration. Oddly, I thought that I had effectively conceded this given my efforts, no doubt inadequate, to elaborate some possible versions of the thesis that I thought had content. So I have tried to understand the thesis wrt various kinds of interface uses: (i) Berwick and Weinberg on transparency and parsing, (ii) Pietroski, Lidz, Hunter, and Halberda on meaning representations and analogical number systems, and (iii) minimality and the bounded content-addressability of memory. These are plausible examples of how to interpret the Strong Minimalist Thesis, I believe. As you might have noticed, these views have hardly been received with unabashed enthusiasm. Oh well. However, I agree that something like this needs doing.
Oddly, as a matter of fact, these higher level concerns (both important and interesting in my view) have not really played that big a role in actual grammatical practice (in contrast with what I take to be rational analysis in Bayes). So rather than optimal design being a precept that regulates practice, it is more an aspiration that wants elaboration. I think that Chomsky often speaks this way: the aim is to show HOW grammars are optimal, i.e. develop theories where what we see is such. This, sadly, leaves the idea as pretty idle and I would like more. But if it is true that it plays a relatively small role in argument and research, then the practical effect is pretty small. Like I said, I don't like this. I want more. But if so, it appears, at least from the outside, to be different from the way rational analysis operates in the Bayesian domain. But, I could be wrong, as all I am doing is reading tea leaves here.
Okay, yes, I concede. Bayes as it stands is Connectionism II in virtually every respect. So my hope is that by god we fix that ASAP. Thus, on point 2, shrugging off the vacuity problem there suggests that point 1 must really be the kicker for you about Bayes - if that wasn't already clear. My position is that the incoherent "I don't believe in hypothesis spaces" relic of the connectionist era is at least still under threat by Bayes. Slogan: radical empiricism, now less incoherent but likely just as wrong.
Honestly there is far too little of the high-level reasoning in Bayes too. It's rarely done to actually reason about what it would mean to be a rational X and even rarer that it's done well - Hale, Levy, yes, Anderson in his book - but most people are just using it as a tool to crank out models (=papers). So my assessment is that the situation is largely the same. Lots of doing, not enough thinking. C'est la vie.
My god, a meeting of the minds. Now let's get to work. Box, Bob has another Bayes Daze post on the way, and hopefully one other after that. It should add some to the discussion.
"So, were I looking for a slogan, it would be, "It's the associationism, Stupid!""
I can agree with that, of course.
OMG, consensus!
_interesting_ disagreements aren't that easy to come by ;-)
"Science is hard, theory is long, and life is short. Still, we should all do our best not to think in headlines."
(Fodor 2001)
From the perspective of the Minimalist sleeper agent that I am :) the "what does this buy us" has some political value. The challenge in "justifying" "Chomskyan" linguistic theory in certain quarters is made easier by being able to say that there is this sophisticated way to explore the hypothesis space that allows the inclusion of both a rich linguistic environment and a richly structured formal "environment." Or there may plausibly be at some point in time. One may wonder why this matters, but the value of being able to show a working model of a learner shouldn't be discounted.