Thursday, April 14, 2016

Once more into the breach: Re (3d)

So, what makes an inductive theory Bayesian? I have no idea. Nor, it appears, does anyone else. This is too bad. Why? Because though it is always the case that particular models must be evaluated on their own merits (as Charles rightly notes in the previous post), the interest in particular models, IMO, stems from the light they shine on the class of models of which they are particular instances. In other words, specific models are interesting both for their empirical coverage AND (IMO, more importantly) for the insight they provide into the theoretical commitments a model embodies (hence, one model from a class of models).

My discussion of Bayes rested on the assumption that Bayes commits one to some interesting theoretical claims and that the specific models offered are in service of advancing the more general claims that Bayes embodies. From where I sit, it seems that for many there are no theoretical claims that Bayes embodies, so the supposition that a Bayes model intends to tell us something beyond what the specific model is a model of is off base. Ok, I can live with that. It just means that the whole Bayes thing is not that interesting, except technologically. What's of potential interest are the individual proposals, but they have no theoretical legs, as they are not in service of larger claims.

I should add, however, that many "experts" are not quite so catholic. Here is a quote from Gelman and Shalizi's paper on Bayes.

The common core of various conceptions of induction is some form of inference from particulars to the general – in the statistical context, presumably, inference from the observations y to parameters describing the data-generating process. But if that were all that was meant, then not only is ‘frequentist statistics a theory of inductive inference’ (Mayo & Cox, 2006), but the whole range of guess-and-test behaviors engaged in by animals (Holland, Holyoak, Nisbett, & Thagard, 1986), including those formalized in the hypothetico-deductive method, are also inductive. Even the unpromising-sounding procedure, ‘pick a model at random and keep it until its accumulated error gets too big, then pick another model completely at random’, would qualify (and could work surprisingly well under some circumstances – cf. Ashby, 1960; Foster & Young, 2003). So would utterly irrational procedures (‘pick a new model at random when the sum of the least significant digits in y is 13’). Clearly something more is required, or at least implied, by those claiming that Bayesian updating is inductive. (25-26)
Note the theories that they count as "inductive" under the general heading but find to be unlikely candidates for the Bayes moniker. Did you catch which rules they consider inductive but not Bayesian? Here are two, in case you missed them: "the whole range of guess-and-test behaviors" and even "pick a model at random and keep it until its accumulated error gets too big, then pick another model completely at random." G&S take it that if even these methods count as instances of Bayesian updating, then there is nothing interesting to discuss, for that denudes Bayes of any interesting content.

Of course, you will have noticed that these two procedures are in fact the ones that people (e.g., Charles, Trueswell and Gleitman and co.) have argued actually characterize acquisition in various linguistic domains of interest. Thus, reasonably enough (at least if they understand things the way Gelman and Shalizi do), they conclude that these methods are not interestingly Bayesian (or, for that matter, "inductive," except in a degenerate sense).
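To make the contrast vivid, here is a minimal sketch (mine, not G&S's or anyone's actual model; the hypothesis space, data, and error threshold are invented purely for illustration) of the two kinds of learner at issue: a Bayesian updater, which re-weights every hypothesis after every datum, and the "pick a model at random and keep it until its accumulated error gets too big" learner, which tracks exactly one hypothesis at a time.

import random

# Toy setup, invented purely for illustration: "grammars" are coin biases,
# "data" are coin flips. Nothing here is anyone's actual acquisition model.
HYPOTHESES = [0.1, 0.3, 0.5, 0.7, 0.9]   # candidate values of P(heads)
DATA = [1, 1, 0, 1, 1, 1, 0, 1]          # observed flips, 1 = heads

def bayesian_update(hypotheses, data):
    # Full Bayesian updating: EVERY hypothesis is re-weighted on EVERY datum.
    posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # flat prior
    for y in data:
        for h in hypotheses:
            posterior[h] *= h if y == 1 else 1.0 - h  # likelihood of y under h
        total = sum(posterior.values())
        posterior = {h: p / total for h, p in posterior.items()}  # renormalize
    return posterior

def pick_and_stick(hypotheses, data, threshold=2.0):
    # G&S's "unpromising" procedure: hold ONE hypothesis, score only it,
    # and swap it for a fresh random one when accumulated error gets too big.
    current = random.choice(hypotheses)
    error = 0.0
    for y in data:
        error += abs(y - current)
        if error > threshold:
            current = random.choice(hypotheses)
            error = 0.0
    return current

print(bayesian_update(HYPOTHESES, DATA))   # a distribution over all hypotheses
print(pick_and_stick(HYPOTHESES, DATA))    # a single surviving hypothesis

The structural difference is the whole point: the first learner carries the entire hypothesis space through every update; the second consults nothing but its current guess.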

So, there is a choice: treat "Bayes" as an honorific, in which case there is no interesting content to being Bayesian beyond "hooray!!", or treat it as having content, in which case it seems opposed to systems like "guess-and-test" or "pick at random." Which one picks is irrelevant to me. It would be nice to know, however, which is intended when someone offers up a Bayesian model. In the first case, 'Bayesian' just means "one that I think is correct." In the second, it has slightly more content. But what that is? Beats me.

One last thing. It is possible to understand the Aspects model of G acquisition as Bayesian (I have this from an excellent (let's say, impeccable) source). Chomsky took the computational intractability of that model (its infeasibility) to imply that we need to abandon the Aspects model in favor of a P&P view of acquisition (though whether this is tractable is an open question as well). In other words, Chomsky took seriously the mechanics of the Aspects model and thought that its intractability indicated that it was fatally flawed. Good for him. He opted for being wrong over being vacuous. May this be a lesson for us all.

5 comments:

  1. If someone could explain to me how the Aspects model is Bayesian when it isn't even probabilistic, then I would be grateful.

    I kind of see how you could argue that the evaluation metric is a sort of prior, though of course it may be improper (not a fatal flaw).
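    Concretely, one way to cash that out (the notation here is my own gloss; nothing in Aspects states it this way): let the evaluation metric's preference for shorter grammars play the role of the prior, so that

    P(G \mid D) \;\propto\; P(D \mid G)\, P(G), \qquad P(G) \;\propto\; 2^{-\mathrm{length}(G)},

    where length(G) is the number of symbols G occupies in the grammatical notation. If the weights 2^{-\mathrm{length}(G)} do not sum to anything finite over the full space of grammars, the prior is improper, as flagged above.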

    1. Hi Alex, I don't think Aspects talks much about the technical formulation of the Evaluation Metric. But LSLT does. There are several places where it comes up, and some of the formulations are certainly information-theoretic and thus probabilistic. See Chapter IV, around page 140 or so. (This refers to the 1955 version; I'm not sure if it shows up in the 1975 book. I think Bob Berwick has a scanned copy somewhere.) Later in that chapter, there is a hand calculation to determine the syntactic categories for 300 words from a dictionary. I think that was joint work with Peter Elias; this, I know, didn't appear in the 1975 book. As you can imagine, it's not computationally trivial. There was then a Master's thesis by Anatol Holt, who worked with Chomsky on a clustering approach to categories (http://dspace.mit.edu/bitstream/handle/1721.1/37712/30912464-MIT.pdf?sequence=2). The complexity and ineffectiveness of the method were, again from an impeccably reliable source, why Chomsky abandoned the earlier approaches.

    2. I am aware of the probabilistic stuff in LSLT. But AFAIK, Chomsky completely abandoned that by Aspects, and it is the Aspects model I am interested in here.

    3. I don't think that part is in the 1975 version, but the whole 1955 155MB iguana can be found on Berwick's site here.

    4. @Alex: Two points. First, though Aspects does not state matters in terms of probabilities, I have been told by a Bayesian expert (a guy whose name starts with a T and rhymes with 'cherry bomb') that adding such probabilities to the basic framework would be very natural.

      Second, one of the nice things about trying to articulate Bayes's principal features (3a-d in earlier posts) is that one can ask whether other systems have ANY of them. The eval metric in Aspects understands its task as evaluating ALL Gs ALL the time wrt ALL of the data. It's this universal feature that a well-placed confidential source cites as the root of the infeasibility. Oddly, from the little I know, this is also a feature of Bayes that makes it intractable (infeasible, anyone?). So one can share properties with Bayes even in the absence of probabilities, and the shared features may be enough to suggest moving to another model.

      One of the nice things about trying to get clear on what properties an idealization has is that it becomes possible to isolate its features and identify the contentious ones. Aspects IS Bayes-like in this one respect, and this is recognized to be a problem. Is it entirely Bayes-like? Well, not as it stands. Does it matter for the issue at hand? Not so far as I can tell. So is Aspects Bayesian? Well, maybe not. Let's just say Bayesianish.
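      To put a crude number on the ALL-Gs-ALL-the-time point (a back-of-the-envelope count assuming the most naive implementation; none of this is in Aspects itself): if at each of $n$ data points the learner re-scores every grammar in the candidate space $\mathcal{G}$ against everything seen so far, the total work is

      \sum_{t=1}^{n} |\mathcal{G}| \cdot t \;=\; |\mathcal{G}|\,\frac{n(n+1)}{2} \;=\; \Theta(|\mathcal{G}|\, n^2),

      as against an expected $\Theta(n)$ for a learner that tracks a single G and re-samples only on failure. And since $|\mathcal{G}|$ grows exponentially with the size of the grammars considered, it's the $|\mathcal{G}|$ factor, not the $n^2$, that does the real damage.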
