Comments on Faculty of Language: Bayes Daze I (7 comments)

Kyle Gorman (2013-08-16 18:17):
Hey Alex, your parenthetical facts there aren't quite right. Two of the major people working on IBM's machine translation system (not speech recognition), Peter F. Brown and Bob Mercer, left IBM to join Renaissance. If anyone else from that (large) team now works in finance, I don't know it; the one speech recognition person involved was Fred Jelinek, who is (tellingly enough) not a co-author of their 1993 paper (the one with the details).

Alex Clark (2013-08-14 07:32):
Like Avery, I think the claim that probabilistic learning was widely accepted in models of language acquisition in generative grammar is a little questionable; Charles Yang was complaining in the last thread about getting stick for probabilistic learning (in 2002!).

I'd be particularly interested to hear your take on the Subset Principle (Wexler, Manzini, Berwick...), because to me it only gets its force from a non-probabilistic learning paradigm.

(BTW, if you only want Bayesian models of learning, then Chater and Vitanyi's paper in the Journal of Mathematical Psychology is essentially a very big extension of Horning's work, though not computationally bounded.)

Avery Andrews (2013-08-14 04:25):
Correction to my comment below: "without noting that" <- "without that".

Avery Andrews (2013-08-14 04:22):
And, if Bayes is so old hat and done business, why did people such as Crain and Thornton in 2002 (and maybe even more recently) complain about Wexler and Culicover's Uniqueness Principle without noting that it has a softer and more plausible statistical version, in which you try to improve your grammar by increasing the extent to which it can predict what people will say under various circumstances, including their intended meaning, which can often be guessed from context and partial knowledge of the language?
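[To make the "softer statistical version" above (and the worry about the Subset Principle two comments up) concrete, here is a minimal sketch of choosing between two candidate grammars by how well they predict observed form-meaning pairs. The grammars, meanings, and probabilities are invented for illustration, and this is not the proposal of any paper cited in the thread; the point is only that a grammar which overgenerates spreads its probability over more forms and so loses likelihood to a tighter one.]

```python
import math

# Toy, invented illustration (hypothetical grammars and data, not from any of
# the papers mentioned in this thread): each "grammar" is just a conditional
# distribution P(form | meaning). The "tight" grammar generates only the forms
# actually in use; the "loose" grammar also allows an overgenerated variant for
# each meaning, so its probability mass is spread more thinly.

tight_grammar = {
    "GIVE(john, mary, book)": {"John gave Mary the book": 1.0},
    "SEE(mary, john)":        {"Mary saw John": 1.0},
}

loose_grammar = {
    "GIVE(john, mary, book)": {"John gave Mary the book": 0.5,
                               "John gave the book Mary": 0.5},  # overgenerated variant
    "SEE(mary, john)":        {"Mary saw John": 0.5,
                               "Mary seed John": 0.5},           # overgenerated variant
}

# Observed (meaning, form) pairs; meanings are assumed guessable from context.
data = [
    ("GIVE(john, mary, book)", "John gave Mary the book"),
    ("SEE(mary, john)", "Mary saw John"),
    ("GIVE(john, mary, book)", "John gave Mary the book"),
]

def log_likelihood(grammar, observations):
    """Total log-probability the grammar assigns to the observed forms."""
    total = 0.0
    for meaning, form in observations:
        total += math.log(grammar[meaning].get(form, 1e-12))  # tiny floor for unseen forms
    return total

for name, grammar in [("tight", tight_grammar), ("loose", loose_grammar)]:
    print(name, round(log_likelihood(grammar, data), 3))
# The tight grammar wins: it predicts every observed form with probability 1,
# while the loose grammar wastes mass on forms nobody produced. The comparison
# itself supplies the "implicit negative evidence" that a non-probabilistic
# Subset Principle has to build in by stipulation.
```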
Alex Clark (2013-08-13 23:58):
Boy, you guys are a tough crowd -- I give you a paper so new it hasn't been published yet, and you reject it without even reading it because a friend of yours says it was all done before in the 80s!

I am a proud owner of the Systems That Learn book, and of a paper copy of Horning's thesis for that matter, and they are both good reads, but the STL book does have some gaps -- and probabilistic learning and complexity theory are the two main ones. So if you think that computational complexity is a big concern, and you think that probabilistic reasoning is important, then the STL book is *not* the right place to start -- IIRC they even removed all of the probabilistic material (all 3 pages of it in the first edition) from the second edition.

(Lots of machine learning people do leave academia and go to Wall Street (e.g. all of the IBM speech recognition guys to Jim Simons's hedge fund...), and I guess by now they all have yachts and/or private islands, but unfortunately CFGs and MCFGs aren't widely used in finance, so my chances of a nice consultancy gig are low....)

Robert Berwick (2013-08-13 09:46):
These are all worthy examples from the Formal Learning Theory tradition, but Bayes Daze was meant to focus on, well, Bayesian approaches to learning. We'll take up Formal Language Theory in later posts. But, as an appe-teaser, it may be of some interest to note that recent work cited in the comment, or recent reviews of the same, is seemingly unaware that in many cases it is rehearsing the findings of Formal Learning Theory from long ago. The various notions of 'prudence' (introduced by Scott Weinstein and Dan Osherson in 1982), exact learning, information efficiency, probabilistic data, and computational complexity were all thoroughly examined in the Formal Learning Theory tradition, as described in the Osherson, Stob, & Weinstein book, "Systems That Learn" (1987), or its 2nd edition. One difference is that this older tradition used recursive function theory applied to learning, derived from Manuel Blum's (1975) seminal paper, rather than modern-day complexity classes. But qualitatively it's the same. As Dan Osherson (p.c.) tells me, there don't seem to be any spanking-new, crisp results here. Now, if I were a betting man, when it comes to learning theory I'd wager that Dan and Scott know what they are talking about. Interestingly enough, Scott and Dan have also found that there *is* a connection between Formal Learning Theory and statistical inference, though it may not be quite what you would have desired. So much for the teaser. And in truth, if any of this really *was* a solid solution to the problem of induction, do you think we'd all be sitting around typing into funny blogger boxes? I'd be on Wall Street, cleaning up, or, better yet, I'd have already been on Wall Street and cleaned up, and by now retired to a sunny island sipping those drinks....
Alex Clark (2013-08-13 06:18):
On the computational complexity front: Chihiro Shibata and Ryo Yoshinaka, "PAC Learning of Some Subclasses of Context-Free Grammars with Basic Distributional Properties," ALT 2013 (so not published yet), show computationally efficient and sample-efficient (non-Bayesian) learning of CFGs -- not all CFGs, of course, but classes that include all regular languages.

And of course the Smith and Cohen paper we discussed last time establishes learnability of PCFGs from small samples if we neglect computational complexity: "Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning," Shay B. Cohen and Noah A. Smith, Computational Linguistics (2012).

And if you don't care about sample complexity or computational complexity, then Angluin's technical report from 1988, "Identifying Languages from Stochastic Examples," shows that a huge class can be learned.
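[For readers who want the flavour of these results in concrete terms, here is a minimal, invented sketch of "learning from stochastic examples" by maximum likelihood over a small, finite set of candidate probabilistic languages. It is not the algorithm of Shibata & Yoshinaka, Cohen & Smith, or Angluin; those results concern infinite hypothesis classes such as (P)CFGs, and the brute-force enumeration below is exactly the kind of thing that computational-complexity constraints rule out.]

```python
import math
import random

# Hypothetical candidate "grammars": each is just a distribution over strings.
# (Invented for illustration; real learnability results range over infinite
# classes such as PCFGs, where you cannot simply enumerate the hypotheses.)
candidates = {
    "G1": {"ab": 0.5, "aabb": 0.3, "aaabbb": 0.2},               # roughly a^n b^n
    "G2": {"ab": 0.25, "ba": 0.25, "aabb": 0.25, "abab": 0.25},  # a looser language
    "G3": {"ab": 0.9, "aabb": 0.1},                              # a narrower one
}

def draw_sample(grammar, n, rng):
    """Draw n i.i.d. strings from a candidate grammar (positive, stochastic examples)."""
    strings = list(grammar)
    weights = [grammar[s] for s in strings]
    return rng.choices(strings, weights=weights, k=n)

def log_likelihood(grammar, sample_strings):
    # Tiny floor for strings the grammar does not generate at all.
    return sum(math.log(grammar.get(s, 1e-12)) for s in sample_strings)

rng = random.Random(0)
data = draw_sample(candidates["G1"], 50, rng)   # the "true" grammar is G1

# The statistical step is one line; the computational problem in the general
# case is that the space of grammars cannot be searched by brute force.
best = max(candidates, key=lambda name: log_likelihood(candidates[name], data))
print(best)   # with 50 examples this almost always comes out as "G1"
```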