Not surprisingly, MOOCs are increasingly a hot topic and now that some are being tried out, problems are sprouting. Here's a piece reporting on one such experiment. I wouldn't make too big a deal out of this apparent failure (at least if Udacity's reaction is anything to go by) for the technology is in its early stages and will no doubt improve. What is worth noting in this piece starts at paragraph four onwards. The hype associated with MOOCs is nothing new. Moreover, what makes most salivate is not the immeasurable improvements to education that MOOCs will generate, but the COST SAVINGS and PROFITS that their biggest fans can clearly taste. The piece reminds us that this is always so with new technologies. Moreover, it is always the case that boosters (those who have the most to gain) are optimistic and sell the innovations based on best-case scenarios of what the new technology will bring. Some believe (e.g. Brad DeLong) that the good features of MOOCs can be harnessed and the downsides mitigated through careful monitoring of their implementation. I am less sanguine. When there is money to be made and saved, and that is the primary attraction, then harder-to-measure features, e.g. enhanced education, often give way.
This said, I have a modest proposal. If MOOCs are really the way of the future and their real attraction is their enhanced educational promise, then let's try them out FIRST in elite institutions. I suggest that Harvard, Yale, Princeton, MIT, Stanford, Duke, etc. announce that, in order to enhance the educational experience of their undergrads, they are going to shift to MOOCs in a big way. If it's the case that MOOCs really are better, then students and their parents should be delighted to see them replace more stuffy, less educationally advanced methods of knowledge delivery. If MOOCs are all that they are cracked up to be, or can be all that they are so cracked, then let's experiment first with elite students and, when the kinks are worked out, expand these to everyone else.
Call me cynical, but I suspect that this will be a hard sell at elite schools. I don't see their "customers" rallying to MOOC style education. Of course, I might be wrong. Let's see.
Wednesday, July 31, 2013
Monday, July 29, 2013
More on Word Acquisition
In some earlier posts (e.g. here,
here),
I discussed a theory of word acquisition developed by Medina, Snedeker,
Trueswell and Gleitman (MSTG) that I took to question whether learning in the
classical sense ever takes
place. MSTG propose a theory they dub
“Propose-but-Verify” (PbV) that postulates that word learning in kids is (i)
essentially a one-trial process where everything but the first encounter with a
word is irrelevant, (ii) at any given time only one hypothesis is
being entertained (i.e. there is no hypothesis testing/comparison going on)
and (iii) that updating only occurs if the first guess is disconfirmed, and
then it occurs pretty rapidly. MSTG’s
theory has two important features. First, it proceeds without much counting of
any sort, and second, the hypothesis space is very restricted (viz. it includes exactly one hypothesis at any
given time). These two properties leave relatively little for stats to do as
there is no serious comparison of
alternatives going on (as there’s only one candidate at a time and it gets
abandoned when falsified).
This story was always a little too good to be true. After
all, it seems quite counterintuitive to believe that single instances of disconfirmation
would lead word acquirers (WA) to abandon a hypothesis. And not surprisingly, as is often the case,
those things too good to be true might not be. However, a later reconsideration
of the same kind of data by a distinguished foursome (partially overlapping,
partially different) argues that the earlier MSTG model is “almost” true, if not exactly spot on.
In a new paper (here)
Stevens, Yang, Trueswell and Gleitman (SYTG) adopt (i)-(iii) but modify it to
add a more incremental response to relevant data. The new model, like that
older MSTG one, rejects “cross situational learning” which SYTG take to involve
“the tabulation of multiple, possibly all, word-meaning associations across
learning instances” (p.3) but adds a more gradient probabilistic data evaluation procedure. The process works as
follows. It has two parts.
First, for “familiar” words, this account, dubbed “Pursuit
with abandon” (p. 3) (“Pursuit” (P) for short), selects the single most highly
valued option (just one!) and rewards it incrementally if consistent with the
input; if not, it decreases its score a bit while also randomly selecting a
single new meaning from “the available meanings in that utterance” (p. 2) and
rewarding it a bit. This take-a-little, give-a-little is the stats part. In
contrast to PbV, P does not completely dump a disconfirmed meaning, but only
lowers its overall score somewhat. Thus,
“a disconfirmed meaning may still remain the most probable hypothesis and will
be selected for verification the next time the word is presented in the
learning data” (p. 3). SYTG note that replacing MSTG’s one-strike-you’re-out
“counting” procedure with a more gradient probabilistic evaluation measure
adds a good deal of “robustness” to the learning procedure.
Second, for novel words, P encodes “a probabilistic form of
the Mutual Exclusivity Constraint…[viz.] when encountering novel words,
children favor mapping to novel rather than familiar meanings” (p. 4). Here too the procedure is myopic, selecting
one option among many and sticking with it until it fails enough to be replaced
via step one above.
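Since the two steps are stated procedurally, they can be sketched in a few lines of code. What follows is my own minimal reconstruction, not SYTG's implementation: the score dictionary, the learning rate `gamma`, and the exact reward/penalty updates are all assumptions made for illustration.

```python
import random

class Pursuit:
    """Toy sketch of the Pursuit word learner (my reconstruction, not
    SYTG's code). Scores, gamma, and the update rule are assumptions."""

    def __init__(self, gamma=0.2):
        self.gamma = gamma   # hypothetical learning rate
        self.lex = {}        # word -> {candidate meaning: score}

    def reward(self, word, meaning):
        # Nudge the meaning's score up toward 1 (incremental reward).
        s = self.lex[word].get(meaning, 0.0)
        self.lex[word][meaning] = s + self.gamma * (1.0 - s)

    def penalize(self, word, meaning):
        # Lower the score a bit; the meaning is NOT discarded outright.
        s = self.lex[word][meaning]
        self.lex[word][meaning] = s * (1.0 - self.gamma)

    def observe(self, word, meanings_in_scene):
        if word not in self.lex:
            # Novel word: mutual exclusivity would favor a novel meaning;
            # here we simply pick one meaning from the scene at random.
            self.lex[word] = {}
            self.reward(word, random.choice(meanings_in_scene))
            return
        # Familiar word: pursue only the single most highly valued meaning.
        best = max(self.lex[word], key=self.lex[word].get)
        if best in meanings_in_scene:
            self.reward(word, best)       # consistent: reward it a bit
        else:
            self.penalize(word, best)     # inconsistent: lower its score...
            # ...and reward one randomly chosen meaning from this scene.
            self.reward(word, random.choice(meanings_in_scene))
```

Note that a penalized meaning keeps its (lowered) score, so, exactly as SYTG stress, it can remain the most probable hypothesis and be pursued again the next time the word is heard.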
Thus, the P model, from what I can tell, is effectively the
old PbV model but with a probabilistic procedure for, initially, deciding on
which is the “least probable” candidate (i.e. to guide an initial pick) and for
(dis)confirming a given candidate (i.e. to up/downgrade a previously
encountered entry). Like the PbV, P is
very myopic. Both reject cross situational learning and concentrate on one
candidate at a time, ignoring other options if all goes well and choosing at
random if things go awry.
This is the P model. Using simulations based on CHILDES
data, the paper goes on to show that this system is very good when compared
both with PbV and, more interestingly, with more comprehensive theories that
keep many hypotheses in play throughout the acquisition process. To my mind,
the most interesting comparison is with Bayesian approaches. I encourage you to
take a look at the discussion of the simulations (section 3 in the paper). The bottom line is that the P model bested
the three others on overall score, including the Bayesian alternative. Moreover, SYTG were able to identify the main
reason for the success: non-myopic comprehensive procedures fail to
sufficiently value “highly informative cues” provided early in the acquisition
process. Why? Because comprehensive
comparison among a wide range of alternatives serves to “dilute the probability
space” for correct hits, thereby “making the correct meaning less likely to be
added to the lexicon” (p. 6-7). It seems
that in the acquisition settings found in CHILDES (and in MSTG's more realistic visual
settings), this dilution prevents WAs from more rapidly building up their
lexicons. As SYTG put it:
The advantage of the Pursuit model
over cross-situational models derives from its apparent sub-optimal design. The
pursuit of the most favored hypothesis limits the range of competing meanings.
But at the same time, it obviates the dilution of cues, especially the highly
salient first scene…which is weakened by averaging with more ambiguous learning
instances…[which] are precisely the types of highly salient instances that the
learner takes advantage of…(p. 7).
There is a second advantage of the P model as compared to a
more sophisticated and comprehensive Bayesian approach. SYTG just touch on this, but I think it is worth
mentioning. The Bayesian model is computationally very costly. In fact, SYTG
note that full simulations proved impractical as “each simulation can take
several hours to run” (p. 8). Scaling up
is a well-known problem for Bayesian accounts (see here),
which is probably why Bayesian proposals are often presented as Marrian level 1
theories rather than actual algorithmic procedures. At any rate, it seems that
the computational cost stems from precisely the feature that makes Bayesian
models so popular: their comprehensiveness. The usual procedure is to make the
hypothesis space as wide as possible and then allow the “data” to find the
optimal one. However, it is precisely this feature that makes the obvious
algorithm built on this procedure intractable.
In effect, SYTG show the potential value of myopia, i.e. of
very narrow hypothesis spaces. Part of the value lies in computational
tractability. Why? The narrower the hypothesis space, the less work is required
of Bayesian procedures to effectively navigate the space of alternatives to
find the best candidate. In other words,
if the alternatives are few in number, the bulk of explaining why we see what
we get will lie not with fancy evaluation procedures, but with the small set of
options that are being evaluated. How to count may be important, but it is less
important the fewer things there are to count among. In the limit,
sophisticated methods of counting may be unnecessary, if not downright
unproductive.
The theme that comprehensiveness may not actually be
“optimal” is one that SYTG emphasize at the end of their paper. Let me end this
little advertisement by quoting them again:
Our model pursues the [i.e. unique
NH] highly valued, and thus probabilistically defined, word meaning at the
expense of other meaning candidates. By
contrast, cross-situational models do not favor any one particular meaning, but
rather tabulate statistics across learning instances to look for consistent
co-occurrences. While the cross-situational
approach seems optimally designed [my emph, NH], its advantage seems
outweighed by its dilution effects that distract the learner away from clear
unambiguous learning instances…It is notable that the apparently sub-optimal
Pursuit model produces superior results over the more powerful models with
richer statistical information about words and their associated meanings: word
learning is hard, but trying too hard may not help.
I would put this slightly differently: it seems that what
you choose to compare may be as important as (or more important than) how you choose to
compare them. SYTG reinforce MSTG’s earlier warning about the perils of
open-mindedness. Nothing like a well designed narrow hypothesis space to aid
acquisition. I leave the rationalist/empiricist overtones of this as an
exercise for the reader.
Friday, July 26, 2013
Guest Post: Tim Hunter on Minimalist Grammars and Stats
I have more than once gotten the impression that some think that generative grammarians (minimalists in particular) have a hostility to combining grammars and stats because of some (misguided, yet principled) belief that grammars and probabilities don't mix. Given the wide role that probability estimates play in processing theories, learnability models, language evolution proposals, etc. the question is not whether grammars and stats ought to be combined (yes they should be) but how they should be combined. Grammarians should not fear stats and the probabilistically inclined should welcome grammars. As Tim notes below there are two closely related issues: what to count and how to count it. Grammars specify the whats, stats the hows. The work Tim discusses was done jointly with Chris Dyer (both, I am proud to say, UMD products) and I hope that it encourages some useful discussion on how to marry work on grammars with stats to produce useful and enlightening combinations.
Tim Hunter Post:
Norbert came
across this paper, which defines a
kind of probabilistic minimalist grammar based on Ed Stabler's formalisation of
(non-probabilistic) minimalist grammars, and asked how one might try to sum up
"what it all means". I'll mention two basic upshots of what we
propose: the first is a simple point about the compatibility of minimalist
syntax with probabilistic techniques, and the second is a more subtle point
about the significance of the particular nuts and bolts (e.g. merge and move
operations) that are hypothesised by minimalist syntacticians. Most or all of
this is agnostic about whether minimalist syntax is being considered as a
scientific hypothesis about the human language faculty, or as a model that
concisely captures useful generalisations about patterns of language use for
NLP/engineering purposes.
Norbert noted
that it is relatively rare to see minimalist syntax combined explicitly with
probabilities and statistics, and that this might give the impression that
minimalist syntax is somehow "incompatible" with probabilistic
techniques. The straightforward first take-home message is simply that we provide
an illustration that there is no deep in-principle incompatibility there.
This, however,
is not a novel contribution. John Hale (2006)
combined probabilities with minimalist grammars, but this detail was not
particularly prominent in that paper because it was only a small piece of a
much larger puzzle. The important technical property of Stabler's formulation
of minimalist syntax that Hale made use of had been established even earlier: Michaelis
(2001) showed that the well-formed derivation trees can be defined in the
same way as those of a context-free grammar, and given this fact probabilities
can be added in essentially the same straightforward way that is often used to
construct probabilistic context-free grammars. So everything one needs for
showing that it is at least possible for these minimalist grammars to be
supplemented with probabilities has been known for some time.
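To make the Hale/Michaelis route concrete, here is a toy sketch (mine, not theirs) of the standard recipe for probabilizing context-free rules: estimate each rule's probability as its relative frequency among rules sharing the same left-hand side, and score a derivation as the product of its rule probabilities. The rules and counts below are invented for illustration only.

```python
from collections import defaultdict

# Toy rule counts for a context-free characterisation of derivations
# (these rules are invented for illustration, not drawn from a real MCFG).
rule_counts = {
    ("S", ("NP", "VP")): 10,
    ("S", ("VP",)): 2,
    ("NP", ("D", "N")): 6,
    ("NP", ("N",)): 6,
}

def estimate_probs(counts):
    """Relative-frequency estimation: P(rule) = count / total for its LHS,
    so probabilities of rules sharing a left-hand side sum to one."""
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

def derivation_prob(rules_used, probs):
    """A derivation's probability is the product of its rule probabilities."""
    p = 1.0
    for r in rules_used:
        p *= probs[r]
    return p

probs = estimate_probs(rule_counts)
# P(S -> NP VP) = 10/12 and P(NP -> D N) = 6/12, so this derivation
# fragment has probability (10/12) * (6/12) = 5/12.
print(derivation_prob([("S", ("NP", "VP")), ("NP", ("D", "N"))], probs))
```

This is exactly the move that Michaelis's result licenses: once minimalist derivations are characterised context-freely, the familiar PCFG machinery applies without modification.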
While the straightforward Hale/Michaelis approach should dispel any suspicions of a deep in-principle incompatibility, there is a sense in which it does not have as much in common with (non-probabilistic) minimalist grammars as one might want or expect. The second, more subtle take-home message from our paper is a suggestion for how to build on the Hale/Michaelis method in a way that better respects the hypothesised grammatical machinery that distinguishes minimalist/generative syntax from other formalisms.
As mentioned
above, an important fact for the Hale/Michaelis method is that minimalist
derivations can be given a context-free characterisation; more precisely, any
minimalist grammar can be converted into an equivalent multiple context-free
grammar (MCFG), and it is from the perspective of this MCFG that it becomes
particularly straightforward to add probabilities. The MCFG that results from
this conversion, however, "misses generalisations" that the original
minimalist grammar captured. (The details are described in the paper, and are
reminiscent of the way GPSG encodes long-distance dependencies in context-free
machinery by using distinct symbols for, say, "verb phrase" and
"verb phrase with a wh-object", although MCFGs do not reject movement
transformations in the way that GPSG does.) In keeping with the slogan that
"Grammars tell us what to count, and statistical methods tell us how to do
the counting", in the Hale/Michaelis method it is the MCFG that tells us
what to count, not the minimalist grammar that we began with. This means that
the things that get counted are not defined by notions such as merge and move
operations, theta roles or case features or wh features, which appeared in the
original minimalist grammar; rather, the counts are tied to less transparent
notions that emerge in the conversion to the MCFG.
We suggest a way
around this hurdle, which allows the "what to count" question to be
answered in terms of merge and move and feature-checking and so on (while still
relying on the context-free characterisation of derivations to a large extent).
The resulting probability model therefore works within the parameters that one
would intuitively expect to be laid out for it by the non-probabilistic
machinery that defines minimalist syntax; to adopt merge and move and
feature-checking and so on is to hypothesise certain joints at which nature is
to be carved, and the probability model we propose works with these same
joints. Therefore to the extent that this kind of probability model fares
empirically better than others based on different nuts and bolts, this would
(in principle, prima facie, all else being equal, etc.) constitute evidence in
favour of the hypothesis that merge and move operations are the correct
underlying grammatical machinery.
Thursday, July 25, 2013
Noam Chomsky Walks into a Bar
Bill Idsardi sent me this link to an interview of Chomsky when he was in Ann Arbor. It pretty well reprises the themes that Chomsky touched on in his public lecture. The discussion makes clear why Chomsky thinks that the communicative view of language is both empirically incorrect (basic structure does not facilitate or reflect communication goals) and methodologically hopeless as an object of study (just much too complicated). There are also some pretty amusing comments that the interviewer intersperses and a link to the full raw interview. A personal remark: there is something charming about the interview, both the questions and internal dialogue and Chomsky's openness and availability. Those who have interacted with Chomsky will recognize these traits and be, once again, delighted. Enjoy.
Wednesday, July 24, 2013
LSA Summer Camp
The LSA summer institute just finished last week. Here are
some impressions.
In many ways it was a wonderful experience and it brought
back to me my life as a graduate student.
My apartment was “functional” (i.e. spare and tending towards the
slovenly). As in my first grad student apartments, I had a mattress on the
floor and an AC unit that I slept under. The main difference this time around
was that the AC unit I had at U Mich was considerably smaller than the earlier
industrial strength machine that was able to turn my various abodes into a meat
locker (I’m Canadian/Quebecois and “mon pays ce n’est pas
un pays c’est l’hiver…” (“my country is not a country, it is winter”)!). In fact, this time around the AC was more like
ten flies flapping vigorously. It was ok if I slept directly under the fan
(hence the floor mattress). The
downside, something that I do not remember from my experience 40 years ago, was
that getting up out of bed is more demanding now than it was
then.
I was at the LSA to teach intro to minimalist syntax. It was a fun course to teach. Between
80 and 90 people attended regularly, about half taking the course for
some kind of credit. To my delight, there was real enthusiasm for minimalist
topics and the discussion in class was always lively. The master narrative for the course was that
the Minimalist Program (MP) aims to answer a “newish” question: what features
of FL are peculiarly linguistic? The first lecture and a half consisted of a
Whig history of Generative Grammar, which tried to locate the MP project
historically. The main idea was that if one’s interest lies in distinguishing
the cognitively general from the linguistically parochial within FL there have
to be candidate theories of FL to investigate. GB (for the first time) provides
an articulated version of such a theory, with the sub-modules, (i.e. Binding
theory, control theory, movement, subjacency, the ECP, X’ theory etc.)
providing candidate “laws of grammar.” The goal of MP is to repackage these
“laws” in such a way as to factor out those features that are peculiar to FL
from those that are part of general cognition/computation. I then suggested that this project could be
advanced by unifying the various principles in the different modules in terms
of Merge, in effect eliminating the modular structure of FL. In this frame of
mind, I showed how various proposals within MP could be seen as doing just
this: Phrase Structure and Movement as instances of Merge (E and I
respectively), case theory as an instance of I-merge, control, and anaphoric
binding as instances of I-merge (A-chain variety) etc. It was fun. The last lectures were by far the
most speculative (it involved seeing if we could model pronominal binding as an
instance of A-to-A’-to-A movement (don’t ask)) but there was a lot of
interesting ongoing discussion as we examined various approaches for possible
unification. We went over a lot of the
standard technology and I think we had a pretty good time going over the
material.
I also went on a personal crusade against AGREE. I did this partly to be provocative (after
all most current approaches to non-local dependencies rely on AGREE in a
probe-goal configuration to mediate I-merge) and partly because I believe that
AGREE introduces a lot of redundancy into the theory, not a good thing, so it
allowed us to have a lively discussion of some of the more recondite evaluative
considerations that MP elevates.[1] At any rate, here the discussion was
particularly lively (thanks Vicki) and fun. I would love to say that the class
was a big hit, but this is an evaluation better left to the attendees than to
me. Suffice it to say, I had a good time and the attrition rate seemed to be
pretty low.
One of the perks of teaching at the institute is that one
can sit in on one’s colleagues’ classes. I attended the class given by Sam Epstein,
Hisa Kitahara and Dan Seely (EKS).  It
was attended by about 60 people (like I said, minimalism did well at this LSA
summer camp). The material they covered
required more background than the intro course I taught and EKS walked us
through some of their recent research. It was very interesting. The aim was to
develop an account of why transfer applies when it does. The key idea was
that cyclic transfer is forced in computations that result in multi-peaked
structures that themselves result from strict adherence to derivations that
respect (an analogue of) Merge-Over-Move and feature lowering of the kind that
Chomsky has recently proposed. The
technical details are non-trivial so those interested should hunt down some of
their recent papers.[2]
A second important benefit of EKS’s course was the careful
way that they went through some of Chomsky’s more demanding technical
suggestions, sympathetically yet critically.
We had a great time discussing various conceptions of Merge and how/if
labeling should be incorporated into core syntax. As many of you know, Chomsky
has lately made noises that labeling should be dispensed with on simplicity
grounds. Hisa (with kibbitzing from Sam and Dan) walked us through some of his
arguments (especially those outlined in “Problems of Projection”). I was not
convinced, but I was enlightened.
Happily, in the third week, Chomsky himself came and
discussed these issues in EKS’s class.
The idea he proposed therein was that phrases require labels at least when transferred to the CI
interface. Indeed, Chomsky proposed a labeling algorithm that incorporated
Spec-Head agreement as a core component (yes, it’s back folks!!) and that resolves labeling ambiguities. To be slightly less opaque: in {X, YP}
configurations the label is the most prominent (least embedded) lexical item
(LI) (viz. X). In {XP, YP} configurations there are two least embedded LIs
(viz. descriptively, the head of X and the head of Y). In these cases,
agreement enters to resolve the ambiguity by identifying the two heads (i.e. thereby making them the same).
Where agreement is possible, labeling is as well. Where it is not, one of the
phrases must move to allow labeling to occur in transfer to CI. Chomsky suggested that this requirement for
unambiguous labeling (viz. the demand that labels be deterministically
computed) underlies successive cyclic movement.
To be honest, I am not sure that I yet fully understand the
details enough to evaluate it (to be more honest, I think I get enough of it to
be very skeptical). However, I can say that the class was a lot of fun and very
thought provoking. As an added bonus, it brought me and Vicki Carstens together
on a common squibbish project (currently under construction). For me it felt
like being back in one of Chomsky’s Thursday lectures. It was great.
Chomsky gave two other less technical talks that were also
very well attended. All in all, a great two days.
There were other highlights. I got to talk to Rick Lewis a
lot. We “discussed” matters of great moment over excellent local beer and some
very good single malt scotch. It was as part of one of these outings that I got
him to allow me to post his two papers here.
One particularly enlightening discussion involved the interpretation of the
competence/performance distinction. He proposed that it be interpreted as
analogous to the distinction between capacities and exercisings of
capacities. A performance is the
exercise of a capacity. Capacities are never exhausted by their
exercisings. As he noted, on this
version of the distinction one can have competence
theories of grammars, of parsers, and of producers. On this view, it’s not that
grammars are part of the theory of competence and parsers part of the theory of
performance. Rather, the distinction marks the important point that the aim of
cognitive theory is to understand capacities, not particular exercisings
thereof. I’m not sure if this is exactly what Chomsky had in mind when he
introduced the distinction, but I do think that it marks an important
distinction that should be highlighted (one further discussed here).
Let me end with one last impression, maybe an inaccurate
one, but one that I nonetheless left with.
Despite the evident interest in minimalist/biolinguistic themes at the
institute, it struck me that this conception of linguistics is very much in the
minority within the discipline at large. There really is a
linguistics/languistics divide that is quite deep, with a very large part of
the field focused on the proper description of language data in all of its vast
complexity as the central object of study. Though there is no a priori reason why this endeavor should
clash with the biolinguistic one, in practice it does.
The two pursuits are animated by very different aesthetics,
and increasingly by different analytical techniques. They endorse different conceptions of the
role of idealization, and different attitudes towards variation and complexity.
For biolinguists, the aim is to eliminate the variation, in effect to see
through it and isolate the individual interacting sub-systems that combine to
produce the surface complexity. The trick on this view is to find a way of
ignoring a lot of the complex surface data and homing in on the simple underlying
mechanisms. This contrasts with a second conception, one that embraces the
complexity and thinks that it needs to be understood as a whole. On this second
view, abstracting from the complex variety manifested in the surface forms is
to abstract away from the key features of language. On this view, language IS variation,
whereas from the biolinguistic perspective a good deal of variation is noise.
This, of course, is a vast over-simplification. But I sense that
it reflects two different approaches to the study of language, approaches that
won’t (and can’t) fit comfortably together. If so, linguistics will (has) split
into two disciplines, one closer to philology (albeit with fancy new
statistical techniques to bolster the descriptive enterprise) and one closer to
Chomsky’s original biolinguistic conception whose central object of inquiry is
FL.
Last point: One thing I also discovered is how much work running
one of these Institutes can be. The organizers at U Michigan did an outstanding
job. I would like to thank Andries Coetzee, Robin Queen, Jennifer Nguyen and all
their student helpers for all their efforts.
I can be very cranky (and I was on some days) and when I was, instead of
hitting me upside the head, they calmly and graciously settled me down, solved
my “very pressing” problem and sent me on my merry way. Thanks for your efforts,
forbearance and constant good cheer.
Dear Peter
CB assures me that Peter Ackema was NOT review editor when the review of Of Minds and Language mentioned in the previous post was solicited. As such, you deserve my sincerest apologies. I can see how being falsely accused of such solicitation would border on defamation of intellect. Sadly, you were review editor at the time of its publication. I hope (and would like to believe) that you were not part of the review process and that the unstoppable wheels of the JL juggernaut would have crushed you had you tried to intervene and delay publication until a modicum of content could have been added. But that is a lot to ask: schedules are schedules after all. So, sorry for personally singling you out. I should have appreciated that the long lag time between solicitation and publication meant another captain was at the helm. However, now that you are the editor, may I suggest a little more quality control.
Tuesday, July 23, 2013
"Inverse" Reviews
One of the secrets of time management for the busy
linguistic professional is to know what to read. I rely on two sources to guide
me.
First and most important, I rely on my colleagues. If they
recommend something, I generally rush off to take a look. Why so obedient?
Because I really trust my colleagues’ judgments. Not only do they know a good
paper when they read one, they also have very good taste in topics, i.e.
they know which are worth worrying about and which are a waste of time. They know a
good argument, analysis, criticism, proposal when they see one and, equally
important, can spot work that is best left undisturbed by human eyes. In other
words, they have judgment and good taste and so their recommendations are
golden.
Second I rely on reviews. I love reviews. These come in two
flavors: reviews by those whose taste and judgment you trust and those (let’s
dub these “Inverse” reviews (IR)) by reviewers whose taste and judgment you
don’t. Both are very very useful. You can guess why I value the former. They
strongly correlate with “worthwhiledness” (W). But the latter are also very
useful for they are full of information (in the technical sense: viz. they
inversely strongly correlate with W). After all if someone with execrable taste
and barely competent analytic abilities loves a certain piece of work, then
what better reason can one have to avoid it?
And the converse holds as well: what better recommendation for a book
than a negative review by one whose taste and competence you deplore? One could
go further, praise from such a source would be reason enough to question one’s
own positive evaluations!
Why do I mention this? For two reasons. First, I recently
came across a very useful review of what I took to be a very good and
enlightening book about the Minimalist Program. The book is Of
Minds and Language. Here’s
a review I did with Alex Drummond. Happily, my positive reaction to this
discussion about Minimalism was seconded by this very negative review here.
Faithful readers of this blog will recognize the deft comments of CB. With
characteristic flair, CB pans the book, and applies her keen critical analysis
to Chomsky’s every (imaginary) faux pas.
So there you have it; a strong recommendation from me (and Alex) and, if
possible, an even stronger inverse recommendation (and hence a definite must-read for the wise) from CB.
One last point: despite the utility of reviews like CB’s for
people like me, they do have one failing. They don’t really engage the subject
matter and so do not add much to the discussion. This is too bad. A good review
is often as interesting as (and sometimes more interesting than) the thing reviewed. Think of
Chomsky’s famous review of Skinner’s Verbal
Behavior for example (here). IRs despite their great informational content
are not really worth reading. As such, a question arises: why do journals like
the Journal of Linguistics (JL) and
review editors like Peter Ackema solicit this kind of junk? True, Ackema
and JL are doing a public service in
that these reviews are excellent guides about what to read (remember inverse
correlations of quality are reliable
indicators of quality; all that’s needed is a handy minus sign to reveal the
truth). But really, do Ackema and JL really think that this adds anything of
substance? Impossible! So why do they
solicit and print this stuff? I think I know.
Taken in the right frame of mind, these kinds of reviews can
be quite entertaining. I think that JL has joined the Royal Society’s efforts (here)
to lighten the tone of scholarly research by soliciting (unconscious?) parodies.
It’s probably a British thing, you know Monty
Python Does Scholarship or Linguistics
Beyond the Fringe. Or maybe this
stuff reads better with a British accent (I still think that half of what makes
Monty Python, The Goon Show (thanks to David P for correction) and Beyond the Fringe funny is the accent).  At any rate, it’s clear that the editors of
these journals have decided that there’s no future in real scholarship and have
decided to go into show business big time (not unlike Science). I have nothing against this, though I do think that they
should have warned their readers before including parody in their pages. But
now you are warned and you can read these papers with pleasure, all the while
also extracting lots of useful information, as one can from perfect negative
correlations.
Oh yes: I am sure that CB will be busy correcting my errors
in the comment sections. I will refrain from replying given my policy of
refraining from engaging CB ever again, but this should not prevent you from
enjoying yourselves.