Monday, September 30, 2013

Evilicious


As I have noted in the past, Andrew Gelman, a statistician at Columbia (and someone whose work I both follow and have learned from), has a bee in his bonnet about Marc Hauser (and, it seems, Chomsky) (see here). In many of his posts he has asserted that Hauser fraudulently published material that he knew to be false, and this is why he takes such a negative view of Hauser (and of those, like Chomsky, who have been circumspect in their criticisms). Well, Hauser has a new book out, Evilicious (see here for a short review). Interestingly, the book has a blurb from Nicholas Wade, the NYT's science writer who covered the original case. More interestingly, the post provides links to Wade's NYT coverage of the original "case," and because I have nothing better to do with my time I decided to go back and read some of that coverage. It makes for very interesting reading. Here are several points that come out pretty clearly:

1.     The case against Hauser was always quite tenuous (see here and here). Of the papers for which he was accused of fabrication, two were replicated very quickly to the satisfaction of Science's referees. The problem with these was not fabrication but the original tapes not being readily available. Sloppy? Perhaps. Fraud? No evidence here.
2.     Of the eight charges of misconduct, five involved unpublished material.  This is a very high standard. I would be curious to know how many other scientists would like to be held responsible for what they did not publish.  A Dr. Galef from McMaster University (here) notes incredulously in the NYT (rightly in my view): “How in the world can they get in trouble for data they didn’t publish?”
3.     L’Affaire Hauser then comes down to one experiment published in Cognition that Galef notes “was very deeply flawed.” However, as I noted (here), the results have since been replicated. That, like the Science replications, suggests that the original paper was not as flawed as supposed. Sloppy? Yes. Flawed? Perhaps (the replication suggests that whatever was screwed up did not matter much). Fraud? Possible, but the evidence is largely speculative.
4.     The NYT pieces make a pretty good case that at least one outside scientist who reviewed the case against Hauser thought that “the accusations were unfounded.” Maybe Galef is a dupe, but his creds look OK to me. At any rate, someone with expertise in Hauser’s field reviewed the evidence against him and concluded that there was not enough of it to establish fraud, or anything close to it.
5.     Galef further noted that the Harvard investigating committee did not include people familiar with “the culture of an animal behavior laboratory,” which has “a different approach to research and data-keeping” than what one finds in other domains, especially the physical sciences, from which the members of the Harvard investigating committee appeared to come. I’m pretty sure that few behavioral or social scientists would like to be subject to the standards of experimental hygiene characteristic of work in the physical sciences. Is it possible that in the investigation that set the tone for what followed, Hauser was not judged by scientists with the right background knowledge?
6.     The final ORI report concentrates on the retracted (and subsequently replicated) Cognition article (here). The claim is that “half the data in the graph were fabricated.” Maybe. It would be nice to know what the ORI based this judgment on. All involved admit that the experimental controls were screwed up and that some of the graphs, as reported, did not reflect the experiments conducted. I have no trouble believing that there was considerable sloppiness (though, to repeat, it seems not to have been fatal given the subsequent replication), but this would not support ORI’s assertion of fabrication, a term that carries the connotation of intentional deceit. I suspect that the ORI judgment rests on the prior Harvard findings. This leaves me thinking: why did this particular set of data get through the otherwise pretty reliable vetting process in Hauser’s lab, one that nixed earlier questionable data? Recall, the data from five of the other investigated papers were vetted before being sent out and, as a result, the reported data were changed. What happened in this case? Why did this one slip through? I could understand ORI’s conclusion if tons of published data had been fabricated. But this is precisely what there is no evidence for. Why then this one case? Is it so unlikely that some goof-up caused the slip? As Altmann, one of Hauser’s early accusers, notes in the NYT: “It is conceivable that the data were not fabricated, but rather that the experiment was set up wrong, and that nobody realized this until after it was published.” In the end, does the whole case against Hauser really come down to what happened in this one experiment? With effectively one set of controls? Really?

I suspect that the real judgment against Hauser rests on the fact that he resigned from Harvard and that the Harvard committee initially set up to investigate him (but whose report, so far as I know, has never been made publicly available) decided he was guilty. Again, as Altmann notes in the NYT, his earlier accusation was “heavily dependent on the knowledge that Harvard found Professor Hauser guilty of misconduct.” This, coupled with the thought that Hauser would not have quit were he innocent. But it’s not a stretch to think of many reasons why a non-guilty person might quit, being fed up with one’s treatment perhaps topping the list. I am not saying that this is why Hauser resigned. I don’t know. But to conclude that he must be guilty because he left Harvard (who but a criminal would leave this august place, right? Though on second thought…) is hardly an apodictic conclusion. In fact, given all the evidence to date, it strikes me that the charges of fraud have very little evidential support, and that there are decent alternative explanations for what took place absent any intention to deceive. Indeed, given the gravity of the charge, we should set a pretty high evidential bar before confidently declaring fraud. As far as I can tell from reading what’s in the NYT, this bar has not been remotely approached, let alone cleared.

I think that this will be my last post on this topic. I have harped on it because, frankly, from what I can tell Hauser got a raw deal, and the condemnations he drew down on himself struck me as a tad too smug, uninformed, and self-satisfied. As I’ve said before, fraud does not seem to me to be the biggest source of data pollution, nor the hardest one to root out. However, such charges are serious and should not be leveled carelessly. Careers are at stake. And from what I can tell, in this particular case, Hauser was not given the benefit of the doubt, as he should have been.

Thursday, September 26, 2013

Some Free Advice for Authors


Writing a paper is hard. Getting others to read it seriously is often harder. Writers often make the second task harder by overwriting, and the review process often encourages this, with authors trying to bury reviewers’ critical comments under pages of caveats that serve mainly to obscure the paper’s main point. Here are a couple of posts I’ve found that deliver some useful advice to authors (here and here).

The first post is on Elmore Leonard and his ten rules for writing. If you have never heard of him or never read any of his novels, let me recommend them to you wholeheartedly. He is a terrific “crime” novelist whose books are one of life’s wonderful guilty pleasures. As you will notice, not all of the suggested rules are all that applicable to linguistics writing (though if the paper is on expletives (e.g. it’s raining), then maybe the first one should be ignored). However, I agree with Taylor about rules 2 and 10, with one caveat: throat clearing should be avoided, but a concise description of the problem of interest, and why it is of interest, is often very helpful. This, more or less, is what Krugman is highlighting in his deliciously nasty piece.

David Poeppel used to emphasize the importance of explaining why anyone should care about the work you are presenting. Why is the paper worth anyone’s time? Why is the problem important? Why are the data worthy of note? Why should anyone who has access to a good Elmore Leonard novel spend time reading your paper instead? You’d be surprised (or not) how often this simple question stumps an author, and if it stumps the author, then chances are the paper will have baleful effects on its readers.

Tuesday, September 24, 2013

When UG?

MP makes the working assumption that whatever happened to allow FL to emerge in its current state happened recently in evo time. This, in turn, relies on assuming that our precursors were without our UG (though they may have had quite a bit of other stuff going on between the ears; in fact, they MUST have had quite a bit of stuff going on there). This assumption was recently challenged by Dediu and Levinson (D&L). Here's an evaluation of their paper by Berwick, Hauser and Tattersall (BHT) (here). BHT argue that there is no there there, a feature, it appears, of much of Levinson's current oeuvre (see here). They observe that the quick time frame is sorta/kinda supported by the archeological record, but that such evidence can hardly be dispositive, as it is not fine-grained enough to address the properties of "the core linguistic competence" of our predecessors, which "does not fossilize." However, such evidence as exists does appear to (weakly) support the proposed timeframe (roughly 100,000 years). Indeed, as BHT note, D&L misrepresent an important source (Somel et al.), which concludes, contrary to D&L, that: "There is accumulating evidence that human brain development was fundamentally reshaped through several genetic events within the short time space between the human-Neanderthal split and the emergence of modern humans."

So take a look. The D&L paper got a lot of play, but if BHT are right (that's where my money is) then it's pretty much a time sink with little to add to the discussion. You surprised? I'm not. But read away.

Monday, September 23, 2013

Why Formalize II?


Ewan (here) provides me with the necessary slap upside the head, thereby preventing a personality shift from stiff-necked defender of the faith to agreeable, reasonable interlocutor. Thanks, I needed that. My recognition that Alex C (and Thomas G) had reasonable points to make in the context of our discussion had stopped me from thinking through what I take the responsibilities of a formalization to be. I will try to remedy this a bit now.

Here’s the question: what makes for a good formalization? My answer: a good formalization renders perspicuous the intended interpretation of the theory it is formalizing. In other words, a good formalization (among other things) clarifies vagaries that, though not (necessarily) particularly relevant in theoretical practice, constitute areas where understanding is incomplete. A good formalization, therefore, consults the theoretical practice of interest and aims to rationalize it through formal exposition. Thus, formalizing theoretical practice can have several kinds of consequences. Here are three (I’m sure there are others): it might reveal that a practice faces serious problems of one kind or another due to implicit features of that practice (see Berwick’s note here), or even possible inconsistency (think Russell on Frege’s program); or it might lay deeper foundations (and so set new questions) for a practice that is vibrant and healthy (think Hilbert on geometry); or it may clarify the conceptual bases of a widespread practice (think Frege and Russell/Whitehead on the foundations of arithmetic). At any rate, on this conception, it is always legit to ask whether the formalization has in fact captured the practice accurately. Formalizations are judged against the accuracy of their depictions of the theory of interest, not vice versa.

Now, rendering the intended interpretation of a practice perspicuous is not at all easy. The reason is that most practices (at least in the sciences) consist of a pretty well-articulated body of doctrine (a relatively explicit theoretical tradition) and an oral tradition. This is as true for the Minimalist Program (MP) as for any other empirical practice. The explicit tradition involves the rules (e.g. Merge) and restrictions on them (e.g. Shortest Attract/Move, Subjacency). The oral tradition includes (partially inchoate) assumptions about what the class of admissible features is, what a lexical item is, how to draw the functional/lexical distinction, how to understand thematic notions like ‘agent,’ ‘theme,’ etc. The written tradition relies on the oral one to launch explanations: e.g. thematic roles are used by UTAH to project vP structure, which in turn feeds into a specification of the class of licit dependencies as described by the rules and the conditions on them. Now, in general, the inchoate assumptions of the oral tradition are good enough to serve various explanatory ends, for there is wide agreement on how they are to be applied in practice. So, for example, in the thread to A Cute Example (here), what I (and, I believe, David A) found hard to understand in Alex C’s remarks revolved around how he was conceptualizing the class of possible features. Thomas G came to the rescue and explained what kinds of features Alex C likely had in mind:

"Is the idea that to get round 'No Complex Values', you add an extra feature each time you want to encode a non-local selectional relation? (so you'd encode a verb that selects an N which selects a P with [V, +F] and a verb that selects an N which selects a C with [V, +G], etc)?"



Yes, that's pretty much it. Usually one just splits V into two categories V_F and V_G, but that's just a notational variant of what you have in mind.

Now, this really does clarify things. How? Well, for people like me, these kinds of features fall outside the pale of our oral tradition, i.e. nobody would suggest using such contrived items/features to drive a derivation. They are deeply unlike the garden variety features we standardly invoke (e.g. +Wh, case, phi, etc.), and, so far as I can tell, if we restrict ourselves to those garden variety features, the problem Alex C notes does not seem to arise.
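To make the trick concrete, here is a minimal sketch of the category-splitting move Thomas G describes. Everything in it (the tiny lexicon, the category labels V_F, V_G, N_P, N_C, and the checking function) is my own illustrative invention, not anyone's actual proposal; the point is just that purely local selection over diacritically split categories can enforce what is, intuitively, a non-local selectional relation.

```python
# Toy illustration: encoding a non-local selectional relation by splitting
# a category into diacritic-bearing variants (V_F, V_G). Purely local
# selection over these refined categories then mimics a non-local constraint.
# All category names and the tiny lexicon are invented for illustration.

LEXICON = {
    "devour":  "V_F",   # "V that wants an N whose own complement is P"
    "believe": "V_G",   # "V that wants an N whose own complement is C"
    "proof":   "N_P",   # N that takes a P complement
    "claim":   "N_C",   # N that takes a C complement
    "of":      "P",
    "that":    "C",
}

# Strictly local selection: each head only checks the category of its sister.
SELECTS = {
    "V_F": "N_P",   # the diacritic F "remembers" that the N must select P
    "V_G": "N_C",   # the diacritic G "remembers" that the N must select C
    "N_P": "P",
    "N_C": "C",
}

def licensed(words):
    """Check each head-complement pair using only local selection."""
    cats = [LEXICON[w] for w in words]
    return all(SELECTS.get(h) == c for h, c in zip(cats, cats[1:]))

print(licensed(["devour", "proof", "of"]))     # True:  V_F > N_P > P
print(licensed(["believe", "claim", "that"]))  # True:  V_G > N_C > C
print(licensed(["devour", "claim", "that"]))   # False: V_F demands an N_P
```

Nothing non-local is ever computed here; the "non-locality" has simply been compiled into a proliferation of category labels. That proliferation is exactly what strikes practitioners as contrived, which is why such features live outside the oral tradition.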

Does this mean that all is well in the MP world?  Yes and No.

Yes, in the sense that Alex C’s worries, though logically on point, carry little weight in a standard MP (or GB) context, for nobody supposes that the kinds of features he uses to generate the problem exist. This is why I still find the adverb fronting argument convincing and dispositive with regard to the learnability concerns it was deployed to address. Though I may not know how to completely identify the feature malefactors that Thomas G describes, I am pretty sure that nothing like them is part of a standard MPish account of anything. [1] For the problem Alex C identifies to be actually worrisome (rather than just possibly so) would require showing that the run-of-the-mill, everyday, garden variety features that MPers use daily could generate trouble, not that features practitioners would reject as “nutty” could.[2]

No, because it would be nice to know how to define these possible “nutty” features out of existence and not merely rely on inchoate aspects of the oral tradition to block them. If we could provide an explicit definition of what counts as a legit feature (and what does not), then we would have learned something of theoretical interest, even if it failed to have much of an impact on the practice, given the latter’s phobia about such features to begin with. Let me be clear here: this is definitely a worthwhile thing to do and I hope that someone figures out a way to do it. However, I doubt that it will (or should) significantly alter conclusions concerning FL/UG like those animated by Chomsky’s “cute example.” [3]


[1] Alex D makes this same point (here), and I agree:
I completely agree with you that if we reject P&P, the evaluation measure ought to receive a lot more attention. However, in the case of trivial POS argument such as subject/aux inversion, I think the argument can be profitably run without having a precise theory of the evaluation measure.
[2] Bob Berwick’s comment (here) shows how to do this in the context of GPSG and HPSG style grammars. These systems rely on complex features to do what transformations do in GB style theories. What Bob notes is that in the context of features endowed with these capacities, serious problems arise. The take home message: don’t allow such features.
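For readers unfamiliar with how complex features can do the work of transformations, here is a toy reconstruction of the GPSG-style SLASH device (my illustration, not Berwick's example): a category "X/NP" is an X missing an NP, and that missing-ness is passed up the tree as a feature until a filler discharges it, yielding a filler-gap dependency with no movement at all.

```python
# Toy GPSG-style SLASH passing: a category "X/NP" is an X missing an NP.
# The grammar below derives a filler-gap dependency with no movement;
# the dependency is threaded through the feature annotations on categories.
# Grammar and lexicon are invented for illustration.

GRAMMAR = {
    "S":      [["NP", "VP"], ["WH", "S/NP"]],  # filler WH discharges the slash
    "S/NP":   [["NP", "VP/NP"]],               # slash passed down the spine
    "VP":     [["V", "NP"]],
    "VP/NP":  [["V"]],                         # the gap: V with no object
    "NP":     [["Mary"], ["John"]],
    "WH":     [["what"]],
    "V":      [["saw"]],
}

def expand(cat, depth=6):
    """Enumerate the strings of category `cat`, up to a depth bound."""
    if depth == 0:
        return []
    if cat not in GRAMMAR:          # terminal word
        return [[cat]]
    results = []
    for rhs in GRAMMAR[cat]:
        partials = [[]]
        for sub in rhs:
            partials = [p + s for p in partials for s in expand(sub, depth - 1)]
        results.extend(partials)
    return results

for sentence in expand("S"):
    print(" ".join(sentence))
# Output includes "Mary saw John" as well as the gapped "what Mary saw".
```

This is the kind of capacity Bob's comment points to: categories that carry this sort of bookkeeping can simulate movement, and a feature system endowed with such capacities inherits the problems he describes. Hence the take-home message above.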
[3] This seconds David Adger’s conclusion in the comment section (here). Let me quote:
I am convinced by what I think is a slightly different argument, which is that you can use another technology to get the dependency to work (passing up features), but that just means our theory should be ruling out that technology in these cases. as the core explanandum (i.e. the generalization) still needs explained. I think that makes sense…

Thursday, September 19, 2013

More on the Gallistel-King Conjecture

Recall the idea: mental computation involves DNA/RNA manipulation. These complex molecules have the structure to support classical computation, as opposed to neural nets, and are ideal candidates for memory storage, again as opposed to nets. The Gallistel-King conjecture is that brain computations exploit this structure (e.g. here). One place quite ripe for this kind of process is long-term memory. There is mounting evidence that the genome plays an important role in this area (see here). Here are two more reports (here and here) on recent memory research highlighting the role of genes in fixing and extinguishing memories.
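To see the sense in which a molecule like DNA is a natural substrate for classical memory, here is a purely illustrative toy sketch (mine, not anything from Gallistel and King's actual proposal): a string over a four-letter alphabet is a compact, addressable, read/write store for symbolic values, of the sort that a fixed web of synaptic weights does not transparently provide.

```python
# Toy illustration: a string over the nucleotide alphabet {A, C, G, T} used as
# classical read/write memory. Each symbol is a base-4 digit, so a sequence of
# length n addresses 4**n distinct values. This is only meant to illustrate
# the *kind* of capacity at issue, not any actual biochemical mechanism.

ALPHABET = "ACGT"

def encode(number, length):
    """Write an integer into a nucleotide string of fixed length (base 4)."""
    digits = []
    for _ in range(length):
        digits.append(ALPHABET[number % 4])
        number //= 4
    return "".join(reversed(digits))

def decode(sequence):
    """Read the integer back out of the nucleotide string."""
    value = 0
    for base in sequence:
        value = value * 4 + ALPHABET.index(base)
    return value

stored = encode(2013, 8)        # e.g. an interval, a count, a coordinate
print(stored)                   # 'AACTTCTC': 8 symbols suffice for 65,536 values
print(decode(stored))           # 2013: the value is recovered exactly
```

Eight symbols already address 65,536 distinct values, and the value is read back exactly; that combinatorial capacity plus exact retrieval is the "structure to support classical computation" the conjecture leans on.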

Faking it


Andrew Gelman, a statistician at Columbia (and one whose opinions I generally respect (I read his blog regularly) and whose work, to the degree that I understand it, I really like), has a thing about Hauser (here).[1] What offends him is Hauser’s (alleged) data faking (yes, I use ‘alleged’ because I have personally not seen the evidence, only heard allegations, and given how easy these are to make, well, let's just try not to jump to comfortable conclusions). Here he explains why the “faking” is so bad; not rape, murder or torture bad, but science-wise bad. Why? Because what fake data does is “waste people’s time (and some lab animals’ lives) and slow down the progress of science.” Color me skeptical.

Here’s what I mean: is this a generic claim or one specific to Hauser’s work? If the latter, then I would like to see the evidence that his alleged improprieties had any such effect. Let me remind you again (see here and here) that the results of all of Hauser’s papers that were questioned have since been replicated. Thus, the conclusions of these papers stand. Anyone who relied on them to do their research did just fine. Was there a huge amount of time and effort wasted? Did lab animals get used in vain? Maybe. What’s the evidence? And maybe not. They all replicated. Moreover, if the measure of his crime is wasted time and effort, did Hauser’s papers really lead down more blind alleys and wild goose chases than your average unreplicable psych or neuro paper (here)?

As for the generic claim, I would like to see more evidence for this as well. Among the “time wasters” out there, is faked data really the biggest problem, or even a very big problem? Or is this sort of like Republican "worries" about fake voters inundating the polls and voting for Democrats? My impression is that the misapplication of standard statistical techniques to get BS results that fail to replicate is far more problematic (see here and here). If this is so, then Gelman’s fake-data worries may, by misdirection, be leading us away from the more serious time wasters, i.e. diverting attention from the real time sinks, viz. the production of non-replicable “results,” which, so far as I can tell, is closely tied to the use of BS statistical techniques to coax significance out of one in every 20 or so experiments. We should be so lucky that the main problem is fakery!
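To see why that "one in every 20" figure is not hyperbole, here is a minimal simulation (my own sketch; the 0.05 threshold, sample sizes, and number of experiments are assumptions of the illustration, not anything from Gelman's post): run enough studies on pure noise and the conventional significance bar hands you "results" about five percent of the time.

```python
# A minimal simulation of the "one in twenty" point: run many two-sample
# t-tests on pure noise (no real effect anywhere) and count how many clear
# the conventional p < 0.05 bar. The threshold, sample sizes, and experiment
# count are assumptions of this sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 10_000, 30

false_positives = 0
for _ in range(n_experiments):
    group_a = rng.normal(0, 1, n_per_group)   # both groups drawn from the
    group_b = rng.normal(0, 1, n_per_group)   # very same null distribution
    _, p = stats.ttest_ind(group_a, group_b)
    false_positives += p < 0.05

print(f"{false_positives / n_experiments:.1%} 'significant' results from noise")
# Expect roughly 5%: the rate the threshold itself guarantees, effect or no effect.
```

Add flexible stopping rules, multiple outcome measures, and selective reporting, and the rate climbs well above five percent, no faking required.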

So that I am not misunderstood, let me add that nobody I know condones faking data. But this is not because it in some large measure retards the forward march of science (that claim may be true, but it is not a truism); it is because faking is quite generally bad. Period (again, not unlike voter fraud). And it should not be practiced or condoned for the same reason that lying, bullying, and plagiarism should not be practiced or condoned. These are all lousy ways to behave. That said, I have real doubts that fake data is the main problem holding back so many of the “sciences,” and claiming otherwise without evidence can misdirect attention from where it belongs. The main problem with many of the “sciences” is the absence of even a modicum of theory, i.e. a lack of insight, an absence of any idea about what’s going on, and all the data mining in the world cannot substitute for one or two really good ideas. The problem I have with Gelman’s obsession is that in the end it suggests a view of science that I find wrongheaded: that data mining is what science is all about. As the posts noted above indicate, I could not disagree more.


[1] This is just the latest of many posts on this topic. Gelman, for some reason I cannot fathom, also has a thing about Chomsky, as flipping through his blog will demonstrate (e.g. here and here).