
Friday, March 11, 2016

Two more short posts by others on stats

Stats are powerful tools that, apparently, are also very confusing. Many scientists who use them don't understand what they are using. Or that, at least, is what Andrew Gelman thinks (here). As he puts it, "applied statistics is hard," and though one can improve with practice, it seems that many don't really get it. This is worth keeping in mind for those of us in fields where stats are coming into vogue. We should not ignore a tool that can be very useful when used correctly, but neither should we be mesmerized by technical pyrotechnics (and R packages). In fields with large effect sizes (e.g. most of syntax, if Sprouse and Co are correct (which, I never tire of reiterating, they are)), stats are of largely secondary importance. Here's some Gelmanic wisdom:
Incompetent statistics does not necessarily doom a research paper: some findings are solid enough that they show up even when there are mistakes in the data collection and data analyses. But we’ve also seen many examples where incompetent statistics led to conclusions that made no sense but still received publication and publicity.
Someone once mentioned to me the following advice that they got in their first stats class (at MIT no less). The prof said that if you need fancy stats to drag a conclusion from the data generated by your experiment, then do another experiment. Stats are largely useful for distinguishing signal from noise. When things are messy, they can help you find the underlying trends. Of course, there is always another way of doing this: make sure that things are not messy in the first place, which means making sure your design does not generate a lot of noise. Sadly, we cannot always do this, and so we need to reach for that R package. But, more often than not, powerful techniques create a kind of moral hazard wrt our methods of inquiry. There really is no substitute for thinking clearly and creatively about a problem.

Here's a second post for those clamoring to understand what their statistically capable colleagues are talking about when they talk p-values. Look at the comments too, as some heavyweights chime in. Here's one that Sean Carroll (the physicist) makes:
Particle physicists have the luxury of waiting for five sigma since their data is very clean and they know how to collect more and more of it.
In this regard, I think that most linguists (those not doing pragmatics) are in a similar situation. The data is pretty clean and we can easily get lots more.
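To connect Carroll's "five sigma" talk to the p-values discussed in that second post, here is a minimal sketch of my own (Python and scipy are my choice of illustration; nothing in the linked posts depends on them) showing how sigma thresholds translate into one-sided tail probabilities under the null:

```python
# Minimal sketch: translating "N sigma" into a one-sided p-value under the null,
# i.e. the probability of a fluctuation at least N standard deviations above the
# mean of a standard normal distribution.
from scipy.stats import norm

for sigma in (2, 3, 4, 5):
    p = norm.sf(sigma)  # survival function: P(Z >= sigma)
    print(f"{sigma} sigma  ~  one-sided p = {p:.2e}")

# 5 sigma corresponds to p of roughly 3e-7, vastly stricter than the p < .05
# convention that most behavioral fields settle for.
```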

22 comments:

  1. Particle physicists have the luxury of waiting for five sigma since their data is very clean and they know how to collect more and more of it.

    If the data were very clean, they wouldn't need fancy statistical tests to distinguish the signal from the noise.

    Numerous three- and four-sigma results have turned out to be false alarms, so physicists have adopted a stricter criterion. Also, experiments in particle physics produce huge numbers of observations, and they're often looking for signals in many places, so it's actually quite likely that they will observe at least a few three- and four-sigma events, even when there's no signal at all.

    The responses from Michael Betancourt, Jay Wacker, and Joshua Engel in this Quora Q&A talk about these issues in some detail.
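    To put rough numbers on the multiplicity point above, here is a quick back-of-the-envelope sketch (in Python; the number of independent "places to look" is made up purely for illustration) of how often large fluctuations appear when there is no signal at all:

    ```python
    # Hypothetical numbers: with many independent places to look, a few 3- or
    # 4-sigma fluctuations are expected under the null, while 5-sigma ones are not.
    from scipy.stats import norm

    n_looks = 10_000  # assumed number of independent tests / search regions
    for sigma in (3, 4, 5):
        p = norm.sf(sigma)              # one-sided false-alarm rate per look
        expected = n_looks * p          # expected number of false alarms
        p_any = 1 - (1 - p) ** n_looks  # chance of at least one, with no signal
        print(f"{sigma} sigma: expect {expected:.2f} false alarms, P(>=1) = {p_any:.3f}")
    ```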

    Replies
    1. Reading through the comments at Crooked Timber, I see that people make some of these points there, too, directly in response to Carroll.

    2. I'm not sure you and Carroll differ here. Of course it's worthwhile going for 5 sigma, precisely for the reasons you note. Were 3 or 4 sigma enough, then that is the standard that would be adopted. However, not everyone who wants 5 sigma can get it. Think of aiming for this in, say, social psych. There would be virtually no results in the field. Or in language processing. Same deal. But in some parts of syntax this ideal is not that farfetched. How much variance is there in a judgment concerning theta marking in actives and passives? I'd say zero. Or in extraction of adjuncts out of relative clauses? So very high significance thresholds are realizable in some areas and not in others. Why? I would say because of the quality and quantity of the available data and the sophistication of the background theories. This is what I thought Carroll was pointing at.

    3. The quantity of the data cuts both ways, though. Yes, it's (relatively) easy to collect more if you need it, but with so many observations, even if the no-signal model is correct, you'll get large-sigma observations.

      As for the quality of the data (meaning here the magnitude of the variance), it wouldn't surprise me at all if the variance is smaller for certain classes of acceptability/grammaticality judgments than it is for experiments in particle physics. To the extent that you can get (very close to) 100% consistency in syntactic judgments in certain cases, you don't even need statistical tests. But it's pretty clear that particle physics does need statistical tests, suggesting that the data isn't all that clean (statistically speaking).

    4. Think of aiming for this in, say, social psych. There would be virtually no results in the field.

      We can dream, right?

    5. Yes, physics needs it and we can dream.

  2. This comment has been removed by the author.

  3. The factual claims that syntacticians make are not that reliable when it comes to languages other than English. That is what is shown in Tal Linzen and Yohei Oseki's paper "The reliability of acceptability judgments across languages," available at: http://tallinzen.net/media/papers/linzen_oseki_acceptability.pdf
    My own experience with the literature on Japanese syntax corroborates their claim. I fear that Sprouse and Co. may be doing a disservice to the field by spreading a false sense of security.

    Replies
    1. The authors concentrate on the margins of acceptability and note that this could be fixed by a broader vetting of the Type III data. Good idea. What they show is that after excluding the easy stuff there is some hard stuff where more careful methods pay off. I agree. There is room for more careful methods when it is worthwhile. When is that? When there is clear controversy about the data. So far this seems more or less Sprouse's view: largely reliable data, with a place for experiment. Btw, this is true in English too, which is why Jon did his work on islands.

    2. I should add that I remain skeptical about the Linzen and Oseki results. I would love to see how Jon or Diogo react to this before concluding anything. My hunch is that the data in other languages are not much different than those in English. I'll ask around and get back to you.

    3. I read the Linzen/Oseki paper and noted that they actually purposefully selected judgments that they personally found questionable - this is a much stricter criterion than only examining hard judgments. If you cherry-picked the results in experimental psychology papers that seemed highly questionable to you as an expert and some of them didn't replicate, no surprise there.

    4. Linzen and Oseki's intention was to just examine hard judgments, rather than to cherry-pick cases that seemed highly questionable to them. Here's what they say: "Since we believe that Type I/II judgments in any language are extremely likely to replicate, our study focuses on Type III judgments. Unfortunately, there is no clear cut boundary between Type II and Type III judgments. ... For the experiments reported below, the authors — linguists who are native speakers of Hebrew or Japanese — selected contrasts in each language that they believed were potentially questionable Type III judgments."

    5. "The authors...selected contrasts in each language that they believed were potentially questionable Type III judgments"

      So let's replace "highly questionable" with "potentially questionable". They were still filtering out Type III judgments based on their expectations that these would not replicate, not just that they were "hard".

    6. They were filtering out Type III judgments based on their expectations that these MIGHT not replicate. Note the presence of the word "potentially" there.

    7. Are you skeptical of our results, or of our hypothesis that the situation in Japanese and Hebrew is worse than in English? We don't have any empirical results that speak to the latter hypothesis, and given our informal way of selecting questionable judgments, it's not that easy to compare across languages (though we have received some interesting suggestions that we're looking into).

    8. @William, I agree that people's intuitions about replicability can probably predict actual replication rate in psychology as well. In fact, the Reproducibility Project paper mentions an inverse correlation between the "surprisingness" of the original effect and replication rate. Perhaps we need a way to incorporate the field's prior more formally into a Bayesian evaluation of the results of an experiment - if the prior is very low (e.g., Bem's precognition paper), much stronger evidence is required.

      Anyway, the point we were trying to make is that the soundbite version of the Sprouse and Almeida studies -- "acceptability judgments are highly likely to replicate"* -- is inaccurate. Some judgments are very likely to replicate (*the bear snuggleds), others less so, and linguists are reasonably good at guessing which is which. Our hypothesis, based on our experience and our understanding of the self-correction process in linguistics, is that there are more bad judgments in some languages than in others, but again, one would need to think of better ways to test it.

      *To be clear, I'm not accusing Sprouse and Almeida of anything; this is a misunderstanding on the part of some readers.
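      A back-of-the-envelope sketch of the "field's prior" point above (the numbers are invented purely for illustration): on a Bayesian reading, posterior odds = prior odds × Bayes factor, so a very low prior demands far stronger evidence to reach the same level of belief.

      ```python
      # Toy illustration (invented numbers): how a low prior interacts with the
      # strength of the evidence (the Bayes factor) to determine the posterior.
      def posterior_prob(prior_prob, bayes_factor):
          prior_odds = prior_prob / (1 - prior_prob)
          posterior_odds = prior_odds * bayes_factor
          return posterior_odds / (1 + posterior_odds)

      for prior in (0.5, 0.01, 0.0001):   # plausible claim vs. increasingly surprising ones
          for bf in (3, 20, 1000):        # weak, strong, and very strong evidence
              print(f"prior={prior:<7} BF={bf:<5} posterior={posterior_prob(prior, bf):.4f}")
      ```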

    9. A little of both. But my main skepticism lies with the implications. Sprouse and Almeida argue that linguistic data are overall highly reliable, including the data that have driven the development of theory. This does not mean they are infallible, and they cite certain cases where informal methods have misfired. What I am skeptical about is whether the cross-linguistic situation alters this general conclusion much. I don't see that your results challenge this. They show that questionable judgments might be questionable. I have asked Diogo and/or Jon to weigh in on this issue, and I hope they do, as it is important to get the lay of the land clear. When this happens you too are invited to jump in.

    10. "Perhaps we need a way to incorporate the field's prior more formally into a Bayesian evaluation of the results of an experiment - if the prior is very low (e.g., Bem's precognition paper), much stronger evidence is required."

      Do we? I still don't understand why this is really a problem. People can and do use their heads to evaluate the results of unlikely experiments. Erroneous results will not replicate. Why do we "need" to formally evaluate these results? I think it's fine that there are occasional errant results in the field. We don't want to start suppressing results because they are unlikely - if they are robust, they'll replicate.

      There are plenty of bad theories based on perfectly sound and unsurprising data.

    11. @William - I agree with that, though I seem to recall a paper showing that self-correction in science is much slower than people would like to believe (can't remember the reference off the top of my head). What's special about linguistics is that most readers, even experts on a particular area of linguistic theory, can't evaluate the "results" in languages they don't speak, so the self-correction process is going to be less effective than in other fields. There aren't enough people who are both experts on the scientific question and native speakers of the language who are in a position to contribute to normal self-correction mechanisms.

    12. @Tal. I'm not quite seeing how linguistics is special in this regard. I can't judge the acceptability of Japanese sentences, but I can bug a Japanese speaker to give me their judgment much more easily than a physicist or psychologist can replicate a typical physics or psychology experiment. I'd have thought that the process of self-correction should, if anything, be unusually swift and effective for acceptability judgment data. (Of course in the case of less widely spoken languages, obtaining judgments from native speakers is not so trivial.)

    13. What I meant is that as you read a paper with judgments in a language you don't speak, you have no way of evaluating which judgments are reasonable. It's not very practical for every reader to ask their Japanese-speaking friend for their opinion on every single Japanese judgment. Realistically, you're only going to be motivated to do that if a particular Japanese judgment goes against your theoretical proposal.

  4. "Stats are powerful tools that, apparently, are also very confusing. Many scientists who use them don't understand what they are using. Or that, at least, is what Andrew Gelman thinks"

    I would go a step further here. I would say that it is actually impossible to use statistics *correctly* because there is simply no unifying theory of statistics. If every stats course started by explaining this simple fact, it would go a long way toward helping people stop fixating on whether stats are *correct* or *incorrect*, or even *necessary* in some absolute, use-independent sense, and start focusing a little more on how stats can be *useful*, as opposed to *useless*.
