Monday, August 27, 2018

Revolutions in science; a comment on Gelman

In what follows I am going to wander way beyond my level of expertise (perhaps even of rudimentary competence). I am going to discuss statistics and its place in the contemporary “replication crisis” debates. So, reader, be warned: take what I write with a very large grain of salt.

Andrew Gelman has a long post (here, AG) where he ruminates about a comparatively small revolution in statistics that he has been a central part of (I know, it is a bit unseemly to toot your own horn, but hey, false modesty is nothing to be proud of either). It is small (or “far more trivial”) when compared to more substantial revolutions in Biology (Darwin) or Physics (Relativity and Quantum Mechanics), but AG argues that the “Replication revolution” is an important step in enhancing our “understanding of how we learn about the world.” He may be right. But…

But I am not sure that he has the narrative quite right. As AG portrays matters, the revolution need not have happened: the same ground could have been covered with “incremental corrections and adjustments.” Why weren’t they? Because the reactionaries, by their response to reasonable criticisms from the likes of Meehl, Mayo, Ioannidis, Gelman, Simonsohn, Dreber, and “various other well-known skeptics,” forced a revolutionary change. Their reaction to these reasonable critiques was to charge the critics with bullying, or to insist that the indicated problems are all part of normal science and will eventually be removed by better training, higher standards, etc. This, AG argues, was the wrong reaction, and it took a revolution, albeit a relatively minor one, to overturn it.

Now, I am very sympathetic to a large part of this position. I have long appreciated the work of the critics and have covered their work in FoL. I think that the critics have done a public service in pointing out that stats has served to confuse as often as (maybe more often than) it has served to illuminate. And some have made the more important point (AG prominently among them) that this is not some mistake, but serves a need in the disciplines where it is most prominent (see here). What’s the need? Here is AG:[1]

Not understanding statistics is part of it, but another part is that people—applied researchers and also many professional statisticians—want statistics to do things it just can’t do. “Statistical significance” satisfies a real demand for certainty in the face of noise. It’s hard to teach people to accept uncertainty. I agree that we should try, but it’s tough, as so many of the incentives of publication and publicity go in the other direction.

And observe that the need is Janus-faced. It faces inwards, relieving the anxiety of uncertainty, and it faces outwards, relieving professional publish-or-perish anxiety. Much to AG’s credit, he notices that these are different things, though they are mutually supporting. I suspect that the incentive structure is important, but secondary to the desire to “get results” and “find the truth” that animates most academics. Yes, lucre, fame, fortune, and status are nice (well, very nice), but I agree that the main motivation for academics is the less tangible one: wanting to get results just for the sake of getting them. Being productive is a huge goal for any academic, and a big part of the lure of stats, IMO, is that it promises to get one there if one just works hard and keeps plugging away.
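It is worth seeing concretely what this demand for certainty in the face of noise costs. Here is a toy simulation (my illustration, not AG’s; it assumes Python with numpy and scipy, and all the numbers are invented) of what Gelman elsewhere calls a Type M (magnitude) error: a small true effect, measured noisily in a modest sample, where only the estimates that cross p < 0.05 get reported.

    # Toy significance-filter simulation (my illustration, not AG's).
    # A small true effect, noisy measurements, modest n: keep only the
    # estimates that reach p < 0.05 and see what they say on average.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_effect, noise_sd, n = 0.1, 1.0, 30
    sig_estimates = []
    for _ in range(10000):
        sample = rng.normal(true_effect, noise_sd, n)
        if stats.ttest_1samp(sample, 0.0).pvalue < 0.05:
            sig_estimates.append(sample.mean())

    print("true effect:", true_effect)
    print("average 'significant' estimate:", np.mean(sig_estimates))
    # The significant estimates average roughly 0.4: about four times
    # the true effect, and occasionally with the wrong sign.

The filter that promises certainty delivers systematic exaggeration instead: conditioning on significance roughly quadruples the estimated effect in this set-up.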

So, what AG says about the curative nature of the mini-revolution rings true, but only in part. I think that the post fails to identify the three main causal spurs to stats overreach, each of which does its damage when combined with the desire to be a good, productive scientist.

The first it mentions, but makes less of than perhaps others have: stats are hard, and interpreting and applying them correctly takes a lot of subtlety. So much, indeed, that even experts often fail (see here). There is clearly something wrong with a tool that seems to invite large-scale misuse. AG in fact notes this (here), but it does not play much of a role in the post cited above, though IMO it should have. What is it about stats techniques that makes them so hard to get right? That, I think, is the real question. After all, as AG notes, it is not as if all domains find it hard to get things right. As he notes, psychometricians seem to get their stats right most of the time (as do those looking for the Higgs boson). So what is it about the domains where stats regularly fails that makes the failures there so pervasive? This leads me to my second point.

Stats techniques play an outsized role in just those domains where theory is weakest. This is an old hobby horse of mine (see here for one example). Stats, especially fancy stats, induces the illusion that deep, significant scientific insights are there for the taking if one just gets enough data points and learns to massage them correctly (and responsibly, no forking paths for me, thank you very much). This conception sits uncomfortably with the idea that there is no quick fix for ignorance. No amount of hard work, good ethics, or careful application suffices when we really have no idea what is going on. Why do I mention this? Because many of the domains where the replication crisis has been ripest are domains that are very, very hard and where we really don’t have much of an understanding of what is happening. Or, to put this more gracefully, either the hypotheses of interest are too shallow and vague to be taken seriously (lots of social psych) or the effects of interest are the results of myriad interactions that are too hard to disentangle. In either case, stats will often provide an illusion of rigor while leading one down a forking garden path. Note, if this is right, then we have no problem seeing why psychometricians were in no need of the replication revolution. We really do have some good theory in domains like sensory perception, and there stats have proven to be reliable and effective tools. The problem is not with stats, but with stats applied where they cannot be guided (and misapplications tamed) by significant theory.
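To make the forking garden path concrete, here is a minimal sketch (an invented example of mine, assuming Python with numpy and scipy): the data are pure noise, but the analyst has a few innocent-looking degrees of freedom, namely which of three outcome measures to report and whether to analyze the whole sample or split it by a covariate. Each individual path keeps its nominal 5% false positive rate; the garden of paths does not.

    # A minimal forking-paths sketch (invented example). Pure noise,
    # nine analysis paths: 3 candidate outcome measures x 3 ways of
    # slicing the sample (everyone, covariate == 0, covariate == 1).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def any_path_significant(n=40):
        group = rng.integers(0, 2, n)          # two experimental conditions
        outcomes = rng.normal(size=(n, 3))     # three candidate DVs, all noise
        covariate = rng.integers(0, 2, n)      # e.g. sex, used to split post hoc
        slices = (np.ones(n, dtype=bool), covariate == 0, covariate == 1)
        for dv in range(3):
            y = outcomes[:, dv]
            for s in slices:
                a, b = y[s & (group == 0)], y[s & (group == 1)]
                if len(a) > 2 and len(b) > 2:
                    if stats.ttest_ind(a, b).pvalue < 0.05:
                        return True            # report the path that "worked"
        return False

    hits = sum(any_path_significant() for _ in range(2000))
    print("false positive rate across the garden:", hits / 2000)
    # Each single path has a 5% error rate; across nine forks the
    # chance of finding something comes out around 30%.

Nine forks, each honest on its own, and the chance of finding some significant result on pure noise climbs to roughly 30%.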

Let me add two more codicils to this point.

First, here I part ways with AG. The post suggests that one source of the replication problem is people having too great “an attachment to particular scientific theories or hypotheses.” But if I am right, this is not the problem, at least not the problem behind the replication crisis. Being theoretically stubborn may make you wrong, but it is not clear why it makes your work shoddy. You get results you do not like and ignore them. That may or may not be bad. But with a modicum of honesty, the most stiff-necked theoretician can appreciate that her/his favorite account, the one true theory, appears inconsistent with some data. I know whereof I speak, btw. The problem here, if there is one, is not generating misleading tests and non-replicable results, but ignoring the (apparent) counter-data. And this, though possibly a problem for an individual, may not be a problem for a field of inquiry as a whole.

Second, there is another temptation that today needs to be seriously resisted and that reliably leads to replication problems: because of the ubiquity and availability of cheap “data” nowadays, the thought that this time it’s different is very alluring. Big Data types often seem to think that if you get a large enough set of numbers and apply the right stats techniques (rinse and repeat), out will plop The Truth. But this is wrong. Lars Syll puts it well here, in a post aptly entitled “Why data is NOT enough to answer scientific questions”:

The central problem with present ‘machine learning’ and ‘big data’ hype is that so many – falsely – think that they can get away with analyzing real-world phenomena without any (commitment to) theory. But – data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And – using a machine learning algorithm will only produce what you are looking for.

Clever data mining tricks are never enough to answer important scientific questions. Theory matters.
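Here is a toy version of Syll’s point (my example, not his, assuming Python with numpy and scipy): generate 200 candidate “predictors” that are pure noise, one outcome that is also noise, and then mine for the best correlate.

    # A toy version of Syll's point (my example, not his): 200 noise
    # "predictors", one noise outcome, and a hunt for the best correlate.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, k = 50, 200
    X = rng.normal(size=(n, k))    # 200 candidate features, all noise
    y = rng.normal(size=n)         # the outcome, also noise

    pvals = []
    for j in range(k):
        r, p = stats.pearsonr(X[:, j], y)
        pvals.append(p)
    best = int(np.argmin(pvals))
    print(f"best 'discovery': feature {best}, p = {pvals[best]:.4f}")
    # With 200 shots at p < 0.05 we expect about ten "findings", and
    # the smallest p-value is usually below 0.01, even though nothing
    # here predicts anything.

With 200 shots at p < 0.05 you expect about ten “findings,” and the best of them will usually sport an impressive-looking p-value. The algorithm produced exactly what we were looking for.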

So, when one combines the fact that in many domains we have, at best, very weak theory with the fact that we are nowadays flooded with cheap, available data, the temptation to go hyper-statistical can be overwhelming.

Let me put this another way. As AG notes, successful inquiry needs strong theory and careful measurement. Note the ‘and.’ Many read the ‘and’ as an ‘or’ and allow that strong theory can substitute for a paucity of data, or that tons of statistically curated data can substitute for the virtual absence of significant theory. But this is a mistake, albeit a very tempting one if the alternative is having nothing much of interest or relevance to say at all. And this is what AG underplays: a central problem with stats is that it often tries to sell itself as allowing one to bypass the theory half of the conjunction. Further, because it “looks” technical and impressive (i.e. has a mathematical sheen), it leads to cargo cult science: scientific practice that looks like science rather than being scientific.

Note, this is not bad faith or corrupt practice (though there can be this as well). It stems from the desire to be what AG dubs a scientific “hero,” a disinterested searcher for the truth. The problem is not with the ambition, but with the added supposition that any problem will yield to scientific inquiry if pursued conscientiously. Nope. Sorry. There are times when there is no obvious way to proceed because we have no idea how to proceed. And in these domains, no matter how careful we are, we are likely to find ourselves getting nowhere.

I think that there is a third source of the problem, one that resides in the complexity of the problems being studied. In particular, many phenomena we are interested in arise from the interaction of many causal sub-systems. When this happens, there is bound to be a lot of sensitivity to the particular conditions of the experimental set-up, and so lots of opportunities for forking-paths (i.e. p-hacking) style (often unintentional) abuse of stats.

Now, every domain of inquiry has this problem and needs to manage it. In the physical sciences this is done by (as Diogo once put it to me) “controlling the shit out of the experimental set-up.” Physicists control for interaction effects by removing many (most) of the interfering factors. A good experiment requires creating a non-natural, artificial environment in which problematic factors are managed via elimination. Diogo convinced me that one of the nice features of linguistic inquiry is that it is possible to “control the shit” out of the stimuli, thereby vastly reducing the noise generated by an experimental subject. At any rate, one way of getting around the interaction-effects problem is to manage the noise by simplifying the experimental set-up and isolating the relevant causal sub-systems.

But often this cannot be done, among other reasons because we have no idea what the interacting subsystems are or how they function (think, for example, pragmatics). Then we cannot simplify the set-up, and we will find that our experiments are often task dependent and very noisy. Stats offers a possible way out: in place of controlling the design of the set-up, the aim is to statistically manage (partial out) the noise. What seems to have been discovered (IMO, not surprisingly) is that this is very hard to do in the absence of relevant theory. You cannot control for the noise if you have no idea where it comes from or what is causing it. There is no such thing as a theory-free lunch (or at least not a nutritious one). The revolution AG discusses, I believe, has rediscovered this bit of wisdom.
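Here is a minimal sketch of that point (my example, in Python with numpy; all the variables are invented): a hidden factor drives both the “treatment” and the outcome, while the treatment itself does nothing. Partialling out a covariate you happen to have measured does not help; partialling out the actual source of the noise does, but of course you can only do that if theory tells you where to look.

    # Sketch of partialling-out gone wrong (all variables invented).
    # A hidden factor drives both "treatment" and outcome; the
    # treatment itself does nothing.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 5000
    hidden = rng.normal(size=n)              # the real source of the noise
    treatment = hidden + rng.normal(size=n)  # tracks the hidden factor
    outcome = hidden + rng.normal(size=n)    # so does the outcome
    measured = rng.normal(size=n)            # a covariate we happen to have

    def treatment_coef(controls):
        # OLS of outcome on treatment plus whatever we "control for"
        X = np.column_stack([np.ones(n), treatment] + controls)
        return np.linalg.lstsq(X, outcome, rcond=None)[0][1]

    print("no controls:           ", treatment_coef([]))          # ~0.5, spurious
    print("control measured junk: ", treatment_coef([measured]))  # still ~0.5
    print("control hidden source: ", treatment_coef([hidden]))    # ~0.0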

Let me end with an observation special to linguistics. There are parts of linguistics (syntax, large parts of phonology and morphology) where we are lucky in that the signal from the underlying mechanisms is remarkably strong, in that it withstands all manner of secondary effects. Such data are, relatively speaking, very robust. So, for example, ECP, island, or binding violations show few context effects. This is not to say that context has no effects at all on acceptability (Sprouse and Co. have shown that these do exist), but the main effect is usually easy to discern. We are lucky. Other domains of linguistic inquiry are far noisier (I mentioned pragmatics, but even large parts of semantics strike me as similar, maybe because it is hard to know where semantics ends and pragmatics begins). I suspect that a good part of the success of linguistics can be traced to the fact that FL is largely insulated from the effects of the other cognitive subsystems it interacts with. As Jerry Fodor once observed (in his discussion of modularity), to the degree that a psych system is modular, to that degree it is comprehensible. Some linguists have lucked out. But as we study the interaction effects wrt language more and more, we will run into the same problems. If we are lucky, linguistic theory will help us avoid many of the pitfalls AG has noted and categorized. But there are no guarantees, sadly.



[1]I apologize for not being able to link to the original. It seems that in the post where I discussed it, I failed to link to the original and now cannot find it. It should have appeared in roughly June 2017, but I have not managed to track it down. Sorry.

Wednesday, August 22, 2018

Some thoughts on the relation between syntax and semantics, through the lens of compositionality: https://omer.lingsite.org/blogpost-meaning-based-syntax-co…/

Tuesday, August 21, 2018

Language and cognition and evolang

Just back from vacation and here is a starter post on, of all things, Evolang (once again).  Frans de Waal has written a short and useful piece relevant to the continuity thesis (see here). It is useful for it makes two obvious points, and it is important because de Waal is the one making them. The two points are the following:

1.     Humans are the only linguistic species.
2.     Language is not the medium of thought, for there are non-verbal organisms that think.

Let me say a few words about each.

de Waal is quite categorical about each point. He is worth quoting here, so that next time you hear someone droning on about how we are just like other animals, just a little more so, you can whip out this quote and flog the interlocutor with it mercilessly.

You won’t often hear me say something like this, but I consider humans the only linguistic species. We honestly have no evidence for symbolic communication, equally rich and multifunctional as ours, outside our species. (3)

That’s right: nothing does language like humans do language, not even sorta kinda. This is just a fact, and those who know something about apes (and de Waal knows everything about apes) are the first to understand this. And if one is interested in Evolang, then this fact must form a boundary condition on whatever speculations are on offer. Or, to put this more crudely: assuming the continuity thesis disqualifies one from participating intelligently in the Evolang discussion. Period. End of story. And, sadly, given the current state of play, this is a point worth emphasizing again and again and again and… So thanks to de Waal for making it so plainly.

This said, de Waal goes on to make a second important point: even if no other animal has our linguistic capacities, even sorta kinda, it does not mean that some of the capacities underlying language might not be shared with other animals. In other words, the distinction between a faculty of language in the narrow sense versus a faculty of language in the broad sense is a very useful one (I cannot recall right now who first proposed such a distinction, but whoever it was, thx!). This, of course, cheers a modern minimalist’s heart cockles, and should be another important boundary condition on any Evolang account.

That said, de Waal’s two specific linguistic examples are of limited use, at least wrt Evolang. The first is bees and monkeys, which de Waal claims use “sequences that resemble a rudimentary syntax” (3). The second, “most intriguing parallel,” is the “referential signaling” of vervet monkey alarm calls. I doubt that these analogous capacities will shed much light on our peculiar linguistic capacities, precisely because the properties of natural language words and natural language syntax are where humans are so distinctive. Human syntax is completely unlike bee or monkey syntax, and it seems pretty clear that referential signaling, though it is one use to which we put language, is not a particularly deep property of our basic words/atoms (and yes, I know that words are not atoms, but, well, you know…). In fact, as Chomsky has persuasively argued IMO, Referentialism (the doctrine (see here)) does a piss-poor job of describing how words actually function semantically within natural language. If this is right, then the fact that we and monkeys can both engage in referential signaling will not be of much use in understanding how words came to have the basic odd properties they seem to have.

This, of course, does not detract from de Waal’s two correct observations above. We certainly do share capacities with other animals that contribute to how FL functions, and we certainly are unique in our linguistic capacities. The two cases of similarity that de Waal cites, given that they are nothing like what we do, endorse the second point in spades (which, given the ethos of the times, is always worth doing).

Onto point deux: cognition is possible without a natural language. FoLers are already familiar with Gallistel’s countless discussions of dead reckoning, foraging, and caching behavior in various animals. This is really amazing stuff and demands cognitive powers that dwarf ours (or at least mine: e.g. I can hardly remember where I put my keys, let alone where I might have hidden 500 different delicacies, time stamped, location stamped, nutrition stamped, and surveillance stamped). And they seem to do this without a natural language. Indeed, the de Waal piece has the nice feature of demonstrating that smart people with strong views can agree even if they have entirely different interests. De Waal cites none other than Jerry Fodor to second his correct observation that cognition is possible without natural language. Here’s Jerry from The Language of Thought:

‘The obvious (and I should have thought sufficient) refutation of the claim that natural languages are the medium of thought is that there are non-verbal organisms that think.’ (3)

Jerry never avoided kicking a stone when doing so was all a philosophical argument needed. At any rate, here Fodor and de Waal agree. 

But I suspect that there would be more fundamental disagreements down the road. Fodor, contra de Waal, was not that enthusiastic about the idea that we can think in pictures, or at least not that we think in pictures fundamentally. The reason is that pictures have little propositional structure, and thinking, especially any degree of fancy thinking, requires propositional structure to get going. The old Kosslyn-Pylyshyn debate over imagery went over all of this, but the main line can be summed up by one of Lila Gleitman’s bons mots: a picture is worth a thousand words, and that is the problem. Pictures may be useful aids to thinking, but only if supplied with captions to guide the thinking. In and of themselves, pictures depict too much and hence are not good vehicles for logical linkage. And if this is so (and it is), then where there is cognition there may not be natural language, but there must be a language of thought (LOT), i.e. something with propositional structure that licenses the inferences characteristic of cognitive expansiveness in a given domain.

Again, this is something that warms a minimalist heart (cockles and all). Recall the problem: find the minimal syntax to link the CI system and the AP system. CI is where the language of thought lives. So, like de Waal, minimalists assume that there is quite a lot of cognition independent of natural language, which is why a syntax that links to it is doing something interesting.

Truth be told, we know relatively little about LOT and its properties, a lot less than we know about the properties of natural language syntax, IMO. But regardless, de Waal and Fodor are right to insist that we not mix up the two. So don’t.

Ok, that’s enough for an inaugural post-vacation post. I hope your last two weeks were as enjoyable as mine and that you are ready for the exciting pedagogical times ahead.

Friday, August 10, 2018

A sort-of BS follow-up: Alternatives to LMS

Norbert's recent post on the BSification of academic life skipped one particular wart that's at the front of my mind now that I'm in the middle of preparing classes for the coming semester: Learning Management Systems (LMS). In my case, it's Blackboard, but Moodle and Canvas are also common. It is a truth universally acknowledged that every LMS sucks. It's not even mail client territory, where the accepted truth is "all mail clients suck, but the one I use sucks less". LMSs are slow, clunky, inflexible, do not support common standards, and quite generally feel like they're designed by people who couldn't cut it at a real software company. They try to do a million things and don't do any of them well. Alright, this is where the rant will stop. Cathartic as it may be, these things have been discussed a million times. Originally I had an entire post here exploring how the abysmal state of LMS is caused by different kinds of academic BS, but I'd like to focus on something more positive instead: alternatives to LMS that work well for me and might be useful for you, too. If somebody really wants to hear me complain about LMS and the academic BS surrounding them, let me know in the comments and I'll post that part as a follow-up.