Thursday, June 21, 2018

Sad news: a linguistically prominent primate passes

Tim Stowell just notified me that Koko the gorilla just died (here). The obit is a bit over the top. Koko did not really teach us much about language and her skills were vastly over-hyped. However, I have a very warm spot in my heart for her as I once cross-dressed and played her in a skit given at MIT for Chomsky's birthday. For about 5 minutes Koko debated Chomsky to a standstill (or that is the way Koko saw it, reports differ). This even let me get to know Koko from the inside as it were (I recommend inhabiting a Gorilla Suit if you ever get the chance). Elan Dresher played Chomsky and Amy Weinberg played Penny Patterson. The names were changed, of course, to protect the innocent. At any rate, she has passed. RIP.

Wednesday, June 20, 2018

Classical and dependent case

This is a long rambling post. It has been a way for me to clarify my own thoughts. Here goes.

Every year I co-teach the graduate syntax 2 course at UMD. It follows Howard Lasnik’s classic intro to syntax that takes students from LSLT/Syntactic Structures to roughly the early 1990s, just before the dawn of the Minimalist Program (MP) (personal note: it is perhaps the greatest intro class I have ever had the pleasure to sit in on). Syntax 2 takes the story forward starting with early MP and going up to more or less the present day. 

For most of the last 10 years, I have taught this course with someone else. The last several years I have had Omer Preminger as wingman (or maybe, more accurately, I have been his wingman) and it has been delightfully educational. He is a lot of fun to argue with. I am not sure that I have come out victorious even once (actually I am but my self-confidence doesn’t allow such a bald admission), but it has been educational and exhilarating. And because he is an excellent teacher, I learn at least as much as the other students do. Why do I mention this? It’s all in service of allowing me to discuss something that arose from Omer’s class discussion of case theory over the last several years and that I hope will provoke him to post something that will help me (and I suspect many others) get clear on what the “other” theory of case, Dependent Case Theory (DCT), takes its core empirical and theoretical content to be. 

To help things along, I will outline one version of DCT, observe some obvious problems for that interpretation and note another version (spurred by Omer’s remarks in the Syntax 2 class). By way of contrast, I will outline what I take the core features of classical case theory (CCT) to be and, at the end, I will explain why I personally am still a partisan of CCT despite the many problems that beset it.[1] So here goes. I start with CCT as background theory.

CCT is primarily a theory of the distribution of lexical nominals (though what “counts” as lexical is stipulated (phonetically overt material always does, some traces might (A’-traces) and some traces never do (A-traces and maybe PRO))). The theory takes the core licensing relation to be of the X0-YP variety, some set of heads licensing the presence of some set of DPs. CCT, as Vergnaud conceived it, is intended, in the first instance, to unify/reduce the various filters that Chomsky and Lasnik (1977) proposed. The core assumptions are that (i) nominals require case and (ii) certain heads have the power to assign case. Together these two assumptions explain why we find nominals in positions where heads can case mark them. 

Overt morphological case marking patterns no doubt influenced Vergnaud. However, given that CCT was intended to apply to languages like English and French and Chinese where many/most nominals show no apparent case morphology, it was understood from the get-go that CCT’s conception of case was abstract, in the sense that it is present and regulates the distribution of lexical nominals even when it is morpho-phonologically invisible. This abstractness, and the fact that case lives on a head-XP relation are the two central characteristics of the CCT in what follows.

GB extended the empirical reach of CCT beyond the distribution of nominals by making a second assumption. Assume that abstract case can be morphologically realized. In particular, if it is assumed that in grammars/languages with richer morphological case, the core structural cases (nominative, accusative, etc.)[2] can be made visible, then in these languages case theory can explain why some expressions have the morpho-phonological shapes they do. Adding this codicil allows CCT to claim that case marked nominals look as they do in virtue of overtly expressing the abstract cases nominals bear universally.[3]

In sum, CCT covers two domains of data: the original explicandum is the distribution of lexical nominals, the second is the morpho-phonological forms found in richer case languages. Importantly, this second empirical domain requires an extra assumption concerning the relationship between abstract case and its morpho-phonological (henceforth ‘overt’) expression. Usefully, these two parts of the theory allow for the investigation of abstract case by tracking the properties of the overtly marked nominals in those languages where it can be seen (e.g. the pronominal system in English).

CCT has other relevant (ahem) subtleties. 

First, it primarily concerns structural case. This contrasts with inherent/lexical case, which is tied to particular thematic or semantic roles and is idiosyncratically tied to the specific lexical predicates assigning the roles.[4] For example, some languages, most famously Icelandic, have predicates that require that their nominal arguments have a particular overt case marking (dative or genitive or even accusative). 

CCT has almost always distinguished structural from lexical case, though the two bear some similarities. For example, both are products of head-XP relations (lexical case piggybacking on theta roles which, like case, are assigned by heads). That said, CCT has generally put lexical case to one side and has concentrated on the structural ones, those unrelated to any thematic function.[5]

Second, MP slightly revamped the CCT. The contemporary version takes structural case to always be a dependency between a functional head and an XP, while lexical case, as the name implies, is always mediated by a lexical head. This differs from earlier GB accounts where nominative is assigned by a functional head (finite T) but accusative is assigned by a lexical V (or P) which also, typically (modulo ECM constructions), theta marks the nominal it case assigns. Thus, MP assimilates accusative to the nominative configuration and takes structural accusative, like nominative, to involve a non-local dependency between a functional head (one that does not theta mark the dependent) and a nominal that is not a complement. The relevant functional head for accusative is Agr or some flavor of v. What’s crucial is that in MP accounts the case assigner is always significantly “higher” in the phrase marker and so the dependency must be mediated by AGREE or I-Merge (aka, Move). The most interesting consequence of this amendment to the CCT is that case should affect scope.[6] Surprisingly, there is rather good evidence that it does. As Lasnik and Saito showed (reanalyzing data from Postal), case dependency affects the domain of a nominal’s scope.[7] The dependency between case and scope is one of the deeper discoveries of MP in my opinion, though I will not pursue that claim here.[8]

That’s the sum of what is relevant about CCT for this post. There are well-known problems (wager verbs being the most conspicuous (see the discussion section of the earlier post for elucidation)) but IMO the main virtue of CCT is that it combines quite smoothly with a Merge based conception of FL/UG. I will say something about this at the end, but first it’s time to look at the DCT.

First off, the DCT is actually not that new. A version was mooted by the Brandeis Bunch of Yip, Maling and Jackendoff in the mid 1980s.[9] However, it didn’t catch on much until Alec Marantz revived a version in 1991. Since then, I would wager that the DCT has become the “standard” approach to morphological case, though, as we shall see, what exactly it claims has been (at least to me) elusive.

Here are some points that are fairly clear. 

First, it is exclusively a theory of overt case marking. It has no intention of accounting for nominal distributions. So, DCT is a theory of what CCT takes to be an add-on.[10] This is not in itself a problem for either account, but it is worth noting (yet again). 

So, DCT is a theory of the overtly witnessed case values (or at least it is so in Marantz and Bobaljik and most other advocates (though see Omer’s take below)). The standard version thus aims to explain the distribution of the overt morpho-phonological case marking found on nominals (in many, though not all, languages). An important virtue is that it extends to both nominative-accusative case marking and ergative-absolutive case marking systems. Consequently, investigations of DCT have enriched our appreciation of the complexity of case marking patterns cross linguistically.

A second key feature: the standard version of DCT (e.g. Marantz, Bobaljik) treats case marking as a non-syntactic process. More concretely, it is part of post syntactic processing. Omer and others have argued against this conception. But as originally mooted, DCT operations are outside the syntax.

Third, DCT’s core idea is that overt case marking is an inter-nominal dependency. Case marking reflects a relation between nominals within a specified domain. It is not primarily a head-nominal relation. So, for example, accusative is what one finds on DPs that have another non-case valued nominal higher up in the same domain. Overt case, then, is a XP-YP relation (not an X0-YP relation) with the case value of the dependent nominal stemming from the presence of another non-dependent non-valued nominal in the same domain. The advertising that generally accompanies DCT stresses that for DCT (in contrast to CCT) case is not abstract, but very concrete in that it concerns itself with the overt case that we can hear/see on nominals, not some abstract feature that only sometimes overtly surfaces. The strongest version of this idea is that the purview of the DCT is the full panoply of morpho-phonologically realized cases. However, this is very clearly too strong. Let me explain.

First, like CCT, DCT agrees that there is an important distinction between inherent/lexical case that is semantically/thematically restricted and structural case that isn’t. Like CCT, it takes the former to be effectively a X0-YP dependency with X0 being a theta role assigner. As lexical case is a by-product of theta assignment, this supposition, like the one made in CCT, is not unreasonable. The important point is that like CCT, DCT focuses on a subset of the cases overtly expressed. So, like CCT, the dependent part of the DCT is at best a theory of overt structural case.

A strong version of a DCT approach to structural case would have it that all dependent case values in a language are the product of a DP-DP relation holding within a circumscribed domain. So, for example, if accusative is the realization of dependent case in language L then, modulo some quirky lexical case instances, we would find accusative case on dependent DPs alone (nominative in L being what we find on otherwise non-valued non-dependent cases). This would be a strong theory. Unfortunately, it looks to be false.

In particular, we know that there exist languages in which overt structural accusative case occurs on DPs in structures where the DP that carries it cannot be a dependent case. For example, the accusative we find in for-infinitives (For her to leave would be terrible) or the one we find in acc-ing gerunds (her kissing him would be terrible) or in languages like Latin which have acc-infinitive constructions. In all these examples, the overt case of the DP looks like the case we find on the object of transitive verbs, which is the canonical dependent case value. 

Moreover, these are clearly structural cases as they are in no way thematically restricted. And it is clear the accusative we find here is not dependent on any other DP in its domain (we can always find them, for example, on the sole argument of an unaccusative in these settings). So there exists a class of morpho-phonologically overt accusatives that are not overtly marked accusative in virtue of being dependent. What then marks them overtly accusative? Well, the obvious answer is some local head (for, or the head of the gerund/infinitive). Thus, DCT must allow that there are two ways of assigning accusative values, only one of which reflects its “dependent” status. Or to put this less charitably, DCT presupposes that the core case marking relation identified by CCT also plays a role in accounting for overt structural case. What DCT denies, then, is that all accusatives are marked by heads. Some are dependent cases, some are marked by heads. On this construal of the DCT, it appears that CCT is not so much wrong as incomplete. 

Moreover, it appears that within DCT case is no less abstract than it is in CCT as there is, at best, an indirect relation between a specific overt value (e.g. accusative) and being a dependent or a non-dependent case.

Nor is this problem restricted to subjects, or so I have been reliably told. Here’s what I mean. I gave a paper at the German LSA this past spring and one of the perks of doing so was hearing Julie Legate give a paper on argument structure and passives. She provided scads of data that showed that in languages that systematically violate Burzio’s generalization we can nonetheless find “objects” that bear accusative case. In other words, there exist many nom-acc languages where we find accusative case sitting on DPs in object positions in sentences where there is every reason to believe that there are no additional arguments (i.e. 1-place predicates where the internal argument is marked accusative). If this is right, then we cannot analyze the accusative as an instance of dependent case and must, instead, trace the witnessed value to some head that provides it (a CCTer would suggest v). 

So, neither subjects nor objects in general can be assumed to bear a dependent case value in virtue of being a dependent case. The generalization seems to be not that DCT is a theory of overt morpho-phonological structural case in general, but only of overt structural case that we find most typically in transitive constructions (ones where Burzio’s generalization holds). In other words, the purview of the theory seems, at least to me, much narrower and different than I thought it was from reading the general advertising.

Let me repeat an important point that Omer repeatedly made in class: there is a good sense in which DCT is no less abstract than CCT. Why? Because we cannot go from a witnessed case value to the conclusion that the DP that carries it is dependent. Rather we can only assume that if a DP is dependent it will carry the dependent case value, though what that overt value is remains a free parameter (so far as I can tell).[11]

So, DCT is effectively a theory of abstract case that defines a mapping between some DPs in multiple-DP domains (the dependent ones) and some case values (e.g. acc in nom-acc Gs). At best, the theory explains what values one finds in transitive clauses (ones with at least two nominals) modulo some specification of the relevant domains.[12]

This last point is worth emphasizing. For illustration consider a sentence like (1):

(1)  DP1 V [IP DP2 Infinitive V DP3]

What case should we expect here? Well, DP3 should get dependent case if DP2 is in its domain and DP2 is unvalued (at least at the point that DP3’s value is determined). So, if this is a nom-acc language, we should expect to find DP3 bearing acc case. What of DP2? This will depend on whether DP1 is in the same domain as DP2. If it is, then if DP1 has no case value then DP2 will also bear acc case. In effect what we will have here is an instance of what we find in English ECMs. DP1 will surface with nominative case. 

However, what if DP2 is not in the same domain as DP1? If it isn’t then it would get assigned nominative case. What we would expect to find is something that does not arise in English, the sentential analogue of He expects he to kiss her. So, to block this we need a story that forces DP2 to be in the same domain as DP1, maybe by forcing it to move. What forces this? So far as I know, nothing. We can, for example, stipulate that it is in the same domain or that it moves (e.g. by putting some EPP feature on a functional (phase) head in the higher clause), but there is nothing that forces either conclusion. So this would have to be an extra feature. Recall that DCT is not a theory of DP distribution. There is no analogue of the case filter that forces movement. And even if there were, DP2 could be assigned nominative case if it did not move. So why must DP2 bear acc case? Because we are assuming it moves or because the domain of case assignment is the whole clause and case is assigned bottom up or….[13] All of these are free parameters in DCT. At any rate, there is nothing in the theory that in principle prohibits nominative case in ECM constructions. It all depends on how the relevant domains are constructed.[14]
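For readers who like to see the moving parts, here is a toy sketch of the reasoning about (1). It is my own illustration under loudly stated assumptions, not anyone's official DCT algorithm: the names `DP`, `assign_dependent_case` and `pattern` are hypothetical, domains are modeled as bare integers, valuation runs bottom up, and the language is assumed to be nom-acc with acc as the dependent value and nom as the elsewhere value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DP:
    name: str
    domain: int                 # hypothetical index of the case domain the DP sits in
    case: Optional[str] = None  # None means "not yet valued"


def assign_dependent_case(dps: List[DP]) -> List[DP]:
    """Value the DPs of a nom-acc language; `dps` is ordered highest first.

    Assumption: case is valued bottom up, so a higher DP is still
    unvalued at the point where a lower DP looks for a case competitor.
    """
    for i in range(len(dps) - 1, -1, -1):
        low = dps[i]
        for high in dps[:i]:
            # Dependent case: a higher, still unvalued DP in the same domain.
            if high.domain == low.domain and high.case is None:
                low.case = "acc"
                break
    for dp in dps:
        # Elsewhere case for whatever is still unvalued.
        if dp.case is None:
            dp.case = "nom"
    return dps


def pattern(domains: List[int]) -> List[str]:
    """Case values for DP1, DP2, ... given each DP's domain index."""
    dps = [DP(f"DP{i + 1}", d) for i, d in enumerate(domains)]
    return [dp.case for dp in assign_dependent_case(dps)]


# One big domain: the ECM pattern (DP2 and DP3 both acc).
print(pattern([0, 0, 0]))  # ['nom', 'acc', 'acc']
# Embedded IP is its own domain: "He expects he to kiss her".
print(pattern([0, 1, 1]))  # ['nom', 'nom', 'acc']
```

The only thing that differs between the two calls is the domain assignment, which is exactly the free parameter at issue. Note also that a lone DP (`pattern([0])`) can only come out nom in this sketch, so any accusative on a sole argument has to come from somewhere else.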

There are other questions as well. For example, given that we can get the same surface forms either via a X0-DP relation or a XP-YP relation, learnability issues will most likely arise. Whenever we can do things in various ways then in any given circumstance the child will have to choose, presumably based on PLD. So there are potential learnability issues with this kind of mixed theory.

In addition, there are theoretical puzzles: why do the very same case values surface in these very different ways? Are there languages that distinguish the class of morphological cases assigned by CCT mechanisms and those assigned by DCT mechanisms? If there aren’t, why not?[15]

None of these questions are unanswerable (I would assume), but it did surprise me how much narrower the scope of the DCT was than I had supposed it to be before Omer walked us through the details. It is basically a theory of the distribution of overt case in transitive clauses. It is not a general theory of overt case morphology (as sometimes advertised), or at least the distinctive features are not. From where I sit, it’s a pretty modest part of UG even if correct and it leaves large chunks of CCT potentially intact.

Given all of this, I would argue that the empirical advantages of the DCT had better be pretty evident if we are to add it to CCT. Here is what I mean. Given that it presupposes mechanisms and relations very like those found in CCT, then the argument that we need DCT ones in addition to those of CCT must argue either that CCT cannot handle the data that DCT addresses or that DCT handles it in a far superior manner. The arguments I have seen fall into the second disjunct. It is actually quite hard to argue that no version of CCT could accommodate the data (after all, there is always a possible (ad hoc) head around to do the CCT dirty work). So, the argument must be that even though CCT can do it, it does not do it nicely or elegantly. But given that adding the DCT to the required mechanics of CCT creates a theoretically more redundant account, then the superiority of DCT accounts had better be overwhelming to pay the cost of such theoretical enrichment. 

As a personal matter, I have not been convinced that the cost is worth it, but others will no doubt differ in their judgments. It seems to me, however, that given that DCT requires something like CCT to get some of the case values right, even a cumbersome CCT account might not be theoretically worse off than a DCT account that incorporates a CCT component.

That said, none of this is why I favor the CCT accounts. The main reason is that I understand how a CCT could be incorporated into a minimalist Merge based account of FL/UG.  Here’s what I mean.

There is a version of MP that I am very fond of. It adopts the following Fundamental Principle of Grammar (FPG):

FPG: For expressions A and B to grammatically interact, A and B must merge

In other words, the only way for expressions to grammatically interact is via Merge. They locally interact under E-merge and non-locally under I-merge (which, as you all know, is one and the same operation). Thus, e.g. selection or subcategorization or theta marking is under E-merge and antecedence or control or movement is under I-merge. I like the FPG. It’s simple. It even strikes me as natural as it enshrines the importance of constituency at its very core.

What’s important here is that it easily accommodates the CCT. A head A case marks a nominal B only if A and B Merge. In MP accounts, this would be I-merge. 

In contrast, I don’t see how to make DCT fit with this picture. The inter-nominal dependency that marks the most fundamental relation is not plausibly a chain relation formed under I-merge.  So, it is not a possible legit relation at all if we buy into something like the FPG. As I have so purchased, the DCT would induce in me a strong sense of caveat emptor.

I know, I know; one theorist’s modus ponens is another’s modus tollens. But if you like the FPG (which I do) then you won’t like replacing CCT with DCT (which I don’t).

Ok, enough. There are many reasons to get off this long post at many points. The real intent here has been to provoke DCTers (Omer!) to specify what they take the theory to be about and whether they agree that it is quite abstract and that it requires something like CCT as a sub-part. And if this is so, why we should be happy about it. So, I hope this provoked, and thx for the course Omer. It was terrifically enlightening.

[1] I’ve reviewed CCT before (here) so some of this will be redundant. That’s the nice thing about blog posts. They can be. We would never see anything similar in a refereed article where all contents are novel and nobody ever chews their food twice.
[2] Actually what goes into the “etc” is not set. Is dative structural? It can be. Genitive? Can be, but need not be. The clean ones in nom-acc languages are nominative and accusative, though even these can come from another source as we shall presently note.
[3] This does not require assuming that FL have but one morpho-phono realization scheme. There is room for an erg-abs as well as a nom-acc schema. But as these matters are way beyond my competence, I leave them with this briefest mention.
[4] I frankly have never fully grasped the distinction (if there is one) between inherent and lexical case. However, what is critical here is that it is semantically restricted, being the overt manifestation of a semantic role/value. This is decidedly not what we see with structural case, which seems in no way restricted thematically (or semantically for that matter).
[5] FWIW, IMO, structural case (both abstract and overt) is the really weird one. Were all cases lexical there would be nothing odd about them. They would just be the outward form of a semantic dependency. But this is precisely what structural case is not. And so why nominals bear or need them is quite unclear. At least it is to me. In fact, I would call it one of the great MP conundra: why case?
[6] This consequence is obvious if one assumes that case is discharged under I-merge. It is less obvious if case is discharged via Agree. The original Lasnik and Saito analysis assumed the former.
[7] We find similar data in many other languages (e.g. Japanese) where we can predict a DP’s scope properties from the case values it bears. 
[8] Interestingly, this data has been largely ignored by many mainstream minimalists for it remains inexplicable in theories where features “lower” from v to V and case is executed via Agree rather than I-merge.
[9] At the time, they were all Brandeis faculty.
[10] And as such, DCT should be in principle compatible with CCT. However, though this should be possible, my impression is that devotees of the DCT have been disinclined to treat the two in this way. I do not know why.
[11] It is not even clear to me that one can conclude that all dependent cases will realize a single value. This does not follow from the theory so far as I can tell.
[12] Of course, not all clauses with two nominals include a subject, but these are niceties that I don’t want to get into here.
[13] Note, if we could treat the relevant domain as the whole clause, then there would be no need for DP2 to move at all and the correlation between scope and case that Lasnik and Saito discussed would theoretically evaporate. 
[14] Note that the acc feature in ECM in CCT is a product of the particular head we have. The analogue of this stipulation in DCT is the size of the relevant domain. One might try to get around this theoretical clunkiness by assuming the relevant domains are phases (though deciding what is and isn’t a phase is no small theoretical feat nowadays either). But what matters is that the embedded clause is not phase like in (1). Were it such, the problem would reappear. By breaking the link between a theory of the distribution of DPs and their values, DCT has little to say about such cases except that something else must determine when the two subjects are in the same local domain relevant for determining dependent case.
[15] E.g. There are lexical cases (e.g. instrumental, comitative, ablative) that seem to be exclusively lexical. Are there any dependent cases that are exclusively dependent?
            Note that this is theoretically similar to an observation standardly made wrt resumptive pronouns. Why do they always assume the shape of “regular” bound pronouns? One answer is that they are bound pronouns, i.e. that there is no separate class of resumptives precisely because if there were we might expect them to have a distinctive morphology somewhere. The absence of a distinction argues for an absence of the category. Similar logic applies in the domain of dependent case vs head assigned case.

Friday, June 8, 2018

Science without theory

Sometimes we just don't know much and this puts us in an odd epistemic position. Not knowing much comes with an imperative of intellectual modesty: one should have relatively little confidence in one's descriptions of what is going on and even less in projections of what might happen counterfactually. Being ignorant is a real bummer, especially scientifically.

Now, all of this should be obvious. And to many it is. But it is also something that working scientists have a professional interest in "bracketing" (a term that roughly means "setting aside" that I learned as a philo grad student and that has come in very handy over the years as it sounds so much better than "ignore" (something an honest intellectual should not do (purportedly)) which is more or less what it amounts to), and so, not surprisingly, they largely do. Moreover, as nobody gets kudos for advertising their ignorance ((well, there was Socrates, I guess) or less kudos and occasionally a hemlock milkshake), scientists, especially given the current incentive system, are less forthcoming about the flimsiness of their conclusions than perhaps they should be. And this is a problem, for it is really hard to say anything useful when you have no idea what the hell is going on.

Why do I mention this? Rob Chametzky sent me a recent paper on this topic (here). The post (the author is Denny Borsboom (here), so I will dub the post DB) makes the reasonable point (reasonable to me as I have been making it as well for a while now) that the absence of "unambiguously formalized theory in psychology" lies behind much of the "replication" crisis in psych (p. 1).[1] It, moreover, suggests that this is more or less endemic to the discipline because "psychology has the hardest subject matter ever studied" (p. 3). I do not know whether this last point is correct, but it is certainly true that much of what psychologists insist on studying is almost surely the product of many many interacting systems (e.g. almost any social psych topic!) and it is well-known that interaction effects are very hard to disentangle, very very hard. So, it is not surprising that if these topics are the sorts of things that psychologists study then the level of non-trivial theory that exists is close to zero. That is what one would expect (and what one finds). DB traces out some of the implications of this for the replication crisis. Here are some consequences:

·      The field is heavily stats dependent, as stats methods substitute for theoretical infrastructure.
·      The role of stats can grow so great as to induce theoretical amnesia on the practitioners (a mental state wherein those in the field no longer know what a theory is (p. 2)).
·      Progress in atheoretical psych is necessarily very slow given that experiments are always trying to factor out poorly understood context dependent variables.
·      The discipline is susceptible to fads based on poorly tested generalizations that serve to make research manageable (at best) and allow for a kind of predatory free-riding (at worst).

Needless to say, this is not a pretty state of affairs. The solution for this? Well, of course, more care with the stats and a kind of re-education system for psychologists: "It would be extremely healthy if psychologists received more education in fields which do have some theories, even if they are empirically shaky ones" so that the discipline can try to remember what a theory is and what it is good for, so that we don't fall into theoretical amnesia (p. 3). 

I cannot say that I find DB's description all that far off base. However, I think that a few caveats are in order. Here are some.

First, why is theoretical amnesia a bad thing if the field is doomed to be forever theoryless given its endemic difficulty? It is useful to understand how theory functions in a field where it does so usefully if this utility can be imported into one's own. Then being able to recognize it and value it is important. But if this is impossible then why bother?  

I suspect that DB's real gripe is that there is theory to be had (maybe by reshaping the topics studied) but that psychologists have been trained to ignore it and to substitute stats methods for theoretical insight. If this is DB's point, then it is an important one. And it applies to many domains where empirical methods often overrun their useful boundaries. If this is DB's point, then there is a better way to put it: stats, no matter how technically fancy, cannot substitute for theory. Or, to put this in lay terms: lots of data carefully organized is not a theory, and thinking it is is just a confusion.

Second, the general point that DB makes (and I agree with) is not at all idiosyncratic. Gelman has made the point here (again) recently. The last paragraph is a good summation of DB's basic point:

…hypotheses in psychology, especially social psychology, are often vague, and data are noisy. Indeed, there often seems to be a tradition of casual measurement, the idea perhaps being that it doesn’t matter exactly what you measure because if you get statistical significance, you’ve discovered something. This is different from econ where there seems there’s more of a tradition of large datasets, careful measurements, and theory-based hypotheses. Anyway, psychology studies often (not always, but often) feature weak theory + weak measurement, which is a recipe for unreplicable findings.

Having little theory is a real problem even if one's aim is to get a decent stats description of the lay of the land. The problems one finds in theoryless fields are what one should expect and the methodological sloppiness comes with the territory. As Gelman puts it:

p-hacking is not the cause of the problem; p-hacking is a symptom. Researchers don’t want to p-hack; they’d prefer to confirm their original hypotheses. They p-hack only because they have to.

If this is right, however, I think that both Gelman's and RB's optimism that this can be solved by better methodological hygiene is probably unfounded. The problem is that there is a real cost to not knowing what the hell is going on, and that cost is not merely theoretical but observational as well. Why?

Here is a good place to play the Einstein card (roll the drums): theory is implicated in determining what counts as observational! Here's a quote: “It is the theory which decides what we can observe.”[2] To know what to count and how to count it (that's what stats does) you need a way of determining what should be counted and how (that's what theory does). So, no theory, no observations in the relevant scientific sense either. And if this is so, then when you really have no idea what the hell is going on, you are in deep doo-doo whether you know it or not. Can stats help? I doubt it. Being very careful and very cautious might help. But really the only thing to do in these cases is pray to the God of scientific traction for a bit of luck in getting you started. There is a reason why researchers who figure out anything are heroes. They are the people whose ideas allow us to get the ball rolling. Once it's rolling it's a whole new game. Until then. Nada! So, I doubt that the techno optimism that Gelman and RB point to, the idea that sans theory stats can step in and allow us to do another kind of useful science, will really fly. But then, I am a pessimist in general.

Third, I don't think that what holds for social psych is characteristic of the whole endeavor. There are large parts of psych broadly understood (e.g. large parts of learning theory (see Gallistel's work on this), development (see Carey, Spelke, Baillargeon and R. Gelman a.o.), perception, math capacities, language) where there is quite a bit of decent non-trivial theory that usefully guides inquiry. The problem is that social psych is where the fame and money are. You get on NPR for work on power poses, but not on Weber's law.

Fourth, this is really not the state of play in most of linguistics. We really do have some decent theory to fall back on in many parts of the core discipline (syntax, phonology, parts of semantics) and that is why we have been able to make non-trivial progress. The funny thing is that if RB and Gelman are correct, the brouhaha over linguistic data foisted upon the field by the stats-inclined has things exactly backwards: if they are right, the methods adopted by fields without any theoretical ideas are bad models for those that have some theoretical sub-structure.

Fifth, what RB and Gelman describe is really what we should expect. There is a long-standing hope that there exists a mechanical way of doing science (i.e. gaining insight). If we were just careful enough in how we gathered data, if we only got rid of our pre-conceptions, if only our morals were higher, we could just look and see the truth. On this view, the only reason this simple method fails is that we don’t do it right. This is, of course, a reflection of the old Empiricist dream (see the previous post for the logical Positivist version of this). It repeatedly fails. It will always fail.

That’s it. Thx to Rob for sending me the RM piece. 

[1] The adjective “formalized” actually understates the problem. There is precious little non-trivial theory in large parts of psych (especially social psych, the epicenter of the crisis), formalized or not. The problem is not “formalization” per se.
[2] Quoted in What is Real? by Adam Becker, p. 29. The philosopher Grover Maxwell made a similar point oh so many years ago: “It is theory…which tells us what is or is not…observable” (Becker 184). If this is right, then the idea of grounding science on some a priori conception of the observable independent of any theoretical assumptions is pretty much a non-starter. We indulge in “theory” either explicitly or tacitly. The optimism arises, I believe, by ignoring this or by hoping that for much of what we look at the tacit theory is pretty solid. Given that such theory will, when inexplicit, revert to “common sense,” this hope strikes me as idle, since common sense is precisely what scientific insight almost always overturns.