Monday, October 31, 2016
At Talking Brains, David has helpfully posted a must read list of cog material for the well educated cog-neuro person. Thought linguists might find the list useful as well. David has excellent taste, and the work he notes is actually a lot of fun to read.
Sunday, October 30, 2016
The first dropped shoe announced the “collapse” of science. It clearly dropped with a loud bang as this “news” has become a staple of conventional wisdom. The second shoe is poised and ready to drop. Its ambition? To explain why the first shoe fell. Now that we know that science is collapsing we all want to know why exactly it is doing so and whether there is anything we can do to bring back the good old days.
So why the fall? The current favorite answer appears to be a combination of bad incentives for ambitious scientists and statistical tools (significance testing being the current bête noire) that “gave scientists a mathematical machine for turning baloney into breakthroughs, and flukes into funding” ((now that’s a rhetorical flourish!) cited here p. 12). So, powerful tools in ambitious hands lead to scientific collapse. In fact, ambition may be beside the point; academic survival alone may be a sufficient motive. Put people in hyper competitive environments and give them a tool that “lets” them get their work “done” in a timely manner and all hell breaks loose.
I have just read several papers that develop this theme in great detail. They are worth reading, IMO, for they do a pretty good job of identifying real forces in contemporary academic research (and not limited to the sciences). These forces are not new. The above “baloney” quote is from 1998 and there are prescient observations relating to somewhat similar (though not identical) effects made as early as 1948. Here’s Leo Szilard (cited here):
Answer from the hero in Leo Szilard’s 1948 story “The Mark Gable Foundation” when asked by a wealthy entrepreneur who believes that science has progressed too quickly, what he should do to retard this progress: “You could set up a foundation with an annual endowment of thirty million dollars. Research workers in need of funds could apply for grants, if they could make a convincing case. Have ten committees, each composed of twelve scientists, appointed to pass on these applications. Take the most active scientists out of the laboratory and make them members of these committees. ...First of all, the best scientists would be removed from their laboratories and kept busy on committees passing on applications for funds. Secondly the scientific workers in need
of funds would concentrate on problems which were considered promising and were pretty certain to lead to publishable results. ...By going after the obvious, pretty soon science would dry out. Science would become something like a parlor game. ...There would be fashions. Those who followed the fashions would get grants. Those who wouldn’t would not.”
The papers I’ve read come in two flavors. The first are discussions of the perils of p-values. Those who read the Andrew Gelman blog are already familiar with many of the problems. The main issue seems to be that phishing for significance is extremely hard to avoid, even by those with true hearts and noble natures (see the Simonsohn (a scourge of p-hacking) quote here). Here (and the more popular here) are a pair of papers that go into how this works in ways that I found helpful. One important point the author (David Colquhoun (DC)) makes is that the false discovery (aka: the false positive) problem is quite general, and endemic to all forms of inductive reasoning. It follows from the “obvious rules of conditional probabilities.” So this is not just a problem for Fisher and significance testing, but applies to all modes of inductive inquiry, including Bayesian modes.
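DC’s point that false discoveries follow from the rules of conditional probability can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the input numbers (10% of tested hypotheses being true, 80% power, a 0.05 significance threshold) are assumptions chosen for the example, not figures taken from this post.

```python
def false_discovery_rate(prior, power, alpha):
    """Share of 'significant' results that are false positives.

    prior: assumed fraction of tested hypotheses that are really true
    power: probability that a real effect yields a significant result
    alpha: probability that a null effect yields a significant result
    """
    true_positives = prior * power          # real effects that are detected
    false_positives = (1 - prior) * alpha   # null effects that pass the test
    return false_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 in 10 hypotheses true, 80% power, p < 0.05.
fdr = false_discovery_rate(prior=0.10, power=0.80, alpha=0.05)
print(f"{fdr:.0%}")  # prints 36%
```

Under these assumed inputs, more than a third of the “discoveries” are false even though every test was run honestly at the 0.05 level. Nothing in the calculation is specific to significance testing; it is just conditional probability applied to a field in which most tested hypotheses are false.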
Assuming this is right and that even the noble might be easily misled statistically, is there some way of mitigating the problem? One rather pessimistic paper suggests that the answer is no. Here (with a popular exposition here) is a paper that gives an evolutionary model of how bad science must win out over good in our current academic environment. It is a kind of Gresham’s law theory where quick successful bad work floods less quick, careful good work. In fact, the paper argues that not even a culture where replication is highly valued will stop bad work from pushing out the good so long as “original” research remains more highly valued than “mere” replication.
The authors, Smaldino and McElreath (S&M), base these grim projections on an evolutionary model they develop which tracks the reward structure of publication and the incentives that these impose on individuals and labs. I am no expert in these matters, but the model looks reasonable enough and the forces it identifies and incorporates seem real enough. The solution: shift from a culture that rewards “discovery” to one that rewards “understanding.”
I personally like the sound of this (see below), but I am skeptical that it is operationalizable, at least institutionally. The reason is that valuing understanding requires exercising judgment (it involves more than simple bookkeeping) and this is both subjective (and hence hard to defend in large institutional settings) and effortful (which makes it hard to get busy people to do). Moreover, it requires some very non-trivial understanding of the relevant disciplines and this is a lot to expect even within small departments, let alone university wide APT committees or broad based funding agencies. A tweet by a senior scientist (quoted in S&M p.2) makes the relevant point: “I’ve been on a number of search committees. I don’t remember anybody looking at anybody’s papers. Number and IF [impact factor] of pubs are what counts.” I don’t believe that this is only the result of sloth and irresponsibility. In many circumstances it is silly to rely on your own judgment. Given how specialized so much good work has become, it is unreasonable to think that we can as individuals make useful judgments about the quality of work. I don’t see this changing, especially above the department level anytime soon.
Let me belabor this. It is not clear how people above the department level would competently judge work outside their area of expertise. I know that I would not feel competent to read and understand a paper in most areas outside of syntax, especially if my judgment carried real consequences. If so, who can we get to judge whose judgments would be reasonable? And if there is no one then what can one do but count papers weighted by some “prestige” factor? Damn if I know. So, I agree that it would be nice if we could weight matters towards more thoughtful measures that involved serious judgment, but this will require putting most APT decisions in the hands of those that can make these judgments, namely leave them at effectively the department level, which will not be happening anytime soon (and which has its own downsides if my own institution is anything to go by).
An aside: this is where journals should be stepping in. However, it appears that they are no longer reliable indicators of quality. Many are very conservative institutions whose stringent review processes tend to promote “safe” incremental findings. Many work hard to protect their impact factors to the point of only very reluctantly publishing work critical of previously published work. Many seem just a stone’s throw removed from show business where results are embargoed until an opening day splash can be arranged. At any rate, professional journals are a venue in which responsible judgment could be exercised, but it appears that this is difficult even here.
So, there are science (indeed academy) wide forces imposing shallow measures for evaluation and reward that bad statistical habits can successfully game. I have no problem believing this. But I still do not see how these forces suffice to explain the “crisis” before us. Why? Because such explanations are too general and the problems appear to hold not in general but in localizable domains of inquiry. More exactly, the incentives S&M cite and the problems of induction that DC elaborates are pervasive. Nonetheless, the science (more particularly, replication) crisis seems localized in specific sub-areas of investigation, ones that I would describe as more concerned with establishing facts than with detailing causal mechanisms. Here’s what I mean.
What’s the aim of inquiry? For DC it is “to establish facts, as accurately as possible” (here, 1). For me, it is to explain why things are as they are. Now, I concede that the second project relies on the first. But I would equally claim that the first relies on the second. Just as we need facts to verify theories, we need theories to validate facts. The main problem with lots of “science” (and I am sure you won’t be surprised to hear me write this) is that it is theory free. Thus, the only way to curb its statistical enthusiasm is by being methodologically pristine. You gotta get the stats exactly right for this is the only thing grounding the result. In most cases of drug trials, for example, we have no idea why they work, and for practical purposes we may not (immediately) care. The question is do they, not how. Sciences stuck in the “does it” stage rather than the “how does it do it and why” stages, not surprisingly, have it tough. Fact gathering in the absence of understanding is going to be really hard even with great stats tools. Should we be surprised that in areas where we know very little stats can and do regularly mislead?
Note that the real sciences do not seem to be in the same sad state as psych, bio-med and neuroscience. You don’t see tons of articles explaining how the physics of the last 20 years is rotten to its empirical core. Not that Nobel winning results are not challenged. They can be and are. Here’s a recent example in which dark energy and the thesis that the universe is expanding at an accelerating rate are being challenged (see here) based on more extensive data. But in this case, evaluation of the empirical possibilities heavily relies on a rich theoretical background. Here’s a quote from one of the lead critics. Note how the critique relies on an analysis of an “oversimplified theoretical model” and how some further theoretical sophistication would lead to different empirical results. This interplay between theory and data (statistically interpreted data by and large) is not available in domains where there is no “fundamental theory” (i.e. non-trivial theory).
'So it is quite possible that we are being misled and that the apparent manifestation of dark energy is a consequence of analysing the data in an oversimplified theoretical model - one that was in fact constructed in the 1930s, long before there was any real data. A more sophisticated theoretical framework accounting for the observation that the universe is not exactly homogeneous and that its matter content may not behave as an ideal gas - two key assumptions of standard cosmology - may well be able to account for all observations without requiring dark energy. Indeed, vacuum energy is something of which we have absolutely no understanding in fundamental theory.'
So, IMO, the problem with most problematic “science” is that it is not yet really science. It has not moved from the earliest data collection stage to the explanation stage where what’s at issue are not facts but mechanisms. If this is roughly right, then the “end of science” problems will dissipate as understanding deepens (if it ever does (no guarantee that it will or should)) in these domains. So understood, the demise of science that replication problems herald is more a problem for the particular areas identified (and more an indication of how little is known here) than for science as a whole.
That said, let me end with one or two caveats. The science-in-crisis narrative rests on the slew of false discoveries regularly churned out. Szilard’s worry mooted in the quote above is different. His worry is not false discoveries but the trivialization of research as big science promotes quantity and incrementalism over quality and concern for the big issues. Interestingly, this too is a recurrent theme. Szilard voiced this worry over 60 years ago. More recently (the last 15 years or so), Peter Lawrence voiced similar concerns in two pieces that discuss Szilard’s problem in the context of how scientific work is evaluated for granting and publication (here and here). And the problem is discussed in very much the same terms today. Here (and here) are two papers in Nature from 2016 which address virtually the same questions in virtually the same terms (i.e. how institutions reward more of the same research, punish thinking about new questions, look at publication numbers rather than judge quality etc.). What is striking is that this is all stuff noted and lamented before and the proposed fixes are pretty much the same: calls for judgment to replace auditing.
I agree that this would be a good idea. In fact, I believe that one of the reasons for the disparagement of theory in linguistics is a reflection of the same demands it makes on judgment for adequate evaluation. It is easier to see if a story “captures” the facts than to see if it offers an interesting explanation. So I am all in favor of promoting judgment as an important factor in scientific evaluation. However, to repeat, I am skeptical this is actually doable as judgment is not something that bureaucracies do well and like it or not, today science is big and so, not surprisingly, it comes with a large bureaucracy attached. Let me explain.
Today science is conducted in big settings (universities, labs, foundations, funding agencies). Big settings engender bureaucratic oversight, and not for entirely bad reasons. Bureaucracies arise in response to real needs where the actions of large numbers of people require coordination. And given the size of modern science, bureaucracy is inevitable. Unfortunately, bureaucracies by necessity favor blunt metrics over refined judgment (i.e. quantitative auditable measures over nuanced hard to compare evaluations). And all of this fosters the problems that Szilard and Lawrence and the Nature comments worry about. As noted, I think that this is simply unavoidable given the current economics of research. The hopeful (e.g. Lawrence) think that there are ways of mitigating these trends. I hope they are right. However, given the fact that this problem recurs regularly and the same solutions get suggested just as regularly, I have my doubts.
Let me end on a more positive note. It may not be possible to inject judgment into the process in a systematic way. However, it may be possible to find ways to promote unconventional research by having a sub-part of the bureaucracy looking for it. In the old days when money was plentiful, “whacky” research got institutional support because everything did (think of the early days of GG funding, or early CS). When money gets scarcer we still need to put some aside to support unconventional work. This is a problem in portfolio management: put most of your cash on safe stuff and 10% or so on unconventional stuff. The latter will mostly fail, but when it pays off, it pays off big. The former rarely fails, but its payoffs are small. Maybe the best we can do right now is allow our institutions to start thinking about the wild 10% just a little bit more.
So, the replication crisis will take care of itself as it is largely a reflection of the primitive nature of most of the “science” that it infects. The trivialization problem, IMO, is more serious and here, IMO, the problem is and will remain much harder to solve.
 I have long thought that stats should be treated a little like the Rabbis treated Kabbalah. The Rabbis banned its study as too dangerous until the age of forty, i.e. explosive in the hands of clever but callow neophytes.
 The collapse seems to be restricted. In psych, it is largely restricted to social psych. Perception and cognition, for example, seem relatively immune to the non-replicability disease. In bio-medicine, the bio part also seems healthy. Nobody is worrying about the findings in basic cell biology or physiology. The problem seems limited to non “basic” discoveries (e.g. is cholesterol/fat bad for you, does such and such drug work as advertised, and so on). In neuroscience the problems also seem largely restricted to fMRI results of the sort that make it into the NYTs. If one were inclined to be skeptical, one might say that the problems arise not in those areas where we know something about the underlying mechanisms but in those domains where we know relatively little. But who would be so skeptical?
 The search for explanation ends up generating novel data (facts). But the aim is not to establish new facts but to understand what is going on. In the absence of theory it might even be hard to know what a “fact” is. Is it a fact that the sun rises in the East and sets in the west? Well, yes and no. It depends.
 It also reflects the current scientism of the age. Nothing nowadays is legit unless wrapped up in scientific looking layers. Not surprisingly much trivial insight is therefore statistically marinated so that it can look scientific.
Thursday, October 27, 2016
Monday, October 24, 2016
Let’s say we find two languages displaying a common pattern, or two languages converging towards a common pattern, or even all languages doing the same. How should we explain this? Stephen Anderson (here, and discussed by Haspelmath here) notes that if you are a GGer there are three available options: (i) the nature of the input, (ii) the learning theory and (iii) the cognitive limits of the LAD (be they linguistically specific or domain general). Note that (ii) will include (iii) as a subpart and will have to reflect the properties of (i) but will also include all sorts other features (cognitive control, structure of memory and attention, the number of options the LAD considers at one time etc.). These, as Anderson notes, are the only options available to a GGer for s/he takes G change to reflect the changing distribution of Gs in the heads of a population of speakers. Or, to put this more provocatively: languages don't exist apart from their incarnation in speakers’ minds/brains. And given this, all diachronic “laws” (laws that explain how languages or Gs change over time) must reflect the cognitive, linguistic or computational properties of human minds/brains.
This said, Haspelmath (H) observes (here and here) (correctly in my view) that GGers have long “preferred purely synchronic ways of explaining typological distributions,” and by this he means explanations that allude to properties of the “innate Language Faculty” (see here for discussion). In other words, GGers like to think that typological differences reflect intrinsic properties of FL/UG and that studying patterns of variation will hence shed light on its properties. I have voiced some skepticism concerning this “hence” here. In what follows I would like to comment on H’s remarks on a similar topic. However, before I get into details I should note that we might not be talking about the same thing. Here’s what I mean.
The way I understand it, FL/UG bears on properties of Gs not on properties of their outputs. Hence, when I look at typology I am asking how variation in typologies and historical change might explain changes in Gs. Of course, I use outputs of these Gs to try to discern the properties of the underlying Gs, but what I am interested in is G variation not output variation. This concedes that one might achieve similar (identical?) outputs from different congeries of G rules, operations and filters. In effect, whereas changing surface patterns do signal some change in the underlying Gs, similarity of surface patterns need not. Moreover, given our current accounts there are (sadly) too many roads to Rome. Thus, the fact that two Gs generate similar outputs (or have moved towards similar outputs from different Gish starting points) does not imply that they must be doing so in the same way. Maybe they are and maybe not. It really all depends.
Ok back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of changes,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.” He is somewhat skeptical that we have currently identified diachronic mechanisms to explain such changes and that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than “common paths of change,” namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?
Recall, we need to keep our questions clear. Say that we have identified an actual NUT (i.e. we have compelling evidence that certain kinds of G changes are “preferred”). If we have this and we find another G changing in the same direction then we can attribute this to that same NUT. So we explain the change by so attributing it. Well, in part: we have identified the kind of thing it is even if we do not yet know why these types of things exist. An analogy: I have a pencil in my hand. I open it. The pencil falls. Why? Gravitational attraction. I then find out that the same thing happens when I have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and any other school supply at hand. I conclude that these falls are all instances of the same causal power (i.e. gravity). Have I explained why when I pick up a thumbtack and let it loose and it too falls that it falls because of gravity? Well, up to a point. A small point IMO, but a point nonetheless. Of course we want to know how Gravity does this, what exactly it does when it does it and even why it does it the way that it does, but classifying phenomena into various explanatory pots is often a vital step in setting up the next step of the investigation (viz. identifying and explaining the properties of the alleged underlying “force”).
This said, I agree that the explanation is pretty lame if left like this. Why did X fall when I dropped it? Because everything falls when you drop it. Satisfied? I hope not.
Sadly, from where I sit, many explanations of typological difference or diachronic change have this flavor. In GG we often identify a parameter that has switched value and (more rarely) some PLD that might have led to the switch. This is devilishly hard to do right and I am not dissing this kind of work. However, it is often very unsatisfying given how easy it is to postulate parameters for any observable difference. Moreover, very few proposals actually do the hard work of sketching the presupposed learning theory that would drive the change or looking at the distribution of PLD that the learning theory would evaluate in making the change. To get beyond the weak explanations noted above, we need more robust accounts of the nature of the learning mechanisms and of the data input to them (PLD) that led to the change. Absent this, we have an explanation of only a very weak sort.
Would H agree? I think so, but I am not absolutely sure of this. I think that H runs together things that I would keep separate. For example: H considers Anderson’s view that many synchronic features of a G are best seen as remnants of earlier patterns. In other words, what we see in particular Gs might be reflections of “the shaping effects of history” and “not because the nature of the Language Faculty requires it” (H quoting Anderson: p. 2). H rejects this for the following reason: he doesn’t see “how the historical developments can have ‘shaping effects’ if they are ‘contingent’” (p. 2). But why not? What does the fact that something is contingent have to do with whether it can be systematically causal? 1066 and all that was contingent, yet its effects on “English” Gs have been long lasting. There is no reason to think that contingent events cannot have long lasting shaping effects.
Nor, so far as I can tell, is there reason to think that this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies might not explain “universal tendencies.” Here’s what I mean.
Let’s for the sake of argument assume that there are around 50 different parameters (and this number is surely small). This gives a space of possible Gs (assuming the parameters are binary and independent) of 2^50, or roughly 1.1 × 10^15. The current estimate of different languages out there (and I assume, maybe incorrectly, Gs) is on the order of 7,000, at least that’s the number I hear bandied about among typologists. This number is minuscule. It covers less than a billionth of a percent of the possible space. It is not inconceivable that languages in this part of the space have many properties in common purely because they are all in the same part of the space. These common properties would be contingent in a UG sense if we assumed that we only accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these properties. It is even possible that it is hard to get to any other of the G possibilities given that we are in this region. On this sort of account, there might be many apparent universals that have no deep cognitive grounding and are nonetheless pervasive. Don’t get me wrong, I am not saying these exist, only that we really have no knock down reason for thinking they do not. And if something like this could be true, then the fact that some property did or didn’t occur in every G could be attributed to the nature of the kind of PLD our part of the G space makes available (or how this kind of PLD interacts with the learning algorithm). This would fit with Anderson’s view: contingent yet systematic and attributable to the properties of the PLD plus learning theory.
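The size of this space is easy to check. What follows is a minimal sketch, assuming each of the 50 parameters is binary (the post does not say how many values a parameter can take) and taking the 7,000 figure at face value; the variable names are mine.

```python
# Assumption: every parameter is binary and independent of the others.
n_parameters = 50
n_languages = 7_000  # rough count of attested languages cited in the post

space = 2 ** n_parameters          # number of possible Gs
fraction = n_languages / space     # share of the space that is attested

print(f"{space:,}")                # prints 1,125,899,906,842,624
print(f"{fraction:.1e}")           # share of the space, as a fraction
print(f"{fraction * 100:.1e} %")   # the same share, as a percentage
```

Under these assumptions the attested languages sample a vanishingly small corner of the space, and allowing parameters more than two values would only make that corner smaller. That is all the argument needs.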
I don’t think that H (nor most linguists) would find this possibility compelling. If something is absent from 7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well maybe not. My only claim is that the basis for this confidence is not particularly clear. And thinking through this scenario makes it clear that gaps in the existing language patterns/Gs are (at best) suggestive about FL/UG properties rather than strongly dispositive. It could be our ambient PLD that is responsible. We need to see the reasoning. Culbertson and Adger provide a nice model for how this might be done (see here).
One last point: what makes PoS arguments powerful is that they are not subject to this kind of sampling skepticism. PoS arguments really do, if successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoSs abstract away from PLD altogether and so remove this as a causal source of systematicity. Hence, PoSs short-circuit the skeptical suggestions above. Of course, the two kinds of investigation can be combined. However, it is worth keeping in mind that typological investigations will always suffer from the kind of sampling problem noted above and will thus be less direct probes of FL/UG than will PoS considerations. This suggests, IMO, that it would be very good practice to supplement typologically based conclusions with PoS style arguments. Even better would be explicit learning models, though these will be far more demanding given how hard it likely is to settle on what the PLD is for any historical change.
I found H’s discussion of these matters to be interesting and provocative. I disagree with many things that H says (he really is focused on languages rather than Gs). Nonetheless, his discussion can be translated well enough into my own favored terms to be worth thinking about. Take a look.
 I say ‘apparent’ for I know very little of this literature though I am willing to assume H is correct that these exist for the sake of argument.
 Which does not mean that we lack nice models of what better accounts might look like. Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William Sakas, Charles Yang, a.o., have provided excellent models of what such explanations would look like.
 Again a nice example of this is Culbertson and Adger’s work discussed here. It develops an artificial G argument (meatier than a simple PoS argument) to more firmly ground a typological conclusion.
 Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example, shows.
Tuesday, October 18, 2016
I have a question: what’s the “natural” size of a publishable linguistics paper? I ask because after indulging in a reading binge of papers I had agreed to look at for various reasons, it seems that 50 is the assumed magic number. And this number, IMO, is too high. If it really takes 50 pages for you to make your point, then either you are having trouble locating the point that you want to make, or you are trying to make too many of them in a single paper. Why care?
I care about this for two reasons. First I think that the size of the “natural” paper is a fair indicator of the theoretical sophistication of a field. Second, I believe that if the “natural” size is, say, 50 pages, then 50 pages will be the benchmark of a “serious” paper and people will aim to produce 50 page papers even if this means taking a 20 page idea and blowing it up to 50 pages. And we all know where this leads. To bloated papers that make it harder than it should be (and given the explosion of new (and excellent) research, it’s already harder than it used to be) to stay current with the new ideas in the field. Let me expand on these two points just a bit.
There are several kinds of linguistics papers. The ones that I am talking about would be classified as theoretical linguistics, specifically syntax. The aim of such a paper is to make a theoretical point. Data and argument are marshaled in service of making this point. Now, in a field with well-developed theory, this can usually be done economically. Why? Because the theoretical question/point of interest can be crisply stated and identified. Thus, the data and arguments of interest can be efficiently deployed wrt this identified theoretical question/point. The less theoretically firm the discipline the harder it is to do this well and the longer (more pages) it takes to identify the relevant point etc. This is what I mean by saying that the size of the “natural” paper can be taken as a (rough) indicator of how theoretically successful a field is. In the “real” sciences, only review papers go on for 50 pages. Most are under 10 and many are less than that (it is called “Phys Rev Letters” for a reason). In the “real” sciences, one does not extensively review earlier results. One cites them, takes what is needed and moves on. Put another way, in “real” sciences one builds on earlier results, one does not rehearse them and re-litigate them. They are there to be built on and your contribution is one more brick in a pretty well specified wall of interlocking assumptions, principles and empirical results.
This is less true in theoretical syntax. Most likely it is because practitioners do not agree as widely about the theoretical results in syntax as people in physics agree about the results there. But I suspect that there is another reason as well. In many of the real sciences, papers don’t locally aim for truth (of course, every scientific endeavor globally does). Here’s what I mean.
Many theoretical papers are explorations of what you get by combining ideas in a certain way. The point of interest is that some combinations lead to interesting empirical, theoretical or conceptual consequences. The hope is that these consequences are also true (evaluated over a longer run), but the immediate assumption of many papers is that the assumptions are (or look) true enough (or are interesting enough even if recognizably false) to explore even if there are (acknowledged) problems with them. My impression is that this is not the accepted practice in syntax. Here if you start with assumptions that have “problems” (in syntax, usually, (apparent) empirical difficulties) then it is thought illegitimate to use these assumptions or further explore their consequences. And this has two baleful influences on paper writing: it creates an incentive to fudge one’s assumptions and/or creates a requirement to (re)defend them. In either case, we get pressure to bloat.
A detour: I have never really understood why exploring problematic assumptions (PA) is so regularly dismissed. Actually, I do understand. It is a reflex of theoretical syntax’s general anti-theoretical stance. IMO, theory is that activity that explores how assumptions connect to lead to interesting consequences. That’s what theoretical exploration is. If done correctly, it leads to a modicum of explanation.
This activity is different from how theory is often described in the syntax literature. There it is (often) characterized as a way of “capturing” data. On this view, the data are unruly and wild and need to be corralled and tamed. Theory is that instrument used to pen them in. But if your aim is to “capture” the data, then capturing some while losing others is not a win. This is why problematic assumptions (PA) are non grata. Empirically leaky PAs are not interesting precisely because they are leaky. Note, then, that the difference between “capturing” and “explaining” is critical. Leaky PAs might be explanatorily rich even if empirically problematic. Explanation and data coverage are two different dimensions of evaluation. The aim, of course, is to get to those accounts that both explain and are empirically justified. The goal of “capture” blurs these two dimensions. It is also, IMO, very counterproductive. Here’s why.
Say that one takes a PA and finds that it leads to a nice result, be it empirical or theoretical or conceptual. Then shouldn’t this be seen as an argument for PA regardless of its other problems? And shouldn’t this also be an argument that the antecedent problems the PA suffers from might possibly be apparent rather than real? All we really can (and should) do as theorists is explore the consequences of sets of assumptions. One hopes that over time the consequences as a whole favor one set over others. Hence, there is nothing methodologically inapposite in assuming some PA if it fits the bill. In fact, it is a virtue theoretically speaking for it allows us to more fully explore that idea and see if we can understand why even if false it seems to be doing useful work.
Let’s now turn to the second more pragmatic point. There has been an explosion of research in syntax. It used to be possible to keep up with everything by reading it. I don’t believe that this is still possible. However, it would be easier to stay tuned to the important issues if papers were more succinct. I think I’ve said this on FOL before (though I can’t recall where), but I have often found that the short-form version of a paper (say a NELS or WCCFL version) is more useful than its longer, more elaborated, later-published descendant. Why? Because the longer version is generally more “careful,” and not always in a good way. By this I mean that there are replies to reviewers that require elaboration but that often obscure the main idea. Not always, but often enough.
So as not to end on too grumpy a note, let me suggest the following template for syntax papers. It answers three questions: What’s the problem? Why is it interesting? How to solve it?
The first section should be short and to the point. A paper that cannot identify a crisp problem is one that should likely be rewritten.
The second section should also be short, but it is important. Not all problems are equally interesting. It’s the job of a paper to indicate why the reader should care. In linguistics this means identifying how the results bear on the structure of FL/UG. What light does your question, if answered, hope to shed on the central question of modern GG, the fine structure of FL?
The last section is the meat, generally. Only tell the reader enough to understand the explanation to the question being offered. For a theory paper, raw data should be offered but the discussion should proceed by discussing the structures that these data imply. GGers truck in grammars, which truck in rules and structures and derivations. A theory paper that is not careful and explicit about these is not written correctly. Many papers in very good journals take great care to get the morphological diacritics right in the glosses but often eschew providing explicit derivations and phrase markers that exhibit the purported theoretical point. For GG, God is not in the data points, but in the derivations etc. that these data points are in service of illuminating.
Let me go a bit over the top here. IMO, journals would do well to stop publishing most data, reserving this for methods addenda available online. The raw data is important, and the exposition should rely on it and make it available, but the exposition should advert to it, not present it. This is now standard practice in journals like Science and there is no reason why it should not be standard practice in ling journals too. It would immediately cut down the size of most articles by at least a third (try this for a typical NLLT paper for example).
Only after the paper has offered its novelties should one compare what’s been offered to other approaches in the field. I agree that this suggestion should not be elevated to a hard and fast rule. Sometimes a proposal is usefully advanced by demonstrating the shortcomings in others that it will repair. However, more often than not comparisons of old and new are hard to make without some advance glimpse of the new. In my experience, comparison is most useful after the fact.
Delaying comparison will also have another positive feature, I believe. A proposal might be interesting even if it does no better than earlier approaches. I suspect that we upfront “problems” with extant hypotheses because it is considered illicit to offer an alternative unless the current favorite is shown to be in some way defective. There is a founder prejudice operative that requires that the reigning champion not be discomfited unless proven to be inferior. But this is false. It is useful to know that there are many routes to a common conclusion (see here for discussion). It is often even useful to have an alternative that does less well.
So: What, Why, How, with a 15-20 page limit and the hope of lowering this to 10-15. If that were to happen I would feel a whole lot guiltier for being so far behind in my reading.
 This might be showing my age, for I think that it is well nigh impossible nowadays to publish a short version of a paper in a NELS or WCCFL proceedings and then an elaborated version in a more prestigious journal. If so, take it from me!