Monday, October 31, 2016
Professor Poeppel's suggested reading list
At Talking Brains, David has helpfully posted a must-read list of cog material for the well-educated cog-neuro person. Thought linguists might find the list useful as well. David has excellent taste, and the work he notes is actually a lot of fun to read.
Sunday, October 30, 2016
More on the collapse of science
The first dropped shoe announced the “collapse” of science.
It clearly dropped with a loud bang as this “news” has become a staple of
conventional wisdom. The second shoe is poised and ready to drop. Its
ambition? To explain why the first shoe fell. Now that we know that science is collapsing we all want to know why
exactly it is doing so and whether there is anything we can do to bring back
the good old days.
So why the fall? The current favorite answer appears to be a
combination of bad incentives for ambitious scientists and statistical tools
(significance testing being the current bête noire) that “gave scientists a
mathematical machine for turning baloney into breakthroughs, and flukes into
funding” (now that’s a rhetorical
flourish!) (cited here, p.
12). So, powerful tools in ambitious hands lead to scientific collapse. In
fact, ambition may be beside the point, academic survival alone may be a sufficient
motive. Put people in hypercompetitive environments and give them a tool that “lets”
them get their work “done” in a timely manner and all hell breaks loose.[1]
I have just read several papers that develop this theme in
great detail. They are worth reading, IMO, for they do a pretty good job of
identifying real forces in contemporary academic research (and not limited to
the sciences). These forces are not new. The above “baloney” quote is from 1998
and there are prescient observations relating to somewhat similar (though not
identical) effects made as early as 1948. Here’s Leo Szilard (cited here):
Answer from the hero in Leo Szilard’s 1948 story
“The Mark Gable Foundation” when asked by a wealthy entrepreneur who believes
that science has progressed too quickly, what he should do to retard this
progress: “You could set up a foundation with an annual endowment of thirty
million dollars. Research workers in need of funds could apply for grants, if they
could make a convincing case. Have ten committees, each composed of twelve
scientists, appointed to pass on these applications. Take the most active
scientists out of the laboratory and make them members of these committees.
...First of all, the best scientists would be removed from their laboratories
and kept busy on committees passing on applications for funds. Secondly the scientific
workers in need
of funds would concentrate on problems which were considered promising and were pretty certain to lead to publishable results. ...By going after the obvious, pretty soon science would dry out. Science would become something like a parlor game. ...There would be fashions. Those who followed the fashions would get grants. Those who wouldn’t would not.”
The papers I’ve read come in two flavors. The first are
discussions of the perils of p-values. Those who read the Andrew Gelman blog
are already familiar with many of the problems. The main issue seems to be that
fishing for significance is extremely hard to avoid, even by those with true
hearts and noble natures (see the Simonsohn (a scourge of p-hacking) quote here).
Here
(and the more popular here)
are a pair of papers that go into how this works in ways that I found helpful.
One important point the author (David Colquhoun (DC))
makes is that the false discovery (aka: the false positive) problem is quite
general, and endemic to all forms of inductive reasoning. It follows from the
“obvious rules of conditional probabilities.” So this is not just a problem for
Fisher and significance testing, but applies to all modes of inductive inquiry,
including Bayesian modes.
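DC's point can be made concrete with a back-of-the-envelope calculation. A minimal sketch, assuming (purely for illustration, these numbers are not from the post) that 10% of tested hypotheses are actually true, the significance threshold is 0.05, and statistical power is 0.8:

```python
# False discovery rate from the "obvious rules of conditional probabilities".
# All three inputs are illustrative assumptions, not data.

def false_discovery_rate(prior_true, alpha, power):
    """P(hypothesis is false | the test came out significant)."""
    true_positives = prior_true * power          # real effects detected
    false_positives = (1 - prior_true) * alpha   # null effects passing p < alpha
    return false_positives / (true_positives + false_positives)

fdr = false_discovery_rate(prior_true=0.10, alpha=0.05, power=0.8)
print(f"{fdr:.0%}")  # -> 36%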
Assuming this is right and that even the noble might be
easily misled statistically, is there some way of mitigating the problem? One
rather pessimistic paper suggests that the answer is no. Here (with
a popular exposition here)
is a paper that gives an evolutionary model of how bad science must win out over good in our current
academic environment. It is a kind of Gresham’s law theory where quick
successful bad work floods less quick, careful good work. In fact, the paper
argues that not even a culture where replication is highly valued will stop bad
work from pushing out the good so long as “original” research remains more highly
valued than “mere” replication.
The authors, Smaldino and McElreath (S&M), base these
grim projections on an evolutionary model they develop which tracks the reward
structure of publication and the incentives that these impose on individuals and
labs. I am no expert in these matters, but the model looks reasonable enough
and the forces it identifies and incorporates seem real enough. The solution: shift
from a culture that rewards “discovery” to one that rewards “understanding.”
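For flavor, here is a toy simulation in the general spirit of the S&M selection dynamic. It is a drastic simplification of their actual model and every parameter below is invented for illustration: labs that put less effort into each study publish more, the most-published labs get copied, and mean effort drifts downward without anyone intending it.

```python
import random

random.seed(0)

# Toy selection dynamic, loosely in the spirit of Smaldino & McElreath's
# model. A drastic simplification; all numbers are invented illustrations.
N_LABS, GENERATIONS = 100, 50

# Each lab has an "effort" level in (0, 1]: high effort = careful, slow work.
labs = [random.uniform(0.1, 1.0) for _ in range(N_LABS)]
initial_mean = sum(labs) / len(labs)

for _ in range(GENERATIONS):
    # Payoff = publication count, which falls as effort rises (careful work
    # is slower), plus a little noise.
    payoffs = [10 * (1.1 - e) + random.gauss(0, 0.5) for e in labs]
    # Selection: the most-published half of the labs is copied (each twice,
    # with a small mutation on effort); the rest disappear.
    ranked = [e for _, e in sorted(zip(payoffs, labs), reverse=True)]
    survivors = ranked[: N_LABS // 2]
    labs = [min(1.0, max(0.05, e + random.gauss(0, 0.02)))
            for e in survivors for _ in range(2)]

final_mean = sum(labs) / len(labs)
print(f"mean effort: {initial_mean:.2f} -> {final_mean:.2f}")
```

The point of the sketch is only that no individual has to behave badly: selection on publication counts alone is enough to erode effort.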
I personally like the sound of this (see below), but I am
skeptical that it is operationalizable, at least institutionally. The reason is
that valuing understanding requires exercising judgment (it involves more than
simple bookkeeping) and this is both subjective (and hence hard to defend in
large institutional settings) and effortful (which makes it hard to get busy
people to do). Moreover, it requires some very non-trivial understanding of the
relevant disciplines and this is a lot to expect even within small departments,
let alone university-wide APT committees or broad-based funding agencies. A
tweet by a senior scientist (quoted in S&M p.2) makes the relevant point:
“I’ve been on a number of search committees. I don’t remember anybody looking
at anybody’s papers. Number and IF [impact factor] of pubs are what counts.” I
don’t believe that this is only the
result of sloth and irresponsibility. In many circumstances it is silly to rely
on your own judgment. Given how specialized so much good work has become, it is unreasonable to think that we can as
individuals make useful judgments about the quality of work. I don’t see this
changing, especially above the department level anytime soon.
Let me belabor this. It is not clear how people above the
department level would competently judge work outside their area of expertise.
I know that I would not feel competent to read and understand a paper in most
areas outside of syntax, especially if my judgment carried real consequences.
If so, where can we find judges whose judgments would be reasonable? And if there is no one, then what can one do
but count papers weighted by some “prestige” factor? Damn if I know. So, I
agree that it would be nice if we could weight matters towards more thoughtful
measures that involved serious judgment, but this will require putting most APT
decisions in the hands of those that can make these judgments, namely leaving
them at effectively the department level, which will not be happening anytime
soon (and which has its own downsides if my own institution is anything to go
by).
An aside: this is where journals should be stepping in.
However, it appears that they are no longer reliable indicators of quality.
Many are very conservative institutions whose stringent review processes tend
to promote “safe” incremental findings. Many work hard to protect their impact
factors to the point of only very reluctantly publishing work critical of
previously published work. Many seem just a stone’s throw removed from show
business where results are embargoed until an opening day splash can be
arranged. At any rate, professional journals are a venue in which responsible
judgment could be exercised, but, it appears, even here it is difficult.
So, there are science (indeed academy) wide forces imposing
shallow measures for evaluation and reward that bad statistical habits can
successfully game. I have no problem believing this. But I still do not see how
these forces suffice to explain the “crisis” before us. Why? Because such explanations are too general and
the problems appear to hold not in
general but in localizable domains of inquiry. More exactly, the incentives
S&M cite and the problems of induction that DC elaborates are pervasive.
Nonetheless, the science (more particularly, replication) crisis seems
localized in specific sub-areas of investigation, ones that I would describe as
more concerned with establishing facts than with detailing causal mechanisms. [2]
Here’s what I mean.
What’s the aim of inquiry? For DC it is “to establish facts,
as accurately as possible” (here, p.
1). For me, it is to explain why things are as they are.[3]
Now, I concede that the second project relies on the first. But I would equally
claim that the first relies on the second. Just as we need facts to verify
theories, we need theories to validate facts. The main problem with lots of
“science” (and I am sure you won’t be surprised to hear me write this) is that
it is theory free. Thus, the only way
to curb its statistical enthusiasm is by being methodologically pristine. You
gotta get the stats exactly right for this is the only thing grounding the
result. In most cases of drug trials, for example, we have no idea why they
work, and for practical purposes we may not (immediately) care. The question is
do they, not how. Sciences stuck in the “does it” stage rather than the “how
does it do it and why” stages, not surprisingly, have it tough. Fact gathering
in the absence of understanding is going to be really hard even with great stats
tools. Should we be surprised that in areas where we know very little that
stats can and do regularly mislead?
Note that the real sciences do not seem to be in the same
sad state as psych, bio-med and neuroscience. You don’t see tons of articles
explaining how the physics of the last 20 years is rotten to its empirical
core. Not that Nobel winning results are not challenged. They can be and are.
Here’s a recent example in which dark energy and the thesis that the universe
is expanding at an accelerating rate is being challenged (see here) based on
more extensive data. But in this case, evaluation of the empirical
possibilities heavily relies on a rich theoretical background. Here’s a quote
from one of the lead critics. Note how the critique relies on an analysis of an
“oversimplified theoretical model” and how some further theoretical
sophistication would lead to different empirical results. This interplay
between theory and data (statistically interpreted data by and large) is not
available in domains where there is no “fundamental theory” (i.e. non-trivial
theory).
'So it is quite possible that we are being misled and
that the apparent manifestation of dark energy is a consequence of analysing
the data in an oversimplified theoretical model - one that was in fact
constructed in the 1930s, long before there was any real data. A more
sophisticated theoretical framework accounting for the observation that the
universe is not exactly homogeneous and that its matter content may not behave
as an ideal gas - two key assumptions of standard cosmology - may well be able
to account for all observations without requiring dark energy. Indeed, vacuum
energy is something of which we have absolutely no understanding in fundamental
theory.'
So, IMO, the problem with most problematic “science” is that
it is not yet really science. It has not moved from the earliest data
collection stage to the explanation stage where what’s at issue are not facts
but mechanisms. If this is roughly right, then the “end of science” problems
will dissipate as understanding deepens (if it ever does (no guarantee that it
will or should)) in these domains. So understood, the demise of science that
replication problems herald is more a problem for the particular areas
identified (and more an indication of how little is known here) than for
science as a whole.[4]
That said, let me end with one or two caveats. The science-in-crisis
narrative rests on the slew of false discoveries regularly churned out. Szilard’s
worry mooted in the quote above is different. His worry is not false
discoveries but the trivialization of research as big science promotes quantity
and incrementalism over quality and concern for the big issues. Interestingly,
this too is a recurrent theme. Szilard voiced this worry over 60 years ago.
More recently (the last 15 years or so), Peter Lawrence voiced similar concerns
in two pieces that discuss Szilard’s problem in the context of how scientific
work is evaluated for granting and publication (here
and here).
And the problem is discussed in very much the same terms today. Here
(and here)
are two papers in Nature from 2016
which address virtually the same questions in virtually the same terms (i.e.
how institutions reward more of the same research, punish thinking about new
questions, look at publication numbers rather than judge quality etc.). What is
striking is that this is all stuff noted and lamented before and the proposed
fixes are pretty much the same: calls for judgment to replace auditing.
I agree that this would be a good idea. In fact, I believe
that the disparagement of theory in linguistics partly reflects the very
demands that theory makes on judgment for adequate evaluation. It
is easier to see if a story “captures” the facts than to see if it offers an
interesting explanation. So I am all in favor of promoting judgment as an
important factor in scientific evaluation. However, to repeat, I am skeptical
this is actually doable as judgment is not something that bureaucracies do well
and like it or not, today science is big and so, not surprisingly, it comes
with a large bureaucracy attached. Let me explain.
Today science is conducted in big settings (universities,
labs, foundations, funding agencies). Big settings engender bureaucratic
oversight, and not for entirely bad reasons. Bureaucracies arise in response to
real needs where the actions of large numbers of people require coordination.
And given the size of modern science, bureaucracy is inevitable. Unfortunately,
bureaucracies by necessity favor blunt metrics over refined judgment (i.e.
quantitative auditable measures over nuanced hard to compare evaluations). And
all of this fosters the problems that Szilard and Lawrence and the Nature comments worry about. As noted, I
think that this is simply unavoidable given the current economics of research.
The hopeful (e.g. Lawrence) think that there are ways of mitigating these
trends. I hope they are right. However, given the fact that this problem recurs
regularly and the same solutions get suggested just as regularly, I have my
doubts.
Let me end on a more positive note. It may not be possible
to inject judgment into the process in a systematic way. However, it may be
possible to find ways to promote unconventional research by having a sub-part
of the bureaucracy looking for it. In the old days when money was plentiful,
“whacky” research got institutional support because everything did (think of
the early days of GG funding, or early CS). When money gets scarcer we need to
still put aside some for work to support the unconventional. This is a problem
in portfolio management: put most of your cash on safe stuff and 10% or so on
unconventional stuff. The latter will mostly fail, but when it pays off, it
pays off big. The former rarely fails, but its payoffs are small. Maybe the
best we can do right now is allow our institutions to start thinking about the
wild 10% just a little bit more.
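The portfolio logic can be given a toy expected-value form (every number below is invented for illustration, not drawn from any funding data):

```python
# Toy expected-value comparison for the 90/10 research portfolio analogy.
# All probabilities and payoffs are made-up illustrations.
safe_share, wild_share = 0.90, 0.10

# Safe projects: almost always "pay off", but modestly.
safe_ev = 0.95 * 1.2   # 95% chance of a 1.2x return
# Unconventional projects: usually fail, occasionally transformative.
wild_ev = 0.05 * 30.0  # 5% chance of a 30x return

portfolio_ev = safe_share * safe_ev + wild_share * wild_ev
print(f"safe EV: {safe_ev:.2f}, wild EV: {wild_ev:.2f}, "
      f"portfolio EV: {portfolio_ev:.2f}")
```

On these made-up numbers the wild slice actually has the higher expected value, precisely because its rare successes are large; that is the case for reserving the 10%.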
So, the replication crisis will take care of itself as it is
largely a reflection of the primitive nature of most of the “science” that it
infects. The trivialization problem, IMO, is more serious and here, IMO, the
problem is and will remain much harder to solve.
[1]
I have long thought that stats should be treated a little like the Rabbis
treated Kabbalah. The Rabbis banned its study as too dangerous until the age of
forty, i.e. explosive in the hands of clever but callow neophytes.
[2]
The collapse seems to be restricted. In psych, it is largely restricted to
social psych. Perception and cognition, for example, seem relatively immune to
the non-replicability disease. In bio-medicine, the bio part also seems
healthy. Nobody is worrying about the findings in basic cell biology or physiology.
The problem seems limited to non “basic” discoveries (e.g. is cholesterol/fat
bad for you, does such and such drug work as advertised, and so on). In
neuroscience the problems also seem largely restricted to fMRI results of the
sort that make it into the NYTs. If one were inclined to be skeptical, one
might say that the problems arise not in those areas where we know something
about the underlying mechanisms but in those domains where we know relatively
little. But who would be so skeptical?
[3]
The search for explanation ends up generating novel data (facts). But the aim
is not to establish new facts but to understand what is going on. In the
absence of theory it might even be hard to know what a “fact” is. Is it a fact that the sun rises in the east
and sets in the west? Well, yes and no. It depends.
[4]
It also reflects the current scientism of the age. Nothing nowadays is legit
unless wrapped up in scientific looking layers. Not surprisingly much trivial
insight is therefore statistically marinated so that it can look scientific.
Thursday, October 27, 2016
Who says that linguistics isn't glamorous?
First, Klingon and now this. Maybe there will soon be Academy Awards for best original language and best language adapted from fieldwork.
Monday, October 24, 2016
Universal tendencies
Let’s say we find two languages displaying a common pattern,
or two languages converging towards a common pattern, or even all languages
doing the same. How should we explain this? Stephen Anderson (here,
and discussed by Haspelmath here)
notes that if you are a GGer there are three available options: (i) the nature
of the input, (ii) the learning theory and (iii) the cognitive limits of the
LAD (be they linguistically specific or domain general). Note that (ii) will
include (iii) as a subpart and will have to reflect the properties of (i) but
will also include all sorts of other features (cognitive control, structure of memory
and attention, the number of options the LAD considers at one time etc.). These,
as Anderson notes, are the only options available to a GGer for s/he takes G
change to reflect the changing distribution of Gs in the heads of a population
of speakers. Or, to put this more provocatively: languages don't exist apart
from their incarnation in speakers’ minds/brains. And given this, all
diachronic “laws” (laws that explain how languages or Gs change over time) must
reflect the cognitive, linguistic or computational properties of human
minds/brains.
This said, Haspelmath (H) observes (here and here) (correctly in my view) that
GGers have long “preferred purely synchronic ways of explaining typological
distributions,” and by this he means explanations that allude to properties of
the “innate Language Faculty” (see here for
discussion). In other words, GGers like to think that typological differences
reflect intrinsic properties of FL/UG and that studying patterns of variation
will hence shed light on its properties. I have voiced some skepticism
concerning this “hence”
here. In what follows I would like to comment on H’s remarks on a similar
topic. However, before I get into details I should note that we might not be
talking about the same thing. Here’s what I mean.
The way I understand it, FL/UG bears on properties of Gs not
on properties of their outputs. Hence, when I look at typology I am asking how
variation in typologies and historical change might explain changes in Gs. Of
course, I use outputs of these Gs to try to discern the properties of the
underlying Gs, but what I am interested in is G variation not output variation. This concedes that one might achieve similar
(identical?) outputs from different congeries of G rules, operations and filters.
In effect, whereas changing surface patterns do signal some change in the
underlying Gs, similarity of surface patterns need not. Moreover, given our
current accounts there are (sadly) too many roads to Rome, thus the fact that
two Gs generate similar outputs (or have moved towards similar outputs from
different Gish starting points) does not imply that they must be doing so in
the same way. Maybe they are and maybe not. It really all depends.
Ok back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of changes,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.”[1] He is somewhat skeptical that we have currently identified diachronic mechanisms to explain such changes and that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than “common paths of change,” namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?
Recall, we need to keep our questions clear. Say that we have
identified an actual NUT (i.e. we have compelling evidence that certain kinds
of G changes are “preferred”). If we have this and we find another G changing
in the same direction then we can attribute this to that same NUT. So we explain the change by so attributing it.
Well, in part: we have identified the kind of thing it is even if we do not yet
know why these types of things exist. An
analogy: I have a pencil in my hand. I open it. The pencil falls. Why?
Gravitational attraction. I then find out that the same thing happens when I
have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and
any other school supply at hand. I conclude that these falls are all instances
of the same causal power (i.e.
gravity). Have I explained why when I pick up a thumbtack and let it loose and
it too falls that it falls because of gravity? Well, up to a point. A small
point IMO, but a point nonetheless. Of
course we want to know how Gravity
does this, what exactly it does when it does it and even why it does it the way
that it does, but classifying phenomena into various explanatory pots is often
a vital step in setting up the next step of the investigation (viz. identifying
and explaining the properties of the alleged underlying “force”).
This said, I agree that the explanation is pretty lame if
left like this. Why did X fall when I dropped it? Because everything falls when
you drop it. Satisfied? I hope not.
Sadly, from where I sit, many explanations of typological
difference or diachronic change have this flavor. In GG we often identify a
parameter that has switched value and (more rarely) some PLD that might have
led to the switch. This is devilishly hard to do right and I am not dissing
this kind of work. However, it is often very unsatisfying given how easy it is
to postulate parameters for any observable difference. Moreover, very few
proposals actually do the hard work of sketching the presupposed learning
theory that would drive the change or looking at the distribution of PLD that
the learning theory would evaluate in making the change. To get beyond the weak
explanations noted above, we need more robust accounts of the nature of the
learning mechanisms and the data that was input to it (PLD) that led to the
change.[2]
Absent this, we do have an explanation of a very weak sort.
Would H agree? I think so, but I am not absolutely sure of
this. I think that H runs together things that I would keep separate. For
example: H considers Anderson’s view that many synchronic features of a G are
best seen as remnants of earlier patterns. In other words, what we see in
particular Gs might be reflections of “the shaping effects of history” and “not
because the nature of the Language Faculty requires it” (H quoting Anderson: p.
2). H rejects this for the following reason: he doesn’t see “how the historical
developments can have “shaping effects” if they are “contingent” (p. 2). But
why not? What does the fact that
something is contingent have to do with whether it can be systematically
causal? 1066 and all that was contingent, yet its effects on “English” Gs have
been long lasting. There is no reason to think that contingent events cannot
have long lasting shaping effects.
Nor, so far as I can tell, is there reason to think that
this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies
might not explain “universal tendencies.” Here’s what I mean.
Let’s for the sake of argument assume that there are around
50 different binary parameters (and this number is surely small). This gives a space
of possible Gs (assuming the parameters are independent) of 2^50, i.e. over a quadrillion. The current estimate of different
languages out there (and I assume, maybe incorrectly, Gs) is on the order of
7,000, at least that’s the number I hear bandied about among typologists. This
number is minuscule. It covers well under a billionth of a percent of the possible space. It is not
inconceivable that languages in this part of the space have many properties in
common purely because they are all in the same part of the space. These common
properties would be contingent in a UG sense if we assumed that we only
accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these
properties. It is even possible that it is hard to get to any other of the G
possibilities given that we are in this region.
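The arithmetic is easy to check. A minimal sketch, assuming the 50 parameters are binary and independent:

```python
# 50 independent binary parameters define 2**50 possible grammars (Gs).
possible_gs = 2 ** 50
attested = 7_000  # the rough count of extant languages cited above

print(f"{possible_gs:,}")               # 1,125,899,906,842,624
fraction = attested / possible_gs
print(f"{fraction:.2e}")                # 6.22e-12
print(f"{100 * fraction:.1e} percent")  # 6.2e-10 percent
```

So even 7,000 distinct Gs would sample only a vanishingly small corner of the space, which is what leaves room for the "shared neighborhood" scenario in the text.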
On this sort of account, there might be many apparent universals that
have no deep cognitive grounding and are nonetheless pervasive. Don’t get me
wrong, I am not saying these exist, only that we really have no knock-down reason
for thinking they do not. And if
something like this could be true, then the fact that some property did or
didn’t occur in every G could be attributed to the nature of the kind of PLD
our part of the G space makes available (or how this kind of PLD interacts with
the learning algorithm). This would fit with Anderson’s view: contingent yet
systematic and attributable to the properties of the PLD plus learning theory.
I don’t think that H (or most
linguists) would find this possibility compelling. If something is absent from
7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well
maybe not. My only claim is that the basis for this confidence is not
particularly clear. And thinking through this scenario makes it clear that gaps
in the existing language patterns/Gs are (at best) suggestive about FL/UG properties
rather than strongly dispositive. It could be our ambient PLD that is
responsible. We need to see the reasoning. Culbertson and Adger provide a nice
model for how this might be done (see here).
One last point: what makes PoS
arguments powerful is that they are not
subject to this kind of sampling skepticism. PoS arguments really do, if
successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoSs
abstract away from PLD altogether and so remove this as a causal source of
systematicity. Hence, PoSs short-circuit the skeptical suggestions above. Of
course, the two kinds of investigation can be combined. However, it is worth
keeping in mind that typological investigations will always suffer from the
kind of sampling problem noted above and will thus be less direct probes of
FL/UG than will PoS considerations. This suggests, IMO, that it would be very good
practice to supplement typologically based conclusions with PoS style arguments.[3]
Even better would be explicit learning models, though these will be far more
demanding given how hard it likely is to settle on what the PLD is for any
historical change.[4]
I found H’s discussion of these matters to be interesting
and provocative. I disagree with many things that H says (he really is focused
on languages rather than Gs). Nonetheless, his discussion can be translated
well enough into my own favored terms to be worth thinking about. Take a look.
[1]
I say ‘apparent’ for I know very little of this literature though I am willing
to assume H is correct that these exist for the sake of argument.
[2]
Which does not mean that we lack nice models of what better accounts might look
like. Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William
Sakas, Charles Yang, a.o., have provided excellent models of what such
explanations would look like.
[3]
Again a nice example of this is Culbertson and Adger’s work discussed here.
It develops an artificial G argument (meatier than a simple PoS argument) to
more firmly ground a typological conclusion.
[4]
Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example,
shows.
Tuesday, October 18, 2016
Right sizing ling papers
I have a question: what’s the “natural” size of a
publishable linguistics paper? I ask because after indulging in a reading binge
of papers I had agreed to look at for various reasons, it seems that 50 is the assumed
magic number. And this number, IMO, is too high. If it really takes 50 pages for you to make
your point, then either you are having trouble locating the point that you want
to make, or you are trying to make too many of them in a single paper. Why
care?
I care about this for two reasons. First I think that the
size of the “natural” paper is a fair indicator of the theoretical
sophistication of a field. Second, I believe that if the “natural” size is,
say, 50 pages, then 50 pages will be the benchmark of a “serious” paper and
people will aim to produce 50 page papers even if this means taking a 20 page
idea and blowing it up to 50 pages. And we all know where this leads. To
bloated papers that make it harder than it should be (and given the explosion
of new (and excellent) research, it’s already harder than it used to be) to
stay current with the new ideas in the field. Let me expand on these two points
just a bit.
There are several kinds of linguistics papers. The ones that
I am talking about would be classified as in theoretical
linguistics, specifically syntax. The aim of such a paper is to make a
theoretical point. Data and argument are marshaled in service of making this point. Now, in a field with
well-developed theory, this can be usually done economically. Why? Because the
theoretical question/point of interest can be crisply stated and identified.
Thus, the data and arguments of interest can be efficiently deployed wrt this
identified theoretical question/point. The less theoretically firm the
discipline the harder it is to do this well and the longer (more pages) it
takes to identify the relevant point etc.
This is what I mean by saying that the size of the “natural” paper can
be taken as a (rough) indicator of how theoretically successful a field is. In
the “real” sciences, only review papers go on for 50 pages. Most are under 10
and many are less than that (it is called “Phys Rev Letters” for a reason). In the “real” sciences, one does not extensively
review earlier results. One cites them, takes what is needed and moves on. Put
another way, in “real” sciences one builds on earlier results, one does not
rehearse them and re-litigate them. They are there to be built on and your
contribution is one more brick in a pretty well specified wall of interlocking
assumptions, principles and empirical results.
This is less true in theoretical syntax. Most likely it is
because practitioners do not agree as widely about the theoretical results in
syntax as people in physics agree about the results there. But, I suspect,
that there is another reason as well. In many of the real sciences, papers
don’t locally aim for truth (of
course, every scientific endeavor globally
does). Here’s what I mean.
Many theoretical papers are explorations of what you get by
combining ideas in a certain way. The point of interest is that some
combinations lead to interesting empirical, theoretical or conceptual
consequences. The hope is that these consequences are also true (evaluated over
a longer run), but the immediate assumption of many papers is that the assumptions
are (or look) true enough (or are interesting
enough even if recognizably false) to explore even if there are (acknowledged)
problems with them. My impression is that this is not the accepted practice in
syntax. Here if you start with assumptions that have “problems” (in syntax,
usually, (apparent) empirical difficulties) then it is thought illegitimate to
use these assumptions or further explore their consequences. And this has two
baleful influences on paper writing: it creates an incentive to fudge one’s
assumptions and/or creates a requirement to (re)defend them. In either case, we
get pressure to bloat.
A detour: I have never really understood why exploring
problematic assumptions (PA) is so regularly dismissed.[1]
IMO, theory is
that activity that explores how assumptions connect to lead to interesting
consequences. That’s what theoretical exploration is. If done correctly, it
leads to a modicum of explanation.
This activity is different from how theory is often
described in the syntax literature. There it is (often) characterized as a way
of “capturing” data. On this view, the data are unruly and wild and need to be
corralled and tamed. Theory is that instrument used to pen it in. But if your aim
is to “capture” the data, then capturing some while losing others is not a
win. This is why problematic assumptions (PA) are non grata. Empirically leaky PAs are not interesting precisely
because they are leaky. Note, then, that the difference between “capturing” and
“explaining” is critical. Leaky PAs might be explanatorily rich even if
empirically problematic. Explanation and data coverage are two different dimensions of evaluation. The
aim, of course, is to get to those accounts that both explain and are
empirically justified. The goal of “capture” blurs these two dimensions. It is
also, IMO, very counterproductive. Here’s why.
Say that one takes a PA and finds that it leads to a nice
result, be it empirical or theoretical or conceptual. Then shouldn’t this be
seen as an argument for PA regardless
of its other problems? And shouldn’t this also be an argument that the
antecedent problems the PA suffers from might possibly be apparent rather than
real? All we really can (and should) do as theorists is explore the
consequences of sets of assumptions. One hopes that over time the consequences
as a whole favor one set over others. Hence, there is nothing methodologically
inapposite in assuming some PA if it fits the bill. In fact, it is a virtue,
theoretically speaking, for it allows us to more fully explore that idea and see
if we can understand why even if false
it seems to be doing useful work.
Let’s now turn to the second, more pragmatic point. There has
been an explosion of research in syntax. It used to be possible to keep up
with everything simply by reading it all. I don’t believe that this is still possible.
However, it would make it easier to stay tuned to the important issues if
papers were more succinct. I think I’ve said this on FOL before (though I can’t
recall where), but I have often found it to be the case that a short form
version of a later published paper (say a NELS or WCCFL version) is more useful
than the longer, more elaborated descendant.[2]
Why? Because the longer version is generally more “careful,” and not always in
a good way. By this I mean that there are replies to reviewers that require
elaboration but that often obscure the main idea. Not always, but often enough.
So as not to end on too grumpy a note, let me suggest the
following template for syntax papers. It answers three questions: What’s the
problem? Why is it interesting? How to solve it?
The first section should be short and to the point. A paper
that cannot identify a crisp problem is one that should likely be rewritten.
The second section should also be short, but it is
important. Not all problems are equally interesting. It’s the job of a paper to
indicate why the reader should care. In linguistics this means identifying how
the results bear on the structure of FL/UG. What light does your question, if
answered, hope to shed on the central question of modern GG, the fine structure
of FL?
The last section is the meat, generally. Only tell the
reader enough to understand the explanation being offered for the question. For
a theory paper, raw data should be offered but the discussion should proceed by
discussing the structures that these data imply. GGers truck in grammars, which
truck in rules and structures and derivations. A theory paper that is not
careful and explicit about these is not written correctly. Many papers in very
good journals take great care to get the morphological diacritics right in the
glosses but often eschew providing explicit derivations and phrase markers that
exhibit the purported theoretical point. For GG, God is not in the data points,
but in the derivations etc. that these data points are in service of
illuminating.
Let me go a bit over the top here. IMO, journals would do
well to stop publishing most data, reserving this for methods addenda
available online. The raw data is important, and the exposition should rely on
it and make it available, but the exposition should advert to
it, not present it. This is now standard practice in journals like Science and there is no reason why it
should not be standard practice in ling journals too. It would immediately cut
down the size of most articles by at least a third (try this for a typical NLLT
paper for example).
Only after the paper has offered its novelties should one
compare what’s been offered to other approaches in the field. I agree that this
suggestion should not be elevated to a hard and fast rule. Sometimes a
proposal is usefully advanced by demonstrating the shortcomings in others that
it will repair. However, more often than not comparisons of old and new are
hard to make without some advanced glimpse of the new. In my experience,
comparison is most useful after the fact.
Delaying comparison will also have another positive feature,
I believe. A proposal might be interesting even if it does no better than
earlier approaches. I suspect that we lead with “problems” with extant hypotheses
because it is considered illicit to offer an alternative unless the current favorite is shown to be in some way defective. There
is a founder prejudice operative that requires that the reigning champion not
be discomfited unless proven to be inferior. But this is false. It is useful to
know that there are many routes to a common conclusion (see here
for discussion). It is often even useful to have an alternative that does less
well.
So, What, Why, How, with a 15-20 page limit, with the hopes of
lowering this to 10-15. If that were to happen I would feel a whole lot
guiltier for being so far behind in my reading.
[1]
Actually, I do understand. It is a
reflex of theoretical syntax’s general anti-theory stance.
[2]
This might be showing my age, for I think that it is well nigh impossible
nowadays to publish a short version of a paper in a NELS or WCCFL proceedings
and then an elaborated version in a more prestigious journal. If so, take it from
me!