Let’s say we find two languages displaying a common pattern,
or two languages converging towards a common pattern, or even all languages
doing the same. How should we explain this? Stephen Anderson (here,
and discussed by Haspelmath here)
notes that if you are a GGer there are three available options: (i) the nature
of the input, (ii) the learning theory and (iii) the cognitive limits of the
LAD (be they linguistically specific or domain general). Note that (ii) will
include (iii) as a subpart and will have to reflect the properties of (i) but
will also include all sorts of other features (cognitive control, structure of memory
and attention, the number of options the LAD considers at one time etc.). These,
as Anderson notes, are the only options available to a GGer for s/he takes G
change to reflect the changing distribution of Gs in the heads of a population
of speakers. Or, to put this more provocatively: languages don't exist apart
from their incarnation in speakers’ minds/brains. And given this, all
diachronic “laws” (laws that explain how languages or Gs change over time) must
reflect the cognitive, linguistic or computational properties of human
minds/brains.
This said, Haspelmath (H) observes (here and here) (correctly in my view) that
GGers have long “preferred purely synchronic ways of explaining typological
distributions,” and by this he means explanations that allude to properties of
the “innate Language Faculty” (see here for
discussion). In other words, GGers like to think that typological differences
reflect intrinsic properties of FL/UG and that studying patterns of variation
will hence shed light on its properties. I have voiced some skepticism
concerning this “hence”
here. In what follows I would like to comment on H’s remarks on a similar
topic. However, before I get into details I should note that we might not be
talking about the same thing. Here’s what I mean.
The way I understand it, FL/UG bears on properties of Gs not
on properties of their outputs. Hence, when I look at typology I am asking how variation in typologies and historical change might reflect changes in Gs. Of
course, I use outputs of these Gs to try to discern the properties of the
underlying Gs, but what I am interested in is G variation not output variation. This concedes that one might achieve similar
(identical?) outputs from different congeries of G rules, operations and filters.
In effect, whereas changing surface patterns do signal some change in the
underlying Gs, similarity of surface patterns need not. Moreover, given our
current accounts there are (sadly) too many roads to Rome, thus the fact that
two Gs generate similar outputs (or have moved towards similar outputs from
different Gish starting points) does not imply that they must be doing so in
the same way. Maybe they are and maybe not. It really all depends.
Ok back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of change,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.”[1] He is somewhat skeptical that we have currently identified diachronic mechanisms that explain such changes, and he argues that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than ‘common paths of change,’ namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?
Recall, we need to keep our questions clear. Say that we have
identified an actual NUT (i.e. we have compelling evidence that certain kinds
of G changes are “preferred”). If we have this and we find another G changing
in the same direction then we can attribute this to that same NUT. So we explain the change by so attributing it.
Well, in part: we have identified the kind of thing it is even if we do not yet
know why these types of things exist. An
analogy: I have a pencil in my hand. I open it. The pencil falls. Why?
Gravitational attraction. I then find out that the same thing happens when I
have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and
any other school supply at hand. I conclude that these falls are all instances
of the same causal power (i.e.
gravity). When I pick up a thumbtack, let it loose, and it too falls, have I explained its fall by saying that it falls because of gravity? Well, up to a point. A small
point IMO, but a point nonetheless. Of
course we want to know how Gravity
does this, what exactly it does when it does it and even why it does it the way
that it does, but classifying phenomena into various explanatory pots is often
a vital step in setting up the next step of the investigation (viz. identifying
and explaining the properties of the alleged underlying “force”).
This said, I agree that the explanation is pretty lame if
left like this. Why did X fall when I dropped it? Because everything falls when
you drop it. Satisfied? I hope not.
Sadly, from where I sit, many explanations of typological
difference or diachronic change have this flavor. In GG we often identify a
parameter that has switched value and (more rarely) some PLD that might have
led to the switch. This is devilishly hard to do right and I am not dissing
this kind of work. However, it is often very unsatisfying given how easy it is
to postulate parameters for any observable difference. Moreover, very few
proposals actually do the hard work of sketching the presupposed learning
theory that would drive the change or looking at the distribution of PLD that
the learning theory would evaluate in making the change. To get beyond the weak
explanations noted above, we need more robust accounts of the nature of the
learning mechanisms and the data input to them (the PLD) that led to the
change.[2]
Absent this, we do have an explanation, but only of a very weak sort.
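The models cited in footnote 2 give substance to this demand. Purely as an illustrative cartoon (not a reproduction of any of those authors’ actual models), here is the bare skeleton of such an account in code: a single binary parameter whose setting is pushed around by the distribution of unambiguous cues in the PLD, via a linear reward-penalty (“variational”) update. All rates and numbers below are hypothetical.

```python
import random

def variational_step(p, sentence_needs_g1, gamma=0.02):
    """One update of a toy variational (linear reward-penalty) learner
    for a single binary parameter. `p` is the current probability of
    choosing grammar G1; `sentence_needs_g1` marks an unambiguous cue.
    A cartoon sketch only, not any published model."""
    chose_g1 = random.random() < p
    if sentence_needs_g1:
        # Only G1 parses the sentence. With two grammars, rewarding a
        # successful G1 and punishing a failed G0 shift p identically.
        return p + gamma * (1 - p)
    # Ambiguous sentence: both grammars parse; reward whichever was chosen.
    return p + gamma * (1 - p) if chose_g1 else (1 - gamma) * p

def learn(cue_rate=0.3, steps=5000, seed=1):
    """Simulate a learner exposed to PLD in which a fraction `cue_rate`
    of the sentences unambiguously demands the G1 setting."""
    random.seed(seed)
    p = 0.5
    for _ in range(steps):
        p = variational_step(p, random.random() < cue_rate)
    return p
```

The point of the sketch is the one made in the text: the direction and speed of any change are fixed jointly by the update rule (the learning theory) and the cue distribution (the PLD); with `cue_rate = 0` the expected value of `p` never moves at all.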
Would H agree? I think so, but I am not absolutely sure of
this. I think that H runs together things that I would keep separate. For
example: H considers Anderson’s view that many synchronic features of a G are
best seen as remnants of earlier patterns. In other words, what we see in
particular Gs might be reflections of “the shaping effects of history” and “not
because the nature of the Language Faculty requires it” (H quoting Anderson: p.
2). H rejects this for the following reason: he doesn’t see “how the historical
developments can have “shaping effects” if they are “contingent” (p. 2). But
why not? What does the fact that
something is contingent have to do with whether it can be systematically
causal? 1066 and all that was contingent, yet its effects on “English” Gs have
been long lasting. There is no reason to think that contingent events cannot
have long lasting shaping effects.
Nor, so far as I can tell, is there reason to think that
this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies
might not explain “universal tendencies.” Here’s what I mean.
Let’s for the sake of argument assume that there are around
50 different parameters (and this number is surely small). This gives a space
of possible Gs (assuming the parameters are independent) of about 1,510,000,000. The current estimate of different
languages out there (and I assume, maybe incorrectly, Gs) is on the order of
7,000, at least that’s the number I hear bandied about among typologists. This
number is minuscule. It covers .0005% of the possible space. It is not
inconceivable that languages in this part of the space have many properties in
common purely because they are all in the same part of the space. These common
properties would be contingent in a UG sense if we assumed that we only
accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these
properties. It is even possible that it is hard to get to other parts of the G space given that we are in this region.
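For concreteness, the arithmetic can be checked directly. As commenters point out below, 50 independent binary parameters actually yield 2^50 (over a quadrillion) possible Gs rather than the ~1.5 billion figure above, which makes the attested coverage even smaller; the sketch below uses the corrected figure.

```python
# Back-of-the-envelope check of the parameter-space arithmetic.
# Assumptions from the text: ~50 independent binary parameters and
# roughly 7,000 attested languages, each instantiating a distinct G.

n_parameters = 50
possible_grammars = 2 ** n_parameters        # 1,125,899,906,842,624
attested_grammars = 7_000

coverage = attested_grammars / possible_grammars
print(f"possible Gs:       {possible_grammars:,}")
print(f"fraction attested: {coverage:.2e}")  # ~6.2e-12
```

On these assumptions the attested languages sample roughly six parts per trillion of the space, so the “same corner of the space” scenario cannot be dismissed on sample size alone.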
On this sort of account, there might be many apparent universals that
have no deep cognitive grounding and are nonetheless pervasive. Don’t get me
wrong, I am not saying these exist, only that we really have no knock down reason
for thinking they do not. And if
something like this could be true, then the fact that some property did or
didn’t occur in every G could be attributed to the nature of the kind of PLD
our part of the G space makes available (or how this kind of PLD interacts with
the learning algorithm). This would fit with Anderson’s view: contingent yet
systematic and attributable to the properties of the PLD plus learning theory.
I don’t think that H (or most
linguists) would find this possibility compelling. If something is absent from
7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well
maybe not. My only claim is that the basis for this confidence is not
particularly clear. And thinking through this scenario makes it clear that gaps
in the existing language patterns/Gs are (at best) suggestive about FL/UG properties
rather than strongly dispositive. It could be our ambient PLD that is
responsible. We need to see the reasoning. Culbertson and Adger provide a nice
model for how this might be done (see here).
One last point: what makes PoS
arguments powerful is that they are not
subject to this kind of sampling skepticism. PoS arguments really do, if
successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoSs abstract away from PLD altogether and so remove it as a causal source of
systematicity. Hence, PoSs short-circuit the skeptical suggestions above. Of
course, the two kinds of investigation can be combined. However, it is worth
keeping in mind that typological investigations will always suffer from the
kind of sampling problem noted above and will thus be less direct probes of
FL/UG than will PoS considerations. This suggests, IMO, that it would be very good
practice to supplement typologically based conclusions with PoS style arguments.[3]
Even better would be explicit learning models, though these will be far more
demanding given how hard it likely is to settle on what the PLD is for any
historical change.[4]
I found H’s discussion of these matters to be interesting
and provocative. I disagree with many things that H says (he really is focused
on languages rather than Gs). Nonetheless, his discussion can be translated
well enough into my own favored terms to be worth thinking about. Take a look.
[1]
I say ‘apparent’ for I know very little of this literature though I am willing
to assume H is correct that these exist for the sake of argument.
[2]
Which is not to say that we have no models of what better accounts might look like: Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William
Sakas, Charles Yang, a.o., have provided excellent models of what such
explanations would look like.
[3]
Again a nice example of this is Culbertson and Adger’s work discussed here.
It develops an artificial G argument (meatier than a simple PoS argument) to
more firmly ground a typological conclusion.
[4]
Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example,
shows.
Are not all parameters binary? In that case would not the number of possible Gs be 2^50, (i.e., more than 10^15)?
Are you saying that the FoL may allow Gs that have/lack certain properties that none of the languages we have access to actually instantiate?
Yes. Logically this is possible. The range of Gs we see might be a small subset of those FL allows and so generalizing from properties of those we see to those FL allows is potentially risky. Hence, inferring FL structures from typological generalizations, though common and reasonable, is hardly trivial. IMO, this problem does not infect other kinds of inferences to the structure of FL in the same way or degree (e.g. inference from PoS reasoning).
I agree that something that's absent from the 6000-7000 languages we see is not in principle guaranteed to be ruled out by our mental capacities (be they linguistic or otherwise). This is a methodological heuristic, and I think it's one that has served us well. I personally find this heuristic to be much more reasonable than the one that underlies work in the artificial grammar paradigm – namely, that how adults treat novel linguistic data is relevant to how children do so.
(And I agree with the first commenter regarding the math: 2^50 is 1,125,899,906,842,624.)
I agree with the math too. But that makes the sample even smaller. So, heuristic it is. A good one? Not bad, and as I think one always uses what one can and does the best one can with it, I have no problems with it. But I think it is worth noting how tendentious it is as a probe for FL/UG. The unquestioned assumption is that this is the best (for some, the only) way to investigate FL/UG. I thought that highlighting its weaknesses was a public service. Hope you agree.
>> PoS arguments really do, if successful, shed direct light on FL/UG
Norbert, it seems to me you are saying that there is something nearly infallible about PoS arguments, which is not true of "heuristics". I very much like PoS arguments, personally, but it is not clear that they necessarily shed light on FL/UG.
If we go back to Zeno’s paradoxes, which seemed like unassailable arguments for a couple of millennia, then many would argue in retrospect, that they didn’t really shed any light on reality, but really, they now shed light on our collective ignorance (of calculus).
This is what makes me uncomfortable when anyone starts using PoS arguments as proofs, instead of interesting starting points for future inquiry. I am not sure whether they necessarily shed light on FL/UG or whether they reveal our collective ignorance. But, PoS arguments do make for interesting further questions, which for me is the real reason to follow them.
@Norbert: I agree that it's worth reminding ourselves that this is a heuristic. And, for what it's worth, yes: the math correction means that our coverage is even smaller than you made it out to be. But whether or not this coverage is "too small" depends entirely on the structure of the space. There are certain spaces of size 2^50 for which a sample of 6000 is much more than you need in order to figure out everything about how the space is structured. (Is language that way? I don't know.)
But all of this is specifically about the question of whether lack-of-crosslinguistic-attestation is dispositive. There is something related, but separate, about which I know we disagree: how necessary is it to look at what *is* attested in other languages, even if one is ultimately interested in the mental infrastructure. I feel confident saying that it's very necessary, simply judging by the history. I don't think it's really disputable that some principles, proposed on the basis of English (and closely related languages), have turned out to be wrong when confronted with facts from a broader set of languages. Here are a few examples: the purported relation between A-movement and case; the Activity Condition; the impossibility of A-movement out of finite clauses; the idea that finiteness is linked to tense in particular (as opposed to aspect, or even person or location; see Ritter & Wiltschko on Halkomelem and Blackfoot); the idea that Spec,TP has anything to do with agreement (an interestingly persistent misapprehension); the idea that the presence or absence of agreement is intrinsically linked to finiteness (or tense, for that matter); etc. And note, these are just from the particular area that I work in – I'm sure that people who work in other areas could make equally good lists from their respective necks of the woods.
Let me be clear: I think there is tons we can learn from a single language, and most of it stands up to scrutiny (or needs very minor tweaking) when confronted with the broader crosslinguistic picture. But some of it doesn't. Perhaps not for any deep, principled reason (i.e., it could have been otherwise) – but that's just how things have turned out (at least from where I sit).
So, no, I don't really trust results that have not been vetted by at least a basic crosslinguistic sanity-check. The good news is, most people listening to a talk or reading a paper do this implicitly; they think about how this would work in (other) languages that they know something about. And if there's a problem, they bring it up. That is, after all, how we know that all of the principles I listed above are false :-)
@Karthik:
Though my love of PoS arguments is nearly unbounded I don't quite go to infallibility (though I do wish I could claim they were apodictic!). What I do think is that they are more DIRECT windows onto the structure of FL/UG than typological inductions over Gs. And this, I believe, is the opposite of the common view, which is why I try to strongly make the case for it.
Furthermore, I believe that it is important to make this point for another reason. In the real sciences it is understood that some data bear more directly on a given mechanism than do others. This has to do with the structure of the relevant theories and the inferential steps linking them to data. PoS shines a bright light on FL/UG precisely because it abstracts away from PLD when successful. PLD distorts one's view of FL/UG via its influence on the properties of particular Gs. What makes PoS so good is precisely that it abstracts from such interfering details. In other words it focuses on G properties as such and not G+PLD properties. Those interested in languages and their properties will find this kind of argument unconvincing. But that's because I think that they are only secondarily interested in FL/UG. The main interest is in language and its diversity. FL/UG are of subsidiary concern, FL/UG being an abstraction, at best an induction over different Gs.
Last point: I could not agree with you more re the last point. All of this would be nugatory were PoS argumentation heuristically idle. One hopes it has legs and leads to other good questions, including ones that can be further elaborated and pursued typologically. That said, I would be happy were the logic of the above conceded. If your main interest is in FL/UG understand that approaching the problem mainly through comparative typology is a very indirect route, one far more indirect than PoS argumentation.
One very last point: this does not mean to say that PoS should always lead. It may be that setting up a good one is too hard and that a typological investigation is easier and so more cost effective. That there is a direct method does not mean that indirect ones are useless or to be eschewed. Not so. However, the PRESUPPOSITION that the typological approach is the best way to study FL/UG (and this is the default view IMO) is just wrong.
@Omer:
My argument is not that typology cannot dispose of FL/UG claims. Of course it can, and here it does not matter what the space of G possibilities looks like. If something is claimed impossible it cannot exist. 7,000 languages are more than enough (as you remind me often) to establish that some principle is wrong. However, there is another view that I think is wrong: that comparative typology shines a direct light on FL/UG. Or, to study it you must first gather a set of Gs, preferably very diverse, and then see what they have in common. This induction over Gs treats FL/UG as a kind of abstraction over G properties. This, I believe, is the standard conception of UG (and, surprise, I believe that it is an abstract version of Greenberg's conception). It is this view that I find severely wanting. It is not that one can learn a lot about FL/UG from a single language, it's that one can only learn about FL/UG by abstracting away from the variable effects that PLD has that are most responsible for making languages appear diverse. The question is how to abstract from this? IMO, PoS is an excellent way. We need others, and I think that the Culbertson and Adger attempt was an interesting one.
A last point: one way of reading your comment is that we need typology to refine what we find using other methods. I can live with this. PoS arguments hardly ever implicate an exact mechanism. Rather it identifies a class of data that requires an FL/UG source. In other words, it isolates data that directly reflect an FL/UG source. All alone, it does not specify the exact mechanisms as many different ones may suffice. Here comparative data could be valuable for it might allow us to narrow down the specifics at play. That's hard to disagree with, so I won't.
One last word on sanity checks: were listeners regularly to ask themselves about the PoS implications of some typological proposal the way they regularly apply a cross linguistic sanity check to proposed universals I would be a happy camper. In my experience, the issues hardly ever arise, and when they are mooted a surprising silence ensues. Tant pis.
How much does the approximation of the number of parameters vary? Is there not a risk of circularity if the parameters are derived from what the Gs of the languages of the world look like, and then the space of possible Gs is approximated from this number of parameters? Or am I missing something?