
Wednesday, September 19, 2018

Generative grammar's Chomsky Problem

Martin Haspelmath (MH) and I inhabit different parts of the (small) linguistics universe. Consequently, we tend to value very different kinds of work and look to answer very different kinds of questions. As a result, when our views converge, I find it interesting to pay attention. In what follows I note a point or two of convergence. Here is the relevant text that I will be discussing (henceforth MHT, for “MH text”).[1]

MHT’s central claim is that “Chomsky no longer argues for a rich UG of the sort that would be relevant for the ordinary grammarian and, e.g. for syntax textbooks” (1). It extends a similar view to me: “even if he is not as radical about a lean UG as Chomsky’s 21st century writings (where nothing apart from recursion is UG), Hornstein’s view is equally incompatible with current practice in generative grammar” (MHT emphasis, (2)).[2]

Given that neither Chomsky nor I seems to be inspiring current grammatical practice (btw, thx for the company MH), MHT notes that “generative grammarians currently seem to lack an ideological superstructure.” MHT seems to suggest that this is a problem (who wants to be superstructure-less after all?), though it is unclear for whom, other than Chomsky and me (what’s a superstructure anyhow?). MHT adds that Chomsky “does not seem to be relevant to linguistics anymore” (2).

MHT ends with a few remarks about Chomsky on alien (as in extra-terrestrial) language, noting a difference between him and Jessica Coon on this topic. Jessica says the following (2):

 When people talk about universal grammar it’s just the genetic endowment that allows humans to acquire language. There are grammatical properties we could imagine that we just don’t ever find in any human language, so we know what’s specific to humans and our endowment for language. There’s no reason to expect aliens would have the same system. In fact, it would be very surprising if they did. But while having a better understanding of human language wouldn’t necessarily help, hopefully it’d give us tools to know how we might at least approach the problem.

This is a pretty vintage late 1980s bioling view of FL. Chomsky demurs, thinking that perhaps “the Martian language might not be so different from human language after all” (3). Why? Because Chomsky proposes that many features of FL might be grounded in generic computational properties rather than idiosyncratic biological ones. In his words:

We can, in short, try to sharpen the question of what constitutes a principled explanation for properties of language, and turn to one of the most fundamental questions of the biology of language: to what extent does language approximate an optimal solution to conditions that it must satisfy to be usable at all, given extralinguistic structural architecture?

MHT finds this opaque (as do I, actually), though the intent is clear: to the degree that the properties of FL and the Gs it gives rise to are grounded in general computational properties, properties that a system would need to have “to be usable at all,” then to that degree there is no reason to think that these properties would be restricted to human language (i.e. there is no reason to think that they would be biologically idiosyncratic).

MHT’s closing remark about this is to reiterate his main point: “Chomsky’s thinking since at least 2002 is not really compatible with the practice of mainstream generative grammar” (3-4).

I agree with this, especially MHT's remark about current linguistic practice. Much of what interests Chomsky (and me) is not currently high up on the GG research agenda. Indeed, I have argued (here) that much of current GG research has bracketed the central questions that originally animated GG research and that this change in interests is what largely lies behind the disappointment many express with the Minimalist Program (MP).

More specifically, I think that though MP has been wildly successful in its own terms and that it is the natural research direction building on prior results in GG, its central concerns have been of little mainstream interest. If this assessment is correct, it raises a question: why the mainstream disappointment with MP and why has current GG practice diverged so significantly from Chomsky’s? I believe that the main reason is that MP has sharpened the two contradictory impulses that have been part of the GG research program from its earliest days. Since the beginning there has been a tension between those mainly interested in the philological details of languages and those interested in the mental/cognitive/neuro implications of linguistic competence.

We can get a decent bead on the tension by inspecting two standard answers to a simple question: what does linguistics study? The obvious answer is language. The less obvious answer is the capacity for language (aka, linguistic competence). Both are fine interests (actually, I am not sure that I believe this, but I want to be concessive (sorry Jerry)). And for quite a while it did not much matter to everyday research in GG which interest guided inquiry, as the standard methods for investigating the core properties of the capacity for language proceeded via a filigree philological analysis of the structures of language. So, for example, one investigated the properties of the construal modules by studying the distribution of reflexives and pronouns in various languages. Or by studying the locality restrictions on question formation (again in particular languages) one could surmise properties of the mentalist format of FL rules and operations. Thus, the way that one studied the specific cognitive capacity a speaker of a particular language L had was by studying the details of the language L, and the way that one studied more general (universal) properties characteristic of FL and UG was by comparing and contrasting constructions and their properties across various Ls. In other words, the basic methods were philological even if the aims were cognitive and mentalistic.[3] And because of this, it was perfectly easy for the work pursued by the philologically inclined to be useful to those pursuing the cognitive questions and vice versa. Linguistic theory provided powerful philological tools for the description of languages, and this was a major selling point.

This peaceful commensalism ends with MP. Or, to put it more bluntly, MP sharpens the differences between these two pursuits because MP inquiry only makes sense in a mentalistic/cognitive/neuro setting. Let me explain.

Here is a very short history of GG. It starts with two facts: (1) native speakers are linguistically productive and (2) any human can learn any language. (1) implies that natural languages are open ended and thus can only be finitely characterized via recursive rule systems (aka grammars (Gs)). Languages differ in the rules their Gs embody. Given this, the first item on the GG research agenda was to specify the kinds of rules that Gs have and the kinds of dependencies Gs care about. Having an inventory of such rules in hand sets up the next stage of inquiry.
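To make the productivity point concrete, here is a toy sketch in Python (the rules and lexicon are invented for illustration, not a claim about any actual G): a grammar with a handful of finite rewrite rules that nonetheless generates an unbounded set of sentences, because an NP may contain another NP.

```python
import random

# A toy rewrite grammar: finitely many rules, unboundedly many sentences.
# The recursion lives in NP -> NP PP (an NP may contain another NP), so
# there is no longest sentence this grammar generates.
RULES = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["the", "N"], 0.8), (["NP", "PP"], 0.2)],
    "VP": [(["V", "NP"], 1.0)],
    "PP": [(["P", "NP"], 1.0)],
}
LEXICON = {"N": ["dog", "cat", "park"], "V": ["saw"], "P": ["near"]}

def generate(symbol="S"):
    """Recursively expand a symbol into a list of words."""
    if symbol in LEXICON:                      # preterminal: pick a word
        return [random.choice(LEXICON[symbol])]
    if symbol in RULES:                        # nonterminal: pick a weighted rule
        expansions, weights = zip(*RULES[symbol])
        chosen = random.choices(expansions, weights)[0]
        return [word for part in chosen for word in generate(part)]
    return [symbol]                            # a literal terminal like "the"

print(" ".join(generate()))
# e.g. "the dog near the park saw the cat" -- and longer is always possible
```

Four rules and nine words, yet infinitely many outputs: that is the sense in which productivity forces a finite, recursive characterization of a language.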

The second stage begins with fact (2). Translated into Gish terms it says that any Language Acquisition Device (aka, child) can acquire any G. We called this meta-capacity to acquire Gs “FL” and we called the fine structure of FL “UG.” The fact that any child can acquire any G despite the relative paucity and poverty of the linguistic input data implies that FL has some internal structure. We study this structure by studying the kinds of rules that Gs can and cannot have. Note that this second project makes little sense until we have candidate G rules. Once we have some, we can ask why the rules we find have the properties they do (e.g. structure dependence, locality, c-command). Not surprisingly then, the investigation of FL/UG and the investigation of language particular Gs naturally went hand in hand and the philological methods beloved of typologists and comparative grammarians led the way. And boy did they lead! GB was the culmination of this line of inquiry. GB provided the first outlines of what a plausible FL/UG might look like, one that had grounding in facts about actual Gs. 

Now, this line of research was, IMO, very successful. By the mid 90s, GG had discovered somewhere in the vicinity of 25-35 non-trivial universals (i.e. design features of FL) that were “roughly” correct (see here for a (partial) list). These “laws of grammar” constitute, IMO, a great intellectual achievement. Moreover, they set the stage for MP in much the way that the earlier discovery of rules of Gs set the stage for GB style theories of FL/UG. Here’s what I mean.

Recall that studying the fine structure of FL/UG makes little sense unless we have candidate Gs and a detailed specification of some of their rules. Similarly, if one’s interest is in understanding why our FL has the properties it has, we need some candidate FL properties (UG principles) for study. This is what the laws of grammar provide: candidate principles of FL/UG. Given these we can now ask why we have these kinds of rules/principles and not other conceivable ones. And this is the question that MP sets for itself: why this FL/UG? MP, in short, takes as its explanandum the structure of FL.[4]

Note, if this is indeed the object of study, then MP only makes sense from a cognitive perspective. You won’t ask why FL has the properties it has if you are not interested in FL’s properties in the first place. So, whereas the minimalist program so construed makes sense in a GG setting of the Chomsky variety where a mental organ like FL and its products are the targets of inquiry, it is less clear that the project makes much sense if one’s interests are largely philological (in fact, it is pretty clear to me that it doesn’t). If this is correct, and if it is correct that most linguists have mainly philological interests, then it should be no surprise that most linguists are disappointed with MP inquiry. It does not deliver what they can use, for it is no longer focused on questions analogous to the ones that were prominent before and which had useful spillover effects. The MP focus is on issues decidedly more abstract and removed from immediate linguistic data than heretofore.

There is a second reason that MP will disappoint the philologically inclined. It promotes a different sort of inquiry. Recall that the goal is explaining the properties of FL/UG (i.e. the laws of grammar are the explananda). But this explanatory project requires presupposing that the laws are more or less correct. In other words, MP takes GB as (more or less) right.[5] MP's added value comes in explaining it, not challenging it.

In this regard, MP is to GB what Subjacency Theory is to Ross’s islands. The former takes Ross’s islands as more or less descriptively accurate and tries to derive them on the basis of more natural assumptions. It would be dumb to aim at such a derivation if one took Ross’s description to be basically wrong headed. So too here. Aiming to derive the laws of grammar requires believing that these are basically on the right track. However, this means that so far as MP is concerned, the GBish conception of UG, though not fundamental, is largely empirically accurate. And this means that MP is not an empirical competitor to GB. Rather, it is a theoretical competitor in the way that Subjacency Theory is to Ross’s description of islands. Importantly, empirically speaking, MP does not aim to overthrow (or even substantially revise the content of) earlier theory.[6]

Now this is a problem for many working linguists. First, many don’t have the same sanguine view that I do of GB and the laws it embodies. In fact, I think that many (most?) linguists doubt that we know very much about UG or FL or that the laws of grammar are even remotely correct. If this is right, then the whole MP enterprise will seem premature and wrong headed to them.  Second, even if one takes these as decent approximations to the truth, MP will encourage a kind of work that will be very different from earlier inquiry. Let me explain.

The MP project so conceived will involve two subparts. The first one is to derive the GB principles. If successful, this will mean that we end up empirically where we started: MP will recover the content of GB. Of course, if you think GB is roughly right, then this is a good place to end up. But the progress will be theoretical, not empirical. It will demonstrate that it is reasonable to think that FL is simpler than GB presents it as being. However, the linguistic data covered will, at least initially, be very much the same. Again, this is a good thing from a theoretical point of view. But if one’s interests are philological and empirical, then this will not seem particularly impressive, as it will largely recapitulate GB's empirical findings, albeit in a novel way.

The second MP project will be to differentiate the structure of FL and to delineate those parts that are cognitively general from those that are linguistically proprietary. As you all know, the MP conceit is that linguistic competence relies on only a small cognitive difference between us and our apish cousins. MP expects FL’s fundamental operations and principles to be cognitively and computationally generic rather than linguistically specific. When Chomsky denies UG, what he denies is that there is a lot of linguistic specificity to FL (again: he does not deny that the GB identified principles of UG are indeed characteristic features of FL). Of course, hoping that this is so and showing that it might be/is are two very different things. The MP research agenda is to make good on this. Chomsky’s specific idea is that Merge and some reasonable computational principles are all that one needs. I am less sanguine that this is all that one needs, but I believe that a case can be made that this gets one pretty far. At any rate, note that most of this work is theoretical and it is not clear that it makes immediate contact with novel linguistic data (except, of course, in the sense that it derives GB principles/laws that are themselves empirically motivated (though recall that these are presupposed rather than investigated)). And this makes for a different kind of inquiry than the one that linguists typically pursue. It worries about finding natural, more basic principles and showing how these can be deployed to derive the basic features of FL. So: a lot more theoretical deduction and a lot less (at least initially) empirical exploration.

Note, incidentally, that in this context, Chomsky’s speculations about Martians and his disagreement with Coon are a fanciful and playful way of making an interesting point. If FL’s basic properties derive from the fact that it is a well designed computational system (its main properties follow from generic features of computations), then we should expect other well designed computational systems to have similar properties. That is what Chomsky is speculating might be the case.

So, why is Chomsky (and MP work more generally) out of the mainstream? Because mainstream linguistics is (and has always been IMO) largely uninterested in the mentalist conception of language that has always motivated Chomsky’s view of language. For a long time, the difference in motivations between Chomsky and the rest of the field was of little moment. With MP that has changed. The MP project only makes sense in a mentalist setting and invites decidedly non-philological projects without direct implications for further philological inquiry. This means that the two types of linguistics are parting company. That’s why many have despaired about MP. It fails to have the crossover appeal that prior syntactic theory had. MHT's survey of the lay of the linguistic land accurately reflects this IMO.

Is this a bad thing? Not necessarily, intellectually speaking. After all, there are different projects and there is no reason why we all need to be working on the same things, though I would really love it if the field left some room for the kind of theoretical speculation that MP invites.

However, the divergence might be sociologically costly. Linguistics has gained most of its extramural prestige from being part of the cog-neuro sciences. Interestingly, MP has generated interest in that wider world (and here I am thinking cog-neuro and biology). Linguistics as philology is not tethered to these wider concerns. As a result, linguistics in general will, I believe, become less central to general intellectual life than it was in earlier years, when it was at the center of work in the nascent cognitive and cog-neuro sciences. But I could be wrong. At any rate, MHT is right to observe that Chomsky’s influence has waned within linguistics proper. I would go further. The idea that linguistics is and ought to be part of the cog-neuro sciences is, I believe, a minority position within the discipline right now. The patron saint of modern linguistics is not Chomsky, but Greenberg. This is why Chomsky has become a more marginal figure (and why MH sounds so delighted). I suspect that down the road there will be a reshuffling of the professional boundaries of the discipline, with some study of language of the Chomsky variety moving in with cog-neuro and some returning to the language departments. The days of the idea of a larger common linguistic enterprise, I believe, are probably over.


[1]I find that this is sometimes hard to open. Here is the url to paste in:
https://dlc.hypotheses.org/1269 

[2]I should add that I have a syntax textbook that puts paid to the idea that Chomsky’s basic current ideas cannot be explicated in one. That said, I assume that what MHT intends is that Chomsky’s views are not standard textbook linguistics anymore. I agree with this, as you will see below.
[3]This was and is still the main method of linguistic investigation. FoLers know that I have long argued that PoS style investigations are different in kind from the comparative methods that are the standard and that when applicable they allow for a more direct view of the structure of FL. But as I have made this point before, I will avoid making it here. For current purposes, it suffices to observe that whatever the merits of PoS styles of investigation, these methods are less prevalent than the comparative method is.
[4]MHT thinks that Chomsky largely agrees with anti UG critics in “rejecting universal grammar” (1). This is a bit facile. What Chomsky rejects is that the kinds of principles we have identified as characteristic of UG are linguistically specific. By this he intends that they follow from more general principles. What he does not do (at least, this is not how I read him) is reject the principles of UG as targets of explanation. The problem with Evans and Levinson and Ibbotson and Tomasello is that their work fails to grapple with what GG has found in 60 years of research. There are a ton of non-trivial Gish facts (laws) that have been discovered. The aim is to explain these facts/laws, and ignoring them or not knowing anything about them is not the same as explaining them. Chomsky “believes” that language has properties that previous work on UG has characterized. What he is questioning is whether these properties are fundamental or derived. The critics of UG that MHT cites have never addressed this question, so they and Chomsky are engaged in entirely different projects.
            Last point: MHT notes that neophytes will be confused about all of this. However, a big part of the confusion comes from people telling them that Chomsky and Evans/Levinson and Ibbotson/Tomasello are engaged in anything like the same project.
[5]Let me repeat for the record that one can do MP and presuppose some conception of FL other than GB. IMO, most of the different “frameworks” make more or less the same claims. I will stick to GB because this is what I know best and MP indeed has targeted GB conceptions most directly.
[6]Or, more accurately, it aims to preserve most of it, just as General Relativity aimed to preserve most of Newtonian mechanics.

Tuesday, February 27, 2018

Universals; structural and substantive

Linguistic theory has a curious asymmetry, at least in syntax.  Let me explain.

Aspects distinguished two kinds of universals, structural vs substantive.  Examples of the former are commonplace: the Subjacency Principle, Principles of Binding, Cross Over effects, X’ theory with its heads, complements and specifiers; these are all structural notions that describe (and delimit) how Gs function. We have discovered a whole bunch of structural universals (and their attendant “effects”) over the last 60 years, and they form part of the very rich legacy of the GG research program. 

In contrast to all that we have learned about the structural requirements of G dependencies, we have, IMO, learned a lot less about the syntactic substances: What is a possible feature? What is a possible category? In the early days of GG it was taken for granted that syntax, like phonology, would choose its primitives (atomic elements) from a finite set of options. Binary feature theories based on the V/N distinction allowed for the familiar four basic substantive primitive categories A, N, V, and P. Functional categories were more recalcitrant to systematization, but if asked, I think it is fair to say that many a GGer could be found assuming that functional categories form a compact set from which different languages choose different options. Moreover, if one buys into the Borer-Chomsky thesis (viz. that variation lives in differences in the (functional) lexicon) and one adds a dash of GB thinking (where it is assumed that there is only a finite range of possible variation), one arrives at the conclusion that there are a finite number of functional categories that Gs choose from and that determine the (finite) range of possible variation witnessed across Gs. This, if I understand things (which I probably don’t (recall I got into syntax from philosophy not linguistics and so never took a phonology or morphology course)), is a pretty standard assumption within phonology tracing back (at least) to The Sound Pattern of English. And it is also a pretty conventional assumption within syntax, though the number of substantive universals we find pales in comparison to the structural universals we have discovered. Indeed, were I inclined to be provocative (not something I am inclined to be, as you all know), I would say that we have very few echt substantive universals (theories of possible/impossible categories/features) when compared to the many, many plausible structural universals we have discovered.
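For concreteness, here is the binary feature theory just mentioned, rendered as a toy enumeration (the code is mine; the [±N, ±V] assignments are the textbook ones):

```python
from itertools import product

# The classic two-feature theory of the lexical categories: crossing the
# binary features [±N] and [±V] yields exactly the four substantive
# categories A, N, V, P mentioned in the text.
CATEGORY = {
    (True,  False): "N",   # [+N, -V]  noun
    (False, True):  "V",   # [-N, +V]  verb
    (True,  True):  "A",   # [+N, +V]  adjective
    (False, False): "P",   # [-N, -V]  adposition
}

for n, v in product([True, False], repeat=2):
    sign = lambda b: "+" if b else "-"
    print(f"[{sign(n)}N, {sign(v)}V] -> {CATEGORY[(n, v)]}")
```

The point of the exercise: a finite feature inventory entails a finite, and here very small, space of possible categories. Nothing comparable exists for the functional lexicon, which is exactly the asymmetry at issue.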

Actually one could go further, so I will. One of the major ambitions (IMO, achievements) of theoretical syntax has been the elimination of constructions as fundamental primitives. This, not surprisingly, has devalued the UG relevance of particular features (e.g. A’ features like topic, WH, or focus), the idea being that dependencies have the properties they do not in virtue of the expressions that head the constructions but because of the dependencies that they instantiate. Criterial agreement is useful descriptively but pretty idle in explanatory terms. Structure rather than substance is grammatically key. In other words, the general picture that emerged from GB and more recent minimalist theory is that G dependencies have the properties they have because of the dependencies they realize rather than the elements that enter into these dependencies.[1]

Why do I mention this? Because of a recent blog post by Martin Haspelmath (here, henceforth MH) that Terje Lohndal sent me. The post argues that to date linguists have failed to provide a convincing set of atomic “building blocks” on the basis of which Gs work their magic. MH disputes the following claim: “categories and features are natural kinds, i.e. aspects of the innate language faculty” and they form “a “toolbox” of categories that languages may use” (2-3). MH claims that there are few substantive proposals in syntax (as opposed to phonology) for such a comprehensive inventory of primitives. Moreover, MH suggests that this is not the main problem with the idea. What is? Here is MH (3-4):

To my mind, a more serious problem than the lack of comprehensive proposals is that linguistics has no clear criteria for assessing whether a feature should be assumed to be a natural kind (=part of the innate language faculty).

The typical linguistics paper considers a narrow range of phenomena from a small number of languages (often just a single language) and provides an elegant account of the phenomena, making use of some previously proposed general architectures, mechanisms and categories. It could be hoped that this method will eventually lead to convergent results…but I do not see much evidence for this over the last 50 years. 

And this failure is principled, MH argues, relying as it does on claims “that cannot be falsified.”

Despite the invocation of that bugbear “falsification,”[2] I found the whole discussion to be disconcertingly convincing, and believe me when I tell you that I did not expect this.  MH and I do not share a common vision of what linguistics is all about. I am a big fan of the idea that FL is richly structured and contains at least some linguistically proprietary information. MH leans towards the idea that there is no FL and that whatever generalizations there might be across Gs are of the Greenberg variety.

Need I also add that whereas I love and prize Chomsky Universals, MH has little time for them and considers the cataloguing and explanation of Greenberg Universals to be the major problem on the linguist’s research agenda, universals that are best seen as tendencies and contrasts explicable “through functional adaptation.” For MH these can be traced to cognitively general biases of the Greenberg/Zipf variety. In sum, MH denies that natural languages have joints that a theory is supposed to cut or that there are “innate “natural kinds”” that give us “language-particular categories” (8-9).

So you can see my dilemma. Or maybe you don’t so let me elaborate.

I think that MH is entirely incorrect in his view of universals, but the arguments that I would present would rely on examples that are best bundled under the heading “structural universals.” The arguments that I generally present for something like a domain specific UG involve structural conditions on well-formedness like those found in the theories of Subjacency, the ECP, Binding theory, etc. The arguments I favor (which I think are strongest) involve PoS reasoning and insist that the only way to bridge the gap between PLD and the competence attained by speakers of a given G that examples in these domains illustrate requires domain specific knowledge of a certain kind.[3] And all of these forms of argument lose traction when the issue involves features, categories and their innate status. How so?

First, unlike with the standard structural universals, I find it hard to identify the gap between impoverished input and expansive competence that is characteristic of arguments illustrated by standard structural universals. PLD is not chock full of “corrected” subjacency violations (aka, island effects) to guide the LAD in distinguishing long kosher movements from trayf ones. Thus the fact that native speakers respect islands cannot be traced to the informative nature of the PLD but rather to the structure of FL. As noted in the previous post (here), this kind of gap is where PoS reasoning lives and it is what licenses (IMO, the strongest) claims to innate knowledge. However, so far as I can tell, this gap does not obviously exist (or is not as easy to demonstrate) when it comes to supposing that such and such a feature or category is part of the basic atomic inventory of a G. Features are (often) too specific and variable, combining under a common label various properties that seem to have little to do with one another. This is most obvious for phi-features like gender and number, but it even extends to categories like V and A and N, where what belongs where is often both squishy within a G and especially so across them. This is not to suggest that within a given G the categories might not make useful distinctions. However, it is not clear how well these distinctions travel among Gs. What makes for a V or N in one G might not be very useful in identifying these categories in another. Like I said at the outset, I am no expert in these matters, but the impression I have come away with after hearing these matters discussed is that the criteria for identifying features within and across languages are not particularly sharp and there is quite a bit of cross G variation. If this is so, then the particular properties that coagulate around a given feature within a given G must be acquired via experience with that particular feature in that particular G. And if this is so, then these features differ quite a bit in their epistemological status from the structural universals that PoS arguments most effectively deploy. Thus, not only does the learner have to learn which features his G exploits, but s/he even has to learn which particular properties these features make reference to, and this makes them poor fodder for the PoS mill.

Second, our theoretical understanding of features and categories is much poorer than our understanding of structural universals. So for example, islands are no longer basic “things” in modern theory. They are the visible byproducts of deeper principles (e.g. Subjacency). From the little I can tell, this is less so for features/categories. I mentioned the feature theory underlying the substantive N, V, A, P categories (though I believe that this theory is not that well regarded anymore). However, this theory, even if correct, is very marginal nowadays within syntax. The atoms that do the syntactic heavy lifting are the functional ones, and for these we have no good theoretical unification (at least so far as I am aware). Currently, we have the functional features we have, and there is no obvious theoretical constraint on postulating more whenever the urge arises.  Indeed, so far as I can tell, there is no theoretical (and often, practical) upper bound on the number of possible primitive features, and from where I sit many are postulated in an ad hoc fashion to grab a recalcitrant data point. In other words, unlike what we find with the standard bevy of structural universals, there is no obvious explanatory cost to expanding the descriptive range of the primitives, and this is too bad, for it bleaches featural accounts of their potential explanatory oomph.

This, I take it, is largely what MH is criticizing, and if it is, I think I am in agreement (or more precisely, his survey of things matches my own). Where we part company is what this means. For me this means that these issues will tell us relatively little about FL and so fall outside the main object of linguistic study. For MH, this means that linguistics will shed little light on FL as there is nothing FLish about what linguistics studies. Given what I said above, we can, of course, both be right given that we are largely agreeing: if MH’s description of the study of substantive universals is correct, then the best we might be able to do is Greenberg, and Greenberg will tell us relatively little about the structure of FL. If that is the argument, I can tag along quite a long way towards MH’s conclusion. Of course, this leaves me secure in my conclusion that what we know about structural universals argues the opposite (viz. a need for linguistically specific innate structures able to bridge the easily detectable PoS gaps).

That said, let me add three caveats.

First, there is at least one apparent substantive universal that I think creates serious PoS problems; the Universal Base Hypothesis (UBH). Cinque’s work falls under this rubric as well, but the one I am thinking about is the following. All Gs are organized into three onion-like layers, what Kleanthes Grohmann has elegantly dubbed “prolific domains” (see his thesis). Thus we find a thematic layer embedded into an agreement/case layer embedded into an A’/left periphery layer.  I know of no decent argument against this kind of G organization. And if this is true, it raises the question of why it is true. I do not see that the class of dependencies that we find would significantly change if the onion were inversely layered (see here for some discussion). So why is it layered as it is? Note that this is more abstract than your typical Greenberg universal, as it is not a fact about the surface form of the string but the underlying hierarchical structure of the “base” phrase marker. In modern parlance, it is a fact about the selection features of the relevant functional heads (i.e. about the features (aka substance) of the primitive atoms). It does not correspond to any fact about surface order, yet it seems to be true. If it is, and I have described it correctly, then we have an interesting PoS puzzle on our hands, one that deals with the organization of Gs which likely traces back to the structure of FL/UG. I mention this because, unlike many of the Greenberg universals, there is no obvious way of establishing this fact about Gs from their surface properties, and hence explaining why this onion-like structure exists is likely to tell us a lot about FL.
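To see what this selectional rendering amounts to, here is a toy sketch (the coding and the head inventory are my own, purely illustrative): assign each functional head to one of the three prolific domains and treat the UBH as the requirement that a top-down clausal spine never climbs back up the onion.

```python
# Toy rendering of the UBH as a selectional constraint. The head inventory
# and domain assignments below are illustrative assumptions, not a claim.
DOMAIN = {
    "C": 3, "Top": 3, "Foc": 3,   # A'/left-periphery layer
    "T": 2, "Agr": 2,             # agreement/case layer
    "v": 1, "V": 1,               # thematic layer
}

def well_formed(spine):
    """True iff the spine's domains never increase reading top-down."""
    layers = [DOMAIN[head] for head in spine]
    return all(upper >= lower for upper, lower in zip(layers, layers[1:]))

print(well_formed(["C", "T", "v", "V"]))   # True: the attested onion
print(well_formed(["T", "C", "v", "V"]))   # False: an inverted onion
```

The PoS point is then easy to state: nothing in the surface string distinguishes this layering from its inverse, so the preference for one onion over the other must come from somewhere other than the data.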

Second, it is quite possible that many Greenberg universals rest on innate foundations. This is the message I take away from the work by Culbertson & Adger (see here for some discussion). They show that some orders within nominals relating Demonstratives, Adjectives, Numerals and head Nouns are very hard to acquire in an artificial G setting. They use this to argue that their absence as Greenberg options has a basis in how such structures are learned.  It is not entirely clear that this learning bias is FL internal (it regards relating linear and hierarchical order) but it might be. At any rate, I don’t want anything I said above to preclude the possibility that some surface universals might reflect features of FL (i.e. be based on Chomsky Universals), and if they do, it suggests that explaining (some) Greenberg universals might shed some light on the structure of FL.

Third, though we don’t have many good theories of features or functional heads, a lazy perusal of the facts suggests that not just anything can be a G feature or a G head. We find phi features all over the place. Among the phi features we find that person, number and gender are ubiquitous. But if anything goes, why don’t we find more obviously communicatively and biologically useful features (e.g. the +/- edible feature, or the +/- predator feature, or the +/- ready for sex feature or…)? We could imagine all sorts of biologically or communicatively useful features that it would be nice for language to express structurally that we just do not find. And the ones that we do find seem, from a communicative or biological point of view, to often be idle (gender (and, IMO, case) being the poster child for this). This suggests that whatever underlies the selection of features we tend to see (again and again) and those that we never see is more principled than anything goes. And if that is correct, then what basis could there be for this other than some linguistically innate proclivity to press these features as opposed to those into linguistic service?  Confession: I do not take this argument to be very strong, but it seems obvious that the range of features we find in Gs that do grammatical service is pretty small, and it is fair to ask why this is so and why many other conceivable features that we could imagine would be useful are nonetheless absent.

Let me reiterate a point about my shortcomings I made at the outset. I really don’t know much about features/categories and their uniform and variable properties. It is entirely possible that I have underestimated what GG currently knows about these matters. If so, I trust the comments section will set things straight. Until that happens, however, from where I sit I think that MH has a point concerning how features and categories operate theoretically and that this is worrisome. That we draw opposite conclusions from these observations is of less moment than that we evaluate the current state of play in roughly the same way.



[1] This is the main theme of On Wh Movement and, I believe, what drives the unification behind Merge based accounts of FL.
[2] Falsification is not a particularly good criterion of scientific adequacy, as I’ve argued many times before. It is usually used to cudgel positions one dislikes rather than push understanding forward. That said, in MH, invoking the F word does not really play much more than an ornamental role. There are serious criticisms that come into play.
[3] I abstract here from minimalist considerations, which try to delimit the domain specificity of the requisite assumptions. As you all know, I tend to think that we can reduce much of GB to minimalist principles. To the degree that this hope is not in vain, the domain specificity can be circumscribed to whatever it is that minimalism needs to unify the apparently very different principles of GB and the generalizations that follow from them.

Monday, October 24, 2016

Universal tendencies

Let’s say we find two languages displaying a common pattern, or two languages converging towards a common pattern, or even all languages doing the same. How should we explain this? Stephen Anderson (here, and discussed by Haspelmath here) notes that if you are a GGer there are three available options: (i) the nature of the input, (ii) the learning theory and (iii) the cognitive limits of the LAD (be they linguistically specific or domain general). Note that (ii) will include (iii) as a subpart and will have to reflect the properties of (i), but will also include all sorts of other features (cognitive control, structure of memory and attention, the number of options the LAD considers at one time etc.). These, as Anderson notes, are the only options available to a GGer, for s/he takes G change to reflect the changing distribution of Gs in the heads of a population of speakers. Or, to put this more provocatively: languages don't exist apart from their incarnation in speakers’ minds/brains. And given this, all diachronic “laws” (laws that explain how languages or Gs change over time) must reflect the cognitive, linguistic or computational properties of human minds/brains.

This said, Haspelmath (H) observes (here and here) (correctly in my view) that GGers have long “preferred purely synchronic ways of explaining typological distributions,” and by this he means explanations that allude to properties of the “innate Language Faculty” (see here for discussion). In other words, GGers like to think that typological differences reflect intrinsic properties of FL/UG and that studying patterns of variation will hence shed light on its properties. I have voiced some skepticism concerning this “hence” here. In what follows I would like to comment on H’s remarks on a similar topic. However, before I get into details I should note that we might not be talking about the same thing. Here’s what I mean.

The way I understand it, FL/UG bears on properties of Gs, not on properties of their outputs. Hence, when I look at typology I am asking how variation in typologies and historical change might explain changes in Gs. Of course, I use outputs of these Gs to try to discern the properties of the underlying Gs, but what I am interested in is G variation, not output variation. This concedes that one might achieve similar (identical?) outputs from different congeries of G rules, operations and filters. In effect, whereas changing surface patterns do signal some change in the underlying Gs, similarity of surface patterns need not. Moreover, given our current accounts there are (sadly) too many roads to Rome; thus the fact that two Gs generate similar outputs (or have moved towards similar outputs from different Gish starting points) does not imply that they must be doing so in the same way. Maybe they are and maybe not. It really all depends.

Ok, back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of changes,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.”[1] He is somewhat skeptical that we have currently identified diachronic mechanisms that explain such changes, and he thinks that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than “common paths of change,” namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?

Recall, we need to keep our questions clear. Say that we have identified an actual NUT (i.e. we have compelling evidence that certain kinds of G changes are “preferred”). If we have this and we find another G changing in the same direction, then we can attribute this to that same NUT. So we explain the change by so attributing it. Well, in part: we have identified the kind of thing it is even if we do not yet know why these types of things exist.  An analogy: I have a pencil in my hand. I open my hand. The pencil falls. Why? Gravitational attraction. I then find out that the same thing happens when I have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and any other school supply at hand. I conclude that these falls are all instances of the same causal power (i.e. gravity). Have I explained why a thumbtack I pick up and let loose falls by saying that it falls because of gravity? Well, up to a point. A small point IMO, but a point nonetheless.  Of course we want to know how Gravity does this, what exactly it does when it does it and even why it does it the way that it does, but classifying phenomena into various explanatory pots is often a vital step in setting up the next step of the investigation (viz. identifying and explaining the properties of the alleged underlying “force”).

This said, I agree that the explanation is pretty lame if left like this. Why did X fall when I dropped it? Because everything falls when you drop it. Satisfied? I hope not.

Sadly, from where I sit, many explanations of typological difference or diachronic change have this flavor. In GG we often identify a parameter that has switched value and (more rarely) some PLD that might have led to the switch. This is devilishly hard to do right and I am not dissing this kind of work. However, it is often very unsatisfying given how easy it is to postulate parameters for any observable difference. Moreover, very few proposals actually do the hard work of sketching the presupposed learning theory that would drive the change or of looking at the distribution of PLD that the learning theory would evaluate in making the change. To get beyond the weak explanations noted above, we need more robust accounts of the nature of the learning mechanisms and of the data input to them (PLD) that led to the change.[2] Absent this, we have an explanation of only a very weak sort.

Would H agree? I think so, but I am not absolutely sure of this. I think that H runs together things that I would keep separate. For example: H considers Anderson’s view that many synchronic features of a G are best seen as remnants of earlier patterns. In other words, what we see in particular Gs might be reflections of “the shaping effects of history” and “not because the nature of the Language Faculty requires it” (H quoting Anderson: p. 2). H rejects this for the following reason: he doesn’t see “how the historical developments can have “shaping effects” if they are “contingent”” (p. 2). But why not?  What does the fact that something is contingent have to do with whether it can be systematically causal? 1066 and all that was contingent, yet its effects on “English” Gs have been long lasting. There is no reason to think that contingent events cannot have long lasting shaping effects.

Nor, so far as I can tell, is there reason to think that this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies might not explain “universal tendencies.” Here’s what I mean.

Let’s for the sake of argument assume that there are around 50 different parameters (and this number is surely small). This gives a space of possible Gs (assuming the parameters are binary and independent) of 2^50, on the order of 10^15. The current estimate of different languages out there (and I assume, maybe incorrectly, Gs) is on the order of 7,000, at least that’s the number I hear bandied about among typologists. This number is minuscule. It covers about 6 × 10^-10 percent of the possible space. It is not inconceivable that languages in this part of the space have many properties in common purely because they are all in the same part of the space. These common properties would be contingent in a UG sense if we assumed that we only accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these properties. It is even possible that it is hard to get to any other of the G possibilities given that we are in this region.  On this sort of account, there might be many apparent universals that have no deep cognitive grounding and are nonetheless pervasive. Don’t get me wrong, I am not saying these exist, only that we really have no knock down reason for thinking they do not.  And if something like this could be true, then the fact that some property did or didn’t occur in every G could be attributed to the nature of the kind of PLD our part of the G space makes available (or how this kind of PLD interacts with the learning algorithm). This would fit with Anderson’s view: contingent yet systematic and attributable to the properties of the PLD plus learning theory.
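For the record, here is the back-of-the-envelope arithmetic (a sketch assuming, as above, 50 independent binary parameters and 7,000 attested languages):

```python
# 50 independent binary parameters give 2**50 possible grammars.
parameters = 50
possible_grammars = 2 ** parameters        # 1,125,899,906,842,624 ~ 10**15
attested = 7_000

coverage = attested / possible_grammars
print(f"{possible_grammars:,} possible Gs")
print(f"coverage: {coverage:.1e}, i.e. {coverage * 100:.0e}% of the space")
# -> coverage: 6.2e-12, i.e. 6e-10% of the space
```

However the parameter count is adjusted, the moral survives: the attested languages sample a vanishingly small corner of the possible G space.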

I don’t think that H (nor most linguists) would find this possibility compelling. If something is absent from 7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well maybe not. My only claim is that the basis for this confidence is not particularly clear. And thinking through this scenario makes it clear that gaps in the existing language patterns/Gs are (at best) suggestive about FL/UG properties rather than strongly dispositive.  It could be our ambient PLD that is responsible. We need to see the reasoning. Culbertson and Adger provide a nice model for how this might be done (see here).

One last point: what makes PoS arguments powerful is that they are not subject to this kind of sampling skepticism. PoS arguments really do, if successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoS arguments abstract away from PLD altogether and so remove it as a causal source of systematicity. Hence, PoSs short-circuit the skeptical suggestions above. Of course, the two kinds of investigation can be combined. However, it is worth keeping in mind that typological investigations will always suffer from the kind of sampling problem noted above and will thus be less direct probes of FL/UG than will PoS considerations. This suggests, IMO, that it would be very good practice to supplement typologically based conclusions with PoS style arguments.[3] Even better would be explicit learning models, though these will be far more demanding given how hard it likely is to settle on what the PLD is for any historical change.[4]

I found H’s discussion of these matters to be interesting and provocative. I disagree with many things that H says (he really is focused on languages rather than Gs). Nonetheless, his discussion can be translated well enough into my own favored terms to be worth thinking about. Take a look.



[1] I say ‘apparent’ for I know very little of this literature though I am willing to assume H is correct that these exist for the sake of argument.
[2] Which is not to say that we lack nice models of what better accounts might look like: Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William Sakas, Charles Yang, a.o., have provided excellent models of what such explanations would look like.
[3] Again a nice example of this is Culbertson and Adger’s work discussed here. It develops an artificial G argument (meatier than a simple PoS argument) to more firmly ground a typological conclusion.

[4] Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example, shows.