Friday, March 16, 2018

Lotsa luck, please

One of the things that makes doing research (and by this I mean original research) hard is that there is no guarantee of success. There really are no good rules for doing it. More often than not, it’s unclear what the question being asked should be, let alone what it actually is. It’s similarly unclear what kinds of answers would serve to advance said partially formulated question, or what kinds of data would offer the best kinds of evidence. The research game is played while the rules are being made, remade and remade again, and even if done well there is no reason to think that the whole damn thing might not collapse or spin out of control. There’s no insurance one can buy to protect against this kind of failure. That’s just the way it is. Put more concisely, there is no scientific method wherein you press a research button, work hard, and reap the rewards.

But why not? Why can’t we automate the whole process? Why do we fail so often? Well, interestingly, people have been thinking about this question, and a Sci Am blog post discusses some recent research on the topic (see here). It seems that one reason things are so hard is that a lot of the process is driven by luck, and this, again interestingly, has important implications.

First, the claims: science likes to think of itself as a meritocracy, perhaps the paradigmatic meritocracy.[1] Scientists are judged by their work, the successful ones being so largely because they are smarter, more disciplined, more careful, more creative than others. Research success leads to increased resources to fund further success and in the best of all worlds, resources flow to those with proven track records.

There is one more point to make: this meritocracy is not only taken to be right and proper but is also understood to be the best way to advance the scientific search for truth. Merit gains rewards which advances the common project. But is this actually the case?

Here’s what I mean. What if it turned out that luck plays a VERY large role in success? What if success in science (and in life) is not a matter of the best and brightest and hardest working gaining as a result of their smarts, creativity and perseverance, but is more a product of being in the right place at the right time? What, in other words, should happen if we came to appreciate that luck plays an inordinately large role in success? Were this so, and the blog post linked to cites work arguing for this conclusion, then the idea that science should be run as a meritocracy would require some rethinking.

How good are the arguments proffered? Well, not bad, but not dispositive either. They consist of two kinds of work: some modeling (largely of the ‘toy’ variety) and some review of empirical work that argues the models are pointing in the right direction. I leave it to you to evaluate the results. FWIW, IMO, the argument is pretty good and, at the very least, goes some way towards noting something that I have long considered obvious: that lots of success, be it academic or otherwise, is due to luck. As my mother used to say: “it’s better to be lucky than smart.” Apparently, the models bear her out.

What follows from this if it is correct? Well, the biggest implications are for those activities where we reward people based on their track records (e.g. promotion, tenure and funding). In what follows I want to avoid discussing promotion and tenure and concentrate on funding, for the papers note some interesting features of funding mechanisms that avoid the meritocratic route. In particular, it seems that the most efficient way to fund research, the one that gets the most bang for the buck, targets “diversity” rather than “excellence” (the latter term is so abused nowadays that I assume that shortly it will be a synonym for BS).

For example, one study of over 1200 Quebec researchers over 15 years concludes: “both in terms of quantity of papers produced and of their scientific impact, the concentration of research funding in the hands of a so-called ‘elite’ of researchers produces diminishing marginal returns” (5). Indeed, the most efficient way to distribute funds according to the models (and the empirical studies back this up) is equally and consistently over a lifetime of research (6):

…the best funding strategy of them all was one where an equal number of funding was distributed to everyone. Distributing funds at a rate of 1 unit every five years resulted in 60% of the most talented individuals having a greater than average level of success, and distributing funds at a rate of 5 units every five years resulted in 100% of the most talented individuals having an impact! This suggests that if a funding agency or government has more money available to distribute, they'd be wise to use that extra money to distribute money to everyone, rather than to only a select few. As the researchers conclude,

"[I]f the goal is to reward the most talented person (thus increasing their final level of success), it is much more convenient to distribute periodically (even small) equal amounts of capital to all individuals rather than to give a greater capital only to a small percentage of them, selected through their level of success - already reached - at the moment of the distribution."

This is what one would expect if luck (“serendipity and chance” (5)) is the dominant factor in scientific breakthrough.
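
For concreteness, here is a minimal toy sketch (in Python) of the kind of talent-vs-luck model the post is describing. The agent counts, event probabilities, and funding rules below are my own illustrative choices, not the parameters of the studies under discussion; the point is only to show how an “egalitarian” and an “elitist” funding strategy can be compared within one simulation.

```python
import random

def simulate(strategy, n_agents=500, steps=80, seed=0):
    """Toy talent-vs-luck model (illustrative only, not the papers' exact model).

    Agents have a fixed 'talent' in [0, 1] and start with capital 1.
    Each step brings a lucky or unlucky event with equal probability:
    a lucky event doubles capital only if talent converts it (prob = talent);
    an unlucky event halves capital. Every 10 steps a fixed funding pool
    is handed out according to `strategy`.
    """
    rng = random.Random(seed)
    talent = [min(max(rng.gauss(0.6, 0.1), 0.0), 1.0) for _ in range(n_agents)]
    capital = [1.0] * n_agents
    pool = float(n_agents)  # total funds distributed at each funding round
    for t in range(steps):
        for i in range(n_agents):
            if rng.random() < 0.5:              # lucky event...
                if rng.random() < talent[i]:    # ...exploited only with talent
                    capital[i] *= 2
            else:                               # unlucky event
                capital[i] /= 2
        if t % 10 == 9:
            if strategy == "egalitarian":       # same small amount to everyone
                share = pool / n_agents
                for i in range(n_agents):
                    capital[i] += share
            elif strategy == "elitist":         # everything to the current top 5%
                top = sorted(range(n_agents), key=lambda j: -capital[j])[:n_agents // 20]
                for i in top:
                    capital[i] += pool / len(top)
    return talent, capital

def talented_above_average(talent, capital, quantile=0.9):
    """Fraction of the top-talent decile whose final capital beats the mean."""
    cutoff = sorted(talent)[int(len(talent) * quantile)]
    mean_cap = sum(capital) / len(capital)
    top = [c for t, c in zip(talent, capital) if t >= cutoff]
    return sum(c > mean_cap for c in top) / len(top)
```

Comparing `talented_above_average(*simulate("egalitarian"))` with `talented_above_average(*simulate("elitist"))` is one way of posing the papers’ question: under which funding regime do the most talented more often end up above average?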

So, does this mean it is only luck? No, clearly, other things matter. But luck plays an outsized role. Where talent etc. comes in is in readying one to exploit the luck that comes one’s way (a prepared mind is a lucky one?). The post ends with the following reasonable observation based on the studies it reviews:

The results of this elucidating simulation, which dovetail with a growing number of studies based on real-world data, strongly suggest that luck and opportunity play an underappreciated role in determining the final level of individual success. As the researchers point out, since rewards and resources are usually given to those who are already highly rewarded, this often causes a lack of opportunities for those who are most talented (i.e., have the greatest potential to actually benefit from the resources), and it doesn't take into account the important role of luck, which can emerge spontaneously throughout the creative process. The researchers argue that the following factors are all important in giving people more chances of success: a stimulating environment rich in opportunities, a good education, intensive training, and an efficient strategy for the distribution of funds and resources. They argue that at the macro-level of analysis, any policy that can influence these factors will result in greater collective progress and innovation for society (not to mention immense self-actualization of any particular individual).

So, there may be a good reason for why research feels so precarious. It is. It requires luck to succeed, lots of luck, lots of consistent luck. And if this is correct, it suggests that the winner-take-all strategies that funding agents tend to favor are likely quite counterproductive, for they rely on picking winners, which is very hard to do if the distribution of winners is largely a matter of luck.

That said, I doubt that things will change very soon. First, in an era of big science, big grants are needed, and if there is a money shortage, then the only way to have big grants is to eliminate little ones. Second, there is an internal dynamic. Winners like to think that their success is due to their own efforts. That’s the charm of meritocracy. And as winners tend to make the rules, don’t expect the promotion of “excellence” (and the rewarding of it) to end anytime soon, even if ending it would make the scientific life a whole lot better.

Last point: a while ago FoL discussed an interesting interview with the biologist Sydney Brenner (here). It generated a lively discussion in the comments that bears on the above. The point that Brenner made is that science as practiced today would have stifled the breakthrough research that was carried on in his youth. Some noted a certain kind of nostalgia for a bygone era in Brenner’s remarks, a period with its own substantial downsides.  This is likely correct. However, in light of the “luck is critical” thesis, Brenner’s point might have been based on the fact that in his day funding was more widely spread out among the relevant players and so it was possible for more people to “get” lucky. The problem then with the current state of play is not merely the insufficient level of funding, but the distribution of that funding across potential recipients.  In earlier days, the money flowed to the bulk of the research community. Nowadays it does not. And if luck matters, then the spread matters too. More pointedly, if luck matters, then rewarding the successful is a bad strategy.

[1] Rewards enhance this perception and promote a hero-conception of discovery in which a brave person battles against the forces of scientific darkness and by originality, grit and brains to burn triumphs over the forces of darkness and ignorance. Examples of such include the average Nobel prize winner. In this regard, such prizes are considered as significantly different from lotteries. Their rewards are deserved because earned, and not the result of dumb luck. The Sci Am piece asks whether Nobel prizes and lotteries are not more similar than generally believed. And if they are, what does this do to the conception of merit and rewards to the deserving?

Tuesday, March 6, 2018

Evo Lang; a comment on logical distance

In a previous post, I outlined the three kinds of gaps that fuel PoS arguments. I also noted that one of the gaps (the one that focuses on utterances as imperfect exemplars of sentences, in that they are subject to the exigencies that afflict (enhance?) all performances) was less serious than the other two, in that it can likely be bridged using generic statistical techniques that can be applied to “raw” data to regiment it, scrub it, and normalize it. Despite its relative marginality for PoS claims, this kind of data degeneracy is often the focus of lots of empirical investigation claiming to refute the supposition that the data is particularly poor, or misleading, or incomplete. Motherese, it is often retorted, is actually mostly well-formed, smoothly uttered, and even helpful intonationally. This kind of retort is (sometimes tacitly, sometimes not) used to support the claim that there is no real PoS problem: the data is quite clean, so the degeneracy problem is not serious; and given that the data is good and clean, wholesome even, there is no reason to doubt that it is sufficient to guide an LAD to its G. What the previous post argued is that even if this retort is correct (which, frankly, I doubt, but let’s be concessive[1]), it is completely irrelevant, for there are two other gaps that have little to do with the quality of the data, and this is where the real power of PoS arguments lies.

That’s what the earlier post argued at length, so why am I regurgitating the points again here? Not to reiterate the main claim (though repetition is the soul of insight (or at least belief fixation)), but to observe that, in a curious sense, providing a thorough catalogue of problems for E(mpiricist) approaches to G acquisition serves to obscure the most important part of the argument. How so? It provides critics of the PoS a weak supposition on which to concentrate their fire which, if even partially successful (again, I am skeptical, but…), serves to move the argumentative focus away from the strong arguments (having to do with data deficiency, not degeneracy) and allows for a very premature declaration that there is no PoS problem at all. So, being thorough and exhaustive drops bread crumbs that Eish Hansels and Gretels eagerly gather, thereby allowing them to get lost in a forest of irrelevancies.[2] And the reason I mention this is that the same kind of poor argumentative behavior infects many Evo Lang discussions, which is what I want to concentrate on today by discussing a particularly obtuse piece that appeared in Aeon penned by, you guessed it, Dan Everett, entitled Did Homo Erectus Speak (henceforth DHES (here)).[3]

The goal of DHES is to argue for the Continuity Thesis. This is roughly the idea that current human linguistic facility is qualitatively identical to what our ancestors (and indeed other animals) have. So what we have is just like what they have but more so. Here is DHES (p.2):

…the ‘leap’ to language was little more than a long series of baby steps, requiring no mutations, nor any complex grammar. In fact, the language of erectus would have been every bit as much a ‘real language’ as any modern language.

The main argument for this conclusion is that Homo erectus (Erectus (E) to friends and family) already spoke a language largely like ours, and thus that our linguistic capacity has been gradually evolving for millions of years. Here is DHES (p. 2):

To discover the answers to these questions, we need to travel back in time at least 1.9 million years ago to the birth of Homo erectus, as they emerged from the ancient process of primate evolution. Erectus had nearly double the brain size of any previous hominim, walked habitually upright, were superb hunters, travelled the world, and sailed to ocean islands. And somewhere along the way they got language. Yes, erectus. Not Neanderthals. Not sapiens. And if erectus invented language, this means that Neanderthals, born more than a million years later, entered a world already linguistic.
Likewise, our species would have emerged into a world that already had language…
And, consequently, there is little reason to think that there is anything linguistically special about us. Our capacities are identical to E’s, give or take a little (very little). That’s the claim DHES advances, based on the “fact” (which DHES concedes is not widely accepted in the paleoanthropology community (p.2))[4] that E’s artifacts (“their settlements, their art, their symbols, their sailing ability, and their tools”) all point to something like “an animal that can communicate via symbols.” Or, as DHES puts it: “a linguistic animal” (p. 6).

So, if E was a symbol manipulator (witness the artifacts) and was able to “transfer information by symbols” (5), then E was linguistic, i.e. graced with the same FL as us and there is no reason to assume that

…humans possess special cognitive abilities absent from the brains of all other creatures or whether, more simply, humans have language because they are smarter than other creatures (whether through higher densities of neurons, or other advantages of brain organisation). (p.6)

So what distinguishes us from other homos (and even other animals, for the logic deployed leads here) is a brain that is qualitatively the same as that of our ancestors but bigger, and bigger gives you “language.”

Before examining the argument, you can see why DHES emphasizes and argues for E having language and the long time period separating E from us. The secret sauce of gradualist explanations is long expanses of time in which little changes can add up (see here for the definitive account). This is why DHES emphasizes the millions of years theme. The supposition seems to be that the only problem with a gradualist account of the evolution of the human FL is that it arose relatively recently (roughly 100kya) and that there is not enough time for natural selection to work its magic. So, the thinking seems to go, if DHES can show that this supposition is incorrect it implies that the continuity thesis does apply to human FL and the fact that humans speak as we do (in particular, have the kinds of Gs that we do) requires no novel cognitive architecture. It’s linguistic facility all the way down, with bigger brains adding more of the same doohickies and thingamabobs we find in smaller ones leading to an “apparent” qualitative (but in reality, merely a quantitative) change in capacity.

Now, as you can imagine, this is a very bad argument, though I concede that people like me (and perhaps Chomsky) have invited this kind of response. Chomsky has pointed out (following a pretty impressive bunch of people who think about this topic for a living (e.g. Tattersall)) that the indirect evidence for language is relatively recent. If one measures things using cultural artifacts, then the explosion of these around 50kya (rather than a handful of contentious ones further back) seems to indicate that something significant happened rather recently (not millions of years ago). If one takes such cultural artifacts to piggy back on our linguistic faculties and one takes these to prominently include the capacity to acquire Gs that generate an unbounded number of hierarchically structured interpretable objects, then one has indirect evidence for something like our kinds of Gs (Merge based ones) arising in the (relatively) recent past. If.

As I’ve noted before, this is a very indirect kind of argument for Merge, and it is not clear how much cultural artifacts implicate Merge. It is not unreasonable to suppose that the thing that goosed culture (maybe given its significance we should spell it Kultur!)[5] was hooking the system up with externalization thereby facilitating communication and the gradual build-up of retained and retainable knowledge. Who knows really. All of this is very speculative (i.e. it’s not as if there is a transparent logical entailment from elaborate burial rituals or cave paintings to hierarchical recursion). However, it is also not that important for even were this false the Evo Lang problem would remain fundamentally unaltered. Let me explain.

The hard problem for GGers is explaining how unbounded hierarchical recursion could have arisen (actually, how it actually arose is the problem, but right now I would declare a small victory if we could redeem the modal). The assumption (and this is based on empirical evidence) is that there is nothing quite like it anywhere else in animal cognition. This is not to demean the powers of other animals. They really are amazing in many ways. But so far as we can tell there is nothing formally analogous to the kinds of operations and structures we regularly find in human Gs, and there is no reason to believe that any other animal either uses or acquires systems with these properties. If this is true (and it is, really!), then one major Evo Lang problem is explaining how systems with this formal property could have biologically arisen in humans given that nothing like it was there before. The problem then is not a matter of temporal distance, but of logical distance. If the above correctly describes the current state of play, and it is true that other animals don’t have cognitive capacities with these formal properties, then explaining how these properties arose, and the mental powers needed to deploy and acquire them, requires not just more but also different. So what was the different, and how did it happen? That’s the GG Evo Lang question.

Several comments: does this mean that this is the only Evo Lang question? No, there are others, and I will return to some discussed in DHES. Does this make the temporal question irrelevant? Yes and no.

Yes, in the sense that the problem is more or less the same regardless of the time it took to arise. Why? Because the problem is how to get from non-recursive structural hierarchy to recursive structural hierarchy, and no amount of non-recursive hierarchy or recursive flat structure or non-recursive flat structure will get you there by adding more of it. How hierarchy arose and how, furthermore, embedded hierarchy within hierarchy ad nauseam arose is not explained by noting finite examples of hierarchy or unbounded examples of iterated beads-on-a-string concatenations. Let me be clear: there is nothing wrong in pointing out that either of these exists in the mental repertoires of other animals, but this is not enough. There needs to be a story getting you from these non-recursive hierarchical systems to the qualitatively different recursive hierarchical one we have now. No story, no proposed additional mechanism, no Evo Lang account.

And the no? Well, if FL arose (very) recently, then there would be no temptation to look for a gradualist account. So were FL of very recent vintage, it would serve to block the bad (Eish) impulse to always look to the shaping effects of the environment for an explanation of anything.

Think of an analogy in the domain of language acquisition. Imagine that kids popped out speaking their native language. Does anyone think that we would be expecting Eish accounts of the process? Nope. Does anyone look for the shaping effects of the environment in explaining why kids are (normally) born with two legs and two arms, one heart, two kidneys etc.? Analogous impulses would dominate were LADs to pop out speaking their mother’s native tongue (actually I am not sure this is so, but I would hope it was). Ditto with a short evo time span and gradualist accounts. A long time span is a prerequisite for a continuity story to even make sense. However, when you think about the issue just a little bit, even a very long time span does not bridge the logical gap, and that is the one that needs traversing.

Curiously, the problem DHES presupposes away is one that we have seen before in a slightly different venue. Piaget was a cognitive gradualist, famously claiming that logical thinking in children gradually developed in them. Jerry Fodor (in the Royaumont volume, sadly under-read nowadays) pointed out that gradually developing richer logical competence is impossible. There is no way of getting from the propositional to the predicate calculus without presupposing the resources of the latter as a precondition. This is a logical bridge too far, and it cannot be gradually navigated. The same holds true of the recursive hierarchy problem. It is not the sort of thing you get by adding more non-recursive, non-hierarchical systems of representation. You need to add something else. The Evo Lang question is what.
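
The logical point can be made concrete with a toy sketch (mine, not DHES’s or Fodor’s): iterating a flat, beads-on-a-string concatenation never increases embedding depth, no matter how many times you apply it, whereas a Merge-style operation that can take its own outputs as inputs does. The function names and representations are, of course, purely illustrative.

```python
def concat(seq, item):
    """Beads-on-a-string: append one more atom to a flat sequence."""
    return seq + (item,)

def merge(a, b):
    """Merge-style combination: form a new binary object from two objects,
    either of which may itself be a merged object (this is where the
    recursion lives)."""
    return (a, b)

def depth(x):
    """Depth of embedding: atoms count as 0, a structure as 1 + its deepest child."""
    if not isinstance(x, tuple):
        return 0
    return 1 + max(depth(c) for c in x)
```

However many times `concat` applies, `depth` stays at 1; each application of `merge` to its own output adds a level. More of the first operation never yields the second, which is the logical gap at issue.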

DHES does not say. It does make many other points. It points to another important property of natural language, namely that its atoms are symbols and that this allows for “displacement” (i.e. not stimulus-bound reference). DHES further insists (p.6) that

Symbols, not grammar, are thus the sine qua non of language. They alone guarantee communication that is displaced, that is shared by an entire community of speakers, that can be transmitted between speakers and between generations, and that can represent either abstract or concrete ideas or things.

Maybe DHES is right. As I’ve noted before, Chomsky agrees that there is something interestingly different about the atoms of human language wrt their semantic properties. But even were this so (which it likely is), it doesn’t answer the GG question of how the kind of recursive hierarchy we find in human Gs (and that humans with FLs can all acquire) arose. In other words, even if we agree with DHES concerning the importance of atomic symbols as one key feature of human language (which, as I’ve noted many times before, Chomsky has highlighted often in the past), unless DHES shows us how symbolic terminals lead to recursive hierarchy, we have not progressed on the GG question.[6] And though the GG question is not the only question, it is an important one, given that one of the distinctive features of human language is that it is G based (see here for recognition by some of the biologically informed that being G based is indeed a critical feature of human language).

So the big problem with DHES is its failure to recognize the logical problem the GG facts present. I say this because it appears to suppose that one explains how the capacity of interest arose by showing a chart that tracks its progression. The chart is on p. 7 and has arrows pointing from one kind of representational format to another. So, indices begat icons, which begat symbols, which via duality of patterning begat compositionality, which begat linearity, which begat hierarchy, which finally begat recursion. All very impressive, but for the fact that DHES says nothing about how all this begetting took place (DHES leaves out all the salacious prurient detail). How exactly does linearity beget hierarchy, and hierarchy beget recursion? All DHES tells us is that it does. Or, more accurately (pp. 9-10; I have quoted the parts where “the miracle happens,” as the old New Yorker cartoon put it):
Once you have a set of symbols and a linear order agreed upon by a culture, you have a language.
That is really all there is to it, though of course most languages become more complex over time….
All of the embellishments of grammar such as hierarchical structures, recursion, relative clauses and other complex constructions are secondary, based on a slot-filler arrangement of and composition of symbols, in conjunction with cultural conventions and general principles of efficient computation…

Thus, once cultures and symbols appear, grammar is on the way...

So DHES does nothing to advance the GG question, except avoid saying anything about it while appearing to address it.

Before ending, let me admit that it might be that I am somewhat unfair to DHES. There are times when it appears that its interest is not engaging the Evo Lang question as GG poses it but in addressing another question: does a recursively hierarchical G have more expressive power than one without such a G? Note that this is not the GG question and, to my knowledge, this has not been a question that has occupied my GG community. However, if this is the question that interests DHES, then it seems either irrelevant to, or problematic for, the standard Evo Lang question of how our G systems and the capacity to acquire them arose. Say that the two kinds of Gs are not expressively the same: how does this help answer the question of how our FL arose? Say they have the same expressive power: then why did we evolve an FL that could acquire Gs with unbounded hierarchy even though, by assumption, these add nothing to the “expressive power” of language? Again, it really does not advance the Evo Lang question of interest (which, to repeat, does not mean that this is the only question of interest).

Let me end here. DHES is a very messy piece, and I think I know why. It really wants to argue that there is no Evo Lang problem of the kind GG (actually Herr Chomsky) poses because what we see all arose gradually, over a very long period of time, in small incremental steps. The problem is that DHES nowhere suggests how these steps could have been taken and how they could have added up to what we now have. How does one get from flat systems to hierarchical ones to recursively hierarchical ones? How does one get from strongly referential terminals (Chomsky’s observations concerning animal communication systems) to those that allow for pretty radical displacement? What mental changes are required to allow this kind of symbol or representational format to arise? What kind of mental changes are required to get from linear to unboundedly hierarchical? These are hard questions, and maybe we will never be able to answer them. But better to fail to answer a real question than to fail to see what the question is.

[1] I hereby preemptively apologize to Jerry Fodor, who wisely counseled against ever conceding anything, even for the sake of argument. I am sinning here, I know.
[2] And yes I know that they did not pick up their own crumbs and get themselves lost, but the metaphor got away from me.
[3] His work really is the gift that keeps on giving, your one-stop shopping venue for largely irrelevant arguments intended to buttress insupportable conclusions. I am starting to think that DE is doing this all as a public service for the enlightenment of the young. Master the non-sequiturs in the core DE oeuvre and you’ve seen through all (or at least many of) the non-sequiturs you are likely to encounter in the vast irrelevant anti-Chomsky literature. Like I said, an invaluable resource (sorta like the role that Piaget’s work played in early developmental psych work: work through the many failures of logic there and you end up with modern developmental cognition of the Carey-Spelke variety (i.e. the good stuff)).
[4] Though I am not suggesting that this should be held against the view. Experts have been known to be wrong before, and for all I know E had some linguistic skills. That is not the issue, as I show below. The issue is how similar E’s capacities were to our own.
[5] The ‘K’ is also in honor of the fact that I’m posting from Germany where I am scheduled to give a talk that I stupidly agreed to give months ago sure that I would never have to do this as I would be lucky enough to be hit by a bus but my luck has turned and here I am in Stuttgart. So Kultur it is!
[6] DHES might be making a point that I am sympathetic to: that if one is interested in how Kultur arose, then the fact that Gs are recursively hierarchical is less important than that they deploy symbols closely semantically tied to the 4Fs. In other words, for communicative purposes, displacement might be critical, and for Kultur, communication might be. This does not eliminate the GG question concerning recursive hierarchy, but it suggests that recursive hierarchy is not the sine qua non of Kultur. I don’t know if this is right, but it might be for all I know. Like I’ve said before, a simple ‘N V N’ template with 25,000 Ns and 15,000 Vs allows you to say a hell of a lot of things, maybe enough to sail the oceans and leave behind fancy artifacts.

Tuesday, February 27, 2018

Universals; structural and substantive

Linguistic theory has a curious asymmetry, at least in syntax.  Let me explain.

Aspects distinguished two kinds of universals, structural vs substantive.  Examples of the former are commonplace: the Subjacency Principle, Principles of Binding, Cross Over effects, X’ theory with its heads, complements and specifiers; these are all structural notions that describe (and delimit) how Gs function. We have discovered a whole bunch of structural universals (and their attendant “effects”) over the last 60 years, and they form part of the very rich legacy of the GG research program. 

In contrast to all that we have learned about the structural requirements of G dependencies, we have, IMO, learned a lot less about the syntactic substances: What is a possible feature? What is a possible category? In the early days of GG it was taken for granted that syntax, like phonology, would choose its primitives (atomic elements) from a finite set of options. Binary feature theories based on the V/N distinction allowed for the familiar four basic substantive primitive categories A, N, V, and P. Functional categories were more recalcitrant to systematization, but if asked, I think it is fair to say that many a GGer could be found assuming that functional categories form a compact set from which different languages choose different options. Moreover, if one buys into the Borer-Chomsky thesis (viz. that variation lives in differences in the (functional) lexicon) and one adds a dash of GB thinking (where it is assumed that there is only a finite range of possible variation), one arrives at the conclusion that there are a finite number of functional categories that Gs choose from and that determine the (finite) range of possible variation witnessed across Gs. This, if I understand things (which I probably don’t (recall I got into syntax from philosophy not linguistics and so never took a phonology or morphology course)), is a pretty standard assumption within phonology tracing back (at least) to Sound Patterns. And it is also a pretty conventional assumption within syntax, though the number of substantive universals we find pales in comparison to the structural universals we have discovered. Indeed, were I inclined to be provocative (not something I am inclined to be, as you all know), I would say that we have very few echt substantive universals (theories of possible/impossible categories/features) when compared to the many, many plausible structural universals we have discovered.
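
The binary feature theory mentioned here can be spelled out concretely. The sketch below just enumerates the standard GB-era [±N, ±V] cross-classification that yields the four basic lexical categories (N = [+N, −V], V = [−N, +V], A = [+N, +V], P = [−N, −V]); the Python rendering itself is only illustrative.

```python
from itertools import product

# Standard GB-era cross-classification of the four lexical categories
# by the binary features [±N] and [±V]. The feature assignments are the
# textbook ones; the dictionary encoding is my own illustrative choice.
CATEGORY = {
    (True, False): "N",   # [+N, -V]  nouns
    (False, True): "V",   # [-N, +V]  verbs
    (True, True): "A",    # [+N, +V]  adjectives
    (False, False): "P",  # [-N, -V]  prepositions (adpositions)
}

def categories_from_features():
    """Enumerate every combination of the two binary features; exactly four result."""
    return sorted(CATEGORY[(n, v)] for n, v in product([True, False], repeat=2))
```

The point of writing it this way is that the inventory is exhausted by the feature combinations: two binary features yield exactly four categories, which is the sense in which the primitives are drawn from a finite, predetermined set.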

Actually one could go further, so I will. One of the major ambitions (IMO, achievements) of theoretical syntax has been the elimination of constructions as fundamental primitives. This, not surprisingly, has devalued the UG relevance of particular features (e.g. A’ features like topic, WH, or focus), the idea being that dependencies have the properties they do, not in virtue of the expressions that head the constructions, but because of the dependencies that they instantiate. Criterial agreement is useful descriptively but pretty idle in explanatory terms. Structure rather than substance is grammatically key. In other words, the general picture that emerged from GB and more recent minimalist theory is that G dependencies have the properties they have because of the dependencies they realize rather than the elements that enter into these dependencies.[1]

Why do I mention this? Because of a recent blog post by Martin Haspelmath (here, henceforth MH) that Terje Lohndal sent me. The post argues that to date linguists have failed to provide a convincing set of atomic “building blocks” on the basis of which Gs work their magic. MH disputes the following claim: that “categories and features are natural kinds, i.e. aspects of the innate language faculty” and that they form “a “toolbox” of categories that languages may use” (2-3). MH claims that there are few substantive proposals in syntax (as opposed to phonology) for such a comprehensive inventory of primitives. Moreover, MH suggests that this is not the main problem with the idea. What is? Here is MH (3-4):

To my mind, a more serious problem than the lack of comprehensive proposals is that linguistics has no clear criteria for assessing whether a feature should be assumed to be a natural kind (=part of the innate language faculty).

The typical linguistics paper considers a narrow range of phenomena from a small number of languages (often just a single language) and provides an elegant account of the phenomena, making use of some previously proposed general architectures, mechanisms and categories. It could be hoped that this method will eventually lead to convergent results…but I do not see much evidence for this over the last 50 years. 

And this failure is principled, MH argues, relying as it does on claims “that cannot be falsified.”

Despite the invocation of that bugbear “falsification,”[2] I found the whole discussion disconcertingly convincing, and believe me when I tell you that I did not expect this. MH and I do not share a common vision of what linguistics is all about. I am a big fan of the idea that FL is richly structured and contains at least some linguistically proprietary information. MH leans towards the idea that there is no FL and that whatever generalizations there might be across Gs are of the Greenberg variety.

Need I also add that whereas I love and prize Chomsky Universals, MH has little time for them and considers the cataloguing and explanation of Greenberg Universals to be the major problem on the linguist’s research agenda, universals that are best seen as tendencies and contrasts explicable “through functional adaptation.” For MH these can be traced to cognitively general biases of the Greenberg/Zipf variety. In sum, MH denies that natural languages have joints that a theory is supposed to cut or that there are “innate “natural kinds”” that give us “language-particular categories” (8-9).

So you can see my dilemma. Or maybe you don’t so let me elaborate.

I think that MH is entirely incorrect in his view of universals, but the arguments that I would present rely on examples that are best bundled under the heading “structural universals.” The arguments that I generally present for something like a domain specific UG involve structural conditions on well-formedness like those found in the theories of Subjacency, the ECP, Binding theory, etc. The arguments I favor (which I think are strongest) involve PoS reasoning and insist that bridging the gap between the PLD and the competence attained by speakers of a given G, the gap that examples in these domains illustrate, requires domain specific knowledge of a certain kind.[3]
And all of these forms of argument lose traction when the issue involves features, categories and their innate status. How so?

First, unlike with the standard structural universals, I find it hard to identify the gap between impoverished input and expansive competence that is characteristic of PoS arguments. The PLD is not chock full of “corrected” subjacency violations (aka island effects) to guide the LAD in distinguishing long kosher movements from trayf ones. Thus the fact that native speakers respect islands cannot be traced to the informative nature of the PLD but rather to the structure of FL. As noted in the previous post (here), this kind of gap is where PoS reasoning lives, and it is what licenses (IMO, the strongest) claims to innate knowledge. However, so far as I can tell, this gap does not obviously exist (or is not as easy to demonstrate) when it comes to supposing that such and such a feature or category is part of the basic atomic inventory of a G. Features are (often) too specific and variable, combining under a common logo various properties that seem to have little to do with one another. This is most obvious for phi-features like gender and number, but it even extends to categories like V and A and N, where what belongs where is often squishy within a G and especially so across Gs. This is not to suggest that within a given G the categories might not make useful distinctions. However, it is not clear how well these distinctions travel among Gs. What makes for a V or N in one G might not be very useful in identifying these categories in another. Like I said at the outset, I am no expert in these matters, but the impression I have come away with after hearing them discussed is that the criteria for identifying features within and across languages are not particularly sharp and there is quite a bit of cross-G variation. If this is so, then the particular properties that coagulate around a given feature within a given G must be acquired via experience with that particular feature in that particular G.
And if this is so, then these features differ quite a bit in their epistemological status from the structural universals that PoS arguments most effectively deploy. Thus, not only does the learner have to learn which features his G exploits, but s/he even has to learn which particular properties these features make reference to, and this makes them poor fodder for the PoS mill.

Second, our theoretical understanding of features and categories is much poorer than our understanding of structural universals. So, for example, islands are no longer basic “things” in modern theory. They are the visible byproducts of deeper principles (e.g. Subjacency). From the little I can tell, this is less so for features/categories. I mentioned the feature theory underlying the substantive N, V, A, P categories (though I believe that this theory is not that well regarded anymore). However, this theory, even if correct, is very marginal nowadays within syntax. The atoms that do the syntactic heavy lifting are the functional ones, and for these we have no good theoretical unification (at least so far as I am aware). Currently, we have the functional features we have, and there is no obvious theoretical constraint on postulating more whenever the urge arises. Indeed, so far as I can tell, there is no theoretical (and often, practical) upper bound on the number of possible primitive features, and from where I sit many are postulated in an ad hoc fashion to grab a recalcitrant data point. In other words, unlike what we find with the standard bevy of structural universals, there is no obvious explanatory cost to expanding the descriptive range of the primitives, and this is too bad, for it bleaches featural accounts of their potential explanatory oomph.

This, I take it, is largely what MH is criticizing, and if it is, I think I am in agreement (or more precisely, his survey of things matches my own). Where we part company is what this means. For me this means that these issues will tell us relatively little about FL and so fall outside the main object of linguistic study. For MH, this means that linguistics will shed little light on FL as there is nothing FLish about what linguistics studies. Given what I said above, we can, of course, both be right given that we are largely agreeing: if MH’s description of the study of substantive universals is correct, then the best we might be able to do is Greenberg, and Greenberg will tell us relatively little about the structure of FL. If that is the argument, I can tag along quite a long way towards MH’s conclusion. Of course, this leaves me secure in my conclusion that what we know about structural universals argues the opposite (viz. a need for linguistically specific innate structures able to bridge the easily detectable PoS gaps).

That said, let me add three caveats.

First, there is at least one apparent substantive universal that I think creates serious PoS problems: the Universal Base Hypothesis (UBH). Cinque’s work falls under this rubric as well, but the one I am thinking about is the following. All Gs are organized into three onion-like layers, what Kleanthes Grohmann has elegantly dubbed “prolific domains” (see his thesis). Thus we find a thematic layer embedded in an agreement/case layer embedded in an A’/left periphery layer. I know of no decent argument against this kind of G organization. And if this is true, it raises the question of why it is true. I do not see that the class of dependencies that we find would significantly change if the onion were inversely layered (see here for some discussion). So why is it layered as it is? Note that this is more abstract than your typical Greenberg universal, as it is not a fact about the surface form of the string but about the underlying hierarchical structure of the “base” phrase marker. In modern parlance, it is a fact about the selection features of the relevant functional heads (i.e. about the features (aka substance) of the primitive atoms). It does not correspond to any fact about surface order, yet it seems to be true. If it is, and I have described it correctly, then we have an interesting PoS puzzle on our hands, one that deals with the organization of Gs and likely traces back to the structure of FL/UG. I mention this because, unlike many of the Greenberg universals, there is no obvious way of establishing this fact about Gs from their surface properties, and hence explaining why this onion-like structure exists is likely to tell us a lot about FL.

Second, it is quite possible that many Greenberg universals rest on innate foundations. This is the message I take away from the work by Culbertson & Adger (see here for some discussion). They show that some orders within nominals relating Demonstratives, Adjectives, Numerals and head Nouns are very hard to acquire in an artificial G setting. They use this to argue that the absence of these orders as Greenberg options has a basis in how such structures are learned. It is not entirely clear that this learning bias is FL internal (it regards relating linear and hierarchical order), but it might be. At any rate, I don’t want anything I said above to preclude the possibility that some surface universals might reflect features of FL (i.e. be based on Chomsky Universals), and if they do, it suggests that explaining (some) Greenberg universals might shed some light on the structure of FL.

Third, though we don’t have many good theories of features or functional heads, a lazy perusal of the facts suggests that not just anything can be a G feature or a G head. We find phi-features all over the place. Among the phi-features we find that person, number and gender are ubiquitous. But if anything goes, why don’t we find more obviously communicatively and biologically useful features (e.g. the +/- edible feature, or the +/- predator feature, or the +/- ready-for-sex feature, or…)? We could imagine all sorts of biologically or communicatively useful features that it would be nice for language to express structurally that we just do not find. And the ones that we do find seem, from a communicative or biological point of view, to often be idle (gender (and, IMO, case) being the poster child for this). This suggests that whatever underlies the selection of the features we tend to see (again and again) and those that we never see is more principled than anything goes. And if that is correct, what basis could there be for this other than some linguistically innate proclivity to press these features, as opposed to those, into linguistic service? Confession: I do not take this argument to be very strong, but it seems obvious that the range of features we find in Gs that do grammatical service is pretty small, and it is fair to ask why this is so and why many other conceivable features that we can imagine would be useful are nonetheless absent.

Let me reiterate a point about my shortcomings I made at the outset. I really don’t know much about features/categories and their uniform and variable properties. It is entirely possible that I have underestimated what GG currently knows about these matters. If so, I trust the comments section will set things straight. Until that happens, however, from where I sit I think that MH has a point concerning how features and categories operate theoretically and that this is worrisome. That we draw opposite conclusions from these observations is of less moment than that we evaluate the current state of play in roughly the same way.

[1] This is the main theme of On Wh Movement and I believe what drives the unification behind Merge based accounts of FL.
[2] Falsification is not a particularly good criterion of scientific adequacy, as I’ve argued many times before. It is usually used to cudgel positions one dislikes rather than to push understanding forward. That said, in MH the invocation of the F word plays little more than an ornamental role; there are serious criticisms that come into play.
[3] I abstract here from minimalist considerations, which try to delimit the domain specificity of the requisite assumptions. As you all know, I tend to think that we can reduce much of GB to minimalist principles. To the degree that this hope is not in vain, the domain specificity can be circumscribed to whatever it is that minimalism needs to unify the apparently very different principles of GB and the generalizations that follow from them.