Faculty of Language: Evans-Levinson: the sound and the fury

Monday, May 20, 2013

I confess that I did not read the Evans and Levinson article (EL) (here) when it first came out. Indeed, I didn’t read it until last week. As you might guess, I was not particularly impressed, though not necessarily for the reason you might think. What struck me most is the crudity of the arguments aimed at the Generative Program, something that the (reasonable) commentators (e.g. Baker, Freidin, Pinker and Jackendoff, Harbour, Nevins, Pesetsky, Rizzi, Smolensky and Dupoux a.o.) zeroed in on pretty quickly. The crudity is a reflection, I believe, of a deep-seated empiricism, one that is wedded to a rather superficial understanding of what constitutes a possible “universal.” Let me elaborate.

EL adumbrates several conceptions of universal, all of which the paper intends to discredit. EL distinguishes substantive universals from structural universals and subdivides the latter into Chomsky vs Greenberg formal universals. The paper’s mode of argument is to provide evidence against a variety of claims to universality by citing data from a wide variety of languages, data that, EL appears to believe, demonstrate the obvious inadequacy of contemporary proposals. I have no expertise in typology, nor am I philologically adept. However, I am pretty sure that most of what EL discusses cannot, as it stands, touch many of the central claims made by Generative Grammarians of the Chomskyan stripe. To make this case, I will have to back up a bit and then talk on far too long. Sorry, but another long post. Forewarned, let’s begin by asking a question.

What are Generative Universals (GUs) about? They are intended to be, in the first instance, descriptions of the properties of the Faculty of Language (FL). FL names whatever it is that humans have as biological endowment that allows for the obvious human facility for language. It is reasonable to assume that FL is both species and domain specific. The species specificity arises from the trivial observation that nothing does language like humans do (you know: fish swim, birds fly, humans speak!). The domain specificity is a natural conclusion from the fact that this facility arises in all humans pretty much in the same way independent of other cognitive attributes (i.e. both the musical and the tone deaf, both the hearing impaired and the sharp eared, both the mathematically talented and the innumerate develop language in essentially the same way). A natural conclusion from this is that humans have some special features that other animals don’t as regards language and that human brains have language specific “circuits” on which this talent rests. Note, this is a weak claim: there is something different about human minds/brains on which linguistic capacity supervenes. This can be true even if lots and lots of our linguistic facility exploits the very same capacities that underlie other forms of cognition.

So there is something special about human minds/brains as regards language and Universals are intended to be descriptions of the powers that underlie this facility; both the powers of FL that are part of general cognition and those unique to linguistic competence. Generativists have proposed elaborating the fine structure of this truism by investigating the features of various natural languages and, by considering their properties, adumbrating the structure of the proposed powers. How has this been done? Here again are several trivial observations with interesting consequences.

First, individual languages have systematic properties. It is never the case that, within a given language, anything goes. In other words, languages are rule governed. We call the rules that govern the patterns within a language a grammar. For generativists, these grammars and their properties are the windows into the structure of FL/UG. The hunch is that by studying the properties of individual grammars, we can learn about the faculty that manufactures grammars. Thus, for a generativist, the grammar is the relevant unit of linguistic analysis. This is important. For grammars are NOT surface patterns. The observables linguists have tended to truck in relate to patterns in the data. But these are merely way stations to the data of interest: the grammars that generate these patterns. To talk about FL/UG one needs to study Gs. But Gs are themselves inferred from the linguistic patterns that Gs generate, which are themselves inferred from the natural or solicited bits of linguistic production that linguists bug their friends and collaborators to cough up. So, to investigate FL/UG you need Gs, and Gs should not be confused with their products/outputs, only some of which are actually perceived (or perceivable).

Second, as any child can learn any natural language, we are entitled to infer from the intricacies of any given language that FL/UG has powers capable of dealing with such intricacies. In other words, the fact that a given language does NOT express property P does not entail that FL/UG is not sensitive to P. Why? Because a description of FL/UG is not an account of any given language/G but an account of linguistic capacity in general. This is why one can learn about the FL/UG of an English speaker by investigating the grammar of a Japanese speaker and the FL/UG of both by investigating the grammar of a Hungarian, or Swahili, or Slave speaker. Variation among different grammars is perfectly compatible with invariance in FL/UG, as was recognized from the earliest days of Generative Grammar. Indeed, this was the initial puzzle: find the invariance behind the superficial difference!

Third, given that some languages display the signature properties of recursive rule systems (systems that can take their outputs as inputs), it must be the case that FL/UG is capable of concocting grammars that have this property. Thus, whatever G an individual actually has, that individual’s FL/UG is capable of producing a recursive G. Why? Because that individual could have acquired a recursive G even if that individual’s actual G does not display the signature properties of recursion. What are these signature properties? The usual: unboundedly large and deep grammatical structures (i.e. sentences of unbounded size). If a given language appears to have no upper bound on the size of its sentences, then it's a sure bet that the G that generates the structures of that language is recursive in the sense of allowing structures of type A as parts of structures of type A. This, in general, will suffice to generate unboundedly big and deep structures. Examples of this type of recursion include conjunction, conditionals, embedding of clauses as complements of propositional attitude verbs, relative clauses etc. The reason that linguists have studied these kinds of configurations is precisely because they are products of grammars with this interesting property, a property that seems unique to the products of FL/UG, and hence capable of potentially telling us a lot about the characteristics of FL/UG.
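For readers who like to see the idea run, here is a minimal sketch of what "structures of type A as parts of structures of type A" amounts to. Everything in it is an assumption made for illustration: the toy rule, the three-name vocabulary and the function name `clause` are hypothetical and are not drawn from EL or from any actual grammatical analysis.

```python
def clause(depth: int) -> str:
    """Build a clause with `depth` levels of embedding.

    Hypothetical toy rule:  S -> NP 'left'  |  NP 'thinks that' S
    The rule is recursive because an S can contain another S.
    """
    subjects = ["Mary", "John", "Sue"]  # illustrative vocabulary only
    subject = subjects[depth % len(subjects)]
    if depth == 0:
        # Base case: a simple clause with no embedding.
        return f"{subject} left"
    # Recursive case: embed a smaller clause as the complement of 'thinks that'.
    return f"{subject} thinks that {clause(depth - 1)}"


for d in range(3):
    print(clause(d))
```

Nothing in the rule itself sets a ceiling on `depth`; only practical limits (here, the interpreter's recursion limit) do. That is the sense in which a single recursive rule suffices for unboundedly large structures.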

Before proceeding, it is worth noting that the absence of these noted signature properties in a given language L does not imply that a grammar of L is not basically recursive. Sadly, EL seems to leap to this conclusion (443). Imagine that for some reason a given G puts a bound of 2 levels of embedding on any structure in L. Say it does this by placing a filter (perhaps a morphological one) on more complex constructions. Question: what is the correct description of the grammar of L? Well, one answer is that it does not involve recursive rules for, after all, it does not allow unbounded embedding (by supposition). However, another perfectly possible answer is that it allows exactly the same kinds of embedding that English does modulo this language specific filter. In that case the grammar will look largely like the ones that we find in languages like English that allow unbounded embedding, but with the additional filter. There is no reason just from observing that unbounded embedding is forbidden to conclude that this hypothetical language L (aka Kayardild or Piraha) has a grammar different in kind from the grammars we attribute to English, French, Hungarian, Japanese etc. speakers. In fact, there is reason to think that the Gs that speakers of this hypothetical language have do in fact look just like English etc. The reason is that FL/UG is built to construct these kinds of grammars and so would find it natural to do so here as well. Of course L would seem to have an added (arbitrary) filter on the embedding structures, but otherwise the G would look the same as the G of more familiar languages.
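The second answer can be sketched in a few lines. This is only a toy under stated assumptions: the names `embed` and `surface_forms`, the `max_depth` parameter standing in for the language specific filter, and the example sentences are all hypothetical, and the sketch makes no claim about how any real grammar (Kayardild, Piraha or English) is actually organized.

```python
from typing import List, Optional


def embed(depth: int) -> str:
    """Recursive rule shared by both hypothetical languages."""
    if depth == 0:
        return "it rained"
    return f"Kim said that {embed(depth - 1)}"


def surface_forms(max_depth: Optional[int], probe: int = 5) -> List[str]:
    """Sentences of depth 0..probe that survive the language specific filter.

    max_depth=None models a language with no filter;
    max_depth=2 models the hypothetical L with a cap of 2 levels.
    """
    return [
        embed(d)
        for d in range(probe + 1)
        if max_depth is None or d <= max_depth
    ]


unfiltered = surface_forms(max_depth=None)  # English-like: no cap
filtered = surface_forms(max_depth=2)       # L-like: same rule, plus a filter

print(len(unfiltered), len(filtered))
```

Both languages use the very same `embed` rule; the observable difference comes entirely from the filter. Looking only at `filtered`, one could not tell whether the underlying rule is recursive or not, which is the point of the paragraph above.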

An analogy might help. I’ve rented cars that have governors on the accelerators that cap speed at 65 mph. The same car without the governor can go far above 90 mph. Question: do the two cars have the same engine? You might answer “no” because of the significant difference in upper limit speeds. Of course, in this case, we know that the answer is “yes”: the two cars work in virtually identical ways, have the very same structures but for the governor that prevents the full velocity potential of the rented car from being expressed. So, the conclusion that the two cars have fundamentally different engines would be clearly incorrect. Ok: swap Gs for engines and my point is made. Let me repeat it: the point is not that the Gs/engines could not be different in kind, the point is that simple observation of the differences does not license the conclusion that they are (viz. you are not licensed to conclude that they are just finite state devices because they don’t display the signature features of unbounded recursion, as EL seems to). And, given what we know about Gs and engines, the burden of proof is on those who infer deep structural differences from such surface differences. The argument to the contrary can be made, but simple observations about surface properties just don’t cut it.

Fourth, there are at least two ways to sneak up on properties of UG: (i) collect a bunch of Gs and see what they have in common (what features do all the Gs display) and (ii) study one or two Gs in great detail and see if their properties could be acquired from input data. If any could not be, then these are excellent candidate basic features of FL/UG. The latter, of course, is the province of the POS argument. Now, note that as a matter of logic the fact that some G fails to have some property P can in principle falsify a claim like (i) but not one like (ii). Why? Because (i) is the claim that every G has P, while (ii) is the claim that if G has P then P is the consequence of G being the product of FL/UG. Absence of P is a problem for claims like (i) but, as a matter of logic, not for claims like (ii) (recall, “if P then Q” is true if P is false). Unfortunately, EL seems drawn to the conclusion that P→Q is falsified if ¬P is true. This is an inference that other papers (e.g. Everett’s Piraha work) are also attracted to. However, it is a non-sequitur.
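The logic can be checked mechanically. The following is a minimal sketch, assuming a made-up pair of grammars represented as dictionaries; the keys `has_P` and `P_from_UG` and the two function names are hypothetical labels for "G has property P" and "P in G is a consequence of FL/UG".

```python
def universal_claim(grammars) -> bool:
    """Claim (i): every G has P."""
    return all(g["has_P"] for g in grammars)


def conditional_claim(grammars) -> bool:
    """Claim (ii): for every G, if G has P then P stems from FL/UG.

    A material conditional P -> Q is equivalent to (not P) or Q.
    """
    return all((not g["has_P"]) or g["P_from_UG"] for g in grammars)


# Hypothetical sample: G2 simply lacks P.
grammars = [
    {"name": "G1", "has_P": True, "P_from_UG": True},
    {"name": "G2", "has_P": False, "P_from_UG": False},
]

print(universal_claim(grammars))    # False: G2 is a counterexample to (i)
print(conditional_claim(grammars))  # True: G2 satisfies (ii) vacuously
```

The only kind of grammar that could falsify (ii) is one that has P where P does not stem from FL/UG. A grammar that merely lacks P leaves the conditional untouched.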

EL recognizes that arguing from the absence of some property P to the absence of Pish features in UG does not hold. But the paper clearly wants to reach this conclusion nonetheless. Rather than denying the logic, EL asserts that “the argument from capacity is weak” (EL’s emphasis). Why? Because EL really wants all universals to be of the (i) variety, at least if they are “core” features of FL/UG. As type (i) universals must show up in every G if they are indeed universal, the failure of one to appear in a single grammar is sufficient to call its universality into question. EL is clearly miffed that Generativists in general and Chomsky in particular would hold a nuanced position like (ii). EL seems to think that this is cheating in some way. Why might they hold this? Here’s what I think.

As I discussed extensively in another place (here), everyone who studies human linguistic facility appreciates that competent speakers of a language know more than they have been exposed to. Speakers are exposed to bits of language and from this acquire rules that generalize to novel exemplars of that language. No sane observer can dispute this. What’s up for grabs is the nature of the process of generalization. What separates empiricist from rationalist conceptions of FL/UG is the nature of these inductive processes. Empiricists analyze the relevant induction as a species of pattern recognition. There are patterns in the data and these are generalized to all novel cases. Rationalists appreciate that this is an option, but insist that there are other kinds of generalizations, those based on the architectural properties (Smolensky and Dupoux’s term) of the generative procedures that FL/UG allows. These procedures need not “resemble” the outputs they generate in any obvious way and so conceiving this as a species of pattern recognition is not useful (again, see here for more discussion). Type (ii) universals fit snugly into this second type, and so empiricists won’t like them. My own hunch is that an empiricist affinity for generalizations based on patterns in the data lies behind EL’s dissatisfaction with “capacity” arguments; they are not the sorts of properties that inspection of cases will make manifest. In other words, the dissatisfaction is generated by Empiricist sympathies and/or convictions which, from where I sit, have no defensible basis. As such, they can be and should be discounted. And in a rational world they would be. Alas…

Before ending, let me note that I have been far too generous to the EL paper in one respect. I said at the outset that its arguments are crude. How so? Well, I have framed the paper’s main point as a question about the nature of Gs. However, most of the discussion is framed not in terms of the properties of the Gs they survey but in terms of surface forms that Gs might generate. Their discussion of constituency provides a nice example (441). They note that some languages display free word order and conclude from this that they lack constituents. However, surface word order facts cannot possibly provide evidence for this kind of conclusion; they can only tell us about surface forms. It is consistent with this that elements that are no longer constituents on the surface were constituents earlier on and were then separated, or will become constituents later on, say on the mapping to logical form. Indeed, in one sense of the term constituent, EL insists that discontinuous expressions are such, for they form units of interpretation and agreement. The mere fact that elements are discontinuous on the surface tells us nothing about whether they form constituents at other levels. I would not mention this were it not the classical position within Generative Grammar for the last 60 years. Surface syntax is not the arbiter of constituency, at least if one has a theory of levels, as virtually every theory that sees grammars as rules that relate meaning with sounds assumes (EL assumes this too). There is nary a grammatical structure in EL and this is what I meant by my being overgenerous. The discussion above is couched in terms of Gs and their features. In contrast, most of the examples in EL are not about Gs at all, but about word strings. However, as noted at the outset, the data relevant to FL/UG are Gs and the absence of Gish examples in EL makes most of EL’s cited data irrelevant to Generative conceptions of FL/UG.

Again, I suspect that the swapping of string data for G data simply betrays a deep empiricism, one that sees grammars as regularities over strings (string patterns) and FL/UG as higher order regularities over Gs. Patterns within patterns within patterns. Generativists have long given up on this myopic view of what can be in FL/UG. EL does not take the Generative Program on its own terms and show that it fails. It outlines a program that Generativists don’t adopt and then shows that it fails by standards that Generativists have always rejected, using data that is nugatory.

I end here: there are many other criticisms worth making about the details, and many of the commentators on the EL piece, who are better placed than I am to make them, do so. However, to my mind, the real difficulty with EL is not at the level of detail. EL’s main point as regards FL/UG is not wrong, it is simply beside the point. A lot of sound and fury signifying nothing.

57 comments:

Tim Hunter, May 20, 2013 at 2:16 PM
I think the connection between the "empiricist affinity for generalizations based on patterns in the data" and "dissatisfaction with 'capacity' arguments" is even more direct than you suggest. When we talk about patterns that are "in the data", what we actually mean (I think) is patterns that can be recognised without reference to any domain-specific concepts, i.e. patterns that you can recognise by taking notice of things like the linear ordering of words, and do not require bringing in notions like c-command, binding domains, bounding nodes, etc. As is (very) frequently pointed out, the only thing that's up for grabs is the kind of generalisation the learner makes, not whether generalisations are made; so in a relevant sense there is no such thing as "patterns in the data". To the extent that the phrase means something, I think it means patterns recognisable without domain-specific machinery, and if the "dissatisfaction with 'capacity' arguments" actually concerns domain-specific capacity (which I guess it does), then these two things are obviously very closely connected.

(I made a similar comment on the post about picking up patterns in decimal expansions that you linked to.)
Unknown, May 20, 2013 at 2:28 PM
Apologies to Tim: I do not mean to ignore you, but I had finished writing this comment before yours appeared, and I truly hope that for once Norbert will answer the questions I ask at the end...

After reading E&L Norbert is struck by “the crudity of the arguments aimed at the Generative Program”. This is a pretty serious accusation but not backed with any citation of such crude argument. And for several of Norbert’s own arguments ‘crude’ would be a compliment. He writes: “humans have some special features that other animals don’t as regards language and that human brains have language specific “circuits” on which this talent rests”. It would be nice to be told what is the difference between ‘special features’ and ‘specific “circuits”’, or how this imprecise statement relates to anything E&L [or other empiricists] claim. Presumably somewhere in these ‘special features’ lurk the grammars, the relevant units of linguistic analysis. Of course this analysis is indirect via perceivable data and their surface pattern. Here one would expect sophisticated examples of such a backtracking analysis: surface data [SD] → G → UG. Alas, no such luck.

Similarly unsupported claims continue: “one can learn about the FL/UG of an English speaker by investigating the grammar of a Japanese speaker and the FL/UG of both by investigating the grammar of a Hungarian, or Swahili, or Slave speaker”. This may be true, but why not provide even a single example of how the investigation of Hungarian reveals a property of English grammar that could not have been more easily discovered by studying English? Would this drag us down to the crude level of E&L, who provided example after example in support of their arguments?

The predictable dismissal of the arguments from Piraha reveals an even higher level of crudeness. Musings about artificial filters and irrelevant analogies to “cars that have governors on the accelerators” confirm [i] that NO empirical finding could persuade Norbert that his theory might be wrong and [ii] that he never read the arguments made in Everett [2012]. Otherwise he would know that the claim is not [to stay within the analogy] that the Piraha ‘car’ cannot go 90 mph but that it achieves this speed in another way than Norbert’s rental car [Piraha language does not have sentence recursion but discursive recursion – for short description see http://ling.auf.net/lingbuzz/001696]. And for this reason we could not learn about this peculiarity of Piraha by studying say German. Or, to turn the argument against Norbert for a moment: if we believe that we can learn about genuine universals from studying JUST Piraha, we could conclude [entirely wrongly] that sentence recursion is not a property of human language and that those exotic languages [like English] that have sentence recursion have an added epicycle to the ‘normal UG’.

Ignorance of actual arguments is also revealed in the attack on unnamed empiricists. Norbert claims that unlike Empiricists “Rationalists … insist that there are other kinds of generalizations, those based on the architectural properties”. Had he ever read recent works of Elman or Tomasello, he would know that these ‘arch empiricists’ do not deny built-in architectural constraints. But why bother with such a crude activity as actually reading the arguments when one can rely on hunches?

The final point deserves attention: “EL does not take the Generative Program on its own terms and show that it fails. It outlines a program that Generativists don’t adopt”. Maybe for once Norbert can assume that opponents are not intentionally misrepresenting or attacking straw men but simply do not understand which of the many claims that have been made under the umbrella of 'generativism' are binding at the moment. In other words, can Norbert possibly outline in specific detail WHAT the current program of generativists is? What ARE the ‘specific brain circuits’ that allow humans and humans alone to have language?
Alex Clark, May 21, 2013 at 1:23 AM
Norbert's claim seems to be this: that though we observe the strings, and (partially) observe their interpretations, we are primarily interested in the grammars, which are not observed. (This part is fine).

Therefore, an argument based on the strings alone can't refute a theory about the grammars.

But that seems a step too far. Because that would make the theory impossible to refute, since we don't have direct evidence about the grammars, but only (at the moment) via the word strings and associated interpretations.
So take some putative universal grammatical property -- say binary branching. What sort of evidence would refute this?

(I am not interested in defending the E and L paper per se, as I have some reservations about their methodology too).

Chris, May 21, 2013 at 8:01 AM
My current take on this stuff is that this "debate" sounds a bit like the "debate" between "frequentists" and "Bayesians", at least in the broad sense. Larry Wasserman (whose blog you should definitely read) makes the point that the difference between Bayesian statistics and frequentist statistics is a difference in goals, not a difference in techniques. If you have Bayesian goals then it does not matter whether you "identify as Bayesian" or not: you can only make valid or invalid inferences. I think there's something very similar here: asking questions about mental grammars (which I see as very similar to Bayesian notions of "beliefs") means you accept certain things like inductive bias (or, in Bayesian terms, there is no such thing as an uninformative prior: you cannot consistently pursue Bayesian goals without accepting some information in your prior, just as you can't pursue arithmetic without accepting that 1 != 0). Now, if you can claim that UG is just a prior on grammars (and I can't see how it could be otherwise), it is still reasonable to ask to what extent its biological manifestation is used by other cognitive processes.