I have consistently criticized Everett's Piraha-based argument against Chomsky's conception of Universal Grammar (UG) by noting that the conclusions only follow if one understands 'universal' in Greenbergian rather than Chomskyan terms (e.g. see here). I have recently discovered that this is
correct as far as it goes, but it does not go far enough. I have just read this
Everett post, which indicates that my
diagnosis was too hasty. There is a second part to the argument and the form is
actually one of a dilemma: either you understand Chomsky’s claims about
recursion as a design feature of UG in Greenbergian terms OR Chomsky’s position
is effectively unfalsifiable (aka: vacuous). That’s the full argument. I (very) critically discuss it in what
follows. The conclusion is that the argument not only fails to understand the logic of a Chomsky Universal (CU) conception of 'universal', it also presupposes a rather shallow Empiricist conception of science, one in which theoretical postulates are only legitimate if directly reflected in surface diagnostics. Thus, Everett's argument gains traction only if one mistakes CUs for Greenberg Universals (GUs), misunderstands what kind of evidence is relevant for testing CUs, and/or tacitly assumes that only GUs are theoretically legitimate conceptions in the context of linguistic research. In short, the argument still
fails, even more completely than I thought.
Let’s give ourselves a little running room by reviewing some
basic material. GUs are very different from CUs. How so?
GUs concern the surface distributional properties of the linguistic objects (LOs) that are the outputs of Gs. They largely focus on the string properties of these LOs. Thus, one looks for GUs by examining surface distributions cross-linguistically: for example, whether languages are consistent in their directionality parameters (if 'OP' then 'OV'), or whether there are patterns in the order, within nominals, of demonstratives, modifiers, and numerals with respect to the heads they modify (e.g. see here for some discussion).
CUs specify properties of the Faculty of Language (FL). FL
is the name given to the mental machinery (whatever its fine structure) that
outputs a grammar for L (GL) given Primary Linguistic Data from
language L (PLDL). FL has two kinds of design features: the linguistically proprietary ones (which we now call UG principles) and the domain-general ones, which are part of FL but not specific to it. GGers investigate the properties of FL by, first, investigating the properties of language-particular Gs and, second, via the Poverty of the Stimulus argument (POS). POS aims to fix the properties of FL by seeing what is needed to fill the gap between the information about the structure of Gs provided in the PLD and the actual properties that Gs have. FL has whatever
structure is required to get the Language Acquisition Device (LAD) from PLDL
to GL for any L. Why any
L? Because any kid can acquire any G when confronted with the appropriate
PLD.
Now on the face of it, GUs and CUs are very different kinds
of things. GUs refer to the surface properties of G outputs. CUs refer to the
properties of FL, which outputs Gs. CUs are ontologically more basic than GUs[1]
but GUs are less abstract than CUs and hence epistemologically more available.
Despite the difference between GUs and CUs, GGers sometimes use
string properties of the outputs of Gs to infer properties of the Gs that
generate these LOs. So, for example, in Syntactic Structures Chomsky argues that human Gs cannot be simple finite state grammars (FSGs), given the existence of sentences exhibiting non-local dependencies of the sort seen in 'If S1 then S2'. Examples like this are diagnostic of the fact that the recursive Gs native speakers can acquire must be more powerful than simple FSGs and therefore that FL cannot be limited to Gs with just FSG rules.[2] Nonetheless, though GUs might be useful in
telling you something about CUs, the two universals are conceptually very
different, and only confusion arises when they are run together.
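Returning for a second to the FSG example, here is a toy sketch (mine, not Chomsky's actual demonstration) of why such sentences matter. The rule that builds 'If S then S' sentences feeds on its own output, and each 'if' must be paired with a later 'then'; tracking unboundedly many nested pairs requires more than the fixed memory of a finite state device.

# Toy sketch: 'If S then S' sentences nest without bound. Each 'if'
# must be matched by a later 'then', like balanced brackets, so
# recognizing the pattern takes more than finite memory.
def nested_sentence(depth):
    """Build a sentence with `depth` levels of if-then nesting."""
    if depth == 0:
        return "it rains"
    inner = nested_sentence(depth - 1)
    return f"if {inner} then {inner}"

print(nested_sentence(2))
# -> if if it rains then it rains then if it rains then it rains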
All of this is old hat, and I am sorry for boring you.
However, it is worth being clear about this when considering the hot topic of
the week, recursion, and what it means in the context of GUs and CUs. Let’s
recall Everett’s dilemma. Here is part 1:
1. Chomsky claims that Merge is "a component of the faculty of language" (i.e. that it is a Universal).[3]
2. But if it is a universal then it should be part of the G of every language.
3. Piraha does not contain Merge.
4. Therefore Chomsky is wrong that Merge is a Universal.
This argument has quite a few weak spots. Let’s review them.
First, as regards the premises (1) and (2), the argument
requires assuming that if something is part of FL, a CU, then it appears in every
G that is a product of FL. For unless we assume this, it cannot be that the
absence of Merge in Piraha is inconsistent with the conclusion that it is part
of FL. But the assumption that Merge is a CU does not imply that it is a GU. It simply implies that FL can construct Gs that embody (recursive) Merge.[4] Recall, CUs describe the capacities of the LAD, not its Gish outputs. FL can have the capacity to construct Merge-containing Gs even if it can also construct Gs that are not Merge-containing. Having the capacity to do something does not entail that the capacity is always (or even ever) used. This is why arguing, as Everett does, from (3), the absence of Merge in the G of Piraha, does not tell against Merge being part of FL.
Second, what is the evidence that Merge is not part of
Piraha’s G? Everett points to the absence of “recursive structures” in Piraha
LOs (2). What are recursive structures?
I am not sure, but I can hazard a guess. But before I do so, let me note that
recursion is not properly a predicate of structures but of rules. It refers to
rules that can take their outputs as inputs. The recursive nature of Merge can
be seen from the inductive definition in (5):
5. a. If a is a lexical item, then a is a Syntactic Object (SO).
    b. If a is an SO and b is an SO, then Merge(a,b) is an SO.
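To make (5) concrete, here is a minimal sketch (my illustration only; the pair encoding is not anyone's official formalism). The point is just that Merge's outputs are legitimate inputs to Merge, which is all that 'recursive' means here.

# Minimal sketch of the inductive definition in (5).
# (5a): lexical items (here, plain strings) are SOs.
# (5b): Merge of two SOs is an SO, encoded as a pair.
def merge(a, b):
    """If a and b are SOs, Merge(a,b) is an SO."""
    return (a, b)

so = merge("Bill", "left")    # [Bill left]
so = merge("that", so)        # [that [Bill left]]
so = merge("thinks", so)      # [thinks [that [Bill left]]]
so = merge("John", so)        # [John [thinks [that [Bill left]]]]
# Nothing stops us from feeding `so` back into merge: an S within an S,
# as in example (6) below, and so on without bound.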
With (5) we can build bigger and bigger SOs, the recursive
“trick” residing in the inductive step (5b). So rules can be recursive.
Structures, however, not so much. Why? Well they don’t get bigger and bigger.
They are what they are. However, GGers standardly illustrate the fact of
recursion in an L by pointing to certain kinds of structures and these kinds
have come to be fairly faithful diagnostics of a recursive operation underlying
the illustrative structures. Here are two examples, both of which have a phrase of type A embedded in another one of type A.
6. An S within an S: e.g. John thinks that Bill left, in which the sentence Bill left is contained within the larger sentence John thinks that Bill left.
7. A nominal within a nominal: e.g. John saw a picture of a picture, where the nominal a picture is contained within the larger nominal a picture of a picture.
Everett follows convention and assumes that structures of
this sort are diagnostic of recursive
rules. We might call them reliable witnesses (RW) for recursive rules. Thus
(6)/(7) are RWs for the claim that the rule for S/nominal “expansion” can apply
repeatedly (without bound) to their outputs.
Let’s say this is right. It does not imply that the absence
of RWs implies the absence of recursive rules. As it is often said: the absence
of evidence is not evidence of absence. Merge may be applying even though we can find no RWs diagnostic of this
fact in Piraha.
Moreover, Everett’s post notes this. As it says: “…the superficial appearance of lacking recursion does not mean that the language could not be derived from a recursive process like Merge. And this is correct” (2-3). Yes it is. Merge is sufficient to generate the structures of Piraha. So, given this, how can we know that Piraha does not employ a Merge-like operation?
So far as I can tell, the argument that it doesn’t is based
on the assumption that unless one has
RWs for some property X one cannot assume that X is a characteristic of G. So
absent visible “recursive structures” we cannot assume that Merge obtains in
the G that generates these structures. Why?
Because Merge is capable of
generating unboundedly big (long and deep) structures, and we have no RWs
indicating that the rule is being recursively applied. But, and this I really don’t get: the fact that Merge could be used to generate “recursive structures” does not imply that in any given G it must so apply. So how exactly does the absence of RWs for recursive rule application in Piraha (note: I am here tentatively conceding, for the sake of argument, that Everett’s factual claims might be right, though they are likely wrong) show that Merge is not part of a Piraha G? Maybe Piraha Gs can generate unboundedly large phrases but then apply some filters to the outputs to limit what surfaces overtly (this in fact appears to be Everett’s analysis).[5]
In this sort of scenario, the Merge rule is recursive and can generate unboundedly large SOs, but the interfaces (to use minimalist jargon) filter these out, preventing the generated structures from converging. On this scenario, Piraha Gs are like English or Brazilian Portuguese or … Gs, but for the filters.
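Here is a toy sketch of this generate-then-filter logic (my own illustration, not Everett's or anyone's actual analysis of Piraha): the generating rule below is recursive, yet a filter on its outputs leaves a surface corpus with no RWs.

# Toy model of "recursive rule + output filter": the rule can embed an
# S inside an S, but a filter on the outputs hides that fact.
def generate(depth):
    """A recursive rule: an S can contain an S."""
    if depth == 0:
        return "Bill left"
    return f"John said that {generate(depth - 1)}"

def converges(depth, max_depth=0):
    """A toy interface 'filter': only sufficiently shallow outputs surface."""
    return depth <= max_depth

surface_corpus = [generate(d) for d in range(5) if converges(d)]
# ['Bill left'] -- no embedding is visible on the surface (no RWs),
# even though the generating rule is recursive.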
Now, I am not saying that this is correct. I really don’t know, and I leave the relevant discussions to those who do.[6] But it seems reasonable, and if this is indeed the right G analysis for Piraha, then it too contains a recursive rule (aka Merge), though because of the filters it does not generate RWs (i.e. the whole Piraha G does not output “recursive structures”).
Everett’s post rejects this kind of retort. Why? Because
such “universals cannot be seen, except by the appropriate theoretician” (4). In
other words, they are not surface visible, in contrast to GUs, which are. So,
the claim is that unless you have a
RW for a rule/operation/process you cannot postulate that rule/operation/process
exists within that G. So, absence of positive
evidence for (recursive) Merge within Piraha is evidence against (recursive)
Merge being part of Piraha G. That’s the argument. The question is why anyone should accept this principle.
More exactly, why, in the case of studying Gs, should we assume that absence of evidence is evidence of absence, rather than, for example, evidence that more than Merge is involved in yielding the surface patterns attested? This is what we do in any other domain of inquiry. The fact that planes fly does not mean that we throw out gravity. The fact that balls stop rolling does not mean that we dump inertia. The fact that there are complex living systems does not mean that entropy doesn’t exist. So why should the absence of RWs for recursive Merge in Piraha imply that Piraha does not contain Merge as an operation?
In fact, there is a good argument that it does contain Merge. It is that many, many other Gs have RWs for recursive Merge (a point that Everett accepts). So, why not assume that Piraha
Gs do too? This is surely the simplest conclusion (viz. that Piraha Gs are just
like other Gs fundamentally) if it is possible to make this assumption and
still “capture” the data that Everett notes.
The argument must be that this kind of reply is somehow illicit. What could license that conclusion? I can think of only one assumption that would: that universals just are summaries of surface patterns. If so, then the absence, in a given G, of surface patterns that are RWs for recursion means that there are no recursive rules in that G, for there is nothing to “summarize.” All G generalizations must be surface “true.” The assumption is that it is scientifically illicit to postulate some operation/principle/process whose surface effects are “hidden” by other processes. The problem, then, with a theory according to which Merge applies in Piraha but its effects are blocked (so that there are no RW-like “recursive structures” to reliably diagnose its presence) is that such an assumption is unscientific!
Note that this methodological principle, applied in the real sciences, would be considered laughable. Most of 19th-century
astronomy was dedicated to showing that gravitation regulates planetary motion
despite the fact that planets do not appear to move in accord with the inverse
square law. The assumption was made that some other mass was present and it was
responsible for the deviant appearances. That’s how we discovered Neptune (see here
for a great discussion). So unless one is a methodological dualist, there is
little reason to accept Everett’s presupposed methodological principle.
It is worth noting that an Empiricist is likely to endorse
this kind of methodological principle and be attracted to a Greenberg
conception of universals. If universals are just summaries of surface patterns
then absent the pattern there is no universal at play.
Importantly, adopting this principle runs against all of
modern GG, not just the most recent minimalist bit. Scratch any linguist and
s/he will note that you can learn a lot about language A by studying language
B. In particular, modern comparative linguistics within GG assumes this as a
basic operating principle. It is based on the idea that Gs are largely similar
and that what is hard to see within the G of language A might be pretty easy to
observe in that of language B. This, for example, is why we often conclude that
Irish complementizer agreement tells us something about how WH movement
operates in English, despite there being very little (no?) overt evidence for C-to-C movement in English. Everett’s arguments presuppose that all of this reasoning is fallacious. His is not merely an argument against Merge, but a broadside against virtually all of the cross-linguistic work within GG for the last 30+ years.
Thankfully, the argument is really bad. It either rests on a
confusion between GUs and CUs or rests on bogus (dualist) methodological
principles. Before ending, however, one more point.
Everett likes to say that a CU conception of universals is
unfalsifiable. In particular, that a CU view of universals robs these
universals of any “predictive power” (5). But this too is false. Let’s go back to Piraha.
Say you take recursion to be a property of FL. What would you conclude if you ran into speakers who spoke a language without RWs for that universal? You would conclude that they could learn a language where that universal has clear overt RWs.
So, assume (again only for the sake
of discussion) that you find a Piraha speaker sans recursive G. Assuming that
recursion is part of FL you predict
that such speakers could acquire Gs that are clearly recursive. In other words,
you would predict that Piraha kids would acquire, to take an example at random,
Brazilian Portuguese just like non-Piraha kids do. And, as we know, they do!
So, taking recursion to be a property of FL makes a prediction about the kinds
of Gs LADs can/do acquire. And these predictions seem to be correct. So, postulating CUs does have empirical consequences and it does make predictions; it’s just that it does not make predictions about whether CUs will be surface visible in every L (i.e. provide RWs in every L), and there is no good reason that they should be.
Everett complains in this post that people reject his arguments on the grounds that he confuses GUs and CUs, and he insists that this charge is incorrect (i.e. that he makes no such confusion). However, it is clear that there is lots of other confusion and lots of methodological dualism and lots of failure to recognize the kinds of “predictions” a CU-based understanding of universals does
make. Both prongs of the dilemma that the argument against CUs rests on
collapse on pretty cursory inspection. There is no there there.
Let me end with one more observation, one I have made before.
That recursion is part of FL is not
reasonably debatable. What kind of recursion there is, and how it operates is very debatable. The fact is not
debatable because it is easy to see its effects all around you. It’s what
Chomsky called linguistic productivity (LP). LP, as Chomsky has noted
repeatedly, requires that linguistic competence involve knowledge of a G with
recursive rules. Moreover, that any
child can acquire any language implies
that every child comes equipped to the language acquisition task with the
capacity to acquire a recursive G. This means that the capacity to acquire a
recursive G (i.e. to have an operation like Merge) must be part of every human FL.
This is a near truism and, as Chomsky (and many others, including moi)
have endlessly repeated, it is not
really contestable. But there is a lot that is contestable. What kind of
rules/operations do Gs contain (e.g. FSGs, PSGs, TGs, MPs?)? Are these
rules/operations linguistically proprietary (i.e. part of UG or not?)? How do
Gs interact with other cognitive systems, etc.? These are all very hard and
interesting empirical questions which
are and should be vigorously debated (and believe me, they are). The real
problem with Everett’s criticism is that it has wasted a lot of time by confusing
the trivial issues with the substantive ones. That’s the real problem with the
Piraha “debate.” It’s been a complete waste of time.
[1] By this I mean that whereas you are contingently a speaker of English and English is contingently SVO, it is biologically necessary that you are equipped with an FL. So, Norbert is only accidentally a speaker of English (and so has a GEnglish) and it is only contingently the case that English is SVO (it could have been SOV, as it once was). But it is biologically necessary that I have an FL. In this sense it is more basic.
[2] Actually, they are diagnostic on the assumption that they depict one instance of an unbounded number of sentences of the same type. If one only allows finite substitutions in the S positions then a more modest FSG can do the required work.
[3] Quote from the 2002 Science paper with Fitch and Hauser.
[4] I henceforth drop the bracketed modifier.
[5] Thanks to Alec Marantz for bringing this to my attention.
[6] This point has already been made by Nevins, Pesetsky and Rodrigues in their excellent paper.
My little addition to this is that if CUs can't be converted into probabilistic implicational GUs, they will be of very limited interest to many people, including me. But of course they can be ... from X-bar theory you can for example predict that if a member of a word class appears in some syntactic position, perhaps with a bit of extra stuff such as a determiner or a case-marker, then all the usual satellites of that class will be able to appear there also. This is usually correct, and I'm not aware of any way in which the currently popular sequence- or usage-based approaches can predict it.
But then there are the exceptions, such as prenominal possessors in German, and all adnominal possessors in Piraha, for which I think we need to be open to multiple possible forms of explanation. The avenues that occur to me now are:
a) classical Bayes/MDL (Goldsmith, Chater, Perfors & Clark)
b) better predictions (Ramscar, Dye et al)
c) simpler processing (Yang)
(b, c) strike me currently as better bets, but might be very difficult to distinguish from each other, since anything that yields better predictions of what's coming next will probably make processing simpler and vice-versa. But in all three cases, there is some kind of tradeoff between formal 'simplicity' of the grammar as quantified by an evaluation metric over a notation, and something else of a more functional nature.
Lots of people have been doing heavy foundational work relevant to this for a long time, in various ways, too many to list, although Guglielmo Cinque sort of comes to mind ATM, but one point I'd like to make in contrast to Jeff Lidz' recent posting is that I think it's important that some of this work relate to things that can be observed in corpora, rather than only ones evident from looking at intuitions about selected examples, because there is also a big population of people who just aren't impressed by that, but who might be able to be brought around to paying more attention to the intuitions if at least some of them had a clear connection to corpus-based observations.
Avery, there should be no conflict between research on corpora and research on systematic judgments. The systematic judgments are one of the outputs of the learning process (among many others), and the corpora are (approximations to) the input that leads learners to arrive at those systematic judgments. The frequent mismatches between the two are precisely what make this all so interesting.
There's a growing body of work in which people are looking closely at the relation between input corpora and systematic-but-not-obvious judgments, and I recommend it to you. Jeff and his collaborators spend a lot of their time doing exactly this nowadays, so I recommend looking at some of their recent output. Similarly, efforts like the parsed CHILDES corpus by Lisa Pearl and Jon Sprouse have made it more feasible to address these questions.
Indeed, but there's a presentational gap, which I know there are people working on, but I think it's urgent to get it filled in a way that people like Morten Christiansen, for example, can't ignore. I think small corpus->big corpus regularities are also something to look at, in addition to intuitions. So for example in CHILDES English, doubly recursive possessives (Donna's dog's name) seem to occur at a rate of between 4 and 7 per million words (there are some puzzling episodes which make it hard to really assess better than that, I think), while triply recursive ones don't occur at all, but if the corpus were bigger, they surely would, since children do say stuff like "Zack's mother's boyfriend's sister". Nobody will tell me that 7/mil words is too little for learning, but it's also interesting to note that adjectivally modified possessors (where's the little boy's house) are much more common (mid 30's/mil words in the (very few) CHILDES corpora I've looked at), so maybe they're the real trigger.
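For anyone who wants to poke at this, a rough sketch of the kind of counting I have in mind (the file name is made up, and the naive pattern overcounts contractions like "it's", so hand-checking is still needed):

import re

# Crude pattern for doubly recursive possessives ("Donna's dog's name").
# It overcounts: contractions like "it's" also end in 's, so real counts
# need tagging or hand-checking.
DOUBLE_POSS = re.compile(r"\b\w+'s \w+'s \w+\b")

def rate_per_million(text):
    words = len(text.split())
    hits = len(DOUBLE_POSS.findall(text))
    return hits * 1_000_000 / max(words, 1)

# Hypothetical usage, assuming a plain-text dump of the transcripts:
# print(rate_per_million(open("childes_english.txt").read()))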
I don't think the term "surface" helps us understand the difference between Greenbergian and Chomskyan universals, but the following goes to the heart of the matter: "Modern comparative linguistics within GG assumes as a basic operating principle that you can learn a lot about language A by studying language B". If this assumption were correct, one would think that some successes along the lines of discovering Neptune would have been made, but in fact, there are only very dim ideas about what might be in UG, despite decades of vibrant research. So I think it makes more sense to study universals without making that assumption (i.e. by strictly separating language description from comparison), in the Greenbergian fashion, and to look for explanations primarily in the domain-general aspects of FL. But in contrast to Everett and Tomasello, I don't see this as challenging the Chomskyan philosophy – we may eventually find Neptune, and if there's good independent evidence for it, I'll be happy to accept it.
I think that bit in quotation marks has been everybody's working assumption for a very long time ... without it, our understanding of most languages would be at or below the level of Aristotle's understanding of Greek. Of course you can't apply ideas derived from the study of one language to that of another without making ridiculous errors if you have no sense of when they're not working and need some adjustment. That's an important part of what Chomsky did, giving a more articulate conception than had previously been available of what it was for a grammatical framework to not be fitting the language properly (missing generalizations, producing unnecessarily long/complex grammars).
Delete"there are only very dim ideas about what might be in UG, despite decades of vibrant research."
Here we disagree. I think that we have quite a good take on what belongs in UG. We have many rather decent mid-level generalizations that seem rather well grounded. I mentioned a number of them before: anaphors cannot c-command their antecedents, movement is barred out of islands, adjuncts are especially recalcitrant to movement out of islands, reconstruction into islands is forbidden, only specifics can extract out of a wh-island, SCO and WCO effects are rather robust. These are all good mid-level generalizations. The exceptions to these have proven very fecund theoretically as well. So, we do not really agree about the state of the art. And I suspect I know the reason why. I see a generalization and then look to the details of the Gs of languages that appear to violate it. I look for a way that idiosyncrasies of that G might "save" the generalization. You see the surface form and conclude that it should not be saved. My strategy is Leverrier's: look for a local fact about the phenomenon of interest to understand how the principle is operative despite giving the appearance that it is not. You look at the appearance and conclude that the principle is false. So, we have good cases for Neptune-like discoveries (subliminal island effects, which I wrote about, are a recent great example) if only you are willing to look behind the surface appearances.
I think it is rarely emphasized that these mid-level generalizations are actually incredibly useful for work on understudied languages.
As someone who does this work, I am always struck by how easily many of the basic predictions of GG can be tested and confirmed, and hence how interesting, even potentially Neptune-like, discoveries are made.
Here's an example I'm working on now: in Moro, a language spoken in Sudan that I've worked on for about ten years, you can't form a wh-question on a relative clause modifying a normal nominal argument. Interestingly, though, there is a construction similar to Romance pseudorelatives (e.g. Ho visto Gianni che correva) where what look like relative clauses are predicated of objects. Unlike relative clauses elsewhere, pseudorelatives are completely transparent to extraction. So here is an exception to a "universal" internal to a language, and one that has never been described before, but rather than falsifying universals about islands, it raises the important question of why the structure of Moro pseudorelatives is such that they allow extraction but relatives in subject position don't. In other words, there is a gravitation-like effect (islands; in fact the parallel is particularly apropos because we aren't quite sure what causes them...), and their absence in a particular situation reveals the presence, or absence, of something that is otherwise there...
[Part 1 of 2]
Although I agree with the gist of Norbert's post I'm actually not sure that "Chomsky Universals" are the best notion for clarifying the debate. To get to that point, let me start with another point that addresses the (non-)falsifiability issue.
Maybe it would be useful to point out that there are perfectly imaginable claims about UG that would be refuted if it were true that Piraha speakers could not embed a sentence inside another sentence -- just not claims that anyone has made, as far as I know. For example, by looking at boring old English (without leaving the air-conditioning!), we can write down a couple of hypotheses about the way verbs combine with other things:
(1) a. Every verb's lexical entry encodes requirements on what a phrase that combines with that verb can be headed by.
(1) b. The verb 'hit' wants a D as the head of its sister phrase, the verb 'object' wants the word 'to' as the head of its sister phrase, the verb 'know' wants a C as the head of its sister phrase, etc.
One can easily imagine hypothesizing that the particular collection of subcategorization frames specified in (1b) is provided by FL, i.e. the learner comes equipped with the assumption that there must be some verbs that combine with DPs, some that combine with CPs, etc. This seems to be a hypothesis that would be falsified by the finding that some language didn't have sentences analogous to "I know that John left". But that's not the hypothesis that syntacticians typically make about how to parcel out the information in (1) between innate and learned: instead, the idea is basically that (1a) is provided by FL. (Of course there are other ways of parceling things out too, it doesn't have to be a split along (1a)/(1b): for example, one can imagine that FL specifies (1a) *and* that there must be some verbs that combine with DPs, but that's all.)
And it's also easy to imagine evidence against the hypothesis that FL provides (only) (1a): this would be a language where some verb can only combine with DPs that have a PP complement inside them, or some verb can only combine with CPs that have an object DP inside them, or some verb can only combine with phrases that have at least four nodes inside them, or whatever. And this is a point that's made in every intro syntax class.
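In code-ish terms (a toy sketch with simplified category labels, nothing more), the split looks like this:

# Toy encoding of the (1a)/(1b) split, with simplified category labels.
# (1b) is the learned, language-particular table:
SUBCAT = {
    "hit": "D",      # 'hit' wants a D-headed sister
    "object": "to",  # 'object' wants the word 'to'
    "know": "C",     # 'know' wants a C-headed sister
}

# (1a) is the schema the learner comes equipped with: whatever the
# table says, selection can only mention the head of the sister phrase.
def selects(verb, sister_head):
    return SUBCAT.get(verb) == sister_head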
[Part 2 of 2]
At least to me, this is the kind of thing that demonstrates (a) how our theories are falsifiable, and (b) why they're not falsified by a language that doesn't embed sentences inside sentences. The idea of a "Chomsky universal" doesn't seem to be a helpful ingredient. Of course we can use the term to mean "an assumption that the learner comes equipped with", i.e. we'd say that mainstream theory takes (1a) above to be a Chomsky universal, but I don't think this is very helpful. For a start, it's a distraction (sorry) that (1a) says something about "every verb" -- holding across all verbs is not the sense of "universal" people have in mind. But what is the X in "A Chomsky universal is a property that holds of all Xs"? As Norbert frequently points out it's not languages in any external, clearly-observable sense. The best thing we could put there is probably "(mental) grammars", but I still think this is strange when you look closely at it. If the idea is that "P is a Chomsky universal" iff "for every mental grammar g: P(g)", then we end up just saying things like
(2)a. P(g) = "The lexical entry of every verb in g encodes requirements on what its complement can be headed by"
(2)b. P'(g) = "Movement of an adjunct out of a relative clause is illicit in g"
(2)c. P''(g) = "An anaphor cannot c-command its antecedent in g"
The "in g" part doesn't seem to be doing anything here. It's just a roundabout way to turn a statement which, it seems to me, is better thought of as part of the grammar into a statement about the grammar -- just so that "grammars" can be the X in "A Chomsky universal is a property that holds of all Xs".
So, here's my suggestion/question: Why don't we drop the "Chomsky universal" terminology, and just talk about what is and isn't specified by the initial state of FL? Phrasing our ideas in terms of "a different sort of universal" seems to lead to unnecessary confusion. (Perhaps it's just a holdover from days when everyone was less clear on the distinction between the things of interest and Greenberg universals?)
I'm not sure I follow everything that Tim says above, but isn't a large part of the problem with g's their slippery connection to observations, especially of language use as opposed to intuitions? (1a) however points in the general direction of an implicational universal, namely, that if a verb specifies the identity of the head of something inside one of its complements, it will fix the identity of all of the heads along the way, which becomes observational and afaik true to the extent that we can pin down what we mean by 'head'. So we have the idiom "X got up Y's nose", but, no such idioms where the preposition is variable but 'nose' is not.
ReplyDeleteI continue to emphasize that we should not let Everett get away with incorrectly defining Merge as self-embedding.
I completely agree. Nor should we focus the discussion on whether Piraha does or doesn't have some property diagnostic of recursion. It DOESN'T MATTER. We need to stop answering the question "when did Chomsky stop beating his dog" and attack the presupposition behind the question: anything Everett claims to have found regarding Piraha is simply irrelevant. That's the big point, and it remains obscured.
One of the (many) things that puzzle me about this debate is why nobody seems to mention Tom Roeper's work on the acquisition of embedding vs iteration (which he calls 'primary recursion', but why don't we try to push the term 'recursion' to the side, and call it iteration). Linguistics is to a large extent a study of distinctions of one kind or another, and Roeper shows that this distinction is important for acquisition, in a way parallel to that in which Fred Karlsson shows that it is important for the structure of texts.
Doing that might be a useful rubble-clearing activity before producing a more widely understandable account of why Everett is probably wrong about UG (in at least some senses of the term), even if he's right about Piraha, which I think is still rather likely, since he has so much more experience with the language than those who say he's wrong.