Thursday, November 15, 2012

Poverty of Stimulus Redux

 This paper by Berwick, Pietroski, Yankama and Chomsky (BPYC) offers an excellent succinct review of the logic of the Poverty of Stimulus argument (POS). In addition it provides an up-to-date critical survey of putative “refutations.” Let me highlight (and maybe slightly elaborate on) some points that I found particularly useful.

First, they offer an important perspective, one in which the POS is one step in a more comprehensive enterprise. What’s the utility in identifying “innate domain specific factors” of linguistic cognition? It is “part of a larger attempt to…isolate the role of other factors [my emphasis](1209).” In other words, the larger goal is to resolve linguistic cognition into four possible factors: (i) innate domain specific, (ii) innate domain general, (iii) external stimuli, and (iv) effects of “natural law.” Understanding (i) is a critical first step in better specifying the roles of the other three factors and how they combine to produce linguistic competence. As they put it:

The point of a POS argument is not to replace “learning” with appeals to “innate principles” of Universal Grammar (UG). The goal is to identify factor (i) contributions to linguistic knowledge, in a way that helps characterize those contributions. One hopes for subsequent revision and reduction of the initial characterization, so that 50 years later, the posited UG seems better grounded (1210).

I have the sense that many of those who get antsy with UG and POS do so because they see it as smug explanatory complacency: all linguists do is shove all sorts of stuff into UG and declare the problem of linguistic competence solved! Wrong. What linguists have done is create a body of knowledge outlining some non-trivial properties of UG. In this light, for example, GB’s principles and parameters can be understood as identifying a dozen or so “laws of grammar,” which can now themselves become the object of further investigation. E.g.: Are these laws basic, or derived from more basic cognitive and physical factors? (Minimalists believe the latter.) What are these more basic factors? (Chomsky thinks Merge is the heart of the system, and speculation abounds about whether it is domain general or linguistically specific.) And so forth. The principles/laws, in other words, become potential explananda in their own right. There is nothing wrong with reducing proposed innate principles of UG to other factors (in fact, this is now the parlor game of choice in contemporary minimalist syntax). However, to be worthwhile it helps a lot to start with a relatively decent description of the facts that need explaining, and the POS has been a vital tool in establishing these facts. Most critiques of the POS fail to appreciate how useful it is for identifying what needs to be explained.

The second section of the BPYC paper recaps the parade case for POS, Aux-to-Comp movement in Y/N questions. It provides an excellent and improved description of the main facts that need explanation. It identifies the target of inquiry as the phenomenon of “constrained homophony,” i.e. humans given a word-string will understand it to have a subset of the possible interpretations logically attributable to it. Importantly, native speakers find both that strings have the meanings they do and that they don’t have the meanings they don’t. The core phenomenon is “(un)acceptability under an interpretation.” Simple unacceptability (due to pure ungrammaticality, with no interpretations) is the special case of having zero readings. Thus what needs explanation is how, given ordinary experience, native speakers develop a capacity to identify both the possible and impossible interpretations, and thus:

…language acquisition is not merely a matter of acquiring a capacity to associate word strings with interpretations.  Much less is it a mere process of acquiring a (weak generative) capacity to produce just the valid word strings of the language (1212).

As BPYC go on to show in §4, the main problem with most of the non-generative counter-proposals is that they simply misidentify what needs to be explained. None of the three proposals on offer even discusses the problem of constrained homophony, let alone accounts for it. BPYC emphasize this in their discussion of string-based approaches in §4.1. However, the same problem extends to the Reali and Christiansen bi-gram/tri-gram/recurrent network models discussed in §4.3 and the Perfors, Tenenbaum and Regier paper in §4.2, though BPYC don’t emphasize this in their discussion of the latter two. 

The take-home message from §2 and §4 is that even for this relatively simple case of Y/N question formation, critics of the POS have “solved” the wrong problem (often poorly, as BPYC demonstrate).

Sociologically, the most important section of the BPYC paper is §4.2.  Here they review a paper by Perfors, Tenenbaum and Regier (PTR) that has generated a lot of buzz. They effectively show that where PTR is not misleading, it fails to illuminate matters much. The interested reader should take a careful look. I want to highlight two points.

First, contrary to what one might expect given the opening paragraphs, PTR do not engage with the original problem that Aux-inversion generated: whether UG requires that grammatical (transformational) rules be structure dependent. PTR address another question: whether the primary linguistic data contains information that would force an ideal learner to choose a Phrase Structure Grammar (PSG) over either a finite list or a right regular grammar (one that generates only right branching structures). PTR conclude that, given these three options, there is sufficient information in the Primary Linguistic Data (PLD) for an ideal learner to choose a PSG-type grammar over the other two. Whatever the interest of this substitute question, it is completely unrelated to the original one Chomsky posed. Whether grammars have PSG structure says nothing about whether transformations are structure dependent or linearly dependent processes. This point was made as soon as the PTR paper saw the light of day (I heard Lasnik and Uriagereka make it at a conference at MIT many years ago where the paper was presented) and it is surprising that the published PTR version did not clearly point out that the authors were going to discuss a completely different problem. I assume it’s because the POS problem is sexier than the one that PTR actually address. Nothing like a little bait and switch to goose interest in one’s work.

Putting this important matter aside, it’s worth asking what PTR’s paper shows on its own terms. Curiously, it does not show that kids actually use the PLD to choose among the three competing possibilities. It cannot show this, for PTR are exploring the behavior of ideal learners, not actual ones. How close kids are to ideal learners is an open question, and one similar to a question Chomsky long ago considered. Chomsky’s reasons for moving from evaluation measure models (as in Aspects) to parameter setting ones (as in LGB) revolved around the computational feasibility of providing the ordering of alternative grammars necessary for a usable evaluation metric. Chomsky thought (and still thinks) that this is a very tall computational order, one unlikely to be realizable. The same kind of feasibility issue affects PTR’s idealization. How computationally feasible is it to assume that learners are able to order, compare, and decide among the many grammars in the hypothesis space that are compatible with the data? The larger the space of options available for comparison, the more demanding the problem typically is. When looking for needles, choose small haystacks. In the end, what PTR show is that there is usable information in the PLD that, were it used, could choose PSGs over the two other alternatives. They do not show that kids do or can use this information effectively.
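The feasibility worry can be made concrete with a toy calculation (my own illustration; nothing like this appears in PTR or in BPYC): even if candidate grammars are defined by a modest number of independent binary choices, the space an ideal learner must order and compare blows up exponentially.

```python
# Toy illustration: the number of candidate grammars an "ideal learner"
# must order and compare grows exponentially with the number of
# independent binary choices defining the hypothesis space.
from itertools import product

def hypothesis_space(n_choices: int):
    """All grammars definable by n independent binary choices."""
    return list(product([0, 1], repeat=n_choices))

for n in (5, 10, 20, 30):
    # size of the space an evaluation metric would have to rank
    print(n, 2 ** n)
```

With 30 binary choices the learner already faces over a billion candidates, which is the sense in which small haystacks are preferable.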

The results are IMHO more modest still. The POS argument has been used to make the rationalist point that linguistically capable minds come stocked full of linguistically relevant information. PTR’s model agrees. The ideal mind comes with the three possible options pre-coded in the hypothesis space. What they show is that given such a specification of the hypothesis space, the PLD could be used to choose among them. So, as presented, the three grammatical options (note the domain specificity: it's grammars in the hypothesis space) are given (i.e. innately specified). What's “learned” (i.e. data driven) is not the range of options but the particular option selected. What position does PTR argue against? It appears to be the following: only by delimiting the hypothesis space of possible grammars so as to exclude all but PSGs can we explain why the grammars attained are PSGs (which of course they are not, as we have known for a long while, but ignore that here). PTR's proposal is that it's ok to widen the options open for consideration to include non-PSG grammars because the PLD suffices to single out PSGs in this expanded space. 

I can’t personally identify any generativist who has held the position PTR targets. There are two dimensions in a Bayesian scenario: (i) the possible options the hypothesis space delimits, (ii) a possible weighting of the given options, giving some higher priors than others (making them less marked in linguistic terms). These dimensions are also part of Chomsky’s abstract description of the options in Aspects chapter 1 and crop up in current work on questions of whether some parameter value is marked. So far as I can tell, the rationalist ambitions the POS serves are equally well met by theories that limit the size of the hypothesis space and by those that widen it but make some options more desirable than others via various kinds of (markedness) measures (viz. priors). Thus, even disregarding the fact that the issue PTR discuss is not what generativists mean by structure dependence, it is not clear how revelatory their conclusions are, as their learning scenario assumes exactly the kind of richly structured, domain specific, innate hypothesis space the POS generally aims to establish. So, if you are thinking that PTR gets you out from under rich domain specific innate structures, think again. Indeed, if anything, PTR pack more into the innate hypothesis space than generativists typically do.
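The equivalence claimed here can be sketched in a few lines of hypothetical code (mine, not PTR's model; the grammar labels and numbers are invented): a hard restriction of the hypothesis space is just the limiting case of a prior that assigns some options zero weight, and in both setups the data does the same selective work over an innately given menu.

```python
# Minimal Bayesian sketch of the two rationalist options in the text:
# (i) shrink the hypothesis space, or (ii) keep it wide but weight the
# options with priors (lower prior = more "marked" in linguistic terms).
# A hard restriction is simply a zero prior.

def posterior(priors, likelihoods):
    """Normalized posterior over a fixed, innately given hypothesis space."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}

# Hypothetical likelihoods of some PLD under each grammar type.
likelihoods = {"finite-list": 1e-6, "regular": 1e-4, "PSG": 1e-2}

# Wide space with flat priors vs. restricted space (finite lists excluded).
wide = posterior({"finite-list": 1/3, "regular": 1/3, "PSG": 1/3}, likelihoods)
restricted = posterior({"finite-list": 0.0, "regular": 0.5, "PSG": 0.5}, likelihoods)

print(max(wide, key=wide.get))        # PSG wins in the wide space
print(max(restricted, key=restricted.get))  # and in the restricted one
```

Either way the winning grammar is the same; what differs is only whether the losing options were excluded outright or merely down-weighted, which is why both setups serve the same rationalist purpose.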

Rabbinic law requires that every Jew rehash the story of the exodus every year on Passover. Why? The rabbis answer: it’s important for everyone in every generation to personally understand the whys and wherefores of the exodus, to feel as if s/he too were personally liberated, lest it be forgotten how much was gained. The seductions of empiricism are almost as alluring as “the fleshpots of Egypt.” Reading BPYC responsively, with friends, is an excellent antidote, lest we forget!


  1. So people still believe in parameters, in spite of Boeckx's discussion and their generally hopeless prospects for explaining the acquisition of things such as word-meaning?

    I think it's perfectly possible and even plausible to have principles without parameters (but rather constraints of unbounded complexity in a constraint language, simpler=better). Bayes then provides a better notion of 'fit', including fewer options for externalizing a given meaning (indeed, a standard grammar-improvement method in the XLE system of LFG is to run the grammar in generation mode to find excess structures that are produced for a given f-structure, and change the grammar to eliminate them).

  2. Some people do (Chomsky, Baker, Roberts), some don't (Boeckx, Lohndal, Newmeyer). I agree with you that the issue of principles can be divorced from parameters, and I have even expressed skepticism about the parameter setting model (cf. 'A Theory of Syntax' 7.2.2). Chomsky's point, which I think relevant for Bayesian ideal learner models, concerns the computational feasibility of establishing an evaluation metric. I suspect an analogous problem holds for PTR's views when scaled up to realistic levels (there are a lot of PSGs). That said, as noted in 'A Theory of Syntax,' parameters that are not independent (and there is no reason to think UG parameters are) pose their own computational challenges and make incremental learning difficult to understand. Dresher, and Fodor and Sakas, have explored these issues and there are various proposals out there, but the problem is a real one.

    Your suggestion in the last paragraph intrigues me. The little I understand of Bayesian methods is that they require a given hypothesis space. In the context of what you are saying, it must delimit the set of possible changes one can make to an inadequate grammar. How do you see this as different from listing possible alternative parameter settings? Doesn't one need to outline how the space is traversed? If so, either the grammars are lined up in some order with the data pushing you towards the most adequate one (evaluation metric) or there are some number of parameters you weigh and decide on. The first has feasibility problems, the second independence issues. How does yours work?

  3. I see 'parameters' as binary settings of a finite number of 'switches', 'rules' as statements of potentially unbounded length constructed from a symbol vocabulary, which, itself, might or might not be universally delimited. Joan Bresnan's LFG remake of the passive as a lexical rule, or phrase structure in the ID/LP rule notation would be suitable examples. So yes, to make Bayesian methods work, you need at least rules to have a navigable hypothesis space with a prior a.k.a. evaluation metric defined over it. Mike Dowman's under review paper 'Minimum Description Length as a Solution to the Problem of Generalization in Syntactic Theory' discusses how to assess grammar weight/length, since frequency of use of the symbols is part of it (although it doesn't seem to me that the math guys are in full agreement about exactly how to count length, I also don't think the details matter very much). So that would be the first alternative you describe.

    I have no idea how to navigate in grammar-spaces (current Bayesian learners for syntax-relevant rules assume a fixed vocab and finite upper limits on rule-length, etc), but presumably somebody who's good at math will be able to make some progress on that some day.

    Hmm there's another possibility, that the number of categories is unbounded, but that each set of categories generates a limited number of possible rules, but I hadn't thought of that before, so there's a possible useful result of this discussion... (I don't buy the arguments that Adjective is a universal category, in part because the people who argue for it have not imo done a sufficiently careful job to distinguish adjectives from intransitive stative verbs in all the relevant languages.)

    But I think one could still come up with a useful notion of 'fit' of grammar to data just by counting the number of structures that the grammar produces for each string, with the requirement that the correct one (or one determining the correct meaning) be included, and the evaluation principle that fewer is better. This is a more relaxed version of the 'one form one meaning' principle, a.k.a. the Uniqueness Principle, that's been floating around for a long time, and is also a discrete approximation of '2 part MDL', in which we assume that all the possible ways of expressing a meaning are equally probable. So, in this approximation, G1 is better than G2 if weight(G1) plus the sum, over the sentences of the corpus, of log2 of the number of parses each receives (one intended meaning per sentence) is smaller than the corresponding sum for G2. The 2nd term is the amount of info needed to describe the corpus given the lexical items and scheme of semantic assembly (a more precise account of what I mean by 'meaning') of its sentences.
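    The two-part score sketched here might be rendered as follows (my notation; the grammar weights and parse counts are hypothetical numbers for illustration, not measurements of any real grammar):

```python
import math

# Back-of-the-envelope version of the two-part score: grammar weight
# plus, for each sentence, log2 of the number of parses the grammar
# assigns it. Fewer excess parses per sentence -> cheaper corpus term.

def score(grammar_weight: float, parse_counts: list[int]) -> float:
    """weight(G) + sum over sentences of log2(#parses); smaller is better."""
    return grammar_weight + sum(math.log2(n) for n in parse_counts)

# G1: heavier grammar, but one parse per sentence (no spurious ambiguity).
# G2: lighter grammar, but four parses per sentence.
g1 = score(50.0, [1] * 20)  # 50 + 20*log2(1) = 50 bits
g2 = score(30.0, [4] * 20)  # 30 + 20*log2(4) = 70 bits
print(g1 < g2)  # True: the tighter grammar wins on this 20-sentence corpus
```

    Note that which grammar wins depends on corpus size: the heavier-but-unambiguous grammar is only motivated once the per-sentence savings outweigh its extra weight, which matches the observation below that a verb-final LP constraint gets motivated "pretty quickly, as the data grows."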

    It's easy to work out by hand that adding a verb-final Linear Precedence constraint to an ID/LP grammar will be motivated pretty quickly, as the data grows, if the data really obeys the constraint, but for more complicated examples I'll need to get the XLE generator working, which I haven't done yet (it takes some time to get into hack mode ...)

    1. I think that the problem is more complex, at least in principle, than this. This is above my pay grade, but I am encouraging a pro to post on these issues very soon. Here, however, is what I have been told: setting up a space of grammars that is easily navigable is no easy feat, and the computational intractability grows pretty quickly when realistic cases are considered. The feasibility issue really is non-trivial, and though a parameter setting model has its own problems, dumping it amounts less to removing an impediment to progress than to leaving us not knowing how to proceed. I think that sending this over to the math whizzes is not yet a viable strategy.

  4. One of the striking things about the BPYC paper is that they are explicitly arguing for a very weak UG -- basically just Merge in two forms, Internal and External. So this raises two questions -- first, whether that is really domain specific in any meaningful sense (which relates to the claim in this post that PTR's bias for PSGs is domain specific) and second, whether such a minimal UG is able to explain the facts in the paper.

    The answer to the latter is clearly no -- Denis Bouchard has a nice paper on this. And though opinions differ on the first point, that is a different and much less heated debate.
    If UG is just a bias towards a certain family of grammar formalisms, say minimalist grammars, and everything else is learned (features, lexicons etc) then I am not sure anyone disagrees substantively anymore. I'll buy into that, and I am about as empiricist as they get.

    The grey area is how much else is innate apart from Merge: e.g. is there a universal innate set of syntactic categories, a universal lexicon etc. etc., and where does all of the other machinery of the MP come from?

  5. Two points: I am not sure what one means by weak vs strong UG. Do you mean that its operative principles are domain specific or not? As you know, minimalists aim to reduce the domain specificity of the principles and operations. I think that it is currently unclear how successful this has been. Moreover, it only pertains to FL in the narrow sense. We, or at least I, would like to know about FL in the wider sense as well and here it seems pretty clear that humans must have biologically provided mechanisms that other animals don't enjoy for the simple reason that we pick natural languages up reflexively and easily and nothing else does.
    Second: the issue of domain specificity is not all that hard to settle, at least in principle: find another domain that has phrases of the sort we do, movement dependencies of the sort we do, locality of the sort we do, and bingo, no more specificity. From the little I know, only human natural languages display properties of this kind. There is some rumble about bird songs etc. but there is currently no clear evidence that they exploit PSG generated patterns. There are no other candidates out there, so far as I know.
    Last point: Let's say we find that the operations of UG are no different from those in other domains of animal cognition. What does this imply for the structure of UG? Or for its (mental) organ-like (modular) status? Even if FL is composed completely out of operations recycled from other domains of cognition (which I doubt), it seems that only humans have put them all together to form an FL. That alone would be pretty specific and distinctive. Like I said, I suspect that there are some (one or two) distinctive operations, but the question of what FL looks like and what its bases are is not made easier or more "environmentally" driven (empiricist friendly) if all the basic operations are cognitively general. I have discussed these issues at more length in 'A Theory of Syntax' (where I also try to show how to derive many of the principles of GB using simpler assumptions) and so won't drone on at length here.
    Really the last point: I think the question of whether Merge really is domain general is an interesting one. I think that Chomsky is inclined to think it is. I am less sure. I hope to post on this soonish.

  6. I think there is a spectrum of proposals for UG, from ones that propose just a small number of presumably domain general principles (e.g. a bias towards hierarchically structured representations like a CFG) to those that posit a very rich and structured set of principles, including e.g. innate syntactic categories, which will presumably inevitably be domain specific. So let's call the former small UG and the latter big UG to avoid overloading weak and strong.

    I think domain specificity is not that trivial to determine -- remember we are talking about the domain specificity of UG, the initial state, not the final state. The final state will exhibit phenomena that are clearly domain specific whether the initial state is domain specific or domain general; so one needs to have a conditional in there. One can argue that IF UG is just a bias towards hierarchically structured representations, then it is domain general. IF it includes syntactic categories, subjacency and the like, then it is clearly domain specific.

    I don't think one wants to look at animal communication systems to determine domain specificity -- that would tell us about species specificity which is something different (though not species specific implies not domain specific).
    We could look at other domains like vision or planning or so on and note that hierarchically structured representations and CFGs are widely used in many areas of computer science and cognitive science. But I don't think that is the interesting question -- the interesting question is whether there is a lot of stuff in UG (as in P and P models) or just one small thing (as in the BPYC paper and other recent papers by Chomsky).

    Which brings me back to my question about this paper, which I will rephrase. The 'old' POS argument, as you discussed in your previous post, was an argument for big UG. We need a big UG to explain AUX inversion.
    The argument in the BPYC paper seems to be an argument for a small UG -- and as far as they are explicit about it, the small UG is just Merge. So this raises the question: is the UG they propose strong enough to account for the acquisition of the correct rule? Answer: no, as Denis Bouchard discusses.
    So what is the 'logic' of this argument that you praised in this post? And what is the relationship of this argument to the original POS argument?

    At the risk of being repetitive and boring, I understood the old POS argument: as Pullum and Scholz put it, the one thing that was clear about the old argument was its conclusion -- big UG.
    Now BPYC are no longer arguing for big UG -- so what is the role of the POS argument now? Why does the *same* argument now have a completely different conclusion?
    I find the paragraph you quote about characterizing the contributions of UG to be not completely explicit on this point.

    1. I don't think there was ever an argument for a "big" UG. I think there were arguments, which BPYC rehearsed and maybe updated a bit, that certain explananda (regarding languages not naturally acquired by kids) were due largely to UG, as opposed to being by-products of general learning strategies. That conclusion may have seemed to imply a "big" UG, especially if one thinks in detail about the requisite innate components of all the GB modules. (This may in turn have led some people to deny that some of the explananda were real.) But there was always conceptual room for the possibility of reduction, even if there were few good ideas about how to proceed in this way. (As a very rough analogy, think of the ideal gas law: one can posit it as basic, or try to reduce a variant of it to more basic principles.) BPYC, as nativists who want to posit the "smallest" innate language-specific endowment that does justice to the explananda, suggested a strategy for reduction. I'd be delighted to consider other strategies for reduction. And if you think that explaining the explananda will require a larger innate endowment, including whatever it is that lets kids acquire words as they do, I won't argue with you. But I thought that part of the aim was to ask how much might be squeezed out of Merge, as a way of getting a fix on what else is *required*, as opposed to what else would *suffice*.

  7. This comment has been removed by the author.

  8. [first version had something important missing, so I deleted it] I'll speculate that a possible reason for UG to look task specific even if it actually isn't is that we might not know enough about other domains to recognize the utility of provisions specific enough to provide crucial help with the acquisition of odd stuff in other languages.

    For example, I myself in a 1996 article in the Australian Journal of Linguistics and Rachel Nordlinger in 1998 postulated slightly different versions of a mechanism of 'zipping' (me) or 'bumping' (Rachel) in LFG to account for how the remarkable 'case stacking' found in Kayardild and various other languages works, where the affixes on the determiner of the possessor of, say, an adverbial can specify grammatical features of various higher levels in the structure.

    So perhaps zipping/bumping is task specific, or maybe it's also used by children learning to tie their shoes or by chimpanzees building their nests; without detailed knowledge of how these things work we can't know, however plausible the speculation of some amount of task specific stuff looks. Similarly for the weird things that people find going on with 'Long Distance Agreement' in Hindi or Icelandic, in the MP.

    1. I think your way of thinking about domain specificity maybe makes my earlier comment seem (correctly!) to be a bit glib.

      Say taking the question of whether PCFGs are domain specific or not -- so one answer is that they are used extensively in modelling DNA sequences in bioinformatics so they are clearly not domain specific. But that misses the point that they might still be domain specific cognitively -- they might only be used mentally for phrase structure and thus be domain specific even if we use them scientifically for many other purposes. But as you say that would be very hard to establish either positively or negatively, since we just don't know what representations are used cognitively for mereology or planning or vision or whatever.

    2. Damn it, I just hit some button and lost my reply. God I hate doing that. But, once more unto the breach:

      Avery and Alex make a point that I agree with: that we don't know a lot about other cognitive domains yet so who really knows if the stuff we postulate in UG is domain specific or domain general. Yes, the only real way to find out is to compare operations/principles/primitives once we have good descriptions in other domains, or at least as good as what we have in linguistics. But (I bet you could hear it coming eh?), the POS argument is very good for "establishing a body of doctrine" (Chomsky quoting Joseph Black) whose features would be relevant to addressing this question intelligently.

      Here's how I see it. We may not know IF UG need be this complex, but we have good reason to think that the grammars we posit have abstract properties that seem, given what we know, to be sui generis, and it is these properties that a domain general approach would need to explain away (i.e. reanalyze in domain general terms). This is what makes the BPYC paper so very useful. They demonstrate (at least to my satisfaction) that even for the simple case of Y/N question formation three putative domain general reanalyses simply fail. Moreover, they fail for pretty boring reasons: they don't really try to explain the relevant phenomenon, what BPYC call "constrained homophony" (CH). A movement account can explain CH, but it needs supplementation with a principle favoring structure dependent operations within grammars. This smells like a domain specific principle, if for no other reason than that there is little evidence for movement processes in other cognitive domains. Of course, this could be wrong. Maybe there are such processes, and maybe they favor similar restrictions on "movement," a terrifically interesting result should anyone present evidence for it (and one that I at least, and I suspect Chomsky too, would embrace and delight in). But this is not what "refutations" of the POS typically do, as BPYC show for three currently fashionable examples. What struck me as important is that these reanalyses don't really address the phenomenon, CH. But CH is the evidence for treating Y/N question formation as a movement phenomenon, and so the three reanalyses BPYC examine are really quite beside the point: they are not explaining what needs explanation. In my experience this is typically the case, and if so it does not move the conversation forward at all. That's really too bad, for the question about domain specificity is interesting, though at present any firm conclusion is, I believe, premature (though I think the weight of argument favors domain specificity).
      In sum, the POS is a very useful tool even for those who think that there is "less" domain specificity than meets the eye. And this brings me to Alex's weak/strong distinction (see next "reply").

    3. Minimalism recognizes that packing UG full of domain specific constraints has a cost and raises a question: how did it all get in there? This has long been a question, and people skeptical about "rich" UGs have often pointed to this problem. However, earlier skepticism (and current skepticism) is not enough: one needs to show how to derive the relevant properties given domain general operations/constraints, and this is NOT easy to do. Chomsky and friends have the ambition of reducing the domain specificity of UG to a bare minimum. Nice ambition. Alex says that Bouchard notes that this doesn't work (I have not read the paper, sorry). Say that Bouchard is right; then too bad for Chomsky's ambitions! Even if Bouchard is wrong, this still leaves the question of the domain specificity of Merge. Chomsky believes it might be domain general. If he is right, and if Merge suffices to account for the main properties of Y/N questions, then this phenomenon does not argue for the domain specificity of UG. If he is wrong about the domain generality of Merge, then it remains as an argument for the domain specificity of UG. Whatever the outcome, the right way to argue about it travels via the POS argument, which is indispensable for establishing the relevant "body of doctrine."

      Two last points: first, I suspect that both Alex and Avery will agree with what I have said as the points are, to my mind, methodologically anodyne. What they will say is that they remain unconvinced, the data is not yet enough for them to conclude that UG has "rich" domain specific structure. There is no principled way to establish how much evidence is enough. If that's all we disagree with, then there's not much reason for argument. Half empty, half full. Big deal.

      Second, a virtue of the BPYC paper that I highlighted is that it emphasizes that POS reasoning serves the instrumental function I reiterate above. They welcome grounding the conclusions of POS in deeper, possibly domain general principles. I agree, and I am sure that Alex and Avery would as well. The POS is not the last stop, but it is an excellent place to begin.

  9. There's a huge amount to think about in that article, but, shooting from the hip perhaps, I'll opine that they somewhat obscure what I take to be their essential point by saying too much about too many things, such as trying to motivate the MP ab initio. If you want to defuse PTR and other work questioning structure dependence as an aspect of UG, I think it's enough to observe that:

    a) nobody can actually find descriptively adequate grammars from realistic bodies of evidence in a typologically realistic search space without assuming various principles of apparent UG including either structure-dependence as such (presumably built into the Culicover and Wexler TG learner from several decades ago) or built in as a consequence of other assumptions (Kwiatkowski, Steedman et al's CCG learner). Note especially that PTR chooses a grammar with structure over one without, but doesn't actually navigate the search space to find anything at all, so might be a step towards removing structure dependence from UG, but doesn't actually accomplish the job, even aside from the fact pointed out in the paper that it doesn't rule out a hybrid model with a mixture of structure dependent and independent rules.

    b) structure dependence of predicate preposing constructions appears to have no counterexamples from typology.

    Therefore, I would say that, possessing apparent utility and lacking evident counterexamples, structure dependence is a plausible addition to UG. And if you take Chomsky's occasional discussions of the rather tentative nature of the kinds of conclusions we can draw in linguistics seriously, I don't think it's necessary or even desirable to be more definite than that.

    An interesting near counterexample is 2nd position clitic phenomena, but I think Legate has cleaned that up in a convincing way, although more confirmation from people who know about different languages would be nice (Homeric Greek is checking out so far).

  10. Hmm b) isn't formulated quite right, v2: structure dependence of constructions that signal clause type by modifying word order appears to have no counterexamples from typology.