Tuesday, June 2, 2015

Some juicy questions we'd like to have answers to

The Athens confab ended with suggestions for future questions we should address. As I was looking back over some old files, I found one that was a list of such questions I had compiled for myself, ones that I thought would be worth thinking about. Some were bigger, some smaller, but all seemed to me worth trying to tie down, or at least domesticate. I reviewed this list in light of the discussion in Athens and throw them out for your delectation and rumination. IMO, some of these questions are more blue sky than others. Some, I believe, even have plausible answers right now, but none that are definitive. Others will need conceptual work to turn them into real questions. Nonetheless, the Athens get-together stimulated me to finally release this list, the combination of late-night musings on my part, some good lunches with colleagues, and discussions/proposals at Athens. Please feel free to add and elaborate.

1.     What’s a lexical atom? Are syntactic atoms the same as phonological, semantic, morphological atoms? What’s the relation between a lexical atom and a concept? What’s the relation between a lexical atom and a word?
2.     How/when do syntactic atoms enter a derivation?  Are all atoms of a piece or do they enter G operations in different ways and at different points?
3.     Is there a relatively small class of basic syntactic operations? If so, what are they? Can the class of such operations be reduced to just one or two? Are any peculiarly “linguistic”? Right now we have: Merge, Move, Bind, Agree, Probe, Lower, Obviate, Delete, Label. Are all of these primitive? Are all linguistically proprietary?
4.     How transparent are the rules of the performance system and those of the competence system? Are the rules that get used in performance the same as the basic rules if such exist? Or if there are basic operations, do these compile to form larger units that are the rules that performance systems use (e.g. constructions)? Or, assuming that rules like passive are not primitives of UG, might they nonetheless be rules that the parser uses? Is some version of the DTC viable and if so what does it tell us about the primitives of the competence system?
5.     How is variation acquired? Is there enough information in the PLD to allow the attested variation to be “induced”? How much information is there in the PLD? What are the limits of the PLD (e.g. degree 0, 0+, 1)? How do kids actually use the PLD in fixing their Gs (what’s the learning path look like)?
6.     Is there a bound on possible G (syntactic) variation? P&P theories assume that there is, i.e. that there is only a finite number of ways that Gs can vary (up to lexical difference). Is this so? In other words, are parameter theories right?
7.     Are all parameters lexical parameters (micro vs macro variation)?
8.     Is there a universal base (Cinque) and if so what’s its source (e.g. is there a semantic basis for the hierarchy and if so what does this tell us about semantic primitives)?
9.     Are the assumptions behind the ideal speaker-hearer model reasonable (note: we all believe that they are literally false)? In particular how misleading is the idea that LADs get all of their data at once, i.e. does loosening this idealization lead to qualitatively different theories of G? If not what does loosening the assumption buy us? How important is the fact that Gs are acquired incrementally to the grammatical end states attained? Does incremental learning make the acquisition problem harder or easier and if so how?
10.  Can any feature be a grammatical feature or are there limits on the kinds of features a G can have? So phi features exist. Is this a historical accident or a feature of FL? What’s the inventory of possible features? Can any feature be, in principle, grammatically recruited?
11.  What’s the role of UTAH and the thematic hierarchy in UG? G? Do we need well-defined theta roles? Do DPs always have well defined/able theta roles? Does it matter? Where does the theta hierarchy come from?
12.  Why do predicates have at most three arguments?
13.  Is there a distinction between syntactic and semantic binding or are they the “same” thing, i.e. X syntactically binds Y iff X semantically binds Y? And if they are different, are there diagnostics to tell them apart? If there are two ways to bind, why are there two ways to do the same thing?
14.  How many kinds of locality conditions does UG allow (A-movement locality, Case/agreement locality, A’-locality, binding locality, thematic locality, selection locality)? Can these different notions be unified? What’s the relation, if any, between phases and minimality? Are both required?
15.  Are islands the products of G or simply complexity effects? If the former, are islands derivational or interface effects? How are they to be derived given current conceptions of locality?
16.  How are ECP effects to be integrated into the grammar given phase-based assumptions and the copy theory of movement?
17.  What’s a G? In P&P it was a vector of P values. Now?
18.  Why do Gs seem to clump into languages? Why is the space of possible Gs clumpy?
19.  What kinds of relations are grammatical? Antecedence? Agreement? Selection? Can any relation be grammaticized? Topicalization yes, but “suitable mate movement” no? Is there an inventory of basic “constructions” that Gs try to realize (an idea that Emmon Bach once mooted)?
20.  What are the phase heads and why?
21.  Are phrases labeled and if so why? Are they required by the interface (Chomsky PoP) or by Gs? What is the evidence that the interfaces need labeled structures?
22.  What is pied piping?
23.  Is movement/merge free or feature driven?
24.  Are labels endocentric or exocentric or both?
25.  What’s the semantic structure of a ling unit? Conjunction like structure a la Davidson or Function/Argument structure a la Montague? Or some of each?
26.  Is there an intrinsic bound on the PLD that kids use to fix their Gs and if there is what is it and where does the bound come from?
27.  Can Gs be ordered wrt simplicity? If so, how? MDL? Algorithmic complexity? A practical case or two to fix ideas would be very nice to have.
28.  Why are determiners conservative?
29.  Why do we have superlative ‘est’ meaning roughly ‘most’ but no analogous morpheme meaning ‘least’? I.e. ‘Biggest’ means most big (more big than the others). Why is there nothing like ‘biglest’ meaning least big (less big than the others)?
30.  What are the sources for grammatical recursion? Is merge a basic or complex operation?
31.   What’s the relation between a feature and a head? Can heads be arbitrarily large bundles of features? If not what is the upper bundling bound?
32.  Is there a syntax below X0 and if there is how does it compare to that above X0?
33.  Is grammaticality gradient? I.e. need we assume that there are grades of grammaticality? Note we all assume that there are grades of acceptability.
34.  What’s an interface constraint? Can performance constraints (e.g. structure of memory) influence the shape of the representations of the competence system and if so how? I.e. are such general performance constraints interface constraints?
35.  Your turn….
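
For readers who want #28 pinned down, the standard definition can be stated precisely. A determiner denotation D is conservative iff restricting the second argument to the first never changes the truth value (the Barwise & Cooper 1981 formulation); the example sentences below are mine, added for illustration:

```latex
% Conservativity: a determiner denotation D is conservative iff,
% for all sets A, B:
\[
  D(A)(B) \iff D(A)(A \cap B)
\]
% E.g. "Every linguist smokes" iff "Every linguist is a linguist
% who smokes". By contrast, "only" (if analyzed as a determiner)
% fails the test: "Only linguists smoke" is not equivalent to
% "Only linguists are linguists who smoke" -- the latter is
% trivially true -- which is one reason "only" is standardly
% argued not to be a determiner.
```

The puzzle in #28 is why natural language determiners apparently all satisfy this schema when logically possible non-conservative determiners are easy to define.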


  1. I would take issue with the formulation of #4 (“How transparent are the rules of the performance system and the competence system?”). It presupposes an architectural choice that is far from established.

    We need to be careful to not conflate:

    a. Differences between levels of analysis of the same cognitive system (feel free to invoke Marr if it helps)
    b. Differences between tasks
    c. Differences between separate neurocognitive systems.

    Discussions of the relation between standard/traditional grammatical theory and models of real-time stuff routinely mix up these distinctions. Your question is perhaps motivated primarily by (a) and (b), but it presupposes that a distinction at the level of (c) is necessary.

    Your sub-question about constructions vs. basic operations is an interesting one. I don’t have a succinct answer to offer, but I would recommend a closely related discussion that might be instructive. It has a similar flavor, but it involves a case where the linguistics is almost trivially simple. It’s the discussion in the word recognition literature over whether regular plurals, i.e., dogs, tables, portraits are stored and retrieved as full forms in memory. It’s a tricky question to answer, even for such easy cases, but let’s assume that there’s good evidence that at least some regular plurals are stored. Should we conclude from this that there’s a pure lexicon in which redundancies such as stored regulars are avoided, and also a ‘performance’ lexicon in which some commonly used routines are stored in precompiled form? Probably not. We’d more likely conclude that there’s a lexicon, that it has productive combinatorial rules, and that it includes some degree of redundancy. So, suppose that we find that passives and some other constructions have a similar status to stored regular plurals, should we draw a different conclusion?
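
The architectural point in this paragraph can be made concrete with a toy sketch. The tiny lexicon, names, and frequencies below are invented for illustration and are not anyone's actual model; the point is only that a "pure" decompositional route and a redundant lexicon with some stored whole forms can be behaviorally identical on the regulars they share:

```python
# Toy illustration of the two lexicon architectures under discussion.
# STORED_FORMS plays the role of redundantly stored high-frequency
# regulars; STEMS plus decompose() plays the role of the productive rule.

STORED_FORMS = {"dogs": ("dog", "PL")}   # hypothetical stored regular
STEMS = {"dog", "table", "portrait"}

def decompose(word):
    """Fully decompositional route: strip the plural suffix by rule."""
    if word.endswith("s") and word[:-1] in STEMS:
        return (word[:-1], "PL")
    return (word, None)   # no parse: treat as an unanalyzed stem

def dual_route(word):
    """Redundant lexicon: whole-form lookup first, rule as fallback."""
    if word in STORED_FORMS:
        return STORED_FORMS[word]
    return decompose(word)
```

Note that `dual_route("dogs")` and `decompose("dogs")` return the same analysis, which is exactly why evidence for storage does not by itself force a partition into two separate lexicons rather than one lexicon with some redundancy.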

    1. THX. I think that these refinements are all important to get the question(s) clarified. However, I guess I don't see how this impugns the relevance of the larger question. In fact, I thought that this was the kind of question your work was trying to address: how do representations/operations arrived at using the standard GG platform of techniques relate to those arrived at using other measures? Call a use theory that exploits only primitives and relations established by the competence theory transparent. We might expect some version of the DTC to be "visible" in use systems if something like this is right. In fact, interesting work has been and is now being pursued that tries this out. If I understand Alec Marantz's views on these matters, there really is no other understanding of these issues within cog-neuro. So we can ask the question and plod along the way we normally do with hard questions. Some tasks will be more revealing, some less, some data better, some worse etc. But this is what we should expect, no?

      Last, I guess I am with David Adger re the second point. I actually do not see why we wouldn't conclude that we have both a rule and a routine. Fine. The rule would be the default and prominent in many cases. When it statistically pays to routinize (i.e. compile and add probabilities) we do. Why is this not reasonable?

    2. I remember having this conversation with you guys years ago when Colin was working on his PIG (Parser-is-Grammar) theory. Berwick & Weinberg had a term "type transparency" for competence theories that also double as performance theories. It seems to me that it's unreasonable to insist that a good competence theory MUST type transparently map to a good performance theory, but it is equally unreasonable to refuse to explore type transparent theories. Frankly, if one is engaged in the study of performance, type transparency would need to be the default position, at least from a practical point of view.

      But as Norbert notes, the picture of how competence and performance play together isn’t messy. Even for trivial cases. The stored-regulars story is more complex--and less true. Constantine Lignos in his dissertation used all the data in the English Lexicon Project to evaluate claims that high frequency rule-formed words are stored (Alegre & Gordon 1999 JML). The evidence is equivocal, possibly because prior studies used the very small Brown Corpus for lexical frequencies. At least for past tense, the lexical decision data is compatible with a fully (de)compositional model of morphology (Taft, Forster, etc.) that does not have stored regular forms.

    3. @Norbert. Yes, this is certainly a topic that I’m invested in. Strongly. The worry that I was raising was specifically about the way that commonly used terminology in these discussions tends to foster some unwarranted assumptions. Terms like “use systems” encourage the assumption that there are independent-but-related cognitive systems that are dedicated to specific tasks. If that’s how things are, then there certainly are questions about the nature of the relation between these systems. On the other hand, if there are not distinct systems, then the question of how they’re related is moot or ill-posed. The notion that there are task-specific cognitive systems in addition to a task-neutral cognitive system is a respectable empirical claim, but it is an empirical claim. For related reasons, I prefer to avoid the term “performance”, as that gets used in so many different ways that it tends to foster conflated claims. Over the past 50 years, Chomsky has been pretty consistent in his use of the notion of “competence”, but in contrast “performance” has been used to refer to a grab bag of quite different notions.

      At the risk of being repetitive, we need to be careful to not conflate (i) theories that describe the same cognitive system at different levels of analysis, (ii) theories that describe the same cognitive system under different task settings (e.g., when intended meaning is known or unknown), and (iii) theories that describe distinct but related cognitive systems.

      @Charles. Yes, a priori I think it’s unreasonable to require type transparency between theories. But at this point we’re way beyond the a priori stage, because there’s so much empirical evidence, and the evidence is pretty clear.

      @Norbert. Re: DTC. (= “Derivational Theory of Complexity”, for those born after the Beatles disbanded.) That discussion in the 1960s started from questions about transparency between different theories, i.e., do representations or processes in one theory map neatly onto corresponding representations and processes in another theory. (For some those might be theories at different levels of analysis, and for others they might be theories of distinct cognitive systems.) Those questions are as relevant now as ever. Where things went astray, and still do, is in the “D” and the “C” parts, specifically the suggestion that there’s a useful notion of the overall “complexity” of a sentence, and that the primary predictor of that is the number of steps in a grammatical derivation. Nowadays, we can probe individual pieces of mental representations and processes in finer detail, and so we can probe the status of individual grammatical operations, without the need to worry about whether there’s a useful global measure for an entire sentence, and whether that is usefully expressed in terms of “complexity”.

      As a friendly amendment to your #4, I’d suggest the following: “What is the relation between theories that aim to generate all/only grammatical expressions and theories of real-time linguistic processes? Is knowledge of language like knowledge of number/quantity, where distinct cognitive systems exist that work with the same content?"

    4. Love the amendment. We can elaborate further, in fact: are the primes over which the systems operate the same? Are the dependencies calculated the same? Are the locality conditions the same? I take the transparency thesis to mean something like: yes, they are (largely) the same, or, at the very least, the primes and dependencies of the competence theory are also primes and dependencies calculated by the real-time processes, subject to the same restrictions on these dependencies. This allows for there to be additional things in the real-time theories. At any rate, the hope is that these largely overlap. What I have found promising is that this bet seems to be defensible and insightful.

  2. @Colin, just wondering why we wouldn't conclude for your plurals example that you'd have both a rule and a routine. Seems consistent with acquisition and with (sociolinguistic) variation. In fact it's what I recommended as a way of dealing with variability. Is there a good reason to shy away from it?

  3. Let us assume regular plurals are also stored in English and verbal forms in Spanish (aka Charles Yang's talk at MayFest). Would this also mean that hundreds of inflected forms of each verb in agglutinative languages (e.g. Turkish) are also stored in memory? If yes, would it raise any issues wrt memory load and retrieval etc? If not, why?

    1. I would assume that if people stored any inflected forms in memory, they'd tend to only store the frequent ones. They probably wouldn't store very infrequent forms, and most likely wouldn't store forms that they've never encountered, which I'm assuming is a fairly common scenario in Turkish (there must be millions of potential forms that never occur in a given corpus, right?). So I don't know if the number of stored forms would be hugely larger in Turkish than in, say, Spanish. But in any case, I'm not sure there's a reason to think that there's a particular number of stored forms beyond which you'd run into memory load and retrieval issues - do we know that people who know more things are slower at accessing those things?

    2. Atakan: I don't remember what I said, but it was probably not that the regular forms are stored. For me--and I think for the Marantz school also--even irregular past tense is compositionally formed by (unproductive) rules. Before children learn that -ed is productive--this takes 2-3 years for most children--they would treat "walked", a high frequency word, as "irregular", because the -d suffix has not yet been elevated to productive status.

      A very related question, though hard to answer: what happens to the "irregular" form of "walked" after the child learns that -ed is productive? They may erase it, i.e., erase a lexically specific link between 'walk' and "-ed". They may keep it as is, and forever treat "walked" as an exceptionally formed word that takes some form of memorization. Or they may have two "walked"s in co-existence, one exceptionally formed before they learned the productivity of -ed, which consumes memory, and the other predictably formed with -ed. I don't think we know for sure which of the three logical possibilities is true, although the Lignos results do cast doubt on high frequency storage effects in general.

    3. Ok, I'm sorry that I raised the issue of stored regulars, as it seems to have side-tracked the discussion. For current purposes, I don't care whether it's TRUE that there are stored regulars (people disagree on that). The question was: if we stipulate that it is true, then what would we conclude about the lexicon? Would we conclude that we have a neat partitioning between two lexicons: the clean, non-redundant one and the one that contains the redundantly stored regular forms? Or would we just conclude that the lexicon includes some redundant stuff? I thought that this might be a more straightforward version of the question that Norbert was asking about passives and grammatical primitives ... but I guess I was wrong. (To repeat: the question of whether some high frequency regulars actually are stored is not pertinent to this discussion.)

  4. This is an impressive list of fascinating questions. I couldn't help wondering if there was a taxonomy into domains or categories of questions that might help refine and extend the list, or help distinguish established results from outstanding questions. I could think of the following subdivision, which is not meant to be exclusive, of course.

    I. The nature of (U)G: what defines G?
    3. Is there a relatively small class of basic syntactic operations?
    14. How many kinds of locality conditions does UG allow?
    17. What’s a G? In P&P it was a vector of P values. Now?
    18. Why do Gs seem to clump into languages? Why is the space of possible Gs clumpy?
    19. What kinds of relations are grammatical? Antecedence? Agreement? Selection?

    II. The architecture of (U)G: levels, modules, interfaces.
    1. What’s a lexical atom?
    4. How transparent are the rules of the performance system and those of the competence system?
    27. Can Gs be ordered wrt simplicity?
    34. What’s an interface constraint?

    III. Representation and implementation
    2. How/when do syntactic atoms enter a derivation?
    16. How are ECP effects to be integrated into the grammar given phase-based assumptions and the copy theory of movement?
    20. What are the phase heads and why?
    21. Are phrases labeled and if so why?

    IV. Questions evoked by descriptive generalizations
    11. What’s the role of UTAH and the thematic hierarchy in UG? G?
    12. Why do predicates have at most three arguments?
    22. What is pied piping?
    28. Why are determiners conservative?
    29. Why do we have superlative ‘est’ meaning roughly ‘most’ but no analogous morpheme meaning ‘least’?

    Now of course some of the questions will fit into more than one category, but perhaps it is good to distinguish the domains/categories that frame the questions asked.

    One more thing about the class of syntactic operations: Merge, Move, Bind, Agree, Probe, Lower, Obviate, Delete, Label. These seem to compare apples and pears. Merge and Move can be viewed as internal vs external Merge, and therefore of the same kind. The Delete that takes place after movement is directly tied to movement and probably has no independent status (I am not referring to ellipsis, which may be simple non-pronunciation). The status of Lower is rather problematic for all the well-known reasons. Bind, at least Bind as in the Binding Theory of yore, reduces to Agree in my book (see Rooryck & Vanden Wyngaerd 2011). Also Probe seems to be part of Agree rather than an independent operation, if it is an operation at all. I mean, in operator-variable binding we don't seem to speak of an operation Operate that initiates the operator-variable relation. Why would we in the case of Agree? I always understood the Probe-Goal relation to be a condition for Agree to apply, with Probe a metaphor for the unvalued features that require valuation. I am also not sure that Obviate is an active operation in the sense of the other operations; it seems more like no relationship is established, so it is more a condition than an operation. So perhaps Conditions should be distinguished from Operations.

    1. Thx, the categorization is very helpful and a marked improvement on the scattershot nature of the list. So thx a lot, this is really useful.

      Let me say a word about your last point. There are various theories that try to unify these operations, and I am all in favor of doing this. However, I think it pays to keep them separate so that the virtues of the unifications can be adjudicated. I too have adopted Chomsky's idea for unifying merge and move via E- and I-merge (though I have also suggested other ways of doing this more consonant with earlier copy-based views of movement). And I too have suggested that Bind (at least reflexivization and control) is really an aspect of I-merge; given that I find Agree (the long distance operation) more or less a notational variant of I-merge (which is why I never liked the latter), the idea that these are related, at least for reflexivization, is now widely assumed. As for Probe, I am not sure that it is not an operation. You need to probe and IF YOU FIND a suitable goal then you need to agree. Feature matching/agreement/valuation/checking is separate from finding something to agree with. Of course we can compile these into "one thing" but the sub-operations seem distinct and worth keeping apart. I agree about Obviate, but I noted it because it lies behind whatever we think Principle B is. This has been less successfully integrated into a merge/agree system than has reflexivization. But, this said, the points you make are welcome and have cleaned up a pretty sloppy list.