Sunday, August 24, 2014

Cakes, Damn Cakes, and Other Baked Goods

As promised in my previous post, here's Omer's reply to my musings on derivations and representations.



In a recent post on this blog, Thomas Graf addresses the derivationalism vs. representationalism debate---sparked (Thomas' post, that is) by my remarks in the comments section of this post.

Thomas notes, among other things, that literally any derivational formalism can be recast representationally. For example, one could take the set of licit derivational operations in the former model, and turn them into representational well-formedness conditions on adjacent pairs of syntactic trees in an ordered sequence. (Each tree in this ordered sequence corresponds, informally, to an intermediate derivational step in the former model.) As best I can tell, this translatability of any derivational formalism into representational terms is not even debatable; I certainly wouldn't argue this point.
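
(For the concretely minded, here is a toy sketch of that recoding. The Python below, and every name in it, is mine and purely illustrative; nothing in the argument hinges on the details.)

```python
# Toy recoding of a derivational grammar as a representational one.
# A derivational grammar is given as a set of operations mapping trees to
# trees; the representational counterpart treats an ordered sequence of
# trees as a single object and checks a static condition on adjacent pairs.

def licit_step(before, after, operations):
    """Derivational reading: some operation takes 'before' to 'after'."""
    return any(op(before) == after for op in operations)

def wellformed_sequence(trees, operations):
    """Representational reading: the n-tuple of trees is well-formed iff
    every adjacent pair satisfies the very same condition, now read as a
    constraint on the tuple rather than as an instruction to apply."""
    return all(licit_step(t1, t2, operations)
               for t1, t2 in zip(trees, trees[1:]))
```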

Here's where things start to fray, though:
So why is one inherently derivational, and the other one is not? [...] The difference is only in our interpretation of these [formal devices]. But judging a technical device by our interpretation of it is rather peculiar because, in general, there's no bounds on how one may interpret some formal piece of machinery.
I agree wholeheartedly with the last part---namely, that when discussing the interpretation(s) that can be assigned to a particular formalism, one is usually dealing not with what can be proven, but with what is plausible, reasonable, or even just conceivable. And this means that one brushes up against matters of opinion, personal preference, and, dare I say, scientific taste. To make matters worse, we linguists---myself obviously included---are not always as terminologically hygienic as we should be in making sure to avoid the "proven"-style lingo when addressing these matters.

But at the risk of attributing to Thomas sentiments that he may not hold, I read this passage as a suggestion that we should therefore avoid entirely the question of how we interpret our formalisms, and instead stick to the formalisms themselves. On this, I could not disagree more. Obviously, it would be more straightforward if things weren't so darn murky; but in my view, the point at which we stop caring about how we interpret these formalisms is the point at which we stop being cognitive scientists, and instead become something more like mathematicians [in the broad sense, the way a graph-theorist can be thought of as a mathematician]. To put it another way: as a linguist, I am not interested in formalisms qua formalisms; I am interested in them as idealized versions of computations in the mind. Consequently, I am fundamentally interested only in those formalisms for which there exists at least one plausible cognitive interpretation.

This means that it is entirely possible that two formalisms X and Y would be mathematically equivalent---let's say X and Y are identical in both their weak and strong generative capacities---and yet only one of them would be considered a plausible linguistic theory.

Moving to a specific example, I see no reasonable cognitive interpretation for the "ordered n-tuple of representations" formalism, and thus---speaking only for myself here---I consider it largely irrelevant to linguistics (at least until someone can show me a reasonable cognitive interpretation of it).

And this brings us back to the derivationalism vs. representationalism issue. I have stated, in the remarks linked to above, that I find the "generate-and-filter" grammatical architecture to be bankrupt, and that I was disheartened that Chomsky seems to be veering in that direction (yet again). How can that be squared with the aforementioned, not-even-debatable translatability of derivationalism to representationalism? Simple. The only generate-and-filter architecture that I find cognitively plausible---and with this, I think Chomsky would concur---is one in which syntax (what Chomsky would call the "computational system") does the generating, and the interfaces (LF/PF) do the filtering. Alas, this system simply doesn't get the facts right; I won't rehash the argument here, but it boils down to the fact that---contrary to Chomsky's own claims---the obligatoriness of syntactic agreement cannot be reduced to anything the interfaces should be paying attention to (interpretability/uninterpretability or particular interpretations on the LF side, overtness/non-overtness or particular phonological content on the PF side). So, since the only cognitively plausible generate-and-filter system doesn't work, I consider the generate-and-filter architecture to be bankrupt.

Is this an "I have never baked a good cake so there are no good cakes" argument? I'm not sure. Bankruptcy is temporary (just ask Donald Trump!), and as I said above, I'm willing to have my mind changed if someone gives me a working generate-and-filter architecture that has at least one plausible cognitive interpretation. But until such time, forgive me if I shop for cakes at the bakery, a.k.a. the derivationalism store ;-)

47 comments:

  1. @Alex D: It might well be that some linguists, including Chomsky, approach this issue very carefully, and Omer also seems to belong to this camp. Still, there are plenty of discussions in the literature that aren't as meticulously hedged. Intros to HPSG and LFG often present the representational nature of these frameworks as an advantage over transformational approaches. There's the back-and-forth between Brody on the one hand and Epstein and Seely on the other regarding whether "true" Minimalism ought to be representational or derivational. And similar debates can be found in morphology and phonology --- it's not just syntacticians who feel strongly about these things, and many feel very strongly indeed (see e.g. this paper by Pullum and Scholz).

    Regarding your remarks on the parsing argument: Full ack!

    ReplyDelete
  2. @Omer: I'm happy to see that you consider this at least partially a matter of scientific taste. I don't think this is usually acknowledged in the literature. So here's where I think we disagree:

    1) While scientific taste is important in shaping our intuitions about a formalism, how it works internally, and how it can be modified, it is not a sufficient reason for discarding an analysis or a particular formalism. Einstein's conviction that God does not play dice was not a sufficient reason for him to reject quantum physics. He might consider the project misguided and decide not to pursue it at all, but he can't use his personal taste as an argument against it in an academic discussion. In linguistics, however, some people do reject analyses according to whether they fit their idea of what is natural.

    2) The fact that linguistics is a cognitive science changes nothing about point 1 because a) we know too little about cognition to know what is natural, and b) linguistics deliberately operates at a high level of abstraction. The mantra to keep in mind is that specification does not equal implementation. As Alex D points out, a formalism that is specified in one way can be implemented in a completely different way.

    3) Being interested in the cognitive side of syntax does not entail that one must be a notational realist. Questions of memory usage and parsing complexity, for instance, can be addressed in a meaningful way without being particularly attached to how the formalism is defined. In fact, this is the standard road to take because --- as is so often the case --- abstraction makes things a lot easier at no significant cost (what you abstract away you can throw in again later once you have a better grasp of what's going on).

    I'm also flummoxed by your sentiment that the only generate-and-filter architecture that I find cognitively plausible [...] is one in which syntax (what Chomsky would call the "computational system") does the generating, and the interfaces (LF/PF) do the filtering. What is it that makes this model plausible, and what is it that makes all other models implausible?

    In particular, consider the following thought experiment: I could just as well interpret the T-model in such a way that syntax as such does not exist but is just an abstract description of what PF and LF have in common and how their respective data structures are linked to each other (this is basically the bimorphism interpretation of the T-model). So syntax is merely a convenient middleman and we could just as well map LF and PF directly to each other. In such a setup, I can view PF as generating and LF as filtering, or the other way round. I don't see how that is any more or less plausible than the standard pipeline via syntax. If anything, it is much closer to how psychologists think about comprehension VS production.
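
    For concreteness, here is that bimorphism picture as a toy sketch; the stub mappings standing in for PF and LF are of course made up, the only point is the architecture:

    ```python
    # The T-model without the middleman: a shared derivation d is mapped to a
    # PF object and an LF object, and "syntax" is just the common index that
    # links the two mappings. The stubs below are placeholders, nothing more.

    def pf(derivation):                # derivation -> pronounced string (stub)
        return " ".join(derivation)

    def lf(derivation):                # derivation -> meaning representation (stub)
        return tuple(reversed(derivation))

    def sound_meaning_relation(derivations):
        """The language as a set of <PF, LF> pairs; the derivation itself
        never surfaces in the output, so one can just as well relate PF and
        LF directly (functional composition through the shared domain)."""
        return {(pf(d), lf(d)) for d in derivations}
    ```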

    ReplyDelete
  3. @Thomas: Let me take your points in turn.

    1. The question – and this comes up again immediately below, when we get to (2) – is how little we know about cognition, and whether it's sufficiently little to be ignored completely. I think the answer is, "we know quite little, but not nothing." And to me, that makes "X does/doesn't fit with what we know about cognition" perfectly admissible in an academic discussion. To reiterate something I said in the body of this post: however little we may know about cognition, there is a leap between that and the claim (which you seem to be implying, if not making) that we therefore shouldn't pay attention to it at all.

    2. I'm well aware that specification does not equal implementation, which is why I phrased my requirement existentially: show me at least one cognitively plausible implementation. Alex D.'s example concerned whether a person writing a compiler for the C programming language should care if the grammar of C is conceived of derivationally or representationally. To me, this misses the point entirely. C is not a facet of the natural world; natural language is. The way people program in C? Now that is a facet of the natural world. And that is very much a sequential, if not-entirely-rational, derivational process. So to circle back, if someone offered me a model of "how humans program in C", and that model did not have at least one implementation that corresponded to what people actually do, I would reject it. Here's where you might be saying to yourself, "But we know far too little about what people do during comprehension/production of natural language." This brings us back to (1) and the distinction between "very little" and "nothing at all."

    3. Notational realism indeed mixes specification with implementation; see (2) and the existential, rather than universal, requirement I suggested on cognitively plausible implementations.

    Finally, regarding the T-model and your suggested thought experiment: the reason Chomsky's generate-and-filter proposal is the only one I consider plausible (setting aside its empirical inadequacy for the moment) is because it locates the property that is unique to language---discrete, structured infinity---in the language-specific part of the model. PF (the interface of syntax with PHON) and LF (the interface of syntax with SEM) are themselves mappings between language-specific machinery and domain-general machinery. It's not that hard to imagine other species---e.g. other primates---having PHON and SEM; so from the perspective of cognition, we need it to be the case that a mapping between those two is crucially insufficient to generate discrete, structured infinity. If we have a theory of PF/PHON and LF/SEM such that a mapping between those two is sufficient to generate discrete, structured infinity, then we know it's the wrong model! (I'm starting to sound like a broken record here, but what the heck: the model I'm declaring wrong here might be equivalent, formally speaking, to the standard, derivationally-interpreted T-model; but the latter is cognitively sound, and the former, not. And to me, that's what matters.)

    ReplyDelete
    Replies
    1. @Omer: My list of replies isn't quite as neatly organized, mostly because I feel that there are fundamental axioms driving your arguments that I'm not aware of (in particular regarding the notion of cognitive plausibility), so all I can do is pick out isolated points that I don't understand or agree with.

      however little we may know about cognition, there is a leap between that and the claim (which you seem to be implying, if not making) that we therefore shouldn't pay attention to it at all.
      There is also a middle ground where we pay attention and nonetheless acknowledge that we don't know enough yet to address particularly subtle issues such as representational versus derivational (I'm partial to the stronger claim that no amount of evidence can tell those two apart because they don't describe different objects, but I'm willing to adopt the weaker stance for now in an attempt to find some common ground).

      C is not a facet of the natural world; natural language is. The way people program in C? Now that is a facet of the natural world.
      I'm not quite sure what the intended meaning of natural world is here (C is just as real as groups, lattices, diseases, or the language faculty in that it is an abstract concept with concrete realizations in the real world). Leaving philosophy aside, though (something I'm more than eager to do), I have the impression that you're playing fast and loose with the competence-performance distinction in your example. I think Alex D's point is that given such a distinction, the connection between the two is sufficiently loose that the specification of one cannot meaningfully restrict the specification of the other.

      if someone offered me a model of "how humans program in C", and that model did not have at least one implementation that corresponded to what people actually do, I would reject it.
      So there are some very fuzzy terms here, in particular "correspond". What does it mean for a model to correspond to what people actually do? I can think of several interpretations, some more strict than others, and my hunch is that this might be one of the issues that our disagreement stems from.

      the reason Chomsky's generate-and-filter proposal is the only one I consider plausible [...] is because it locates the property that is unique to language---discrete, structured infinity---in the language-specific part of the model. PF (the interface of syntax with PHON) and LF (the interface of syntax with SEM) are themselves mappings between language-specific machinery and domain-general machinery.
      But if LF and PF are basically APIs that only mediate between domain-general and language-specific machinery, they, too, are language-specific by virtue of serving no other function. And I can chain those two APIs together without the middleman of syntax (functional composition). So whether you have syntax in your model or not seems to be irrelevant for the argument you are making since there is still a locus for language-specifics and the bimorphism perspective still is perfectly workable.

      If we have a theory of PF/PHON and LF/SEM such that a mapping between those two is sufficient to generate discrete, structured infinity, then we know it's the wrong model!
      But that's not a matter of how you set up the architecture, it's a matter of how powerful your specific machinery is. I can just as well take the standard T-model and restrict syntax in a way such that it cannot give rise to discrete, structured infinity. Since that fact doesn't invalidate the standard T-model, why would the other one invalidate the bimorphism model? Basically, I don't see why granting animals full-fledged human-level PF and LF is plausible whereas granting them full-fledged syntax is ludicrous on conceptual grounds. Your argument is about restricting power, not where the power is located in the architecture.

      Delete
    2. Okay, now we're getting somewhere: "Given [the distinction between competence and performance], the connection between the two is sufficiently loose that the specification of one cannot meaningfully restrict the specification of the other."

      Speaking for myself, an emphatic "no" to this. Theories of performance dramatically restrict the space of possible theories of competence; this has been hashed out on the blog once before, in the comments under this post (search for 'realtime', which for some reason I like writing as one word; I blame linux kernel naming conventions).

      And herein rests the answer to your first point, as well: 'derivational' and 'representational' may pick out the same set of formal objects, but they pick out two different sets of ordered pairs of <formal object, plausible realtime cognitive implementation thereof>. In particular, I would contend that at present, the set of such ordered pairs picked out by 'representational' is empty. This is precisely because competence, while a terrifically useful abstraction, cannot be completely divorced from performance (see the discussion just linked to).

      The discussion of PF/LF and PHON/SEM is getting hard for me to follow (possibly due to confusions that I myself introduced into the discussion), so let me suggest some housekeeping. Let PHON be whatever phonology we share with other primates, and SEM be whatever conceptual structure we share with other primates. And let PF and LF refer to the interfaces of PHON and SEM, respectively, with the language-specific parts of our linguistic infrastructure (incl., perhaps, with one another).

      With that in place, you are entirely right that if what PF and LF add above and beyond PHON and SEM is present only in humans, then those can be the source of discrete infinity. But then PF and LF can't just be filters; there has to be a part of our cognitive apparatus that is in charge of generating what PF and LF filter, if the system in its entirety can be put to realtime use. Either "hidden" in PF, "hidden" in LF, "hidden" in both, or in some third place. I contend that at this juncture, we've been reduced to arguing about terminology: take whatever the component in charge of generation is, and call it "syntax" – what you have is the derivational interpretation of Chomsky's T-model.

      Delete
    3. @Omer: Not exactly the shortest discussion, but I think I got the gist of it. Mind you, I don't agree with it at all, and I think Dennis O's remark is spot on: I frankly don't understand what it means for a competence model to be "unsuited for realtime use," given that the system is by definition not one that operates in real time.

      Your reply in terms of grammars G and G' as distinct objects --- one the competence grammar, the other the realtime performance grammar --- does not gel with how one constructs a parser. Given the competence-performance distinction, the only requirement is that the parser recognizes all expressions licensed by the grammar, and only those. How this is done in real time is up to the parser. Since the link between the two is only in terms of generated structures, whether syntax is derivational or representational is irrelevant. The parser doesn't need to use the same architecture. That's not about having different grammars, that's about operating at different levels of abstraction, and it's perfectly compatible with the fact that syntactic notions such as c-command seem to matter for processing.

      Besides this fundamental disagreement, there's also a simple practical question: what is a concrete example of a fully specified parser (as in: sound and complete) that can only operate with a derivational/representational model of syntax?

      Delete
    4. Yes, well, it's precisely this statement of Dennis O.'s that I disagree with. I think the ontology that you and Dennis (and many others) are using, while certainly an internally coherent one, is a poor fit for cognitive science, or at least for linguistics. Take something like islands. It seems the signature of filled-gap effects that one finds in online processing is different depending on whether you're currently parsing material that's in an island or not. That means that there is something in the (human) parser that gives rise to the equivalent of island effects. Now, that something can be the same thing that gives rise to island effects in the grammar, or it can be something different – unique to the (human) parser.

      The position articulated by Dennis O., as I understand it, is that it's incoherent to ask whether the same fundamental property is responsible for islands "in the parser" and "in the grammar", since the two are ontologically different entities. I say: too bad for that ontology. Let's replace it with one where it makes sense to ask the question – "Is the property responsible for island effects in online parsing the same one responsible for island effects in the grammar?" – where I take "the grammar" here to be responsible, at a first approximation, for the acceptability judgments people give when afforded unlimited time.

      Since we find the same effects ("islands") in both, an ontology where this question can be asked is a priori preferable to one where it can't, because you have at least a chance of a unified explanation.

      Delete
    5. is the property responsible for island effects in online parsing the same one responsible for island effects in the grammar?
      Actually this kind of question only makes sense if you think of grammar and parser as cognitively different objects. If you think of it in terms of levels of abstraction, Marr-style, such that we have something like neural computation > parser > grammar, the answer is trivially yes given a suitably abstract understanding of what it means for two representations to share a given property.

      But irrespective of how one answers the question, doesn't the fact that you need such specific, wide-reaching, and ultimately contentious assumptions to get to the point of "derivationalism beats representationalism" suggest that the basic line of reasoning is very unstable?

      To go back to the quantum theory example, I'm perfectly fine with somebody presenting the above as their personal argument for why they prefer to think of syntax in derivational terms and why they believe this to be a more fruitful perspective for their research interests. But for scientific discourse, where intersubjectivity reigns supreme, there's just too much personal taste (e.g. "cognitively plausible") and ontological commitments (relation between grammar and parser) in it to make the argument convincing, let alone conclusive (and frankly I think if you keep pushing your own line of reasoning you'll eventually wind up with something inconsistent, too). Yet there are linguists who think that this issue is clearcut (I mentioned some of them in my reply to Alex D).

      Delete
    6. I think your evaluation of this line of reasoning ("specific, wide-reaching, and ultimately contentious") is just as much a matter of scientific taste as the line of reasoning itself is. I am tempted to say something reductionist here – along the lines of, "you just can't get away from scientific taste!"

      But I have two more tangible questions:

      1. On the ontology that you adopt, and Dennis O. has defended, what would an explanation look like for the fact that both online parsing and the "grammar" (as in, the abstract entity that formally defines the set of acceptable sentences) exhibit island effects?

      2. Why – or, where – do you think this line of reasoning (i.e., the one advanced in my previous comments) winds up with something inconsistent?

      Delete
    7. 1. Just for the record, I'm actually agnostic regarding how one should think of the grammar-parser relation. That being said, under the Marr-style perspective you only need an explanation at one of the levels since every level is talking about the same ontological object, just at different levels of abstraction. So if you have a competence story, great, if you have a performance story, great, if you have a neural story, also great.

      2. I think the problem is that you need to weaken the competence-performance distinction to get around the simple argument that grammar and parser only need to agree on what structures are well-formed or ill-formed, yet at the same time you want to keep this distinction to actually justify why you are describing grammar mechanisms in the first place. Pinning down the middle ground between the two is where things get hairy if one tries to spell it out in precise terms. Basically you need a link that's tight enough to prefer derivational over representational but without preferring top-down generation over bottom-up (for otherwise your derivational formalism would be just as bad as a representational one according to your criteria).

      Delete
    8. 1. I honestly don't see how the Marr-levels view serves to explain why incremental parsing shows island effects. If the parser is an algorithmic implementation of the grammar, all that needs to happen is that at the end of said algorithm, the "yes" or "no" result spit out matches what the (more abstract) grammar's verdict is.

      Maybe I should clarify what I think a better explanation would look like. Suppose "grammar" is a procedural mechanism that constructs linguistic structures; and that, as incremental parsing progresses, this "grammar" procedure is repeatedly deployed to build a (partial) structure for the (partial) set of terminals encountered so far. On this view, it is nearly inevitable that even incremental structure-building will be subject to the same whims (e.g. islands, c-command) that govern fully formed linguistic structures.
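
      Schematically, and only as a toy illustration (the helper names are placeholders, not a worked-out theory), the picture I have in mind is something like this:

      ```python
      # The "grammar as a subroutine of the parser" picture: the same
      # structure-building routine is redeployed on every prefix of the
      # input, so partial parses automatically respect whatever constraints
      # the grammar enforces on finished structures.

      def incremental_parse(words, build_structure):
          """build_structure stands in for the grammar's own procedure for
          assembling a (partial) structure over a set of terminals."""
          partial_structures = []
          for i in range(1, len(words) + 1):
              partial_structures.append(build_structure(words[:i]))
          return partial_structures
      ```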

      2. This same model I just sketched is compatible, as far as I can tell, with a bottom-up derivational engine that can operate in realtime, and is put to use by performance systems such as incremental parsing. It might not be your cup of tea – and I already admitted to taste being at play – but I don't think it's internally inconsistent.

      Delete
    9. 1. That model is perfectly compatible with a Marr-style interpretation: there is a procedural mechanism for incremental parsing, and what we call grammar is the result of abstracting away the incrementality from said mechanism. You're just phrasing it in the other direction, starting with the grammar as the first primitive and assuming that this object is somehow a distinct subroutine of the parser.

      2. First of all, incremental bottom-up parsing is not a trivial affair on a technical level --- irrespective of grammar formalism --- and given what we know so far it is unlikely to be psychologically correct (cf. Resnik's memory-based arguments). But in principle, yes, your model is perfectly compatible with a derivational bottom-up approach.

      But it is also compatible with a generate-and-filter model, e.g. via incremental intersection parsing. Suppose your input string is w_1 ... w_n. For each prefix p[i] := w_1 ... w_i, the language induced by p[i] is the set L(p[i]) of trees whose string yield has p[i] as a prefix. The set of structures entertained by the parser is the intersection of L(p[i]) and each L(F), where L(F) is the set of trees that are well-formed with respect to filter F. More precisely, if the parser is at w_i, it just has to intersect the set of previously entertained structures with L(p[i]). At the end of the parse, you apply a minimality criterion to filter out those trees that are proper extensions of the input string.
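
      If it helps, here is a schematic rendering; in a real implementation the infinite sets L(F) and L(p[i]) would be represented by finite-state devices, and the finite Python sets below are just so the sketch runs:

      ```python
      # Incremental intersection parsing, in miniature. A "tree" is stood in
      # for by its yield, and the candidate sets are ordinary finite sets;
      # in the real construction they are infinite but finite-state
      # representable, and intersection is automaton intersection.

      def tree_yield(tree):
          return tree.split()                  # toy stand-in

      def prefix_language(prefix, universe):
          """L(p[i]): trees whose yield has p[i] as a prefix."""
          return {t for t in universe if tree_yield(t)[:len(prefix)] == prefix}

      def intersection_parse(words, universe, filters):
          candidates = set(universe)
          for L_F in filters:                  # intersect with each L(F)
              candidates &= L_F
          for i in range(1, len(words) + 1):   # at w_i, intersect with L(p[i])
              candidates &= prefix_language(words[:i], universe)
          # minimality: discard trees that are proper extensions of the input
          return {t for t in candidates if tree_yield(t) == words}
      ```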

      If you want to block that kind of parser, you need something stronger than your model in 1. And I'm pretty sure that any set of restrictions that is strong enough to block the parser above will be too strong for what you want to do.

      Anyways, I'm gonna put our discussion on a one-night hiatus at this point, gotta hit the sack. I'm looking forward to more witty banter tomorrow ;)

      Delete
  4. I'm not sure I see the force of Omer's argument from island effects. The parser's job is to figure out whether any given input has a parse that respects the rules and constraints of the grammar. This being so, it is no surprise that the parser typically postulates gaps only in configurations where the grammar permits them. That is just how a well-designed parser should behave[*]. Is the "same property" responsible for island effects in offline and online processing? I'm not sure that the question is clear enough to be given a definite answer. But I don't see that there is any interesting sense in which a story along these lines fails to offer a unified account of island effects. Island effects are the result of grammatical constraints, and these grammatical constraints are respected by the parser because it is a parser for the grammar in question.

    Theories of performance dramatically restrict the space of possible theories of competence

    Ok, but how? A lot of people seem to have the intuition that generate+filter architectures would make parsing difficult, but as far as I can tell this is just an intuition. This was the point of my C example. When you actually sit down to write a parser, it really doesn't seem to make much difference whether the grammar was specified this way or that way.

    [*] That the parser is in fact well-designed may still be surprising.

    ReplyDelete
    Replies
    1. @AlexD says: "The parser's job is to figure out whether any given input has a parse that respects the rules and constraints of the grammar. This being so, it is no surprise that the parser typically postulates gaps only in configurations where the grammar permits them. That is just how a well-designed parser should behave."

      As far as I can tell, your statement only makes sense if you assume my position, in the first place. If the parser can evaluate whether "a parse respects [the rules and constraints of the grammar]", then there must be at least one implementation of [the rules and constraints of the grammar] that is procedural, since the parser---in which this implementation of [the rules and constraints of the grammar] is embedded---operates in a realtime, online fashion. And all I've been saying all along is that we should demand of our theories of the grammar that they have at least one implementation that is plausible as a cognitive procedure.

      Delete
    2. It's just part and parcel of what a parser is that it is sensitive to the constraints imposed by the grammar. If it were not then it would not be a parser for the grammar in question. Thus, if island constraints are among the constraints imposed by G, then a parser for G -- if it is indeed a parser for G and not for some other grammar -- will necessarily respect those constraints. Furthermore, an efficient parser will not entertain candidate structures which are not licensed by G. So, an efficient parser will not entertain candidate structures which violate island constraints.

      The parser itself must of course be procedural and cognitively plausible. However, there appears to be no particular barrier to specifying cognitively-plausible parsing procedures for grammars with generate+filter architectures. So your demand, while reasonable enough, does not appear to give rise to any special difficulty for generate+filter theories.

      Delete
    3. My claim was not that generate-and-filter grammars can't have equivalent specifications that are derivational/procedural. As a matter of fact, I began this very blog post with the (admittedly trivial) point that every derivational formalism has a representational counterpart. So it follows that at least some – if not all – representational formalisms will be implementable procedurally (i.e., derivationally).

      My point was that I am only interested in the derivational/procedural versions, because they are the only ones that strike me as cognitively plausible – and here, the data from incremental parsing is relevant. (Again, the post acknowledges that such a derivationally/procedurally-specified grammar will have representational equivalents as far as its weak & strong generative capacities; that is not at issue here.)

      And with that, I'm also signing off for the night.

      Delete
    4. Right, but what I'm asking is why you think that only the derivational/procedural versions are compatible with what we know about incremental parsing. In other words, is this just a hunch which you are perfectly entitled to but which proponents of representational theories can feel free to ignore, or are representational theories really no longer viable?

      Delete
  5. @AlexD: It's neither. The point is a methodological one: let TG be the "true grammar" (that thing that we linguists are trying to get at). Given that we know that the grammar, or something extensionally-equivalent to it, is put to use in incremental parsing; and given that this must happen procedurally, pretty much by definition; it follows that among the different equivalent specifications of TG, at least one will have to be implementable procedurally, as a computation that humans perform online.

    Now, if you show me a generate-and-filter grammar, I don't know how to easily tell if it has a procedural implementation that is cognitively feasible (it's not enough that it has *a* procedural implementation; for all I know, that procedural implementation could still involve, say, going through an enumerable infinity of candidate outputs until one is encountered that satisfies all the output filters). For example, I for one have no idea whether "intersect with each L(F), where L(F) is the set of trees that are well-formed with respect to filter F" (from Thomas' comment above) is something that can be done in reasonable procedural time. Maybe some people who are more talented than me can look at a generate-and-filter grammar and tell, just by eyeing it, whether it will lend itself to an adequate procedural implementation. But here's a more direct research strategy: deal with the procedural variant in the first place.

    (One possible exception to all of this, discussed in the body of the post, is a generate-and-filter architecture where syntax does all the generating, and the interfaces do all the filtering. This model is considered by some to be cognitively plausible even if it's not strictly procedural – where "some" includes Chomsky, first and foremost; and initially, me as well – though I must admit that as the discussion here progresses, I'm becoming less convinced of the cognitive viability of even this generate-and-filter model. But since this interface-driven version fails observational adequacy, the discussion of whether it's cognitively viable or not is moot.)

    ReplyDelete
    Replies
    1. What about the generate and filter grammar where the generation procedure is concatenation and the filter is a regular expression? That certainly has a straightforward incremental parsing procedure. I realize that this is a trivial example; it just illustrates the point that there is no general difficulty with coming up with efficient parsing procedures for generate+filter grammars. Some generate+filter grammars will be hard to parse and others will be easy. Possible future discoveries notwithstanding, there appears to be nothing general that can be said here.
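
      To spell the trivial example out (with the regular filter written directly as an arbitrary little DFA of my own choosing, so the incremental check is visible):

      ```python
      # Generate-and-filter in its simplest form: the "generator" freely
      # concatenates symbols, the filter is a regular language (here the toy
      # language a(ba)*, given as a DFA), and parsing is a left-to-right
      # walk through the automaton that rejects as soon as the filter is
      # violated.

      DFA = {
          "start": "q0",
          "final": {"q1"},
          "delta": {("q0", "a"): "q1", ("q1", "b"): "q0"},
      }

      def incremental_recognize(symbols, dfa=DFA):
          state = dfa["start"]
          for s in symbols:
              if (state, s) not in dfa["delta"]:
                  return False                 # no licit continuation
              state = dfa["delta"][(state, s)]
          return state in dfa["final"]
      ```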

      Delete
    2. Sorry for the multiple replies. I just wanted to add, echoing Thomas, that the mere existence of a procedural implementation of the grammar is of very little help in coming up with an efficient parsing algorithm that doesn't backtrack excessively (as you yourself point out). The flip side of this is that it is trivial to convert a grammar specified in terms of constraints into a simple backtracking parser. Just write down the grammar in Prolog and you get a backtracking procedural implementation of it for free. There is no particular reason to think that grammars specified in terms of a derivational procedure will more naturally lend themselves to an efficient procedural implementation; and merely deriving a procedural implementation (efficient or otherwise) is easy to do given any kind of grammar.

      Delete
  6. Interesting discussion: just a clarification question as I realize I don't really know what a generate and filter model looks like.

    Are there any formally explicit examples of a generate-and-filter model in the literature?
    If not, what would a g-and-f model of, say, a simple CFG look like?
    Does model-theoretic syntax (Pullum/Rogers/Scholz style) count as a g 'n' f model?

    ReplyDelete
  7. @AlexC: As usual, I can speak only for myself; my use of "generate-and-filter" is meant to contrast with "crash-proof" (in the sense used by Frampton & Gutmann).

    @AlexD: We continue to talk (write) at cross purposes. It is abundantly clear that there are representationally-specified grammars for which a straightforward procedural implementation exists. In your concatenation-as-generation/regular-expressions-as-filtration example, this is because regular expressions can be evaluated procedurally – say, using finite state automata. My point was not "every generate-and-filter architecture can't be proceduralized"; it was "I see no reason to believe that a priori, any generate-and-filter architecture can be (easily) proceduralized." If you accept the latter, then it makes some methodological sense to stick to a procedural specification of the grammar.

    ReplyDelete
  8. @Alex: As far as I'm concerned Generate-and-filter includes the trivial case where the grammar generates Sigma^* (string, tree or graph alphabet) and the filters do all the work. So one can drop the generate-part and include any constraint-based formalism, including model-theoretic syntax. For CFGs any constraint-based formulation will count as generate-and-filter (e.g. the logical specification of CFGs as strictly 2-local tree languages). One can adopt intermediary positions --- the grammar generates all binary branching trees, constraints filter out illicit labelings --- but that's just pushing the workload around as one sees fit. And the model-theoretic definition of MGs in my thesis would count as generate and filter, too.
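
    As a toy case of the 2-local reading (the rule set is arbitrary, just big enough to make the point): the grammar below is nothing but a set of filters on finished trees, with no generation step beyond "take any tree".

    ```python
    # Constraint-based reading of a CFG: the grammar is a set of licensed
    # local (mother, daughters) configurations, and a finished tree is
    # well-formed iff every node satisfies that 2-local condition.
    # Trees are nested tuples: (label, child_1, ..., child_n).

    LICENSED = {
        ("S", ("NP", "VP")), ("VP", ("V", "NP")),
        ("NP", ("John",)), ("NP", ("Mary",)), ("V", ("chased",)),
    }

    def locally_wellformed(tree, licensed=LICENSED):
        label, *children = tree
        if not children:                      # leaf node: nothing to check
            return True
        daughters = tuple(child[0] for child in children)
        return ((label, daughters) in licensed
                and all(locally_wellformed(c, licensed) for c in children))
    ```

    On this reading, checking "John chased Mary" just means checking its tree node by node; no derivation is ever run.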

    In the specific case of Omer's interpretation, generate and filter means any system where the grammar can fail to generate a well-formed output structure (because of an illicit derivation, or because well-formedness is constrained by filters a posteriori). Once again this can hold of MGs if one makes Merge free and implements the feature calculus as tree-geometric constraints. Just pushing the workload around.

    @Omer: From over here it looks like you're overly eager to assume that derivationalism means implementability. Why is there no reason to believe that a priori, any generate-and-filter architecture can be (easily) proceduralized, but the same isn't true for derivational architectures? The system I described above works for every reasonable generate-and-filter model (reasonable = the notion of well-formedness with respect to constraints is definable, as is set-theoretic intersection). It won't be efficient unless all constraints are finite-state --- which is plenty for syntax and includes even transderivational constraints --- but if efficiency is a concern to you, then derivationalism is just as badly off, for otherwise we would have efficient parsers for recursive grammars (recursive in the formal language theory sense as a subclass of type-0 grammars).

    ReplyDelete
    Replies
    1. That's a very fair point, Thomas. Let's not forget, though, that parsing is just half of the equation of how grammar is put to online use; there's also production. Even if derivational architectures and representational ones currently tie on the "can we construct an efficient parser" front, can it not be the case that derivational ones fare better vis a vis production? Suppose that they do; now we have a choice point (scientific taste again!) – for some, the aforementioned tie will be enough to render the two equal (call this the "I care about the worst case" camp); for others, the preferability in production is enough to consider the derivational class tentatively better, and hope that the parsing technology will catch up (call this the "partially better is still better" camp). I pledge allegiance to the latter.

      Of course, you can reject the premise – that a crash-proof grammar is preferable even when looking only through the prism of production – but if you accept it, I don't think the "partially better is still better" position is an incoherent one.

      ––––––––––––––––––––

      As a side note, let me say that I'm finding this exchange extremely useful! I feel I'm learning a lot from the comments other people are posting here – and while I may or may not be convincing anybody of anything, I find this a very useful exercise, if only in refining my own views and clarifying (to myself, at least) what is at stake. So thank you!

      Delete
    2. @Omer: I'm also having a good time. See, I told you it would be a fun discussion :)

      As for production, that case is completely parallel. Parsing is the process of mapping a sound string to its syntactic structure, production is the process of mapping a logical formula (or whatever you think happens after LF) to its syntactic structure.

      The only reason production is considered more complicated is because we have a good idea of what the input is for parsing (strings plus prosodic phrasing, the latter of which is usually abstracted away from), while it is unclear just what production starts with. If production for you is simply mapping an LF tree-structure to the corresponding pronounced string, you can still use the model above (only the notion of prefix has to be defined differently).

      Delete
    3. @Thomas: Suppose we go with an almost cartoonishly realist interpretation of older, Greed-based minimalism. You start with a lexical subarray – the terminals you are going to combine. And you start Merge-ing and Move-ing, but the grammar is crash-proof, so you are guaranteed that the outcome will be a well-formed structure of English. It might not be the one you "intended" (you might select "John, T[past], see, Mary" meaning to say "John saw Mary" but end up with "Mary saw John"), but what you don't have to worry about is ending up with "[[[John Mary] T[past]] see]", etc.

      Is this not computationally simpler than selecting the terminals "John, T[past], see, Mary" and having to deal with the many, many non-convergent ways a derivation involving these terminals might proceed?

      Delete
    4. @Omer: I don't know what notion of computational simplicity you have in mind (efficiency?), but I think your question cannot be answered unless you specify every single aspect of your grammar and parser. Given the assumptions that have been introduced in the discussion so far, neither is obviously preferable to the other.

      It also seems like you're shifting the goalposts. The criterion you set out is cognitive plausibility, arguing that generate-and-filter models cannot be employed in the way you outline as cognitively plausible. I showed that they can, and we also agreed that computational efficiency is not a valid criterion here because then even crash-proof derivational frameworks can fail just as badly as the representational ones. But now all of a sudden we're talking about some notion of computational simplicity of two derivational models, one crash-proof, the other not, without any clear context of evaluation. In short: I'm really not sure how this question factors into the previous discussion.

      Delete
  9. Like Alex and Thomas, I do not understand how the requirement of plausible parsing models bears on the question of whether we should prefer derivational or representational interpretations of a formalism. As has been said, when you have a working parser, there doesn't seem to be any sense in which it uses (or "corresponds to" or whatever) one interpretation of the formalism and not another.

    Take for example a CFG. We can identify both representational and derivational interpretations of the rules of a CFG: a rule like "VP -> V NP" might be understood representationally as a constraint on static trees to say "one way for a VP to be well-formed is for it to have exactly two daughters, a left daughter labeled V and a right daughter labeled NP"; or it can be understood derivationally to say "rewrite VP as V NP". (Or it can be understood in a bottom-up derivational way, to say "combine a V with an NP to make a VP", or any number of other ways.)
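
    To make that concrete, here are the three readings side by side (a throwaway sketch, with an arbitrary two-rule table):

    ```python
    # One and the same rule table, read three ways. A left-corner or
    # bottom-up parser only ever consults the shared content of the table;
    # it does not care which of these readings we take to be "the" grammar.

    RULES = {("S", ("NP", "VP")), ("VP", ("V", "NP"))}

    def licenses(mother, daughters):
        """Representational reading: a static well-formedness check on trees."""
        return (mother, daughters) in RULES

    def rewrite(symbol):
        """Top-down derivational reading: 'rewrite VP as V NP'."""
        return [ds for m, ds in RULES if m == symbol]

    def combine(daughters):
        """Bottom-up derivational reading: 'combine a V and an NP into a VP'."""
        return [m for m, ds in RULES if ds == daughters]
    ```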

    So let's suppose that our combined theory of a certain creature's use of language ("combined" in the sense of roughly competence plus performance) consists of a certain CFG plus the left-corner parsing algorithm. Is there any sense in which this combined theory involves one of the interpretations to the exclusion of the other? I can't see how to make sense of this. And accordingly, I don't see how any facts about how this creature goes about parsing a sentence can bear on whether we should prefer the derivational or representational interpretation of the CFG. I can certainly imagine discovering that, say, the CFG should be paired with the bottom-up parsing algorithm rather than the left-corner algorithm, or whatever. But no matter which of those parsing algorithms you choose, what they care about is just the stuff that all the interpretations have in common.

    Similarly, I don't really understand this question:
    there is something in the (human) parser that gives rise to the equivalent of island effects. Now, that something can be the same thing that gives rise to island effects in the grammar, or it can be something different – unique to the (human) parser.

    Again, to be concrete, let's take a full CFG:
    S -> NP VP
    NP -> Det N
    NP -> John
    Det -> the, a
    N -> cat, dog
    VP -> ran
    VP -> V NP
    V -> chased

    A creature that was using this grammar in combination with, say, the left-corner parsing algorithm, would consistently treat sentences that put the verb before the object as well-formed, and would consistently treat sentences that put the object before the verb as ill-formed, and it would do this whether it was performing an "acceptability judgement" or any sort of fancier experiment. Is it "the same thing" that gives rise to this verb-before-object property in both acceptability judgements and in parsing? I'm not sure I understand the question. If pushed I would be inclined to say yes, it's the same thing in both cases, namely the rule "VP -> V NP". But I get the impression this is not answering the question people have in mind.

    Of course the same thing goes for any other fact about what the CFG above licenses and what it doesn't. The fact that sentences are no longer than five words will show up in both acceptability judgements and in time-sensitive experiments; so will the fact that any word that follows 'the' will be a noun; etc. Is it the same thing that is responsible for these effects showing up in both kinds of experiments? I'm not sure. If someone can explain the sense in which it is or isn't the same thing in this simplified case, then I might be able to understand the analogous question about island effects, but for the moment I don't get it.

    ReplyDelete
  10. @Tim: This is a useful example. My reply is that both the bottom-up and the left-corner parsing algorithms *are* nothing but (different) derivational interpretations of what a CFG is. In fact, one of them is included in the examples you provided for what a derivational interpretation of a CFG would look like ("combine a V with an NP to make a VP"). So what I'd say about this example is that there are parsing algorithms for CFGs by virtue of CFGs having a concise derivational interpretation (or multiple such interpretations).

    ReplyDelete
    Replies
    1. Maybe the bottom-up parsing algo exists by virtue of the fact that there is the "combine V with NP to make VP" interpretation of CFGs, but there doesn't seem to be any other derivational interpretation of CFGs by virtue of which the left-corner parsing algo exists. Unless of course we want to just do away with the distinction between a derivational interpretation and a parsing algorithm. Is the disagreement just this terminological one?

      I can sort of see the logic for conflating these notions (in the case of bottom-up parsing algos they generally coincide quite closely), but there do seem to be derivational interpretations of formalisms that are useful for reasons that have nothing to do with their empirical plausibility as parsing algorithms. For example, what should we do about the standard bottom-up derivational interpretation of minimalist grammars --- does the fact that this seems like a poor candidate as a parsing algorithm make it less appealing as a derivational interpretation of the formalism?

      Delete
  11. @Thomas: I did not purport to have a fully worked out theory where every single aspect of the grammar and parser is specified. (If I did, I imagine I'd be busy Scrooge-McDucking into my piles of money instead of participating in this discussion, enjoyable as it may be.)

    Re: goal posts, etc., you and @AlexD are trying – understandably, perhaps – to push the discussion into the realm of proofs and/or refutations; but my original response to your original post already declared plainly that I don't have that, so I'm not sure what exactly has shifted. You say "[t]he criterion you set out is cognitive [plausibility], arguing that generate-and-filter models cannot be employed in the way you outline as cognitive[ly] plausible. I showed that they can." On this we disagree: what you have shown is that (some) generate-and-filter models have procedural equivalents that are cognitively plausible (remember that we disagree on the ontology). That, too, is a point I conceded from the start.

    @Tim: you ask, "what should we do about the standard bottom-up derivational interpretation of minimalist grammars --- does the fact that this seems like a poor candidate as a parsing algorithm make it less appealing as a derivational interpretation of the formalism?"

    It absolutely does! I think we should worry about it. (I remember a Phillips guy once wrote an entire thesis related to this.) That I don't have a solution at hand does not mean I don't think it's a serious problem that should keep syntacticians up at night.

    ReplyDelete
    Replies
    1. @Omer: what you have shown is that (some) generate-and-filter models have procedural equivalents that are cognitively plausible
      Again, the incremental intersection parser works for every generate-and-filter model where the filters are decidable (one can tell for every tree whether it passes the filter). As a nice bonus it is efficient for every model where the filters are finite-state, and pretty much all syntactic constraints in the literature satisfy this criterion. The architecture of this parser follows the idea you sketched, i.e. a parser that calls the grammar as a subroutine for building the structure.

      Your initial post carved out the criterion that for every paradigm (derivational VS representational) and every grammar formalism thereof we need at least one readily available, cognitively plausible interpretation if we want derivational and representational to be on equal footing. Our discussion brought to light what you consider cognitively plausible, and the intersection parser fits the bill by virtue of mirroring a model you consider plausible. Since you disagree with me, you disagree with at least one of these statements, but I can't tell from your questions about production and crash-proof VS crashing Minimalism which statement in particular you're homing in on.

      Delete
    2. I disagree with the claim that a procedural implementation of the incremental intersection parser is in any meaningful sense representational. You use L(F) (the set of trees that satisfy linguistic constraint F) and L(p[i]) (the set of trees whose string yield has p[i] as a prefix) in your description of such a parser; but neither of those entities would actually take part in a procedural implementation of the parser, since each such set is infinite. (Much like, say, no actual implementation of is-this-number-prime actually checks set membership in the set of all prime numbers.)

      The reason I keep bringing up the ontological issue is that I can already see the Marr-inspired rebuttal whereby "intersect with each L(F), where L(F) is the set of trees that are well-formed with respect to filter F" is the description of the computation but I am nitpicking regarding the algorithmic level. As I said before, if the incremental parser is the algorithm-level abstraction and the grammar is the computation-level abstraction, then as far as I can tell the only thing entailed is that the final output of the former match the final output of the latter for every input; it doesn't entail that intermediate steps in the parsing algorithm would somehow align with the final output of the grammar. (Alex D., if I understand him correctly, thinks this is a natural outcome of the parser being well-designed for the grammar – which I think is *at least* as vague as anything that I have said in this discussion.)

      Getting back to the intersection parser, if it can actually be implemented algorithmically, it means there is an equivalent characterization of the computation it performs that is derivational; and hence, an intersection parser is representational only in the very trivial sense described at the top of this post (namely, that every derivational model has a representational counterpart).

      I am getting the feeling that we are starting to go around in circles; not a criticism of the discussion or its participants, but perhaps a sign that it has run its course. What do you think?

      Delete
    3. Omer wrote: It absolutely does! I think we should worry about [the fact that the standard bottom-up minimalist derivational model is a poor candidate as a parsing algorithm]. (I remember a Phillips guy once wrote an entire thesis related to this.) That I don't have a solution at hand does not mean I don't think it's a serious problem that should keep syntacticians up at night.

      I think the original point of disagreement is almost just terminological then: if the appeal of a particular "derivational interpretation" depends in part on the degree to which that derivational interpretation serves as an explanation of how humans process sentences in real time, then I think we'd all agree that there is a difference between derivational and representational formulations of grammars, just because one is saying more than the other, right?

      There's always disagreement over the deeper point of whether it's valid or useful to leave room in the discussion for a notion of derivation that is divorced from such real-time considerations. But leaving that Pandora's box aside, a view of the grammar based on derivations that are *not* real-time-agnostic in that way does seem clearly different from one based on a purely representational system. Does this bring everyone into line (he asked optimistically)?

      Delete
    4. it doesn't entail that intermediate steps in the parsing algorithm would somehow align with the final output of the grammar. (Alex D., if I understand him correctly, thinks this is a natural outcome of the parser being well-designed for the grammar – which I think is *at least* as vague as anything that I have said in this discussion.)

      @Omer: I don't think the point is vague so much as very general: a good incremental parser will postulate only structural hypotheses which the grammar could assign to extensions of the substring of the input string which the parser has currently consumed. It makes no difference whether the parser's intermediate representations are anything like the structures licensed by the grammar. Whatever those representations look like, they implicitly or explicitly encode structural hypotheses, and well-behaved parsers don't entertain structural hypotheses inconsistent with the grammar.

      This leaves two questions: (i) why is the parser incremental, and (ii) why is it good? That it is incremental seems unsurprising for obvious practical reasons. That it is good, on the other hand, is rather surprising, since we could no doubt get by with much less accurate parsing. I think this is where "one system" models appear to have an advantage (the idea being that parsing is necessarily accurate if it uses the very same computational system that is responsible for acceptability judgments). However, I have never understood these "one system" models at more than a metaphoric level. Whenever one tries to specify the details, some kind of grammar/parser distinction inevitably seems to creep in.

      Delete
    5. @Omer: I really feel like you're raising the stakes higher and higher, but fine, I'm game. So rather than just being able to use the grammar directly without any recoding thereof, the parser itself has to reflect the nature of the grammar. Then we run into exactly the problem I mentioned earlier on: your requirements are so high that even derivational theories cannot meet them.

      Because, for instance, a parser cannot simply build the structure bottom-up from the most embedded constituent the way a Minimalist derivation proceeds, it must use some kind of hypothetical reasoning: it must either entertain multiple structures at the same time or keep a store of structures that have already been tried and failed, and so on, and so forth. None of that is derivational in the sense used by linguists.

      To me it looks like you're saying that 1) parsing involves computation, 2) all computation is procedural (which is debatable), and 3) procedural means derivational, wherefore 4) all parsing is derivational and thus 5) only derivational formalisms are okay. But that kind of derivationality has nothing to do with derivational VS representational in the linguistic sense.

      As a CS metaphor: According to this logic, all programming languages are imperative because running the program eventually boils down to instructions for manipulating memory registers.
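
      To spell the metaphor out with a toy example of my own: the two definitions below compute the same function, and both ultimately run as register-level instructions, yet only the second is naturally read as a sequence of steps.

        # Two ways to specify "the sum of the squares of 1..n".

        def sum_of_squares_declarative(n):
            # says WHAT the value is
            return sum(i * i for i in range(1, n + 1))

        def sum_of_squares_imperative(n):
            # says HOW to build the value, one step at a time
            total = 0
            for i in range(1, n + 1):
                total += i * i
            return total

        assert sum_of_squares_declarative(10) == sum_of_squares_imperative(10) == 385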

      @everyone: On a slightly different note, we're not restricted to only discussing Omer's position here. Just because he was kind enough to write a reply doesn't mean that he should get all the flak. So if somebody wants to chime in with their ideas about why the derivational/representational distinction is still so prominent in linguistics, that would be great. I won't bite your head off, *pinky swear*.

      Delete
  12. I will take Thomas up on his suggestion to add my .02 to the discussion. The derivation/representation discussion was once easy to settle. Prior to trace theory, there was no way of cobbling together an adequate non-derivational theory. Try stating movement or binding effects in the Standard Theory without traces or some analogue thereof. Given traces (or copies), however, the problem of distinguishing derivational (D) vs representational (R) formats for theory becomes much more difficult, as was noted starting with chapter 6 of LGB (where the two versions were treated as effectively notational variants). However, with the Minimalist turn it might be possible to breathe some life into the discussion once again. Here are two ways that the distinction might matter.
    1. If filters MUST BE bare output conditions (BOC), then only semantically/phonologically active "copies" are licit interface objects (given something like the streamlining effects of Full Interpretation). Thus, if one wants to have a "filter," it must receive an interpretation as an interface interpretation effect. This might have teeth. So, for example, it is not at all obvious that intermediate traces have semantic/phonological effects (i.e. they are not interpretable). If this is so, then locality effects on movement (subjacency or the ECP) cannot be stated as BOCs, since the representational residues do not make it to the AP/CI interfaces. Ergo, we are back to something like the Standard Theory world, and only a derivational rendering of these restrictions becomes possible.
    2. The most direct argument for Ds within MP involves Merge-over-Move (MoM) economy restrictions. These effectively compare derivational trajectories. MoM assumes that only converging structures are compared, so it is reasonable to assume that the illicit derivations have well-formed/interpretable representations. If such derivations are nonetheless illicit, it strongly suggests that the source of the problem is not in the representational format, but in the derivational history. In addition, translation into an R format does not look at all straightforward, at least to me (especially if one eschews arbitrarily complex indices with no straightforward interface interpretations).

    Now, I know, both these kinds of arguments are currently either out of fashion (MoM) or contentious (intermediate traces might be interface active (think lambdas)). But they do suggest that the distinction is not without grammatical interest, somewhat independently of the larger issues that Omer has raised.

    Let me end by noting that I am sympathetic with Omer's kind of argument. I understand that the issues are very hard to resolve, as they are often theory internal and/or involve as yet unsettled matters in other domains. However, the distinction between D/R formats has been heuristically very fertile. We should care about this. Philosophers like to distinguish the context of discovery from the context of justification. The D/R divide has been hard to justify in many cases, but the two have suggested very different research approaches. Though it has often proven possible to translate results from one format into the other and so justify one as better, the styles of explanation each has prompted seem to me to have led to different kinds of investigations. Thus, the D/R distinction has proven to me methodologically pregnant. This in itself, IMO, is reason to keep the question on the front burner.

    ReplyDelete
    Replies
    1. @Norbert: Great to see you chiming in.

      Your second point is rather easy to address representationally if you assume that the basic data structures for syntax are what we call derivation trees rather than phrase structure trees. And that's not unreasonable --- after all, phrase structure trees started out as CFG derivation trees. I briefly mentioned this in my original post, because it really highlights that it's hard to pin down what linguists think makes a theory derivational/representational. It seems to be more about "natural" interpretations than the technical devices as such.
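
      For concreteness, here is one toy way to see the shape of that recasting (the encoding is my own, and it is a drastic simplification of what Merge-over-Move actually says): if each node of a derivation tree records the operation used and the lexical resources still unused at that point, then a MoM-like restriction becomes an ordinary well-formedness condition on the tree, stated without reference to the order in which the tree was assembled.

        # Toy sketch: a derivation tree whose nodes record the operation used
        # and the items still unused at that point ("remaining numeration").

        class DNode:
            def __init__(self, op, remaining, children=()):
                self.op = op                      # "merge" or "move"
                self.remaining = set(remaining)
                self.children = list(children)

        def satisfies_toy_mom(node):
            """Move is licit at a node only if no expletive is still available
            there -- a condition checked directly on the tree as an object."""
            if node.op == "move" and "there" in node.remaining:
                return False
            return all(satisfies_toy_mom(c) for c in node.children)

        # Moving while the expletive 'there' is still unused violates the
        # condition; merging 'there' first does not.
        bad  = DNode("move", {"there"}, [DNode("merge", {"there"})])
        good = DNode("merge", set(),    [DNode("move", set())])
        assert not satisfies_toy_mom(bad) and satisfies_toy_mom(good)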

      As for your first point, there's of course the problem that there is no easy way of telling which effects are imposed by the interfaces rather than syntax. In particular, PF seems to do very little work, and one might take that as an indication that a lot of what we call syntax is actually PF (in particular if one is partial to the bimorphism perspective of the T-model that I described in a reply to Omer).

      Also, I'm not so sure about your idea that intermediate traces do not matter at the interfaces. Take the infamous case of wanna-contraction (and let's assume that the standard trace vs. PRO story is correct). Then *Who do you wanna be likely to leave shows that intermediate traces matter for PF. I'm sure one can find similar cases for prosodic phrasing. And then there are the Fox examples of pronominal binding where one needs an intermediate landing site, e.g. Which paper that he gave to Ms Brown did every student ask her to read. So if you believe standard binding theory is correct and applies at LF (which admittedly you don't, as far as I know), that's another case where you need something like intermediate copies.

      Those are highly theory-laden arguments, of course, but I think that shows that whatever story one tries to get off the ground regarding derivational vs. representational depends on a huge battery of assumptions, none of which are particularly well supported. And the more assumptions you have, the more likely it is that at least two of them are incompatible. So at that point one should stop for a minute and ask themselves: what's the payoff?

      Delete
    2. The D vs R "debate" presupposed that derived objects were where the syntactic action was. Grammatical operations were defined to reorganize them. Given this, it is possible to ask whether a given effect should be traced to the properties of the derived objects or the properties of the rules that manipulated them. However, once one moves to derivation trees, the argument likely becomes yet more subtle (if tenable at all). I suspect, however, that the effects of derived structure will become important once constraints on movement like ECP and subjacency are plugged into the derivation trees, for here what one has built becomes relevant to how one can grow the derivation tree.

      The MoM discussion lives on the prior distinction; it's about rule application, not about the derived objects that the rules give rise to. What you're observing is that it is possible to map a derivation into a structure that represents the history of that derivation and then place restrictions on this representation. I assume that one can, but if one asks what the representation is a representation OF, then it still seems to me that it represents the HISTORY of the derivation and the constraints are derivational constraints. So, if one dumps the intuitive distinction between the objects created by rules and how the rules apply, then the D vs R "debate" will seem pretty useless. I return to this one more time at the end.

      As regards the first point: yes, it's tough. But your question is not whether we have solved the concerns but whether the distinction is worth making. I still think it is, precisely because the presuppositions it rests on in a given argument in favor of D or R are interesting to investigate. So yes, there are subtle empirical questions that must be resolved for arguments in favor of one or another aspect of D vs R to be dispositive. Isn't this what we should expect for a rather subtle distinction? So, yes, the arguments rest on many assumptions, which is why they are interesting: concentrating on these problems makes it possible to investigate those assumptions, many of which are interesting in their own right.

      Last point: as Bill Idsardi pointed out to me, brains use various kinds of codes. Two important ones are rate codes vs place codes. Rate codes track the time course of computations, place codes track their spatial properties. In virtually all cases we can trade rate codes for place codes and vice versa. This, however, does not gainsay the fact that there is a fact of the matter as to which code is at play in a given circumstance. The D/R debate rhymes with this coding debate. How to decide between the options is very subtle in both cases. This does not mean that asking which is at play has no payoff. For example, under the minimalist interpretation of filters as BOCs it places a rather stringent demand on R-ists: to specify what these BOCs are. To date, we have very few specific examples of such that seem even remotely plausible, IMO. This is one of the reasons that, as a methodological matter, I tend to favor D-ish explanations of grammatical phenomena.

      Delete
  13. "The only generate-and-filter architecture that I find cognitively plausible---and with this, I think Chomsky would concur---is one in which syntax (what Chomsky would call the "computational system") does the generating, and the interfaces (LF/PF) do the filtering."

    I think you are right about this, but that you and Chomsky might disagree about where filtering by the interfaces might actually start, or in other words which phenomena count as filtering by the interfaces. Because I will not try to divine what is in his mind, let me tell you what is in mine. You argue, very persuasively, that uninterpretable features are not a useful concept and deduce that filtering by LF/PF is unsound. I am entirely convinced by your empirical and analytical justification of the first statement, but the conclusion does not necessarily follow. Whatever the probing mechanism, it is a fact that T probes (in fact your own contribution is a refinement of the way T probes). Why should it be so? Why is probing necessary in the first place, and why do some functional heads, but not others, probe? An SMT-compliant answer to this question seems to be that one of the interfaces (or both) cannot accommodate purely hierarchical structures and needs supplementary information, and that probing is done to provide such supplementary information. In fact, crucial to your account is not only probing but the Person Licensing Condition, but (again in an SMT framework) the PLC is a typical candidate for an LF-interface effect.

    So it might be that the generate-and-filter model Chomsky has in mind is in some sense much more primitive than what we typically think of, and entirely compatible with your own views and empirical and theoretical contributions.

    ReplyDelete
    Replies
    1. @Olivier:

      Yes, I certainly concede in the work that you mention that the PLC is a "thorn in the side" of eliminating all "generate-and-filter" logic from the computational system. That is why the real force of the argument comes from number agreement. In number agreement, nothing like the PLC seems to be at play; that is simply an empirical observation. Consequently, in that domain, it is simply untenable that "the interfaces [...] cannot accommodate purely hierarchical structures and needs supplementary information and that probing is done to provide such supplementary information." That is because, while number agreement needs to be attempted in every instance, there are instances where it has failed outright -- and thus, going by the quote, failed to provide the "supplementary information" that the interfaces "need" -- and yet the result is fully grammatical. But that means that the interfaces don't actually "need" this "supplementary information"; they can do without it after all. And so something else entirely must be motivating syntactic probing.

      Returning to the PLC, I want to reiterate that it is a problem for a uniform, no-post-hoc-filtration view of syntax. But in the same breath, I want to reiterate that even if we (I?) never solve this problem, it doesn't undermine the argument against probing being motivated by satisfying interface conditions, for the reasons outlined above.

      Delete
  14. @Omer

    First, let me be completely clear that insofar as I have theoretical preferences, they lie squarely in the derivational model, so that nothing that I write should be construed as a disagreement. My aim is rather to clarify the logical articulations of the concepts.

    You write: That is because, while number agreement needs to be attempted in every instance, there are instances where it has failed outright -- and thus, going by the quote, failed to provide the "supplementary information" that the interfaces "need" -- and yet the result is fully grammatical. But that means that the interfaces don't actually "need" this "supplementary information"; they can do without it after all. And so something else entirely must be motivating syntactic probing.

    But I disagree with the logic: yes, one type of agreement has failed with no negative consequence, but that doesn't mean the necessary information has not been provided by other means, so that the interfaces have what they need.

    Let me outline a strongly minimalist model of syntax (core syntax is just Merge; the rest is interface business) which is compatible with the Kichean data. PF needs linearizing, but hierarchical structures are not linearizable on their own, so functional heads have to probe in order to provide supplementary information. The range of possible probing is limited to person, number, animacy and (crucially for the following) focus (that probing involves these features, appropriately geometrized, that much we know from empirical evidence; let me assume that's all there is).
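
    Purely as an illustration of the premise that hierarchy alone does not fix an order (the encoding is my own, with tuples standing in for genuinely unordered sets): an unordered Merge structure admits several linearizations unless something further singles one out.

      # Toy sketch: an unordered constituent structure admits several
      # linearizations unless further information singles one out.

      from itertools import permutations

      def linearizations(obj):
          """All terminal strings obtainable by freely ordering sisters."""
          if isinstance(obj, str):                 # a lexical item
              return [[obj]]
          results = []
          for ordering in permutations(obj):       # every order of the sisters
              parts = [linearizations(x) for x in ordering]
              combos = [[]]
              for p in parts:
                  combos = [c + q for c in combos for q in p]
              results.extend(combos)
          return results

      print(linearizations(("saw", ("the", "dog"))))   # {saw, {the, dog}}
      # -> [['saw', 'the', 'dog'], ['saw', 'dog', 'the'],
      #     ['the', 'dog', 'saw'], ['dog', 'the', 'saw']]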

    Now imagine a language with T-probing originally relativized as in standard French but moving diachronically (for some reason) towards the morphological disappearance of agreement morphemes. At some point, phi-feature probing might become unlearnable, and so the language will shift to focus probing; and if the process goes on, the language might end up like Japanese, with exclusive focus probing.

    OK, now I claim that the intermediate state in this evolution is perfectly compatible with Agent-Focus agreement in Kichean (including, in support of the proposition, the semantic interpretation of the construction). For some reason, phi-probing is not doing the job anymore, yet the interfaces need the information, so focus probing is taking place (the precise timing being unimportant for the moment) and the interfaces are happy. Note that this kind of mechanism is quite commonly offered for partial configurationality.

    I will add one thing (because I'm less interested in theoretically pure implementation than in insights with actual empirical consequences): empirical diagnostics for the existence of focus probing are rather easy, so the adequacy of the model outlined is in principle empirically testable. For instance, how does the Agent-Focus construction fare with negation? Any intervention effect showing up? How about quantifiers? Questions? Underdetermined terms like someone? Can one sluice an Agent-Focus construction? Etc.

    ReplyDelete
    Replies
    1. @Olivier: To answer your question (non-exhaustively, I'm afraid) --

      Agent Focus is compatible with questions; in fact one of its primary uses is to construct wh-interrogatives in cases where the wh-phrase is the agent of a dyadic predicate. I haven't tried negation and/or sluicing, though the literature might contain the answer (see my thesis for references on AF; one word of warning, though: other Mayan languages, outside the Kichean branch, have similar constructions that are also similarly named, but whose behavior is crucially different from that of Kichean AF, so make sure they're talking about Kichean). Finally, quantification in Kichean -- and in Mayan in general -- is unfortunately a dicey issue. Judith Aissen once remarked to me that she is not at all convinced that Mayan languages have anything resembling the QNP/bound-pronoun construction we find in, say, Indo-European.

      Let me address your more general point, though. Of course there will be other probes, higher than the number probe -- e.g. Focus0 -- that will also scan the structure. And perhaps their (often successful) probing could provide the necessary "supplemental information" the interfaces need. But unless I am misunderstanding your outline, this only exacerbates the problem: why the heck is number probing happening, if its successful probing is not strictly required by the interfaces, and whatever is required by the interfaces will be provided later on by, e.g., Focus0 anyway?

      Delete
  15. Thank you so much for answering.

    "why the heck is number probing happening, if its successful probing is not strictly required by the interfaces, and whatever is required by the interfaces will be provided later on by, e.g., Focus0 anyway?"

    Because languages have to be learnable, the shift from all phi-probing to no phi-probing need not be discrete. T in Kichean can phi-probe, so why shouldn't it? Especially since the phi-probe is sometimes successful, so maybe it has to try if only to find out whether it has failed. More abstractly, I think Miyagawa would say that phi-probes always probe in all languages; it is just the case that in some they are so defective that they have (crucially, almost) no effect (a formulation that can be recast elegantly in terms of your own approach, I think).

    From an acquisition-of-language point of view (within a minimalist perspective), I think that this might actually be the null hypothesis: the child picks up any and every clue at its disposal to linearize what has to be linearized, and only later on finds out that some constructions in his language never make use of some of those tools.

    ReplyDelete
    Replies
    1. @Olivier: You're very welcome. As you can tell, I very much enjoy engaging with and thinking about these things, so thank you, as well.

      Two points regarding your last comment. First, "T in Kichean can phi-probe, so why shouldn't it?" is not exactly the same question as "why is the sentence ungrammatical when T hasn't probed?" I am not so much interested in the question of why number probing is a possibility; I am interested in how its obligatoriness is enforced. For the latter, you have not, as far as I can tell, provided anything resembling an interface-based answer. Whatever "supplementary information" the interfaces need, they seem to be fine with not getting it from T (either because they get it elsewhere -- e.g. Focus0 -- or because they don't need it at all). So, to put it bluntly, where does the star come from in "*These children is hungry" (or the Kichean AF analogue thereof)?

      Second, there is reason -- very good reason, in my view -- to reject the idea that "phi-probes always probe in all languages." The reason is that this view makes a very robust typological generalization concerning the Person Case Constraint unstateable (see my 2011 NLLT paper for the details). So I think there is good reason to think that languages with no overt phi agreement don't have phi probes. And this is pretty straightforward from an acquisition perspective, too: the child only posits probing features when faced with overt morphological co-variance between two distinct structural positions. So, e.g., no phi probes in Japanese.

      Delete
  16. "I am not so much interested in the question of why number probing is a possibility; I am interested in how its obligatoriness is enforced."

    At first glance, I would say: probes are there, and probes got to probe (of course in a relativized way). That's a stipulation. So perhaps you dislike it (I don't especially like it either, it is not like I have a sophisticated theory). See below, though.

    "Whatever "supplementary information" the interfaces need, they seem to be fine with not getting it from T (either because they get it elsewhere -- e.g. Focus0 -- or because they don't need it at all). So, to put it bluntly, where does the star come from in "*These children is hungry" (or the Kichean AF analogue thereof)?"

    In English, the star comes from the fact that the only way to go from the Merge-constructed structure to this particular linearization is through effective phi-probing. In spoken French, alongside "Ma soeur et moi sommes belles" (linearized solely through effective phi-probing), one can find "Ma soeur et moi, onnest belle" (where I have explicitly indicated the morphological reduction of the mismatching resumptive pronoun), a linearization produced by phi-probing on gender but not number (and presumably not person either), plus focus probing to trigger the left-dislocation. But "*Ma soeur et moi est belle" has a star: if number probing has been successful, then why is there no agreement, and if it hasn't, where has the information come from that "Ma soeur et moi" is in focus (in the syntactic sense)?

    At any rate, I don't see any logical impossibility in the statement that probes have to probe, that when probing is successful linearization ensues, and that when it is not, further probing has to take place. By the way, do you subscribe to the view that probes originate at C? If so, it seems possible to me to have a purely computational view of probes: a probe is just a computational device that probes; they exist because no linearization is possible without them, and they fall down from C until the job is done.

    "The reason is that this view makes a very robust typological generalization concerning the Person Case Constraint unstateable (see my 2011 NLLT paper for the details)."

    Am I right in thinking that you are referring to the revised PLC? If so, how does that become impossible to state (the revised-revised version simply becomes that pronouns fail to have the semantic interpretation constrained by their phi-features if they don't agree with a phi-probe, so perhaps no semantic interpretation at all, which does seem to be the case for pronouns in radically non-agreeing languages like Vietnamese)?

    ReplyDelete
    Replies
    1. @Olivier: If we take the statement "probes are there, and probes got to probe" as the relevant principle, then you have already conceded my point. This principle implies that there is a computational operation or process that is obligatory in a way that does not reduce to the interfaces' needs. In other words, we are no longer in a world where computational operations apply freely, constrained only by the need to produce interface-legible structures; that is, we have discarded the Strong Minimalist Thesis. I'm of course fine with that; I just want to point out that it is a consequence of the aforementioned quote.

      In the continuation of your comment, however, you suggest something slightly different. Admittedly, part of the suggestion is centered around French data that I'm not entirely familiar with, so I will not attempt to pull an analysis of those out of my sleeve (for what it's worth, some glosses would be helpful). I'll therefore reconstruct the logic, as best I can, as it applies to Kichean. The slightly different idea is that the sort of ungrammaticality we are talking about arises because what needs to happen is that at least one probe -- be it number, gender, person, Focus, etc. -- needs to have scanned the structure, for reasons having to do with linearization. I think Kichean pretty easily falsifies this approach: in the equivalent of 'FOC they ABSpl.saw.AF him' ("It was them who saw him") in Kichean AF, both Focus and number must target the subject. The equivalent of '*FOC they ABSsg.saw.AF him' is out. But even in this ungrammatical case, Focus will have probed the structure; not only that, it has targeted the very same constituent ('they') that number would have targeted. And yet, the sentence is ungrammatical without number probing also having happened. Given that an uninterpretable features story doesn't work here either (as I have shown elsewhere), there is again no interface-based theory to be had here -- unless, of course, we adopt "probes are there, and probes got to probe" which, as I have said, amounts to saying "there is no interface-based theory to be had here."

      Re: the PCC -- no, what I am referring to is the following. As shown by Albizu 1997, Rezac 2008, Baker 2011, and others, the PCC is a fundamentally syntactic phenomenon, not reducible to morphology (pace Bonet 1991, 1994). However, it is also well known that, by and large, the PCC arises only in languages that have some overt agreement with (or cliticization of) internal arguments. Now, in a modular view of syntax, it is impossible for syntax to 'see' whether something will be spelled out overtly or not. So, if every language has phi probing of its internal arguments in the abstract syntax, and the PCC is fundamentally syntactic, there is no way to state the fact that only languages with overt internal argument agreement(/clitics) have the PCC. Solution: no internal argument agreement --> no internal argument phi probes.

      Delete