
Friday, July 26, 2013

Guest Post: Tim Hunter on Minimalist Grammars and Stats

I have more than once gotten the impression that some think that generative grammarians (minimalists in particular) are hostile to combining grammars and stats because of some (misguided, yet principled) belief that grammars and probabilities don't mix. Given the wide role that probability estimates play in processing theories, learnability models, language evolution proposals, etc., the question is not whether grammars and stats ought to be combined (yes, they should be) but how they should be combined. Grammarians should not fear stats, and the probabilistically inclined should welcome grammars. As Tim notes below, there are two closely related issues: what to count and how to count it. Grammars specify the whats, stats the hows. The work Tim discusses was done jointly with Chris Dyer (both, I am proud to say, UMD products), and I hope that it encourages some useful discussion on how to marry grammars with stats to produce enlightening combinations.

Tim Hunter Post:


Norbert came across this paper, which defines a kind of probabilistic minimalist grammar based on Ed Stabler's formalisation of (non-probabilistic) minimalist grammars, and asked how one might try to sum up "what it all means". I'll mention two basic upshots of what we propose: the first is a simple point about the compatibility of minimalist syntax with probabilistic techniques, and the second is a more subtle point about the significance of the particular nuts and bolts (e.g. merge and move operations) that are hypothesised by minimalist syntacticians. Most or all of this is agnostic about whether minimalist syntax is being considered as a scientific hypothesis about the human language faculty, or as a model that concisely captures useful generalisations about patterns of language use for NLP/engineering purposes.

Norbert noted that it is relatively rare to see minimalist syntax combined explicitly with probabilities and statistics, and that this might give the impression that minimalist syntax is somehow "incompatible" with probabilistic techniques. The straightforward first take-home message is simply that we provide an illustration that there is no deep in-principle incompatibility there.

This, however, is not a novel contribution. John Hale (2006) combined probabilities with minimalist grammars, but this detail was not particularly prominent in that paper because it was only a small piece of a much larger puzzle. The important technical property of Stabler's formulation of minimalist syntax that Hale made use of had been established even earlier: Michaelis (2001) showed that the well-formed derivation trees can be defined in the same way as those of a context-free grammar, and given this fact probabilities can be added in essentially the same straightforward way that is often used to construct probabilistic context-free grammars. So everything one needs for showing that it is at least possible for these minimalist grammars to be supplemented with probabilities has been known for some time.
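
To make the recipe concrete, here is a minimal sketch of the standard PCFG construction that the Hale/Michaelis route relies on (the rules and counts below are invented for illustration, standing in for whatever rules the Michaelis construction would actually produce): attach a count to each rule and normalise over rules sharing a left-hand side, so that each symbol's expansions form a probability distribution.

```python
from collections import defaultdict

# Hypothetical context-free rules over derivation trees, written as
# (left-hand side, right-hand side); the symbols and counts are
# invented, standing in for the output of the Michaelis construction.
rules = [
    ("S",  ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("NP", ("D", "N")),
    ("NP", ("N",)),
]
counts = dict(zip(rules, [10, 6, 4, 7, 3]))

# Relative-frequency estimation: normalise per left-hand side so that
# the probabilities of all rules expanding a given symbol sum to 1.
totals = defaultdict(float)
for (lhs, _), c in counts.items():
    totals[lhs] += c
prob = {(lhs, rhs): c / totals[lhs] for (lhs, rhs), c in counts.items()}

# The probability of a whole derivation tree is then the product of
# the probabilities of the rules used to build it.
for (lhs, rhs), p in prob.items():
    print(f"P({lhs} -> {' '.join(rhs)}) = {p:.2f}")
```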

While the straightforward Hale/Michaelis approach should dispel any suspicions of a deep in-principle incompatibility, there is a sense in which it does not have as much in common with (non-probabilistic) minimalist grammars as one might want or expect. The second, more subtle take-home message from our paper is a suggestion for how to build on the Hale/Michaelis method in a way that better respects the hypothesised grammatical machinery that distinguishes minimalist/generative syntax from other formalisms.

As mentioned above, an important fact for the Hale/Michaelis method is that minimalist derivations can be given a context-free characterisation; more precisely, any minimalist grammar can be converted into an equivalent multiple context-free grammar (MCFG), and it is from the perspective of this MCFG that it becomes particularly straightforward to add probabilities. The MCFG that results from this conversion, however, "misses generalisations" that the original minimalist grammar captured. (The details are described in the paper, and are reminiscent of the way GPSG encodes long-distance dependencies in context-free machinery by using distinct symbols for, say, "verb phrase" and "verb phrase with a wh-object", although MCFGs do not reject movement transformations in the way that GPSG does.) In keeping with the slogan that "Grammars tell us what to count, and statistical methods tell us how to do the counting", in the Hale/Michaelis method it is the MCFG that tells us what to count, not the minimalist grammar that we began with. This means that the things that get counted are not defined by notions such as merge and move operations, theta roles or case features or wh features, which appeared in the original minimalist grammar; rather, the counts are tied to less transparent notions that emerge in the conversion to the MCFG.
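
A toy rendering of the "missed generalisations" worry, with invented categories and counts: once a single category is split into feature-annotated variants (as in the GPSG-style encoding mentioned above), the rule-level model estimates a separate distribution for each variant, and nothing ties together what is intuitively one and the same choice.

```python
from collections import Counter

# Invented rule inventory after a conversion that splits VP into a
# plain variant and a variant carrying an unresolved wh-dependency,
# with imaginary counts from a small treebank.
counts = Counter({
    ("VP",    ("V", "NP")):    60,
    ("VP",    ("V",)):         40,
    ("VP/wh", ("V", "NP/wh")):  3,
    ("VP/wh", ("V",)):          1,
})

def mle(lhs):
    """Relative-frequency estimate over the rules expanding one symbol."""
    total = sum(c for (l, _), c in counts.items() if l == lhs)
    return {rhs: c / total for (l, rhs), c in counts.items() if l == lhs}

# The split categories get independent distributions: nothing ties the
# transitivity choice inside wh-contexts to the same choice outside
# them, so the wh-variant is estimated from its own sparse counts.
print(mle("VP"))     # {('V', 'NP'): 0.6, ('V',): 0.4}
print(mle("VP/wh"))  # {('V', 'NP/wh'): 0.75, ('V',): 0.25}
```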

We suggest a way around this hurdle, which allows the "what to count" question to be answered in terms of merge and move and feature-checking and so on (while still relying on the context-free characterisation of derivations to a large extent). The resulting probability model therefore works within the parameters that one would intuitively expect to be laid out for it by the non-probabilistic machinery that defines minimalist syntax; to adopt merge and move and feature-checking and so on is to hypothesise certain joints at which nature is to be carved, and the probability model we propose works with these same joints. Therefore to the extent that this kind of probability model fares empirically better than others based on different nuts and bolts, this would (in principle, prima facie, all else being equal, etc.) constitute evidence in favour of the hypothesis that merge and move operations are the correct underlying grammatical machinery.
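
One way to picture the alternative, as a sketch in the spirit of the proposal rather than the paper's exact model (rule names, feature names and weights are all invented): describe each rule by the MG-level machinery it uses, learn one weight per merge/move/feature-checking event, and let rules share parameters through those descriptions.

```python
import math

# Each converted rule is described by features naming the underlying
# MG machinery it uses (which operation applies, which feature is
# checked), and rules share parameters through those features.
rule_features = {
    "merge_case_rule": ["op:merge", "check:case"],
    "merge_wh_rule":   ["op:merge", "check:wh"],
    "move_wh_rule":    ["op:move",  "check:wh"],
}
weights = {"op:merge": 0.9, "op:move": -0.4,
           "check:case": 0.2, "check:wh": 0.5}

def score(rule):
    return sum(weights[f] for f in rule_features[rule])

def distribution(competing):
    """Log-linear (softmax) distribution over the rules competing at a
    derivation point, so the learned quantities are tied to merge/move
    and feature-checking events rather than to opaque MCFG symbols."""
    exps = {r: math.exp(score(r)) for r in competing}
    z = sum(exps.values())
    return {r: e / z for r, e in exps.items()}

print(distribution(["merge_case_rule", "merge_wh_rule", "move_wh_rule"]))
```

Because the weights attach to operations and checked features rather than to converted rule symbols, evidence about a feature-checking event in one context informs every other rule involving the same event.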




32 comments:

  1. Can I just plug Adger 2006 and Adger and Smith 2010, where we argue that one can handle probabilistic distributions of data in a simple minimalist feature-checking system, in a way that brings together minimalist syntax with probabilistic variationist sociolinguistics? The 2006 paper is in the Journal of Linguistics and the 2010 one in Lingua.

  2. Excellent to see a reference list on this building up.

    A conceptual aspect of this stuff I'm wondering about is how to tell what, if any, aspects of the grammar are probabilistic, as opposed to the probabilities being caused by the environment of use. My judgement would be that, for example, the use of discourse positions really is probabilistic, but proving it is a different matter.

  3. Mark Johnson gave a talk about learning parameters (Pollock 1989-style V-to-T, ...) in a minimalist grammar framework using statistical inference at this year's International Congress of Linguists, building directly on Tim and Chris's paper.
    I think it was recorded but I don't know where or when the video will be available. The slides are available here: http://web.science.mq.edu.au/~mjohnson/papers/Johnson12ICLtalk.pdf

  4. You should have put a spoiler alert here -- you have spoiled the drama of Tim's talk at MOL, which I was looking forward to...

    Obviously I am completely on-side here with Tim's work, as I am a big fan both of Stablerian MGs and of probabilistic modelling, but one quibble. There doesn't seem, to me at least, to be much relationship between MGs and Minimalist syntax as exemplified by, e.g., David Adger's Core Syntax book (since he just posted above), or Norbert's recent book on the theory of syntax (since this is his blog), or Chomsky's view of Minimalism (based on some brief discussion with Sandiway Fong, who is trying to formalise it).
    So Tim has shown that there is no incompatibility between MGs and the use of probabilities, which is roughly because MGs are well behaved computationally and equivalent in one sense to a PSG formalism. But this certainly does not imply that minimalist syntax is compatible with probabilities.

    On another note, it's interesting to think about parameterising MCFGs in the light of Postal's (1964) arguments about discontinuous constituents. One of his arguments is basically Tim's point about the naive parameterisation missing generalisations.
    So you need some sort of feature calculus to control the derivations in an MCFG/MG --
    it would be nice to have some more abstract way of thinking about these features and what they do, in the way that we can now think of the derivations in an abstract way.

    Replies
    1. I agree: there are significant differences between Stabler's MGs and (each of the various flavours of) minimalist syntax that you can find in the wild. To the extent that there is any conclusion to be reached here about minimalism "as a whole", it concerns only the big-picture kind of question that Norbert mentioned in his intro: if you want to combine minimalist syntax with stats, feel free, and there are at least ways to get started. But I will be the first to admit that --- indeed, I would encourage everyone to notice that --- this is what you might call "existential compatibility" (there is a way), not "universal compatibility" applying to anything calling itself minimalist syntax.

      It may be overstating things to say that there is "not much relationship" between minimalist syntax at large and Stabler's MGs, but I do spend quite a bit of time worrying about the differences. Most significantly perhaps, the Shortest Move Constraint (SMC) that MGs enforce seems to me to be a much stricter minimality constraint than working syntacticians generally adopt, but is crucial to the MCFG-reformulation of MGs; and since this reformulation seems to have become relatively standard in the formal work on MGs (including our paper!), I do worry that the two bodies of work are drifting apart. Of course if they drift apart far enough, then doing something with MGs won't even constitute "existential compatibility" anymore.

    2. But, given the sorry history of relationships between generativists and variationists as recounted in the 2003 Probabilistic Linguistics book, I think it's very important to have in-principle compatibility demos, no matter how big the technical problems and in-fact coverage gaps.

    3. Very true. I am a little uneasy about the revisionism going on among various people, along the lines of "But Chomskyan linguistics has always been sympathetic to statistical modelling, look at LSLT!". There are some ideological fault lines lurking here -- but maybe more to do with acquisition than with representation.

    4. There was plenty of text in early Chomsky to the effect that the homogeneous community of ideal speaker-hearers, who know their language perfectly and never goof, is an idealization, but not a syllable to the effect that the discrete grammar might be an idealization of something statistical.

      I think it's still conceptually possible that the grammar is non-statistical, with every expressible meaning having a unique realization, so that the observable statistics would come from the choice of meanings to be expressed -- but that is almost certainly just false.

      Partly because of the acquisitional factor: child learners could not know all the conditioning factors behind the variation they witness, so they need a statistical model of some kind, & there's not likely to be a magic moment when it gets replaced by something different.

  5. The video of Mark Johnson's talk about statistical inference and minimalist grammars for language acquisition is available here: https://mediaserver.unige.ch/play/80084

  6. As one of the look-at-LSLT revisionists that Alex refers to, I also think it's a good idea to shout as loudly as possible that grammars are not incompatible with statistical inference. That doesn't mean we will be heard. The field has gone through this a couple of times already; see, among others, the variable rule debate, which was ignited by Labov's probabilistic amendment to SPE.

    More to the point here, and also echoing Alex's point: it has more to do with acquisition than with representation. On my view, the most compelling argument for using probabilistic models for language learning is that they offer a straightforward account of the gradualness of child language acquisition. But the puzzling fact is that children's syntactic development does not generally follow the usual type of frequency effects.

    For instance, as Amy Pierce showed many years ago, and as has been replicated for all verb-raising languages, French children learn V-to-T at 18-20 months, with virtually no errors ever occurring. On the basis of what kind of statistical information? A long time ago, I counted child-directed French and found that 7% of utterances contain a finite verb followed by negation or a VP-level adverb, which seems adequate to facilitate early acquisition. But problems come up when we turn to other aspects of child language--in fact, the most studied aspects. If I had to name the three biggest topics in language acquisition, they would be (a) Null Subjects, (b) Optional Infinitives and (c) English past tense (true, it's morphology). In all three cases, children stubbornly resist statistical trends in adult language: the Null Subject stage for English children lasts 3 years, Optional Infinitives even longer, and even adults still over-regularize. To compound the matter further, Italian and Chinese children, who learn the opposite grammars to English with respect to subject use, are at near-adult level at the age of 2.
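
    A minimal sketch of the kind of corpus count described above (the tags, the toy utterances, and the assumption that child-directed utterances arrive as lists of (word, tag) pairs are all invented for illustration):

    ```python
    # Tags are invented: VFIN = finite verb, NEG = negation, ADV = VP adverb.
    utterances = [
        [("jean", "N"), ("mange", "VFIN"), ("pas", "NEG")],     # V before neg
        [("ne", "NEG"), ("pas", "NEG"), ("manger", "VINF")],    # no finite V
        [("elle", "N"), ("voit", "VFIN"), ("souvent", "ADV")],  # V before adv
        [("il", "N"), ("dort", "VFIN")],                        # uninformative
    ]

    def signals_v_to_t(utt):
        """True if a finite verb immediately precedes negation or an adverb."""
        return any(t1 == "VFIN" and t2 in ("NEG", "ADV")
                   for (_, t1), (_, t2) in zip(utt, utt[1:]))

    rate = sum(map(signals_v_to_t, utterances)) / len(utterances)
    print(f"{rate:.0%} of utterances are informative about V-to-T")  # 50% here
    ```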

    The question, then, seems to be to come up with the best representation-learning combo to account for child language quantitatively and cross-linguistically. Formal properties matter too: after all, every child learns approximately the same grammar (at least they can understand each other), and this suggests that the search space must be suitably smooth.

  7. Yes, there are good stats (e.g. Tim's work) and bad stats (n-grams), and in the anti-bad-stats broadsides that Chomsky has delivered over the years, I feel the good stats have suffered some collateral reputational damage. So it is worth carrying on shouting, though like you I am somewhat pessimistic.

    I still don't quite understand the "probabilities in the performance module" versus "probabilities in the competence grammar" debate, or even what the consensus view on it is at the moment.

  8. Alex, have a look at Stabler's forthcoming TICS paper on his website for a very nice illustration of probabilities in the parser vs the grammar. I agree with Charles that where probabilistic info looks most likely (ahem) is in acquisition and in parsing and other aspects of performance like lexical choice. The principles of the grammar don't, at least as far as current arguments go, look probabilistic to me.

  9. I am just trying to fit Tim and Chris's work into this debate. If everybody agrees that stats have some place in syntax, then the only thing to discuss is where precisely the stats should go: in the lexicon, the grammar, the parser, the LAD ...

    So ... MGs are a fully lexicalised formalism, so in a certain sense the grammar is the lexicon, assuming a universal set of features (?) -- if the lexicon is part of performance and thus probabilistic, then one approach is just to attach probabilities only to the productions that introduce lexical items, and then everything is deterministic bottom-up... but that doesn't work mathematically (I think).

    I guess Tim's approach is one alternative answer -- but the probabilities there are not in the parser or the lexicon... but defined over local chunks of the derivation tree, which seems pretty close to "the principles of the grammar".

    So where does the paper that is the topic of the post fit in to the representation/acquisition/performance space?

    (thanks for the pointer to the new Stabler paper!)

    Replies
    1. Like Alex perhaps, I am not sure I understand what this "where are the stats" debate really means. It's true that the model Chris and I propose is quite "close to the grammar" (indeed, our whole goal was to try to get something "closer" to the grammar than what had been done in the past). But this is not at all the same thing, to my mind, as saying that the grammatical rules themselves are "inherently probabilistic". For one thing, our approach is entirely compatible with a speaker simultaneously "knowing" multiple distinct probability distributions over the set of derivations of a single grammar. Maybe one simplistic case that this would correspond to would be different distributions for different registers or something. My understanding (perhaps incorrect) of the most extreme "probabilities in the grammar" position is that it rejects this distinction between, on the one hand, a non-probabilistic grammar that defines a set of derivations, and on the other, some parameter values that supplement that grammar to define a probability distribution over that set; to the extent that we reject this distinction, it doesn't seem that we could have a one-to-many relationship between grammars and parameter settings.

      So our attempt to get "close to the grammar" should not be interpreted as pushing the view that grammars are inherently probabilistic, or that the probabilities are right in there as part of the rules, etc. Rather, the idea is that whatever uses one might find for defining probability distributions over the derivations of a grammar (whether for processing or acquisition, whether one or many distributions per speaker, etc.), it makes some sense to have the probabilities parametrized in a way that lines up with the way the set of derivations is defined (carving nature at the same joints and all that).

      (BTW, Alex: Unfortunately, I'm not going to make it to MoL this year for complicated reasons concerning my US visas, but Chris will be there and will give a suitably dramatic talk, I'm sure.)

    2. This comment has been removed by the author.

    3. In the wider sense of grammar that was popular amongst the generative semanticists, the triggering conditions for the use of a genre aka probability distribution would be part of the grammar; I'm inclined to think now that those guys had many basically right ideas but much weaker tools than we do now.

  10. Norbert here:

    Thomas Graf has tried to post this comment twice and it has not appeared for some reason. The point is interesting so I have posted it for him. Here it is:

    ****

    I'm a little late to the party, but I'm curious where you, Alex and Tim, see big discrepancies between MGs and Minimalist syntax. The strength of MGs is that they are an extremely malleable formalism: they can easily be modified without altering their fundamental properties. Personally, I think of standard MGs as characterized by the following properties:

    1) their derivation tree languages fall into a particular formal class thanks to the SMC (regular tree languages),
    2) the mapping from derivations to derived trees has limited complexity (definable in monadic second-order logic),
    3) the derivation trees are lexicalized via a (usually symmetric) feature calculus.

    As far as I can tell, Tim's probabilistic work depends only on 1), as this is what makes it possible to view MGs as underlyingly context-free. You can easily expand MGs with new movement types, Adjunction, (limited) Late Merger, locality restrictions, reference-set computation, change the feature checking mechanism, or relax the SMC, and 1) will still hold. The only proposals in the literature that strike me as problematic are those that incorporate notions of semantic or structural identity.

    Replies
    1. I'm on the road this week so I don't have time to respond in much detail, but yes I'm just thinking of point (1). The SMC as usually stated (i.e. a limit of one) seems too strict to me; relaxing it to a limit of say three or four is tempting at first because it's likely to cover virtually all of the acceptable and/or used-in-practice sentences, but this seems to be missing the point.

    2. Just wanted to add a note of agreement with Thomas on this point. You can basically restrict movement relations whatever way you like so long as you ensure that the relation between the relevant two nodes in the derivation tree is MSO-definable. In practice, that means that you need to have (i) a finitely-bounded feature specification identifying the moved phrase (i.e., no indices allowed) and (ii) an MSO-definable structural relation which unambiguously identifies the moved phrase given this specification. That permits pretty much any formulation of local SMC you like (and even some versions defined in terms of reference sets). Of course there are limitations on what you can define that are inherent consequences of the restriction to a regular derivation tree language and an MCS string language. However, it seems to me that most current informal work in Minimalist syntax could easily be formalized using the techniques presented in Thomas' recent work and elsewhere.

      I think the situation here has really changed in the past few years. It's actually pretty easy to roll your own flavor of MG using the formal tools that are now on the market.

  11. This comment has been removed by the author.

  12. There was another post by Thomas Graf which doesn't seem to have shown up on the page, though it came through on the subscription, so I copy it here:


    ---- from TG
    "Thanks, Norbert, for pushing through my comment, i'm also on the road --- just like Tim --- so maybe this confused blogspot in some way i cannot fathom (previous comments went through just fine). I agree with Tim that relaxing the SMC from, say, 1 to 3 is missing the point. But i do not think that's what the SMC is about. The SMC is a very brute-force way of ensuring MG derivations are regular and involve symmetric feature checking, and there's many alternative routes (mirroring the point made by Alex D).

    I think it's worth giving a quick summary of what the SMC does in Minimalist Grammars. In MGs, every movement operation is triggered by mapping a movement licensor feature (+f) to a corresponding licensee feature (-f). This mapping involves two processes: a derivational feature-checking mechanism (``if you have +f and -f, elide them from your equation'') and a mapping process (``put a -f element where the +f element used to be''). Now, for various reasons, we do not want any ambiguity in how these features are mapped to each other. That is to say, if we have one +f feature and more than one -f feature that could check it, we're not happy, because this raises several questions: which -f feature is mapped to the +f feature, and what are we supposed to do with the remaining -f features that did not happen to be among the precious chosen ones? The SMC does away with these questions in a very blunt way --- we simply block all those configurations as ungrammatical.

    But there are many viable alternatives. For example, we might have some mechanism to decide which -f feature was closer to the relevant +f feature and just decide not to care about -f features that cannot be checked this way. That would be very close to the Closeness condition in Minimalist syntax and still preserve property 1) mentioned above.

    Frankly, I don't see why anyone would expect the original MG setup, which was designed in 1997, to be compatible with recent iterations of Minimalist syntax. That doesn't mean early iterations of MGs were a waste of time, because all the interesting theorems about the early kind of MGs still carry over to the new variants. But it does bring me back to my original point: what kind of analysis or proposal is incompatible with MGs? Alex D's answer suggests that the answer is none, but I'm still curious what Tim and Alex C have to say about this."
    --- end of the post from TG
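
    To make the blocking behaviour Thomas describes concrete, here is a minimal sketch with an invented representation (real MGs carry licensee features on lexical items inside structured expressions, not in a bare list):

    ```python
    def apply_move(licensor, pending):
        """Check a +f licensor against a list of pending -f licensees,
        enforcing the SMC in Stabler's blunt form: exactly one matching
        -f feature may be pending, otherwise the configuration is out."""
        f = licensor.lstrip("+")
        matches = [feat for feat in pending if feat == "-" + f]
        if len(matches) != 1:
            raise ValueError(f"blocked: {len(matches)} pending -{f} features")
        remaining = list(pending)
        remaining.remove("-" + f)  # elide the checked pair from the equation
        return remaining

    print(apply_move("+wh", ["-wh", "-case"]))  # ok: ['-case'] left pending
    try:
        apply_move("+wh", ["-wh", "-wh"])       # two candidate -wh features
    except ValueError as e:
        print(e)                                # blocked: 2 pending -wh features
    ```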

    So I am not sure I think they are incompatible as such -- there are, as you say, a wide variety of MGs and an even wider variety of proposals in the Minimalist syntax literature, of various degrees of formality -- but I was thinking about, for example, the sorts of models where set-theoretic merge is defined as { A, B } without linear order, and linearisation comes afterwards and contains some learned components that only pronounce one of each copied element, and so on. It may be possible to formalise that within the MG framework, but superficially at least it doesn't seem to be closely related. Or no more closely related to MGs than to some CG proposals.

    And I should clarify that if there is a fundamental incompatibility between MGs and some proposal in Minimalist syntax, then I consider that more of a problem for the syntactic proposal than for MGs, which I think are very much on the right track -- for example, if a proposal takes the class of languages outside of PTIME.

    Replies
    1. Regarding set-theoretic merge and linearization, I'm not sure that there's any real difficulty here. The objects constructed by set-theoretic merge can be modeled as unordered trees. You could define a two-step MSO transformation from derivation trees to unordered trees and then to ordered trees. (Most of the linearization algorithms proposed in the literature are very simple and would be MSO-definable, I think.)
      There should be no problem in defining the transformations so that language-specific rules determine which “copy” is pronounced. Introducing true copying in derivation trees or derived trees is not so straightforward, of course.

    2. That's very interesting, thanks.
      But then there is the other problem, namely that if the translation between the two is completely straightforward, then the different proposals are just notational variants of each other, and we shouldn't argue about them as though the differences were empirically significant.

      Of course the translations are never that simple --- e.g. the MG -> MCFG translation causing an exponential blowup, and so there will probably be some impact... but it needs some careful analysis.

      So having it both ways (i.e. claiming that minimalist syntax is so close to MGs that they inherit the nice computational properties, while claiming that they are sufficiently different that there are empirical differences) is possible, at least if you are interested in the descriptive side of syntax rather than the explanatory side, but I do think it needs some argument.

      (Just to clarify I am not being snarky about descriptive versus explanatory, I just mean if you are interested in the problem of finding grammars for particular languages, versus the learning/UG problems)

    3. I don't think I understand your point. With respect to some issues, the alternatives are pretty similar, notational variants quite often. Wrt other issues, they are not. So, for example, substantive theoretical differences exist on how to analyze various kinds of dependencies, e.g. binding and control. Are these unifiable with "move" (i.e. i-merge) or not? If they are, then an even larger portion of UG is amenable to the kinds of computational concerns that animate you, Stabler, Tim, Alex, Thomas, etc. If not, then what do we do with those? These are UG problems, aren't they?

      So, there are many kinds of problems. It looks like, to the degree that Minimalism "in the wild" is MG-translatable, some of the concerns you have may be assuaged (learnability?). However, some that I have may not be: how much of the kinds of UG properties we have previously identified (in GB, for example) are codable using minimalist techniques? If they ALL are, then we can go back and also ask how good GB was as a description of UG generalizations. It was pretty good, IMO, but hardly perfect. So can it be improved, and if so, can these improvements be minimalistically accommodated? And this goes on and on, as expected.

      I confess, Alex, that I am not sure I can now identify the bee in your bonnet. If you are saying that things are complicated, then sure, OK, who thinks otherwise? But I heard you saying that there was something obvious standing in the way of doing with Minimalism what you think ought to get done. But then Thomas and Alex D ask you what in particular, say that they don't see the problem, and then I just don't get your reply. Is Thomas' reply (and Alex's) on the right track? If so, is this obviously doomed? If not, are other approaches less imperiled? It looks to me like your main concerns have been addressed. Is this wrong? And if it isn't, does this mean that for the time being minimalism is not, in your view, obvious junk? Inquiring minds want to know.

    4. So you ask the same question that I am interested in: are the minimalist syntax proposals in the literature translatable into MGs?
      If the answer is YES (as I am told here they are, by people who know their stuff), then are they just notational variants? And if they are, then why argue about them?

      If the answer is NO, then yes, this affects the theory of what grammars are, but then they don't inherit the nice properties of MGs, such as the one that Tim showed above.

      But you can't have it both ways -- you can't claim that A) they are different in empirically meaningful ways
      AND
      B) they have the nice computational properties of MGs: efficient parsing, learnability, nice statistical models, etc.

      So I obviously don't think that all minimalism is junk, or I wouldn't be here. (Though looking at Minimalist papers on lingbuzz there is clearly some pseudo-scientific junk out there).
      But I have different views about the value of the MP, of MGs, and about minimalist syntax, and I am interested in the relationships between them, in particular between the last two.

    5. As I see it, the point is not so much that minimalist syntax proposals are translatable into MGs, for some fixed interpretation of what an MG is, but rather that a particular method of formalizing standard MGs (by defining a constrained mapping from a regular derivation tree language to a derived tree language) can easily be “tweaked” to derive new flavors of MG. These tweaked MGs could (I think) be used to model the majority of proposals in the informal literature. For example, as has been mentioned, there's nothing special about the particular definition of the SMC that Stabler adopts in his original paper. Lots of other definitions could be adopted that would be equally effective in ensuring the regularity of the derivation tree language. These different definitions of the SMC would nonetheless make different predictions about the range of movements that a phrase with a given feature specification can undergo. So they would not be “notational variants” as far as syntacticians are concerned. Certainly, from the outside perspective of e.g. a computational linguist, the differences might be too small to be worth bothering about. In the same way, a syntactician might not care too much about fine distinctions between different variants of the same learning algorithm.

    6. I want to just second Alex D's remarks and add a little flesh. First, from my outside status I notice that Stabler, for example, elaborates many kinds of MGs. In his paper in the Oxford Handbook edited by Cedric Boeckx, for example, he goes through 4 or 5 different MGish models and notes their similarities wrt their "nice" computational properties. However, these are all different theories of UG, as they involve different basic operations, group different phenomena under different generalizations, etc. Thus, this is not just G niggling: it involves different theories of UG, of interest to syntacticians, even if not to some computationalists.

      Indeed, this is where the syntax action is, at least if one's interest is in unification, like mine is. How can one, and should one if one can, model control as movement? Binding? Does case checking involve feature movement or overt movement with lower-copy pronunciation? I can see why those interested in other issues might find this so much of a muchness. However, for us insiders, these are intriguing empirical questions with significant theoretical cachet.

      So, Alex C, do you agree with Alex D and Thomas that most proposals in the wild are tweakable to something like what we see with MGs a la Stabler? If not, why not? If so, why do you still sound so unhappy? I am beginning to feel that you just don't like this minimalist stuff, even if it is MG-kosherable. Your privilege. But this is clearly not an argument against or for anything.

    7. So we aren't having an argument here -- or at least I am not as I don't have a fixed view that I am trying to defend. I am just trying to understand better the relationship between MGs and Minimalist Syntax, and some of our disagreements are I think attributable, as Alex D points out, to differences of perspective and/or methodology. And some are also no doubt attributable to the fact that I am not expert in MGs...

      So what counts as a "notational variant" (NV) depends on what you are interested in, I guess, but maybe that's not the right phrase. I guess I mean "empirically indistinguishable" (EI). So sure, we might have two theories that "make different predictions about the range of movements that a phrase with a given feature specification can undergo." But of course, phrases, feature specifications and movement are all theoretical objects that we don't directly observe, and if these two theories nonetheless define exactly the same possible sets of sound/meaning pairings, even if the movement relations and underlying structural descriptions are different, then we might want to say that they are NVs or EI. "Might" because it's not that simple, as different parameterisations might give grammars of very different sizes as we convert from one to the other, so there is a simplicity-of-grammar issue here as well. And "might" also because there could be psycholinguistic evidence (e.g. Bock-style structural priming) that could distinguish the two theories even if they can't be distinguished by the more linguisticky evidence.

      So the feature calculus is really important in these discussions, but we don't have quite the right technical vocabulary to talk about it in an abstract way, as we do with the derivation trees and the more language-theoretic stuff. So the SMC can be formulated in a number of different ways that all maintain strong equivalence to MCFGs, but with different parameterisations of the vast resulting set of nonterminals.

      (Sorry for delay -- in Berlin at CogSci).

  13. From David Adger:

    ***

    Completely agree with Alex (Drummond) above. Even looking at the system I presented way back in Core Syntax, it's fairly straightforward to formalise most of the analysis given there using MGs, so I think there can actually be a fairly close relationship between MGs and minimalism in the wild as presented in undergrad textbooks.

    I think there's a sociological issue here though. When I talk to friends who work in LFG or HPSG, many bemoan what they see as the absence of formal work in minimalism, and most simply don't believe me when I say that much of the work is straightforward to make formally explicit, or they say that an MG-type formalisation is not really minimalism. I think this is because they want a uniform, mostly agreed-upon, formally explicit and fairly complete theory (looking at you, Miriam Butt and Ash Asudeh!): essentially a grammar fragment for UG. But we working syntacticians in minimalism (and elsewhere) look like we are constantly changing even what seem to be fairly crucial and core theoretical precepts, which from the LFG/HPSG perspective must be a bit annoying.

    The reason why this is sociological, I think, is that it's about aims and interests. Theoretical minimalist syntacticians are trying to solve theoretical problems (sometimes raised by empirical analysis, sometimes by theoretical qualms), so we are constantly trying out new ways of configuring assumptions (which would lead to different formalisations), basically because although the research programme is fairly clear and has had numerous successes, how it will pan out in detail is not. So for many syntacticians working in the framework (although not all) it's about exploring which ways of configuring rather inexplicit theoretical hunches lead to interesting new ways of understanding the phenomena (which can then be made explicit). This gets quite messy and disparate (and interesting!).

    I hesitate to speak for my LFG/HPSG friends, but my impression is that, possibly because of the discipline imposed by computational interests (building usable grammars), but probably for other reasons too, such disparate messiness is unattractive, and uniformity of the basic theoretical framework is more highly valued. But this is an issue of interests and is, I think, orthogonal to questions of formalizability. This is then related to the question that Alex (Clark) raises about the relation between MGs and minimalism in the wild: MGs provide a great way of formalising ideas that are being explored by theoretical syntacticians even though these ideas might be quite disparate, but MG is not intended to be a constraining formal framework in the same way that, for example, LFG is. I may have got that wrong, so please correct me if so!

  14. @David via Norbert. The computational project is certainly a big factor. Another, which motivates the more descriptively oriented LFG-ers, is to capture generalisations in an at least semiformal framework that has a good chance of remaining accessible for a reasonably long time.

    Another consideration is that we tended to find the explanatory ambitions of GB/MP implausible, on the basis that learning seemed to be probably more powerful than Chomsky was speculating in 1979, & many of the supposed principles and parameters seemed to have bad & unacknowledged problems from the beginning.

  15. Past tenses in the above because I think the ground has shifted a lot under the various entrenched positions, and things need to be deeply rethought and rephrased.

  16. (Sorry to have to leave this interesting discussion for so long. I'll add this anyway and see if anyone's still interested ...)

    I agree with the comments from Thomas and Alex D. that we needn't get bogged down in the precise details of the SMC, as it was formulated in the original MGs in 1997. The bigger point is what the SMC gets us, which is "ensuring that MG derivations are regular" (which in turn ensures that they can be characterised by a context-free grammar, with some missed generalisations and blowup); swapping in some other method of ensuring that the derivations are regular will leave the "nice computational properties" in place, including the probability model discussed above, for example. My worry is not about whether minimalism-in-the-wild follows Stabler's SMC to the letter, it's about whether the derivations we find in the wild are such that there is in fact another regularity-enforcing constraint that we could swap in for the SMC.

    To illustrate with a fairly contrived example: let's suppose that quantifiers take scope via syntactic movement (QR), and that all such movements are driven by the same type of feature (say, '-q'). The number of quantifiers we can have in a single clause doesn't seem to be bounded in any principled sense, because we can construct things like:
    (1) every man met some woman [on every day] [in some building] [with every friend] ...
    Let's suppose that there's a derivation where all of these quantifiers move to scope-taking positions at the top of this clause. (I don't think their relative scope-taking positions actually matter at all, nor whether there is more than one option or not.) Then at a certain point in the derivation we have, say, a TP constituent that has one unchecked '-q' feature somewhere inside it for each quantifier, each of which needs to be checked by some future move operation. There's no limit on how many of these to-be-moved quantifiers we might need to be keeping track of by the time we get to this TP level, so doesn't this violate the finiteness that is required for the derivations to be regular?
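
    To see the bookkeeping problem starkly, here is a toy sketch (the representation is invented: a derivation state is reduced to a single counter of pending '-q' licensees):

    ```python
    # Toy bookkeeping for the scenario above: the state records how many
    # unchecked '-q' licensees are pending in the constituent built so far.
    SMC_BOUND = 1  # Stabler's SMC: at most one pending licensee per feature type

    pending_q = 0
    for phrase in ["every man", "some woman", "on every day", "in some building"]:
        pending_q += 1  # each merged quantifier adds one unchecked -q
        if pending_q > SMC_BOUND:
            print(f"after '{phrase}': {pending_q} pending -q, blocked by the SMC")

    # Without some bound, the possible states {0, 1, 2, ...} form an infinite
    # set, and the finite-state (regular) characterisation of the derivation
    # trees is lost.
    ```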

    In one sense, it doesn't matter at all whether the assumptions I made about the data are plausible. My point is just that if such data turned up, and a syntactician made the theoretical moves that I sketched in order to try to account for it, then I don't think any of those theoretical moves would be considered particularly outlandish. And this means we have a mismatch between (i) MGs in the broad sense, encompassing all those possible variations that maintain the nice computational properties, and (ii) the things syntacticians might do that are not considered outlandish.

    Of course, in another sense, the data does matter: if we don't need that extra stuff, then we don't need it, and so much the better for MGs as an empirical hypothesis. But this doesn't affect the mismatch we have at the moment.
