Why Formalize?
I read with interest Norbert’s recent post on formalization:
“Formalization and Falsification in Generative Grammar”. Here I write some
preliminary comments on his post. I have
not read other relevant posts in this sprawling blog, which I am only now
learning how to navigate. So some of what I say may be redundant. Lastly, the
issues that I discuss below have come up in my joint work with Edward Stabler
on formalizing minimalism, to which I refer the reader for more details.
I take it that the goal of linguistic theory is to
understand the human language faculty by formulating UG, a theory of that
faculty. Formalization is a tool toward that goal: stating a theory
clearly and formally enough that one can establish conclusively (i.e., with a
proof) the relations between various aspects of the theory, and between claims
of the theory and claims of alternative theories.
Frege in the Begriffsschrift (p. 6 of the Begriffsschrift, in
the book Frege and Gödel) analogizes the "ideography" (basically first- and
second-order predicate calculus) to a microscope: "But as soon as scientific
goals demand great sharpness of resolution, the eye proves to be insufficient.
The microscope, on the other hand, is perfectly suited to precisely such goals,
but that is just why it is useless for all others." Similarly, formalization in
syntax is a tool that should be employed when needed. It is not an absolute
necessity, and there are many ways of going about things (as I discuss below).
By citing Frege, I am in no way claiming that we should aim for the same level of
formalization that Frege aimed for.
There is an important connection with the ideas of Rob
Chametzky (posted by Norbert in another place on this blog). As we have seen,
Rob divides theorizing into the meta-theoretical, the theoretical and the analytical. Analytical work, according to Chametzky, is:
“concerned with investigating the (phenomena of the) domain in question. It
deploys and tests concepts and architecture developed in theoretical work,
allowing for both understanding of the domain and sharpening of the theoretical
concepts.” It is clear that more than 90% of all linguistics work (maybe 99%)
is analytical, and that there is a paucity of true theoretical work.
A good example of analytical work would be Noam Chomsky’s
“On Wh-Movement”, which is one of the most beautiful and important papers in
the field. Chomsky proposes the wh-diagnostics and relentlessly subjects a
series of constructions to them, uncovering many interesting patterns
and facts. The conclusion that all these constructions reduce
to the single rule of wh-movement was a huge advance, offering real insight into
UG. Ultimately, this paper led to the Move-Alpha framework, which in turn led to
Merge (the simplest and most general operation yet).
“On Wh-Movement” is
what I would call “semi-formal”. It has semi-formal statements of various
conditions and principles, and leaves many assumptions implicit. As a
consequence, it has the hallmark property of semi-formal work: there are no
theorems and no proofs.
Certainly, it would have been a waste of time to fully
formalize “On Wh-Movement”. It would have expanded the text 10-20 fold at
least, and added nothing. This is something that I think Pullum completely
missed in his 1989 NLLT contribution on formalization. The semi-formal nature
of syntactic theory, also found in such classics as Haj Ross's "Infinite Syntax"
and Paul Postal's "On Raising", has led to an explosion of knowledge that
people outside of linguistics/syntax do not really appreciate, in part because
syntacticians are generally not very good popularizers (hence all the
uninformed and uninteresting discussion on the internet and Facebook about
what generative grammar has actually accomplished).
Theoretical work, according to Chametzky, "is concerned
with developing and investigating primitives, derived concepts and architecture
within a particular domain of inquiry.” There are many good examples of this
kind of work in the minimalist literature. I would say Juan Uriagereka’s
original work on multi-spell-out qualifies and so does Sam Epstein’s work on
c-command, amongst others.
My feeling is that theoretical work (in Chametzky’s sense)
is the natural place for formalization in linguistic theory. One reason is that
it is possible, using formal assumptions, to show clearly the relationships
among various concepts, assumptions, operations and principles. For example, it
should be possible to show, in formal work, that things like the NTC (No
Tampering Condition), the Extension Condition and Inclusiveness should really be
thought of as theorems, proved on the basis of assumptions about UG. If
they were theorems, they could be eliminated from UG as independent
stipulations. One could ask whether this
program could be extended to the full range of what syntacticians normally
think of as constraints.
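To give a feel for what I have in mind, here is a toy sketch in Haskell. The encoding is mine, purely for illustration, and far cruder than anything in Collins and Stabler 2013; the only assumption is that Merge is binary set formation over immutable objects.

```haskell
-- Toy encoding, for illustration only: syntactic objects are
-- lexical items or two-membered sets, modeled as an immutable datatype.
data SO = Lex String | Set SO SO deriving (Eq, Show)

-- Merge builds a new object from its two inputs.
merge :: SO -> SO -> SO
merge x y = Set x y

-- Since merge only constructs and never rewrites, both inputs survive
-- unchanged as immediate constituents of the output: in this setting
-- the NTC is a (trivial) theorem rather than a stipulation.
ntcHolds :: SO -> SO -> Bool
ntcHolds x y = case merge x y of
  Set a b -> a == x && b == y
  _       -> False
```

Here the NTC is a one-line observation rather than an axiom; the interesting question is which of the standard constraints survive this kind of treatment once the definitions are realistic.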
In this, I agree with Norbert who states: “It can lay
bare what the conceptual dependencies between our basic concepts are.”
Furthermore, as my previous paragraph makes clear, this mode of reasoning is
particularly important for pushing the SMT (Strong Minimalist Thesis) forward.
How can we know, with certainty, how some concept/principle/mechanism fits into
the SMT? We can formalize and see if we can prove relations between our
assumptions about the SMT (assumptions about the interfaces and computational
efficiency) and the various concepts/principles/mechanisms. Using the ruthless
tools of definition, proof and theorem, we can gradually whittle away at UG,
until we have the bare essence. I am sure that there are many surprises in
store for us. Given the fundamental, abstract and subtle nature of the elements
involved, such formalization is probably a necessity, if we want to avoid
falling into a muddle of unclear conclusions.
A related reason for formalization (in addition to clearly
stating/proving relationships between concepts and assumptions) is that it
allows one to compare competing proposals. One of the biggest such areas
nowadays is whether syntactic dependencies make use of chains, multi-dominance
structures or something else entirely. Chomsky’s papers, including his recent
ones, make reference to chains at many points. But other recent work invokes
multi-dominance. What are the differences between these theories? Is either of them really necessary? The SMT
makes it clear that one should not go beyond Merge, the lexicon, and the
structures produced by Merge. So any additional assumptions needed to implement
multi-dominance or chains are suspect. But what are those additional
assumptions? I am afraid that without formalization it will be impossible to
answer these questions.
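As a first step toward making that question precise, consider what each option adds to the bare structures. The following toy Haskell encodings are mine, for illustration only, and correspond to no published proposal; their point is just that each option wears its extra bookkeeping on its sleeve.

```haskell
-- Chains: the object stays a tree, but movement is recorded as a
-- list of occurrences, i.e. paths from the root to each position.
type Path   = [Int]
data TreeSO = TLex String | TSet TreeSO TreeSO
type Chain  = (TreeSO, [Path])   -- a mover together with its occurrences

-- Multi-dominance: no occurrence lists, but a node may have several
-- parents, so objects are graphs, and node identities are needed to
-- distinguish genuine sharing from accidental equality of subtrees.
type NodeId  = Int
data GraphSO = GLex NodeId String | GSet NodeId GraphSO GraphSO
```

Neither the occurrence lists nor the node identities come from Merge itself; under the SMT, both are exactly the kind of extra assumption that needs to be derived or justified.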
Questions about syntactic dependencies interact closely with
TransferPF (Spell-Out) and TransferLF, which, to my knowledge, have not only not
been formalized but not even been stated in an explicit manner (other than the
initial attempt in Collins and Stabler 2013). Investigating the question of
whether multi-dominance, chains or something else entirely (perhaps
nothing else) is needed to model human language syntax will require a
concomitant formalization of TransferPF and TransferLF, since these are the
functions that make use of the structures formed by Merge. Giving explicit and
perhaps formalized statements of TransferPF and TransferLF should in turn lead
to new empirical work exploring the predictions of the algorithms used to
define these functions.
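Even a deliberately naive stand-in for TransferPF makes the stakes concrete. The following is my toy, not the definition in Collins and Stabler 2013: it linearizes left to right and pronounces every occurrence.

```haskell
-- A naive TransferPF over bare syntactic objects: flatten the
-- structure left to right into a list of words.
data SO = Lex String | Set SO SO

transferPF :: SO -> [String]
transferPF (Lex w)   = [w]
transferPF (Set x y) = transferPF x ++ transferPF y
```

If movement leaves copies in the structure, this function pronounces all of them; deciding which copies to silence requires chain or multi-dominance information that the bare sets do not supply. That is precisely the point at which the dependency question and the definition of Transfer meet.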
A last reason for formalization is that it may bring out
complications in what appear to be innocuous concepts (e.g., “workspaces”,
“occurrences”, “chains”). It will also
help one to understand what alternative theories without these concepts would
have to accomplish. In accordance with the SMT, we would like to formulate UG
without reference to such concepts, unless they are really needed.
Minimalist syntax calls for formalization in a way that
previous syntactic theories did not. First, the nature of the basic operations
is simple enough (e.g., Merge) to make formalization a real possibility. The
baroque and varied nature of transformations in the “On Wh-Movement” framework
and preceding work made the prospect for a full formalization more daunting.
Second, the concepts involved in minimalism, because
of their simplicity and generality (e.g., the notion of copy), are too
fundamental, subtle and abstract to be resolved by talking through them in an
informal or semi-formal way. With formalization we can hope to state things in
such a way as to make clear the conceptual and empirical properties of the
various proposals, and to compare and evaluate them.
My expectation is that selective formalization in syntax
will lead to an explosion of interesting research issues, both of an empirical
and a conceptual nature (in Chametzky's terms, both analytical and theoretical).
One can only look at a set of empirical problems against the backdrop of a
particular set of theoretical assumptions about UG and I-language. The more
these assumptions are articulated, the more one will be able to ask
interesting questions about UG.
Thanks for the updated version. Would it be possible to get the full reference to Collins & Postal [2013] you mention here?
"Questions about syntactic dependencies interact closely with TransferPF (Spell-Out) and TransferLF, which to my knowledge, have not only not been formalized but not even stated in an explicit manner (other than the initial attempt in Collins and Postal 2013)."
Christina, that is also a typo. It is meant to be Collins and Stabler 2013. Sorry.
@Greg: On the page with Chris's original post you said: "@Chris: 'These questions about syntactic dependencies interact closely with TransferPF (Spell-Out) and TransferLF, which to my knowledge, have not only not been formalized but not even stated in an explicit manner.' I'm a little surprised to read this! Certainly, in Ed's Minimalist Grammar framework, these matters have long been formalized."
You are right that in formal minimalist grammars (MGs), we see clearly how to separate derivation from spellout to PF and LF. We also see why the separation is so valuable -- both steps are revealed to be very simple, finite-state steps. Your 2007 work makes this completely explicit, and I try to explain it informally in Appendix B of my 2013 TopiCS paper.
However, I don't think it is right to say that we have formalized TransferPF/LF in the minimalist program, for two reasons. First, I think that mainstream conceptions of minimalist derivation and of TransferPF/LF are fundamentally different from the MG conception. And second, given that they are different, we have not persuaded Collins (or Chomsky, or many others) that the MG conception is better. In fact, while the MG perspective is ever so much clearer and hence easier to defend/assess, I am not sure that it is empirically correct. Let me defend these claims briefly.
First, in the study of MGs we have some simple and beautiful technical results (some of which you are responsible for): (0) MG derivations represented as trees (or terms) are regular, and those trees can be mapped to PF/LF by a certain kind of regular transduction to yield mildly context-sensitive string languages. There are different ideas about TransferPF/LF, though. One idea could be this. (1a) The computation of linguistic structure does not build representations of derivations, but syntactic objects of some kind which potentially do not encode everything about their derivational histories. (1b) Certain operations (internal merge, agree) on those objects may involve a search through ('visible' parts of) the objects. (1c) What gets transferred to PF may influence what gets transferred to LF and also what is 'visible' in the derivation. I am not defending (1a-c) here but only pointing out that it is different from (0).
OK, now my second point is already clear. Many linguists are not convinced that (0) provides the right picture. (1b,c) are related to the proper formulation of island conditions and to some empirical threats to the expressive adequacy of standard MGs (with SMC or equivalent), as you know. And I am intrigued by some of the psycholinguistic work coming out of Colin Phillips' lab and other places suggesting that some linguistic illusions may be explained in part by the fact that operations requiring search may be more susceptible to interference than categorial operations that are 'carried forward' in the derivation.
It is no surprise that I think perspective (0) is enormously valuable, partly because it is just so extremely clear and simple and reasonably close to descriptive adequacy. But not to overstate the case, I think there are some genuine issues raised by some of the alternative perspectives. Another advantage of perspective (0) is that it makes it much easier to see what the genuine issues actually are.
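Schematically, the factorization in (0) can be illustrated with a deliberately crude toy (nothing like a real MG; the internal merge clause just fronts the last word, as a placeholder for feature-driven movement):

```haskell
-- Derivations are trees over the operations, and PF is computed by a
-- simple bottom-up map over the derivation itself; no derived tree is
-- ever built.
data Deriv = Item String         -- select a lexical item
           | EMerge Deriv Deriv  -- external merge
           | IMerge Deriv        -- internal merge (toy placeholder)

pf :: Deriv -> [String]
pf (Item w)     = [w]
pf (EMerge x y) = pf x ++ pf y
pf (IMerge x)   = case reverse (pf x) of
                    (w:ws) -> w : reverse ws
                    []     -> []
```

So pf (IMerge (EMerge (Item "likes") (Item "what"))) yields ["what","likes"], computed directly from the derivation.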
Setting (1c) aside for the moment, I'm not sure I see a fundamental difference here. In other words, (0) is compatible with (1a) and (1b); it has them as special cases. ((0), like categorial grammar, doesn't require that anything (beyond categorized string tuples) be built up at all.) (1a) simply says (if I understand this proposal correctly) that the maps from derivations to PF/LF go through a common intermediate representation, and that we do well to factor them into a common map from derivations to derived things, and the maps from these to PF/LF. (Salvati (2011) actually makes this claim.) (1b) is (I think) a claim about the proper formulation of locality conditions, and (1c) is (at least the second conjunct) about phases. If our derivations are regular, then if these (1b,c) constraints are regular too, we can reformulate them over derivations, as you know.
I think the adequacy of the SMC or equivalent is one of the big important problems that no one is working on. It's the question of whether there is a natural formal restriction on our syntactic theory that makes it empirically restrictive. This is the meat! This is the big question! Are there any formal linguistic (syntactic) universals?! (See e.g. Keenan & Stabler 2010.) To you I'll confess that I am unsure whether it is adequate to think of the SMC (or SPIC) as an emergent property of parser preferences. (Adapting from McAllester, the SMC/SPIC at the level of the grammar would be a `low temperature limit' of the parser's behaviour.) But anyways, this question (adequacy of SMC or equivalent = formal vs substantive universals) seems to relate also to `Darwin's Problem'; formal universals seem much more natural to explain away than do substantive ones. (Although as Behme continues to justifiably point out, no one has the slightest idea how to link language with biology, and so this bit is all hand waving.)
"And I am intrigued by some of the psycholinguistic work coming out of Colin Phillips' lab and other places suggesting that some linguistic illusions may be explained in part by the fact that operations requiring search may be more susceptible to interference than categorial operations that are 'carried forward' in the derivation."
I am very excited at the prospect of a reconciliation of linguistics and psycholinguistics, and Colin is one of the people responsible for this. I think that especially here, where behavioural predictions are being inspired by linguistic theory, it is important that the linguistic theory be formally well understood, so that these predictions are not based on accidents of notation.
"It is no surprise that I think perspective (0) is enormously valuable, partly because it is just so extremely clear and simple and reasonably close to descriptive adequacy. But not to overstate the case, I think there are some genuine issues raised by some of the alternative perspectives. Another advantage of perspective (0) is that it makes it much easier to see what the genuine issues actually are."
Amen.
I think that the spirit of your note could have been recast as talking about the parser-grammar relation; e.g.
"[...] we have not persuaded Collins (or Chomsky, or many others) that the levels conception is better."
I'm late to the party, but here are a few rambling thoughts of mine.
"(0) is compatible with (1a) and (1b); it has them as special cases."
That's true, but not a satisfactory answer for linguists, which is what Ed is pointing out.
For example, Brody has argued that one reason to stay away from derivational approaches is that they are too fine-grained: multiple derivations correspond to one and the same phrase structure tree. Brody thinks that this extra granularity has no empirical reflexes, so phrase structure trees are the better representation format. I disagree, and some of my recent work on the Adjunct Island Constraint relies on how adjuncts are built, not on what they look like. I think that a case for derivations can be made, rather easily in fact, but in order for it to be persuasive we have to do more than show how linguists' ideas are just special cases of the derivational picture.
The same goes for (1b). We can do it over derivation trees, but if all conditions look more natural over phrase structure trees, that won't convince linguists. Fortunately there are many cases where derivation trees provide the nicer picture, and I can't think of many where it is uglier (basically only conditions stated at the level of S-structure, but these have all been abandoned except for the LCA).
As for Chris's specific concern about Transfer, I am currently auditing his seminar on Formalizing Minimalism, and from what I've gathered so far he is really worried about defining Transfer-PF and Transfer-LF for phrase structure trees, either multi-dominance or standard trees with chains, possibly even the kind of parallel chains Chomsky uses in his most recent papers. Combined with ideas like partial pronunciation of copies, reconstruction and so on, this becomes quite a mess quickly. So he is right that these things have not been worked out carefully. He does seem genuinely interested in the MG perspective, he just isn't convinced yet that we can throw out all these specialized devices like parallel chains without losing something in the translation. Not necessarily with respect to what can be expressed at all, but what can be expressed in a natural fashion.
"For example, Brody has argued that one reason to stay away from derivational approaches is that they are too fine-grained, multiple derivations correspond to one and the same phrase structure tree. Brody thinks that this extra granularity has no empirical reflexes, so phrase structure trees are the better representation format. "
So the debate here, I take it, is whether "structural descriptions", which are some latent structures that mediate the sound-meaning relationship, should be taken to be derivation trees or derived trees. In order to answer this we need to figure out what the *necessary* properties of these latent structures are. Now I get the impression that linguists often tacitly assume that we only want one SD per distinct reading of a sentence; is that right?
But that doesn't seem necessary, even if it might be desirable, and it also doesn't seem possible, given various assumptions.
Yes, the debate is about derivation trees vs derived trees, but it's not so much about readings per structure.
Semantic ambiguity does not entail syntactic ambiguity. For instance, the German sentence "Peter muss heute in der Stadt sein" is ambiguous between a deontic reading "It is mandatory for Peter to be in the city today" and an epistemic reading "It must be the case that Peter is in the city today". Some syntacticians might want to posit different structures for these two readings, but it's an empirical issue, not one of principle.
In the other direction, you can have multiple SDs with the same reading, e.g. in cases of heavy NP shift. This operation can even be string-vacuous, so you would have two distinct SDs that yield the same string and the same meaning.
The question of granularity is mostly a syntactic one: do syntactic dependencies care about how a structure is built, or only what it looks like? In a framework such as Minimalism, where the derivational history is partially encoded in the SD via traces, chains, or multi-dominance, that's a tricky issue.
I wrote inaccurately -- I meant one SD per string/meaning pair.
What do you mean by "it's an empirical issue" wrt the German example?
What sort of empirical evidence would tell you how many SDs are involved?
For instance, certain syntactic operations may be grammatical with deontic müssen but not with the epistemic one. That could then be taken as evidence that two different SDs are involved. If you cannot find such differences, then you do not want to have different SDs for the two. This is somewhat complicated by the fact that there are also semantic restrictions on when and where deontic and epistemic readings are available, so even if the two do not behave completely alike, you still have to make the argument that this is due to syntactic differences. And all of this hinges on the analysis of very subtle data and pondering what the most succinct/elegant account of that data might be.
Anyways, to get back to the main point, one SD per string/meaning pair works as a rule of thumb, but not in all cases, which is what I tried to point out with the heavy NP-shift example.
However, a slightly weakened version may be along the right tracks:
(Weak) Effect on Output Condition. For every operation O, there has to be some SD S such that applying O to S yields an SD S' that is mapped to a different string-meaning pair than S.
I know, that's very sloppy; what should O be for, say, a CFG, and what does it mean to apply O to an SD? But the idea is simple enough: an operation has to be capable, in principle, of affecting the output string or the meaning of a sentence. Not necessarily in all cases (cf. heavy NP shift), but in some. This gets you very close to one SD per string/meaning pair while allowing for some exceptions. And that seems to be very close to what modern syntax looks like.
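In makeshift notation of my own, equally sloppy, with \llbracket\cdot\rrbracket the map from an SD to its string/meaning pair:

```latex
% An operation O is licensed only if it can make a difference
% that Transfer can see, for at least one SD.
\forall O \, \exists S : \; O(S)\ \text{is defined} \;\wedge\; \llbracket O(S) \rrbracket \neq \llbracket S \rrbracket
```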
@Thomas: Brody's problem is `spurious ambiguity', and I think it cannot be dismissed so glibly. (I confess, it is not clear to me what weight it should have, but `one type per word, one structure per meaning' certainly has a strong raw appeal to it.) And the idea that one structure corresponds to one meaning is central to the idea of a bimorphism. The standard responses are to (i) use underspecified meanings, and preserve the bimorphism structure or (ii) use fully specified meanings, but enrich the structures. Making the morphisms non-deterministic is either a notational variant of (ii), or takes us farther away from things we understand how to reason about.
@Greg: I'm not dismissing Brody's problem, quite the opposite. As for the structure-meaning correspondence being essential to bimorphisms, I wasn't describing what the situation looks like from a formal perspective; my musings were on the tacit linguistic consensus (or what I take it to be). Because the whole crux of this discussion (and many others on this blog) is to what extent linguists consider the formal perspective in line with their ideas about syntax.
I think things like RNR (right node raising) and argument cluster coordination and other arguments that are used to motivate type-logical and categorial approaches make it implausible that one can avoid spurious ambiguity entirely, but I don't know how these are dealt with in MGs.
@Thomas: Of course, I am a linguist, so you mean something narrower. I think that (in syntax and even in phonology) there are two problems: (I) the rejection (either explicitly in the case of syntax, or implicitly in phonology) of formalism, resulting in having no way to distinguish between notation and theory, and (II) losing track of where and why hypotheses are made. An example of (II) is Chomsky's assumption that merge should be set union (or pair formation). This assumption is based on the idea that, because set union is pretty simple formally (I don't know how to quantify this), it will be easier to explain the evolutionary emergence of Merge in future work. This assumption leads directly to lots of well-known problems, and equally well-known theoretical patches. Okay, so now we have a huge theory built in part to protect the completely unmotivated pipe dream that set union is an evolutionary gimme. But of course, this theory is also built to describe the 50-odd years of deep linguistic insights had by the transformational tradition. Say Josephine linguist finds the latter extremely valuable, and the former extremely dubious. What does she do?
@Greg: yes, the intended denotation for "linguist" is much narrower. As a matter of fact, it's closer to "syntacticians that do not care much for computational linguistics", which definitely doesn't include either of us.
"Say Josephine linguist finds the latter [linguistic insights] extremely valuable, and the former [~biolinguistic assumptions] extremely dubious. What does she do?"
I suppose there are many routes Josephine might take depending on her theoretical inclinations, but it's not really pertinent to the point I wanted to make. I do not disagree with anything you say from a linguistic perspective, our views line up almost perfectly in that regard. And in the perfect world that I inhabit in my happiest daydreams, everybody else is similarly detached from matters of notation and arbitrary technical details. But that's not the case in the real world.
In the real world we still have to sell our ideas to an audience with a very different perspective. Formal arguments don't get much attention unless they confirm whatever the person you're talking to already believes, with the standard reply being that the formal results rely on flawed assumptions or the wrong evaluation criteria. You can't counter that with another purely formal argument, because you'll get exactly the same reply.
So if Chris is wondering about Transfer-PF, pointing out the dlmbutt (deterministic linear multi bottom-up tree transducer) and MSO (monadic second-order logic) implementations probably won't be very persuasive. But it might be more persuasive if it is coupled with a precise definition of his idea of Transfer-PF, a corresponding dlmbutt/MSO implementation, and a demonstration that the technical details omitted in the first implementation do not add anything significant to the second.
In the case of derivations vs derived trees, showing that the two are equally powerful won't convince many linguists. Showing that derivation trees are helpful for analytic work will be more persuasive, and if we also show why spurious ambiguity isn't much of an issue in MGs, then we have a really fine argument.
So very long story short: it's not about what's the best theory (mine, obviously ;) ), it's about what's the best argument for the best theory.