This is a long post. I guess I got carried away. In my defense, I think that the topic keeps coming up obliquely, so it's time to take it head on. The issue is the role of formalization in grammatical research. Let me be up front with the conclusion: there's nothing wrong with it, it can be valuable if done the right way, and it has been valuable in some cases where it has been done the right way. But there seems to be an attitude floating in the atmosphere that there is something inherently indispensable about it, and that anything that is not formalized is ipso facto contentless and/or incapable of empirical examination. This, IMO, is balderdash (a word I've always wanted to use). It is not only false within linguistics, it is false for other sciences as well (I mention a few examples below). Moreover, the golden glow of formalization often comes wrapped in its own dark cloud. In particular, if done without a sensitivity to the theories/concepts of interest, it risks being imposing-looking junk. This is actually more deleterious than simple junk, for formalizations can suggest depth and can invite a demand for respect that simple junk does not. This point will take me some time to elaborate, and that's why the post is way too long. Here goes.
There is an interesting chain of conversation in the thread to this post. In the original, I outlined "a cute example" from Chomsky, which illustrated the main point about structure dependence but sans the string altering features that T to C movement (TCM) in Yes/No questions (Y/N) can induce. To briefly review, the example in (1) requires that instinctively modify the main verb swim rather than the verb inside the relative clause, fly; this despite (i) the fact that eagles do instinctively fly but don't instinctively swim (at least I don't think they do) and (ii) the fact that fly is linearly closer to instinctively than swim is.
(1) Instinctively, eagles that fly swim
The analogy to TCM in Y/Ns lies in both adverbial modification and TCM being insensitive to string linear properties. I thought the example provided a nice simple illustration of how one can get sidetracked by irrelevant surface differences (e.g. in *Is the boy who sleeping is dreaming, the pair who sleeping is an illicit bigram) into missing the main point, viz. that examples like this are a dime a dozen and that this surface blemish cannot generally be expected to help matters much (e.g. Can boys that sleep dream means 'is it the case that boys that sleep can dream' and not 'is it the case that boys that can sleep dream'). At any rate, I thought that the adverbial modification facts were useful in making this point, but I was wrong. And I was wrong because I failed to consider how the adverbial case would be generated and how its generation compared to that of Y/Ns. I plan to try to rectify this mistake here before getting on to a discussion of how and whether formalization can clarify matters, a point raised in the notes by Alex Clark and Benjamin Boerschinger, though in somewhat different ways. But first let's consider the syntax of the matter.
The standard analysis of Y/Ns is that T moves to C to discharge some demand of C. More specifically, in some Gs (note TCM is not a feature of UG!), +WH C needs a finite T for "support." GB analyzes this operation as a species of head movement from the genus 'move alpha'. Minimalists (at least some) have treated this as an instance of I-merge. Let's assume something like this is correct. The UGish question is why, in sentences like (2), can must be interpreted as moving from the matrix T position and not from the T inside the embedded relative clause.
(2) [Can+C+WH [DP boys [CP that (*t_can) sleep]] *(t_can) dream]
What makes (2) interesting is precisely the fact that in cases like this there are two potential candidates available for satisfying the needs of C+WH, but only one of them can serve, the other being strictly prohibited from moving. The relevant question is what principle selects the right T for movement. One answer is that the relevant T is the one "closest" to C and that proximity is measured hierarchically, not linearly. Indeed, such examples show that were locality measured string linearly, the opposite facts would obtain. Thus, these kinds of data indicate that in such cases we had better prohibit string linear measures of proximity, as they never seem to be empirically condign for Y/Ns.
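To make the contrast concrete, here is a minimal sketch (in Python, with invented string positions and embedding depths, so nothing here is a real grammar fragment): measuring proximity to C by string position selects the relative clause T, while measuring it hierarchically selects the matrix T, matching the facts.

```python
# A toy illustration of the two metrics for (2), not a grammar fragment.
# Each T is annotated with a made-up string position and a made-up depth
# of embedding below C.
candidates = [
    # (label, string position, depth of embedding below C)
    ("T_rel",  3, 4),   # T inside the relative clause: earlier in the string
    ("T_main", 5, 1),   # matrix T: later in the string, structurally closest to C
]

linear_pick = min(candidates, key=lambda t: t[1])        # closest by string distance
hierarchical_pick = min(candidates, key=lambda t: t[2])  # closest by embedding depth

print(linear_pick[0])        # T_rel  -- the empirically impossible mover
print(hierarchical_pick[0])  # T_main -- the T that actually raises to C
```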
So far, I believe, there is no controversy. The battle begins in trying to locate the source of the prohibition against string linear restrictions like the one proposed. People like me take these kinds of cases to indicate that the prohibition has its roots in UG. Thus, a particular G eschews string linear notions of proximity because UG eschews such notions, and thus they cannot be components of particular Gs. Others argue that the prohibitions are G specific, and thus that structure independent syntactic processes are possible given the appropriate input. In other words, were our young LADs and LASs (language acquisition devices/systems) exposed to the relevant input, they would acquire rules that moved the linearly most proximate T to C rather than the hierarchically most prominent one. Thus, the disagreement, like all disagreements about scientific principles, is one about counterfactual situations. So, how does one argue about a counterfactual?
My side scours the Gs of the world and argues that rules sensitive to string linear order are never found in their syntax, which argues that structure dependence is part of UG. Or, we argue that the relevant data to eliminate the string linear condition is unavailable in the PLD available to LADs and LASs, and so the absence of string linear conditions in the G of English, for example, cannot be traced to the inductive proclivities of LADs and LASs. The other side argues (or has to argue) that this is just a coincidence, for there is nothing inherent to Gs that prohibits such processes, and that in cases where particular Gs eschew string linear conditions it's because the data surveyed in the acquisition process sufficed to eliminate them, not because such conditions could not have been incorporated into particular Gs to regulate how their rules apply.
Note that the arguments are related but independent. Should the absence of string linear conditions in any G prove correct (and I believe that it is very strongly supported), it should cast doubt on any kind of "coincidence" theory (btw, this is where the adverbial cases in (1) are relevant). So too should evidence that the PLD is too weak to provide an inductive basis for eliminating the string linear option (which I also believe has been demonstrated, at least to my satisfaction).
This said, it is important to note that this is a very weak conclusion. It simply indicates that something inherent to UG eliminates string linear conditions as options; it does not specify what the relevant structure sensitive feature of UG is. And here is where collapsing (1) and (2) can mislead. Let me explain.
The example in (1) would typically be treated as a case of adverb fronting (AF), along the lines of (3).
(3) Adverb1 […t1…]
AF contrasts with TCM in several respects. First, it is not obligatory. Thus declaratives without AF are perfectly acceptable, in contrast with Y/Ns without TCM. Second, whereas every finite clause must contain a T, not every finite declarative need contain an adverb.
The first difference can be finessed in a simple (quite uninteresting) way. We can postulate that whenever AF has applied, some head F has attracted the adverb. Thus, AF in (1) is not really optional; what's optional is the F feature that attracts the adverb. Once present, F functions like its needy C+WH counterpart.
The second difference, I believe, drives a wedge between the two cases. Here's how: given that a relative clause and a matrix clause will each contain (finite) T0s, and hence both be potential C+WH rescuers, there is no reason to think that they will each contain adverbs. Why's this relevant? Because whereas for TCM we can argue that we need a principle to select the relevant T that moves, there is no obvious choice of mover required for AF. So whereas we can argue that the right principle for TCM in (4a) is something like Shortest Attract/Move (SA/M), this would not suffice for (4b), where there is but one adverb available for fronting. Thus, if SA/M is the right principle regulating TCM, it does not suffice to regulate AF cases (if, as I assume here, they are species of I-merge).
(4) a. [C+WH [RC …T…]…T…]
b. [F+ADV [RC …ADV…]…]
What else is required? Well, the obvious answer is something like the Complex NP Constraint (CNPC) and/or the Subject Condition (SC). Both would suffice to block AF in (4b). Moreover, both have long been considered properties of UG, and both are clearly structure sensitive prohibitions (they are decidedly not string linear). However, island conditions and minimality restrictions are clearly different locality conditions, even if both are structure dependent.
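A toy way to see this division of labor (my own simplified encoding in Python, not anyone's formalism): SA/M filters out the bad TCM derivation in (4a), where a closer T intervenes, but it says nothing about (4b), where the relative clause adverb is the only candidate; an island condition of the CNPC/SC sort catches (4b).

```python
# A toy constraint checker, not a real MG. Each candidate movement is
# encoded by two invented booleans: whether a closer same-type mover
# intervenes, and whether the launch site sits inside an island.

def shortest_move_ok(dep):
    # SA/M: you may not skip a nearer candidate of the relevant type
    return not dep["closer_competitor"]

def island_ok(dep):
    # CNPC/SC: you may not move out of a complex NP or a subject
    return not dep["inside_island"]

# (4a): fronting the relative clause T skips the closer matrix T
tcm_bad = {"closer_competitor": True,  "inside_island": True}
# (4b): the only adverb is inside the relative clause, so nothing intervenes
af_bad  = {"closer_competitor": False, "inside_island": True}

print(shortest_move_ok(tcm_bad), shortest_move_ok(af_bad))  # False True: SA/M misses (4b)
print(island_ok(tcm_bad), island_ok(af_bad))                # False False: islands block both
```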
Now this has been a long-winded and overly didactic review of a much over-discussed example. Why do I bring this up again?! Because of some comments by Alex Clark suggesting that the AF facts could be derived in formalized minimalist grammars and that this therefore nullifies any explanation of the kind provided above, viz. that UG explains the data in (1) and (2) by noting that the relevant structures are underivable (my emphasis in what follows):
So here is a more controversial technical claim:
let English+ be English with additionally the single incorrect pairing (s,w2). English+ can be generated by an MCFG; ergo it can be generated by an MG. English++ is English but additionally with the fronted adverbs out of subject relatives; again generable by an MG. (MG means Stabler's Minimalist grammars with shortest move constraint). So I think these claims are correct, and if not could someone technical chime in and correct me.
So Norbert is right that the grammars will look strange. Very strange indeed if you actually convert them from an MCFG. But they are allowed by this class of grammars, which in a sense defines the notion of licit grammatical dependencies in the theory. So Norbert wants to say, oh well if my theory makes the wrong predictions then it has been formalized incorrectly, and when it is formalized correctly it will make the right predictions, Period. But while this is certainly a ballsy argument, it's not really playing the game.
Alex is right: Stabler's MG with SA/M can derive the relevant AF examples, as the derivation implicit in (4b) does not violate SA/M. That's why an MG including this constraint (let's call it 'SMG') cannot prevent the derivation. However, as Stabler (here) notes, there are other MGs that code for other kinds of locality restrictions. In fact, there are whole families of them, some encoding relativized minimality (RMG) and some embodying phases (PMG and PCMG). I assume, though Stabler does not explicitly discuss this, that it is also possible to combine different locality restrictions together in an MG (i.e. an RPMG that combines both relativized minimality and phases). So what we have are formalizations of various MGs (SMG, RMG, PMG, PCMG, and RPMG), all with slightly different locality properties, that generate slightly different licit structural configurations. Stabler shows that despite these differences, these restricted versions of MG all share some common computational properties, such as efficient recognition and parsability. However, they are, as Stabler notes, also different in that they allow for different kinds of licit configurations, PMGs/PCMGs blocking dependencies that violate the PIC and RMGs blocking those that violate relativized minimality (see his section 4). In sum, there are varieties of MGs that have been formalized by Stabler and Co., and these encode different kinds of conditions that have been empirically motivated in the minimalist literature. There is no problem formalizing these different MGs, nor in recognizing that despite being different in what structures they license, they can still share some common general properties.
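One way to picture this family of MGs (a cartoon in Python with invented predicate names, emphatically not Stabler's construction): the variants share one and the same checking procedure and differ only in which locality filters are plugged in, which is why they can differ in licit configurations while sharing general computational properties.

```python
# A cartoon of the "family of MGs" point, with invented names: each variant
# is the same checking procedure plus a different bundle of locality filters.

def smc(dep):   # shortest-move style condition
    return not dep.get("closer_competitor", False)

def rm(dep):    # relativized-minimality style condition
    return not dep.get("same_type_intervener", False)

def pic(dep):   # phase-impenetrability style condition
    return not dep.get("crosses_phase_edge_less", False)

VARIANTS = {
    "SMG":  [smc],
    "RMG":  [smc, rm],
    "PMG":  [smc, pic],
    "RPMG": [smc, rm, pic],
}

def licit(variant, dep):
    # The shared part: one procedure for every variant. What differs across
    # SMG/RMG/PMG/RPMG is only the list of locality conditions imposed.
    return all(cond(dep) for cond in VARIANTS[variant])

dep = {"crosses_phase_edge_less": True}       # violates only the PIC-style filter
print(licit("SMG", dep), licit("RPMG", dep))  # True False
```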
Three observations. First, I leave it as an exercise for the reader to code island restrictions (like the CNPC) in phase based terms. This is not hard to do given that phases and the original subjacency theory (i.e. the one employing bounding nodes) are virtually isomorphic (hint: D is a phase without an accessible phase edge).
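For concreteness, here is a minimal sketch of the hint (my own toy encoding, not a published one): treat each phase head as either offering an accessible edge or not; if D is a phase with no edge, anything launched from inside a DP-contained relative clause is trapped, which is the CNPC effect.

```python
# A toy rendering of the hint: D is a phase without an accessible edge.
# An extraction is licit only if every phase head crossed on the way out
# offers an escape-hatch edge. The encoding is mine, for illustration only.

PHASES = {
    "C": {"edge": True},    # CP: a phase with an accessible edge
    "D": {"edge": False},   # DP: a phase, but no escape hatch
}

def can_extract(phase_heads_crossed):
    return all(PHASES[h]["edge"] for h in phase_heads_crossed)

print(can_extract(["C"]))        # True:  out of an ordinary embedded clause
print(can_extract(["C", "D"]))   # False: out of a relative clause inside DP (CNPC)
```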
Second, the Stabler paper offers one very good reason for formalizing grammars. The paper shows that different theories (i.e. those that characterize UG differently) can nonetheless share many features in common. Though empirically relevant, the different locality conditions do not differ in some of their more general computational features. Good. What we see is that not all empirically different characterizations of FL/UG need have entirely different computational properties.
Third, Stabler recognizes that the way to explore MG and UG computationally is to START with the empirically motivated features research has discovered and then develop formalizations that encode them. More pointedly, this seems to contrast with Alex's favored method of choosing some arbitrary formalism (simple MG) and then insisting that anyone who thinks that this is the wrong formalism for the problem at hand (e.g. moi) is, though "ballsy" (hmm, really? OK), "not really playing the game." Au contraire: to be interesting, the formal game requires formalizing the right things. If research has found that FL/UG contains islands and minimality, then to be interesting your formalization had better code both of these restrictions. If it doesn't, it's just the wrong formalization and is not, and should not be, part of any game anyone focused on FL/UG plays. There may be some other game (as David Adger suggests in his comments on the post), but it is arguably of no obvious relevance to any research that syntacticians like moi are engaged in and of dubious relevance to the investigation of FL/UG or the acquisition of grammar. Boy, that felt good to say!
Now, it is possible that some grammars encode inconsistent principles and that formalization could demonstrate this (think Russell's Paradox and Frege's naïve set theory). However, this is not at issue here. What is at issue is how one conceives of the proper role of formalization in this sort of inquiry. Frankly, I am a big fan. I think that there has been some very insightful work of the formalizing sort. However, there has also been a lot of bullying. And there has been a lot of misguided rhetoric. Formalization is useful, but hardly indispensable. Remember, Euclidean geometry did just fine for thousands of years before finally being formalized by Hilbert, as did the Calculus before Cauchy/Weierstrass. Not to mention standard work in biology and physics, which though (sometimes) mathematical is hardly formalized (not at all the same thing; formal does not equate with formalized). What we need are clear models that make clear predictions and that can be explored. Formalization can help in this process, and to the degree that it does, it should be encouraged. But PULEEEZE, it is not a panacea and it is not even a pre-requisite for good work. And, in general, it should be understood to be the handmaiden of theory, not its taskmaster. To repeat, formalizations that formalize the wrong things or leave out the right things are of questionable value. Jim Higginbotham has expressed this well in his discussion of whether English is a context free language (here). As he put it:
…once our attention turns to core grammar as the primary object of linguistic study, questions such as the one that I have tried to answer here are of secondary importance (232).
What matters are the properties of FL/UG. Formalizations that encompass these are interesting and can be useful tools for investigating further properties of FL/UG. But, and this is the important part (maybe this is what makes my attitude "ballsy"), they must earn their keep, and if they fail to code the relevant features of "core grammar" or FL/UG then, no matter how careful and precise their claims, I don't see what they bring to the game. Quite often what we find is all hat, no cattle.
1. The traces are here to mark the possible base positions of can. (*…) means 'unacceptable if included' while *(…) means 'unacceptable if left out.'
2. There are also actual language acquisition studies, like Crain and Nakayama's, that are relevant to evaluating the UG claim.
3. Indeed, it need not be a feature of UG at all, at least in principle. Imagine an argument to the effect that learning in general, i.e. whatever the cognitive domain, ignores string linear information. Were that so, it would suffice to explain why such information is ignored in the acquisition of Gs. However, this view strikes me as quite exotic and, I believe, would cause no small degree of problems in the acquisition of phonology, for example, where string linear relations are where all the action is.
4. I assume that this is how to treat AF. If, however, adverbs could be base generated sentence initially and their interpretation were subject to a rule like "modify the nearest V," then AF phenomena would be entirely analogous to TCMs. The main reason I doubt that this is the right analysis comes from data like that in note 7, where it appears that adverbs can move quite a distance from the verbs they modify. This is certainly true of WH adverbs like when and how, but I also find cases like (i) in note 7 acceptable with long distance readings.
5. TCM also seems obligatory in WH questions, though there is some debate right now about whether TCM applies in questions like who left. For Y/Ns, however, it is always required in matrix clauses. Here is a shameless plug for a discussion of these matters in a recent joint paper with Vicki Carstens and Dan Seely (here).
6. I hope it goes without saying that I am simplifying matters here. The relevant F is not +ADV, for example, but something closer to +focus, but none of this is relevant here as far as I can see.
7. There is one further important difference between TCM and AF. The latter is not strictly local. Thus, in (i) tomorrow can modify either the matrix or the embedded clause. Nonetheless, it cannot modify the relative clause in (ii):
(i) Tomorrow, NBR is reporting that Bill will be in Moscow
(ii) Tomorrow, the news anchor that is reporting from DC will be in Moscow
8. I am setting aside the question of whether there is a way of unifying the two. It is to be hoped that there is, but I don't know of any currently workable suggestions. Such a unification would not alter anything said below, though it would make for a richer and nicer theory of FL/UG.
9. In other papers Stabler considers MGs that allow for sidewards movement and those that don't. It is nice to see that to date the theoretical innovations proposed have been formalizable in pretty straightforward ways, or so it appears.
10. Stabler observes the obvious parallelism with bounding node versions of Subjacency. The isomorphism between the older GB theory and modern Phase Theory means that Phase Theory does not advance our understanding of islands in any minimalistically interesting way. However, for current purposes the fact that Phases and the PIC can code the CNPC and SC suffices. We should all welcome the day when we have a deeper understanding of the locality restrictions underlying islands.
11. IMO, this is the best reason to formalize: to see if two things that look very different may nonetheless be similar (or identical) with respect to other properties of interest. It is possible to be careful and clear without being formalized (indeed, the latter often obscures as much as it enlightens). However, formalizing often allows one to take a bird's eye view of theoretical matters that matter, and when used thus it can be extremely enlightening.
12. Let me insist: this is not to argue that formalization does not have a place in such investigations. Rather, it is to argue that the fact that some formalization fails to make the relevant cut is generally a problem with the formalization, not with the adequacy of the empirical cut. See below.
13. I am a big fan of Bob Berwick's work, as well as that of Stabler and Co., Tim Hunter (with Chris and alone), and the stuff on phonology by Heinz and Idsardi, to name a few. What makes these all interesting to me is their being anchored firmly in the syntactic theory literature.
14. Thx to Bob Berwick for the reference.