I love Whig History (WH). I have even tried my hand at it (here, here, here, here). What sets WHs apart from actual history is that they abstract away from the accidents of history and, if successful, reveal the “inner logic” of historical events. A Whig History’s conceit is that it outlines how events should have unfolded had they been guided by rational considerations. We all know that these are never all that goes on, but scientists hope that this goes on often enough, if not at the individual level, then at the level of the discipline as a whole. Even this might be debated (see here), but to the degree that we can fashion a WH, to that degree we can rationally guide inquiry by learning from our past mistakes and accomplishments. It’s a noble hope, and I am a fervent believer.
Given this, I am always on the lookout for good rational reconstructions of linguistic history. I recently came across a very good one by Tom Bever that I want to make available (here). Let me mark a few of my personal favorite highlights.
1. The paper starts with a nice contrast between Behaviorist (B) and Rationalist (R) methodologies:
The behaviorist prescribes possible adult structures in terms of a theory of what can be learned; the rationalist explores adult structures in order to find out what a developmental theory must explain.
Two comments: First, throughout the paper, TB contrasts B with R. However, the right contrast is Empiricism (E) with R, B being a particularly pernicious species of E. Es take mental structures to be inductive products of sensory inputs. Bs repudiate mental structures altogether, favoring direct correlation with environmental stimuli. So whereas Es allow for mental structures which are reducible to environmental parameters, Bs eschew even these. Chomsky’s anti-E arguments were not confined to the B version of E. They extend to all Associationist conceptions.
Second, TB’s observation regarding the contrasting directions of explanation for Es and Rs exposes E’s unscientific a priorism. Es start with an unfounded theory of learning and infer from this what is and is not acquirable/learnable. This relies on the (incorrect) assumption that the learning theory is well-grounded and so can be used to legislate acquisition’s limits.
Why such confidence in the learning theory? I am not entirely certain. In part, I suspect that this is because Es confuse two different issues: they run together the pretty obviously correct observation that belief fixation causally requires stimulus input (e.g. I speak west-island Montreal English because I was raised in an English-speaking community of west-island Montrealers) with the general conception that all beliefs can be logically reduced to inductions over observational (viz. sensational) inputs. Rs can (and do) accept the first truistic part while rejecting the second much stronger conception (e.g. the autonomy of syntax thesis just is the claim that syntactic categories and processes cannot be reduced to either semantic or phonetic (i.e. observational) inputs). Here’s where Rs introduce the notion of an environmental “trigger.” Stimuli can trigger the emergence of beliefs. They do not shape them. Beliefs are more than congeries of stimuli. They have properties of their own not reducible to (inductive) properties of the observational inputs.
Rs reverse the E direction of inquiry. Rs start with a description of the beliefs attained and then ask what kind of acquisition mechanism is required to fix the beliefs so described. In short, Rs argue from facts describable in (relatively) neutral theoretical terms and then look for cognitive theories able to derive these data. If this looks like standard scientific practice, it’s because it is. Theories that ascribe a priori knowledge to the acquisition system (as R accounts typically do) need not themselves suffer from methodological a priorism (as E theories of learning typically do). These points have often been confused. Why? R has suffered from a branding problem. The morphological connection between ‘empiricism’ and ‘empirical’ has misled many into thinking that Es care about the data while Rs don’t. False. If anything, the reverse is closer to the truth, for Rs do not put unfounded a priori restrictions on the class of admissible explananda.
2. Empiricism in linguistics had a particular theoretical face: the discovery procedure (DP), understood as follows (115):
Language was to be described in a hierarchy of levels of learned units such that the units at each level can be expressed as a grouping of units at an intuitively lower level. The lowest level was necessarily composed of physically definable units.
This conception has a very modern ring. It’s the intuition that lies behind Deep Learning (DL) (see here). DL exploits a simple idea: that learning not only induces from the observational input but that outputs of prior inductions can serve as inputs to later (more “abstract”) ones. In contrast to turtles, it’s inductions all the way up. DL, then, is just the rediscovery of DPs, this time with slightly fancier machines and algorithms. DL is now very much in vogue. It informs the work of psychologists like Elissa Newport, among others. However, whatever its technological virtues, GGers know it to be an inadequate theory of language acquisition. How do we know this? Because we’ve run around this track before. DL is a gussied-up DP and all the new surface embroidery does not make it any more adequate as an acquisition model for language. Why not? Because higher levels are not just inductive generalizations over lower ones. Levels have their own distinctive properties, and this we have known for at least 60 years.
TB’s discussion of DPs and their empirical failures is very informative (especially Harris’s contribution to the structuralist DP enterprise). It also makes clear why the notion of “levels” and, in particular, their “autonomy” is such a big deal. If levels enjoy autonomy then they cannot be reduced to generalizations over information at earlier levels. There can, of course, be mapping relations between levels, but reduction is impossible. Furthermore, in contrast to DP (and DL) there is no asymmetry to the permissible information flow: lower levels can speak to higher ones and vice versa. Given the contemporary scene, there is a certain déjà vu quality to TB’s history, and the lessons learned 60 years ago have, unfortunately, been largely unlearned. In other words, TB’s discussion is, sadly, very relevant still.
3. Linguistics and Psycholinguistics
The bulk of TB’s paper is a discussion of how early theories of GG mixed with the ambitions of psychologists. GG is a theory of competence. We investigate this competence by examining native speaker judgments under “reflective equilibrium.” Such judgments abstract away from the baleful effects of resource limitations such as memory restrictions or inattention and (it is hoped) this allows for a clear inspection of the system of linguistic knowledge as such. As TB notes, very early on there was an interesting interaction between GG so understood and theories of linguistic behavior (122):
Linguistics made a firm point of insisting that, at most, a grammar was a model of “competence” – what the speaker knows. This was distinguished from “performance” – how the speaker implements this knowledge. But, despite this distinction, the syntactic model had great appeal as a model of the processes we carry out when we talk and listen. It offered a precise answer to the question of what we know when we know the sentences in our language: we know the different coherent levels of representation and the linguistic rules that interrelate those levels. It was tempting to postulate that the theory of what we know is a theory of what we do…This knowledge is linked to behavior in such a way that every syntactic operation corresponds to a psychological process…
Testing the hypothesis that there is a one-to-one relation between grammatical rules/levels and psychological processes and structures was described as investigating the “psychological reality” of linguistic structures/operations in ongoing behavior. In other words, how well does linguistic theory accommodate behavioral measures (confusability, production time, processing time, memorizability, priming) of language use in real time? TB reviews this history, and it is fascinating.
A couple of comments: First, the use of the term “psychological reality” was unfortunate. It implied that what GG studied was not a part of psychology. However, this, if TB is right, was not the intent. Rather, the aim was to see if the notions that GGers used to great effect in describing linguistic knowledge could be extended to directly explain occurrent linguistic behavior. TB’s review suggests that the answer is in part “yes!” (see TB’s discussion of the click experiments, especially as regards deep structure on 127). However, there were problems as well, at least as regards early theories. Curiously, IMO, one interesting feature of TB’s discussion is that the problems cited for the “identification thesis” (IT) are far less obvious from the vantage point of today’s Gs than those of yesteryear.
Let me put this another way: one thing that theorists like to ask experimentalists is what the latter bring to the theoretical table. There is a constant demand that psycholinguistic results have implications for theories of competence. Now, I am not one who believes that the goal of psycholinguistic research should be to answer the questions that most amuse me. There are other questions of linguistic interest. However, the early history that TB reviews provides potentially interesting examples of how psycholinguistic results would have been useful for theoreticians to consider. In particular TB offers examples in which the psycholinguistic results of this period pointed towards more modern theories earlier than purely linguistic considerations did (e.g. see the discussion of particle movement (125) or dative shift (124)). Thus, this period offers examples of what many keep asking for, and so they are worth thinking about.
Second, TB argues that the “psychological reality” considerations had mixed results. The consensus was that there is lots of evidence for the “reality” of linguistic levels but less evidence that G rules and psychological processes are in a one-to-one relation. In other words, there is consensus that the Derivational Theory of Complexity (DTC) is wrong.
For what it’s worth, my own view is that this conclusion is overstated. IMO it’s hard to see how the DTC could be wrong (see here). Of course, this does not mean that we yet understand how it is right. Nonetheless, a reasonable research program is to see how far we can get in assuming that there is a very high level of transparency between the operations and structures of our best competence theories and those of our best performance theories. At least as a regulative ideal, this looks like a good assumption, and it has produced some very interesting work (e.g. see here).
Let’s end. Tom Bever has written a very useful paper on a fascinating period of GG history. It’s a very good read, with lessons of great contemporary relevance. I wish that were not so, but it is. So take a look.
Note: If internal representations map perfectly onto environmental variables, then the advantages of the former are unclear. However, eschewing representations altogether is not a hallmark of classical Eism.