I love Whig History (WH). I have even tried my hand at it (here,
here,
here,
here).
What sets a WH apart from actual history is that it abstracts away from the
accidents of history and, if successful, reveals the “inner logic” of historical
events. A Whig History’s conceit is that it outlines how events should have unfolded had they been
guided by rational considerations. We all know that these are never all that
goes on, but scientists hope that this goes on often enough, if not at the
individual level, then at the level of the discipline as a whole. Even this
might be debated (see here),
but to the degree that we can fashion a WH, to that degree we can rationally
guide inquiry by learning from our past mistakes and accomplishments. It’s a
noble hope, and I am a fervent believer.
Given this, I am always on the lookout for good rational
reconstructions of linguistic history. I recently came across a very good one
by Tom Bever that I want to make available (here).
Let me mark a few of my personal favorite highlights.
1. The
paper starts with a nice contrast between Behaviorist (B) and Rationalist (R) methodologies:
The behaviorist prescribes possible adult structures in
terms of a theory of what can be learned; the rationalist explores adult structures in order to find out what a developmental
theory must explain.
Two comments: First, throughout the paper, TB contrasts B
with R. However, the right contrast is Empiricism (E) with R, B being a particularly
pernicious species of E. Es take mental structures to be inductive products of
sensory inputs. Bs repudiate mental structures altogether, favoring direct correlation
with environmental stimuli. So whereas Es allow for mental structures that are
reducible to environmental parameters, Bs eschew even these.[1] Chomsky’s anti-E arguments were not confined
to the B version of E. They extend to all Associationist conceptions.
Second, TB’s observation regarding the contrasting
directions of explanation for Es and Rs exposes E’s unscientific a priorism.
Es start with an unfounded theory of
learning and infer from this what is and is not acquirable/learnable. This
relies on the (incorrect) assumption that the learning theory is well-grounded
and so can be used to legislate acquisition’s limits.
Why such confidence in the learning theory? I am not
entirely certain. In part, I suspect that this is because Es confuse two
different issues: they run together the pretty obviously correct observation that
belief fixation causally requires stimulus input (e.g. I speak West Island
Montreal English because I was raised in an English-speaking community of
West Island Montrealers) with the general conception that all beliefs can be logically
reduced to inductions over observational (viz. sensational) inputs. Rs can (and
do) accept the first truistic part while rejecting the second much stronger
conception (e.g. the autonomy of syntax thesis just is the claim that syntactic
categories and processes cannot be reduced to either semantic or phonetic (i.e.
observational) inputs). Here’s where Rs introduce the notion of an
environmental “trigger.” Stimuli can trigger the emergence of beliefs. They do
not shape them. Beliefs are more than congeries of stimuli. They have
properties of their own not reducible to (inductive) properties of the
observational inputs.
Rs reverse the E direction of inquiry. Rs start with a
description of the beliefs attained and then ask what kind of acquisition
mechanism is required to fix the beliefs so described. In short, Rs argue from
facts describable in (relatively) neutral theoretical terms and then look for
cognitive theories able to derive these data. If this looks like standard
scientific practice, it’s because it is. Theories that ascribe a priori knowledge to the acquisition
system (as R accounts typically do) need not themselves suffer from
methodological a priorism (as E
theories of learning typically do). These points have often been confused. Why?
R has suffered from a branding problem. The morphological connection between
‘empiricism’ and ‘empirical’ has misled many into thinking that Es care about
the data while Rs don’t. False. If anything, the reverse is closer to the
truth, for Rs do not put unfounded a
priori restrictions on the class of admissible explananda.
2. Empiricism
in linguistics had a particular theoretical face: the discovery procedure (DP),
understood as follows (115):
Language was to be described in a
hierarchy of levels of learned units such that the units at each level can be
expressed as a grouping of units at an intuitively lower level. The lowest
level was necessarily composed of physically definable units.
This conception has a very modern ring. It’s the intuition
that lies behind Deep Learning (DL) (see here). DL exploits a
simple idea: that learning not only induces from the observational input but
that outputs of prior inductions can serve as inputs to later (more “abstract”)
ones. In contrast to turtles, it’s inductions all the way up. DL, then, is just
the rediscovery of DPs, this time with slightly fancier machines and
algorithms. DL is now very much in vogue. It informs the work of psychologists
like Elissa Newport, among others. However, whatever its technological virtues,
GGers know it to be an inadequate theory
of language acquisition. How do we know this? Because we’ve run around this track
before. DL is a gussied-up DP, and all the new surface embroidery does not make
it any more adequate as an acquisition model for language. Why not? Because
higher levels are not just inductive generalizations over lower ones. Levels
have their own distinctive properties, and this we have known for at least 60
years.
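To make the picture concrete, here is a minimal toy sketch of the DP/DL architecture just described. It is mine, not TB’s or anyone’s actual model, and the “induction” at each step is reduced to an illustrative stand-in (a random linear recombination plus a squashing function); the point is only the shape of the architecture: each level is built solely out of the outputs of the level below, and nothing gives a level properties of its own.

```python
# Illustrative sketch only: each "level" is induced purely from the outputs
# of the level below it, as in the DP/DL picture described above.
import numpy as np

rng = np.random.default_rng(0)

def induce_level(lower_level_units, n_units):
    """Stand-in for 'learning' a higher level as a grouping of lower-level
    units: here just a random linear recombination plus a nonlinearity."""
    weights = rng.normal(size=(lower_level_units.shape[1], n_units))
    return np.tanh(lower_level_units @ weights)

# Lowest level: "physically definable units" (toy observation vectors).
observations = rng.normal(size=(100, 20))

# Inductions all the way up: each level sees only the level below.
level_1 = induce_level(observations, 12)   # e.g. phone-like units
level_2 = induce_level(level_1, 8)         # e.g. morpheme-like units
level_3 = induce_level(level_2, 4)         # e.g. phrase-like units

# The GG objection: in this architecture higher levels can only be
# generalizations over lower ones, and information flows only upward.
print(level_3.shape)  # (100, 4)
```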
TB’s discussion of DPs and their empirical failures is very
informative (especially Harris’s contribution to the structuralist DP
enterprise). It also makes clear why the notion of “levels” and, in particular,
their “autonomy” is such a big deal. If levels enjoy autonomy then they cannot
be reduced to generalizations over information at earlier levels. There can, of
course, be mapping relations between levels, but reduction is impossible. Furthermore,
in contrast to DP (and DL) there is no asymmetry to the permissible information
flow: lower levels can speak to higher ones and vice versa. Given the
contemporary scene, there is a certain déjà vu quality to TB’s history, and the
lessons learned 60 years ago have, unfortunately, been largely unlearned. In
other words, TB’s discussion is, sadly, very relevant still.
3. Linguistics
and Psycholinguistics
The bulk of TB’s paper is a discussion of how early theories
of GG mixed with the ambitions of psychologists. GG is a theory of competence.
We investigate this competence by examining native speaker judgments under “reflective
equilibrium.” Such judgments abstract away from the baleful effects of
resource limitations such as memory restrictions or inattention and (it is
hoped) this allows for a clear inspection of the system of linguistic knowledge
as such. As TB notes, very early on there was an interesting interaction
between GG so understood and theories of linguistic behavior (122):
Linguistics made a firm point of
insisting that, at most, a grammar was a model of “competence” – what the
speaker knows. This was distinguished from “performance” – how the speaker
implements this knowledge. But, despite this distinction, the syntactic model
had great appeal as a model of the processes we carry out when we talk and
listen. It offered a precise answer to the question of what we know when we
know the sentences in our language: we know the different coherent levels of
representation and the linguistic rules
that interrelate those levels. It was tempting to postulate that the theory of
what we know is a theory of what we do…This knowledge is linked to behavior in
such a way that every syntactic operation corresponds to a psychological
process…
Testing the hypothesis that there is a one-to-one relation
between grammatical rules/levels and psychological processes and structures was
described as investigating the “psychological reality” of linguistic
structures/operations in ongoing
behavior. In other words, how well does linguistic theory accommodate behavioral
measures (confusability, production time, processing time, memorizability,
priming) of language use in real time? TB reviews this history, and it is
fascinating.
A couple of comments: First, the use of the term
“psychological reality” was unfortunate. It implied that what GG studied was not a part of psychology. However, this,
if TB is right, was not the intent. Rather, the aim was to see if the notions
that GGers used to great effect in describing linguistic knowledge could be
extended to directly explain occurrent linguistic behavior. TB’s review
suggests that the answer is in part
“yes!” (see TB’s discussion of the click experiments, especially as regards
deep structure on 127). However, there were problems as well, at least as
regards early theories. Curiously, IMO, one interesting feature of TB’s
discussion is that the problems cited for the “identification thesis” (IT) are
far less obvious from the vantage point of today’s Gs than from those of yesteryear.
Let me put this another way: one thing that theorists like
to ask experimentalists is what the latter bring to the theoretical table.
There is a constant demand that psycholinguistic results have implications for
theories of competence. Now, I am not
one who believes that the goal of psycholinguistic research should be to answer the questions that
most amuse me. There are other questions of linguistic interest. However, the
early history that TB reviews provides potentially interesting examples of how
psycholinguistic results would have been useful for theoreticians to consider. In
particular, TB offers examples in which the psycholinguistic results of this
period pointed towards more modern theories earlier than purely linguistic considerations
did (e.g. see the discussion of particle movement (125) or dative shift (124)).
Thus, this period offers examples of what many keep asking for, and so they are
worth thinking about.
Second, TB argues that the “psychological reality”
considerations had mixed results. The consensus was that there is lots of
evidence for the “reality” of linguistic levels but less evidence that G rules
and psychological processes are in a one-to-one relation. In other words, there
is consensus that the Derivational Theory of Complexity (DTC) is wrong.
For what it’s worth, my own view is that this conclusion is
overstated. IMO it’s hard to see how the DTC could be wrong (see here).
Of course, this does not mean that we yet understand how it is right. Nonetheless, a reasonable research program is
to see how far we can get by assuming that there is a very high level of
transparency between the operations and structures of our best competence
theories and those of our best performance theories. At least as a regulative ideal,
this looks like a good assumption, and it has produced some very interesting
work (e.g. see here).
Let’s end. Tom Bever has written a very useful paper on a
fascinating period of GG history. It’s a very good read, with lessons of great
contemporary relevance. I wish that were not so, but it is. So take a look.
[1]
If internal representations map perfectly onto environmental variables, then
the advantages of the former are unclear. However, eschewing representations
altogether is not a hallmark of classical Eism.