As readers may have noticed (even my mother has noticed!), I am very fond of Poverty of Stimulus (POS) arguments. Executed well, they generate slews of plausible candidate structures for FL/UG. Given my delight in these, I have always wondered why many otherwise intelligent-looking/sounding people don’t find them nearly as suggestive/convincing as I do. It could be that they are not nearly as acute as they appear (unlikely), or it could be that I am wrong (inconceivable!), or it could be that discussants are failing to notice where the differences lie. I would like to explore this last possibility by describing two different senses of pattern, one congenial to an empiricist mindset and one much less so. The attachment to one sense or the other is not, I suspect, a conscious conviction, so highlighting the difference may allow for a clearer understanding of where the disagreement lies, even if it does not lead to a Kumbaya resolution of differences. Here goes.
The point I want to make rests on a cute thought experiment suggested by an observation of David Berlinski’s in his very funny, highly readable and strongly recommended (especially for those who got off on Feyerabend’s jazz-style writing in Against Method) book Black Mischief. Berlinski discusses two kinds of patterns.
The first is illustrated in the following non-terminating decimal expansions:
1. (a) .222222…
(b) .333333…
(c) .454545…
(d) .123412341234…
If asked to continue into the … range, a normal person (i.e. a college undergrad, the canonical psych subject and the only person buyable with a few “extra” credits, i.e. cheap) would continue (1a) with more 2s, (1b) with more 3s, (1c) with more 45s and (1d) with more 1234s. Why? Because the average person would detect the indicated pattern and generalize accordingly. People are good at detecting patterns of this sort. Hume discussed this kind of pattern recognition behavior, as have empiricists ever since. What the examples in (1) illustrate is constant conjunction, and this leads to a simple pattern that humans have little trouble extracting (at least in the simple cases[1]).
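To make the empiricist picture concrete, here is a minimal sketch in Python (my own illustration, not Berlinski’s; the function names are invented for the example) of the kind of surface-pattern extraction at issue: find the shortest repeating block in the observed digits and tile it forward.

```python
def shortest_period(digits: str) -> int:
    """Length of the shortest block that, when repeated, reproduces `digits`."""
    for p in range(1, len(digits) + 1):
        # p == len(digits) always succeeds, so the loop always returns.
        if all(digits[i] == digits[i % p] for i in range(len(digits))):
            return p

def continue_expansion(digits: str, extra: int) -> str:
    """Extend the observed digits by `extra` more, assuming the pattern repeats."""
    block = digits[: shortest_period(digits)]
    tiled = block * ((len(digits) + extra) // len(block) + 1)
    return tiled[: len(digits) + extra]

# The cases in (1): the generating rule is visible in the data itself.
for observed in ["222222", "333333", "454545", "123412341234"]:
    print("." + observed, "->", "." + continue_expansion(observed, 4))
```

Note that the procedure is read directly off the surface of the data; nothing beyond the observed patterning is consulted.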
Now as we all know, this will not get us great results for
examples like (2).
2. (a) .141592653589793…
(b) .718281828459045…
The cognoscenti will have recognized (2a) as the decimal part of the decimal expansion of π (first 15 digits) and (2b) as the decimal part of the decimal expansion of e (first 15 digits). If our all-purpose undergrad were asked to continue the series he would have a lot of trouble doing so (don’t take my word for it: try the next three digits[2]).
Why? Because these decimal expansions display no repeating pattern; that is precisely what makes these numbers irrational, in contrast with the rational numbers in (1). However, and this is important, the fact that they don’t display a pattern does not mean that it is impossible to generate the decimal expansions in (2). It is possible, and there are well-known algorithms for doing so (as we display anon). But though there are generative procedures for calculating the decimal expansions of π and e, these procedures differ from the ones underlying (1) in that their products don’t exhibit a perceptible pattern. The cases, we might say, contrast in that the sequences in (1) carry the procedures for generating them in their very patterning (add 2, 3, 45, or 1234 to the end), while this is not so for the examples in (2). Put crudely, constant conjunction and association exercised on the patterning of 2s in (1a) lead to ‘keep adding 2’ as the rule for generating (1a), while inspecting the patterning of digits in (2a) suggests nothing whatsoever about the rule that generates it (e.g. (3a)). And this, I believe, is an important conceptual fault line separating empiricists from rationalists. For empiricists, the paradigm case of a generative procedure is intimately related to the observable patternings it generates, while rationalists have generally eschewed any “resemblance” between the generative procedure and the objects generated. Let me explain.
As Chomsky has repeatedly, and correctly, insisted, everybody assumes that learners come to the task of language acquisition with biases. This just means that everyone agrees that what is acquired is not a list, but a procedure that allows for unbounded extension of the given (finite) examples in determinate ways. Thus everyone (viz. both empiricists and rationalists, i.e. both Chomsky and his critics) agrees that the aim is to specify what biases a learner brings to the acquisition task. The difference
lies in the nature of the biases each is willing to consider. Empiricists are
happy with biases that allow for the filtering of patterns from data.[3]
Their leading idea is that data reveals patterns and that learning amounts to
finding these in the data. In other
words, they picture the problem of learning as roughly illustrated by the
example in (1). Rationalists agree that this kind of learning exists,[4] but hold that there are also learning problems akin to that illustrated in (2), and that this kind of learning demands a departure from algorithms that look for “simple” patternings of data. In fact, it requires something like a pre-specification of the possible generative procedures. Here’s what I mean.
Consider learning the decimal expansion of π. It’s possible to “learn” that some digit sequence is that of π by sampling the data (i.e. the digits) if, for example, one is biased to consider only a finite number of pre-specified procedures. Concretely, say I am given the generative procedures in (3a) and (3b) and am shown the digits in (2a). Could I discover how to continue the sequence so armed? Of course. I could quickly come to “know” that (3a) is the right generative procedure and so I could continue adding to the … as desired.
3. (a) $\pi = 2\sum_{k=0}^{\infty}\frac{k!}{(2k+1)!!} = 2\sum_{k=0}^{\infty}\frac{2^k\,(k!)^2}{(2k+1)!} = 2\left[1+\frac{1}{3}\left(1+\frac{2}{5}\left(1+\frac{3}{7}\left(1+\cdots\right)\right)\right)\right]$
(b) $e = \lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n = 1+\frac{1}{1!}+\frac{1}{2!}+\frac{1}{3!}+\cdots$
How would I come to know this? By plugging several values for k, n into (3a,b) and seeing what pops out. (3a) will spit out the sequence in (2a) and (3b) that of (2b). These generative procedures diverge very quickly. Indeed, the first computed digit makes us confident that, asked to choose between (3a) and (3b) given the data in (2a), (3a) is an easy choice. The moral: even if there are no patterns in the data, learning is possible if the range of relevant choices is sufficiently articulated and bounded.
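By way of contrast, here is a minimal sketch, again in Python and again with invented helper names, of learning as selection among pre-specified hypotheses: compute digits from each candidate procedure in (3) with exact rational arithmetic and pick whichever matches the observed data in (2a).

```python
from fractions import Fraction

def pi_by_3a(terms: int = 60) -> Fraction:
    """Partial sum of (3a): pi = 2 * sum_{k>=0} 2^k (k!)^2 / (2k+1)!."""
    total, term = Fraction(0), Fraction(1)   # the k = 0 term equals 1
    for k in range(terms):
        total += term
        term *= Fraction(k + 1, 2 * k + 3)   # ratio of term k+1 to term k
    return 2 * total

def e_by_3b(terms: int = 25) -> Fraction:
    """Partial sum of (3b): e = 1/0! + 1/1! + 1/2! + ..."""
    total, term = Fraction(0), Fraction(1)
    for n in range(terms):
        total += term
        term /= n + 1                        # 1/n! -> 1/(n+1)!
    return total

def decimal_part(x: Fraction, n: int) -> str:
    """First n digits after the decimal point of a positive rational x."""
    frac = x - x.numerator // x.denominator
    return str(int(frac * 10**n)).zfill(n)

observed = "141592653589793"                 # the data in (2a)
hypotheses = {"(3a)": decimal_part(pi_by_3a(), 15),
              "(3b)": decimal_part(e_by_3b(), 15)}
# The learner picks the pre-specified procedure whose output matches the data;
# the very first digit (1 vs. 7) already decides the matter.
print([name for name, digits in hypotheses.items() if digits == observed])  # ['(3a)']
```

The point survives the sketch: no amount of staring at the digit string reveals (3a), but a learner equipped in advance with (3a) and (3b) as the live options converges almost immediately.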
This is just a thought experiment, but I think that it highlights several features of importance. First, everyone is knee-deep in given biases, aka innate, given modes of generalization. The question is not whether these exist but what they are. Empiricists, from the Rationalist point of view, unduly restrict the admissible biases to those constructed to find patterns in the data. Second, even in the absence of patterned data, learning is possible if we consider it as a choice among given hypotheses. Structured hypothesis spaces allow one to find generative procedures whose products display no obvious patterns. Bayesians, by the way, should be happy with this last point, as nothing in their methods restricts what’s in the hypothesis space. Bayes instructs us how to navigate the space given input data. It has nothing to say about what’s in the space of options to begin with. Consequently, there is no a priori reason for restricting it to some functions rather than others. The matter, in other words, is entirely empirical. Last, for any problem of interest, it pays to ask whether it is more like that illustrated in (1) or in (2). One way of understanding Chomsky’s point is that when we understand what we want to explain, i.e. that linguistic competence amounts to a mastery of “constrained homophony” over an unbounded domain of linguistic objects (see here), then the problem looks much more like that in (2) than in (1), viz. there are very few (1)-type patterns in the data when you look closely, and there are even fewer when the nature of the PLD is considered. In other words, Chomsky’s bet (and on this I think he is exactly right) is that the logical problem of language acquisition looks much more like (2) than like (1).
A historical aside:
Here, Cartwright provides the ingredients for a nice reconstructed history.
Putting more than a few words in her mouth, it would go something like this:
In the beginning there was Aristotle. For him, minds could form concepts/identify substances from observation of the elements that instanced them (you learn ‘tiger’ by inspecting tigers; tiger-patterns lead to ‘tiger’ concepts/extracted tiger-substances). The 17th century dumped Aristotle’s epistemology and metaphysics. One strain rejected the substances and substituted the patterns visible to the naked eye (there is no concept/substance ‘tiger’, just some perceptible tiger patternings). This grew up to become Empiricism. The second retained the idea of concepts/substances but gave up the idea that these were necessarily manifest in the visible surface properties of experience (so ‘tiger’ may be triggered by tigers, but the concept contains a whole lot more than what was provided in experience, even what was provided in the patternings). This view grew up to be Rationalism. Empiricists rejected the idea that conceptual contents contain more than meets the eye; Rationalists gave up the idea that the contents of concepts are exhausted by what meets the eye.
Interestingly, this
discussion persists. See for example Marr’s critique of Gibsonian theories of
visual perception here. In sum, the idea that learning is restricted to
patterns extractable from experience, though wrong, has a long and venerable
pedigree. So too the Rationalist alternative. A rule of thumb: for every
Aristotle there is a corresponding Plato (and, of course, vice versa).
[1]
There is surely a bound to this. Consider a decimal expansion whose period is a sequence of 2,500 digits. This would likely be hard to spot, and the wonders of “constant” conjunction would be much less apparent.
[2]
Answer: for π: 2, 3, 8; for e: 2, 3, 5.
[3]
Hence the ton of work done on categorization, categorization of prior
categorizations, categorization of prior categorizations of prior
categorizations…
[4]
Or may exist. Whether it does is likely more complicated than usually assumed, as Randy Gallistel’s work has shown. If Randy is right, then even the parade cases for associationism are considerably less empiricist than often assumed.