As readers may have noticed (even my mother has noticed!), I am very fond of Poverty of Stimulus arguments (POS). Executed well, POSs generate slews of plausible candidate structures for FL/UG. Given my delight in these, I have always wondered why it is that many otherwise intelligent looking/sounding people don’t find them nearly as suggestive/convincing as I do. It could be that they are not nearly as acute as they appear (unlikely), or it could be that I am wrong (inconceivable!), or it could be that discussants are failing to notice where the differences lie. I would like to explore this last possibility by describing two different senses of pattern, one congenial to an empiricist mindset, and one not so much. The difference is not, I suspect, a conscious conviction, and so highlighting it may allow for a clearer understanding of where disagreement lies, even if it does not lead to a Kumbaya resolution of differences. Here goes.
The point I want to make rests on a cute thought experiment prompted by an observation of David Berlinski’s in his very funny, highly readable and strongly recommended (especially to those who got off on Feyerabend’s jazzy style of writing in Against Method) book Black Mischief. Berlinski discusses two kinds of patterns. The first is illustrated in the following non-terminating decimal expansions:
1. (a) .222222…
(b) .333333…
(c) .454545…
(d) .123412341234…
If asked to continue into the … range, a normal person (i.e. a college undergrad, the canonical psych subject and the only person buyable with a few “extra” credits, i.e. cheap) would continue (1a) with more 2s, (1b) with more 3s, (1c) with more 45s and (1d) with more 1234s. Why? Because the average person would detect the indicated pattern and generalize accordingly. People are good at detecting patterns of this sort. Hume discussed this kind of pattern recognition behavior, as have empiricists ever since. What the examples in (1) illustrate is constant conjunction, and this leads to a simple pattern that humans have little trouble extracting (at least in the simple cases).
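This empiricist routine is easy to mechanize. Here is a minimal sketch (the function name and interface are my own invention, not Berlinski’s): a learner that hunts for the shortest repeating block in the observed digits and, if it finds one, extends the sequence with it.

```python
# A toy "empiricist" learner: detect the shortest repeating block
# (constant conjunction) in a digit string and use it to generalize.
def continue_pattern(digits: str, extra: int) -> str:
    """Extend `digits` by `extra` places using the shortest period found."""
    for period in range(1, len(digits)):
        block = digits[:period]
        # Does repeating `block` reproduce every observed digit?
        if all(digits[i] == block[i % period] for i in range(len(digits))):
            target = len(digits) + extra
            return (block * (target // period + 1))[:target]
    return digits  # no repeating pattern: nothing to generalize from

print(continue_pattern("222222", 4))        # 2222222222  (1a)
print(continue_pattern("454545", 4))        # 4545454545  (1c)
print(continue_pattern("123412341234", 4))  # 1234123412341234  (1d)
```

Fed the digits of (2a) instead, the same learner finds no period and returns its input unchanged: constant conjunction gives it nothing to chew on.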
Now as we all know, this will not get us great results for examples like (2).
2. (a) .141592653589793…
(b) .718281828459045…
The cognoscenti will have recognized (2a) as the decimal part of the decimal expansion of π (first 15 digits) and (2b) as the decimal part of the decimal expansion of e (first 15 digits). If our all-purpose undergrad were asked to continue the series he would have a lot of trouble doing so (Don’t take my word for it. Try the next three digits). Why? Because these decimal expansions don’t display a regular pattern, as they have none. That’s what makes these numbers irrational, in contrast with the rational numbers in (1). However, and this is important, the fact that they don’t display a pattern does not mean that it is impossible to generate the decimal expansions in (2). It is possible, and there are well known algorithms for doing so (as we display anon). However, though there are generative procedures for calculating the decimal expansions of π and e, these procedures differ from the ones underlying (1) in that the products of the procedures don’t exhibit a perceptible pattern. The patterns, we might say, contrast in that the patterns in (1) carry the procedures for generating them in their patterning (add 2, 3, 45, or 1234 to the end), while this is not so for the examples in (2). Put crudely, constant conjunction and association exercised on the patterning of 2s in (1a) lead to the rule ‘keep adding 2’ as the rule for generating (1a), while inspecting the patterning of digits in (2a) suggests nothing whatsoever about the rule that generates it (e.g. (3a)). And this, I believe, is an important conceptual fault line separating empiricists from rationalists. For empiricists, the paradigm case of a generative procedure is intimately related to the observable patternings it generates, while Rationalists have generally eschewed any “resemblance” between the generative procedure and the objects generated. Let me explain.
As Chomsky has repeatedly and correctly insisted, everybody assumes that learners come to the task of language acquisition with biases. This just means that everyone agrees that what is acquired is not a list, but a procedure that allows for unbounded extension of the given (finite) examples in determinate ways. Thus, everyone (viz. both empiricists and rationalists (thus, both Chomsky and his critics)) agrees that the aim is to specify what biases a learner brings to the acquisition task. The difference lies in the nature of the biases each is willing to consider. Empiricists are happy with biases that allow for the filtering of patterns from data. Their leading idea is that data reveal patterns and that learning amounts to finding these in the data. In other words, they picture the problem of learning as roughly illustrated by the example in (1). Rationalists agree that this kind of learning exists, but hold that there are also learning problems akin to that illustrated in (2), and that this kind of learning demands a departure from algorithms that look for “simple” patternings of data. In fact, it requires something like a pre-specification of the possible generative procedures. Here’s what I mean.
Consider learning the decimal expansion of π. It’s possible to “learn” that some digit sequence is that of π by sampling the data (i.e. the digits) if, for example, one is biased to consider only a finite number of pre-specified procedures. Concretely, say I am given the generative procedures in (3a) and (3b) and am shown the digits in (2a). Could I discover how to continue the sequence so armed? Of course. I could quickly come to “know” that (3a) is the right generative procedure and so I could continue adding to the … as desired. (Excuse 'infinity' below. Blogspot doesn't like the infinity sideways 8)
3. (a) π = 2 ∑ k!/(2k+1)!! = 2 ∑ 2^k (k!)^2/(2k+1)! = 2 [1 + 1/3 (1 + 2/5 (1 + 3/7 (1 + …)))]
(b) e = lim (1 + 1/n)^n = 1 + 1/1! + 1/2! + 1/3! + …
How would I come to know this? By plugging several values for k and n into (3a,b) and seeing what pops out. (3a) will spit out the sequence in (2a) and (3b) that of (2b). These generative procedures diverge very quickly. Indeed, the very first computed digit makes us confident that, asked to choose between (3a) and (3b) given the data in (2a), (3a) is the easy choice. The moral: even if there are no patterns in the data, learning is possible if the range of relevant choices is sufficiently articulated and bounded.
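To make the plugging-in concrete, here is a hedged sketch (the function names are mine): it sums (3a)’s series for π and (3b)’s factorial series for e in exact rational arithmetic and reads off the first 15 decimal digits, recovering (2a) and (2b).

```python
from fractions import Fraction
from math import factorial

def double_factorial(n: int) -> int:
    # (2k+1)!! = 1 * 3 * 5 * ... * (2k+1)
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def pi_approx(terms: int) -> Fraction:
    # (3a): pi = 2 * sum over k of k!/(2k+1)!!
    return 2 * sum(Fraction(factorial(k), double_factorial(2 * k + 1))
                   for k in range(terms))

def e_approx(terms: int) -> Fraction:
    # (3b): e = sum over k of 1/k!
    return sum(Fraction(1, factorial(k)) for k in range(terms))

def decimal_digits(x: Fraction, n: int) -> str:
    # First n digits after the decimal point, by exact integer division.
    frac = x - int(x)
    return str(frac.numerator * 10**n // frac.denominator).zfill(n)

print(decimal_digits(pi_approx(60), 15))  # 141592653589793  -> (2a)
print(decimal_digits(e_approx(20), 15))   # 718281828459045  -> (2b)
```

The two procedures part company at the very first digit (1 vs. 7), which is why the choice between (3a) and (3b), given the data in (2a), is so easy.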
This is just a thought experiment, but I think that it highlights several features of importance. First, everyone is knee deep in biases, aka innate, given modes of generalization. The question is not whether these exist but what they are. Empiricists, from the Rationalist point of view, unduly restrict the admissible biases to those constructed to find patterns in the data. Second, even in the absence of patterned data, learning is possible if we consider it as a choice among given hypotheses. Structured hypothesis spaces allow one to find generative procedures whose products display no obvious patterns. Bayesians, by the way, should be happy with this last point, as nothing in their methods restricts what’s in the hypothesis space. Bayes instructs us how to navigate the space given input data. It has nothing to say about what’s in the space of options to begin with. Consequently, there is no a priori reason for restricting it to some functions rather than others. The matter, in other words, is entirely empirical. Last, for any problem of interest it pays to ask whether it is more like that illustrated in (1) or in (2). One way of understanding Chomsky’s point is that when we understand what we want to explain, i.e. that linguistic competence amounts to a mastery of “constrained homophony” over an unbounded domain of linguistic objects (see here), then the problem looks much more like that in (2) than in (1), viz. there are very few (1)-type patterns in the data when you look closely, and there are even fewer when the nature of the PLD is considered. In other words, Chomsky’s bet (and on this I think he is exactly right) is that the logical problem of language acquisition looks much more like (2) than like (1).
A historical aside: Here, Cartwright provides the ingredients for a nice reconstructed history. Putting more than a few words in her mouth, it would go something like this:
In the beginning there was Aristotle. For him, minds could form concepts/identify substances from observation of the elements that instanced them (you learn ‘tiger’ by inspecting tigers; tiger-patterns lead to ‘tiger’ concepts/extracted tiger-substances). The 17th century dumped Aristotle’s epistemology and metaphysics. One strain rejected the substances and substituted the patterns visible to the naked eye (there is no concept/substance ‘tiger’, just some perceptible tiger patternings). This strain grew up to become Empiricism. The second retained the idea of concepts/substances but gave up the idea that these were necessarily manifest in the visible surface properties of experience (so ‘tiger’ may be triggered by tigers, but the concept contains a whole lot more than what was provided in experience, even what was provided in the patternings). This view grew up to be Rationalism. Empiricists rejected the idea that conceptual contents contain more than meets the eye. Rationalists gave up the idea that the contents of concepts are exhausted by what meets the eye.
Interestingly, this discussion persists. See for example Marr’s critique of Gibsonian theories of visual perception here. In sum, the idea that learning is restricted to patterns extractable from experience, though wrong, has a long and venerable pedigree. So too the Rationalist alternative. A rule of thumb: for every Aristotle there is a corresponding Plato (and, of course, vice versa).
 There is surely a bound to this. Consider a decimal expansion whose period is a sequence of 2,500 digits. This would likely be hard to spot, and the wonders of “constant” conjunction would likely be much less apparent.
 Answer: for π: 2,3,8 and for e: 2,3,5.
 Hence the ton of work done on categorization, categorization of prior categorizations, categorization of prior categorizations of prior categorizations…
 Or may exist. Whether it does is likely more complicated than usually assumed as Randy Gallistel’s work has shown. If Randy is right, then even the parade cases for associationism are considerably less empiricist than often assumed.