In an earlier post (here),
I reviewed Fodor’s and Chomsky’s argument concluding that anyone who believes
in induction must be a nativist. Why?
Because all extant inductive theories of belief fixation (BF) are selection theories and all selection
theories presuppose a given
hypothesis space that characterizes all the possible
fixable beliefs. Thus, anything that
“learns” (fixes beliefs) must have a representation of what is learned (a given
hypothesis space) which is used to evaluate the input/experience in fixing
whatever beliefs are fixed. Absent this,
it is impossible to define an inductive procedure.[1]
Thus, trivially (or almost tautologically (see note 1)), whatever one’s theory of induction,
be it Rationalist or Empiricist, everyone is a nativist. The question is not whether nativism but
what’s native. And here is where Rationalists and Empiricists actually differ.
Before going on, let me remind you that both Fodor and
Chomsky (and all the participants at Royaumont it seems to me) took this to be
a trivial, nay, almost a tautological consequence of what induction is. However, this does not mean that it is not
worth remembering and repeating. It is still the case that intelligent people
confuse Rationalism with Nativism and assume that Empiricists have no nativist
commitments. The thought is that Rationalists, in contrast with Empiricists, make fancy assumptions about minds and hence bear the burden of proof in any argument about mental structures.
However, once it is recognized that all psychological theory is
necessarily nativist, the burden-shifting maneuver loses much of its punch.
The question becomes not whether the
mind is pre-stocked with all sorts of stuff, but what kind of stuff it is
stuffed with and how this stuff is organized.
Amy Perfors (here)
puts this exactly right (135)[2]:
…because all models implicitly
define a hypothesis space, it does not make sense to compare models according
to whether they build hypothesis spaces in. More interesting questions are:
What is the size of the latent hypothesis space defined by the model? How
strong or inflexible is the prior?...
So given that everyone is a nativist, how do we decide between Rationalist (R) and Empiricist (E) approaches to the mind? First of all, note
that given that everyone is a trivial nativist the debate between Rs and Es
necessarily revolves around how
beliefs are fixed and what this implies for the mind’s native structure. Interestingly,
probing this question ends up focusing on what kind of experience is required
to fix a given belief.
Es have traditionally taken the position that beliefs are
fixed by positive exposures to extensions of the relevant concepts. So, for
example, one fixes the belief that ‘red’ means RED by exposure to red, and that
‘dog’ means DOG by exposure to dogs. Thus, there is no belief fixation without
exposure to tokens in the relevant extensions of a concept. It is in this sense
that Es see the environment as shaping
mental structure. Minds track environmental input and are structured by this
input. The main contribution that minds make to the structure of their contents
is by being receptive to the information that the environment makes available. On
an E view, the trick is to figure out how to extract the information in the signal. As should be obvious,
this sort of view champions the idea that minds are very good statistical
machines able to find valuable informational needles in potentially very large
input haystacks. Rs have no problem with this assumption, but they argue that
it is insufficient to account for our attested cognitive capacities.
More particularly, Rs argue that there is more to the fixation of belief than environmental input. Or, to put the same point another way, the beliefs that get fixed via exposure to input data far outrun the information available in that input. Thus, though the environment can trigger the emergence of beliefs, it does not shape them, for we have ideas/concepts that are not themselves tokened in the input. If this is correct, then, Rs reason, hypothesis spaces are highly structured and what you come to “know” is strongly affected by this given structure. Note that the disagreement between Rs and Es hinges on what it is possible to glean from available input.
So how to approach this disagreement in a semi-rational
manner? This is where the Logical
Problem of Acquisition (LPA) comes in.
What is the LPA? It’s an attempt
to specify the nature of the input data that an Acquisition Device (AD) has access
to and to then compare this to the properties of the attained competence.
Chomsky discusses the general form of this approach in chapter 1 of Reflections on Language (here).
In the study of language, the famous diagram in (1)
concisely describes the relevant issues:
(1) PLD_L -> FL -> G_L
PLD_L is the name we give to the linguistic data from L that a child (actually) uses in building its grammar. FL is, well, you know, and G_L is the resultant grammar that a native speaker attains. One can easily generalize this schema to other domains of inquiry by subbing other relevant domains for “L.” A generalized version of the schema is (2) (‘X’ being a variable ranging over cognitive domains of interest) and a version of it as applied to vision is (3). So, if one’s interest is in visual object recognition (as for example in Marr’s program), we can consider the schema in (3) as outlining the logic to be explored (PVD = Primary visual data, FV = Faculty of Vision, G_V = grammar (i.e. rules) of vision).[3]
(2) PXD -> FX -> G_X
(3) PVD -> FV -> G_V
This schematic rendition of the LPA focuses the R vs E
debate on the information available in PXD. An Eish conception is committed to the view that PXD is quite rich and that it provides a lot of information concerning G_X. To the degree that information about G_X can be garnered from PXD, we need not populate FX with principles to bridge the gap. Rish conceptions rest on the view that PXD is a rather poor source of information relevant to G_X. As a result, Rs assume that FX is generally
quite rich.
Note that both Rs and Es assume that FX has a native
structure. This, recall, is common to both views. The question at issue is how
much belief fixation (or more exactly the fixation of a particular belief) owes to the nature of the data and how much to
the structure of the hypothesis space. As a first approximation one can say
that Rs believe that given hypothesis spaces are pretty highly structured so
that the data required to “search” that space can be quite sparse. Conversely,
the richer the set of available alternatives the more one needs to rely on the
data to fix a given belief. Thus for Rs all the explanatory action lies in
specifying the narrow range of available alternatives, while for Es most of the
explanatory action lies in specifying the (most often nowadays, statistical)
procedures that determine how one moves across a rather expansive set of
possibilities.
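To make this trade-off concrete, here is a minimal toy sketch in Python (mine, not anyone’s actual model; the hypothesis sets, priors, and the “size principle” likelihood are invented purely for illustration). It shows how a small, sharply structured hypothesis space lets very sparse data fix a belief, while the same data leave a large, flat space much less settled:

```python
# Toy illustration (all hypotheses, priors, and data are invented for this
# sketch): Bayesian belief fixation over a pre-given hypothesis space.
# The space itself is presupposed -- that is the shared "nativist"
# commitment; Rs and Es differ over how structured it is.

def likelihood(h, data):
    # Each hypothesis h is the set of strings it licenses; assume each datum
    # is drawn uniformly from that set (a standard "size principle" toy).
    if any(d not in h for d in data):
        return 0.0
    return (1.0 / len(h)) ** len(data)

def posterior(hypotheses, data):
    """Normalize prior * likelihood over the fixed hypothesis space."""
    scores = {h: prior * likelihood(h, data) for h, prior in hypotheses.items()}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

# R-style space: two sharply distinguished alternatives (lots of prior structure).
H_structured = {
    frozenset({"a", "ab"}): 0.5,
    frozenset({"a", "ab", "abb", "abbb"}): 0.5,
}

# E-style space: many alternatives, flat prior (little prior structure).
H_flat = {
    frozenset({"a", "ab"} | {"x" * i for i in range(2, k)}): 1.0
    for k in range(3, 21)
}

data = ["a", "ab", "ab"]  # the same sparse input for both learners

best = max(posterior(H_structured, data).values())
print(f"structured space: best hypothesis gets {best:.2f} of the posterior")
best = max(posterior(H_flat, data).values())
print(f"flat space:       best hypothesis gets {best:.2f} of the posterior")
```

Nothing hangs on the particular numbers; the point is only that both learners presuppose a given space of alternatives, and that how much work the data must do depends on how that space is structured.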
The schemas above suggest ways of investigating this
disagreement. Let’s consider some.
E invites the view that, ceteris paribus, variations in PXD should lead to variations in G_X, as the latter closely tracks properties of the former (it is in this sense that Es think of PXD as shaping a person’s mental states). Thus, if some kinds of inputs are systematically absent from an individual’s PXD, we should expect that individual’s cognitive development and attained competence to differ from those of an individual with more “normal” inputs. Hume (our first systematic associationist psychologist) gives a useful version of this view:[4]
…wherever by any accident the
faculties which give rise to any impressions are obstructed in their
operations, as when one is born blind or deaf, not only the impressions are
lost, but also their corresponding ideas; so that there never appear in the
mind the least traces of either of them.
There’s been lots of research over the last 50 years exploring
Hume’s contention in the domain of language acquisition. Lila Gleitman and Barbara Landau (G&L) provide a good brief overview of some of the child language research investigating these matters.[5] They note that the evidence does not support this prediction (at least in the
domain of language). Rather it seems that “humans reconstruct linguistic form
…[despite] the blatantly inadequate information offered in their usable
environment (91).” In other words, it seems that the course of language
acquisition can proceed smoothly (in fact no differently than what happens in
the “normal” case) even when the input to the system is perceptually very
limited and degraded. G&L interpret this Rishly to mean that language acquisition is relatively independent of the nature and quality of the input, which makes sense if it is guided by a rich system of innate knowledge.
G&L illustrate the logic using two kinds of examples:
blind people can and do learn the meanings of words like ‘see’ and ‘look’
without being able to see or look, and people can acquire full native
competence (and can make very subtle “perceptual” distinctions in their
vocabulary) despite being blind and deaf. Indeed, it seems that even extreme
degradation of the sensory channels leaves the process of language acquisition
unaffected.
It is worth noting just how degraded the input can be when
compared to the “normal” case. G&L, reporting Carol Chomsky’s original research on learning via the Tadoma method, write (92):[6]
To perceive speech at all, the
deaf-blind must place their fingers strategically at the mouth and throat of
the speaker, picking up the dynamic movements of the mouth and jaw, the timing
and intensity of the vocal-cord vibration, and the release of air…From this
information, differing radically in kind and quality from the continuously
varying speech wave, the blind-deaf recover the same ornate system of
structured facts as do hearing learners…
In short, there is plenty of evidence that language
acquisition can (and does) take place in the face of extremely degraded input,
at least when compared with the PLD available in the standard case.[7]
The Poverty of Stimulus (PoS) argument also reflects the
logic of the schemas in (1)-(3). As the schema suggests, a PoS argument has two major struts: a description of the available PLD and a description of the grammatical operations of interest (i.e. the relevant rules). The next step compares what information can be gleaned about the operation from the data; the slack is then used to
probe the structure of FL. The standard PoS question is then: what must we
assume about FL so that, given the witnessed PLD, the LAD (language acquisition device) can derive the
relevant rules? As the schema indicates,
the inference is from instances of
rules (used outputs of a grammatical system) to the rules that generate the
observed sentences. Put another way, whatever else is going on, the LPA
requires that FL at least contain
some ways of generalizing beyond the PLD. This is not controversial. What is
controversial is how fancy these methods for generalizing beyond the data have
to be. For Es, the generalizing procedures are quite anodyne. For Rs they are often quite rich.
Well-designed PoS arguments focus on grammatical phenomena
for which there is no likely relevant
information available in the PLD. If Es are right (see Hume above), all
relevant grammatical operations and principles should find (robust?) expression
in the PLD. If Rs are right, we should find lots of cases where speakers develop
grammatical competence even in the absence of relevant PLD (e.g. all agree that “John expects Mary to hug himself” is out and that “John expects himself to hug Mary” is good, where ‘John’ is the antecedent of ‘himself’).
It goes without saying that, given this logic, debate between Es and Rs will revolve around how to specify the PLD in relevant cases (see here for a sophisticated discussion). So for example, all accept the idea that PLD consists of good examples of the relevant operation (e.g. all take “John hugged himself” to be a typical data point bearing on principle A (A)). What of negative data, data indicating that some example is unacceptable with the indicated interpretation (e.g. that “John expects Mary to hug himself” is out)? There is every reason to think that overt
correction of LAD “mistakes” barely occurs. So, in this sense the PLD does not contain negative data. However,
perhaps for the LAD absence of evidence is evidence of absence. In other words,
perhaps for the LAD failing to
witness an example like “John expects Mary to hug himself” leads to the
conclusion that the dependency between ‘John’ and ‘himself’ in these
configurations is illicit. This is entirely possible. So too with other *-cases.[8]
Note that this reasoning requires a fancier FL than one
that simply assumes that all decisions are made on the basis of positive
data. So the logic of LPA is respected
here: we compensate for the absence of certain information in the PLD (i.e.
direct negative evidence) by allowing FL to evaluate expectations of what
should be seen in the PLD were a given construction good.[9]
The question an R would ask an E is whether the capacity to compute such
expectations doesn’t itself require a pretty hefty native capacity. After all,
many things are absent from the data, but only some of these absences tell us
anything (e.g. I would bet that for most cases in the PLD the anaphor is within
5 words of the antecedent, nonetheless “John confidently for a man of his age
and temperament believes himself to be ready to run the marathon” seems fine).
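For concreteness, here is one highly simplified way (a made-up sketch, not a model proposed in any of the work cited here) to cash out “absence of evidence is evidence of absence” probabilistically. The learner compares how likely a corpus containing no instances of the relevant form is under a grammar that licenses the form versus one that forbids it; the per-utterance rate and corpus size below are invented:

```python
import math

# Toy sketch (all numbers invented) of indirect negative evidence: compare
# how probable the learner's "silence" about a form is under a grammar that
# licenses it (G_yes) versus one that rules it out (G_no).

def log_prob_of_never_hearing(p_form_per_utterance, n_utterances):
    """Log-probability of n utterances with zero occurrences of the form."""
    return n_utterances * math.log(1.0 - p_form_per_utterance)

n = 100_000   # utterances in the corpus (made-up number)
p = 1e-4      # per-utterance rate the form would have if it were licit (assumed)

ll_G_yes = log_prob_of_never_hearing(p, n)  # grammar that licenses the form
ll_G_no = 0.0                               # grammar that forbids it: silence is certain

# The evidence for G_no grows with exposure (~ n * p nats here), but only for
# a learner equipped to compute the expectation in the first place.
print(f"log-likelihood gap favoring G_no: {ll_G_no - ll_G_yes:.1f} nats")
```

Note that the comparison only gets off the ground because the learner already has both candidate grammars and can compute what each would lead it to expect, which is just the hefty native capacity the R is pointing to.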
One assumption I commonly make in considering PoS arguments
is that PLD effectively consists of simple acceptable sentences (e.g. “John
likes himself”). This is the so-called
Degree 0 hypothesis (D0H).[10] If the PLD is so restricted, then FL must be very rich indeed, for many robust linguistic phenomena are simply unattested in simple clauses (and recall, induction is impossible in the absence of any data to drive it); e.g. island effects, ECP effects, many binding effects, minimality effects, among others. The D0H may be too strong, but there are two (maybe one, as they are related) reasons for thinking that it is on the right track.
The first is Penthouse Principle (PP) effects. Ross noted long ago that there are many operations restricted to main clauses but virtually none that apply exclusively to embedded clauses. Subject-Aux Inversion and Tag Question formation are two
examples from English. If we assume that something like D0H is right(ish), we expect all idiosyncratic processes to be restricted to main clauses, where substantial evidence for them will be forthcoming. Embedded clauses, on the other hand, should be very regular. At the very least, we expect no operations to apply exclusively to embedded domains (the converse of the PP), since given D0H there can be no evidence to fix them.
The second reason relates to this. It’s a diachronic
argument David Lightfoot gave based on the history of English (here).
It is based on a very nice observation:
main clause properties can affect embedded clause properties but not vice
versa. Lightfoot illustrates this by considering the shift from OV to VO in
English. He notes that in the period in
which the change occurred, embedded clauses always
displayed OV order. Despite this, English changed from OV to VO. Lightfoot reasons as follows: were embedded clause information robustly available, there would have been very good evidence that, despite appearances to the contrary in unembedded clauses, English was OV not VO (i.e. the attested change to VO, which ended up migrating to embedded clauses, would never have occurred). Thus, the fact that English changed in this way (and that influences in the other direction are unattested) follows nicely if something like D0H holds (viz. an LAD does not use embedded clause information in the acquisition of its grammar). Lisa Pearl subsequently elaborated a sophisticated quantitative
version of this argument here
and here.
The upshot: D0H holds. Of course, if it does, then strong versions of PoS arguments for many linguistic phenomena readily spring to mind. No data, no induction. And where induction cannot operate, highly structured, natively given hypothesis spaces must be guiding the AD.
OK, this post has gotten out of control and is far too long.
Let me end by reiterating the take-home message. Rs and Es differ not on whether nativism but
on what is native. And, exploring the latter effectively revolves around
considerations of how much information the data contains (and the child can
use) in fixing its beliefs. This is where the action is. Research like what
G&L review is interesting in that it shows that achieved competence seems
quite insensitive to large variations in the relevant usable data. Classical PoS
arguments are interesting in that they provide cases where it is arguable that
there is no data at all in the input
relevant to fixing a given belief. If
this is so, then the mechanisms of belief fixation must lean very heavily on the highly structured (and hence restricted) nature of the hypothesis space that ADs natively bring to the belief fixation process. In R/E debates everyone
believes that input matters and everyone believes that minds have native
structure. The argument is about how
much each factor contributes to the process. And this is something that can
only be adjudicated empirically. As things stand now, IMO, the fertility of the
Rish position in the domain of language (most of cognition actually) has been
repeatedly demonstrated. Score one (indeed many) for Descartes and Kant.
[1]
In effect, induction serves to locate a member/members from a given set of
alternatives. No pre-specified alternatives, no induction. Thus Fodor’s point: for learning (i.e. belief fixation) to be possible, there must be a given set of concepts that mediate the process.
Fodor emphasizes that this view, though trivial, is
not purely tautological. There does exist a tautological
claim that some have confused with Fodor’s. This misreading interprets Fodor as
saying that any acquired concept must be acquirable (i.e. a principle of modal logic along the lines of: if I do have the concept, then I could have had the concept). Alex Clark, for example, so reads Fodor (here): “There is a tautological claims which is that
I have an innate intellectual endowment that allows me to acquire the concept
SMARTPHONE in some way, on the basis of reading, using them, talking to people
etc. Obviously any concept I have, I must have the innate ability to have it…”
Fodor
notes this possible interpretation of his views at Royaumont (p. 151-2), but
argues that this is not what he is
claiming. He says the following: “The
banal thesis is just that you have the innate potential of learning any concept
you can in fact learn; which reduces, in turn, to the non-insight that whatever
is learnable is learnable. …What I intended to argue is something very much
stronger; the intended argument depends on what learning is like, that is the
view that everybody has always accepted, that it is based on hypothesis
formation and confirmation. According to that view, it must be the case that
the concepts that figure in the hypothesis you come to accept are not only potentially accessible to you, but are actually exploited to mediate the learning…The
point about confirming a hypothesis like "X is miv iff it is red and
square" is that it is required that not only red and square be potentially
available to the organism, but that these notions be effectively used to
mediate between the organism's experiences and its consequent beliefs about the
extension of miv…”
In other words, if inductive logics require given
hypothesis spaces to get off the ground and if we attribute an inductive logic
to a learner then we must also be attributing to them the given hypothesis
space AND we must be assuming that beliefs get fixed in virtue of exploiting the properties of that space. So far as I can tell, this is what every inductivist is in fact
committed to.
[2]
Despite the terminological misstep of identifying Rationalism with Nativism on
p 127.
[3]
In Marr’s program, the grammar includes the rules and derivations that get us
from the grey scale sketch to the 2.5D sketch.
[4]
This is quoted in Gleitman and Landau, see note 5. The quote is from Hume’s Treatise, p. 49.
[6]
Carol Chomsky’s original papers on this topic are included as appendices in the book. They are
well worth reading. On the basis of the reported speech, the Tadoma learners
seem indistinguishable from “normal” native speakers.
[7]
G&L also note the excess of data problem towards the end of their paper.
This is something that Gleitman has explored in more recent work (discussed here and in
links cited there). Lila once noted that a picture is worth a thousand words,
and that is precisely the problem. In the early period of word learning the
child is flooded with logical possibilities when word learning is studied in
naturalistic settings. Here induction
becomes a serious challenge not because there is no information but because
there is too much and narrowing it down to the relevant stuff is very hard.
Lila and colleagues have argued that in such cases what the child does bears
relatively little resemblance to the careful statistical sampling that one
might expect if acquisition were via “learning.” This suggests that there must
be a certain sweet spot where data is available but not too available for
learning (induction) to be a viable form of acquisition. Where this is not
possible other acquisition procedures appear to be at play, e.g. guess and
guess again! Note that this amounts to saying that resource constraints are
key factors in making “learning” an option. In many cases, learning (i.e.
reviewing the alternatives systematically) is simply too costly, and other less
seemingly rational procedures kick in. Interestingly, from an R perspective, it
is precisely when the field of options is narrowed (when syntax kicks in) that
something akin to classical learning appears to become viable.
[8]
For reasons I have never quite understood, many (see here)
have assumed that GGers are hostile to the idea that LADs can use “negative”
data productively. This is simply false.
See Howard Lasnik (here)
for a good review. As Lasnik notes, the
possibility that negative data could be relevant goes back at least to
Chomsky’s LGB (if not earlier). What is
relevant, is not whether negative data might be useful but what kinds of minds
can productively use it. The absence of barking is useful when one is listening for dogs. Thus, the more constrained the space of
options under consideration the easier it is to use absence of evidence as
evidence of absence. If you have no idea what you are looking for, not finding
it is of little informational value.
[9]
For example, Chater and Vitanyi (C&V) (here)
order the available hypotheses according to “simplicity” measured in MDL terms,
not unlike what Chomsky proposed in Aspects.
Not surprisingly, given such an ordering indirect negative evidence can be
usefully exploited (something that would not surprise a GGer). What C&V do not consider is the possibility of cases where there is virtually no relevant positive or negative data in the PLD. This is what is
taken to be the strongest kind of PoS argument and is the central case
discussed in at least one of the references C&V cite (see here).
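For the flavor of how a simplicity ordering lets indirect negative evidence do work, here is a toy MDL-style comparison (all numbers invented; this illustrates the general MDL idea, not C&V’s actual proposal): each grammar is charged for stating itself plus for encoding the corpus under it, so a grammar that also licenses unattested forms pays a per-datum penalty.

```python
import math

# Toy MDL-style scoring (all numbers invented). A grammar is charged for its
# own description plus the cost of encoding the corpus under it; a grammar
# that also licenses unattested forms "wastes" probability on them and so
# pays more per attested datum.

def total_description_length(grammar_bits, n_licensed_forms, corpus):
    data_bits = len(corpus) * math.log2(n_licensed_forms)  # uniform coding of each datum
    return grammar_bits + data_bits

corpus = ["hug himself", "hug her"] * 50   # only attested forms (hypothetical corpus)

tight = total_description_length(grammar_bits=20, n_licensed_forms=2, corpus=corpus)
loose = total_description_length(grammar_bits=10, n_licensed_forms=4, corpus=corpus)

print(f"tight grammar: {tight:.0f} bits, loose grammar: {loose:.0f} bits")
# The tight grammar wins despite costing more to state, which is one way a
# simplicity ordering lets absence of evidence count as evidence of absence.
```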
[10]
Most who think that this is more or less on the right track actually take
“simple” to mean un-embedded binding domains (e.g. Lightfoot). This is sometimes called Degree 0+. Thus, ‘Bill’ is in the PLD in (i) but not in
(ii):
(i) John believes Bill to be intelligent
(ii) John believes (that) Bill is intelligent