This was prompted by various replies to Jeff's recent post (see
below), which triggered many memories of déjà vu. But it got too long, in
every relevant sense, for a single comment.
Rightly or wrongly, I think knowledge of language is an
interesting example of knowledge acquired under the pressure of experience, but
not acquired by generalizing from experience. Rightly or wrongly, I suspect
that most knowledge is of this sort. That's one way of gesturing at what it is
to be a Rationalist. So I'm curious how far the current wave of skepticism
regarding POS-arguments goes, since such arguments are the lifeblood of
Rationalism.
Once upon a time, somebody offered an argument that however people
acquire knowledge of the Pythagorean Theorem, it isn't a matter of generalizing
from observed instances of the theorem. This leads me to wonder, after reading
some of the replies to Jeff: is the Platonic form of a POS argument also
unpersuasive because (1) it is bound up in "meaty, theory-internal
constructs," and (2) the input is impoverished only relative to "an
(implicit) superficial encoding and learning algorithm"? If not, what
makes the argument that Jeff offered relevantly different? The classic POS
arguments in linguistics were based on observations regarding what certain
strings of words cannot mean, raising the question of how the relevant
constraints could be learned as opposed to tacitly assumed. What's so
theory-internal about that?
Moreover, sensible rationalists never denied that thinkers can
and often do represent certain "visible things"--drawn on the board,
or in the sand--as right triangles, and hence as illustrations of
theorems concerning right triangles. The point, I thought, was that any such
way of "encoding the data" required a kind of abstraction that is
tantamount to adopting axioms from which the theorems follow. If one uses
experience of an actual drawing to activate and apply ideas of right angles
formed by lines that have no width, then one is using experience in a
remarkable way that makes it perverse to speak of "learning" the
theorem by generalizing from experience. But of course, if one distinguishes
the mind-independent "experienceables" from overtly representational encodings--I
believe that Jeff usually stresses the input/intake contrast--then any
experience-dependent knowledge acquisition can be described as the result of
"generalizing" from encodings, given a suitably rich framework for
encodings. Indeed, given a suitably rich framework, generalizing from a single
case is possible. (It's worth remembering that we speak of both arithmetic
induction and empirical induction. But if knowledge of linguistic constraints
turns out to be more like knowledge acquired via arithmetic induction, that's
hardly a point against Rationalists who use POS arguments to suggest that
knowledge of linguistic constraints turns out to be more like knowledge
acquired via arithmetic induction.)
With enough tenacity, I guess one can defend the idea that (pace
Descartes) we learn from our encodings-of-experience that the world contains
material things that endure through time and undergo change, and that (pace
Leibniz) we generalize from observations of what is the case to conclusions
about what might be or must be the case, and that (see Dyer and Dickinson,
discussed by Gallistel and others) novice bees who were only allowed to forage
a few times in late afternoons still generalized from their
encodings-of-experience in a way that allowed them to communicate the location
of food found on the first (and overcast) morning. Put another way, one can
stipulate that all experience-dependent knowledge acquisition is learning, and
then draw two consequences: (1) POS-arguments show that a lot of learning--and
perhaps all learning--is very very unsuperficial, and (2) a huge part of the
enterprise of studying knowledge of language and its acquisition consists in
(a) repeatedly reminding ourselves just how unsuperficial this
knowledge/acquisition is, and (b) using POS arguments to help discover the
mental vocabulary in terms of which encodings of the relevant experience are
formulated. But (1) and (2) seem like chapter one and verse of Aspects.
So as usual, I'm confused by the whole debate about POS arguments.
Is the idea that with regard to human knowledge of language, but not knowledge
of geometry (or bee-knowledge of solar ephemeris), there's supposed to be some
residual plausibility to the idea that generalizations of the sort Jeff has
pointed to (again) can be extracted from the regularities in experienceables
without effectively coding the generalizations in terms of how the "data
of experience" gets encoded? If so, is there any better form of the
argument that would be accepted as persuasive; or is it that with regard to
knowledge of linguistic generalizations, the prior probability of Empiricism
(in some suitably nonsuperficial form) is so high that no argument can dislodge
it?
Or is the skepticism about POS arguments more general, so that
such arguments are equally dubious in nonlinguistic domains? If so, is there
any better form of the argument (say regarding geometry, or the bees) that
would be accepted as persuasive; or is it that with regard to knowledge of all
generalizations, the prior probability of Empiricism (in some suitably
nonsuperficial form) is so high that no argument can dislodge it?
Of course, nobody in their right mind cares about drawing a sharp
line between Rationalism and Empiricism. But likewise, nobody in their right
mind denies that there is at least one distinction worth drawing in this
vicinity. Team-Plato, with Descartes pitching and Leibniz at shortstop, uses
POS considerations to argue that (3) we encode experience and frame hypotheses
in very interesting ways, and (4) much of what we know is due to how we encode
experience/hypotheses, as opposed to specific experiences that
"confirm" specific hypotheses. There is another team, more motley,
whose roster includes Locke, Hume, Skinner, and Quine. They say that while (5)
there are surely innate mechanisms that constrain the space of hypotheses
available to human thinkers, (6) much of what we know is due to our having
experiences that confirm specific hypotheses.
To be sure, (3-6) are compatible. Disagreements concern cases, and
"how much" falls under (6). And I readily grant that there is ample
room for (6) under the large tent of inquiry into knowledge of language; again,
see chapter one of Aspects. Members of Team-Plato can agree that (6) has
its place, against the background provided by (3) and (4); though many members
of the team will insist on sharply distinguishing genuine cases of
"inductive bias," in which one of two available hypotheses is
antecedently treated as more likely, from cases that reflect knowledge of how the
relevant vocabulary delimits the hypothesis space (as opposed to admitting a
hypothesis but assigning a low or even zero prior probability). But my question
here is whether there is any good reason for skepticism about the use of POS
arguments in support of (3) and (4).
Absent a plausible proposal about how generalizations of the sort
Jeff mentions are learned, why shouldn't we conclude that such generalizations
fall under (4) rather than (6)?
Sidepoint: it's not like there is any good basis, empirical or
conceptual, for thinking that most cases will fall under (6)--or that
relegation to (4) should be a last resort. The history of these debates is
littered with versions of the idea that Empiricism is somehow the
default/simpler/preferable option, and that Rationalists have some special
burden of proof that hasn't yet been met. But I've never met a plausible
version of this idea. (End of sidepoint.)
I'm asking because this bears on the question of whether or not
linguistics provides an interesting and currently tractable case study of more
general issues about cognition. (That was the promise that led me into
linguistics; but as a philosopher, I'm used to getting misled.)
If people think that POS arguments are generally OK in the
cognitive sciences, but not in linguistics, that's one thing. If they think
that POS arguments are generally suspect, that's another thing. And I can't
tell which kind of skepticism Jeff's post was eliciting.
I think there are real differences between language and geometry. One is that the truths of geometry don't depend on contingent cultural facts or vary from place to place.
For me it is not a philosophical debate but entirely an empirical one. There is, as Chomsky never tires of pointing out, some part of the human genetic endowment, absent in rocks and kittens, that allows us to acquire language. This acquisition ability is some sort of LAD, which takes the input and outputs some grammar that defines an infinite set of sound-meaning pairings. Both the LAD and the input are *essential* to the process.
You write " I think knowledge of language is an interesting example of knowledge acquired under the pressure of experience, but not acquired by generalizing from experience. ".
For me the term "generalizing" just means that the output grammar is not just a memorisation of the input: i.e. that the child can understand and produce sentences that are not in the input. And "from experience" just means that the output of the LAD depends on the input. I don't see how one can deny either of these. And "learning" is the term we use in the learning theory community for that process. So I think there is a terminological difference here, unless you are referring to Fodorian brute causal processes, which I guess you aren't?
Now the POS used to be an argument for domain-specific knowledge in the LAD. I never liked that argument. This has changed in the MP era, and now the POS is just, as it should be, a question: what the hell is going on in the LAD such that such rich grammars can come out of such poor inputs? (to copy the title of a recent book).
So my view is that we can move forward by making specific proposals about the structure of the LAD as a computationally efficient learning algorithm that can output grammars of the right type, and that we can make progress by developing a general theory of learning grammars, as opposed to using a general theory of probabilistic learning, as Charles Yang and the Bayesians do. But that's just me; others have different approaches, which is of course good. One approach, however, does not try to come up with models of the LAD. Rather, it comes up merely with restrictions on the class of grammars output by the LAD.
So rather than saying "the LAD is this function ...", they say "the LAD is some function whose outputs lie in the class of grammars G"; i.e. they just define the range of the function.
Or, worse, just saying "the LAD outputs grammars that have property P", where the class of grammars and the property P are not defined. My scepticism is about these types of solution, which in the examples I have looked at don't explain anything: i.e. about "solutions" to the POS of the form "X is innate, so the learner doesn't have to learn X". And my second objection is Darwin's problem, in the case when X is clearly specific to language: the proposed solution is just to kick the can down the road and hope that at some point in the future there will be a reduction to something non-language-specific.
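To make the contrast concrete, here is a minimal sketch in Python (all names invented; this is my gloss, not a proposal anyone has actually made): the first kind of proposal supplies the learning function itself, while the second merely asserts that the function's outputs lie in some class G, or have some property P, which is left undefined.

```python
from typing import Callable, Iterable

Sentence = str
Grammar = object  # placeholder for whatever representation of grammars one assumes

# Proposal of the first kind: an explicit, computationally specified LAD,
# i.e. an actual function from input data to a grammar.
LAD = Callable[[Iterable[Sentence]], Grammar]

def in_class_G(g: Grammar) -> bool:
    """Stand-in for membership in a restricted class of grammars G
    (deliberately left undefined here, as in the proposals criticized above)."""
    raise NotImplementedError

# Proposal of the second kind: only a constraint on the range --
# "the LAD is some f such that in_class_G(f(data)) holds" --
# which says nothing about how f computes a grammar from the input.
```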
(PS One of the reasons I dislike the term UG is that, as it is often used, it blurs the distinction between the LAD and the properties of the outputs of the LAD. So if anyone feels the need to use the term UG, please specify which you mean.)
Just lost another comment so here is a truncated version:
You write:
"(4) much of what we know is due to how we encode experience/hypotheses, as opposed to specific experiences that "confirm" specific hypotheses.
...
.(6) much of what we know is due to our having experiences that confirm specific hypotheses. "
So I am not sure the contrast between (4) and (6) is drawn in quite the right place.
If we have a Bayesian PCFG learner, it doesn't rely on specific experiences that confirm specific hypotheses, but rather on a subtle computation in which potentially every experience confirms almost every hypothesis, probabilistically, to a greater or lesser extent. A triggering algorithm in a P&P model, by contrast, does have a specific experience confirming a specific hypothesis (a parameter-value setting).
But I think most people would have this the other way round.
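Here is a toy sketch in Python of the contrast I have in mind -- not anyone's actual model; the hypothesis names, likelihood function, and trigger table are all invented. In the Bayesian update, every datum rescales the posterior over all grammar hypotheses at once; in the triggering update, a particular datum flips a particular parameter and everything else is left untouched.

```python
def bayesian_update(prior, likelihood, datum):
    """Rescore every grammar hypothesis by P(datum | grammar) and renormalize:
    no single hypothesis is singled out by the datum."""
    unnorm = {g: p * likelihood(g, datum) for g, p in prior.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

def triggering_update(params, triggers, datum):
    """If the datum is a registered trigger, set exactly one parameter value;
    otherwise leave the current parameter settings alone."""
    if datum in triggers:
        name, value = triggers[datum]
        new_params = dict(params)  # copy, then flip exactly one switch
        new_params[name] = value
        return new_params
    return params

# Invented example data:
prior = {"G1": 0.5, "G2": 0.5}
likelihood = lambda g, d: 0.8 if g == "G1" else 0.2
print(bayesian_update(prior, likelihood, "some sentence"))  # both hypotheses rescored

params = {"head-direction": None}
triggers = {"VO order observed": ("head-direction", "head-initial")}
print(triggering_update(params, triggers, "VO order observed"))  # one parameter set
```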
Alex: I don't think our goals, or even the methods, are all that different.
Linguists, especially those of the P&P persuasion, try to characterize the properties of human languages in ways that hopefully have good learnability properties. Some of us work on this more directly: for instance, William Sakas, Bob Berwick and I will argue at next year's GALANA conference, to be held at Maryland, that a linguistically motivated parameter space is feasible to learn.
Folks like you, and many in the grammar induction tradition, try to characterize formally learnable classes of languages. The burden of proof is to show that these classes of languages correspond to something like human language.
None of us has really gotten there yet but all of us, including the Bayesians, are reacting to the consequences of formal learning theories. As Gold noted, to achieve positive learnability, we can restrict the space of grammars or provide the learner with information about statistical distributions of the languages. Of course, these two tracks are not mutually exclusive.
I look forward to reading your paper with Bob and William when it is ready; it sounds like the sort of work there should be more of.
I think our methods and assumptions are quite similar, though we come to different conclusions -- but these are very different from those of Paul and Jeff and Norbert. So I wonder what your take is on the strategy of making things like PISH innate without going all the way to a finite class of grammars.
I think a lot of the work in inductive inference looks at exactly how to "characterize formally learnable classes of languages". That isn't quite what I do -- I am trying to characterize learnable languages with respect to certain algorithmic approaches. I don't think there is any problem, then, with relating these languages to human languages, since (most of the time) they are fairly close variants of the standard grammars that mathematically precise linguists use. Other people in the grammatical inference community look at other types of grammars -- pattern grammars and external contextual grammars -- that are rather further away from the standard view.
Anyway, I am sure your turkey needs either to be basted or taken out of its brine, depending on what time zone you are in, so happy Thanksgiving to all, and especially to Norbert for stimulating some interesting discussion.