Wednesday, November 26, 2014

POS POST PAST


This was prompted by various replies to Jeff's recent post (see below), which triggered many memories of déjà vu. But it got too long, in every relevant sense, for a single comment.

Rightly or wrongly, I think knowledge of language is an interesting example of knowledge acquired under the pressure of experience, but not acquired by generalizing from experience. Rightly or wrongly, I suspect that most knowledge is of this sort. That's one way of gesturing at what it is to be a Rationalist. So I'm curious how far the current wave of skepticism regarding POS-arguments goes, since such arguments are the lifeblood of Rationalism.

Once upon a time, somebody offered an argument that however people acquire knowledge of the Pythagorean Theorem, it isn't a matter of generalizing from observed instances of the theorem. This leads me to wonder, after reading some of the replies to Jeff: is the Platonic form of a POS argument also unpersuasive because (1) it is bound up in "meaty, theory-internal constructs," and (2) the input is impoverished only relative to "an (implicit) superficial encoding and learning algorithm"? If not, what makes the argument that Jeff offered relevantly different? The classic POS arguments in linguistics were based on observations regarding what certain strings of words cannot mean, raising the question of how the relevant constraints could be learned as opposed to tacitly assumed. What's so theory-internal about that?

Moreover, sensible rationalists never denied that thinkers can and often do represent certain "visible things"--drawn on the board, or in the sand--as right triangles, and hence as illustrations of theorems concerning right triangles. The point, I thought, was that any such way of "encoding the data" required a kind of abstraction that is tantamount to adopting axioms from which the theorems follow. If one uses experience of an actual drawing to activate and apply ideas of right angles formed by lines that have no width, then one is using experience in a remarkable way that makes it perverse to speak of "learning" the theorem by generalizing from experience. But of course, if one distinguishes the mind-independent "experienceables" from overtly representational encodings--I believe that Jeff usually stresses the input/intake contrast--then any experience-dependent knowledge acquisition can be described as the result of "generalizing" from encodings, given a suitably rich framework for encodings. Indeed, given a suitably rich framework, generalizing from a single case is possible. (It's worth remembering that we speak of both arithmetic induction and empirical induction. But if knowledge of linguistic constraints turns out to be more like knowledge acquired via arithmetic induction, that's hardly a point against Rationalists who use POS arguments to suggest that knowledge of linguistic constraints turns out to be more like knowledge acquired via arithmetic induction.)

With enough tenacity, I guess one can defend the idea that (pace Descartes) we learn from our encodings-of-experience that the world contains material things that endure through time and undergo change, and that (pace Leibniz) we generalize from observations of what is the case to conclusions about what might be or must be the case, and that (see Dyer and Dickinson, discussed by Gallistel and others) novice bees who were only allowed to forage a few times in late afternoons still generalized from their encodings-of-experience in a way that allowed them to communicate the location of food found on the first (and overcast) morning. Put another way, one can stipulate that all experience-dependent knowledge acquisition is learning, and then draw two consequences: (1) POS-arguments show that a lot of learning--and perhaps all learning--is very very unsuperficial, and (2) a huge part of the enterprise of studying knowledge of language and its acquisition consists in (a) repeatedly reminding ourselves just how unsuperficial this knowledge/acquisition is, and (b) using POS arguments to help discover the mental vocabulary in terms of which encodings of the relevant experience are formulated. But (1) and (2) seem like chapter one and verse of Aspects.

So as usual, I'm confused by the whole debate about POS arguments. Is the idea that with regard to human knowledge of language, but not knowledge of geometry (or bee-knowledge of solar ephemeris), there's supposed to be some residual plausibility to the idea that generalizations of the sort Jeff has pointed to (again) can be extracted from the regularities in experienceables without effectively coding the generalizations in terms of how the "data of experience" gets encoded? If so, is there any better form of the argument that would be accepted as persuasive; or is it that with regard to knowledge of linguistic generalizations, the prior probability of Empiricism (in some suitably nonsuperficial form) is so high that no argument can dislodge it?

Or is the skepticism about POS arguments more general, so that such arguments are equally dubious in nonlinguistic domains? If so, is there any better form of the argument (say regarding geometry, or the bees) that would be accepted as persuasive; or is it that with regard to knowledge of all generalizations, the prior probability of Empiricism (in some suitably nonsuperficial form) is so high that no argument can dislodge it?

Of course, nobody in their right mind cares about drawing a sharp line between Rationalism and Empiricism. But likewise, nobody in their right mind denies that there is at least one distinction worth drawing in this vicinity. Team-Plato, with Descartes pitching and Leibniz at shortstop, uses POS considerations to argue that (3) we encode experience and frame hypotheses in very interesting ways, and (4) much of what we know is due to how we encode experience/hypotheses, as opposed to specific experiences that "confirm" specific hypotheses. There is another team, more motley, whose roster includes Locke, Hume, Skinner, and Quine. They say that while (5) there are surely innate mechanisms that constrain the space of hypotheses available to human thinkers, (6) much of what we know is due to our having experiences that confirm specific hypotheses.

To be sure, (3-6) are compatible. Disagreements concern cases, and "how much" falls under (6). And I readily grant that there is ample room for (6) under the large tent of inquiry into knowledge of language; again, see chapter one of Aspects. Members of Team-Plato can agree that (6) has its place, against the background provided by (3) and (4); though many members of the team will insist on sharply distinguishing genuine cases of "inductive bias," in which one of two available hypotheses is antecedently treated as more likely, from cases that reflect knowledge of how the relevant vocabulary delimits the hypothesis space (as opposed to admitting a hypothesis but assigning a low or even zero prior probability). But my question here is whether there is any good reason for skepticism about the use of POS arguments in support of (3) and (4).

Absent a plausible proposal about how generalizations of the sort Jeff mentions are learned, why shouldn't we conclude that such generalizations fall under (4) rather than (6)?

Sidepoint: it's not like there is any good basis, empirical or conceptual, for thinking that most cases will fall under (6)--or that relegation to (4) should be a last resort. The history of these debates is littered with versions of the idea that Empiricism is somehow the default/simpler/preferable option, and that Rationalists have some special burden of proof that hasn't yet been met. But I've never met a plausible version of this idea. (End of sidepoint.)

I'm asking because this bears on the question of whether or not linguistics provides an interesting and currently tractable case study of more general issues about cognition. (That was the promise that led me into linguistics; but as a philosopher, I'm used to getting misled.)

If people think that POS arguments are generally OK in the cognitive sciences, but not in linguistics, that's one thing. If they think that POS arguments are generally suspect, that's another thing. And I can't tell which kind of skepticism Jeff's post was eliciting.

4 comments:

  1. I think there are real differences between language and geometry. One is that the truths of geometry don't depend on contingent cultural facts or vary from place to place.

    For me it is not a philosophical debate but entirely an empirical one. There is, as Chomsky never tires of pointing out, some part of human genetic endowment that is absent in rocks and kittens that allows us to acquire language. This acquisition ability is some sort of LAD, which takes the input and outputs some grammar that defines an infinite set of sound-meaning pairings. Both the LAD and the input are *essential* to the process.


    You write: "I think knowledge of language is an interesting example of knowledge acquired under the pressure of experience, but not acquired by generalizing from experience."
    For me the term "generalizing" just means that the output grammar is not just a memorisation of the input: i.e., that the child can understand and produce sentences that are not in the input. And "from experience" just means that the output of the LAD depends on the input. I don't see how one can deny either of these. And "learning" is the term we use in the learning-theory community for that process. So I think there is just a terminological difference here, unless you are referring to Fodorian brute-causal processes, which I guess you aren't?


    Now, the POS used to be an argument for domain-specific knowledge in the LAD. I never liked that argument. This has changed in the MP era, and the POS is now just, as it should be, a question: what the hell is going on in the LAD such that rich grammars can come out of such poor inputs? (To copy the title of a recent book.)

    So my view is that we can move forward by making specific proposals about the structure of the LAD as a computationally efficient learning algorithm that can output grammars of the right type, and that we can make progress by developing a general theory of learning grammars, as opposed to using a general theory of probabilistic learning, as Charles Yang or the Bayesians do. But that's just me; others have different approaches, which is of course good. One approach, however, does not try to come up with models of the LAD. Rather, it comes up merely with restrictions on the class of grammars output by the LAD.

    So rather than saying "the LAD is this function ...", they say "the LAD is some function whose outputs lie in the class of grammars G"; i.e., they just define the range of the function.
    Or worse, they just say "the LAD outputs grammars that have property P", where neither the class of grammars nor the property P is defined. My scepticism is about these types of solution, which in the examples I have looked at don't explain anything: i.e., "solutions" to the POS of the form "X is innate, so the learner doesn't have to learn X". And my second objection is Darwin's problem, in the case when X is clearly specific to language: the proposed solution just kicks the can down the road and hopes that at some point in the future there will be a reduction to something non-language-specific.
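The contrast between specifying the LAD itself and merely constraining its range can be made concrete with a toy sketch. Everything here is invented for illustration: the "grammars" are just sets of licensed word bigrams, `lad` is a deliberately trivial learning function, and `in_class_G` stands in for a range-only claim that says nothing about how the grammar is computed.

```python
from typing import List

Grammar = frozenset  # a toy "grammar" is just a set of licensed bigrams


def lad(corpus: List[str]) -> Grammar:
    """A concrete (toy) model of the LAD: a function from input
    sentences to a grammar, here built by collecting the word
    bigrams attested in the input."""
    bigrams = set()
    for sentence in corpus:
        words = sentence.split()
        bigrams.update(zip(words, words[1:]))
    return Grammar(bigrams)


def in_class_G(g: Grammar) -> bool:
    """A range-only 'theory': it constrains only what the outputs
    look like (here, trivially, that they are finite), and says
    nothing about the function that produced them."""
    return len(g) < float("inf")


corpus = ["the dog barks", "the cat sleeps"]
g = lad(corpus)
assert in_class_G(g)            # the range claim is satisfied...
assert ("dog", "barks") in g    # ...but only the function predicts outputs
```

The point of the sketch: many different functions share the same range, so a range-only characterization underdetermines the LAD in exactly the way the comment describes.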

    (PS One of the reasons I dislike the term UG is that as it is often used it blurs the distinction between the LAD and properties of the outputs of the LAD. So if anyone feels the need to use the term UG, please specify which you mean.)

  2. Just lost another comment so here is a truncated version:

    You write
    "(4) much of what we know is due to how we encode experience/hypotheses, as opposed to specific experiences that "confirm" specific hypotheses.
    ...
    (6) much of what we know is due to our having experiences that confirm specific hypotheses."

    I am not sure this puts the contrast between (4) and (6) in quite the right place.
    If we have a Bayesian PCFG learner, then it doesn't rely on specific experiences that confirm specific hypotheses, but rather on a subtle computation in which potentially every experience probabilistically confirms almost every hypothesis to a greater or lesser extent. A triggering algorithm in a P&P model, by contrast, does have a specific experience that confirms a specific hypothesis (a parameter-value setting).
    Yet I think most people would have this the other way round.
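The contrast drawn here can be sketched with a minimal toy example. The two "grammars" (a head-parameter with values head-initial and head-final), the likelihoods, and the data are all invented for illustration; the triggering learner waits for one unambiguous datum to set the parameter, while the Bayesian learner lets every datum shift the posterior over every hypothesis.

```python
# P(datum | grammar) for three possible input types (invented numbers).
likelihood = {
    "head-initial": {"VO": 0.7, "OV": 0.1, "ambiguous": 0.2},
    "head-final":   {"VO": 0.1, "OV": 0.7, "ambiguous": 0.2},
}


def trigger_learner(data):
    """Sets the head parameter only on an unambiguous trigger;
    ambiguous data leave the parameter untouched."""
    param = None
    for d in data:
        if d == "VO":
            param = "head-initial"
        elif d == "OV":
            param = "head-final"
    return param


def bayesian_learner(data):
    """Every datum rescales the posterior over both grammars."""
    posterior = {"head-initial": 0.5, "head-final": 0.5}  # flat prior
    for d in data:
        for g in posterior:
            posterior[g] *= likelihood[g][d]
        z = sum(posterior.values())
        posterior = {g: p / z for g, p in posterior.items()}
    return posterior


data = ["ambiguous", "VO", "ambiguous", "VO"]
param = trigger_learner(data)       # set by the two VO triggers alone
post = bayesian_learner(data)       # every datum, even ambiguous, is weighed
```

On this run both learners converge on head-initial, but for different reasons: the trigger learner used exactly two of the four data points, while the Bayesian learner's posterior reflects all four.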

  3. Alex: I don't think our goals, or even the methods, are all that different.

    Linguists, especially those of the P&P persuasion, try to characterize the properties of human languages in ways that will, hopefully, have good learnability properties. Some of us work on this more directly: for instance, William Sakas, Bob Berwick and I will argue that a linguistically motivated parameter space is feasible to learn, at next year's GALANA conference to be held at Maryland.

    Folks like you, and many in the grammar induction tradition, try to characterize formally learnable classes of languages. The burden of proof is to show that these classes of languages correspond to something like human language.

    None of us has really gotten there yet but all of us, including the Bayesians, are reacting to the consequences of formal learning theories. As Gold noted, to achieve positive learnability, we can restrict the space of grammars or provide the learner with information about statistical distributions of the languages. Of course, these two tracks are not mutually exclusive.
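The first of Gold's two escape routes, restricting the space of grammars, can be sketched with a toy learner. The finite class of languages below and the enumeration strategy are invented for illustration: with a finite (or suitably restricted) class, a learner that always conjectures the smallest language consistent with the positive data seen so far identifies the target in the limit.

```python
# A finite hypothesis class: each "language" is just a finite set of
# strings, ordered from smallest to largest (all names hypothetical).
CLASS = [
    {"a"},                 # L1
    {"a", "ab"},           # L2
    {"a", "ab", "abb"},    # L3
]


def learner(text):
    """After each positive datum, conjecture the smallest language in
    CLASS consistent with everything seen so far (Gold-style
    identification in the limit for a restricted class)."""
    seen = set()
    conjectures = []
    for datum in text:
        seen.add(datum)
        for lang in CLASS:  # ordered small-to-large
            if seen <= lang:
                conjectures.append(lang)
                break
    return conjectures


# A presentation of L2: the learner overshoots to L1 at first, then
# locks on to L2 once "ab" appears, and never changes again.
text = ["a", "a", "ab", "a"]
assert learner(text)[-1] == {"a", "ab"}
```

The same strategy fails for unrestricted superfinite classes, which is the content of Gold's negative result; restricting the class (or, on the second track, giving the learner distributional information) is what restores positive learnability.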

    I look forward to reading your paper with Bob and William when it is ready; it sounds like the sort of work there should be more of.
    I think our methods and assumptions are quite similar, though we come to different conclusions -- but both are very different from those of Paul and Jeff and Norbert. So I wonder what your take is on the strategy of making things like PISH innate without going all the way to a finite class of grammars.


    I think a lot of the work in inductive inference looks at exactly how to "characterize formally learnable classes of languages". That isn't quite what I do -- I am trying to characterize learnable languages with respect to certain algorithmic approaches. I don't think there is any problem then with relating these languages to human languages, since (most of the time) they are fairly close variants of the standard grammars that mathematically precise linguists use. Other people in the grammatical inference community look at other types of grammars -- pattern grammars, external contextual grammars -- that are rather further away from the standard view.

    Anyway, I am sure your turkey needs either to be basted or taken out of its brine, depending on what time zone you are in, so Happy Thanksgiving to all, and especially to Norbert for stimulating some interesting discussion.
