Alex C has provided some useful commentary on my
intentionally incendiary post concerning the POS (here).
It is useful because I think that it highlights an important misunderstanding
concerning the argument and its relation to the phenomenon of Auxiliary
Inversion (AI). So, as a public service, let me outline the form of the argument once more.
The POS is a tool for investigating the structure
of FL. The tool is useful for factoring out the causal sources of one or
another feature of a rule or principle of a language-particular grammar G. Some
features of G (e.g. rule types) are what they are because the input is what it
is. Other features look as they do because they reflect innate principles of
FL.
For example: we trace the fact that English Wh movement
leaves a phonetic gap after the predicate that assigns it a theta role to the
fact that English kids hear questions like what
did you eat. Chinese kids don’t form questions in this way as they hear the
analogues of you ate what. In contrast, we trace the fact that sentences
like *Everyone likes him, with him interpreted as a pronoun bound by everyone, are ill-formed back to Principle
B, which we (or at least I) take to arise from some innate structure of FL. So,
again, the aim of the POS is to distinguish those features of our Gs that have
their etiology in the PLD from those that have it in the native structure of
FL.
Note, for the POS to be so deployable, its subject matter must
be Gs and their properties: their operations and principles. How does this
apply to AI? Well, in order to apply the
POS to AI we need a G for AI. IMO, and
in the opinion of a very large chunk of the field, AI involves a transformation that
moves a finite Aux to C. Why do we/I believe this? Well, we judge that the best
analyses of AI (i.e. the phenomenon) involve a transformation that moves Aux
to Comp (A-to-C) (i.e. A-to-C names a rule/operation)[1].
The analysis was first put forward in Syntactic
Structures (LSLT actually, though SS was the first place where many
(including me) first encountered it) and has been refined over time, in
particular by Howard Lasnik (the best discussion being here).
The argument is that this analysis is better for a variety of reasons than alternative
analyses. One of the main alternatives is to analyze the inversion via a phrase
structure operation, an alternative that Chomsky and Lasnik considered in
detail and argued against on a variety of grounds. Some were not convinced by
the Chomsky/Lasnik story (e.g. Sag, Pullum, Gazdar), as Alex C notes in linking
to Sag’s Sapir Lecture on the topic. Some
(e.g. me) were convinced and still are. What’s this have to do with the POS?
Well for those convinced by this story, there follows
another question: what do A-to-C’s properties tell us about FL? Note, this
question makes no sense if you don’t
think that this is the right description (rule) for AI. In fact, Sag says as much in his lecture
slides (here).
Reject this presupposition and the conclusion of the POS as applied to AI will
seem to you unconvincing (too bad for you, but them’s the breaks). Not because
the logic is wrong, but because the factual premise is rejected. If you do accept this as an accurate
description of the English G rule underlying the phenomenon of AI then you
should find the argument of interest.
So, given the rule
of Aux to Comp that generates AI phenomena, we can ask what features of the
rule and how it operates are traceable to the structure of the Primary
Linguistic Data (PLD: viz. data available to and used by the English child in
acquiring the rule A-to-C) and how much must be attributed to the structure of
FL. So here we ask how much of the details of an adequate analysis of AI in
terms of A-to-C can be traced to the structure of the PLD and how much cannot.
What cannot, the residue, it is proposed, reflects the structure of FL.
I’ve gone over the details before, so I will refrain from rehearsing them yet
again here. However, let’s consider for a second how to argue against the POS argument.
1. One
can reject the analysis, as Sag does. This does not argue against the POS, it
argues against the specific application in the particular case of AI.
2. One
can argue that the PLD is not as impoverished as indicated. Pullum and Scholz
have so argued, but I believe that they are simply incorrect. Legate and Yang
have, IMO, the best discussion of how their efforts miss the mark.
These are the two ways to argue against the conclusion that
Chomsky and others have regularly drawn. The debate is an empirical one resting
on analyzed data and a comparison of
PLD to features of the best explanation.
What did Berwick, Pietroski, Yankama and Chomsky add to the
debate? Their main contribution, IMO, was twofold.
First, they noted that many of the arguments against the POS
are based on an impoverished understanding of the relevant description of AI.
Many take the problem to be a fact about good and bad strings. As BPYC note,
the same argument can be made in the domain of AI where there are no ill-formed
strings, just strings that have only one reading where one might a priori have expected ambiguity.
Second, they noted that the pattern of licit and illicit
movement that one sees in AI data appears as well in many other kinds of data,
e.g. in cases of adverb fronting and, as I noted, even in cases of WH movement
(both argument and adjunct). Indeed, for any case of an A’ dependency. BPYC’s conclusion is that whatever is
happening in AI is not a special feature of AI data and so not a special
feature of the A-to-C rule. In other words, in order to be adequate, any
account of AI must extend to these other cases as well. Another way of making the
same point: if an analysis explains only AI
phenomena and does not extend to these other cases as well then it is
inadequate![2]
As I noted (here), these cases all unify when you understand
that movement from a subject relative clause is in general prohibited. I also
note (BUT THIS IS EXTRA, as Alex D commented) that the islandhood of subject RCs is an
effect generally accounted for in terms of something like subjacency theory
(the latter coming in various guises within GG: bounding nodes, barriers,
phases; it also has analogues in other frameworks).
Moreover, I believe that it would be easy to construct a POS argument that
island effects reflect innately specified biases of FL.[3]
So, that’s the logic of the argument. It is very theory
internal in the sense that it starts from an adequate description of the rules
generating the phenomenon. It ends with a claim about the structure of FL. This
should not be surprising: one cannot conclude anything about an organ that
regulates the structure of grammars (FL) without having rules/principles of
grammar. One cannot talk about explanatory adequacy without having candidates
that are descriptively adequate, just as one cannot address Darwin’s Problem
without candidate solutions to Plato’s. This is part of the logic of the POS.[4]
So, if someone talks as if he can provide a POS argument that is not theory
internal, i.e. that does not refer to the rules/operations/principles involved,
run fast in the other direction and reach for your wallet.
Appendix:
For the interested, BPYC in an earlier version of their
paper note that analyses of the variety Sag presents in the linked-to slides
have an analogous POS problem to the one associated with transformations in
Chomsky’s original discussion. This is not surprising. POS arguments are not
proprietary to transformational approaches. They arise within any analysis
interested in explaining the full range of positive and negative data. At any
rate, here are parts of two deleted footnotes that Bob Berwick was kind enough
to supply me with that discuss the issue as it relates to these settings. The
logic is the following: relevant AI pairings suggest mechanisms, and given a
mechanism, a POS problem can be stated for that mechanism. What the notes make clear is that analogous
POS problems arise for all the mechanisms that have been proposed once the
relevant data is taken into account (See Alex Drummond’s comment here
(March 28th), which makes a similar point). The take home message is
that non-transformational analyses don’t sidestep POS conclusions so much as
couch them in different technical terms. This should not be a surprise to those
that understand that the application of the POS tool is intimately tied to the
rules that are being proposed, and the rules that are proposed are usually
(sadly) tightly tied to the relevant data that is being considered.[5] At any rate, here is some missing text from
two notes in an earlier draft of the BPYC paper. I have highlighted two
particularly relevant observations.
Such pairings are a part of nearly every linguistic
theory that considers the relationship between structure and interpretation,
including modern accounts such as HPSG, LFG, CCG, and TAG. As it stands, our
formulation takes a deliberately neutral stance, abstracting away from details
as to how pairings are determined, e.g., whether by derivational rules as in
TAG or by relational constraints and lexical-redundancy rules,
as in LFG or HPSG. For example, HPSG
(Bender, Sag, and Wasow, 2003) adopts an “inversion lexical rule” (a so-called
‘post-inflectional’ or ‘pi-rule’) that takes ‘can’ as input, and then outputs
‘can’ with the right lexical features so that it may appear sentence initially
and inverted with the subject, with the semantic mode of the sentence altered
to be ‘question’ rather than ‘proposition’.
At the same time this rule makes the Subject noun phrase a ‘complement’
of the verb, requiring it to appear after ‘can’. In this way the HPSG
implicational lexical rule defines a pair of exactly the sort described by
(5a,b), though stated declaratively rather than derivationally. We consider one example in some detail
here precisely because, according to at least one reviewer, CCG does
not ‘link’ the position before the main verb to the auxiliary. Note, however,
that Combinatory Categorial Grammar (CCG), as described by Steedman (2000)
and as implemented as a parser by Clark and Curran (2007), produces precisely
the ‘paired’ output we discuss for “can eagles that fly eat.” In the Clark and Curran parser, ‘can’ (with a part of speech MD, for modal) has the
complex categorial entry (S[q]/(S[b]\NP))/NP, while the entry for “eat” has
the complex part of speech label S[b]\NP. Thus the lexical feature S[b]\NP,
which denotes a ‘bare’ infinitive, pairs the modal “can” (correctly) with the
bare infinitive “eat” in the same way as GPSG (and more recently, HPSG), by
assuming that “can” has the same (complex) lexical features as it does in the
corresponding declarative sentence. This information is ‘transmitted’ to the
position preceding eat via the proper sequence of combinatory operations, e.g.,
so that ultimately “can,” with the feature S[q]/(S[b]\NP), along with “eat,”
with the feature S[b]\NP, can combine. At this point, note that the combinatory
system combines “can” and “eat” in that order, as per all combinatory operations,
exactly as in the corresponding ‘paired’ declarative, and exactly following our
description that there must be some mechanism by which the declarative and its
corresponding polar interrogative form are related (in this case, by the
identical complex lexical entries and the rules of combinatory operations,
which work in terms of adjacent symbols) [my emphasis, NH]. However, it is
true that not all linguistic theories adopt this position; for example, Rowland
and Pine (2000) explicitly reject it (thereby losing this particular
explanatory account for the observed cross-language patterns). A full
discussion of the pros and cons of these differing approaches to linguistic
explanation is outside the scope of the present paper.
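[NH: To make the combinatory bookkeeping concrete, here is a minimal sketch of forward application over the categories just cited. This is my own toy illustration, not BPYC's code and not the Clark and Curran parser; the encoding of categories and the function name are mine.]

```python
# Toy encoding of CCG forward application (X/Y followed by Y yields X),
# using the categories cited above. All names are illustrative choices.
NP   = 'NP'
VP_B = ('\\', 'S[b]', NP)               # S[b]\NP : bare-infinitive VP, e.g. "eat"
CAN  = ('/', ('/', 'S[q]', VP_B), NP)   # (S[q]/(S[b]\NP))/NP : inverted "can"

def forward_apply(left, right):
    """Combine an adjacent pair by forward application; return None on failure."""
    if isinstance(left, tuple) and left[0] == '/' and left[2] == right:
        return left[1]
    return None

# "can  [eagles that fly]  eat"
step1 = forward_apply(CAN, NP)      # "can" + subject NP  ->  S[q]/(S[b]\NP)
step2 = forward_apply(step1, VP_B)  # result + "eat"      ->  S[q]
print(step1)                        # ('/', 'S[q]', ('\\', 'S[b]', 'NP'))
print(step2)                        # S[q]
```

[NH: The only point the sketch makes is the one emphasized in the note: the inverted "can" and the bare infinitive "eat" are paired by their lexical categories and the order of combination, with no AI-specific machinery added.]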
----------------------------
As the main text indicates, one way to form pairs
more explicitly is to use the machinery proposed in Generalized Phrase
Structure Grammar (GPSG), or HPSG, to ‘remember’ that a fronted element has
been encountered by encoding this information in grammar rules and nonterminal names,
in this case linking a fronted ‘aux’ to the position before the main verb via a
new nonterminal name. This is
straightforward: we replace the context-free rules that PTR use, S → aux IP, etc., with new
rules, S → aux IP/aux, IP → aux/aux vi, aux/aux → v, where the ‘slashed’ nonterminal names
IP/aux and aux/aux ‘remember’ that an aux has been generated at the front of a
sentence and must be paired with the aux/aux expansion to follow. This makes explicit the position for
interpretation, while leaving the grammar’s size (and so prior probability)
unchanged. This would establish an explicit pairing, but it solves the original
question by introducing a new stipulation since the nonterminal name explicitly
provides the correct place of interpretation rather than the wrong place and does
not say how this choice is acquired [my emphasis NH]. Alternatively, one could adopt the more recent HPSG approach of
using a ‘gap’ feature that stands in the position of the ‘unpronounced’ v, a,
wh, etc., but like the ‘slash category’ proposal this is irrelevant in the
current context since it would enrich the domain-specific linguistic component
(1), contrary to PTR’s aims – which, in fact, are the right aims within the
biolinguistic framework that regards language as a natural object, hence
subject to empirical investigation in the manner of the sciences, as we have
discussed.
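[NH: Similarly, it may help to see the 'slash' bookkeeping spelled out. Below is a minimal sketch of a slashed context-free grammar of the general kind the note describes; the particular NP and RC rules, and the little generator, are my own toy additions, not PTR's or BPYC's actual grammar.]

```python
from itertools import product

# Toy slashed CFG: IP/aux and aux/aux "remember" that an aux was generated
# sentence-initially and discharge it in the matrix clause.
RULES = {
    "S":       [["aux", "IP/aux"]],   # fronted aux introduces a slashed IP
    "IP/aux":  [["NP", "VP/aux"]],    # the slash is passed down to the matrix VP
    "VP/aux":  [["aux/aux", "vi"]],   # ...and discharged right before the verb
    "aux/aux": [[]],                  # the discharged position is phonetically empty
    "NP":      [["eagles"], ["eagles", "RC"]],
    "RC":      [["that", "vi"]],      # note: no slashed category inside the RC
    "aux":     [["can"]],
    "vi":      [["fly"], ["eat"]],
}

def expand(symbol, depth=6):
    """All terminal strings derivable from `symbol` in at most `depth` rewrites."""
    if symbol not in RULES:           # terminal word
        return [[symbol]]
    if depth == 0:
        return []
    strings = []
    for rhs in RULES[symbol]:
        parts = [expand(s, depth - 1) for s in rhs]
        if any(p == [] for p in parts):
            continue
        for combo in product(*parts):
            strings.append([w for part in combo for w in part])
    return strings

for words in expand("S"):
    print(" ".join(words))
# Prints "can eagles eat", "can eagles that fly eat", etc. The fronted "can" is
# always paired with the matrix verb, never with a position inside the relative
# clause, because no slashed nonterminal ever appears inside RC.
```

[NH: As the note says, this establishes the pairing explicitly, but only by stipulating in the nonterminal names where the fronted aux is interpreted; it does not say how that choice is acquired.]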
[1]
In what follows it does not matter whether A-to-C is an instance of a more
general rule, e.g. move alpha or merge, which I believe is likely to be the
case.
[2]
For what it’s worth, the Sag analysis that Alex C linked to and I relinked to
above fails this requirement.
[3]
Whether these biases are FL specific is, of course, another question. The
minimalist conceit is that most are not.
[4]
One last point: one thing that POS arguments highlight is the value of
understanding negative data. Any good
application of the argument tries to account not only for what is good (e.g.
that the rule can generate must Bill eat)
but also for what is not (e.g. that the system cannot generate *Did the book Bill read amused Frank).
Moreover, the POS often demands that the negative data be ruled out in a
principled manner (given the absence of PLD that might be relevant). In other
words, what we want from a good account is that what is absent should be absent for some reason other
than the one we provide for why we move WHs in English but not in Chinese. I
mention this because if one looks at Sag’s slides, for example, there is no
good discussion of why one cannot have a metarule that targets an Aux within a
subject. And if there is an answer to
this, one wants an account of how this prohibition against this kind of
metarule extends to the cases of adverb fronting and WH question formation that
seem to illustrate the exact same positive and negative data profiles. In my
experience it is the absence of attention to the negative data that most
seriously hampers objections to the AI arguments. The POS insists that we
answer two questions: why what we find OK is OK, and why what we find not OK is
not OK. See the appendix for further discussion of this point as applied
to the GPSG/HPSG analyses that Sag discusses.
[5]
To repeat, one of the nicest features of the BPYC paper is that it makes clear
that the domain of relevant data (the
data that needs covering) goes far beyond the standard AI cases in polar
questions that are the cynosure of most analyses.
As they say, one man's ponens is another man's tollens. If you have a theory of grammar and then on analysis it is clear that large chunks of it cannot be learned and must be innate, then there are two approaches. One is to say as you do, "Darwin be damned." The other is to question whether your theory might be false.
Particularly when, as is uncontroversial, the grammars are severely undetermined by the linguistic evidence available [1], and Chomskyan linguists rely heavily on non-empirical assumptions (like the SMT, full interpretation etc etc) in the process of theory construction. Given these facts, and the antipathy that many here have to theories of learning, it is unsurprising that the theories you come up with are not learnable.
One conclusion is that Darwin is wrong, the other is that these theories (the standard theory, the revised extended standard theory, P & P, etc etc. ) are wrong.
[1] e.g. " Choice of a descriptively adequate grammar for the language L is always much underdetermined (for the linguist, that is) by data from L."
There is an interesting presupposition in your answer that I want to make explicit: you seem to assume that empiricist learning is a general cognitive mechanism that explains how cognition functions in other domains. In other words, it treats language as the outlier while other areas of mentation are easily described in empiricist terms (without much innate hardware necessary). So far as I can tell (see Gallistel and my posts on him for discussion), nothing could be further from the truth. Given that this is so, the standard ML approaches, which make very weak assumptions about domain-specific knowledge, are probably wrong EVERYWHERE. So even if one is a partisan of Darwin, the specific approaches you seem to favor have very little if any Darwinian street cred.
Seen in this light, it is not only my theories that are unlearnable (actually they are (or it is reasonable to think they are) given the right set up of the hypothesis space and the right priors) but almost every form of cognitive competence we are aware of. You really should read Gallistel's stuff on classical learning in rats. If he is right, and I believe he is, then classical learning theories of the empiricist (ML) variety are biologically hopeless. I take that to imply that Darwin would not favor them.
I think it is genuinely uncontroversial that there are many behaviours in many species that are innate: spider webs, ungulates walking a few minutes after birth, some bird songs (but not all), nesting behaviours of some birds etc etc. and this can be verified by raising the animal in isolation and so on.
and it is I think pretty clear that there are learned behaviours that are very highly canalised by some innate structures --- in vision, navigation etc.
But all of these have some common factors -- they are evolutionarily very ancient (tens of millions of years), and they are clearly adaptive, and there is as a result no problem for Darwin.
Clearly there is a difference between these things and relative clause extraction.
It's worth pointing out that Gallistel's notion of a domain is somewhat different from the way that you and I use it --- for example, he considers probabilistic learning a domain, whereas for me it would be a mechanism that could be used widely in many different domains, of which one might be language processing. This terminological difference might account for some of this disagreement.
However, it's different if the contentious theoretical proposals go on to predict true facts for which clear evidence in the PLD is close to nonexistent, which seems to be the case for agreements with quirky and non-quirky case-marked convert subjects in Icelandic. Unfortunately, most of the cases people argue about aren't anywhere near so extreme.
I take it you meant 'covert' subjects? Also did you mean 'fortunately'?
Yes, I do mean 'covert', but I also think I mean 'unfortunately', because if the usual cases were more extreme, there would be more solid evidence for UG. Of course the rhetorical force of 'unfortunately' in English is interesting; I recall Howard Lasnik pointing out that it usually means 'unfortunately for the proponents of the idea I am attacking, but fortunately for me'.
DeleteAt POS System Kuwait, we are the leading provider of advanced point-of-sale solutions for businesses of all sizes. pos system kuwait
ReplyDelete