In a previous post (here),
I confessed to hubris: I thought that solving PoS1 problems sufficed to finesse
PoS2 difficulties. I now think that I was wrong to think this. But hey, I was
young and brash and now I am mellow and judicious. Just the exuberance of
youth! In this post, I’d like to
consider two issues: (i) what models we have for investigating how LADs acquire
their particular Gs given a specification
of the possible human Gs and (ii) how PoS1 conclusions might be leveraged
to address PoS2 concerns.
IMO, GG has gotten further in limning the limits of G-hood
than we have gotten in explaining how LADs move through the space of possible
Gs to the actual Gs acquired. GG has made serious progress in addressing PoS1
issues. We can tell pretty good stories about G-invariant properties (i.e. why
certain kinds of dependencies are unattested in human Gs (e.g. Islands, ECP,
CED etc.) and can also sketch out accounts regarding those things that Gs must
contain (e.g. anaphors must be “close” to their antecedents in a way we can
specify pretty well)). This provides us with pretty good accounts of what sorts
of Gs are impossible (i.e. what kinds of dependencies Gs will never contain).
In contrast, we do not have particularly good accounts for
why speakers acquire the particular Gs that they do (e.g. why does English obey
the Fixed Subject Constraint (FSC) but Italian doesn’t? Why isn’t English pro-drop? Why doesn’t English have
resumptive pronouns?).[1]
We do, however, have accounts aiming in this direction from the acquisition,
diachronic and typological literatures.[2]
The accounts fall into two basic types.
The first kind of account parameterizes a principle, the
classic case being the CP/TP parameter for bounding nodes. You remember the
story based on work by Rizzi.[3]
The Subjacency Principle is an invariant UG principle. However, the principle
is defined over bounding nodes, and these can vary across Gs. The differences
between English and Italian with regard to extraction from embedded questions
reduce to the fact that in English TP is a bounding node while in Italian CP
is. This difference suffices to explain both the similarities and differences
with regard to island extraction in the two languages.[4]
On this story, acquisition amounts to fixing the value of the bounding node
parameter in your language.
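To make the parameterized story concrete, here is a minimal Python sketch. It assumes a deliberately simplified formulation of Subjacency (a single movement step may cross at most one bounding node) and represents a movement step as the list of category labels it crosses. The function name, the paths, and the per-language sets are illustrative assumptions, not anything taken from Rizzi's paper.

```python
# Toy model: Subjacency itself is fixed; only the set of bounding nodes varies by G.
# Assumed simplification: one movement step may cross at most one bounding node.

BOUNDING_NODES = {
    "English": {"TP"},  # per the text: TP is the bounding node in English
    "Italian": {"CP"},  # per the text: CP is the bounding node in Italian
}


def obeys_subjacency(crossed, language):
    """Return True if a single movement step crossing `crossed` is licit."""
    bounding = BOUNDING_NODES[language]
    return sum(1 for node in crossed if node in bounding) <= 1


# Extraction out of an embedded question: the embedded Spec CP is already
# filled, so the WH moves in one step across the embedded TP, the embedded CP,
# and the matrix TP.
wh_island_step = ["TP", "CP", "TP"]

# Hypothetical path for extraction out of a question embedded inside another
# question: both intermediate Spec CPs are filled, so two CPs are crossed.
double_wh_island_step = ["TP", "CP", "TP", "CP", "TP"]

for lang in ("English", "Italian"):
    print(lang,
          obeys_subjacency(wh_island_step, lang),
          obeys_subjacency(double_wh_island_step, lang))
```

On these assumptions the single embedded-question extraction is out in English but fine in Italian, while the doubly embedded case is out in both. The point of the sketch is only that one invariant check plus one language-particular set yields both the similarities and the differences.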
In a very interesting paper in progress (hence I cannot link
to it, sorry, but be patient), Dustin Chacon, Mike Fetters, Margaret Kandel,
Eric Pelzl and Colin Phillips (CFKPP) call this “direct learning.” How is the
value fixed? By induction from the PLD (choose your favorite inductive theory),
which, it is hoped, provides sufficient amounts of robust data to allow the LAD
to directly fix the value of the parameter (see note 5). To my knowledge (and
please correct me if there is stuff out there that contradicts what I am about
to say), we are still not sure if the
actual PLD available in Italian and English suffices to fix the two possible
values.[5]
The second kind of account keeps principles fixed (no
parametric variation of the principles) but allows for derivations that
circumvent the relevant universal condition. This is similar to CFKPP’s
conception of “indirect learning.” There are several examples of this. One is
Reinhart’s proposal that CP can have more than one “escape hatch,” which lets
two WHs move to an embedded Spec CP position so that one of them can exit
while still adhering to the Subjacency Condition. On this
view the different data are not traced to a parameter within bounding theory,
but to another kind of fact, namely that different Gs allow for different kinds
of rules (viz. English Gs only allow CP expansion rules with a single CP
specifier, while other Gs (e.g. Romanian/Bulgarian) might allow more than one,
thereby leaving a Spec C exit for a second A’-mover). There is potential
degree 0 data that could fix this (e.g. sentences like “Who what bought” would
support the conclusion that CP can house multiple WHs). However, the only
investigations of the PLD that I know of (by Lydia Grebenyova for Russian)
suggest that multiple interrogatives are very far from ubiquitous in the PLD
(actually there are none). If so, how the rule allowing multiple Spec Cs would
be acquired remains a mystery.
Let’s consider another example where a much more satisfying
story exists (CFKPP discuss this case at length and explore its subtleties).
Take Rizzi’s explanation for why English but not Italian[6]
is subject to the Fixed Subject Constraint (FSC), as in (1a/b):[7]
(1) a. *Who1 do you know that t1 ate a large supper (English)
    b. Who1 do you know that t1 ate a large supper (Italian)
The account has the following components:
(2) a. Something like the FSC (non-parameterized) is part of UG
    b. Italian has a way of evading the requirements of the FSC, but English doesn’t
    c. That Italian can generate structures that evade the FSC is manifest in simple Italian clauses
More concretely, FL/UG contains something like the that-t filter. It stars structures in
which C0 governs a trace (e.g. *[CP … C [ t1…]]).
Italian (but not English) allows for post verbal subject constructions, in
which the subject DP is not in the government purview of C:[8]
(3) a. Had telephoned John (ok-Italian/*-English)
    b. [ C [ [had [VP telephoned John]]]]
As a WH moving from the position of John in (3b) will not generate a structure subject to the FSC,
sentences like Who do you think that
phoned will be fully acceptable in Italian. In other words, Italian does respect the FSC, and the FSC is exactly the same in English and Italian.
The difference between them is that Italian allows the effects of the FSC
to be evaded by allowing movement from post-verbal subject position.
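The logic of (2) can be sketched in a few lines of Python. This is a toy under stated assumptions: the filter is reduced to a check on whether the WH launches from the preverbal subject position when C is overt, and each G is reduced to a single flag recording whether it generates post-verbal subjects. All names here (`Grammar`, `launch_sites`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    name: str
    has_postverbal_subjects: bool  # assumed to be fixed from PLD like (3a)


def violates_that_t_filter(launch_site, overt_c):
    """Universal filter, identical for every G: star a trace in the
    preverbal subject position when C is overt."""
    return overt_c and launch_site == "preverbal"


def launch_sites(g):
    """Positions from which a 'subject' WH may move in grammar g."""
    sites = ["preverbal"]
    if g.has_postverbal_subjects:
        sites.append("postverbal")
    return sites


def subject_extraction_ok(g, overt_c):
    """Licit if at least one available derivation passes the filter."""
    return any(not violates_that_t_filter(site, overt_c)
               for site in launch_sites(g))


english = Grammar("English", has_postverbal_subjects=False)
italian = Grammar("Italian", has_postverbal_subjects=True)

for g in (english, italian):
    print(g.name, subject_extraction_ok(g, overt_c=True))
```

Note that `violates_that_t_filter` never consults the grammar: the filter is the same in both languages, and the difference in outcome comes entirely from which launch sites each G makes available.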
Two things to note: first, post-verbal subjects are not
rarities in Italian (or Spanish which is similar) so we expect them to arise frequently
and robustly in the PLD. This should provide plenty of PLD fodder for whatever
rules generate post verbal subject constructions in Italian and Spanish.
Second, having post-verbal subjects suffices to evade the
FSC, but it is possible that there exist other ways of doing so. Nonetheless,
it appears that this is a very common way of evading the FSC. CFKPP reviews the
FSC variation literature, and suggests that there are not all that many ways to
skirt the FSC. Before reading CFKPP I was under the impression (based on widely
cited work by Sobin) that certain dialects of English provided evidence that
one could evade the FSC in other ways (English does not have post verbal
subjects). However, CFKPP provides excellent reasons (based in part on work
by Cowart) for thinking that Sobin’s findings are at best inconclusive and most likely
incorrect.
CFKPP does something else that is very important: it actually tries to estimate how much data
there is in actual PLD bearing on the FSC in both English and Spanish/Italian
(effectively the same language for FSC purposes). Bottom line: not very much at
all, so were the LAD required to “directly learn” whether the FSC held, it
would have a very difficult time doing so. There is just not that much direct
data bearing on it. Instead, the child seems to assume that it holds
universally. However, this does not imply that every language will appear to respect the FSC for there may
be indirect ways of meeting its requirements while still deriving sentences
that allow traces abutting Cs. As post verbal subject constructions provide
such an out, the differences between English and Italian follow even if the
FSC is left unparameterized.[9]
Note, btw, that this kind of analysis highlights the
difference between a Chomsky vs a Greenberg Universal. On this story the FSC
regulates Italian Gs just as much as English ones despite its effects being invisible in Italian. In
other words, the FSC holds in Italian despite never appearing to hold there.
This makes sense on Chomsky’s conception of universals but not Greenberg’s.
Chomsky universals are generalizations about structures, Greenberg universals
about surface forms. They are very different, though far too often confused (as
I have railed about in previous posts; see here
and here
for a reprise).
OK, back to the main point, and then I end. There is lots of G
variation, and this means that some properties of Gs are acquired on the basis
of actual PLD. When one looks carefully, it appears that for many kinds of
variation, there is really not that much PLD to go on, and this raises a PoS2
problem. We have a couple of examples of how to solve such PoS2 problems. However,
there has been relatively little attention paid to the specific problems that
such variation raises (I also plead guilty here). Regarding these, CFKPP presents a useful
classical PoS challenge to people of my ilk:
We challenge theoretical
syntacticians working on any phenomenon that varies between languages to
consider whether the phenomenon in question lends itself to direct observation
or not. If not, it must be conditioned on other observable phenomena. This can
serve as a useful heuristic for constructing accounts of phenomena in
comparative syntax. (20)
Yes, yes and yes again. Note that in cases where indirect stories
are required, looking for them can generate interesting research into the
possible variation among Gs. The Rizzi account of FSC above begins by assuming
that the FSC is universal and then looks for ways that particular Gs might
circumvent it. Such cases of indirect acquisition leverage what we believe to
hold given standard PoS1 considerations. So why does Italian appear not to obey
the FSC? Not because the that-t
filter doesn’t hold in Italian, but because Italian G allows for derivations
that circumvent its strictures. How do Italian Gs do this? By allowing for
post-verbal subjects which allow licit “subject” A’-movement derivations. Is
this fact about Italian Gs learnable? Yes. Post verbal subjects are not rare,
and so the LAD has evidence for postulating rules to generate these structures,
while the English kid does not. So, PLD driven acquisition plus UG fixed
principles can lead to plausible accounts of G variation (i.e. to stories
addressing the question how John/Gianni acquired the particular Gs they did).
What’s the moral? Don’t parameterize your
principles but look for G rules/structures that would allow them to be
empirically mute. This sort of strategy suggests taking attested universals
very strictly (i.e. as not
parameterized) as they serve as boundary conditions on adequate descriptions of
particular Gs. Thus, though PoS1 considerations don’t directly solve PoS2
problems, in particular contexts they suggest approaches to G variation that
can circumvent PoS2 problems.
Last point: I’ve lamented the fact that we’ve stopped
holding syntacticians’ feet to Plato’s Fire. We should constantly be asking of
comparative syntax proposals what the acquisition scenario might be. We have,
IMO, refrained from doing this of late (and I include myself here). I suspect
that the reason for this is that we’ve all been seduced into doing languistics rather than linguistics. We have stopped thinking of syntax as a
method for investigating FL and have adopted the view that the ultimate goal of
syntax is to explain syntactic patterns, rather than to use syntactic patterns
to investigate the fine structure of FL. That’s unfortunate for many reasons,
not the least of which is that it serves to Balkanize the discipline. If syntacticians refuse to take responsibility
for the cognitive relevance of their results, why should anyone else listen?
It’s not too late to change this. I again suggest that at
every variation talk we ask how the proposed variation might be acquired.
Syntacticians should be expected to have thought about this problem in
developing their proposals. Maybe we should start asking syntacticians to
specify what kind of data could account for the presented variation and whether
this is plausibly available in the PLD the child might have access to. We now
have quite a few CHILDES data sets and maybe we should start asking
syntacticians to peek at these in making their proposals. Having a workable solution is too high a bar.
Having thought about the problem, considered the possibly relevant PLD, and entertained
possible solutions is not. After all if a proposed account of a given variation
is un-acquirable that is an excellent reason for thinking that the analysis is
wrong.
[1]
Note that even here, we do not address the specific
LAD question but idealize to a situation where we aggregate Gs and reify them
as languages. So nobody studies why/how Norbert acquires his idiosyncratic G
but how a typical English speaker acquires GEnglish, an object that
strictly speaking does not exist.
[2]
By this I do not mean to imply that there is not good and solid work on this
issue. I’ve discussed lots of this before. Berwick, Polinsky, Lidz, Yang,
Guasti, Rizzi, Lightfoot, Roberts, Dresher, Fodor, Sakas and many others have
addressed this question fruitfully. That said, I think we understand this issue
less well than we do PoS1 concerns.
[3]
Amusingly, the parameter theory is suggested in a footnote in Rizzi’s
deservedly famous paper. The paper itself presented a different story. The
parameter idea really took off with LGB, which reworked Rizzi’s discussion in a
systematic way and gave us the P&P architecture.
[4]
I am reporting the history here. Grimshaw provided what to my mind was pretty
compelling evidence that this was the wrong way to describe the data.
[5]
If one assumes that English G is the unmarked case, then the investigation
should concentrate on Italian PLD. The data required to fix CP as the value are
actually quite recondite, at least if eyeballed informally. Using standard
Degree 0+ assumptions, violations of the WH-island constraint could not serve as PLD. So what might?
Extraction from subject islands might (e.g. Of which Ferrari did the driver
crash into the wall?) but I would bet that such data are few and far between in
actual Italian PLD. Thus, the direct evidence for the CP/TP parameter is, I
suspect, pretty rare in the PLD and so directly fixing the value of the
parameter should be pretty challenging. At present, I have no idea how such a
parameter might be fixed.
[6]
Of course there is no English nor Italian. Even in these cases we idealize and
don’t study particular individuals but study abstractions.
[7]
I am using Anglicized Italian so excuse the accent.
[8]
CFKPP discuss the that-t version of
the FSC and understand the constraint in terms of adjacency. This may be right,
but I doubt it. I suspect that what’s at stake is not adjacency but
hierarchical proximity, inverted subjects being lower than Spec T. However, for
what follows the details don’t matter much.
[9]
There is a great paper testing Rizzi’s proposal in non-standard Italian
dialects by Brandi and Cordin. It’s here.
This really is a fun read and if you’ve never looked at it, you are in for a
treat. The basic idea is that certain dialects can tell us overtly whether a WH
is moving from Spec T or from a lower verbal position. In particular, movement
from Spec TP is signaled with an obligatory subject clitic. Only if this clitic
is absent is movement of a “subject” permissible. Take a look, it’s very pretty
syntax.
These are great points, to which I add a few observations.
On my view, POS1 and POS2 are what LGB factors out into the core and the periphery. The core parametric system consists of the set of possible Gs (and thus ruling out the impossible Gs), and the mechanism of learning is parameter setting (more anon). The acquisition of the periphery deals with idiosyncrasies: exceptions, noise, nursery rhymes, and all the other messy bits of the primary linguistic data. This amounts to a garbage detector: the core system is not compromised (the child isn’t misled by noise, exceptions …) while lexicalized exceptions can be committed to memory accordingly. I have written a bit on the garbage detection problem in some of the earlier posts.
On parameter setting (POS1): CFKPP’s challenge to theoretical syntacticians, which you (and I) endorse, is to go back to the golden age of GB. Even when I started, a lot of syntax papers in the canon (written in the 1980s) were raising, and tackling, these challenges all in one place: a parameter is proposed, and the author very often has at least an informal discussion of what kind of data would be sufficient to distinguish the target values. Informal, to be sure, never verified in child directed data, but it seemed like the goal of linguistic theory was very much tied to the problem of language acquisition and everyone was thinking about it in their day job. There is very little of that these days.
The challenge can be met in small steps as well as long strides. For simple cases, it may be possible to work out what kind of data would support alternative parameter values, and we can then dive into the language specific data in CHILDES to start running correlations. But heroic projects like Sakas and Fodor are necessary for dealing with intricately interacting parameters. There needs to be more of that. (Parameters and big data: sounds like a match made in heaven.) It’s a shame that it didn’t happen sooner, and the problem of acquisition (and POS) no longer seems to be at the forefront of syntactic theorizing.
I can't wait to read this paper, it sounds like it should be a good one ;) A few notes on your notes:
Our corpus was constructed in such a way that we actually looked for CP/TP bounding phenomena in addition to that-t effects, and this too was not looking very encouraging. The focus on these kinds of extractions in recent years seems to have narrowed to explaining just variability in extractions out of subjects since Rizzi/Grimshaw, so I'm not even sure what the indirect learning story might be for wh-island (non)-violations, but I might just not have read the right papers. Given the data that's available to us at the moment, this seems to be in desperate need of an indirect learning story, insofar as the crosslinguistic variability claims are true. Actually, I suspect that most, if not all, the learnability conditions on apparent variability in islands will need to be considered very very carefully when framed this way, if they are robust across speakers.
On your point about how homogenizing labels like "English" and "Spanish" are: the work by Han, Lidz and Musolino was really influential in how we thought about this issue. A very likely outcome might have been that English speakers really DID show rampant variation, precisely because it's hard for learners to correctly infer whether their language doesn't have the that-t effect (or conversely for Spanish speakers). If so, then we would have wanted to AVOID the Rizzian story. In other words, if it turns out that people exposed to similar PLD come to different conclusions about their grammar, then one wants a *less* learnable theory of the difference. In fact, this was one of the things that made us skeptical of the Sobin facts – if it were the case that Dialect A systematically had FSC effects, but not Dialect B, then there would need to be some PLD that distinguished Dialect A from Dialect B, and last I knew midwesterners didn't have postverbal subjects or rich agreement :)