One more point: Jeff observes that "it is right to bring to the fore the question of how UG makes contact with data to drive learning," and that APL are right to raise this question. I would agree that the question is worth raising and worth investigating. What I would deny is that APL contributes to advancing this question in any way whatsoever. There is a tendency in academia to think that all work should be treated with respect and politesse. I disagree. All researchers should be so treated, not their work. Junk exists (APL is my existence proof were one needed) and identifying junk as junk is an important part of the critical/evaluative process. Trying to find the grain of trivial truth in a morass of bad argument and incoherent thinking retards progress. Perhaps the only positive end that APL might serve is to be a useful compendium of junk work that neophytes can go to to practice their critical skills. I plan to use it with my students for just this purpose in the future.
At any rate, here is a remark by Alex C that generated this informative reply (i.e. citations for relevant work) by Jeff Lidz.
Alex Clark:
So consider this quote: (not from the paper under discussion)
"It is standardly held that having a highly restricted hypothesis space makes
it possible for such a learning mechanism to successfully acquire a grammar that is compatible with the learner’s experience and that without such restrictions, learning would be impossible (Chomsky 1975, Pinker 1984, Jackendoff 2002). In many respects, however, it has remained a promissory note to show how having a well-defined initial hypothesis space makes grammar induction possible in a way that not having an initial hypothesis space does not (see Wexler 1990 and Hyams 1994 for highly relevant discussion).
The failure to cash in this promissory note has led, in my view, to broad
skepticism outside of generative linguistics of the benefit of a constrained initial hypothesis space."
This seems a reasonable point to me, and more or less the same one that is made in this paper: namely that the proposed UG don't actually solve the learnability problem.
Jeff Lidz:
Alex C (Oct 25, 3am ([i.e. above] NH])) gives a quote from a different paper to say that APL have identified a real problem and that UG doesn't solve learnability problems.
The odd thing, however, is that this quote comes from a paper that attempts to cash in on that promissory note, showing in specific cases what the benefit of UG would be. Here are some relevant examples.
Sneed's 2007 dissertation examines the acquisition of bare plurals in English. Bare plural subjects in English are ambiguous between a generic and an existential interpretation. However, in speech to children they are uniformly generic. Nonetheless, Sneed shows that by age 4, English learners can access both interpretations. She argues that if something like Diesing's analysis of how these interpretation arise is both true and innate, then the learner's task is simply to identify which DPs are Heim-style indefinites and the rest will follow. She then provides a distributional analysis of speech to children does just that. The critical thing is that the link between the distributional evidence that a DP is indefinite and the availability of existential interpretations in subject position can be established only if there is an innate link between these two facts. The data themselves simply do not provide that link. Hence, this work successfully combines a UG theory with distributional analysis to show how learners acquire properties of their language that are not evident in their environment.
Viau and Lidz (2011, which appeared in Language and oddly enough is cited by APL for something else) argues that UG provides two types of ditransitive construction, but that the surface evidence for which is which is highly variable cross-linguistically. Consequently, there is no simple surface trigger which can tell the learner which strings go with which structures. Moreover, they show that 4-year-olds have knowledge of complex binding facts which follow from this analysis, despite the relevant sentences never occurring in their input. However, they also show what kind of distributional analysis would allow learners to assign strings to the appropriate category, from which the binding facts would follow. Here again, there is a UG account of children's knowledge paired with an analysis of how UG makes the input informative.
Takahashi's 2008 UMd dissertation shows that 18month old infants can use surface distributional cues to phrase structure to acquire basic constituent structure in an artificial language. She shows also that having learned this constituent structure, the infants also know that constituents can move but nonconstituents cannot move, even if there was no movement in the familiarization language. Hence, if one consequence of UG is that only constituents can move, these facts are explained. Distributional analysis by itself can't do this.
Misha Becker has a series of papers on the acquisition of raising/control, showing that a distributional analysis over the kinds of subjects that can occur with verbs taking infinitival complements could successfully partition the verbs into two classes. However, the full range of facts that distinguish raising/control do not follow from the existence of two classes. For this, you need UG to provide a distinction.
In all of these cases, UG makes the input informative by allowing the learner to know what evidence to look for in trying to identify abstract structure. In all of the cases mentioned here, the distributional evidence is informative only insofar as it is paired with a theory of what that evidence is informative about. Without that, the evidence could not license the complex knowledge that children have.
It is true that APL is a piece of shoddy scholarship and shoddly linguistics. But, it is right to bring to the fore the question of how UG makes contact with data to drive learning. And you don't have to hate UG to think that this is valuable question to ask.