0. This will have to do for now in lieu
of several posts on the different varieties of computational complexity that have, like Gandalf at The Prancing Pony, been delayed. I hope the inevitable brevity does not come
across as ‘shrill,’ even though the delay is of a nature to vex any sane person – Norbert will tell you the story if you ask.
1. Where I grew up, a language acquisition model had 4 parts: (i) an initial state; (ii) a target state (or states); (iii) input data; and (iv) an acquisition algorithm, mapping (i) to (ii), using (i) and (iii). I have to side with Stephen Crain & Norbert on this one: figuring out some of the constraints on (i), viz., Condition C, takes us part of the way, and in my book a really long, long way, toward an understanding of language acquisition, even though we don’t understand much about (iv). Of course, one can decide to throw away all we’ve learned about (i) so far and choose another lifestyle. But it would seem to me that the burden of proof lies with those who choose to walk this alternative route, since, e.g., Alex notes that he has “no idea” where the constraints in (i) come from.
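Just to fix ideas, here is that four-part schema spelled out as a few lines of illustrative Python; the names are mine and carry no theoretical weight, it is only a picture of the four parts and of how (iv) is supposed to relate them:

    # A sketch of the four-part schema, with made-up names; nothing here is a
    # claim about any particular theory of UG or of learning.
    from dataclasses import dataclass
    from typing import Any, Callable, Iterable

    @dataclass
    class AcquisitionModel:
        initial_state: Any                             # (i)  e.g. UG constraints such as Condition C
        target_states: set                             # (ii) the attainable adult grammar(s)
        input_data: Iterable[str]                      # (iii) primary linguistic data
        acquire: Callable[[Any, Iterable[str]], Any]   # (iv) maps (i) plus (iii) into a member of (ii)

    # The point of (1): pinning down (i) is real progress even while (iv) stays open.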
2. As far as I can make out, much the
same holds for any biological
phenomenon we know anything about. By way of comparison, consider, eg, human
bipedalism. We know quite a bit about
the initial state: H. sap., but not Pan troglodytes, winds up walking on 2
legs, and we know a lot about the genomic differences. But not enough. No biologist would venture to say, even
given all this knowledge about the differences in initial states, that they have a ‘complete acquisition algorithm’
that explains how human zygotes map to walking babies. Prediction is very difficult, especially about
the future, but it’s a fair guess that no such account is likely to be
forthcoming any time soon. However, this brute fact doesn’t seem to stop the
sequence machines running at the Broad Institute, nor the long string of Nobel
prizes handed down from Stockholm, nor for that matter, the day-to-day research
of any biologist I know. They just keep
plugging away, finding more and more about the initial state, and how that
differs between us and other species. Nor does any biologist I know ever rely
on, or even expect there to ever be, a ‘formal model’ for this mapping from (i)
to (ii), in the sense intended by folks like Alex. Now, all this parallels Condition C: a
universal competence that H. sap. has,
but Pan Trog. lacks. Why throw away all this hard-won knowledge? Would such folks really say that we must now abandon DNA and figure out some general mapping ensuring that people wind up walking but chimps don't?
3. I don’t know about you, but Alex could do me a big favor by carefully explaining exactly how all the neat results that he always points to, eg, Ryo Yoshinaka’s work on the learnability of multiple context-free languages (the languages of multiple context-free grammars, MCFGs; eg, Theor. Comp. Sci. 412:19, 1755-1852), actually work with natural language/grammar examples. Because, to be honest, I for one simply don’t get it – at least when I read through these articles, the results always pull punches; eg, Y’s results are actually about “new subclasses of multiple context-free languages with variants of substitutability”, and he says, “the
obtained learnable classes are however not
rich, as we have seen in Section 3 that several rather simple languages are
not 2d-substitutable. pd-substitutability easily causes too
much generalization from finite languages even when p = 2” (p. 1828, italics mine; sorry, no space to cover the
terminology here). So are English, Warlpiri, etc. included here or no? I simply can’t tell. And the MCFG-MG bridge
seems equally shaky. Take for example
Alex’s own version of this, learning using the MCFG/substitution method as
applied to (one version of) minimalist grammars. (I’m referring here to the slides and talk he gave on this at UQAM in September 2011.) Here, at least for me, there’s one big catch (really several): at first blush, my intuition tells me that the MCFGs all blow up
exponentially in size as compared to their strongly equivalent MG
counterparts. So does the learning. This is a common
succinctness phenomenon, as I’ll detail in my next blog post. In my old thesis (1982), chapter 4, pp. 384-390, there’s a demonstration that GPSG grammars wind up exponentially larger than (strongly) equivalent GB-style grammars (sadly, this part didn’t make it into the published book, due to page limits). As for the whole business of learnability-by-substitution, I’ve yet to see any method that blunts the force of Chomsky’s thorough discussion (1955) of its pitfalls – take the discussion on homonyms for starters (there’s a toy illustration at the end of this point). But I’m
prepared to be enlightened. Where’s the
beef?
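Here, as promised, is the toy illustration of the homonym pitfall. It is my own caricature of the bare substitutability heuristic (emphatically not Yoshinaka’s algorithm, which is hedged in exactly the ways quoted above): treat two words as intersubstitutable whenever they share a single context, and a homonym like ‘saw’ immediately chains together classes that ought to stay apart.

    from collections import defaultdict
    from itertools import combinations

    corpus = [
        "the saw fell",    # 'saw' as a noun
        "the hammer fell",
        "we saw her",      # 'saw' as a verb: the homonym
        "we met her",
    ]

    # Record every (left, right) context each word occurs in.
    contexts = defaultdict(set)
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            contexts[w].add((" ".join(words[:i]), " ".join(words[i + 1:])))

    # The bare heuristic: call two words intersubstitutable if they share a context.
    merged = {frozenset({a, b})
              for a, b in combinations(contexts, 2)
              if contexts[a] & contexts[b]}
    print(merged)   # {'saw','hammer'} and {'saw','met'}: chaining these merges
                    # wrongly licenses strings like "the met fell".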
4. I apologize in advance, but I won't be able to respond to the expected torrent of replies for a day or so – see under (0) above.
I can answer some of this very easily:
Natural languages are certainly not in the class learnable by the method in that TCS paper -- that uses 2d-substitutability, which is far stronger than 1d-substitutability, and natural languages don't have that property, as you know.
I conjecture that all natural languages (weakly) lie in the classes learnable by the algorithm in my joint paper with Ryo on learning PMCFGs. These grammars include a copying operation; I was convinced by Greg Kobele's thesis that copying is needed in syntax, and it can also handle things like case-stacking and reduplication in morphology.
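To give a concrete picture of what copying buys (just a toy sketch for the blog, not the construction in the paper): a single PMCFG rule of the form S(xx) <- A(x) uses its argument twice, and that alone yields the copy language {ww}, the usual stand-in for reduplication and case-stacking.

    # Toy recognizer for the language of the PMCFG fragment
    #   A(a); A(b); A(xy) <- A(x), A(y); S(xx) <- A(x)
    # A derives any nonempty string over {a, b}; the S-rule copies its argument.
    def derives_A(s: str) -> bool:
        return len(s) > 0 and all(c in "ab" for c in s)

    def derives_S(s: str) -> bool:
        # S(xx) <- A(x): the single argument x is used twice, something
        # no CFG or MCFG rule can do, since their rules are linear.
        half = len(s) // 2
        return len(s) % 2 == 0 and s[:half] == s[half:] and derives_A(s[:half])

    assert derives_S("abab") and not derives_S("abba")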
But -- and this is a big "but" -- that learner uses membership queries. And it is a weak learner (also a big but).
So I have a learner which is probabilistic, and I have some proposals for strong learners (only in draft form but I hope to publish this summer), but all in separate papers. In an ideal world we would have everything in one formal model, but for the reasons that you sketch in part 1, that is unlikely to happen real soon, and perhaps unrealistic.
It also has one more flaw, which you point out very precisely in the second half of point 3.
MGs are equivalent to MCFGs, but MGs are exponentially more compact, just as GPSGs are exponentially more compact than CFGs. (Slightly different technically, but you know what I mean.)
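To put a toy number on that (my own back-of-the-envelope sketch, not an example from either thesis): compile k binary agreement features directly into CFG nonterminals and every rule that has to respect the agreement gets multiplied by 2^k, whereas a single feature-passing schema stays the same size whatever k is.

    # Toy count of the blow-up: compile k binary agreement features into plain
    # CFG nonterminals and each agreement rule gets instantiated 2**k times.
    from itertools import product

    def compiled_rules(k: int) -> list[str]:
        settings = ("".join(fs) for fs in product("01", repeat=k))
        return [f"S_{v} -> NP_{v} VP_{v}" for v in settings]

    for k in (1, 2, 3, 10):
        print(k, len(compiled_rules(k)))   # 2, 4, 8, 1024 instantiated rules,
    # versus the single schema S[F] -> NP[F] VP[F] in a feature-based grammar.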
So I completely agree that even the best learning algorithm for CFGs is too weak as we need to learn something more compact. That is something I have only vague ideas about -- what sort of feature systems can we learn? Another very good problem, but I am working mostly on the strong learning stuff at the moment.
I also completely agree with your comment about Chomsky 55 -- I just wrote a survey paper which said: (I quote)
"The naive application
of these heuristic approaches has been known for a long time to suffer from
some serious problems; Chomsky was perhaps the first to articulate these problems
explicitly in his doctoral thesis [6], and indeed much of the technical work that we
describe can be seen as an attempt to either answer or avoid those problems, which
at the time were taken to be compelling arguments against the very possibility of
distributional learning."
So I think the criticisms that Chomsky 55 makes are exactly the right ones -- they are amazingly prescient, and though we have solutions to some of them now, some still resist analysis.