Comments on Faculty of Language: Baker's Paradox II: Stay Positive

Lisa Pearl (2014-06-04 13:07):

:) Happy to chime in here!
Indeed, that makes good sense (and I can see the relationship to the paradigmatic gaps mentioned in post III). So this then brings it back to the potential INE interpretation: because no generalization to a productive rule occurs, this is equivalent to defaulting to H2 (no productive rule, just memorize). Under a story where you're weighing H1 vs. H2, the decision to default to H2 seems like you're (for now) assuming that the other 6 are going to be exceptions because you haven't (yet) observed them following the rule.

But anyway, maybe this is something that you'll be taking up again in that later post that'll discuss expectation vs. exception. :)

Charles Yang (2014-06-04 11:47):

Hi Lisa, good to see you here. I think for the case you raise, the learner will not generalize at all: it will memorize the 5. The other 6 may be picked up by the still more general rule--if there is one--but the learner will not use them like the other 5. I think the case of paradigmatic gaps in the next post is similar.

Lisa Pearl (2014-06-04 11:26):

Yeah, I definitely had the same initial impression Benjamin had about INE (which I mentioned to you in a separate comm). But I think I can see where you're coming from about the distinction. The idea I have in mind goes something like this.

The hypotheses would be H1 = it's a productive rule vs. H2 = it's not productive, and these are all individual lexical things that need to be memorized.
The "how much is enough" question is a way to make a decision between H1 and H2. So, if you observe enough direct positive evidence of the productive rule (i.e., N/ln N or fewer exceptions could possibly exist, given what you've seen, like in the 8-out-of-11 example you give here), it doesn't matter whether those exceptions do exist or not -- H1 holds. So that's different from INE.

However, the INE might come back if you haven't yet observed enough to know for sure that there are at most N/ln N exceptions (e.g., observing only 5 of the 11 obeying the productive rule). How do you make the decision about what the other 6 do so that you can choose between H1 and H2? It sounds like the default would be to assume H2 (these are all individually memorized, since you've only seen 5 of 11 obeying the rule), so the learner is effectively interpreting the absence of the productive rule with the other 6 as if they were definitely exceptions (and so H2 is true). That interpretation of the missing evidence does seem very INE.

Anyway, I should go check out your third post on this, too. ;)

AveryAndrews (2014-06-02 17:42):

Because 'right' doesn't occur with asleep, awake, alive, or most of the others? My attempts in the past to come up with distributional criteria for AP vs. PP that seemed true to me and could be taught to students have not been very successful -- hopefully because the phenomena are not simple, rather than because I'm an idiot.

Another point: none of us here believes in statistical learning without any kind of UG at all (in the 'broad' sense of UG where it's just whatever bias the learner has wrt language, regardless of whether it's a property of a task-specific language faculty).
So it is not necessarily a problem that the low frequency of ordinary attributive adjectives in ordinary relative clauses doesn't cause them to be preempted by prenominal ones, since these adjectives do occur as predicates of main clauses, and UG might make it difficult to block 'the man who is tall' without also blocking 'the man is tall'.

'Might' being a key word here; as you rightfully insist, we need more in the way of specific proposals. It seems to me that we need either algorithms or MDL/Bayesian identifications of better analyses for contemporary syntactic theories that people actually use; the current Bayesian/MDL work does seem to be mostly limited to toy theories, or to non-toy ones such as CCG that are too limited in what they can do typologically.

Charles Yang (2014-06-02 08:15):

Hi Avery: This is why the model is a purely distributional one! How to analyze the structural properties of these items is above my pay grade. But I don't think you need to notice absences. By far the most common use of the "right" type of intensifiers is "right here/there/away", more frequently even than with PPs. Why can't the child say: if A is used with "right" and B is used with "right", then A and B behave similarly? If it walks like a duck and quacks like a duck...

Charles Yang (2014-06-02 07:50):

Hi Benjamin: This may be a terminological point, or maybe I didn't express myself well.
The learning model I advocate evaluates the batting averages of rules (e.g., 8/11 in a corpus) to see whether they are sufficiently high and thus worthy of generalization. The learner does not need any hypothesized expectation about the 3 unattested items. Even if the three items do NOT follow the rule (i.e., they appear prenominally, and in fact one mother did say "the alive one"), the generalization is still warranted. This, I think, is quite different from the way the INE is typically construed: if you hypothesize a rule that does X, you expect to see attestations of X. I will come back to this point in a later post, because I think the framing of the Paradox mixes up expectations and exceptions and makes it more paradoxical than it is.

AveryAndrews (2014-06-02 05:33):

The main problem I'm having so far with this is how the non-preposable adjectives (including locational 'present') are detected as being PPs, since the evidence seems to be that they appear with different intensifiers (right, straight, well, but not very) than regular adjectives do, but it also seems like it might be hard to detect this without the capacity to notice absences.
This is especially difficult in this case because the facts about which intensifiers are usable with these crypto-PPs seem to be very confusing.

AveryAndrews (2014-06-02 05:23):

This comment has been removed by the author.

benjamin.boerschinger (2014-06-02 03:05):

Thanks for the interesting write-ups; I'm looking forward to the next posts.

I'm not sure I agree with how you characterize indirect negative evidence. You write

"Conversely, if the learner does not witness enough positive instances, it will decide the generalization is unproductive, proceed to lexicalize the positively attested examples and refrain from extending the pattern to novel items."

which sounds exactly right, but also sounds to me like one of the (rather obvious) ways of spelling out indirect negative evidence: given the positive examples, form an expectation of what ought to hold if the pattern were productive, and if that expectation is not met, do not generalize. In other words, the fact that there aren't enough positive examples constitutes indirect negative evidence against a productive generalization.

I guess this is really just a terminological point, but perhaps there is something else I'm missing, so I thought I'd bring this up.
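[Editor's note: the N/ln N threshold discussed in the thread above is Yang's Tolerance Principle. A minimal sketch of the arithmetic behind the 8-of-11 and 5-of-11 cases follows; the function names are illustrative, not from the thread.]

```python
import math

def tolerance_threshold(n):
    """Yang's Tolerance Principle: a rule over n items tolerates
    at most n / ln(n) exceptions while remaining productive."""
    return n / math.log(n)

def is_productive(n_items, n_following_rule):
    # Treat every item not (yet) seen to follow the rule as a
    # potential exception, as in Lisa Pearl's worst-case reading.
    exceptions = n_items - n_following_rule
    return exceptions <= tolerance_threshold(n_items)

print(round(tolerance_threshold(11), 2))  # threshold for 11 items: 4.59
print(is_productive(11, 8))   # 8 of 11 attested: 3 exceptions at most -> True
print(is_productive(11, 5))   # only 5 of 11 attested: 6 possible exceptions -> False
```

On this sketch, 8 of 11 suffices for generalization because at most 3 exceptions can exist (3 <= 4.59), while 5 of 11 does not (6 > 4.59), matching the thread's discussion of when the learner defaults to memorization.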