
Wednesday, January 10, 2018

An update on Empiricism vs Rationalism debates

All papers involve at least two levels of discourse. The first comprises the thesis: what is being argued? What evidence is there for what is being argued? What’s the point of what is being argued? What follows from what is being argued? The second is less a matter of content than of style: how does the paper say what it says? This latter dimension most often reveals the (often more obscure) political dimensions of the discipline, and relates to topics like the following: What parts of one’s butt is it necessary to cover so as to avoid annoying the powers that be? Which powers is it necessary to flatter in order to get a hearing or to avoid the ruinous “high standards of scrutiny” that can always be deployed to block publication? Whose work must get cited and whose can be safely ignored? What are the parameters of the discipline’s Overton Window? If you’ve ever been lucky enough to promote an unfashionable position, you have developed a sensitivity to these “howish” concerns. And if you have not, you should. They are very important. Nothing reveals the core shibboleths (and concomitant power structures) of a discipline more usefully than a forensic inquiry into the nature of the eggshells upon which a paper is treading. In what follows I would like to do a little eggshell inspection in discussing a pretty good paper from a (for me) unexpected source (a bunch of Bayesians). The authors are Lake, Ullman, Tenenbaum and Gershman (LUTG), all from the BCS group at MIT. The paper (here) is about the next steps forward for an AI that aspires to be cognitively interesting (like the AI of old), and maybe even technologically cutting edge (though LUTG makes this second point very gingerly).

But that is not all the paper is about. LUTG is also a useful addition to the discussions about Empiricism (E) vs Rationalism (R) in the study of mind, though LUTG does not put matters in quite this way (IMO, for quasi-political reasons; recall “howish” concerns!). To locate LUTG on this axis will require some work on my part. Here goes.

As I’ve noted before, E and R have both a metaphysical and an epistemological side. Epistemologically, E takes minds to be initially (relatively) unstructured, with mental contour forged over time as a by-product of environmental input as sampled by the senses. Minds on this view come to represent their environments more or less by copying their structures as manifest in the sensory input. Successful minds are ones that effectively (faithfully and efficiently) copy the sensory input. As Gallistel & Matzel put it, for Es minds are “recapitulative,” their function being to ensure that “an input that is part of the training input, or similar to it, evoke[s] the trained output, or an output similar to it” (see here for discussion and links). Another way of putting this point is that E minds are pattern matchers, whose job is to track the patternings evident in the input (see here). Pattern matching is recapitulative, and E relies on the idea that cognition amounts to tracking the patternings in the sensory input and reproducing them reliably.

Coupled with this E theory of mind comes a metaphysics. Reality has a relatively flat causal structure. There is no rich hierarchy of causal mechanisms whose complex interactions lie behind what we experience. Rather, what you sense is all there is. Therefore, effectively tracking and cataloguing this experience and setting up the right I/O associations suffices to get at a decent representation of what exists. There is no hidden causal iceberg of which this is the sensory/perceptual tip. The I/O relations are all there is. Not surprisingly (this is what philosophers are paid for), the epistemology and the metaphysics fit snugly together.

R combines its epistemology and its metaphysics differently. Epistemologically, R regards sensorily innocent minds as highly structured. The structure is there to allow minds to use sensory/perceptual information to suss out the (largely) hidden causal structures that produce these sensations/perceptions. As Gallistel & Matzel put it, R minds are more like information-processing systems, structured to extract information about the unperceived causal structures of the environment that generate the observed patterns of sensation/perception. On the R view, minds sample the available sensory input to construct causal models of the world which generate the perceived sensory patterns. And in order to do this, minds come richly stocked with the cognitive wherewithal necessary to build such models. R epistemology takes it for granted that what one perceives vastly underdetermines what there is, and takes it to be impossible to generate models of what there is from sensory perception without a big boost from given (i.e. unlearned) mental structures that make possible the relevant induction to the underlying causal mechanisms/models.

R metaphysics complements this epistemological view by assuming that the world is richly structured causally and that what we sense/perceive is a complex interaction effect of these more basic complexly interacting causal mechanisms. There are hidden powers etc. that lie behind what we have sensory access to and that in no way “resemble” (or “recapitulate”) the observables (see here for discussion).

Why do I rehearse these points yet again? Because I want to highlight a key feature of the E/R dynamic: what distinguishes E from R is not whether pattern matching is a legit mental operation (both views agree that it is (or at least, can be)). What distinguishes them is that Es think that this is all there is (and all there needs to be), while Rs reserve an important place for another kind of mental process, one that builds models of the underlying complex non-visible causal systems that generate these patterns. In other words, what distinguishes E from R is the belief that there is more to mental life than tracking the statistical patterns of the inputs. R doesn’t deny that this is operative. R denies that this suffices. There needs to be more, a lot more.[1]

Given this backdrop, it is clear that GG is very much in the R tradition. The basic observation is that human linguistic facility requires knowledge of a G (a set of recursive rules). Gs are not perceptually visible (though their products often are). Further, cursory inspection of natural languages indicates that human Gs are quite complex and cannot be acquired solely by induction (even via sophisticated discovery procedures that allowed for inductions over inductions over inductions…). Acquiring Gs requires some mental pre-packaging (aka, UG) that enables an LAD to construct a G on the basis of simple, haphazard bits of G output (aka, PLD). Once acquired, Gs can be used to execute many different kinds of linguistic behaviors, including parsing and producing novel sentences and phrases. That’s the GG conceit in a small nutshell: human linguistic facility implicates Gs, which implicates UG, both G and UG being systems of knowledge that can be put to multiple uses, including G acquisition, and the production and comprehension of an unbounded number of very different novel linguistic expressions within a given natural language. G, UG, competence vs performance: the hallmarks of an R theory of linguistic minds.

LUTG makes effectively these points, but without much discussion of the language case (it does discuss it at the end, but mainly to say that it won’t). LUTG’s central critical claim is that human cognition relies on more than pattern matching. There is, in addition, model building, which relies on two kinds of given mental contents to guide learning. The first kind, what LUTG calls “core” (and an R would call “given”), involves substantive, specific content (e.g. naïve physics and psychology). The second involves formal properties (e.g. compositionality and certain kinds of formal analogy (what LUTG calls “learning-to-learn”)).[2] LUTG notes that these two features pre-condition further learning and are indispensable. LUTG further notes that the two mental powers (i.e. model building and pattern recognition) can likely be fruitfully combined, though it (rightly, IMO) insists that model building is the central cognitive operation. In other words, LUTG recapitulates the observations above: R theories can incorporate E mechanisms, E mechanisms alone are insufficient to model human cognition, and, without the part left out, E pattern-matching models are poor fits for what we have tons of evidence are central features of cognitive life. Here’s how the abstract puts it:

We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

The paper usefully goes over these points in some detail. It further notes that this conception of the learning (I prefer the term “acquisition” myself) problem in cognition challenges the associationist assumptions which, LUTG observes (again rightly IMO), are characteristic of most work in connectionism, machine learning and deep learning. LUTG also points to ways that “model free methods” (aka pattern-matching algorithms) might usefully supplement the model-building cognitive basics to improve the performance and implementation of model-based knowledge (see 1.2 and 4.3.2).[3]

Section 3 is the heart of the critique of contemporary AI, which largely ignores the (R) model building ethos that LUTG champions. As it observes, the main fact about humans when compared with current AI models is that humans “learn a lot more from a lot less” (6). How much more? Well, human learning is very flexible and rarely tied to the specifics of the learning situation. LUTG provides a nice discussion of this in the domain of learning written characters. Further, human learning generally requires very little “input.” Often, a couple of minutes, or a couple of exposures to the relevant stimulus, is more than enough. In fact, in contrast to current machine learning or DL systems, humans do not need to have the input curated and organized (or, as Geoff Hinton recently put it (here): humans, as opposed to DLers, “clearly don’t need all the labeled data”). And why not? Because unlike what DLers and connectionists and associationists have forever assumed, human minds (and hence brains) are not E devices but R ones. Or as LUTG puts it (9):

People never start completely from scratch, or even close to “from scratch,” and that is the secret to their success. The challenge of building models of human learning and thinking then becomes: How do we bring to bear rich prior knowledge to learn new tasks and solve new problems so quickly? What form does that prior knowledge take, and how is it constructed, from some combination of inbuilt capacities and previous experience?

In short, the blank tablet assumption endemic to E conceptions of mind is fundamentally misguided, and a (the?) central problem of cognitive psychology is to figure out what is given and how this given stuff is used. Amen (and about time)!
Section 4 goes over what LUTG takes to be some core powers of human minds. This includes naïve theories of how the physical world functions and how animate agents operate. In addition, with a nod to Fodor, it outlines the critical role of compositionality in allowing for human cognitive productivity.[4] It is a nice discussion and makes useful comments on causal models and their relation to generative ones (section 4.2.2).
Now let’s note a few eggshells. A central recurring feature of the LUTG discussion is the observation that it is unclear how or whether current DL approaches might integrate these necessary mechanisms. The paper does not come right out and say that DL models will have a hard time with these without radically changing their sub-symbolic, associationist, pattern-matching monomania, but it strongly suggests this. Here’s a taste of this recurring theme (and please note the R overtones).

As in the case of intuitive physics, the success that generic networks will have in capturing intuitive psychological reasoning will depend in part on the representations humans use. Although deep networks have not yet been applied to scenarios involving theory of mind and intuitive psychology, they could probably learn visual cues, heuristics, and summary statistics of a scene that happens to involve agents. If that is all that underlies human psychological reasoning, a data-driven deep learning approach can likely find success in this domain.
However, it seems to us that any full formal account of intuitive psychological reasoning needs to include representations of agency, goals, efficiency, and reciprocal relations. As with objects and forces, it is unclear whether a complete representation of these concepts (agents, goals, etc.) could emerge from deep neural networks trained in a purely predictive capacity. Similar to the intuitive physics domain, it is possible that with a tremendous number of training trajectories in a variety of scenarios, deep learning techniques could approximate the reasoning found in infancy even without learning anything about goal-directed or socially directed behavior more generally. But this is also unlikely to resemble how humans learn, understand, and apply intuitive psychology unless the concepts are genuine. In the same way that altering the setting of a scene or the target of inference in a physics-related task may be difficult to generalize without an understanding of objects, altering the setting of an agent or their goals and beliefs is difficult to reason about without understanding intuitive psychology.

Yup. Right on. But why so tentative? “It is unclear”? Nope, it is very clear, and has been for about 300 years since these issues were first extensively discussed. And we all know why. Because concepts like “agent,” “goal,” “cause,” “force,” are not observables and not reducible to observables. So, if they play central roles in our cognitive models then pattern matching algorithms won’t suffice. But since this is all that E DL systems countenance, DL is not enough. This is the correct point, and LUTG makes it. But note the hesitancy with which it does so. Is it too much to think that this likely reflects the issue mooted at the outset, about the implicit politics that one’s dance over the eggshells reveals?
It is not hard to see from how LUTG makes its very reasonable case that it is a bit nervous about DL (the current star of AI). LUTG is rhetorically covering its posterior while (correctly) noting that unreconstructed DL will never make the grade. The same wariness makes it impossible for LUTG to acknowledge its great debt to R predecessors.[5] As LUTG states, its “goal is to build on their [neural networks, NH] successes rather than dwell on their shortcomings” (2). But those who always look forward and never back won’t move forward particularly well either (think Obama and the financial crisis). Understanding that E is deeply inadequate is a prerequisite for moving forward. It is no service to be mealy-mouthed about this. One does not finesse one’s way around very influential bad ideas.
Ok, so I have a few reservations about how LUTG makes its basic points. That said, this is a very useful paper. It is nice to see this coming out of the very influential Bayesian group at MIT and in a prominent place like B&BS. I am hoping that it indicates that the pendulum is swinging away from E and towards a more reasonable R conception of minds. As I’ve noted, the analogies with standard GG practice are hard to miss. In addition, LUTG rightly points to the shortcomings of connectionist/deep learning/neural net approaches to mental life. This is good. It may not be news to many of us, but if this signals a return to R conceptions of mind, it is a very positive step in the right direction.[6]


[1] R often goes farther: even tracking the relevant perceptual regularities requires lots of given mental baggage. The world does not come pre-labeled. So zeroing in on the relevant dimensions for inductive generalization itself requires lots of pre-packaged “knowledge.” Geoff Hinton (the daddy of deep learning) seems to have come to a similar view of late concerning how hard it is to get things off the ground without curated data. See below for a reference.
[2] GGers should find this reminiscent of Chomsky’s discussion in Aspects of substantive and formal universals and how these interact to create Gs (models) of the ambient degenerate and deficient linguistic input available to the child (PLD).
[3] Or to put this in GG-friendly terms: LUTG resurrects something very like a competence/performance distinction, model building being analogous to the former and model-free methods applied to these models being analogous to the latter. An idea I found interesting is that models of competence, once given, provide a useful domain for performance models wherein sophisticated pattern-matching algorithms do their work. Conceptually, this idea seems very similar to what Charles Yang has been advocating as well (see here).
[4] Actually, though compositionality is critical to productivity, it does not suffice for generating an “infinite number of thoughts” or using an “infinite number of sentences” “from a finite set of primitives” (14). For this we need more than compositionality, we need recursion as well.
[5] It is amazing, IMO, how small a role Chomsky plays in LUTG’s discussion. So far as I can tell, all of its major points were developed in the modern period by him. I am pretty sure that some of the authors know this but that highlighting this fact would hurt the paper politically by offending the relevant leading DL lights.
[6] BTW, LUTG has a nice discussion of the biological “plausibility” of neural net models of the brain. The short point is that, aside from being pictorially suggestive, there is no real reason for thinking that brains are like connectionist nets. As LUTG puts it (20):

Many seemingly well-accepted ideas concerning neural computation are in fact biologically dubious, or uncertain at best…

For example, most neural networks use some form of gradient-based (e.g. backpropagation) or Hebbian learning. It has long been argued, however, that backpropagation is not biologically plausible. As Crick (1989) famously pointed out, backpropagation seems to require that information be transmitted backward along the axon, which does not fit with realistic models of neuronal function…

As LUTG observes, this should not in and of itself stop people from investigating neural nets as possible models of brain computation, but it should put an end to the prejudice that brains are nets because they look net-like. Sheesh!

Wednesday, July 13, 2016

Linguistic creativity 1

Once again, this post got away from me, so I am dividing it into two parts.

As I mentioned in a recent previous post, I have just finished re-reading Language & Mind (L&M) and have been struck, once again, by how relevant much of the discussion is to current concerns. One topic, however, that does not get much play today, but is quite well developed in L&M, is its discussion of Descartes’ very expansive conceptions of linguistic creativity and how they relate to the development of the generative program. The discussion is surprisingly complex and I would like to review its main themes here. This will reiterate some points made in earlier posts (here, here) but I hope it also deepens the discussion a bit.

Human linguistic creativity is front and center in L&M as it constitutes the central fact animating Chomsky’s proposal for Transformational Generative Grammar (TGG). The argument is that a TGG competence theory is a necessary part of any account of the obvious fact that humans regularly use language in novel ways. Here’s L&M (11-12):

…the normal use of language is innovative, in the sense that much of what we say in the course of normal use is entirely new, not a repetition of anything that we have heard before and not even similar in pattern – in any useful sense of the terms “similar” and “pattern” – to sentences or discourse that we have heard in the past. This is a truism, but an important one, often overlooked and not infrequently denied in the behaviorist period of linguistics…when it was almost universally claimed that a person’s knowledge of language is representable as a stored set of patterns, overlearned through constant repetition and detailed training, with innovation being at most a matter of “analogy.” The fact surely is, however, that the number of sentences in one’s native language that one will immediately understand with no feeling of difficulty or strangeness is astronomical; and that the number of patterns underlying our normal use of language and corresponding to meaningful and easily comprehensible sentences in our language is orders of magnitude greater than the number of seconds in a lifetime. It is in this sense that the normal use of language is innovative.

There are several points worth highlighting in the above quote. First, note that normal use is “not even similar in pattern” to what we have heard before.[1] In other words, linguistic competence is not an instance of pattern matching or recognition in any interesting sense of “pattern” or “matching.”  Native speaker use extends both to novel sentences and to novel sentence patterns effortlessly. Why is this important?

IMO, one of the pitfalls of much work critical of GG is the assimilation of linguistic competence to a species of pattern matching.[2] The idea is that a set of templates (i.e. in L&M terms: “a stored set of patterns”) combined with a large vocabulary can easily generate a large set of possible sentences, in the sense of templates saturated by lexical items that fit.[3] Note that such templates can be hierarchically organized and so display one of the properties of natural language Gs (i.e. hierarchical structures).[4] Moreover, if the patterns are extractable from a subset of the relevant data then these patterns/templates can be used to project novel sentences. However, what the pattern matching conception of projection misses is that the patterns we find in Gs are not finite in number, and the reason for this is that we can embed patterns within patterns within patterns within… you get the point. We can call the outputs of recursive rules “patterns,” but this is misleading, for once one sees that the patterns are endless, Gs are not well conceived of as collections of patterns but as collections of rules that generate patterns. And once one sees this, then the linguistic problem is (i) to describe these rules and their interactions and (ii) to further explain how these rules are acquired (i.e. not how the patterns are acquired).
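
To make the template-vs-rule contrast concrete, here is a minimal toy sketch (my own, in Python; the mini-lexicon, category labels, and toy rule are invented for illustration and come from neither L&M nor LUTG). A finite template store can yield plenty of sentences, but only a fixed number of patterns; a single recursive rule yields a structurally new pattern at every depth of embedding.

```python
import itertools

# A finite "stored set of patterns": each template is a fixed sequence of category slots.
templates = [
    ("Det", "N", "V"),
    ("Det", "N", "V", "Det", "N"),
]
lexicon = {"Det": ["the", "a"], "N": ["dog", "cat"], "V": ["saw", "chased"]}

def fill(template):
    # Enumerate every sentence a template can yield: many sentences, one pattern apiece.
    slots = [lexicon[cat] for cat in template]
    return [" ".join(words) for words in itertools.product(*slots)]

# However large the lexicon grows, the number of distinct *patterns* stays at len(templates).
print(sum(len(fill(t)) for t in templates), "sentences from", len(templates), "patterns")

# A recursive rule, by contrast, yields a structurally new pattern at every embedding depth
# (schematically: S -> Det N V NP ; NP -> Det N (that S)).
def pattern(depth):
    np = ["Det", "N"] + (["that"] + pattern(depth - 1) if depth > 0 else [])
    return ["Det", "N", "V"] + np

for d in range(3):
    print(d, pattern(d))  # each depth is a pattern that no finite template store contains
```

The point of the toy is just the asymmetry: enlarging the lexicon multiplies sentences without adding patterns, whereas the recursive rule makes the pattern inventory itself unbounded, which is why it is the rule, not the patterns, that has to be acquired.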

The shift in perspective from patterns (and patternings in the data (see note 5)) to generative procedures and the (often very abstract) objects that they manipulate changes what the acquisition problem amounts to. One important implication of this shift of perspective is that scouring strings for patterns in the data (as many statistical learning systems like to do) is a waste of time because these systems are looking for the wrong things (at least in syntax).[5] They are looking for patterns whereas they should be looking for rules. As the output of the “learning” has to be systems of rules, not systems of patterns, and as rules are, at best, implicit in patterns, not explicitly manifested in them, theories that don’t focus on rules are going to be of little linguistic interest.[6]

Let me make this point another way: unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to the accommodation of novel structures. This can occur even in small finite domains (loan words in phonology might be an example). Creativity implies projection/induction, which must specify a dimension of generalization along which inputs can be generalized so as to apply to instances beyond the input. This, btw, is universally acknowledged by anyone working on learning. Unboundedness makes projection a no-brainer. However, it also has a second important implication. It requires that the generalizations being made involve recursive rules. The unboundedness we find in syntax cannot be accommodated via pattern matching. It requires a specification of rules that can be repeatedly applied to create novel patterns. Thus, it is important to keep the issue of unboundedness separate from that of projection. What makes the unboundedness of syntax so important is that it requires that we move beyond the pattern-template-categorization conception of cognition.

Dare I add (more accurately, can I resist adding) that pattern matching is the flavor of choice for the Empiricistically (E) inclined? Why? Well, as noted, everyone agrees that induction must allow generalization beyond the input data. Thus even Es endorse this, for Es recognize that cognition involves projection beyond the input (i.e. “learning”). The question is the nature of this induction. Es like to think that learning is a function from input to patterns abstracted from the input, the input patterns being perceptually available in their patternings, albeit sometimes noisily.[7] In other words, learning amounts to abstracting a finite set of patterns from the perceptual input and then creating new instances of those patterns by subbing novel atoms (e.g. lexical items) into the abstracted patterns. E research programs amount to finding ways to induce/abstract patterns/templates from the perceptual patternings in the data. The various statistical techniques Es explore are in service of finding these patterns in the (standardly, very noisy) input. Unboundedness implies that this kind of induction is, at best, incomplete. Or, more accurately, the observation that the number of patterns is unbounded implies that learning must involve more than pattern detection/abstraction. In domains where the number of patterns is effectively infinite, learning[8] is a function from inputs to rules that generate patterns, not to patterns themselves. See link in note 6 for more discussion.
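
For concreteness, here is a toy sketch (mine; not drawn from L&M or from any particular E model) of the sort of induction alluded to in note 7: segment a continuous stream by positing boundaries wherever the forward transition probability dips. The syllable inventory, corpus size, and threshold are all invented for illustration.

```python
import random
from collections import Counter

random.seed(0)
# Toy "speech stream": three hypothetical two-syllable words concatenated in random
# order with no pauses, so word boundaries are not marked anywhere in the signal.
words = [("ba", "du"), ("ki", "go"), ("tu", "pi")]
stream = [syll for _ in range(200) for syll in random.choice(words)]

# Forward transition probability P(next syllable | current syllable), from bigram counts.
bigrams = Counter(zip(stream, stream[1:]))
unigrams = Counter(stream[:-1])
tp = {(a, b): c / unigrams[a] for (a, b), c in bigrams.items()}

# Within-word transitions (ba -> du) have probability 1; between-word transitions
# (du -> ki, du -> tu, ...) hover around 1/3. Posit a boundary at the dips.
segments, current = [], [stream[0]]
for a, b in zip(stream, stream[1:]):
    if tp[(a, b)] < 0.9:
        segments.append("".join(current))
        current = []
    current.append(b)
segments.append("".join(current))
print(sorted(set(segments)))  # the toy recovers 'badu', 'kigo', 'tupi'
```

Note what the sketch delivers and what it does not: it abstracts a small, fixed inventory of recurring units from the patternings in the signal, but it yields no rules, and so nothing that could ever generate an unbounded inventory of new patterns.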

An aside: Most connectionist learners (and deep learners) are pattern matchers and, in light of the above, are simply “learning” the wrong things. No matter how many “patterns” the intermediate layers converge on from the (mega) data they are exposed to, they will not settle on enough, given that the number of patterns that human native speakers are competent in is effectively unbounded. Unless the intermediate layers acquire rules that can be recursively applied, they have not acquired the right kinds of things, and thus all of this modeling is irrelevant no matter how much of the data any given model covers.[9]

Another aside: this point was made explicitly in the quote above, but to no avail. As L&M notes critically (11): “it was almost universally claimed that a person’s knowledge of language is representable as a stored set of patterns, overlearned through constant repetition and detailed training.” Add some statistical massaging and a few neural nets and things have not changed much. The name of the inductive game in the E world is to look for perceptually available patterns in the signal, abstract them, and use them to accommodate novelty. The unboundedness of linguistic patterns that L&M highlights implies that this learning strategy won’t suffice for the language case, and this is a very important observation.

Ok, back to L&M.

Second, the quote above notes that there is no useful sense of “analogy” that can get one from the specific patterns one might abstract from the perceptual data to the unbounded number of patterns with which native speakers display competence. In other words, “analogy” is not the secret sauce that gets one from input to rules. So, when you hear someone talk about analogical processes, reach for your favorite anti-BS device. If “analogy” is offered as part of any explanation of an inferential capacity you can be absolutely sure that no account is actually being offered. Simply put, unless the dimensions of analogy are explicitly specified, the story being proffered is nothing but wind (in both the Ecclesiastes and the scatological sense of the term).

Third, the kind of infinity human linguistic creativity displays has a special character: it is a discrete infinity. L&M observes that human language (unlike animal communication systems) does not consist of a “fixed, finite number of linguistic dimensions, each of which is associated with a particular nonlinguistic dimension in such a way that selection of a point along the linguistic dimension determines and signals selection of a point along the associated nonlinguistic dimension” (69). Examples would be a higher pitch or chirp being associated with a greater intention to aggressively defend territory, or the way that “readings of a speedometer can be said, with an obvious idealization, to be infinite in variety” (12).

L&M notes that these sorts of systems can be infinite, in the sense of containing “an indefinitely large range of potential signals.” However, in such cases the variation is “continuous,” while human linguistic expression exploits “discrete” structures that can be used to “express indefinitely many new thoughts, intentions, feelings, and so on.” ‘New thoughts’ in the previous quote clearly means new kinds of thoughts (e.g. the signals are not all about how fast the car is moving). As L&M makes clear, the difference between these two kinds of systems is “not one of “more” or “less,” but rather of an entirely different principle of organization,” one that does not work by “selecting a point along some linguistic dimension that signals a corresponding point along an associated nonlinguistic dimension” (69-70).

In sum, human linguistic creativity implicates something like a TGG that pairs discrete hierarchical structures relevant to meanings with discrete hierarchical structures relevant to sounds, and does so recursively. Anything that doesn’t do at least this is going to be linguistically irrelevant, as it ignores the observable truism that humans are, as a matter of course, capable of using an unbounded number of linguistic expressions effortlessly.[10] Theories that fail to address this obvious fact are not wrong. They are irrelevant.

Is hierarchical recursion all that there is to linguistic creativity? No!! Chomsky makes a point of this in the preface to the enlarged edition of L&M. Linguistic creativity is NOT identical to the “recursive property in generative grammars,” as interesting as such Gs evidently are (L&M: viii). To repeat, recursion is a necessary feature of any account of linguistic creativity, BUT the Cartesian conception of linguistic creativity consists of far more than what even the most explanatorily adequate theory of grammar specifies. What more?



[1] For an excellent discussion of this see Jackendoff’s very nice (though unfortunately (mis)named) Patterns in the Mind (here). It is a first-rate debunking of the idea that linguistic minds are pattern matchers.
[2] This is not unique to linguistic cognition. Lots of work in cog sci seems to identify higher cognition with categorization and pattern matching. One of the most important contributions of modern linguistics to cog sci has been to demonstrate that there is much more to cognition than this. In fact, the hard problems have less to do with pattern recognition than with pattern generation via rules of various sorts. See notes 5 and 6 for more offhanded remarks of deep interest.
[3] I suspect that some partisans of Construction Grammar fall victim to the same misapprehension.
[4] Many cog-neuro types confuse hierarchy with recursion. A recent prominent example is in Frankland and Greene’s work on theta roles. See here for some discussion. Suffice it to say that one can have hierarchy without recursion, and recursion without hierarchy in the derived objects that are generated. What makes linguistic objects distinctive is that they are the products of recursive processes that deliver hierarchically structured objects.
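A schematic toy (mine, in Python; not from Frankland and Greene, L&M, or LUTG) of the three cases this note distinguishes: hierarchy without recursion, recursion without hierarchy in the derived object, and recursion that delivers hierarchy.

```python
# Hierarchy without recursion: a fixed, finite stock of nested templates. The derived
# objects are hierarchical, but no template re-invokes itself, so the set is bounded.
np_templates = [
    ("Det", "N"),
    ("Det", ("Adj", "N")),   # one fixed level of nesting, and that is all there is
]

# Recursion without hierarchy in the derived object: a rule that calls itself but
# whose output is a flat string (think "S -> a S | a").
def flat(n):
    return "a" if n == 0 else "a" + flat(n - 1)   # derived object: just 'aaa...a'

# Recursion that delivers hierarchy: the rule re-invokes itself and the output keeps
# the nesting it creates -- the combination that makes linguistic objects distinctive.
def nested(n):
    return ("Det", "N") if n == 0 else ("Det", ("N", ("that", nested(n - 1))))

print(flat(3))    # 'aaaa' -- no structure beyond the string itself
print(nested(2))  # deeply nested tuples: structure generated and preserved
```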
[5] Note that unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to easy handling of novel structures. This can occur even in small finite domains. Creativity implies projection, which must specify a dimension of generalization along which inputs can be extended to apply to instances beyond the input. Unboundedness makes projection a no-brainer. It further implies that the generalization involves recursive rules. Unboundedness cannot be captured by pattern matching. It requires a specification of rules that can be repeatedly applied to create novel patterns. Thus, it is important to keep the issue of unboundedness separate from that of projection. What makes the unboundedness of syntax so important is that it requires that we move beyond the pattern-template-categorization conception of cognition.
[6] It is arguable that some rules are more manifest in the data than others are and so are more accessible to inductive procedures. Chomsky makes this distinction in L&M, contrasting surface structures, which contain “formal properties that are explicit in the signal,” with deep structure and transformations, for which there is very little to no such information in the signal (L&M:19). For another discussion of this distinction see (here).
[7] Thus the hope of unearthing phrases via differential intra-phrase versus inter-phrase transition probabilities.
[8] We really should distinguish between ‘learning’ and ‘acquisition.’ We should reserve the first term for the pattern recognition variety and adopt the second for the induction to rules variety. Problems of the second type call for different tools/approaches than those in the first and calling both ‘learning’ merely obscures this fact and confuses matters.
[9] Although this is a sermon for another time, it is important to understand what a good model does: it characterizes the underlying mechanism. Good models model mechanism, not data. Data provides evidence for mechanism, and unless it does so, it is of little scientific interest. Thus, if a model identifies the wrong mechanism, no matter how apparently successful it is in covering data, then it is the wrong model. Period. That’s one of the reasons connectionist models are of little interest, at least when it comes to syntactic matters.
            I should add that analogous creativity concerns drive Gallistel’s arguments against connectionist brain models. He notes that many animals display an effectively infinite variety of behaviors in specific domains (caching behavior in birds or dead reckoning in ants) and that these cannot be handled by connectionist devices that simply track the patterns attested. If Gallistel is right (and you know that I think he is) then the failure to appreciate the logic of infinity makes many current models of mind and brain beside the point.
[10] Note that unboundedness implies novelty, but novelty can exist without unboundedness. The creativity issue relates to easy handling of novel structures. This can occur even in small sets. Creativity implies projection, which must specify a dimension of generalization along which inputs can be extended to apply to instances beyond the input. Unboundedness makes projection a no-brainer. It further implies that the generalization is due to recursive rules, which require more than establishing a fixed number of patterns that can be repeatedly filled to create novel instances of those patterns.