This post follows up on this
one. The former tries to identify the relevant computational problems that
need solving. This one discusses some ways in which the Marrian perspective
does not quite fit the linguistic situation. Here goes.
Theories that address computational problems are
computational level 1 theories. Generative Grammar (GG) has offered accounts of
specific Gs of specific languages, theories of FL/UG that describe the range of
options for specific Gs (e.g. GB is such a theory of FL/UG) and accounts that
divide the various components of FL into the linguistically specific (e.g.
Merge) and the computationally/cognitively general (e.g. feature
checking and minimal search). These theories aim to offer partial answers to
the three questions in (1-3). How do they do this? By describing,
circumscribing and analyzing the class of generative procedures that Gs
incorporate. If these theories are on the right track, they partially explain
how it is that native speakers can understand and produce language never before
encountered, what LADs bring to the problem of language acquisition that
enables them to converge on Gs of the type they do despite the many splendored
poverty of the linguistic input and (this is by far the least developed
question) how FL might have arisen from a pre-linguistic ancestor. As these are
the three computational problems, these are all computational theories. However,
the way linguists do this is somewhat different from what Marr describes in his
examples.
Marr’s general procedure is to solve level 1 problems by
appropriating some already available off-the-shelf “theory” that models the
problem. So, in his cash register example, he notes that the problem is
effectively an arithmetical one (four functions and the integers). In vision
the problem is deriving physical values of the distal stimulus given the input
stimuli to the visual system. The physical values are circumscribed by our
theories of what is physically possible (optics, mechanics) and the problem is
to specify these objective values given proximate stimuli. In both cases,
well-developed theories (arithmetic, optics) serve to provide ways of
addressing the computational problem.
So, for example, the cash register “solves” its problems by
finding ways of doing addition, subtraction, multiplication and division of
numbers, which correspond to adding items, subtracting discounts, adding many
of the same item and providing prices per unit. That’s what the cash register
does. It does basic arithmetic. How does it do it? Well, that’s the level 2
question. Are prices represented in base 2 or base 10? Are discounts registered
on the individual items as they are rung up, or taken off the total at the end?
These are level 2 questions of the level 1 arithmetical theory. There is then a
level 3 question: how are the level 2 algorithms and representations embodied?
Silicon? Gears and fly-wheels? Silly putty and string? But observe that the
whole story begins with a level 1 theory that appropriates an off-the-shelf
theory of arithmetic.
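The cash register discussion can be put in a toy sketch (the function names and figures are mine, purely illustrative, not Marr's): the level 1 theory says the total is plain arithmetic over prices and a discount rate, while the level 2 question of *where* the discount is applied distinguishes two algorithms that compute that same level 1 function.

```python
# Level 1: the total is arithmetic over prices and a discount rate.
# Level 2: two different algorithms realize that same function.

def total_discount_per_item(prices, rate):
    # Discount registered on each item as it is rung up.
    return sum(p * (1 - rate) for p in prices)

def total_discount_at_end(prices, rate):
    # Discount taken off the grand total at the end.
    return sum(prices) * (1 - rate)

prices = [2.50, 4.00, 3.25]
rate = 0.10
# Both algorithms compute the same level 1 (arithmetical) function;
# they differ only at level 2, in how the computation is organized.
```

The point of the sketch is just that the level 1 theory (arithmetic) is fixed before either algorithm is chosen, which is the order of explanation Marr recommends.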
The same is true of Marr’s theories of early vision where
there are well-developed theories of physical optics to leverage a level 1
theory.
And this is where linguistics is different. We have no
off-the-shelf accounts adequate to describing the three computational problems
noted. We need to develop one, and that’s what GG aims to do: specify level 1
computational theories to describe the lay of the linguistic land. And how do
we do this? By specifying generative procedures and representations and
conditions on operations. These theories circumscribe the domain of the possible; Gs tell us what a possible linguistic object in a specific
language is, FL/UG tells us what a possible
G is and Minimalist theories tell us what a possible
FL is. This leaves the very real question of how the possible relates to the
occurrent: how do Gs get used to figure out what this sentence means? How does FL/UG get used to build this G that the LAD is acquiring? How
does UG combine with the cognitive and computational capacities of our
ancestors to yield this FL (i.e. the
ones humans in fact have)? Generative procedures are not algorithms, and, for
example, the representations the parser uses need not be the ones that our
level 1 G theories describe.
Why mention this? Because it is easy to confuse generative procedures
with algorithms, and to confuse representations in Marr’s level 2 sense with
representations in Chomsky’s level 1 sense. I know that I confused them, so this is in part a mea culpa
and in part a public service. At any rate, the levels must be kept conceptually
distinct.
I might add that the reason Marr does not distinguish
generative procedures from algorithms or level 1 from level 2 representations
is that for him, there is no analogue of generative procedures. The big
difference between linguistics and vision is that the latter is an input system
in Fodor’s sense, while language is a central system. Early visual perception
is more or less pattern recognition, and the information processing problem is
to get from environmentally generated patterns to the physical variables that
generate these patterns.[1]
There is nothing analogous in language, or at least not for
large parts of it. As is well known, the syntactic structures we find in Gs are
not tied in any particular way with the physical nature of utterances.
Moreover, linguistic competence is not related to pattern matching. There are
an infinite number of well-formed “patterns” (a point that Jackendoff rightly
made many moons ago). In short, Marr’s story fits input systems better than it
does central systems like linguistic knowledge.
That said, I think that the Marr picture presses an issue
that linguists should be taking more seriously. The real virtue of Marr’s
program for us lies in insisting that
the levels should talk to one
another. In other words, the work on any level could (and should) inform the
theories at the other levels. So, if we know what kinds of algorithms
processors use, then this should tell us something about the right kinds of level
1 representations we should postulate.
The work by Pietroski et al. on most (discussed here)
provides a nice illustration of the relevant logic. They argue for a particular
level 1 representation of most in
virtue of how representations get used to compare quantities in certain visual
tasks. The premise is that transparency between level 1 and level 2 representations
is a virtue. If it is, then we have an argument that the structure of most
looks like this: |{x: D(x) & Y(x)}| > |{x: D(x)}| − |{x: D(x) & Y(x)}| and not
like this: |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|.
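A minimal sketch may make the contrast concrete (the function names and the dot/blue example are mine, not Pietroski et al.'s): the two representations are truth-conditionally equivalent over finite sets, but the verification procedures they suggest compare different quantities, which is what makes the level 2 evidence probative.

```python
def most_subtraction(D, Y):
    # |{x: D(x) & Y(x)}| > |{x: D(x)}| - |{x: D(x) & Y(x)}|
    # Compares the D-and-Y count against the total minus that count.
    dy = len(D & Y)
    return dy > len(D) - dy

def most_complement(D, Y):
    # |{x: D(x) & Y(x)}| > |{x: D(x) & not-Y(x)}|
    # Directly compares the D-and-Y count against the D-but-not-Y count.
    return len(D & Y) > len(D - Y)

dots = {1, 2, 3, 4, 5}   # hypothetical domain: five dots on a screen
blue = {1, 2, 3}         # the blue ones
# "Most of the dots are blue": both procedures return the same verdict,
# but they arrive at it by comparing different quantities.
```

The verdicts always coincide on finite sets; the argument from transparency is about which comparison the visual-quantity system actually performs.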
Is transparency a reasonable
assumption? Sure, in principle. Of course, we may find out that it raises
problems (think of the Derivational Theory of Complexity (DTC) in days of
yore). But I would argue that this is a good
thing. We want our various level theories to inform one another and this means
countenancing the likelihood that the various kinds of claims will rub roughly
against one another quite frequently. Thus we want to explore ideas like the
DTC and representational transparency that link level 1 and level 2 theories.[2]
Let me go further: in other
posts I have argued for a version of the Strong Minimalist Thesis (here and here and
here) which can be recast in Marr terms as
follows: assume that there is a strong transparency between level 1 and level 2
theories in linguistics. Thus, the objects of parsing are the same as those we
postulate in our competence theories, and the derivational steps index
performance complexity, BOLD responses and other CN measures of occurrent
processing, real-time acquisition, and so on. This is a very strong thesis, for
it says that the categories and procedures we discover in our level 1 theories
strongly correlate with the algorithms and representations in our level 2
theories. That is a strong claim and thus a very interesting one. In fact, IMO, interesting enough to take as a regulative
ideal (as a good research hypothesis to be explored until proven decisively
wrong, and maybe even then). This is what Marr’s logic suggests we do, and it
is something that many linguists feel inclined to resist. I don’t think we
should. We should all be Marrians now.
To end: Marr’s view was that
CNers ignored level 1 theories to their detriment. In practice this meant
understanding the physical theories that lie behind vision and the physical
variables that an information processing account of vision must recover. This
perspective had real utility given the vast amount we know about the physical
bases of visual stimuli. These can serve to provide a good level 1 theory.
There is no analogue in the domain of language. The linguistic properties that
we need to specify in order to answer the three computational problems in (1-3)
are not tied down in any obvious ways to the physical nature of the “input.”
Nor do Gs or FL appear to be all that interesting mathematically, so there
is no off-the-shelf stuff that we can use to specify the contours of the
linguistic problem. Sure, we know that we need recursive Gs, but there are
endlessly many different kinds of recursive systems and what we want for a
level 1 linguistic theory is a specification of the one that characterizes our
Gs. Noting that Gs are recursive is, scientifically, a very modest observation
(indeed, IMO, close to trivial). So, a good deal of the problem in linguistics
is that posing the problem does not invite a lot of pre-digested technology
that we can throw at it (like arithmetic or optics). Too bad.
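The point that recursion alone underdetermines the choice of system can be illustrated with a toy sketch (the two mini-grammars are my own illustrative examples): both generators below are recursive, yet they generate quite different languages, one regular, one not.

```python
# Two toy recursive "grammars" over the alphabet {a, b}. Both are
# recursive, but they characterize different languages, so "Gs are
# recursive" leaves the choice of system wide open.

def right_branching(n):
    # Generates (ab)^n: recursion adds material only at the right edge.
    # The resulting language is regular.
    return "" if n == 0 else "ab" + right_branching(n - 1)

def center_embedding(n):
    # Generates a^n b^n: recursion nests material in the middle.
    # The resulting language is context-free but not regular.
    return "" if n == 0 else "a" + center_embedding(n - 1) + "b"
```

A level 1 linguistic theory has to say which of the endlessly many recursive systems of this sort characterizes human Gs; noting recursion per se settles nothing.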
However, thinking in level terms
is still useful, for it serves as a reminder that we want our level 1
theories to talk to the other levels. The time for thinking in these terms
within linguistics has never been riper. Marr’s three-level format provides a
nice illustration of the utility of such cross talk.
[1]
Late vision, the part that gets to object recognition, is another matter. From
what I can tell, “higher” vision is not the success story that early vision is.
That’s why we keep hearing about how good computers are at finding cats in
Youtube videos. One might surmise that the problem vision research has with object
recognition is that it has not yet developed a good level 1 theory of this
process. Maybe it needs to develop a notion of a “possible” visual object.
Maybe this will need a generative combinatorics. Some have mooted this
possibility. See this
on “geons.” This kind of theory is recognizably similar to our kinds of GGs. It
is not an input theory, though like a
standard G it makes contact with input systems when it operates.