In the standard Marr examples, there are non-“mental” magnitudes that “mental” operations aim to estimate. So in vision these are shapes, trajectories, parallax, luminance, etc. We have theories of these magnitudes independent of how we estimate them cognitively. They are real physical magnitudes.
So too with addition and prices and cash registers. We understand what addition is and how it works quite independently of how we use it to provide a purchase price.
None of this holds in the language case (or for other internal systems in Fodor’s sense). There are no physical magnitudes of relevance or mathematical structures of use. Rather, we are trying to zero in on a level 1 theory by seeing how it is used in stylized settings (judgment data being the prime mover here). The judgment task is an interesting probe into the level 1 theory because we have reason to believe that it provides a clean picture of the underlying mechanism. Why clean? Because it abstracts away from the exigencies of time pressure, storage pressure, attention pressure, etc. that are part and parcel of real time performance. It’s the system functioning at its best because it is functioning well within the limits of its computational (time/space) capacities. That’s the advantage of data drawn in reflective equilibrium. However, this does not mean that judgment is resource unconstrained (after all, any judgment involves parsing, hence memory and attention), but it does mean that the judgment does not run up against the limits of memory/attention, and so likely displays the properties of the computational system more cleanly than when that system is cramped by non-computational constraints such as high memory or attention demands.
With this in mind, let’s now get back to a Marr conception of GG.
The creative aspect of language use (that humans can produce and understand an unbounded number of novel sentences) is the BIG FACT that, as Chomsky noted in Current Issues, is one of the phenomena in need of explanation. He notes there that a G is a necessary construct for explaining this obvious behavioral fact (note: that creativity holds is a fact about speakers and what they can do, based on what we actually see them do). Without a function that relates sounds and meanings over an unbounded domain (i.e. delivers an unbounded number of <s,m> pairs) there is no hope of accounting for this behaviorally evident creative capacity. In other words, Gs are necessary for any account of linguistic creativity.
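To make the point concrete, here is a toy sketch (in Python; the rule and the “meanings” are pure inventions for illustration, not any actual G) of how a finite recursive procedure delivers an unbounded number of <s,m> pairs:

```python
# A toy recursive procedure (invented for illustration, not any actual G):
# a finite rule set that pairs unboundedly many sounds with meanings.

def sm_pair(n):
    """The n-th <sound, meaning> pair in a toy family of sentences:
    'John left', 'Mary thinks John left', 'Mary thinks Mary thinks John left', ..."""
    sound = "Mary thinks " * n + "John left"
    meaning = "THINKS(mary, " * n + "LEFT(john)" + ")" * n
    return (sound, meaning)

for n in range(3):
    print(sm_pair(n))
# ('John left', 'LEFT(john)')
# ('Mary thinks John left', 'THINKS(mary, LEFT(john))')
# ('Mary thinks Mary thinks John left', 'THINKS(mary, THINKS(mary, LEFT(john)))')
```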
Here’s a Marr question: what level theory, in Marr’s sense, is a G in this context? Here’s a proposal: Gs are Marr level 1 theories. If this is correct, we can ask what level 2 theories might look like. Level 2 theories would show how to compute Gish level 1 properties in real time (for processing and production, say). So, Gs are level 1 theories and the DTC (the Derivational Theory of Complexity), for example, is a level 2 theory. The DTC specifies how level 1 constructs are related to measures of actual on-line sentence processing. If the DTC is correct, it suggests certain kinds of algorithms, ones that track the derivational complexity of derivations. Of course, none of this is to endorse the DTC (though, to repeat, I do like it quite a bit), but to illustrate Marr-like logic as it relates to linguistic theories.
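To illustrate the logic (with made-up numbers, mine, not actual DTC results): a level 1 G assigns each sentence a derivation, and a DTC-style level 2 claim says that on-line processing cost tracks the length of that derivation:

```python
# Schematic DTC logic with invented numbers (not real results).
# The level 1 G assigns each sentence a derivation; the level 2 DTC claim
# is that on-line processing cost is monotonic in derivational complexity.

derivation_steps = {            # hypothetical step counts a G might assign
    "John left": 2,
    "John was arrested": 4,     # extra steps for, e.g., passive
    "Who did John say left": 6, # extra steps for, e.g., movement
}

def predicted_cost(sentence, ms_per_step=50):
    """DTC-style prediction; the 50 ms/step constant is purely illustrative."""
    return derivation_steps[sentence] * ms_per_step

for s in derivation_steps:
    print(s, "->", predicted_cost(s), "ms")
```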
The main problem with Marr's division is not that we can't use it (I just did), but that the explanatory leverage Marr got out of a level 1 theory in vision and cash registers seems absent in syntax. Why? Because Marr’s examples are able to use already available theories for level 1 purposes. In other words, there are already good level 1 theories on offer in vision and cash registers, and these can serve to circumscribe the computational problems that must be solved (i.e. explain how the physical optical parameters are mentally computed given input at the retina, or how arithmetical functions are embodied in the cash register). Let’s elaborate a bit.
In the vision case, the level 1 account is built on a theory of physical optics which relates objective (non-mental) physical magnitudes (luminance, shape, parallax, motion, etc.) to info available on the retina. The computational description of the problem becomes how to calculate these real physical magnitudes from retinal inputs. This is a standard inverse problem, as there are many ways for these physical variables to relate to patterns of activity on the retina. So the problem is to find the right set of constraints on the mental computation such that, given the retinal input, it delivers values for the objective variables the level 1 theory specifies. Concepts like “rigidity” serve this purpose: they get you shape from motion. Rigidity makes sense as a level 2 computational constraint given the level 1 properties of the visual system. So if we assume, for example, that objects are rigid, then computing their shape from retinal inputs is possible if we can compute their motion from retinal inputs.
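Here’s a drastically simplified toy of that logic (my own setup, not Marr’s or Ullman’s actual model): a single image leaves depth unrecoverable, but the assumption that the second view is a rigid rotation of the very same point makes depth computable:

```python
import math

# Toy inverse problem (my own setup): orthographic projection onto the x-axis
# of a point rotating about the y-axis. One view leaves depth z unrecoverable;
# rigidity (the second view is a rotation of the SAME point) makes it computable.

def project(x, z, theta):
    """Observable x-coordinate after rotating the point by theta about y."""
    return x * math.cos(theta) + z * math.sin(theta)

x_true, z_true = 1.0, 2.0           # hidden depth z is what vision must recover
theta = math.radians(20)

u1 = project(x_true, z_true, 0.0)   # first view: u1 = x; ANY depth z is consistent
u2 = project(x_true, z_true, theta) # second view, after a rigid rotation

# Under the rigidity assumption the two views jointly determine depth:
z_recovered = (u2 - u1 * math.cos(theta)) / math.sin(theta)
print(z_recovered)  # 2.0 (up to floating point)
```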
In the cash register case, in place of optics we have arithmetic. It turns out that calculating grocery bills is a simple math problem with a recognizable arithmetical structure, one that item prices embody and that cash registers can calculate. Given this (i.e. given that we know what the calculation is) we can ask how a cash register does the calculation in real time. How does the cash register do addition? How does it “represent” numbers numerically? Etc.
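A quick sketch of the split (illustrative code, not any actual register’s design): level 1 says what is computed, level 2 says how, e.g. via one of many possible digit-by-digit algorithms:

```python
# Level 1 vs level 2 for the register (illustrative, not any real design).

def bill(prices):      # level 1: WHAT is computed -- the sum of the prices
    return sum(prices)

def add_digits(a, b):  # level 2: HOW -- one of many possible algorithms
    """Schoolbook addition with carries over little-endian base-10 digit lists."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        d = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(d % 10)
        carry = d // 10
    if carry:
        out.append(carry)
    return out

print(bill([199, 250]))                  # 449
print(add_digits([9, 9, 1], [0, 5, 2]))  # [9, 4, 4], i.e. 449
```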
None of this level 1 leverage is available in the language case. Thus, Gs are not constrained by physical magnitudes in the way vision is (the “physics” of language tells us next to nothing about linguistically relevant variables) and there is no interesting math that lies behind syntax (or if there is, we haven't found it yet). Linguists need to construct the level 1 theory from scratch, and that's what GGers do. The creativity problem does tell us that speakers have internalized recursive procedures (RPs), but not which kinds of RPs (and there are endlessly many). It’s the job of GGers to discover the kinds of RPs that native speakers have when they are linguistically able. We argue that our internalized Gs use rules with a certain limited format and generate representations of a certain limited shape. The data we typically use is performance data (judgments), hopefully sanitized to remove many performance impediments (like memory constraints and attention issues). We assume that this data reflects an underlying mental system (or at least I do) that is causally responsible for the judgment data we collect. So we use some cleanish performance data (i.e. data not distorted by severe performance demands) to infer something about the structure of a level 1 theory.
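Here’s the underdetermination point in miniature (both procedures are invented): two RPs that generate exactly the same word strings but assign them different structures, so string data alone cannot choose between them:

```python
# Two invented RPs that generate the same word strings but impose different
# structures. Creativity tells us SOME RP is internalized; it doesn't say which.

def rp_nested(n):
    """Right-branching: [very [very [... happy]]]"""
    return "happy" if n == 0 else "[very " + rp_nested(n - 1) + "]"

def rp_flat(n):
    """Flat: [very very ... happy]"""
    return "[" + "very " * n + "happy]" if n else "happy"

print(rp_nested(2))  # [very [very happy]]
print(rp_flat(2))    # [very very happy]
# Same strings once brackets are erased; different imputed structure.
```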
Now if this is the practice, then it looks like it runs together level 1 and level 2 considerations. You cannot judge what you cannot parse. But that's life. We also recognize that delving more deeply into the details of performance might indicate that the level 1 theories we have need refining (the representations we assume might not be the ones that real time parsing uses, the algorithms might not reflect the derivational complexity of the level 1 theory). Sure. But, and here I am speaking personally, there would be a big payoff if the two lined up pretty closely. Syntactic representations might not be use-representations, but it would be surprising to me if the two diverged radically. After all, if they did, then how come we pair the meanings we do with the sounds we do? If our stable pairings are due to our G competence, then we must be computing a G function in real time when we judge the way we do. Ditto with the DTC (which I personally believe we have abandoned too quickly, but that’s a story for another time). At any rate, because we don't have (epistemologically) "autonomous" level 1 theories as in vision and cash registers, our level 1 and level 2 theories are harder to distinguish. Thus, in linguistics, the 1 vs 2 distinction is useful but should not be treated as a dualism. In fact, I take the Pietroski et al. work on ‘most’ to demonstrate the utility of taking the G representation problem to be finding <s,m>s that fit with how we use meanings when we actually calculate quantities. How the system engages with other systems during performance can tell us something about the representational format of the system beyond what <s,m> pairings alone might.
Last point: I can imagine having syntax embodied someplace explicitly or implicitly without being usable. I can even imagine that what we know is in no way implicated in what we do. But I would find this very odd for the linguistic case and even odder given our empirical practice. After all, what we do in practice is infer what we know by looking at what we do in a circumscribed set of doings. This does not imply that we should reduce linguistic knowledge to behavior, but it does seem to imply that our behavior exploits the knowledge we impute and that it is a useful guide to the structure of that knowledge. Once one makes that move, why are some bits of behavior more privileged than others in principle? I can't see why. And if not, then though the competence/performance distinction is useful I would hesitate to confer on it metaphysical substance.
I would actually go a little further: as a regulative ideal we should assume strong transparency between level 1 and level 2 theories in linguistics, though this is not as obvious an assumption to make in the domain of cash registers and vision. I think that it is a very good default assumption that the categories that we think are relevant in our G theories are also the objects our parser parses and our producer produces. There is more to both activities than what Gs describe, but there is at least as much as what Gs describe and in roughly the way that Gs describe it. That’s why judgments are good probes into G structure. So, in our domain, given that we are not in the enviable Marr position of having off the shelf level 1 theories, it is likely that the level 1 theories we develop will be very level 2 pregnant, or so we should assume.
Let me put this another way: say we have two theories that are equally adequate given standard data and say that one (A) fits transparently with our performance theories and the other (B) does not. I would take this as evidence that A is the right level 1 theory. Wouldn’t you? And if you would, then doesn’t this imply that we are taking transparency as a mark of level 1 adequacy? We conclude that the level 1 formats should be responsive to level 2 meshing concerns.
This is not what we would do in the cash register case (I don’t think). Were we to find that the cash register computes in base 2 rather than base 10 and uses ZF sets as its numerical representation of numbers, we would not conclude that it is not “doing” arithmetic. Base 10 or base 2, ZF sets or Arabic numerals, it’s doing arithmetic. There is nothing really analogous in the G domain. There might be parsing representations different from G representations, but this is not the default assumption. This makes the Marrian level considerations less clear cut in the language case than in the vision case.
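Just to make the ZF point vivid (a toy of my own construction): a register that stores numbers as von Neumann ordinals is still doing arithmetic:

```python
# A register that stores numbers as von Neumann ordinals (ZF-style sets)
# is still doing arithmetic (toy construction, mine).

ZERO = frozenset()                 # 0 = {}

def succ(n):
    return frozenset(n | {n})      # n + 1 = n U {n}

def num(k):
    """Build the von Neumann ordinal for the natural number k."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

def add(m, n):
    """m + n by iterated successor (for these ordinals, |n| = n)."""
    for _ in range(len(n)):
        m = succ(m)
    return m

print(len(add(num(2), num(3))))    # 5 -- weird representation, same arithmetic
```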
To end: thinking Marrishly is a good exercise for the cognitively inclined GGer (hopefully all of you). But the linguistic case is not like the others Marr discusses, and so we should use his useful distinctions judiciously.
It’s worth recalling Fodor’s view that only input systems are modular. Chomsky disagreed. However, what might be right is that only input systems perfectly fit Marr’s 3-level template. This is not surprising given Marr’s interests. As I said in the earlier post, Marr had relatively little to say about higher level object recognition. It is conceivable that the reason little progress has been made on this higher level topic is the absence of a competence theory in the GG sense.