Faculty of Language: Linguistics from a Marrian perspective; an afterword

Saturday, April 2, 2016

Linguistics from a Marrian perspective; an afterword

Big surprise, David Adger and Peter Svenonius got me thinking. Their comments on the two previous posts provoked this longish attempt to deal with their points. Thx. I urge you to look at their comments in detail if you are interested in the Marrian take on these issues.

In the standard Marr examples, there are non-“mental” magnitudes that “mental” operations are aiming to estimate. So in vision it is shapes, trajectories, parallax, luminescence etc. We have theories of these magnitudes independent of how we estimate them cognitively. They are real physical magnitudes.

So too with addition and prices and cash registers. We understand what addition is and how it works quite independently of how we use it to provide a purchase price.

None of this holds in the language case (or other internal systems in Fodor’s sense).[1] There are no physical magnitudes of relevance or math structures of use. Rather we are trying to zero in on a level 1 theory by seeing how it is used in stylized settings (judgment data being the prime mover here). The judgment task is an interesting probe into the level 1 theory because we have reason to believe that it provides a clean picture of the underlying mechanism. Why clean? Because it abstracts away from the exigencies of time pressure, storage pressure, attention pressure etc. that are part and parcel of real time performance. It’s the system functioning at its best because it is functioning well within the limits of its computational (time/space) capacities. That’s the advantage of data drawn in reflective equilibrium. However, this does not mean that it is resource unconstrained (after all, any judgment involves parsing, hence memory and attention). It means that the judgment does not run up against the limits of memory/attention and so likely displays the properties of the computational system more cleanly than if the computational system is cramped by non-computational constraints such as high memory or attention demands.

With this in mind, let’s now get back to a Marr conception of GG.

The creative aspect of language use (that humans can produce and understand an unbounded number of novel sentences) is the BIG FACT that, as Chomsky noted in Current Issues, is one of the phenomena in need of explanation. He notes there that a G is a necessary construct for explaining this obvious behavioral fact (note that the claim that creativity holds is a fact about speakers and what they can do, based on what we actually see them do). Without a function that relates sounds and meaning over an unbounded domain (delivers an unbounded number of <s,m> pairs) there is no possible hope of accounting for this behaviorally evident creative capacity. In other words, Gs are necessary for any account of linguistic creativity.

Here’s a Marr question: what level theory in Marr’s sense is a G in this context? Here’s a proposal: Gs are Marr level 1 theories. If this is correct, we can ask what level 2 theories might look like. Level 2 theories would show how to compute Gish level 1 properties in real time (for processing and production say). So, Gs are level 1 theories and the DTC (the derivational theory of complexity), for example, is a level 2 theory. The DTC specifies how level 1 constructs are related to measures of actual on-line sentence processing. If the DTC is correct, it suggests certain kinds of algorithms, ones that track the derivational complexity of derivations. Of course, none of this is to endorse the DTC (though, to repeat, I do like it quite a bit), but to illustrate Marr-like logic as it relates to linguistic theories.

The main problem with Marr's division is not that we can't use it (I just did), but that the explanatory leverage Marr got out of a level 1 theory in vision and cash registers seems absent in syntax. Why? Because Marr’s examples are able to use already available theories for level 1 purposes. In other words, there are already good level 1 theories in vision and cash registers there for the taking, and these can serve to circumscribe the computational problems that must be solved (i.e. explain how the physical optical parameters are mentally computed given input at the retina or how arithmetical functions are embodied in the cash register). Let’s elaborate a bit.

In the vision case, the level 1 account is built on a theory of physical optics which relates objective (non-mental) physical magnitudes (luminescence, shape, parallax, motion, etc.) to info available on the retina. The computational description of the problem becomes how to calculate these real physical magnitudes from retinal inputs. This is a standard inverse problem as there are many ways for these physical variables to relate to patterns of activity on the retina. So the problem is to find the right set of mental constraints in the mental computation that, given the retinal input, delivers values for the objective variables the level 1 theory specifies. Concepts like “rigidity” serve this purpose to get you shape from motion. Rigidity makes sense as a level 2 computational constraint given the level 1 properties of the visual system. So if we assume, for example, that objects are rigid, then computing their shape using retinal inputs is possible if we can compute their motion from retinal inputs.
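The many-to-one character of this inverse problem can be seen in a toy sketch. It assumes a simple orthographic projection (just dropping depth), which is an illustrative simplification and not something taken from Marr's account; the function name `project` and both point sets are made up for the example.

```python
# Toy illustration: projecting 3D points onto a 2D "retina" discards depth,
# so distinct scenes can yield the same image. Orthographic projection is an
# assumption made purely for simplicity.

def project(points_3d):
    """Orthographic projection: keep (x, y), drop depth z."""
    return [(x, y) for (x, y, z) in points_3d]

# Two hypothetical scenes that differ only in the depth of their points.
flat_scene = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]
tilted_scene = [(0, 0, 1), (1, 0, 3), (0, 1, 2)]

same_image = project(flat_scene) == project(tilted_scene)
print(same_image)  # True: the image alone does not fix the 3D shape
```

A constraint such as rigidity is the kind of thing that narrows the candidates: across several views of a moving object, only some 3D configurations keep all the distances between points fixed.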

In the cash register case in place of optics we have arithmetic. It turns out that calculating grocery bills is a simple math problem with a recognizable arithmetical structure which the prices of items embody and which cash registers can calculate. Given this (i.e. given that we know what the calculation is) we can ask how a cash register does the calculation in real time. How does the cash register do addition? How does it “represent” numbers numerically? Etc.
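A minimal sketch of the two levels for the cash register, under stated assumptions: the level 1 description is just "the bill is the sum of the prices", while one possible level 2 story is schoolbook addition with carries over digit lists in some base. All function names and the prices are hypothetical, chosen only for illustration; a real register need not work this way.

```python
# Level 1 (what is computed): the bill is the sum of the item prices.
def bill_spec(prices_in_cents):
    return sum(prices_in_cents)

# Level 2 (one possible "how"): schoolbook addition on digit lists.
def to_digits(n, base):
    """Least-significant-digit-first digits of a non-negative integer."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            return digits

def from_digits(digits, base):
    """Convert a least-significant-first digit list back to an integer."""
    return sum(d * base**i for i, d in enumerate(digits))

def add_digits(a, b, base):
    """Add two digit lists position by position, propagating the carry."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        carry, digit = divmod(total, base)
        result.append(digit)
    if carry:
        result.append(carry)
    return result

def bill_by_carrying(prices_in_cents, base):
    """Accumulate the bill using only digit-level addition in `base`."""
    acc = to_digits(0, base)
    for price in prices_in_cents:
        acc = add_digits(acc, to_digits(price, base), base)
    return from_digits(acc, base)

prices = [199, 350, 1249]  # hypothetical prices in cents
print(bill_spec(prices))             # 1798
print(bill_by_carrying(prices, 10))  # 1798
print(bill_by_carrying(prices, 2))   # 1798
```

The base changes the digit lists and the carries that occur along the way, but not the function being computed, which is what the level 1 description fixes.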

None of this level 1 leverage is available in the language case. Thus, Gs are not constrained by physical magnitudes in the way vision is (the “physics” of language tells us next to nothing about linguistically relevant variables) and there is no interesting math that lies behind syntax (or if there is we haven't found it yet). Linguists need to construct the level 1 theory from scratch and that's what GGers do. The problem does tell us that speakers have internalized recursive procedures (RP) but not the kinds of RPs (and there are endlessly many). It’s the job of GGers to discover the kinds of RPs that native speakers have when they are linguistically able. We argue that our internalized RPs use rules with a certain limited format and generate representations of a certain limited shape. The data we typically use is performance data (judgments) hopefully sanitized to remove many performance impediments (like memory constraints and attention issues). We assume that this data reflects an underlying mental system (or at least I do) that is causally responsible for the judgment data we collect. So we use some cleanish performance data (i.e. not distorted by severe performance demands) to infer something about the structure of a level 1 theory.
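To make the "which RP?" point concrete, here is a toy sketch of two recursive procedures that generate exactly the same strings while assigning them different structures. The two rule systems and the function names are hypothetical illustrations, not proposals about any actual G.

```python
# Two hypothetical recursive procedures (RPs) that generate the same strings
# but build different structures for them.

def right_branching(n):
    """S -> a S | a : structure nests to the right."""
    return "a" if n == 1 else ("a", right_branching(n - 1))

def left_branching(n):
    """S -> S a | a : structure nests to the left."""
    return "a" if n == 1 else (left_branching(n - 1), "a")

def yield_of(tree):
    """Read the terminal string off a tree."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return yield_of(left) + yield_of(right)

# Both procedures deliver the same string for every length checked here.
for n in range(1, 6):
    assert yield_of(right_branching(n)) == yield_of(left_branching(n)) == "a" * n

print(right_branching(3))  # ('a', ('a', 'a'))
print(left_branching(3))   # (('a', 'a'), 'a')
```

Knowing only that some recursive procedure is at work leaves the choice between these open; it takes further evidence (judgments that are sensitive to structure, for instance) to decide which kind of RP is the internalized one.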

Now if this is the practice, then it looks like it runs together level 1 and level 2 considerations. You cannot judge what you cannot parse. But that's life. We also recognize that delving more deeply into the details of performance might indicate that the level 1 theories we have might need refining (the representations we assume might not be the ones that real time parsing uses, the algorithms might not reflect the derivational complexity of the level 1 theory). Sure. But, and here I am speaking personally, there would be a big payoff if the two lined up pretty closely. Syntactic representations might not be use-representations but it would be surprising to me if the two diverged radically. After all if they did, then how come we pair the meanings we do with the sounds we do? If our stable pairings are due to our G competence then we must be parsing a G function in real time when we judge the way we do. Ditto with the DTC (which I personally believe we have abandoned too quickly, but that’s a story for another time). At any rate, because we don't have (epistemologically) "autonomous" level 1 theories as in vision and cash registers, our level 1 and 2 theories are harder to distinguish. Thus, in linguistics, the 1 vs 2 distinction is useful but should not be treated as a dualism. In fact, I take the Pietroski et al. work on “most” to demonstrate the utility of taking the G representation problem to be finding <s,m>s that fit with how we use meanings when we actually calculate quantities. How the system engages with other systems during performance can tell us something about the representational format of the system beyond what <s,m> pairings might.

Last point: I can imagine having syntax embodied someplace explicitly or implicitly without being usable. I can even imagine that what we know is in no way implicated in what we do. But I would find this very odd for the linguistic case and even odder given our empirical practice. After all, what we do in practice is infer what we know by looking at what we do in a circumscribed set of doings. This does not imply that we should reduce linguistic knowledge to behavior, but it does seem to imply that our behavior exploits the knowledge we impute and that it is a useful guide to the structure of that knowledge. Once one makes that move, why are some bits of behavior more privileged than others in principle? I can't see why. And if not, then though the competence/performance distinction is useful I would hesitate to confer on it metaphysical substance.

I would actually go a little further: as a regulative ideal we should assume strong transparency between level 1 and level 2 theories in linguistics, though this is not as obvious an assumption to make in the domain of cash registers and vision. I think that it is a very good default assumption that the categories that we think are relevant in our G theories are also the objects our parser parses and our producer produces. There is more to both activities than what Gs describe, but there is at least as much as what Gs describe and in roughly the way that Gs describe it. That’s why judgments are good probes into G structure. So, in our domain, given that we are not in the enviable Marr position of having off the shelf level 1 theories, it is likely that the level 1 theories we develop will be very level 2 pregnant, or so we should assume.

Let me put this another way: say we have two theories that are equally adequate given standard data and say that one (A) fits transparently with our performance theories and the other (B) does not. I would take this as evidence that A is the right level 1 theory. Wouldn’t you? And if you would, then doesn’t this imply that we are taking transparency as a mark of level 1 adequacy? We conclude that the level 1 formats should be responsive to level 2 meshing concerns.

This is not like what we would do in the cash register example (I don’t think). Were we to find that the cash register computes in base 2 rather than base 10 and uses ZF sets as the numerical representation of numbers, we would not conclude that it is not “doing” arithmetic. Base 10 or base 2, ZF sets or Arabic numerals, it’s doing arithmetic. There is nothing really analogous in the G domain. There might be parsing representations different from G representations, but this is not the default assumption. This makes the Marrian level considerations less clear cut in the language case than the vision case.
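For readers who want to see how exotic the representation can get while the arithmetic stays put, here is a small sketch using the von Neumann encoding of the natural numbers as sets (0 is the empty set, and n + 1 is n together with {n}). This is one standard set-theoretic construction, assumed here as a stand-in for "ZF sets"; the function names are illustrative only.

```python
# Von Neumann encoding (an assumption standing in for "ZF sets"):
# 0 = {} and n + 1 = n ∪ {n}.

def zero():
    return frozenset()

def successor(n):
    return n | frozenset([n])

def encode(k):
    """Build the set that encodes the non-negative integer k."""
    n = zero()
    for _ in range(k):
        n = successor(n)
    return n

def decode(n):
    """The von Neumann numeral for k has exactly k elements."""
    return len(n)

def add(m, n):
    """Compute m + n by applying successor to m once per element of n."""
    result = m
    for _ in range(len(n)):
        result = successor(result)
    return result

total = add(encode(3), encode(4))
print(decode(total))       # 7
print(total == encode(7))  # True
```

Nothing about the sum changes when the numerals are nested sets instead of digit strings, which is why a different representation in the register would not lead us to say it had stopped doing arithmetic.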

To end: thinking Marrishly is a good exercise for the cognitively inclined GGer (hopefully all of you). But, the ling case is not like the others Marr discusses and so we should use his useful distinctions judiciously.

[1] It’s worth recalling Fodor’s thinking that only input systems were modular. Chomsky disagreed. However, what might be right is that only input systems perfectly fit Marr’s 3-level template. This is not surprising given Marr’s interests. As I said in the earlier post, Marr had relatively little to say about higher level object recognition. It is conceivable that the reason that little progress has been made on this high level topic is the absence of a competence theory in the GG sense.

29 comments:

davidadger, April 2, 2016 at 5:03 PM
So I think where I'm not in complete agreement is with the following "Gs are Marr level 1 theories. If this is correct, we can ask what level 2 theories might look like. Level 2 theories would show how to compute Gish level 1 properties in real time (for processing and production say). So, Gs are level 1 theories and the DTC, for example, is a level 2 theory."

I think that indeed Gs are Marr level 1 (computational) theories of I-languages, but I don't see that there are necessarily any level 2 theories of Gs. There are level 2 theories of the processes that put Gs to use in generating new expressions for thought or for speaking/signing, and for use in parsing expressions that are experienced, and maybe other things too. Maybe I just misunderstand Marr, or I'm too embedded in competence/performance style thinking, but, if we take, say, parsing of a signal (which could be a signal got at by the senses, or it could be a signal from whatever thinking is), I think we'd want to have a Marr level 1 theory of that (what is the computational problem being solved) as well as a level 2 theory (what algorithms/processes are involved). So if you can have a level 1 and level 2 theory of parsing, it seems to me that it's mixing up categories to say that the level 1 theory of parsing can serve as the level 2 theory of the syntax.

From this perspective, some version of the DTC, for example, would be a level 1 theory of parsing, not a level 2 theory of the syntax. It would specify the computational problem as one that links derivations legitimised by a G to parses (predicting parser errors via properties of the G, for example). How you implement this, whether it's by transductions into symbols, or into weighted neural nets, or whatever, would be the level 2 theory. You could imagine a different level 1 theory for this task (say, how to connect surface strings to completed representations) and a different level 2 theory (chart parsing, or whatever).

I'm not saying that this is the only way one could set things up (so I think it's completely doable to say that the syntax is just an abstraction over the processes implemented by the parser, which I guess is close to the view you're sketching here?), but I don't see any inconsistency in saying that the computational theory of syntax is, ultimately, a theory of brain states that are distinct from those states and processes that are recruited when bits of language are processed or generated. It's a theory stated at a certain level of abstraction from the actual physical states and mechanisms, but it's a theory of what is essentially a steady-state of the brain that is the repository of the particular G that the individual has and that links sounds and meanings across an unbounded domain. In such a view it doesn't really make sense to ask about Marr level 2 implementations.

Ok, have to go and catch my flight back to London!

Mark Johnson, April 3, 2016 at 1:04 AM
While Marr's levels are of course a tremendous contribution to the field (I'd be extremely pleased if anything I do has as much impact), I don't think we need to slavishly follow them (we're doing science, not religion!).

I've always found it strange that the top level was called the "computational level", since I think it really should specify the information and the constraints that are relevant to the domain. I prefer to call it the "informational level" for that reason.

Norbert suggests that unlike the cash register domain (where the computational level uses ZF integers, while the algorithmic level uses 32 bit binary), perhaps in linguistics we can use the same representations at all levels.

I think algorithms can be stated at lesser or greater degrees of abstraction; one could certainly specify the behaviour of a cash register without specifying how integers are encoded.

But I think at some stage we'll have to explain how linguistic representations are mapped onto neural circuitry. There are lots of wondrous things in our brains, but I doubt if you'll ever cut open a skull and see a tree. We might discover that the brain encodes trees e.g., using pointers the way we do in conventional computers. Personally I think that's highly unlikely; trees are great for drawing on blackboards, and it could be that our linguistic trees are a static depiction of a temporal sequence of brain states.

In any event, I suspect that discovering how linguistic structures are represented in the brain would be a substantial advance.

AveryAndrews, April 3, 2016 at 2:40 AM
I think another aspect of the difference between Marr's instances and linguistics is that Marr had nothing comparable to trying to say something relevant to learning, change and typology. Specifying the sound-meaning relation for a single language looks like a 'computational' problem to me (sharing Mark Johnson's lack of enthusiasm for that term), with the difference that the machinery is more complicated and mathematically less well understood than for Marr's instances. But any 'computational' account runs into 'Quine's Challenge' from 1972 (also Suppes, a bit later) as to how one of two extensionally equivalent grammars (that generate the same sound-meaning relation) can be correct and the other wrong.

Chomsky's answer from the mid-60s is that the correct/better grammar was the one that was part of a general account that said something/more about the possibilities for learning, and by extension, change (the result of imperfect learning), and typology (the results of changes scattered around the landscape).

In the 1980s, some people such as Gareth Evans, Chris Peacocke and Martin Davies addressed this in terms of learning or decay of a given language (they don't appear to have noticed the relevance of diachronic change and typology), the latter two talking about 'level 1.5' with reference to Marr. The level 1.5 idea doesn't seem to have caught on, but

a) it is definitely something that is in linguistics but not in his original picture

b) a grammar that's part of a story about these things does seem to me to be more likely to be relevant to how parsing works at the algorithmic level and what the neural implementation would be like, although I can't really justify why I think this.

Peter Svenonius, April 5, 2016 at 2:15 AM
You say a grammar is a level 1 theory, the level at which the computational problem is stated. Do you mean that, for example, a syntax with the LCA and a syntax without the LCA but with Mirror, or with head-complement directionality, or with cyclic linearization, are different level 1 theories of what the computational problem is?

It is not obvious that they necessarily start from different assumptions about the computational problem of creativity. So do you mean that those syntaxes are all the same grammar? If so, I still don't understand David's comment that there is no algorithmic level.

Similarly for a lot of other syntactic proposals -- like whether heads are merged or, as in David's book, structures self-merge and get labels from a labeling algorithm -- if they are not different algorithmic theories of the same computational problem, then you must be saying that they are different theories of the computational problem.

Isn't it just as useful to think of them as algorithmic solutions to a common computational problem? Then, in that sense, isn't a grammar a level 2 theory?

Mark Johnson, April 6, 2016 at 6:10 AM
I used to think that Marr's levels are related to the level of abstraction of a description, but this discussion helped me realise that they are largely orthogonal. For example, I can have an algorithmic level description of a cash register that explains how memory registers are updated in response to button-pushes, etc., without explaining whether base 2 or base 10 arithmetic is used. Our current understanding of how language is implemented in the brain is very abstract (underspecified or vague may be a better term).