Saturday, April 2, 2016

Linguistics from a Marrian perspective; an afterword

Big surprise, David Adger and Peter Svenonius got me thinking. Their comments to the two previous posts provoked. Here is a longish attempt to deal with their points. Thx. I urge you to look at their points in detail if you are interested in the Marrian take on these issues.

In the standard Marr examples, there are non-“mental” magnitudes that “mental” operations are aiming to estimate. So in vision it is shapes, trajectories, parallax, luminescence etc. We have theories of these magnitudes independent of how we estimate them cognitively. They are real physical magnitudes.

So too with addition and prices and cash registers. We understand what addition is and how it works quite independently of how we use it to provide a purchase price.

None of this holds in the language case (or other internal systems in Fodor’s sense).[1] There are no physical magnitudes of relevance or math structures of use. Rather we are trying to zero in on a level 1 theory by seeing how it is used in stylized settings (judgment data being the prime mover here). The judgment task is an interesting probe into the level 1 theory because we have reason to believe that it provides a clean picture of the underlying mechanism. Why clean? Because it abstracts away from the exigencies of time pressure, storage pressure, attention pressure etc. that are part and parcel of real time performance. It’s the system functioning at its best because it is functioning well within the limits of its computational (time/space) capacities. That’s the advantage of data drawn in reflective equilibrium. However, this does not mean that it is resource unconstrained (after all any judgment involves parsing, hence memory and attention). It means that the judgment does not run up against the limits of memory/attention and so likely displays the properties of the computational system more cleanly than if the computational system is cramped by non-computational constraints such as high memory or attention demands.

With this in mind, let’s now get back to a Marr conception of GG.

The creative aspect of language use (that humans can produce and understand an unbounded number of novel sentences) is the BIG FACT that, as Chomsky noted in Current Issues, is one of the phenomena in need of explanation. He notes there that a G is a necessary construct for explaining this obvious behavioral fact (note, that creativity holds is a fact about speakers and what they can do based on what we actually see them do). Without a function that relates sounds and meaning over an unbounded domain (delivers an unbounded number of <s,m> pairs) there is no possible hope of accounting for this behaviorally evident creative capacity. In other words, Gs are necessary for any account of linguistic creativity.

Here’s a Marr question: what level theory in Marr’s sense is a G in this context? Here’s a proposal: Gs are Marr level 1 theories. If this is correct, we can ask what level 2 theories might look like. Level 2 theories would show how to compute Gish level 1 properties in real time (for processing and production say). So, Gs are level 1 theories and the DTC, for example, is a level 2 theory. The DTC specifies how level 1 constructs are related to measures of actual on-line sentence processing. If the DTC is correct, it suggests certain kinds of algorithms, ones that track the derivational complexity of derivations. Of course, none of this is to endorse the DTC (though, to repeat, I do like it quite a bit), but to illustrate Marr-like logic as it relates to linguistic theories.
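To make the DTC logic concrete, here is a toy sketch in Python. It assumes the usual reading of the DTC (derivational theory of complexity): on-line processing difficulty should grow with the number of derivational steps the G assigns to a sentence. Everything named here is hypothetical and for illustration only. The `Derivation` class, the step labels, and the three example sentences are not a real analysis, just stand-ins for whatever derivations one's favorite G provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """A sentence paired with the (hypothetical) operations that derive it."""
    sentence: str
    steps: tuple  # names of grammatical operations, in order of application


def derivational_complexity(d: Derivation) -> int:
    """Level 1 measure: how many operations the G uses to derive the sentence."""
    return len(d.steps)


def dtc_predicted_order(derivations):
    """Level 2 prediction under the DTC: easiest-to-process sentences first."""
    return sorted(derivations, key=derivational_complexity)


# Illustrative step lists only; a real G would supply these.
examples = [
    Derivation("Was the ball not kicked by the boy?",
               ("passive", "negation", "question")),
    Derivation("The boy kicked the ball.", ()),
    Derivation("The ball was kicked by the boy.", ("passive",)),
]

ranked = dtc_predicted_order(examples)
for d in ranked:
    print(derivational_complexity(d), d.sentence)
```

The sketch only encodes the prediction. Whether measured processing times actually line up with the ranking is the empirical question the DTC raises.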

The main problem with Marr's division is not that we can't use it (I just did), but that the explanatory leverage Marr got out of a level 1 theory in vision and cash registers seems absent in syntax. Why? Because Marr’s examples are able to use already available theories for level 1 purposes. In other words, there are already good level 1 theories on offer in vision and cash registers for the taking and these can serve to circumscribe the computational problems that must be solved (i.e. explain how the physical optical parameters are mentally computed given input at the retina or how arithmetical functions are embodied in the cash register). Let’s elaborate a bit.

In the vision case, the level 1 account is built on a theory of physical optics which relates objective (non-mental) physical magnitudes (luminescence, shape, parallax, motion, etc.) to info available on the retina. The computational description of the problem becomes how to calculate these real physical magnitudes from retinal inputs. This is a standard inverse problem as there are many ways for these physical variables to relate to patterns of activity on the retina. So the problem is to find the right set of mental constraints in the mental computation that given the retinal input delivers values for the objective variables the level 1 theory specifies. Concepts like “rigidity” serve this purpose to get you shape from motion. Rigidity makes sense as a level 2 computational constraint given the level 1 properties of the visual system. So if we assume, for example, objects are rigid then computing their shape using retinal inputs is possible if we can compute their motion from retinal inputs.

In the cash register case in place of optics we have arithmetic. It turns out that calculating grocery bills is a simple math problem with a recognizable arithmetical structure which the prices of items embody and that cash registers can calculate. Given this (i.e. given that we know what the calculation is) we can ask how a cash register does the calculation in real time. How does the cash register do addition? How does it “represent” numbers numerically? Etc.

None of this level 1 leverage is available in the language case. Thus, Gs are not constrained by physical magnitudes in the way vision is (the “physics” of language tells us next to nothing about linguistically relevant variables) and there is no interesting math that lies behind syntax (or if there is we haven't found it yet). Linguists need to construct the level 1 theory from scratch and that's what GGers do. The problem does tell us that speakers have internalized recursive procedures (RP) but not the kinds of RPs (and there are endlessly many). It’s the job of GGers to discover the kinds of RPs that native speakers have when they are linguistically able. We argue that our internalized RPs use rules with a certain limited format and generate representations of a certain limited shape. The data we typically use is performance data (judgments) hopefully sanitized to remove many performance impediments (like memory constraints and attention issues). We assume that this data reflects an underlying mental system (or at least I do) that is causally responsible for the judgment data we collect. So we use some cleanish performance data (i.e. not distorted by severe performance demands) to infer something about the structure of a level 1 theory.
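A toy Python illustration of why knowing that there is an RP does not tell you which RP it is. The two procedures below are hypothetical and cover only strings of repeated "a", not any natural language. Both generate exactly the same unbounded set of strings, yet they assign different structures to every string longer than two words.

```python
def right_branching(n):
    """Derive n copies of 'a' by always embedding new material on the right."""
    if n == 1:
        return "a"
    return ("a", right_branching(n - 1))


def left_branching(n):
    """Derive n copies of 'a' by always embedding new material on the left."""
    if n == 1:
        return "a"
    return (left_branching(n - 1), "a")


def yield_of(tree):
    """Read the terminal string off a binary-branching structure."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return yield_of(left) + " " + yield_of(right)


# Same strings for every n we care to check ...
for n in range(1, 6):
    assert yield_of(right_branching(n)) == yield_of(left_branching(n))

# ... but different structures behind them.
print(right_branching(3))  # ('a', ('a', 'a'))
print(left_branching(3))   # (('a', 'a'), 'a')
```

String data alone cannot choose between the two procedures. Something more (judgments about structure, meaning, or how the system is used) has to do that work.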

Now if this is the practice, then it looks like it runs together level 1 and level 2 considerations. You cannot judge what you cannot parse. But that's life. We also recognize that delving more deeply into the details of performance might indicate that the level 1 theories we have might need refining (the representations we assume might not be the ones that real time parsing uses, the algorithms might not reflect the derivational complexity of the level 1 theory). Sure. But, and here I am speaking personally, there would be a big payoff if the two lined up pretty closely. Syntactic representations might not be use-representations but it would be surprising to me if the two diverged radically. After all if they did, then how come we pair the meanings we do with the sounds we do? If our stable pairings are due to our G competence then we must be parsing a G function in real time when we judge the way we do. Ditto with the DTC (which I personally believe we have abandoned too quickly, but that’s a story for another time). At any rate, because we don't have (epistemologically) "autonomous" level 1 theories as in vision and cash registers our level 1 and 2 theories are harder to distinguish. Thus, in linguistics, the 1 vs 2 distinction is useful but should not be treated as a dualism. In fact, I take the Pietroski et al work on most to demonstrate the utility of taking the G representation problem to be finding <s,m>s that fit with how we use meanings when we actually calculate quantities. How the system engages with other systems during performance can tell us something about the representational format of the system beyond what <s,m> pairings might.

Last point: I can imagine having syntax embodied someplace explicitly or implicitly without being usable. I can even imagine that what we know is in no way implicated in what we do. But I would find this very odd for the linguistic case and even odder given our empirical practice. After all, what we do in practice is infer what we know by looking at what we do in a circumscribed set of doings. This does not imply that we should reduce linguistic knowledge to behavior, but it does seem to imply that our behavior exploits the knowledge we impute and that it is a useful guide to the structure of that knowledge. Once one makes that move, why are some bits of behavior more privileged than others in principle? I can't see why. And if not, then though the competence/performance distinction is useful I would hesitate to confer on it metaphysical substance.

I would actually go a little further: as a regulative ideal we should assume strong transparency between level 1 and level 2 theories in linguistics, though this is not as obvious an assumption to make in the domain of cash registers and vision. I think that it is a very good default assumption that the categories that we think are relevant in our G theories are also the objects our parser parses and our producer produces. There is more to both activities than what Gs describe, but there is at least as much as what Gs describe and in roughly the way that Gs describe it. That’s why judgments are good probes into G structure. So, in our domain, given that we are not in the enviable Marr position of having off the shelf level 1 theories, it is likely that the level 1 theories we develop will be very level 2 pregnant, or so we should assume.

Let me put this another way: say we have two theories that are equally adequate given standard data and say that one (A) fits transparently with our performance theories and the other (B) does not. I would take this as evidence that A is the right level 1 theory. Wouldn’t you? And if you would, then doesn’t this imply that we are taking transparency as a mark of level 1 adequacy? We conclude that the level 1 formats should be responsive to level 2 meshing concerns.

This is not like what we would do in the cash register example (I don’t think). Were we to find that the cash register computes in base 2 rather than base 10 and uses ZF sets as the numerical representation of numbers we would not conclude that it is not “doing” arithmetic. Base 10 or base 2, ZF sets or Arabic numerals it’s doing arithmetic. There is nothing really analogous in the G domain. There might be parsing representations different from G representations, but this is not the default assumption. This makes the Marrian level considerations less clear cut in the language case than the vision case.
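Here is a small Python sketch of the cash register point. The three functions compute the same level 1 function (the sum of the prices) over three different representations: ordinary integers, binary digit strings, and ZF-style sets. The function names, the von Neumann encoding of numbers as sets, and the tiny integer prices are all assumptions made for the illustration, not a claim about how any actual register works.

```python
def add_decimal(prices):
    """Sum prices using ordinary integer arithmetic."""
    return sum(prices)


def add_binary(prices):
    """Sum prices by ripple-carry addition over binary digit strings."""
    def add_bits(x, y):
        result, carry = [], 0
        i, j = len(x) - 1, len(y) - 1
        while i >= 0 or j >= 0 or carry:
            bit_x = int(x[i]) if i >= 0 else 0
            bit_y = int(y[j]) if j >= 0 else 0
            total = bit_x + bit_y + carry
            result.append(str(total % 2))
            carry = total // 2
            i -= 1
            j -= 1
        return "".join(reversed(result))

    acc = "0"
    for p in prices:
        acc = add_bits(acc, format(p, "b"))
    return int(acc, 2)


def ordinal(n):
    """Encode n as a von Neumann ordinal: 0 = {}, n + 1 = n U {n}."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s


def add_zf(prices):
    """Sum prices by applying the set-theoretic successor repeatedly."""
    acc = frozenset()
    for p in prices:
        # The ordinal for p has exactly p members, so this adds p successors.
        for _ in range(len(ordinal(p))):
            acc = acc | {acc}
    return len(acc)


prices = [3, 4, 5]  # kept small: the set encoding grows quickly
print(add_decimal(prices), add_binary(prices), add_zf(prices))
```

All three agree on every bill, which is why we would say each is "doing" arithmetic whatever its level 2 details.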

To end: thinking Marrishly is a good exercise for the cognitively inclined GGer (hopefully all of you). But, the ling case is not like the others Marr discusses and so we should use his useful distinctions judiciously.




[1] It’s worth recalling Fodor’s thinking that only input systems were modular. Chomsky disagreed. However, what might be right is that only input systems perfectly fit Marr’s 3-level template. This is not surprising given Marr’s interests. As I said in the earlier post, Marr had relatively little to say about higher level object recognition. It is conceivable that the reason that little progress has been made on this high level topic is the absence of a competence theory in the GG sense.

29 comments:

  1. So I think where I'm not in complete agreement is with the following "Gs are Marr level 1 theories. If this is correct, we can ask what level 2 theories might look like. Level 2 theories would show how to compute Gish level 1 properties in real time (for processing and production say). So, Gs are level 1 theories and the DTC, for example, is a level 2 theory."

    I think that indeed Gs are Marr level 1 (computational) theories of I-languages, but I don't see that there are necessarily any level 2 theories of Gs. There are level 2 theories of the processes that put Gs to use in generating new expressions for thought or for speaking/signing, and for use in parsing expressions that are experienced, and maybe other things too. Maybe I just misunderstand Marr, or I'm too embedded in competence/performance style thinking, but, if we take, say, parsing of a signal (which could be a signal got at by the senses, or it could be a signal from whatever thinking is), I think we'd want to have a Marr level 1 theory of that (what is the computational problem being solved) as well as a level 2 theory (what algorithms/processes are involved). So if you can have a level 1 and level 2 theory of parsing, it seems to me that it's mixing up categories to say that the level 1 theory of parsing can serve as the level 2 theory of the syntax.

    From this perspective, some version of the DTC, for example, would be a level 1 theory of parsing, not a level 2 theory of the syntax. It would specify the computational problem as one that links derivations legitimised by a G to parses (predicting parser errors via properties of the G, for example). How you implement this, whether it's by transductions into symbols, or into weighted neural nets, or whatever, would be the level 2 theory. You could imagine a different level 1 theory for this task (say, how to connect surface strings to completed representations) and a different level 2 theory (chart parsing, or whatever).

    I'm not saying that this is the only way one could set things up (so I think it's completely doable to say that the syntax is just an abstraction over the processes implemented by the parser, which I guess is close to the view you're sketching here?), but I don't see any inconsistency in saying that the computational theory of syntax is, ultimately, a theory of brain states that are distinct from those states and processes that are recruited when bits of language are processed or generated. It's a theory stated at a certain level of abstraction from the actual physical states and mechanisms, but it's a theory of what is essentially a steady-state of the brain that is the repository of the particular G that the individual has and that links sounds and meanings across an unbounded domain. In such a view it doesn't really make sense to ask about Marr level 2 implementations.

    Ok, have to go and catch my flight back to London!

    Replies
    1. Hope the flight was fun and the weather not too terrible. Ok, I think I may have pinpointed the fulcrum of our impasse. For me level 1 and level 2 and level 3 theories are theories of a mental organ/module/system. So, a level 1 analysis is an analysis at a certain grain of abstraction OF a certain module. There are various linguistic systems one might be interested in accounting for. The two most prominent are the system that underlies our linguistic creativity and the system that underlies our capacity to acquire a natural language. These two systems can now be described in a leveled way a la Marr. The level 1 theory of the first system will involve some Gish like thing. The level 2 theory of this system will involve some theory of how this Gish thing is deployed (e.g. the DTC). The following makes little sense on this view: that level 2 theories are theories of level 1 theories. In this sense I agree that there is no level 2 theory OF the syntax. Rather both our G theories and our performance theories are level 1 and level 2 theories respectively of the systems that parse/produce and acquire Gs.

      This seems to me to fit with your points. So, level 2 theories are not theories of level 1 accounts. Rather both are accounts of some system.

      If this is right (or defensible) then it still allows room for saying why the level 1 vs 2 distinction is more productive in vision than it is in ling (for example). The reason is that in vision it makes sense to ask what are the physical variables that are being discerned by the visual system and what features do these objective variables have. It makes sense to think that vision "works" by tracking these variables using mental operations which in turn rely on certain representations and algorithms. So, given a theory of physical optics how do we estimate its properties from information hitting the retina? This "objective" piece is what's missing in the language case and so the neatest division of labor that Marr can exploit is missing in the language case. There is no physical theory of language structure that our internal operations are trying to compute approximately. There are not sentences "out there." The analogue of level 1 and level 2 in the ling domain will be something more like the old competence/performance distinction.

      At any rate, thx for the opportunity to think this through for myself more clearly. I suspect that like all matters of exegesis, there is no right way to think about these things.

    2. Sorry about the delayed reply. So it looks like we agree on the basic concept but just have different views of what Marr meant by level 1 and 2. We agree that there's no level 2 theory OF syntax, which was really my point. I still think we can have level 1 and level 2 theories of parsing, production, etc., and it's not clear to me that you agree with that, but I definitely am with you about why level 1/2 is more productive in vision than for syntax (cos there is no level 2 OF syntax). And definitely level 1 and 2 is not as applicable in syntax as competence/performance because syntax is a theory of structures, not of the transduction of structures/information.

  2. While Marr's levels are of course a tremendous contribution to the field (I'd be extremely pleased if anything I do has as much impact), I don't think we need to slavishly follow them (we're doing science, not religion!).

    I've always found it strange that the top level was called the "computational level", since I think it really should specify the information and the constraints that are relevant to the domain. I prefer to call it the "informational level" for that reason.

    Norbert suggests that unlike the cash register domain (where the computational level uses ZF integers, while the algorithmic level uses 32 bit binary), perhaps in linguistics we can use the same representations at all levels.

    I think algorithms can be stated at lesser or greater degrees of abstraction; one could certainly specify the behaviour of a cash register without specifying how integers are encoded.

    But I think at some stage we'll have to explain how linguistic representations are mapped onto neural circuitry. There are lots of wondrous things in our brains, but I doubt if you'll ever cut open a skull and see a tree. We might discover that the brain encodes trees e.g., using pointers the way we do in conventional computers. Personally I think that's highly unlikely; trees are great for drawing on blackboards, and it could be that our linguistic trees are a static depiction of a temporal sequence of brain states.

    In any event, I suspect that discovering how linguistic structures are represented in the brain would be a substantial advance.

    Replies
    1. Another anti-religion screed huh? Just kidding. Yes, the question is not how to shoehorn linguistics into a Marr picture but what doing so might tell us. I think that it suggests two things. First that as applied to central systems, Marr's very influential approach needs some tweaking. Second, that seeing closer relations between level 1 and 2 concerns within linguistics would be useful. As regards level 3, we are a long way off still, though assuming the DTC (as is done effectively in some exciting recent work by Dehaene, Poeppel and others) seems like a good idea operationally.

      Last point: I think we are stuck with "computational."

    2. But, maybe 'extensional' would work as a replacement for 'computational'?

  3. I think another aspect of the difference between Marr's instances and linguistics is that Marr had nothing comparable to trying to say something relevant to learning, change and typology. Specifying the sound-meaning relation for a single language looks like a 'computational' problem to me (sharing Mark Johnson's lack of enthusiasm for that term), with the difference that the machinery is more complicated and mathematically less well understood than for Marr's instances. But any 'computational' account runs into 'Quine's Challenge' from 1972 (also Suppes, a bit later) as to how one of two extensionally equivalent grammars (that generate the same sound-meaning relation) can be correct and the other wrong.

    Chomsky's answer from the mid-60s is that the correct/better grammar was the one that was part of a general account that said something/more about the possibilities for learning, and by extension, change (the result of imperfect learning), and typology (the results of changes scattered around the landscape).

    In the 1980s, some people such as Gareth Evans, Chris Peacocke and Martin Davies addressed this in terms of learning or decay of a given language (they don't appear to have noticed the relevance of diachronic change and typology), the latter two talking about 'level 1.5' with reference to Marr. The level 1.5 idea doesn't seem to have caught on, but

    a) it is definitely something that is in linguistics but not in his original picture

    b) a grammar that's part of a story about these things does seem to me to be more likely to be relevant to how parsing works at the algorithmic level and what the neural implementation would be like, although I can't really justify why I think this.

    Replies
    1. I agree that the way to get around indeterminacy issues is to take the mental representation angle very seriously. Let me again point to the Pietroski et al work which does just that with truth theoretically equivalent proposals for representing 'most.' It seems that once we start thinking about how the meaning interacts with other aspects of cognition (approx numbers) and vision then we have arguments for one representation over others. It's the function in intension that we want to specify, not its extensional outputs. And going mentalistic gives us a handle on this.

      I also agree that looking at typology and change matters as this will likely focus on the representational properties of Gs, i.e. the points of difference and trajectories of change will exploit the natural cleavages of the representations. Good.

    2. Also relevant is Bob Frank's paper with Shyam Kapur in LI 1996, where they argue that different word order parameters with extensionally equivalent coverage are differentially learnable.

  4. You say a grammar is a level 1 theory, the level at which the computational problem is stated. Do you mean that, for example, a syntax with the LCA and a syntax without the LCA but with Mirror, or with head-complement directionality, or with cyclic linearization, are different level 1 theories of what the computational problem is?

    It is not obvious that they necessarily start from different assumptions about the computational problem of creativity. So do you mean that those syntaxes are all the same grammar? If so, I still don't understand David's comment that there is no algorithmic level.

    Similarly for a lot of other syntactic proposals -- like whether heads are merged or, as in David's book, structures self-merge and get labels from a labeling algorithm -- if they are not different algorithmic theories of the same computational problem, then you must be saying that they are different theories of the computational problem.

    Isn't it just as useful to think of them as algorithmic solutions to a common computational problem? Then, in that sense, isn't a grammar a level 2 theory?

    Replies
    1. I make a distinction between computational problems and computational theories. One can have many different theories aimed at solving the same computational problem. Thus, for example, what physical parameters the visual system is tracking to do what it does is an open question, and different computational theories can be on offer. Or which is basic and which derived is an open question.

      By analogy, the same holds for syntactic theories. There can be different theories aimed at addressing the same problem. I would take this to be what you describe.

      That said, the distinction between an algorithm and a generative procedure is not as clean as it might be in the visual domain or the cash register domain. For that reason, to imbue it with some possible content I have (sorry; implicitly) been taking algorithms to yield real time procedures, thus linking them with performance issues. Maybe this is not a good idea and maybe it does not cleave to the basic Marr idea, though I think that it does. Certainly David (and maybe Chomsky) thinks that the distinction is without substance in the language case. I am proposing that it can be made useful and interesting, for it advocates for a closer relation between level 1 and level 2 investigations within linguistics than is commonly assumed in practice. This, of course, goes back to some proposals I made concerning how we should interpret the strong minimalist thesis.

      So, the syntactic theories you mention are all level 1 from where I sit as they are not intended as accounts of how syntactic structure is computed in real time. That said, I happily concede that there are other ways of interpreting linguistics from a Marr point of view.

    2. Now I understand --- and now I think that your (and David's) interpretation is the same as Marr's, in fact Marr writes (p. 29) "Chomsky's (1965) theory of transformational grammar is a true computational theory in the sense defined earlier. It is concerned solely with specifying what the syntactic decomposition of an English sentence should be, and not at all with how that decomposition should be achieved. Chomsky himself was very clear about this --- it is roughly his distinction between competence and performance ..."

    3. Yes, I'd agree with Norbert here, Peter. Various syntactic theories are all at Marr's level 1 cos they are solutions to a what question (what is the pairing of forms and meanings, roughly). How these structures are used by other systems may give rise to how questions, in which case there will be algorithms for doing the computational task.

    4. That--competence = level 1 = syntactic theory--is also how Marr saw it, as Peter's quote from VISION makes clear. But I recall Chomsky denying this connection, on the ground that Marr's problem deals with input-output systems, which is not the way he sees a theory of syntax.

      An additional point. The independence of levels, taken literally, means that they do not have to interact with or mutually constrain each other. But in Marr's own work, they do. It seems crazy to insist that they cannot interact with each other, and I think that theories that allow for such interactions are preferable to those that do/can not.

      In our business, a prominent example is the grammar-parser relationship as you all noted. The type of "transparency", i.e., grammar-parser isomorphism, is a good thing (see Berwick & Weinberg, and more recently Colin Phillips). Marr realized this, through the work of Mitch Marcus. Later on the page that Peter quotes from:

      "This point was appreciated by Marcus (1980), who was concerned precisely with how Chomsky's theory can be realized and with the kinds of constraints on the power of the human grammatical processor that might give rise to the structural constraints in syntax that Chomsky found. It even appears that the emerging 'trace' theory of grammar (Chomsky & Lasnik 1977) may provide a way of synthesizing the two approaches--showing that, for example, some of the rather ad hoc restrictions that form part of the computational theory may be consequences of weakness in the computational power that is available for implementing syntactical processing".

      Later in VISION, Marr's imagined sparring partner would say (p357):

      "Why has artificial intelligence shown such resistance to traditional Chomskian approaches to syntactical analysis? Only Marcus seems to have embraced it."

      It still seems to be a minority opinion now, inside and outside of linguistics, never mind artificial intelligence or cognitive science.

    5. Charles, so I think that what you say Chomsky thinks is what I was trying, cackhandedly, to get at. Syntax is really a theory of knowledge, not of input-output (which is what I called transduction, above, though input-output is better as it's more general). As such, it's a theory at the computational level, but the other levels don't really make sense for syntax. I think Norbert's point, that the problem is creativity of language use, is interesting, but I also think it's too mushy a problem to think of in the Marrian way. The problem of getting the right pairing of forms and meanings is tractable, but only lends itself to a computational level. The problems of parsing and production are also tractable, but these do seem to me to be input-output. You have a stream of percepts and you need to associate these with a structure legitimized by the grammar, which probably involves a kind of Marrian analysis of input. Maybe the same thing holds on the other side for thinking. You have a stream of internal stuff, which you process so as to associate it with a structure.

    6. If the parser isn't the level two theory of the grammar; i.e. the parser computes some relation between forms and meanings which is different than what is specified by the grammar, then I think there is a serious problem regarding what the grammar is a theory of.

      In other words: the grammar is a theory of the form meaning relation which underlies in some abstract way our linguistic behaviour. Our linguistic behaviour is manifested in our association of forms with meanings (the parser). Therefore, the grammar is a theory of this form meaning relation.

      It is, I think, incoherent to say that there is only, for example, a computational level theory of something. Marr does make a distinction between type I and type II theories; a type II theory is, essentially, a gerrymandered phenomenon where no type I theory is possible. He suggests in fact that language may be like this, but I think that intervening years have shown him in error.

    7. I agree with Greg here, and I believe I have said as much in the posts. Imagine we discover that the performance systems use representations and algorithms that are entirely different from those that our best Gs suggest. Imagine too that the reps and algorithms have a "competence" interpretation so that they can get what the best competence theories urge. Then wouldn't we conclude that the right competence theory has the representations and "rules" that our performance system is pinpointing? I think we would. If so, then our practice implies a commitment to the view that our competence theories qua theories of knowledge are nonetheless also the bases for our performance theories. Indeed, as Greg notes, rightly in my view, were this not so then we would have to divorce our competence theories entirely from the data we use to build them: speakers' intuitions. Of course, it is possible that speakers' intuitions are based on an entirely different system than is our actual everyday practice, but that conclusion is in the vicinity of absurd (I believe).

      Last point: I think we should welcome this conclusion. We have already seen cases where certain performances have been very insightful concerning the properties of Gs. Again, think of the Pietroski et al work on 'most.' More recently see Yang's wonderful work on the elsewhere principle. So, rather than insulate Gs from performance considerations, we should embrace this, all the while understanding that GOOD arguments concerning their relationship are, like all good arguments, hard to make and thin on the ground.

    8. I disagree with Greg and, it seems, Norbert! The way I think about it, the parser doesn't compute a relation between forms and meanings, it computes what derivations/structure(s) can be associated with an input, and perhaps also which is likeliest in the context. The grammar says what the structures can be, so, a la classical generative theory, there are structures/derivations licensed by the grammar that are unparsable. I quite like the idea that something similar is going on on the other side, so that there is a `thought-parser' that computes which aspects of thought can be associated with grammatical derivations/structures. So the link between form and meaning is mediated, though underdetermined, by the nature of the grammar.

      It seems to me that Norbert is being uncommonly empiricist about things: we don't have to divorce our competence theories entirely from the data we use to build them. Our judgments of acceptability are surely a great source of data about the grammar, but the grammar doesn't determine these judgments. It is one factor that enters into that aspect of our behaviour, probably an important one because there is, no doubt, some transparency between parser and grammar. But transparent doesn't mean identical.

    10. Charles wrote: That--competence = level 1 = syntactic theory--is also what Marr saw it, as Peter's quote from VISION makes clear. But I recall Chomsky denying this connection, on the ground that Marr's problem deals with input-output systems, which is not the way he sees a theory of syntax.

      Yes, in the interview here:
      http://www.sciencedirect.com/science/article/pii/S0093934X99921193
      Chomsky says
      "As for Marr's famous three levels of analysis, he was concerned with input-output systems (e.g. the mapping of retinal images to internal representations). Language is not an input-output system. Accordingly, Marr's levels do not apply to the study of language, though one could adapt them to the very different problem of characterizing cognitive systems accessed in processing and production."

      Although I think I agree with the gist of what Norbert and Greg are saying (transparency is good etc.), rather than saying that the relationship between the parser and the grammar is one that can be thought of as the relationship between different levels, I would say that the parser is something that can be analyzed at each of the three levels whereas the grammar is not (for basically the reasons Chomsky points out). That there can be such a thing as a level-one description of the parsing process comes out most clearly, to me, in things like John Hale's work on entropy reduction and surprisal, where one focuses on the task of processing some input incrementally while abstracting away from any of the representations or mechanisms that carry out this task. And most of the questions we think about when it comes to parsing are of course level-two questions, as everyone senses.

      I suppose if one really needed to find a way to fit grammars into the levels, I would suggest that grammars are at a higher level again, which we could call level zero, which differs from level one in that we abstract away from the "directionality" of the task and things like that. But I say this not as a serious suggestion, just to point out the ways in which I don't think grammars belong at level one.

    11. Marr was concerned with the analysis of information processing systems. Parsing is such a system (from form to meaning), and production is such a system (from meaning to form). Do these relate the same meanings and forms together? Presumably yes. Let's describe this relation (and call it `the grammar'). Voila, a level 1 theory. Tim, your `level 1 parsing process' is simply a description of a probability model over sound meaning pairs; i.e. a probabilistic grammar.

      AFAICT, Chomsky's denial that language is an information processing system is based on his hypothesis that the information processing use to which the language system is put is derivative. I don't think that his rejection is necessary, as, if narrow syntax = recursion (although it should perhaps better be called `narrow thought'), what we call the linguistic system could still be the way the parser exploits the structures of narrow thought to map forms to meanings.

    12. As an aside, of course Marr's particular levels (1,2,3) are not god-given, but simply concrete examples of how to think about the kinds of ways we can talk about a single system. There're infinite possibilities (or at least, more than 3), but, Marr says, we should at least understand how things are implemented in meat, the basic time course and representational issues involved, and what on earth is taking place in the abstract.

    13. This is probably just getting nit-picky, but I wouldn't put it this way only because a level one theory (as I understand it) describes the particular input-output mapping for one specific processing system/task, such as parsing or production. But at any rate, factoring out a common relation between form and meaning from two different tasks like that is (of course!) a perfectly sensible thing to do, and is presumably an important part of understanding "what on earth is taking place in the abstract" in the case of language.

    14. Sorry, Tim, I think I mischaracterized your position (re: entropy reduction). Your point was that these information theoretic measures made sense irrespective of what they were talking about. This is sort of a strange situation to be in, as it means that we can view the parser as instantiating two high level theories at once.

      Re: your `nit-picking'
      I am only trying to push this because I think that the alternative is a troublesome tendency to reify grammar, at which point the entire point of linguistics becomes obscured.

    15. @David: I like being attacked from the left wrt parsers and Gs. I don't argue that they are identical. I argue that transparency is a decent assumption that should be given up only in the face of empirical pressures and that a theory that neatly interleaves Gs with performance systems is for that reason superior to Gs that do not. This does not make Gs parsers but it commits hostages to the view that what Gs say is the right G structure is what parsers are trying to recover, be these PMs or derivation trees. They map the relations between S and Ms that parsers are trying to get. Again, I think one can think of Gs in different ways. David's point is coherent. It just strikes me as very unattractive. It has all the virtues of Katzian Platonism, which is a good reason IMO to not adopt it.

      Moreover, as I said, the view I am outlining does fit well with our practice. We infer level 1 Gs from behavioral data. We infer that part of the shape of the data is to be understood in terms of the knowledge embedded in the system. This makes sense if that knowledge is implicated in the actual behavior. If it isn't then I do not see how we can infer anything at all. The purely abstract problem of how S can be related to Ms (and vice versa) has almost no interesting content. What we want is how our system works and for this we infer from behavioral data. Is it complicated? Sure. But if you agree that a G enjoys brownie points if it smoothly fits with a performance system (transparency) then you agree level 2 issues and level 1 issues closely interact. That's what I think is the real virtue of the Marr picture in a linguistic setting. And it is a nice virtue, one that we should be happy to adopt.

      I will leave the last word to you. Thx for the discussion.

    16. @Greg "I am only trying to push this because I think that the alternative is a troublesome tendency to reify grammar, at which point the entire point of linguistics becomes obscured.". This is probably the nub of our disagreement here. I don't see why reifying grammar obscures the entire point of linguistics. In fact, I think grammar (construed as an I-language) absolutely exists, physically, and our theoretical models are really just models of brain states at a certain level of abstraction. I think there's evidence that the parser isn't the same as the grammar, so whatever the right theory of human parsing is, it'll involve different brain states.
      @Norbert yeowch! I guess I deserved being called a Platonist after I said you were being empiricist! Though my comment about brain states above makes me skeptical that I am. So, I wonder if `transparent' is the right word. From my perspective, the parser has to be designed so that it can make use of the structures/derivations defined by the grammar. It may have other limitations (memory, for example), but it would be pretty useless if it couldn't map to those structures. So I think we are in agreement there. But the parser and the grammar are, from this perspective, distinct, which means that the parser isn't a level 2 theory of what the grammar is a level 1 theory of. Of course the knowledge is implicated in the behaviour, as I said above, so our judgments about acceptability are one good source of data about the nature of the grammar. But, as I've been saying from the start, I don't think syntax has a level 2 theory. Marr's levels are useful for processing systems, and the grammar, though computationally specified via the licensing of derivations, is not a processing system (though it is used by these).

  5. I used to think that Marr's levels are related to the level of abstraction of a description, but this discussion helped me realise that they are largely orthogonal. For example, I can have an algorithmic level description of a cash register that explains how memory registers are updated in response to button-pushes, etc., without explaining whether base 2 or base 10 arithmetic is used. Our current understanding of how language is implemented in the brain is very abstract (underspecified or vague may be a better term).

    1. Not quite, I don't think. For Marr, representational issues are part of level 2 as well. So, yes, one can study algorithms independently of the representations manipulated, but as Gallistel noted, the two are often intimately tied together. The main difference between vision and linguistics is that the level 1 and 2 representations likely talk to each other more forcefully in the language case than in the vision case, or so it seems to me.

    2. One more point: our understanding of how most things are represented in the brain is "underspecified," at least once one gets beyond early perceptual processing. Do we really have a good idea of higher level representations of visual objects, for example? From what I can tell, the answer is no. The more one gets to the central systems, the less we seem to know about brain implementations. Interestingly, we do have data now that brains track the kinds of objects linguists postulate (I am thinking of Poeppel and Frankland and Greene here). So, once again ling is not so different.
