Thursday, November 8, 2018

Guest Post by William Matchin: Reflections on SinFonIJA 11, Part 1

Before posting this, let me encourage others to do what William is doing here. Send me stuff to post on FoL. The fact that I have become fat and lazy does not mean that others need go to seed as well. This is the first of two posts on the conference. 

*****

I thought that I would write up my thoughts about SinFonIJA 11 in Kraków, Poland, which just finished this past weekend. It was organized by three professors in the Dept. of English Studies at Jagiellonian University in Kraków: Marta Ruda, a former visiting scholar at UMD Linguistics, Mateusz Urban, and Ewa Willim, who was Howard Lasnik’s student and the recipient of the infamous manuscript ‘On the Nature of Proper Government’[1]. All three of them were gracious hosts, and the conference was very well organized, informative, and fun. SinFonIJA is a regional[2] conference on formal linguistic analysis focusing on syntax and phonology, but as a neuroscientist, I felt quite welcome, and many of the attendees expressed interest in my work. Kraków is a beautiful city and definitely worth visiting, to boot; if you ever visit, make sure to see the Wieliczka salt mine[3].

I suppose my sense of welcome was helped by the fact that the main theme of the conference was “Theoretical Linguistics within Cognitive science” – I was invited to chair a round table discussion on how linguistics is getting on with the other cognitive sciences these days. Linguistics was a founding member of the modern cognitive sciences during the cognitive revolution in the 50s and 60s – perhaps the founding member, with Chomsky’s work in Generative Grammar stimulating interest in deeper, abstract properties of the mind and articulating a vision of language radically different from the dominant behaviorist perspective. Marta was the key instigator of this theme – this was a frequent topic of discussion between us while we were both at the UMD Linguistics dept., which has a unique capacity to bridge the gaps between formal linguistic theory and other fields of cognitive science (e.g., acquisition, psycholinguistics, neuroscience). The invited keynote speakers comprising the round table addressed foundational questions underlying linguistic theory as well as the relation between formal linguistics and the cognitive sciences in their own talks. The main part of this post will reflect on this topic and the roundtable discussion, but before that I’d like to discuss Zheng Shen’s talk, which highlighted important issues regarding the methods in formal linguistics. Much of what I say here reiterates a previous post of mine on FoL[4].

Methods and data in formal linguistics

Lately there has been noise about the quality of data in formal linguistics, with some non-formal linguists calling for linguists to start acting more like psychologists and report p-values (because if you don’t have p-values, you don’t have good data, naturally). My impression is that these concerns are greatly exaggerated and something of a non sequitur. If anything, my feeling is that formal linguistics, at least of the generative grammar variety, is on firmer empirical footing than psycholinguistics and neurolinguistics. This is because linguistics rightly focuses on theoretical development, with data as a tool to sharpen theory, rather than fixating on the data itself. This is illustrated well by Shen’s talk.

Shen began by discussing his analysis of agreement in right node raising (RNR) and its empirical superiority over other accounts (Shen, 2018[5]). His account rested on a series of traditional informal acceptability judgments, consulting a small number of native speakers of English to derive the patterns motivating his analysis. Interestingly, other authors offered a competing account of agreement in RNR, which was not just an alternative analysis but included conflicting data patterns – the two papers disagreed on whether particular constructions were good or bad (Belk & Neeleman, 2018) (see the abstract submitted by Shen for details[6]). Shen then performed a series of carefully designed acceptability judgment experiments to sort out the source of the discrepancy, ultimately obtaining patterns of data from large groups of naïve participants that essentially agreed with his judgments rather than Belk & Neeleman’s.

Psychologists (particularly Ted Gibson & Ev Fedorenko) have been heavily critical of methods in formal linguistics of late, claiming that informal acceptability judgments are unreliable and misleading (Gibson & Fedorenko, 2010; 2013; their claim of weak quantitative standards in linguistics has been directly contradicted by the exhaustive research of Sprouse & Almeida, 2012 and Sprouse, Schütze, & Almeida, 2013, which illustrates a replication rate of 95-98% for the informal judgments presented in a standard syntax textbook and in a leading linguistics journal when tested with naïve subjects in behavioral experiments[7],[8]). This disagreement about the RNR data appears to support these attacks on formal linguistics by providing a concrete example.

This critique is invalid. First, the two sets of authors agreed on a large set of data, disagreeing on a small minority of data points that happened to be crucial for the analysis. The two competing theoretical accounts highlighted the small discrepancy in the data, leading to a proper focus on resolving the theoretical dispute by cleaning up the data point.

Second, Shen’s original judgments were vindicated. In other words, the behavioral experiments essentially replicated the original informal judgments. In fact, Shen noted several quite obvious issues with the use of naïve subjects, in that they may not reliably make judgments under particular interpretations – that is, they may judge the string to be acceptable, but not under the crucial interpretation/structural analysis under consideration. It took a large amount of work (and I assume money) to handle these issues with multiple experiments, only to (in a nutshell) replicate informal judgments that were obtained far more rapidly and easily than the experiments. Essentially, no new data points were obtained – only replications. It is not clear why Shen and Belk & Neeleman disagreed on the data (potentially because of dialect differences, British vs. American English) – but the problem was certainly not with Shen’s informal judgments.

These two facts inform us that while large-scale experiments can be useful, they are not the drivers of research. Shen’s hard work provided replications in the context of two detailed, competing theoretical analyses. The experimental data were only acquired after the theoretical analyses were proposed, and those analyses were based on informal judgment data. If we take Gibson & Fedorenko’s (2010) demands for eschewing informal judgments entirely, then we would end up with disastrous consequences, namely slavishly collecting mass amounts of behavioral data, and spending inordinate amounts of time analyzing that data, all in the absence of theoretical development (which is one of the drivers of the un-replicability plague of much of social psychology). Theory should drive data collection, not the other way around.

With that said, the next section changes gears and discusses the special topic of the conference.

Theoretical linguistics within cognitive science: a crisis?

First, I will summarize my introduction to the round table and the main feelings driving what I and Cedric Boeckx perceive to be a crisis regarding the place of formal linguistics in the cognitive sciences – from my perspective, cognitive neuroscience specifically. As I pointed out in a previous blog post on Talking Brains[9], this crisis is well illustrated by the fact that the Society for the Neurobiology of Language has never had a formal linguist, or even a psycholinguist, present as a keynote speaker in its 10 years of existence, despite many presentations by neuroscientists and experts on non-human animal communication systems.

I think there are many reasons for the disconnect – paramount among them a lack of appreciation for the goals and insights of linguistic theory, sociological factors such as a shortage of people who know both domains and the objectives of both sides, and likely many others. My main point was not to review all of the possible reasons. Rather, I thought it appropriate when talking with linguists to communicate what linguists can do to rekindle the interaction among these fields (when I talk to cognitive neuroscientists, I do the opposite – discuss what they are missing from linguistics). I used my own history of attempting to bridge the gaps among fields, raising what I perceived to be a frustrating barrier: the competence/performance distinction. Consider this line from Aspects (Chomsky, 1965), the authoritative philosophical foundation of the generative grammar research enterprise:

“… by a generative grammar I mean simply a system of rules that in some explicit and well-defined way assigns structural descriptions to sentences”

The idea that language is a system of rules is powerful. In the context of the mentalistic theory of grammar, it embodies the rejection of behaviorism in favor of a more realistic as well as exciting view of human nature – that our minds are deep and, in many ways, independent of the environment, requiring careful and detailed study of the organism itself in all of its particularities rather than merely a focus on the external world. It calls for a study of the observer, the person, the machine inside of the person’s head that processes sentences rather than the sentences themselves. This idea is what sparked the cognitive revolution and the intensive connection between linguistics and the other cognitive sciences for decades, and led to so many important observations about human psychology.

For a clear example from one of the conference keynote speakers, consider the work Ianthi Tsimpli did on Christopher, the mentally impaired savant who apparently had an intact (and in fact, augmented) ability to acquire the grammar of disparate languages[10], including British Sign Language[11], in the face of shocking deficits in other cognitive domains. Or take my own field, which finds that the formal descriptions of language derived from formal linguistic theory, and generative grammar in particular – including syntactic structures with abstract layers of analysis and null elements, or sound expressions consisting of sets of phonological features that can be more or less shared among speech sounds – have quite salient impacts on patterns of neuroimaging data[12],[13].

However, it is one thing to illustrate that hypothesized representations from linguistic theory impact patterns of brain activity, and another to develop a model of how language is implemented in the brain. To do so requires making claims about how things actually work in real time. But then there is this:

“... a generative grammar is not a model for a speaker or a hearer ... When we say that a sentence has a certain derivation with respect to a particular generative grammar, we say nothing about how the speaker or hearer might proceed ... to construct such a derivation”.

The lack of investigation into how the competence model is used poses problems. It is one thing to observe that filler-gap dependencies – sentences with displaced elements involving the theoretical operation Movement (or internal merge, if you like) – induce increased activation in Broca’s area relative to control sentences (Ben-Shachar et al., 2003), but quite another to develop a map of cognitive processes on the brain. Most definitely it is not the case that Broca’s area “does” movement[14].

It is clearly the case that linguists would like to converge with neuroscience and use neuroscience data as much as possible. Chomsky often cites the work of Friederici (as well as Moro, Grodzinsky, and others). For instance, Berwick & Chomsky devote a central part of their recent book Why Only Us to the brain bases of syntax, adopting Friederici’s theoretical framework for a neurobiological map of syntax and semantics in the brain. Much of my work has pointed out that Friederici’s work, while empirically quite exceptional and of high quality, makes quite errant claims about how linguistic operations are implemented in the brain.

Now, I think this issue can be worked on and improved upon. But how? The only path forward that I can see is by developing a model of linguistic performance – one that indicates how linguistic operations or other components of the theory are implemented during real-time sentence processing and language acquisition. In other words, adding temporal components to the theory, at least at an abstract level. This was my main point in introducing the round table – why not work on how exactly grammar relates to parsing and production, i.e. developing a performance model?

At the end of Ian Roberts’s talk, which quite nicely laid out the argument for strict bottom-up cyclicity at all levels of syntactic derivation, there was some discussion about whether the derivational exposition could be converted to a representational view that does not appeal to order (of course it can). Linguists are compelled by the competence/performance distinction to kill any thought of linguistic operations occurring in time. This makes sense if one’s goal is to focus purely on competence. With respect to making connections to the other cognitive sciences, though, the instinct needs to be the reverse – to actually make claims about how the competence theory relates to performance.

Near the end of my talk I outlined three stances on how the competence grammar (e.g., various syntactic theories of a broadly generative type) relates to real-time processing (in this context, parsing):

1.    The grammar is a body of static knowledge accessed during acquisition, production, and comprehension (Lidz & Gagliardi, 2015). This represents what I take to be the standard generative grammar view – that there is a competence “thing” out there that somehow (in my view, quite mysteriously) mechanistically relates to performance. It’s one thing to adopt this perspective, but quite another to flesh out exactly how it works. I personally find this view to be problematic because I don’t think there are any analogs or models for how such a system could be implemented in the brain and how it constrains acquisition and use of language (but I am open to ideas, and even better – detailed theories).

2.    The grammar is a “specification” of a parser (Berwick & Weinberg, 1984; Steedman, 2000). The idea is that there really is no grammar, but rather that the competence theory is a compact way of describing the structural outputs of the “real” theory of language, the performance models (parser/producer). If this is so, that’s quite interesting, because in my view it completely deprives the competence model of any causal reality, which removes its insight into any of the fundamental questions of linguistic theory, such as Plato’s problem – how language is acquired. I do not like this view.

3.    The grammar is a real-time processing device, either directly (Miller, 1962; Phillips, 1996) or indirectly (Fodor et al., 1974; Townsend & Bever, 2001) used during real-time processing and acquisition. I very much like this view. It says that the competence model is a thing that does stuff in real time. It has causal powers and one can straightforwardly understand how it works. While I don’t think that the models advocated for in these citations ultimately succeeded, I think they were spot on in their general approach and can be improved upon.

While I personally heavily favor option (3), I would love to see work that fleshes out any of the above while addressing (or leading the way to addressing) the core philosophical questions of linguistic theory, as discussed by Cedric Boeckx.

Part 2 of this post raises and addresses some of the comments by the keynote speakers on this topic.


[1]If you don’t know this story you would best hear about it from the original participants.
[2]The regional domain consists of the former Austro-Hungarian Empire. This cuts across current national borders, so Kraków is in but Warsaw is out.
[3]Wieliczka is no average mine – it was by turns beautiful and educational. It is way more fun than it sounds.
[4]http://facultyoflanguage.blogspot.com/2016/09/brains-and-syntax.html
[5]Doctoral dissertation.
[7]Gibson, E., & Fedorenko, E. (2010). Weak quantitative standards in linguistics research. Trends in Cognitive Sciences, 14(6), 233-234; Gibson, E., & Fedorenko, E. (2013). The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes, 28(1-2), 88-124; Sprouse, J., & Almeida, D. (2012). Assessing the reliability of textbook data in syntax: Adger's Core Syntax. Journal of Linguistics, 48(3), 609-652; Sprouse, J., Schütze, C. T., & Almeida, D. (2013). A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua, 134, 219-248.
[8]95-98% is probably an underestimate, because there are likely cases where subjects incorrectly report their judgments without properly making the judgment under particular interpretations, etc. However, even taking the 95-98% number at face value, what do we think the replication rate is in certain fields of social psychology? Are formal linguists really supposed to change their way of doing things to match a field that is notorious these days for its lack of rigor?
[10]Smith, N. V., & Tsimpli, I. M. (1995). The mind of a savant: Language learning and modularity. Blackwell Publishing.
[11]Smith, N., Tsimpli, I., Morgan, G., & Woll, B. (2010). The signs of a savant: Language against the odds. Cambridge University Press.
[12]Brennan, J. R., Stabler, E. P., Van Wagenen, S. E., Luh, W. M., & Hale, J. T. (2016). Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language, 157, 81-94.
[13]Okada, K., Matchin, W., & Hickok, G. (2018). Phonological Feature Repetition Suppression in the Left Inferior Frontal Gyrus. Journal of Cognitive Neuroscience, 1-9.
[14]For evidence on this point see the following papers. Wilson, S. M., & Saygın, A. P. (2004). Grammaticality judgment in aphasia: Deficits are not specific to syntactic structures, aphasic syndromes, or lesion sites. Journal of Cognitive Neuroscience, 16(2), 238-252. Matchin, W., Sprouse, J., & Hickok, G. (2014). A structural distance effect for backward anaphora in Broca’s area: An fMRI study. Brain and Language, 138, 1-11. Rogalsky, C., Almeida, D., Sprouse, J., & Hickok, G. (2015). Sentence processing selectivity in Broca's area: evidence for structure but not syntactic movement. Language, Cognition and Neuroscience, 30(10), 1326-1338.

20 comments:

  1. At the risk of sounding like a broken record: I really don't think that option (1) for the relationship between competence and performance is all that mysterious. The question of what you need to supplement a grammar with to make a working theory of sentence comprehension depends, of course, on exactly what sort of grammar you start with, and so I think this question inherits an air of mystery from the fact that we often try to answer it without being clear about exactly what our competence grammar consists of. But given any particular concrete proposal about what competence grammars look like, there are answers to be had.

    I think the toy-example case study of how the performance question pans out under the simplifying assumption that competence takes the form of a context-free grammar is informative. I wrote about this in detail in section 3 here:
    https://linguistics.ucla.edu/people/hunter/parsing/expt-syntax-handbook.pdf
    (See in particular the first bullet point in section 3.4.)

    When you adopt more linguistically-realistic grammars, things get more complicated, but the issues are not qualitatively different. (There are problems to be solved, but no mysteries.) Stabler's work provides one concrete picture of what the competence grammar looks like, and given that starting point the question of what needs to be added (i.e. what sort of system might parse a sentence by accessing that static knowledge) can also be answered:
    https://onlinelibrary.wiley.com/doi/full/10.1111/tops.12031
    http://www.aclweb.org/anthology/W18-2809
    (My usual advice to students is that understanding the CFG case is a very useful stepping stone towards understanding these. See also section 4 of my paper linked above.)

    The general point I want to make doesn't depend on us all buying into Stabler's formalism. It's just one worked-out theory of what the competence grammar might look like, for which the parsing question is being tackled; the absence of other such models shouldn't be taken as a strike against option (1).

    Replies
    1. "The question of what you need to supplement a grammar with to make a working theory of sentence comprehension depends, of course, on exactly what sort of grammar you start with, and so I think this question inherits an air of mystery from the fact that we often try to answer it without being clear about exactly what our competence grammar consists of"

      I think this is exactly the problem. Thanks for the specific references. It is most useful to have specific proposals, e.g. for minimalism, that illustrate the ramifications for the causal role of the grammar. It would also help to have less technical papers, or at least a clear exposition of the model. When Chomsky introduces merge, or any other previous instantiation of a theory of generative grammar, I understand (at least roughly) how it works. When I ask about parsing implementation I get answers that are not of the same form as how the grammatical model is introduced, i.e. they do not seem to make a clear proposal for an integrated grammar/parser. That's the sort of thing I am looking for, and I think that's what would greatly help cognitive (neuro)scientists use linguistic theory to guide their work.

    2. In other words, what is needed is "this is an answer to your question", rather than "this is how one might try to go about answering your question".

    3. Following up on Tim's remark: it's not even clear to me that 1, 2, and 3 are distinct. For the sake of simplicity I'll illustrate this with CFGs, but as Tim said, the general argument also holds for Minimalism, GPSG, TAG, and CCG, among others.

      A parsing algorithm is the combination of

      1. a data structure (how do you store parses), and
      2. a control structure (how do you prioritize parses), and
      3. a parsing system.

      The first two don't have much to do with anything a linguist would consider a grammar, so let's put those aside.

      The parsing system is a collection of rules that tell you what can be inferred and/or conjectured based on the information available so far. The parsing system, in turn, is a specific instantiation of a parsing schema (careful, don't confuse parsing system and parsing schema).

      For instance, the parsing schema for a top-down CFG parser tells you that if you have an X from position i to position j in the string, then you can instead assume that you have some YZ from i to j. In the parsing system, this general rule is replaced by a collection of inferences like the following:

      - [i, VP, j] -> [i, V NP, j]
      - [i, VP, j] -> [i, AdvP VP, j]
      - [i, NP, j] -> [i, Det N, j]
      - [i, NP, j] -> [i, Det AP N, j]

      Since we already know the underlying parsing schema, we can factor it out of these inferences to get the following:

      - VP -> V NP
      - VP -> AdvP VP
      - NP -> Det N
      - NP -> Det AP N

      So there's your grammar. The difference between 1 and 2 seems to be whether you take the grammar as your starting point and combine it with the parsing schema to obtain the parsing system, or do what I did above and obtain the grammar by factorization of the parsing system with respect to the parsing schema.
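
      To make the factorization concrete, here is a minimal Python sketch (entirely my own toy illustration, not code from any of the work cited here): the same productions can be read as a static grammar that the generic top-down schema turns into a parsing system, or be recovered from the parsing system by factoring the schema back out.

      GRAMMAR = {
          "VP": [["V", "NP"], ["AdvP", "VP"]],
          "NP": [["Det", "N"], ["Det", "AP", "N"]],
      }

      def instantiate(grammar):
          # generic top-down schema + grammar = parsing system: one
          # inference rule [i, X, j] -> [i, rhs, j] per production
          return [(lhs, tuple(rhs)) for lhs, rhss in grammar.items() for rhs in rhss]

      def factor_out(parsing_system):
          # the other direction: strip the schema off the parsing system
          # and recover the grammar by collecting the raw productions
          grammar = {}
          for lhs, rhs in parsing_system:
              grammar.setdefault(lhs, []).append(list(rhs))
          return grammar

      assert factor_out(instantiate(GRAMMAR)) == GRAMMAR  # two views, one object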

      For what it's worth, we could also keep the parsing system and the grammar fixed and obtain the parsing schema through factorization. So that's option 4: the parser is a static specification of a dynamic processing system for the grammar.

      That leaves option 3. It, too, fits into the general picture above. The crucial link is intersection parsing, which allows you to reduce a grammar to exactly that fragment that only generates the input sentence. Parsing then reduces to generation with this restricted grammar, and a "parser" is just a specification of how the intersection step is carried out on the fly.
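
      To see the intersection idea in toy form (again my own illustration, essentially CKY read as grammar intersection, for a grammar in Chomsky normal form): the derived productions over (i, A, j) triples are exactly the fragment of the grammar that generates the input string.

      BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
      LEXICAL = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

      def intersect(words):
          # build the productions of the intersected grammar, whose
          # nonterminals (i, A, j) record the span each symbol covers
          rules, chart, n = [], {}, len(words)
          for i, w in enumerate(words):
              chart[(i, i + 1)] = {LEXICAL[w]}
              rules.append(((i, LEXICAL[w], i + 1), w))
          for width in range(2, n + 1):
              for i in range(n - width + 1):
                  j = i + width
                  chart[(i, j)] = set()
                  for k in range(i + 1, j):
                      for b in chart[(i, k)]:
                          for c in chart[(k, j)]:
                              a = BINARY.get((b, c))
                              if a:
                                  chart[(i, j)].add(a)
                                  rules.append(((i, a, j), ((i, b, k), (k, c, j))))
          return rules, "S" in chart[(0, n)]

      rules, ok = intersect("the dog saw the cat".split())
      print(ok)  # True: the restricted grammar generates exactly the input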

      These are all different perspectives on the same object. They only seem different if one takes the view that they describe a cognitive procedure for how the parser is implemented algorithmically - what is static, what is compiled on the fly. But that strikes me as misguided, just like Minimalism's bottom-up derivations don't commit you to the assumption that sentences are built bottom-up. Computation, and by extension cognition, is too abstract an object to be fruitfully described this way.

      So I don't see a good reason to treat these three views as distinct, and more importantly I don't understand what one could gain from doing so. There is no empirical issue that seems to hinge on it. For the purpose of linking theoretical linguistics to neuroscience, it does not matter how you divide up the workload, what matters is that you understand what remains constant irrespective of how the workload is moved around.

    4. Of course I like the analysis of competence being a formal grammar of some type, and performance being a sound and complete parser of some type, but there are a couple of problems, and thinking about them may serve to separate the three proposals above.

      First is disambiguation: a parser returns a compact representation (parse forest or whatever) of all the possible parses of the input that are licensed by the grammar, whereas the processing system will return one parse, maybe the most likely parse under some probabilistic model. But any model of the performance system has to be stuffed full of probabilities or word frequencies, or something that does the same job if one has an ideological aversion to probabilities.

      Second, sound and complete parsers are going to be slow (i.e. worse than linear time), whereas we know that the performance system works in real time; nor is the performance system sound and complete, since grammaticality (being generated by the grammar) and acceptability (approximately, being processed without excessive weirdness) aren't the same thing.
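
      To put the first problem in toy form (my own sketch, with invented readings and probabilities): a sound and complete parser hands back every analysis the grammar licenses, while a performance-style model must commit to one, e.g. the most probable.

      # two analyses of a PP-attachment ambiguity, with toy probabilities
      ANALYSES = [
          ("PP attaches to VP (instrument reading)", 0.7),
          ("PP attaches to NP (modifier reading)", 0.3),
      ]

      def parse_forest(analyses):
          # competence-style output: every licensed parse
          return [tree for tree, _ in analyses]

      def single_parse(analyses):
          # performance-style output: one committed analysis
          return max(analyses, key=lambda pair: pair[1])[0]

      print(parse_forest(ANALYSES))  # both readings
      print(single_parse(ANALYSES))  # the instrument reading only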

    5. I agree that these two problems are crucial if we are building a cognitively plausible theory, but again I'm not sure I see how there is any difference among the three proposals.

      Let’s look at disambiguation. In the decomposition Thomas outlined, this would mostly be part of the control structure, no? Work linking the MG parser to processing effects has mostly followed the idealization of “parser as a perfect oracle”. So: no disambiguation needed. Obviously this is a huge idealization, pushed forward with a precise aim in mind (i.e. effects of sentence structure on memory).

      But we can discard the perfect oracle and add one that is biased by structural/lexical probabilities, or whatever other decision making mechanism we like, while keeping everything else constant in the definition of the parsing system (plus or minus a probability distribution on the MG rules, I guess).

  2. Thanks for the post and discussion! I tried to pose a question along this line of thought to Chomsky last year, though not very successfully:

    Chesi & Moro (2015) have recently argued that competence and performance are actually interdependent. I would argue that there are essentially three possible scenarios in which the relation of grammar (G) and a parser as a performance system (P) could work out: (i) G could be independent of P, (ii) G could be accessed by P online during processing, or (iii) it could turn out that G is only implemented in wetware insofar as the totality of P’s mechanisms gives rise to a system behaving in a way that is captured by the description of G. What are your thoughts about this? And how would you describe the relation of linguistics to psychology and neuroscience?

    Chomsky: I don’t understand any of this. The study of competence can’t be isolated from psychology because it is part of psychology—unless we (perversely) define “psychology” to exclude internally-stored knowledge of language, arithmetic, etc. Psycholinguistics, for the past 50 years, has been closely integrated with the study of linguistic competence. How could it be otherwise? Same with neurolinguistics. Linguistic competence is represented in the brain (not the foot, not in outer space) and the same is true of performances that access this stored knowledge of language.
    Speaking personally, I’ve always regarded linguistics, at least the aspects that interest me, as part of psychology, hence ultimately biology. The relation of linguistics to psychology is similar to the relation of the theory of vision to psychology: part to whole. And insofar as we are concerned with what is happening in the brain, it’s integrated with neuroscience. In brief, I don’t see how any of these questions even arise except under delimitation of fields that seem quite arbitrary and have never made sense to me.

    Source: Biolinguistics

  3. Perhaps one problem with the 'psychological reality of grammars' issue is that, on the one hand, we can make rather convincing cases that many grammatical generalizations ought to be captured by the mental structures involved in language use (for example: each language has a system of rules that determine NP structure, which are independent of where the NP is found in the utterance, and of any inflectional features it might possess: if this were not the case, we would expect the apparent NP rules of languages to disintegrate over time, leading to different structures for preverbal and postverbal, or nominative, dative, accusative etc. NPs, neither of which seems to happen). [I have the impression that many neuroscientists and psychologists are blind to this kind of fact]

    On the other hand, to go from a pile of significant generalizations to a formalism that expresses them and can be used for parsing and generation involves a rather large number of arbitrary decisions, so that it is quite reasonable to balk at the claim that the entire grammar is mentally represented. It might be better to say that the entire grammar is a probably pretty bad representation of some mental structures, containing a lot of guesswork and even random choices, but also depicting certain aspects of what the mental structures do.

  4. "If we take Gibson & Fedorenko’s (2010) demands for eschewing informal judgments entirely, then we would end up with disastrous consequences, namely slavishly collecting mass amounts of behavioral data, and spending inordinate amounts of time analyzing that data, all in the absence of theoretical development (which is one of the drivers of the un-replicability plague of much of social psychology)"

    A. We don't eschew informal judgments at all: everyone builds their hypotheses on informal judgments. See Mahowald et al. (2016, Language) for some simple quantitatively-motivated tricks for being sure about your judgment data.

    B. It's not "slavish" to collect data. If you use crowd-sourcing it can be very fast. That's one of the points of the early papers, and one that surely Sprouse & Almeida etc agree with, because they ran 100 2-condition experiments in a few weeks, as did we (in Mahowald et al. 2016).

    C. "all in the absence of theoretical development" gathering good quantitative data is not to the exclusion of developing theories. It's just orthogonal. My collaborators have added plenty of useful theoretical content to theories of language over the years.

    (In general, it would help the discussion if you wouldn't misrepresent sides you don't agree with.)

    Replies
    1. I'm glad to hear that we do agree on certain points, i.e. the importance of informal judgments. What I disagree with mainly is the idea of "weak quantitative standards" in linguistics research. Could you be more precise about where I misrepresent sides? I think that Shen's work represents things as they should be - do the quantitative work only when you need to. Gibson and Fedorenko (2010, TICS) did not advocate for this position. They advocated that linguists only publish experimental data rather than informal judgments. If I am wrong about that, please inform me.

      I should have amended my comments to clarify that I did not mean that there was no theoretical import of experimental work. My point was that the time spent on analyzing data is by definition time not spent on theory. In principle these things are logically orthogonal, but in practice they are not.

      Crowd sourcing may be "very fast" but it takes more time than informal judgments. And Shen's work illustrated quite well that it's very tricky to collect data from naive subjects. Crowdsourced data are harder to deal with because you can't interact directly with subjects, control all of the factors you need to, etc., to e.g. ensure that subjects are making judgments under certain interpretations. This requires very careful experiments. This then brings in a host of related issues in experimental design, data collection, statistical analysis, etc. It's non-trivial.

    2. Read Mahowald et al (2016). Here is the link and abstract:

      http://tedlab.mit.edu/tedlab_website/researchpapers/Mahowald_et_al_2016_Language.pdf

      While published linguistic judgments sometimes differ from the judgments found in large-scale formal experiments with naive participants, there is not a consensus as to how often these errors occur nor as to how often formal experiments should be used in syntax and semantics research. In this article, we first present the results of a large-scale replication of the Sprouse et al. 2013 study on 100 English contrasts randomly sampled from Linguistic Inquiry 2001–2010 and tested in both a forced-choice experiment and an acceptability rating experiment. Like Sprouse, Schütze, and Almeida, we find that the effect sizes of published linguistic acceptability judgments are not uniformly large or consistent but rather form a continuum from very large effects to small or nonexistent effects. We then use this data as a prior in a Bayesian framework to propose a small n acceptability paradigm for linguistic acceptability judgments (SNAP Judgments). This proposal makes it easier and cheaper to obtain meaningful quantitative data in syntax and semantics research. Specifically, for a contrast of linguistic interest for which a researcher is confident that sentence A is better than sentence B, we recommend that the researcher should obtain judgments from at least five unique participants, using at least five unique sentences of each type. If all participants in the sample agree that sentence A is better than sentence B, then the researcher can be confident that the result of a full forced-choice experiment would likely be 75% or more agreement in favor of sentence A (with a mean of 93%). We test this proposal by sampling from the existing data and find that it gives reliable performance.
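
      For what it's worth, the decision rule in the abstract is simple enough to state in a few lines of Python (a paraphrase for illustration, not the authors' code; the minimums and thresholds are taken from the abstract):

      def snap_judgment(choices, n_participants, n_items):
          # choices: one forced choice per trial, True = preferred A over B
          if n_participants < 5 or n_items < 5:
              raise ValueError("SNAP asks for >= 5 participants and >= 5 item pairs")
          if all(choices):
              # unanimous: a full forced-choice experiment would likely show
              # >= 75% agreement in favor of A (with a mean of 93%)
              return "confident that A > B"
          return "not unanimous: consider a full formal experiment"

      print(snap_judgment([True] * 25, n_participants=5, n_items=5))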

    3. In a public discussion, it's good practice to explain one's position. Referring to a paper isn't very helpful. Few will read that, and even fewer will be able to relate the paper as well to the discussion as the person that originally brought it up.

      Just based on the abstract, I wonder, for instance, how this paradigm solves the issue of dialectal variation. If researcher R has a strong judgment for phenomenon P, but can't find enough participants that share their specific dialect, then what? Should R spend tons of time and energy to confirm something they know to be true based on the strength of their own judgment, or completely forego working on P? What about field work on exotic or almost extinct languages, where one might only have one or two consultants to work with? Or is the general idea just that in most cases, a little bit more effort can improve reliability, but in others we have to make do with what's available? If so, does that imply that data from exotic languages is less trustworthy than dominant languages like English for which one can easily find speakers at a university? Doesn't that exacerbate the oft-lamented dominance of English and some other western languages in syntax?

      So if anything, I now have more questions, not fewer.

    4. For almost extinct languages, I think it certainly does imply that acceptability judgments are less trustworthy than in dominant languages, and this would apply to numerous stigmatized varieties of non-vanishing languages as well (colloquial urban Indonesian, for example). The result is that people who work on these languages rely more on texts, which raise their own bag of issues, but depending on the details, you can sometimes get important theoretical results out of them, such as the hopelessness of the original Minimalist idea that agreement depends on adjacency at some structural level, falsified by the Tangkic languages of Australia (cf. the discussion in Richards (2013) 'Lardil “Case Stacking” and the Timing of Case Assignment', and in Pesetsky (2013) _Russian Case Morphology and the Syntactic Categories_). Other times, you stare at texts and see nothing, so theory from texts alone is a bit hit or miss. Intriguingly, of the co-presenters of extreme case-stacking in Australia, Alan Dench and Nicholas Evans ('Multiple case marking in Australian languages'), the former is not averse to collecting judgments, while the latter seems to be pretty firm about not doing so.

      Of course understanding the texts requires its own kind of intuitive judgement, but judgements of this kind seem to be considerably more reliable (does this, in its present context, mean this: ...)

    5. It's a bit mystifying to keep reading that crowd-sourced acceptability judgments are supposedly "fast". Sure, once you do the legwork of making sure you have a good, well controlled experiment, data collection in English and maybe a handful of other languages that can be easily crowdsourced can be "fast" if your point of comparison is standard psycholinguistic or neurolinguistic work, but it is nowhere near as fast as the several replications of small-scale experiments that theoretical linguists do. It took Sprouse and Almeida (2012) and Sprouse et al. (2013) a good two years to get the design of the experiments and the creation of the stimuli ready for data collection, and that involved very specialized knowledge in syntax to make sure the materials were testing what we hoped to test. Doing these experiments is neither easy, nor particularly fast, and least of all cheap, at least if you want them to be informative and to justify the overhead compared to other sources of data theoretical linguists have been shown to work with productively. There are potential large payoffs in some edge cases, but that would not be the norm. Finally, I have yet to see a convincing case as to why naïve participants, over whom you have little control over the internet, should be seen as the gold standard for data collection in acceptability judgment tasks.

  5. Something I would suggest wrt 1 is to focus on what were sometimes called 'linguistically significant generalizations', a term which seems to me to have gotten less use in recent generative writing, but with the modification of calling them something like 'acquired linguistic generalizations' (ALGs): facts which could be described extensionally as distinct facts, but are clearly learned as one. For example, noun phrase structure in English could be given in two versions, one for singular, one for plural, eliminating the need for the morphosyntactic feature of number, while with other languages, more features can be eliminated at the expense of more mostly identical copies of the NP structure rules, with consequent failure to represent ALGs.

    But the argument that this would be wrong even if somebody could claim some kind of computational advantage for it (I recall somebody arguing something like this in a book about a parser for German, which would seem to need nine repetitions of the PS rules for NP) is that these putatively different NP rules never change independently of each other in language change, giving us the existence of NP structure as an ALG, along with ALGs for agreement (and concord, if we distinguish them), and case-marking when this exists.

    Therefore, the real-time processing device(s) should work in terms of the ALGs. Certain issues become less urgent if we focus on these rather than 'the grammar' as usually conceived. For example, we clearly don't have to learn to have trouble with multiple center embeddings, so the status of this as 'competence' vs 'performance' can be seen as secondary to its clear status as 'not an ALG'. This might even extend to some traditional generative grammar topics such as island constraints (I recall many decades ago discussing with Howard Lasnik the idea that since island constraints were universal, they could and maybe should be reclassified as 'performance', or, iirc, that perhaps this distinction is somewhat overrated (am not sure about that recollection)).

    The idea of capturing ALGs, whether by a conventional generative 'competence theory' or a performance mechanism certainly does not solve all problems. For one thing, not all of the arguable ALGs are as well-motivated as NP structure, so there will be disputes about what to represent, and also, all sorts of problems would arise in embodying them into performance mechanisms, in the same way as happens with conventional generative models. But my impression is that many people in research communities bordering on generative grammar have no idea about them at all, and things might be better if they did.

    Replies
    1. I think that's a very valuable point. Even if you don't believe in the competence-performance distinction, for instance, generalizations across constructions still push you in a certain direction. Suppose you want to use finite-state automata to model language. Even for simple constructions you'll quickly get giant automata with tons of redundancies, e.g. between subject and object NPs. In order to make the representation more compact, you'll have to factorize them into something like recursive transition networks (with a finite bound on stack depth). But once you have that, you also have tree structures as an encoding of how you move through the network. Tree structure emerges naturally from the computations even if you don't believe in unbounded nesting and even if you don't care about constituency or semantic interpretation. There's simply no way around it if you want a compact description.
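
      To put that in toy form (invented state names, my own illustration): a flat automaton has to spell the NP machinery out separately for subject and object position, while a recursive-transition-network style factorization states it once and calls it from both places; the record of those calls is already a tree.

      # a flat FSA duplicates the NP sub-automaton in both positions
      FLAT_FSA = {
          ("S0", "Det"): "S1", ("S1", "N"): "S2",  # subject NP, spelled out
          ("S2", "V"): "S3",
          ("S3", "Det"): "S4", ("S4", "N"): "S5",  # object NP, duplicated
      }

      # a recursive transition network factors the redundancy out: the NP
      # subnetwork is stated once and invoked from both positions; how a
      # run moves through these calls is exactly a tree
      RTN = {
          "S":  [["NP", "V", "NP"]],
          "NP": [["Det", "N"]],
      }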

      That said, I don't see why you think that the real-time processing device(s) should work in terms of the ALGs. We have the same problem as with separating parsers and grammars, except that now learning enters the picture. The ALGs we observe could be purely due to the learning algorithm. The learner would use transition networks as a representation of the objects in the hypothesis space, but what is actually learned is a finite-state automaton. The automaton implicitly incorporates the generalizations of the learner, but does not state them explicitly. Since explicit generalizations are, at least sometimes, computationally costly, this is not all that outlandish.

    2. I wouldn't insist that the real time processing device really has to work in terms of the ALGs, but I think that the proposal that they do is the simplest hypothesis, until some evidence appears that the ALGs are getting processed into whatever the rtpd does use, and, ideally, something about how this is done.

      An intermediate possibility is that the ALGs are used, but get supplemented with further information to help the rtpd work faster, e.g. a table indicating such things as the probability of the start of an XP being followed by a YP inside a ZP, or simply accumulating a big table of frequently encountered expressions with their meanings.

      Whatever best accounts for the evidence, as long as the ALGs are not ignored, which is what seems to happen in many circles.

  6. Re ALGs. The point can be pushed further. If ALGs can be picked up by an independently motivated learning theory, then their existence and associated properties needn't be attributed to a theory of UG (or a theory of real-time processing). Likewise, if some supposed "linguistically significant generalization" turns out not to be significant (again, as determined by a learning theory), then we needn't worry about it at all: the learner literally remembers the input, and theorists needn't rig the machinery to block it. In a word, ALGs should not be ignored, especially if UG is less complex than previously supposed.

  7. Definitely, tho the learning theory would ultimately have to be supported by facts about acquisition, change and typology (current typology being the result of past typology and change), so it would be a better and more systematic version of the more naive style of reasoning ('hey, we see things with the same internal structure, in different places, with different combinations of inflectional features, so people must be learning things that we'll call "noun phrase rules"'), but resting on the same kind of basis.

    Another point I'll suggest is that when people devise/choose a generative theory or framework, they have an assortment of (kinds of) ALGs they want to capture, but it seems to me that there is also a certain amount of guesswork in going from the pile of ALGs to the full theory, & I'm not sure that there is enough general recognition that you can accept the validity of some ALGs without being terribly persuaded by the generative theory that they are being presented in. (But experience shows that without a generative theory, people do a much worse job of finding ALGs than they do with one.)
