Sunday, April 28, 2019

Scheering forces

I'll respond to Tobias in a post rather than a blog reply because he raises several points, and I want to include a picture or two.

1. TS: "When you cross the real-world boundary, i.e. when real-world items (such as wave lengths) are mapped onto cognitive categories (colors perceived), you are talking about something else since the real world is not a module."

The arguments I was making hold equally well for representations within the central nervous system (CNS), for example between the retina, the lateral geniculate nucleus (LGN) and V1. Real-world spatial relations are mapped partially veridically onto the retina (due to the laws of optics). The spatial organization of the retina is (partially) maintained in the mapping to cortex; that is, LGN and V1 are retinotopic. So the modules here are the retina, LGN and V1, which are certainly modules within the CNS.

The same sort of relationship holds for acoustic frequency, the cochlea, the medial geniculate nucleus (MGN), and A1. Acoustic frequencies are mapped partially veridically onto the coiled line of hair cells in the cochlea (due to the laws of acoustics). That is, frequency is mapped into a spatial (place) code at the cochlea (this is not the only mechanism for low frequencies). And the cochlear organization is partially preserved in the mappings to MGN and A1; that is, they are cochleotopic (= tonotopic). There is an "arbitrary" aspect here: frequency is represented with a spatial code. But the spatial code is not completely arbitrary or random; it is organized and ordinal, such that frequency increases monotonically from the apex to the base of the cochlea, as shown in the diagram from Wikipedia, and this ordering is preserved in the tonotopic gradients in A1. That is, the mappings between the modules are quasimorphisms.
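To make the quasimorphism point concrete, here is a small Python sketch of a place-to-frequency map for the cochlea. It assumes the Greenwood place-frequency function with the parameter values commonly quoted for humans (A = 165.4, a = 2.1, k = 0.88, with x the relative distance from the apex); the function and its parameters are an illustration added here, not part of the argument above. The check only asks whether the ordering of places is carried over into the ordering of frequencies.

```python
# Assumed model: Greenwood place-frequency function, human parameters.
#   F(x) = A * (10**(a*x) - k), x = relative distance from the apex (0..1)
A, a, k = 165.4, 2.1, 0.88

def greenwood_frequency(x: float) -> float:
    """Characteristic frequency (Hz) at relative cochlear position x (0 = apex, 1 = base)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return A * (10 ** (a * x) - k)

def preserves_order(xs, f) -> bool:
    """True if f maps the sequence xs to a strictly increasing sequence."""
    ys = [f(x) for x in xs]
    return all(y1 < y2 for y1, y2 in zip(ys, ys[1:]))

positions = [i / 10 for i in range(11)]  # apex to base in steps of 0.1
for x in positions:
    print(f"x = {x:.1f}  ->  {greenwood_frequency(x):8.0f} Hz")
print("order preserved:", preserves_order(positions, greenwood_frequency))
```

The particular numbers don't matter for the argument; what matters is that the map is monotonic, so the order of places fixes the order of frequencies.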

2. TS: "when I use the word "arbitrary" I only mean the above: the fact that any item of list A may be associated with any item of list B."

Then I think you should find a different term. I also think there has been far too much focus on the items. As I have tried to explain, items enter into relationships with other items, and we need to consider the preservation of these relationships across the interface or the lack thereof; we need to keep track of the quasimorphisms. So it is not the case for many of the intermodular interfaces in sensation and perception that any item on one side of the interface can be mapped to any item on the other side of the interface. Spatial and temporal and other ordering relationships tend to be preserved across the interfaces, and this strongly constrains the mapping of individual items. Remarkably, this is true even in synesthesia, see Plate 9 from Cytowic 2018.

3. TS: "That's all fine, but I am talking about the mind, not about the brain. Whatever the wirings in the brain, they won't tell us anything about how cognitive items of two distinct vocabularies are related (Vocabulary Insertion), or how a real-world item is associated to a cognitive category (wave length - color)."

I am not a dualist, and I doubt that this blog is a good forum for a discussion of the merits of mind/body dualism. Here is a quote from Chomsky 1983 on the mind/brain; he reiterates this in Chomsky 2005:257 and in many other places.

"Now, I think that there is every reason to suppose that the same kind of “modular” approach is appropriate for the study of the mind — which I understand to be the study, at an appropriate level  of  abstraction,  of  properties  of  the  brain ..."

Just to be clear, I am not saying that cognitive scientists should defer to neuroscientists, but they should talk to them. The idea that we have learned nothing about color perception and cognition from the study of the human visual pathway is simply false.

4. TS: "is there evidence for interfaces that are not list-based?"

Yes, almost any (non-linguistic) set of items with an ordering relation. When aspects of the ordering relation are preserved across the interface, the mapping will be a quasimorphism, and so the item-to-item mappings will be strongly constrained: if a < b then f(a) <′ f(b), where <′ is the corresponding ordering relation on the other side of the interface. What's unusual about the lexicon is that small changes in pronunciation can lead to enormous changes in meaning. In many of the other cases we instead end up with a very small, almost trivial look-up table, something like the sets of basis vectors for the two spaces, as with homomorphisms between groups in algebra.
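As a minimal sketch of the "basis vectors" point (the specific vectors are hypothetical), a homomorphism from ℤ² to ℤ² is fixed by a two-entry look-up table giving the images of (1, 0) and (0, 1); the image of every other item follows. A lexicon-style mapping has no such shortcut: it needs one entry per item, and knowing one entry tells you nothing about the others.

```python
from itertools import product

Vec = tuple[int, int]

# Hypothetical two-entry look-up table: images of e1 = (1, 0) and e2 = (0, 1).
BASIS_TABLE: dict[Vec, Vec] = {(1, 0): (2, 1), (0, 1): (-1, 3)}

def add(u: Vec, v: Vec) -> Vec:
    return (u[0] + v[0], u[1] + v[1])

def scale(n: int, v: Vec) -> Vec:
    return (n * v[0], n * v[1])

def f(v: Vec) -> Vec:
    """Extend the table to all of Z^2: f(m*e1 + n*e2) = m*f(e1) + n*f(e2)."""
    m, n = v
    return add(scale(m, BASIS_TABLE[(1, 0)]), scale(n, BASIS_TABLE[(0, 1)]))

# f preserves addition on every pair of sample points, with no further entries.
grid = list(product(range(-3, 4), repeat=2))
assert all(f(add(u, v)) == add(f(u), f(v)) for u in grid for v in grid)
print(f((3, -2)))  # (8, -3)
```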

5. TS: "is there evidence for associations that correspond to partial veridicality, i.e. where the to-be-related items are commensurable, i.e. allow for the assessment of similarity?" ...
"The same goes for the association of real-world items with cognitive categories: trying to assess the (dis)similarity of "450-485 nm" and "blue", as opposed to, say, "450-485 nm" and "red" (or any other perceived color for that matter) is pointless. Wave lengths and perceived colors are incommensurable and you won't be able to tell whether the match is veridical, non-veridical or partially veridical."

This isn't pointless at all. In fact, remarkable progress has been made in this area; see, for example, Hardin 1988, Hardin & Maffi 1997, Palmer 1999 and Bird et al 2014. The match is partially veridical in a variety of ways. Small changes in spectral composition generally lead to small changes in perceived hue; the mapping is a quasimorphism. Importantly, the topology of the representation also changes, from a linear relation to a circular one, as we go from the cone cells of the retina to an opponent-process representation in LGN, and this is a non-veridical aspect of the mapping.
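The change from a linear to a circular topology can be illustrated with a toy computation. Treat the two opponent channels as coordinates in a plane and hue as the angle in that plane, in the same spirit as the hue angle of the CIELAB color space; the channel values below are hypothetical, and this is not a model of actual LGN responses. Two points on either side of the red-green axis are far apart if hue is treated as a line, but close neighbors on the circle.

```python
import math

def hue_angle(rg: float, by: float) -> float:
    """Hue as an angle in [0, 360) degrees, computed from two hypothetical
    opponent-channel signals (rg: red-green, by: blue-yellow)."""
    return math.degrees(math.atan2(by, rg)) % 360.0

def circular_distance(h1: float, h2: float) -> float:
    """Shortest distance between two hue angles around the circle."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

# Two nearby points in the opponent plane, on either side of the rg axis.
h1, h2 = hue_angle(1.0, 0.01), hue_angle(1.0, -0.01)
print(round(h1, 2), round(h2, 2))           # 0.57 359.43
print(round(abs(h1 - h2), 2))               # linear difference: 358.85
print(round(circular_distance(h1, h2), 2))  # circular difference: 1.15
```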

6. TS: "The secret key is the look-up table that matches items of the two modules."

I agree with this, except that I want the look-up table to be as small as possible, the "basis vectors" for the spaces. In my opinion, the best way to accomplish this is with innate initial look-up tables for the features, giving the learner the initial conditions for the Memory-Action and Perception-Memory mappings. The feature-learning approaches, including Mielke 2008, Dresher 2014 and Odden 2019, start with an ability to perceive IPA-like phonetic representations. I simply don't believe that this is a plausible idea, given how difficult even simple cases are for such an approach, as explained in Dillon, Dunbar & Idsardi 2013.


Bird CM, Berens SC, Horner AJ & Franklin A. 2014. Categorical encoding of color in the brain. Proceedings of the National Academy of Sciences, 111(12), 4590–4595.

Chomsky N. 1983. The Psychology of Language and Thought: Noam Chomsky interviewed by Robert W. Rieber. In RW Rieber (ed) Dialogues on the Psychology of Language and Thought. Plenum.

Chomsky N. 2005. Reply to Lycan. In LM Antony & N Hornstein (eds) Chomsky and his Critics. Blackwell.

Cytowic RE. 2018. Synesthesia. MIT Press.

Dillon B, Dunbar E & Idsardi WJ. 2013. A single-stage approach to learning phonological categories: insights from Inuktitut. Cognitive Science, 37(2), 344–377.

Hardin CL. 1988. Color for Philosophers: Unweaving the Rainbow. Hackett.

Hardin CL & Maffi L. 1997. Color Categories in Thought and Language. Cambridge University Press.

Palmer SE. 1999. Vision Science: Photons to Phenomenology. MIT Press.

Wednesday, April 24, 2019

ECoG to speech synthesis

In Nature today, another fascinating article from Eddie Chang's lab at UCSF. They were able to synthesize intelligible speech from ECoG recordings of cortical activity in sensory-motor and auditory areas. The system was even able to decode and synthesize speech successfully from silently mimed speech. The picture (Figure 1 in the article) shows a block diagram of the system.

There are also two commentaries on the work, along with some speech samples from the system.

Friday, April 19, 2019

A possible EFP developmental trajectory from syllables to segments

Infants can show a puzzling range of abilities and deficits in comparison with adults, out-performing adults on many phonetic perception tasks while lagging behind in other ways. Some null results using one procedure can be overturned with more sensitive procedures and some contrasts are "better" than others in terms of effect size and various acoustic or auditory measures of similarity (Sundara et al 2018). And there are other oddities about the infant speech perception literature, including the fact that the syllabic stimuli generally need to be much longer than the average syllable durations in adult speech (often twice as long). One persistent idea is that infants start with a syllable-oriented perspective and later move to a more segment-oriented one (Bertoncini & Mehler 1981), and that in some languages adults still have a primary orientation for syllables, at least for some speech production tasks (O'Seaghdha et al 2010; but see various replies, e.g. Qu et al 2012).

More than a decade ago, I worked with Rebecca Baier and Jeff Lidz to try to investigate audio-visual (AV) integration in 2-month-old infants (Baier et al 2007). Infants were presented with one audio track along with two synchronized silent movies of the same person (namely Rebecca) presented on a large TV screen. The movies were of different syllables being produced; the audio track generally matched one of the movies. Using this method we were able to replicate the results of Kuhl & Meltzoff 1982 that two-month-old infants are able to match faces and voices among /a/, /i/, and /u/. Taking this one step further, we were also able to show that infants could detect dynamic syllables, matching faces with, for example, /wi/ vs. /i/. We did some more poking around with this method, but got several results that were difficult to understand. One of them was a failure of the infants to match on /wi/ vs /ju/. (And we are pretty sure that "we" and "you" are fairly frequently heard words for English-learning infants.) Furthermore, when they were presented with /ju/ audio alongside /wi/ and /i/ faces, they matched the /ju/ audio with the /wi/ video. This behavior is at least consistent with a syllable-oriented point of view: they hear a dynamic syllable with something [round] and something [front] in it, but they cannot tell the relative order of [front] and [round]. This also seems consistent with the relatively poor abilities of infants to detect differences in serial order (Lewkowicz 2004). Rebecca left to pursue electrical engineering and this project fell by the wayside.

This is not to say that infants cannot hear a difference between /wi/ and /ju/, though. I expect that dishabituation experiments would succeed on this contrast. The infants would also not match faces for /si/ vs /ʃi/ but the dishabituation experiment worked fine on that contrast (as expected). So, certainly there are also task differences between the two experimental paradigms.

But I think that now we may have a way to understand these results more formally, using the Events, Features and Precedence model discussed on the blog a year ago, and developed more extensively in Papillon 2018. In that framework, we can capture a /wi~ju/ syllable schematically as (other details omitted):

The relative ordering of [front] and [round] is underspecified here, as is the temporal extent of the events. The discrimination between /wi/ and /ju/ amounts to incorporating the relative ordering of [front] and [round], that is, which of the dashed lines is needed in:

When [round] precedes [front], that is the developing representation for /wi/; when [front] precedes [round], that is the developing representation for /ju/. Acquiring this kind of serial order knowledge between different features might not be easy, as it is possible that [front] and [round] are initially segregated into different streams (Bregman 1990), and order perception across streams is worse than within streams. It's conceivable that the learner would be driven to look for additional temporal relations when the temporally underspecified representations incorrectly predict such "homophony", akin to a hash collision.
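Here is a toy Python encoding of the "homophony as hash collision" idea. It is a hypothetical simplification, not the EFP formalism of Papillon 2018: a representation is just a set of feature events plus a set of precedence edges, and the feature orders assumed for "we" and "you" are the ones given above. Without precedence edges the two words get identical representations, and so collide as dictionary keys; adding a single edge separates them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EFPGraph:
    """Toy event graph: feature events plus (earlier, later) precedence edges."""
    events: frozenset[str]
    precedence: frozenset[tuple[str, str]] = frozenset()

    def add_precedence(self, earlier: str, later: str) -> "EFPGraph":
        assert earlier in self.events and later in self.events
        return EFPGraph(self.events, self.precedence | {(earlier, later)})

# Assumed feature order in each word: [round] then [front] for "we",
# and the reverse for "you".
WORDS = {"we": ("round", "front"), "you": ("front", "round")}

def represent(order: tuple[str, str], track_order: bool) -> EFPGraph:
    """The learner's representation, with or without the precedence edge."""
    g = EFPGraph(frozenset(order))
    return g.add_precedence(*order) if track_order else g

def collisions(track_order: bool) -> list[list[str]]:
    """Groups of words whose representations are identical dictionary keys."""
    buckets: dict[EFPGraph, list[str]] = {}
    for word, order in WORDS.items():
        buckets.setdefault(represent(order, track_order), []).append(word)
    return [ws for ws in buckets.values() if len(ws) > 1]

print(collisions(track_order=False))  # [['we', 'you']]
print(collisions(track_order=True))   # []
```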

If we pursue this idea more generally, the EFP graphs will gradually become more segment oriented as additional precedence relations are added, as in:

And if we then allow parallel, densely-connected events to fuse into single composite events we can get even closer to a segment oriented representation:

So the general proposal would be that the developing representation of the relative order of features is initially rather poor and is underspecified for order between features in different "streams". Testing this is going to be a bit tricky, though. An even more general conclusion would be that features are not learned from phonetic segments (Mielke 2008) but that features are gradually combined during development to form segment-sized units. We could also add other features encoding extra phonetic detail to these developing representations; it could then be the case that different phonetic features have different temporal acuity for learners, and so cohere with other features to different extents.
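The fusion step (parallel, densely-connected events fusing into composite events) can also be sketched in a few lines of Python. The criterion used here is an assumption about what "parallel, densely-connected" might mean: events with exactly the same predecessors and successors are fused into one composite event. The feature labels are illustrative only.

```python
from collections import defaultdict

def fuse_parallel_events(events, precedence):
    """Fuse events that have exactly the same predecessors and successors.
    In an acyclic precedence graph such events cannot be ordered with respect
    to each other, so each group is 'parallel' and becomes one composite event."""
    preds, succs = defaultdict(set), defaultdict(set)
    for earlier, later in precedence:
        succs[earlier].add(later)
        preds[later].add(earlier)
    groups = defaultdict(set)
    for e in events:
        groups[(frozenset(preds[e]), frozenset(succs[e]))].add(e)
    composite_of = {e: frozenset(g) for g in groups.values() for e in g}
    fused = {(composite_of[a], composite_of[b]) for a, b in precedence}
    return set(composite_of.values()), fused

# Illustrative feature labels; every cross-group precedence edge has been added.
events = {"round", "labial", "front", "vocalic"}
precedence = {(a, b) for a in ("round", "labial") for b in ("front", "vocalic")}
segments, order = fuse_parallel_events(events, precedence)
for a, b in order:
    print(sorted(a), "<", sorted(b))  # ['labial', 'round'] < ['front', 'vocalic']
```

Applied to the underspecified /wi~ju/ graph, which has no precedence edges, this criterion would lump every event into a single composite, so in this toy version segment-like units only emerge once enough precedence relations have been added.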


Baier, R., Idsardi, W. J., & Lidz, J. (2007). Two-month-olds are sensitive to lip rounding in dynamic and static speech events. In Proceedings of the International Conference on Auditory-Visual Speech Processing.

Bertoncini, J., & Mehler, J. (1981). Syllables as units in infant speech perception. Infant Behavior and Development, 4, 247-260.

Bregman, A. (1990). Auditory Scene Analysis. MIT Press.

Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138-1141.

Lewkowicz, D. J. (2004). Perception of serial order in infants. Developmental Science, 7(2), 175–184.

Mielke, J. (2008). The Emergence of Distinctive Features. Oxford University Press.

O’Seaghdha, P. G., Chen, J. Y., & Chen, T. M. (2010). Proximate units in word production: Phonological encoding begins with syllables in Mandarin Chinese but with segments in English. Cognition, 115(2), 282-302.

Papillon, M. (2018). Precedence Graphs for Phonology: Analysis of Vowel Harmony and Word Tones. ms.

Qu, Q., Damian, M. F., & Kazanina, N. (2012). Sound-sized segments are significant for Mandarin speakers. Proceedings of the National Academy of Sciences, 109(35), 14265-14270.

Sundara, M., Ngon, C., Skoruppa, K., Feldman, N. H., Onario, G. M., Morgan, J. L., & Peperkamp, S. (2018). Young infants’ discrimination of subtle phonetic contrasts. Cognition, 178, 57–66.

Tuesday, April 9, 2019

Two cipher examples

I'm not convinced that I'm getting my point about "arbitrary" across, so maybe we should try some toy examples from a couple of ciphers. Let's encipher "Colorless green ideas" in a couple of ways.

1. Rot13: "Colorless green ideas" ⇒ "Pbybeyrff terra vqrnf". This method is familiar to old Usenet denizens. It makes use of the fact that the Latin alphabet has 26 letters by rotating them 13 places (a⇒n, b⇒o, ... m⇒z, n⇒a, o⇒b, ...) and so this method is its own inverse. That is, you decode a rot13 message by doing rot13 on it a second time. This is a special case of a Caesar cipher. Such ciphers are not very arbitrary as they mostly preserve alphabetic letter order, but they "wrap" the alphabet around into a circle (like color in the visual system) with "z" being followed by "a". In a rotation cipher, once you figure out one of the letter codes, you've got them all. So if "s" maps to "e" then "t" maps to "f" and so on.
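For concreteness, here is rot13 as a special case of a rotation (Caesar) cipher in Python, using only the standard library. The last lines show the "one letter gives you all of them" property: a single known letter pair determines the shift.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Rotate every Latin letter `shift` places, wrapping z around to a.
    Case, spaces and punctuation pass through unchanged."""
    lo, up = string.ascii_lowercase, string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(lo + up, lo[k:] + lo[:k] + up[k:] + up[:k])
    return text.translate(table)

def rot13(text: str) -> str:
    return caesar(text, 13)

msg = "Colorless green ideas"
print(rot13(msg))                # Pbybeyrff terra vqrnf
print(rot13(rot13(msg)) == msg)  # True: rot13 is its own inverse

# One known letter pair fixes the whole key: "s" -> "e" means a shift of 12.
shift = (ord("e") - ord("s")) % 26
print(shift, caesar("st", shift))  # 12 ef
```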

2. Scrambled alphabet cipher: Randomly permute the 26 letters to other letters, for example A..Z ⇒ PAYRQUVKMZBCLOFSITXJNEHDGW. This is a letter-based codebook. This is arbitrary, at least from the individual letter perspective, as it won't preserve alphabetic order, encoding "Colorless" as "Yfcftcqxx". So knowing one letter mapping (c⇒y) won't let you automatically determine the others.

But this cipher does preserve various other properties, such as capitalization, number of distinct atomic symbols, spaces between words, message length, doubled letters, and sequential order in general.
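The scrambled-alphabet cipher, and the properties it preserves, can be checked directly. This sketch uses the key given above and, as an assumption, applies it case-preservingly (so C ⇒ Y and c ⇒ y).

```python
import string

KEY = "PAYRQUVKMZBCLOFSITXJNEHDGW"  # the scrambled alphabet from the text
TABLE = str.maketrans(string.ascii_uppercase + string.ascii_lowercase,
                      KEY + KEY.lower())  # case-preserving: C -> Y, c -> y

def substitute(text: str) -> str:
    return text.translate(TABLE)

def doubled_positions(s: str) -> list[int]:
    return [i for i in range(len(s) - 1) if s[i] == s[i + 1]]

msg = "Colorless green ideas"
enc = substitute(msg)
print(enc)  # Yfcftcqxx vtqqo mrqpx

# Properties that survive the letter-by-letter substitution:
assert len(enc) == len(msg)                                      # length
assert [c.isupper() for c in enc] == [c.isupper() for c in msg]  # capitalization
assert [c == " " for c in enc] == [c == " " for c in msg]        # spaces
assert doubled_positions(enc) == doubled_positions(msg)          # doubled letters
assert len(set(KEY)) == 26                                       # a true permutation
```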

Even word-based codebooks tend to preserve sequential order. That is, the message is input word by word from the beginning of the message to the end. But more sophisticated methods are possible, for example padding the message with irrelevant words. It's less common to see the letters of the individual words scrambled, but we could do that for words of varying lengths, say by reversing words of length 2 (permutation 21), so that "to" would be encoded as "to" ⇒ "ot" ⇒ "fj" (reversal, then the scrambled alphabet above). Words of length three might be scrambled 312, length four 2431, and so on, choosing a random permutation for each word length. Adding this encryption technique will break apart some doubled letters. But the word order would still be preserved across the encryption.
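The word-internal scrambling can be layered on top of the substitution. In this sketch, which is an illustration rather than a serious cipher, a permutation such as 312 is read as "output letter i is input letter number perm[i]" (an assumption about the notation); lengths 2 to 4 use the permutations given above, and other lengths get a random permutation from a fixed, arbitrary seed.

```python
import random
import string

KEY = "PAYRQUVKMZBCLOFSITXJNEHDGW"  # scrambled alphabet from the text
TABLE = str.maketrans(string.ascii_uppercase + string.ascii_lowercase,
                      KEY + KEY.lower())

def make_codebook(max_len: int, seed: int) -> dict[int, tuple[int, ...]]:
    """One 1-based permutation per word length. Lengths 2-4 use the text's
    permutations; longer words get a random permutation (seed is arbitrary)."""
    rng = random.Random(seed)
    book = {0: (), 1: (1,), 2: (2, 1), 3: (3, 1, 2), 4: (2, 4, 3, 1)}
    for n in range(5, max_len + 1):
        order = list(range(1, n + 1))
        rng.shuffle(order)
        book[n] = tuple(order)
    return book

def scramble_word(word: str, book: dict[int, tuple[int, ...]]) -> str:
    """Output letter i is input letter number perm[i] (1-based)."""
    return "".join(word[p - 1] for p in book[len(word)])

def encipher(message: str, book: dict[int, tuple[int, ...]]) -> str:
    # Words are handled left to right, so word order (and length) survives.
    return " ".join(scramble_word(w, book).translate(TABLE)
                    for w in message.split(" "))

book = make_codebook(max_len=12, seed=2019)
print(encipher("to", book))  # fj
enc = encipher("Colorless green ideas", book)
print(enc, [len(w) for w in enc.split(" ")])  # word lengths stay [9, 5, 5]
```

Each word is scrambled and enciphered in place, left to right, so word lengths and word order survive even though letter order within words does not.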

These toy examples are just to show that "arbitrary" vs "systematic" isn't an all-or-nothing thing in a mapping. You have to consider all sorts of properties of the input and output representations and see which properties are being systematically preserved (or approximately preserved) across the mapping, and which are not. Temporal relations (like sequencing) are particularly important in this respect.

Saturday, April 6, 2019


Chabot 2019:

"The notion that phonetic realizations of phonological objects function in an arbitrary fashion is counterintuitive at best, confounding at worst. However, order is restored to both phonology and phonetics if a modular theory of mind (Fodor 1983) is considered. In a modular framework, cognition is viewed as work carried out by a series of modules, each of which uses its own vocabulary and transmits inputs and outputs to other modules via interfaces known as transducers (Pylyshyn 1984; Reiss 2007), and the relationship between phonetics and phonology must be arbitrary. This formalizes the intuition that phonology deals in the discrete while phonetics deals in the continuous. A phonological object is an abstract cognitive unit composed of features or elements, with a phonetic realization that is a physical manifestation of that object located in time and space, which is composed of articulatory and perceptual cues." [italics in original, boldface added here]

The implication seems to be that any relation between discrete and continuous systems is "arbitrary". However, there are non-arbitrary mappings between discrete and continuous systems. The best known is almost certainly the relationship between the integers (ℤ) and the reals (ℝ). There is an obvious homomorphism from ℤ into ℝ that preserves addition (and lots of other stuff), and it is the only one that also preserves multiplication and sends 1 to 1.0. Call this H. That is, H maps {... -1, 0, 1 ...} in ℤ to {... -1.0, 0.0, 1.0 ...} in ℝ (using the C conventions for ints and floats). Using + for addition over ℤ and +. for addition over ℝ, H also takes + to +. (that is, we need to say what the group operation is in each case; this is important when thinking about symmetry groups, for example). So now it is true that for all i, j in ℤ, H(i + j) = H(i) +. H(j).

However, mapping from ℝ onto ℤ (quantization, Q) is a much trickier business. One obvious technique is to map the elements of ℝ to the nearest integer (i.e. to round them off). But this is not a homomorphism because there are cases where for some r, s in ℝ, Q(r +. s) ≠ Q(r) + Q(s), for example Q(1.6 +. 1.6) = Q(3.2) = 3, but Q(1.6) + Q(1.6) = 2 + 2 = 4. So the preservation of addition from ℝ to ℤ is only partial.
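Both maps are easy to write down in Python, where int and float play the roles of ℤ and ℝ (only approximately, since floats are finite). The sketch checks that H preserves addition on a range of integers, reproduces the Q(1.6 +. 1.6) counterexample, and shows what Q does preserve: order, weakly, and addition up to an error of at most 1.

```python
import math

def H(i: int) -> float:
    """Embed Z in R: the integer i goes to the float i."""
    return float(i)

def Q(r: float) -> int:
    """Quantize R to Z by rounding to the nearest integer, halves rounding up
    (math.floor(r + 0.5) rather than round(), which rounds halves to even)."""
    return math.floor(r + 0.5)

ints = range(-50, 51)
assert all(H(i + j) == H(i) + H(j) for i in ints for j in ints)  # H preserves +

print(Q(1.6 + 1.6), Q(1.6) + Q(1.6))  # 3 4: Q does not preserve +

# What Q does preserve: order (weakly), and addition up to an error of 1.
reals = [k / 10 for k in range(-30, 31)]
assert all(Q(r) <= Q(s) for r in reals for s in reals if r <= s)
assert all(abs(Q(r + s) - (Q(r) + Q(s))) <= 1 for r in reals for s in reals)
```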


Chabot A 2019. What’s wrong with being a rhotic? Glossa, 4(1), 38. DOI:

Thursday, April 4, 2019


(In memory of Felix d. 1976, Monty d. 1988, Jazz d. 1993)

In Nature today, some evidence that cats can distinguish their own names. The cats were tested in their homes using a habituation-dishabituation method. This is in contrast to dogs, who have been tested using retrieval tasks, because "the training of cats to perform on command would require a lot of effort and time." From a quick scan of the article, it isn't clear if the foils for the names were minimal pairs though.

Call for papers: Melodic primes in phonology

From Alex and Tobias:

Special issue of Canadian Journal of Linguistics/Revue canadienne de linguistique
Call for papers

We are calling for high-quality papers addressing the status of melodic primes in phonology, in particular in substance-free phonology frameworks. That is, do phonological primes bear phonetic information, and if so, how much and in which guise exactly? How are melodic primes turned into phonetic objects? In the work of Hale & Reiss, who coined the term substance-free phonology, it is only phonological computation that is unaffected by phonetic substance; substance is nevertheless present in the phonology: melodic primes are still phonetic in nature, and their phonetic content determines how they will be realized as phonetic objects. We are interested in arguments for the presence of phonetic information in melodic primes as well as for the alternative position, which sees melodic primes as entirely void of phonetic substance.

At the recent Phonological Theory Agora in Nice, there was some discussion regarding the implications a theory of substance-free melodic primes has for phonology; a variety of frameworks – including Optimality Theory, Government Phonology, and rule-based approaches – have all served as a framework for theories which see melodic primes as entirely divorced from phonetic information. The special issue seeks to highlight some of those approaches, and is intended to spark discussion between advocates of the various positions and between practitioners of different frameworks.

We are especially interested in the implications a theory of substance-free primes has for research in a number of areas central to phonological theory, including: phonological representations, the acquisition of phonological categories, the form of phonological computation, the place of marginal phenomena such as “crazy rules” in phonology, the meaning of markedness, the phonology of signed languages, the nature of the phonetics/phonology interface, and more. Substance-free primes also raise big questions related to the question of emergence: are melodic primes innate or do they emerge through usage? How are phonological patterns acquired if primes are not innate?

As a first step, contributors are asked to submit a two-page abstract to the editor at

Contributions will be evaluated based on relevance for the special issue topic, as well as the overall quality and contribution to the field. Contributors of accepted abstracts will be invited to submit a full paper, which will undergo the standard peer review process. Contributions that do not fulfill the criteria for this special issue can, naturally, still be submitted to the Canadian Journal of Linguistics/Revue canadienne de linguistique.

(a) June 1, 2019: deadline for abstracts, authors notified by July
(b) December 2019: deadline for first submission
(c) January 2020: sending out of manuscripts for review
(d) March 2020: completion of the first round of peer review
(e) June 2020: deadline for revised manuscripts
(f) August 2020: target date for final decision on revised manuscripts
(g) October 2020: target date for submission of copy-edited manuscripts
(h) CJL/RCL copy-editing of papers
(i) End of 2020: Submission of copy-edited papers to Cambridge University Press (4 months before publication date).

Wednesday, April 3, 2019

Dueling Fodor interpretations

Bill Idsardi

Alex and Tobias from their post:

"The ground rule of (Fodorian) modularity is domain specificity: computational systems can only parse and compute units that belong to a proprietary vocabulary that is specific to the system at hand."


"Hence Hale & Reiss' statement that nothing can be parsed by the cognitive system that wasn't present at birth (or that the cognitive system does not already know) appears to be just incorrect. Saying that unknown stimulus can lead to cognitive categories everywhere except in phonology seems a position that is hard to defend."

I think both parties here are invoking Fodor, but with different emphases. Alex and Tobias are cleaving reasonably close to Fodor 1983 while Charles and Mark are continuing some points from Fodor 1980, 1998.

But Fodor is a little more circumspect than Alex and Tobias about intermodular information transfer:

Fodor 1983:46f: "the input systems are modules ... I imagine that within (and, quite possibly, across)[fn13] the traditional modes, there are highly specialized computational mechanisms in the business of generating hypotheses about the distal sources of proximal stimulations. The specialization of these mechanisms consists in constraints either on the range of information they can access in the course of projecting such hypotheses, or in the range of distal properties they can project such hypotheses about, or, most usually, on both."

"[fn13] The "McGurk effect" provides fairly clear evidence for cross-modal linkages in at least one input system for the modularity of which there is independent evidence. McGurk has demonstrated that what are, to all intents and purposes, hallucinatory speech sounds can be induced when the subject is presented with a visual display of a speaker making vocal gestures appropriate to the production of those sounds. The suggestion is that (within, presumably, narrowly defined limits) mechanisms of phonetic analysis can be activated by -- and can apply to -- either acoustic or visual stimuli. It is of central importance to realize that the McGurk effect -- though cross-modal -- is itself domain specific -- viz., specific to language. A motion picture of a bouncing ball does not induce bump, bump, bump hallucinations. (I am indebted to Professor Alvin Liberman both for bringing McGurk's results to my attention and for his illuminating comments on their implications.)" [italics in original]

I think this quote deserves a slight qualification, as there is now quite a bit of evidence for multisensory integration in the superior temporal sulcus (e.g. Noesselt et al 2012). As for "bump, bump, bump", silent movies of people speaking don't induce McGurk effects either. The cross-modal effect is broader than Fodor thought too, as non-speech visual oscillations that occur in phase with auditory oscillations do enhance brain responses in auditory cortex (Jenkins et al 2011).

To restate my own view again, to the extent that the proximal is partially veridical with the distal, such computational mechanisms are substantive (both the elements and the relations between elements). The best versions of such computational mechanisms attempt to minimize both substance (the functions operate over a minimum number of variables about distal sources; they provide a compact encoding) and arbitrariness (the "dictionary" is as small as possible; it contains just the smallest fragments that can serve as a basis for the whole function; the encoding is compositional and minimizes discontinuities).

And here's Fodor on the impossibility of inventing concepts:

Fodor 1980:148: "Suppose we have a hypothetical organism for which, at the first stage, the form of logic instantiated is propositional logic. Suppose that at stage 2 the form of logic instantiated is first-order quantificational logic. ... Now we are going to try to get from stage 1 to stage 2 by a process of learning, that is, by a process of hypothesis formation and confirmation. Patently, it can't be done. Why? ... [Because] such a hypothesis can't be formulated with the conceptual apparatus available at stage 1; that is precisely the respect in which propositional logic is weaker than quantificational logic."

Fodor 1980:151: "... there is no such thing as a concept being invented ... It is not a theory of how you acquire concepts, but a theory of how the environment determines which parts of the conceptual mechanism in principle available to you are in fact exploited." [italics in original]

You can select or activate a latent ability on the basis of evidence and criteria (the first order analysis might be much more succinct than the propositional analysis) but you can't build first order logic solely out of the resources of propositional logic. You have to have first order logic already available to you in order for you to choose it.


Fodor JA 1980. On the impossibility of acquiring "more powerful" structures. Fixation of belief and concept acquisition. In M Piattelli-Palmarini (ed.) Language and Learning: The Debate between Jean Piaget and Noam Chomsky. Harvard University Press. 142-162.

Fodor JA 1983. Modularity of Mind. MIT Press.

Fodor JA 1998. Concepts: Where Cognitive Science went Wrong. Oxford University Press.

Jenkins J, Rhone AE, Idsardi WJ, Simon JZ, Poeppel D 2011. The Elicitation of Audiovisual Steady-State Responses: Multi-Sensory Signal Congruity and Phase Effects. Brain Topography, 24(2), 134–148.

Noesselt T, Bergmann D, Heinze H-J, Münte T, Spence C 2012. Coding of multisensory temporal patterns in human superior temporal sulcus. Frontiers in Integrative Neuroscience, 6, 64.