
Friday, May 17, 2019

Why do geese honk?

A question put to CBC Radio's Quirks and Quarks program: "Why do Canada geese honk while migrating?" The CBC's answer is "They honk to communicate their position in the flock". But Elan Dresher gave a different answer back in 1996.

New blog: Outdex

Here's a new blog, Outdex, featuring Thomas Graf (and friends), that should be of interest to people who read FoL. This week, Thomas has a post on the "inverted T" (or "inverted Y") model of Generative Grammar.

Thursday, May 9, 2019

GG + NN = Thing 1 + Thing 2?

Language, stealing (er, adopting) the BBS model, has a target article by Joe Pater and several replies.

Here's my attempt at bumper-sticker summaries for the articles. You can add your own in the comments. (No, there are no prizes for this.)

Pater: GG + NN + ?? = Profit!

Berent & Marcus: Structure + Composition = Algebra

Dunbar: Marr + Marr = Marr

Linzen: NNs learn GGs sometimes, sorta

Pearl: ?? = interpretability

Potts: Functions + Logic = Vectors + DL

Rawski & Heinz: No Free Lunch, but there is a GI tract

Pater starts out with the observation that Syntactic Structures and "The perceptron: A perceiving and recognizing automaton" were both published in 1957.

Here is a list of other things that were published in 1957 (hint: 116). It may say too much about me, but some of my favorites over the years from this list have included: The Cat in the Hat, From Russia with Love, The Way of Zen, Endgame and Parkinson's Law. But I'm afraid I can't really synthesize all that into an enlightened spy cat whose work expands to fill the nothingness. You can add your own mash-ups in the comments. (No, there are no prizes for this either.)


Sunday, April 28, 2019

Scheering forces

I'll respond to Tobias in a post rather than a blog reply because he raises several points, and I want to include a picture or two.

1. TS: "When you cross the real-world boundary, i.e. when real-world items (such as wave lengths) are mapped onto cognitive categories (colors perceived), you are talking about something else since the real world is not a module."

The arguments I was making hold equally well for representations within the central nervous system (CNS), for example between the retina, the lateral geniculate nucleus and V1. Real-world spatial relations are mapped partially veridically onto the retina (due to the laws of optics). The spatial organization of the retina is (partially) maintained in the mapping to cortex; that is, LGN and V1 are retinotopic. So the modules here are the retina, LGN and V1, which are certainly modules within the CNS.

The same sort of relationship holds for acoustic frequency, the cochlea, the medial geniculate nucleus (MGN), and A1. Acoustic frequencies are mapped partially veridically onto the coiled line of hair cells in the cochlea (due to the laws of acoustics). That is, frequency is mapped into a spatial (place) code at the cochlea (though this is not the only coding mechanism for low frequencies). And the cochlear organization is partially preserved in the mappings to MGN and A1; they are cochleotopic (= tonotopic). There is an "arbitrary" aspect here: frequency is represented with a spatial code. But the spatial code is not completely arbitrary or random; it is organized and ordinal, such that frequency increases monotonically from the apex to the base of the cochlea, as shown in the diagram from Wikipedia, and this ordering is preserved in tonotopic gradients in A1. That is, the mappings between the modules are quasimorphisms.
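To make "organized and ordinal" concrete, here is a minimal Python sketch of the cochlear place code using Greenwood's frequency-position function. The constants (A = 165.4, a = 2.1, k = 0.88) are the values commonly quoted for the human cochlea and are brought in from outside this post, as are the function names, which are hypothetical. The sketch only illustrates the ordinal point: position along the cochlea and characteristic frequency stand in a strictly monotonic relationship, so order survives even though the code is spatial rather than acoustic.

```python
import math

# Greenwood's frequency-position function, F = A * (10**(a*x) - k),
# with the constants commonly quoted for humans (an assumption here).
A, a, k = 165.4, 2.1, 0.88

def place_to_frequency(x: float) -> float:
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane, with x = 0 at the apex and x = 1 at the base."""
    return A * (10 ** (a * x) - k)

def frequency_to_place(f: float) -> float:
    """Inverse map: relative cochlear position for frequency f (Hz)."""
    return math.log10(f / A + k) / a

positions = [i / 10 for i in range(11)]
freqs = [place_to_frequency(x) for x in positions]

# Strictly increasing from apex to base: spatial order is frequency order.
assert all(f1 < f2 for f1, f2 in zip(freqs, freqs[1:]))

for x, f in zip(positions, freqs):
    print(f"x = {x:.1f}  ->  {f:8.1f} Hz")
```

The function is exponential rather than linear, so equal steps along the cochlea correspond to roughly equal frequency ratios; the mapping distorts distances but keeps order, which is exactly the partially veridical pattern at issue.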



2. TS: "when I use the word "arbitrary" I only mean the above: the fact that any item of list A may be associated with any item of list B."

Then I think you should find a different term. I also think there has been far too much focus on the items. As I have tried to explain, items enter into relationships with other items, and we need to consider whether these relationships are preserved across the interface; we need to keep track of the quasimorphisms. So it is not the case for many of the intermodular interfaces in sensation and perception that any item on one side of the interface can be mapped to any item on the other side. Spatial, temporal and other ordering relationships tend to be preserved across the interfaces, and this strongly constrains the mapping of individual items. Remarkably, this is true even in synesthesia; see Plate 9 in Cytowic 2018.
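One way to see how much an ordering relation constrains item-to-item mappings is simply to count. The Python sketch below is a toy: the items are abstract stand-ins for positions on either side of an interface, and the function names are hypothetical. It enumerates every one-to-one mapping between two small ordered sets and keeps only those that preserve order.

```python
from itertools import permutations

def preserves_order(mapping: dict) -> bool:
    """True if a < b implies mapping[a] < mapping[b]. Checking adjacent
    keys is enough, since the relation is transitive."""
    keys = sorted(mapping)
    return all(mapping[a] < mapping[b] for a, b in zip(keys, keys[1:]))

def count_mappings(source, target):
    """Count all one-to-one mappings from source into target, and how
    many of them preserve the ordering relation."""
    total = kept = 0
    for image in permutations(target, len(source)):
        total += 1
        if preserves_order(dict(zip(source, image))):
            kept += 1
    return total, kept

# Five ordered items on each side of a hypothetical interface.
print(count_mappings([1, 2, 3, 4, 5], "abcde"))   # (120, 1)
```

Under a purely list-based view any of the 120 pairings would be possible; once order must be preserved, only one remains. Mapping 3 ordered items into 6 leaves 20 of 120 (the binomial coefficient C(6, 3)).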



3. TS: "That's all fine, but I am talking about the mind, not about the brain. Whatever the wirings in the brain, they won't tell us anything about how cognitive items of two distinct vocabularies are related (Vocabulary Insertion), or how a real-world item is associated to a cognitive category (wave length - color)."

I am not a dualist, and I doubt that this blog is a good forum for a discussion of the merits of mind/body dualism. Here is a quote from Chomsky 1983 on the mind/brain; he reiterates the point in Chomsky 2005:257 and in many other places.

"Now, I think that there is every reason to suppose that the same kind of “modular” approach is appropriate for the study of the mind — which I understand to be the study, at an appropriate level  of  abstraction,  of  properties  of  the  brain ..."

Just to be clear, I am not saying that cognitive scientists should defer to neuroscientists, but they should talk to them. The idea that we have learned nothing about color perception and cognition from the study of the human visual pathway is simply false.

4. TS: "is there evidence for interfaces that are not list-based?"

Yes: almost any (non-linguistic) set of items with an ordering relation. When aspects of the ordering relation are preserved across the interface, the mapping will be a quasimorphism, and the item-to-item mappings will be strongly constrained by this; that is, if $a < b$ then $f(a) <_f f(b)$, where $<_f$ is the corresponding ordering on the other side of the interface. What's unusual about the lexicon is that small changes in pronunciation can lead to enormous changes in meaning. In many of the other cases we instead end up with a very small, almost trivial look-up table, something like the sets of basis vectors for the two spaces, as with homomorphisms between groups in algebra, which are fully determined by where they send a set of generators.
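The contrast between a tiny generating table and a full list can be sketched with the simplest case, a linear map between two-dimensional vector spaces. This is an illustrative stand-in chosen here, not a model of any particular interface, and the numbers in the table are arbitrary placeholders. Two entries, the images of the basis vectors, determine the image of every other point.

```python
# A linear map between 2-D spaces is fixed by where it sends the basis
# vectors e1 = (1, 0) and e2 = (0, 1): a two-entry look-up table.
lookup = {
    "e1": (2.0, 1.0),   # hypothetical image of e1
    "e2": (-1.0, 3.0),  # hypothetical image of e2
}

def apply_linear(v):
    """Image of v = x*e1 + y*e2, computed only from the look-up table."""
    x, y = v
    (a1, b1), (a2, b2) = lookup["e1"], lookup["e2"]
    return (x * a1 + y * a2, x * b1 + y * b2)

print(apply_linear((3.0, 2.0)))   # (4.0, 9.0)
```

A lexicon is the opposite case: knowing the entries for two words tells you nothing about the entry for a third, so every pairing has to be listed.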

5. TS: "is there evidence for associations that correspond to partial veridicality, i.e. where the to-be-related items are commensurable, i.e. allow for the assessment of similarity?" ...
"The same goes for the association of real-world items with cognitive categories: trying to assess the (dis)similarity of "450-485 nm" and "blue", as opposed to, say, "450-485 nm" and "red" (or any other perceived color for that matter) is pointless. Wave lengths and perceived colors are incommensurable and you won't be able to tell whether the match is veridical, non-veridical or partially veridical."

This isn't pointless at all. In fact, remarkable progress has been made in this area; see, for example, Hardin 1988, Hardin & Maffi 1997, Palmer 1999 and Bird et al 2014. The match is partially veridical in a variety of ways. Small changes in spectral composition generally lead to small changes in perceived hue; the mapping is a quasimorphism. Importantly, the topology of the representation also changes, and this is a non-veridical aspect of the mapping: a linear relation (wavelength) becomes a circular one (hue), as the responses of the cone cells in the retina are recoded into an opponent-process representation in LGN.
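The topological difference can be illustrated with a toy comparison of two distance measures in Python. The values are abstract units, not wavelengths or measured hue angles, and the function names are hypothetical. On a line, the two ends are maximally far apart; once the same range is bent into a circle, as with the hue circle, they become neighbours.

```python
def linear_distance(a: float, b: float) -> float:
    """Distance between two points on a line."""
    return abs(a - b)

def circular_distance(a: float, b: float, period: float = 360.0) -> float:
    """Distance between two angles on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)

# Two points near opposite ends of a line segment...
print(linear_distance(0.0, 350.0))     # 350.0
# ...are close neighbours on a circle.
print(circular_distance(0.0, 350.0))   # 10.0
```

Nearby points stay nearby under both measures, which is the quasimorphism part. Only the far ends of the line are brought together, which is the non-veridical part.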

6. TS: "The secret key is the look-up table that matches items of the two modules."

I agree with this, except that I want the look-up table to be as small as possible, the "basis vectors" for the spaces. In my opinion, the best way to accomplish this is with innate initial look-up tables for the features, giving the learner the initial conditions for the Memory-Action and Perception-Memory mappings. The feature-learning approaches, including Mielke 2008, Dresher 2014 and Odden 2019, start with an ability to perceive IPA-like phonetic representations. I simply don't believe that this is a plausible idea, given how difficult even simple cases are for such an approach, as explained in Dillon, Dunbar & Idsardi 2013.

References:

Bird CM, Berens SC, Horner AJ & Franklin A. 2014. Categorical encoding of color in the brain. Proceedings of the National Academy of Sciences, 111(12), 4590–4595.

Chomsky N. 1983. The Psychology of Language and Thought: Noam Chomsky interviewed by Robert W. Rieber. In RW Rieber (ed) Dialogues on the Psychology of Language and Thought. Plenum.

Chomsky N. 2005. Reply to Lycan. In LM Antony & N Hornstein (eds) Chomsky and his Critics. Blackwell.

Cytowic RE. 2018. Synesthesia. MIT Press.

Dillon B, Dunbar E & Idsardi WJ. 2013. A single-stage approach to learning phonological categories: insights from Inuktitut. Cognitive Science, 37(2), 344–377.

Hardin CL. 1988. Color for Philosophers: Unweaving the Rainbow. Hackett.

Hardin CL & Maffi L. 1997. Color Categories in Thought and Language. Cambridge University Press.

Palmer SE. 1999. Vision Science: Photons to Phenomenology. MIT Press.

Wednesday, April 24, 2019

ECoG to speech synthesis

In Nature today, another fascinating article from Eddie Chang's lab at UCSF. They were able to synthesize intelligible speech from ECoG recordings of cortical activity in sensory-motor and auditory areas. The system was even able to decode and synthesize speech successfully from silently mimed speech. The picture (Figure 1 in the article) shows a block diagram of the system.



There are also two commentaries on the work, along with some speech samples from the system.

Friday, April 19, 2019

A possible EFP developmental trajectory from syllables to segments

Infants can show a puzzling range of abilities and deficits in comparison with adults, outperforming adults on many phonetic perception tasks while lagging behind in other ways. Some null results obtained with one procedure can be overturned with more sensitive procedures, and some contrasts are "better" than others in terms of effect size and various acoustic or auditory measures of similarity (Sundara et al 2018). There are other oddities in the infant speech perception literature too, including the fact that the syllabic stimuli generally need to be much longer than the average syllable durations in adult speech (often twice as long). One persistent idea is that infants start with a syllable-oriented perspective and later move to a more segment-oriented one (Bertoncini & Mehler 1981), and that in some languages adults still have a primary orientation toward syllables, at least for some speech production tasks (O'Seaghdha et al 2010; but see various replies, e.g. Qu et al 2012).

More than a decade ago, I worked with Rebecca Baier and Jeff Lidz to investigate audio-visual (AV) integration in 2-month-old infants (Baier et al 2007). Infants were presented with one audio track along with two synchronized silent movies of the same person (namely Rebecca) on a large TV screen. The movies showed different syllables being produced; the audio track generally matched one of the movies. Using this method we were able to replicate the results of Kuhl & Meltzoff 1982 that two-month-old infants are able to match faces and voices among /a/, /i/, and /u/. Taking this one step further, we were also able to show that infants could detect dynamic syllables, matching faces for, for example, /wi/ vs. /i/. We did some more poking around with this method, but got several results that were difficult to understand. One of them was a failure of the infants to match on /wi/ vs. /ju/. (And we are pretty sure that "we" and "you" are fairly frequently heard words for English-learning infants.) Furthermore, when they were presented with /ju/ audio alongside /wi/ and /i/ faces, they matched the /ju/ audio with the /wi/ video. This behavior is at least consistent with a syllable-oriented point of view: they hear a dynamic syllable with something [round] and something [front] in it, but they cannot tell the relative order of [front] and [round]. This also seems consistent with the relatively poor ability of infants to detect differences in serial order (Lewkowicz 2004). Rebecca left to pursue electrical engineering, and this project fell by the wayside.

This is not to say that infants cannot hear a difference between /wi/ and /ju/, though. I expect that dishabituation experiments would succeed on this contrast. The infants would also not match faces for /si/ vs. /ʃi/, but the dishabituation experiment worked fine on that contrast (as expected). So there are certainly also task differences between the two experimental paradigms.

But I think that now we may have a way to understand these results more formally, using the Events, Features and Precedence model discussed on the blog a year ago, and developed more extensively in Papillon 2018. In that framework, we can capture a /wi~ju/ syllable schematically as (other details omitted):


The relative ordering of [front] and [round] is underspecified here, as is the temporal extent of the events. The discrimination between /wi/ and /ju/ amounts to incorporating the relative ordering of [front] and [round], that is, which of the dashed lines is needed in:



When [round] precedes [front], that is the developing representation for /wi/; when [front] precedes [round], that is the developing representation for /ju/. Acquiring this kind of serial-order knowledge between different features might not be easy, as [front] and [round] may initially be segregated into different streams (Bregman 1990), and order perception across streams is worse than within streams. It's conceivable that the learner would be driven to look for additional temporal relations when the temporally underspecified representations incorrectly predict such "homophony", akin to a hash collision.
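The collision idea can be sketched in Python under heavy simplifying assumptions: the class EFPGraph and its signature method are hypothetical, each event carries a single feature, and temporal extent is ignored entirely. This is not the formalism of Papillon 2018, only a way of seeing that two graphs differing solely in the unstated order of [round] and [front] are indistinguishable until a precedence edge is added.

```python
from dataclasses import dataclass, field

@dataclass
class EFPGraph:
    """Toy events-features-precedence graph (hypothetical simplification):
    each event carries one feature label, and precedence is a set of
    (earlier, later) pairs of event names."""
    events: dict                                  # event name -> feature
    precedence: set = field(default_factory=set)  # (earlier, later) pairs

    def signature(self):
        """What a learner could compare: the multiset of features plus
        the precedence relations restated in terms of features."""
        feats = tuple(sorted(self.events.values()))
        order = frozenset((self.events[a], self.events[b])
                          for a, b in self.precedence)
        return feats, order

# Underspecified syllables: [round] and [front] with no order between them.
wi_early = EFPGraph({"e1": "round", "e2": "front"})
ju_early = EFPGraph({"e1": "front", "e2": "round"})
assert wi_early.signature() == ju_early.signature()   # they "collide"

# Adding one precedence edge separates them.
wi = EFPGraph({"e1": "round", "e2": "front"}, {("e1", "e2")})
ju = EFPGraph({"e1": "front", "e2": "round"}, {("e1", "e2")})
assert wi.signature() != ju.signature()
print(wi.signature())
```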

If we pursue this idea more generally, the EFP graphs will gradually become more segment oriented as additional precedence relations are added, as in:



And if we then allow parallel, densely-connected events to fuse into single composite events we can get even closer to a segment oriented representation:



So the general proposal would be that the developing representation of the relative order of features is initially rather poor and is underspecified for order between features in different "streams". Testing this is going to be a bit tricky, though. An even more general conclusion would be that features are not learned from phonetic segments (Mielke 2008) but that features are gradually combined during development to form segment-sized units. We could also add other features encoding extra phonetic detail to these developing representations; it could then be the case that different phonetic features have different temporal acuity for the learners, and so cohere with other features to different extents.
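The fusion step can be sketched the same way. In this toy Python version, the function fuse_parallel and the example events are hypothetical, and "parallel, densely-connected" is read as "having exactly the same immediate predecessors and successors". Events that meet this condition in an acyclic graph cannot be ordered with respect to each other, so they are merged into one composite event carrying the union of their features.

```python
from collections import defaultdict

def fuse_parallel(events, precedence):
    """Merge events with identical immediate predecessors and successors
    into composite events (frozensets of features)."""
    preds, succs = defaultdict(set), defaultdict(set)
    for a, b in precedence:
        succs[a].add(b)
        preds[b].add(a)
    groups = defaultdict(set)
    for name, feature in events.items():
        key = (frozenset(preds[name]), frozenset(succs[name]))
        groups[key].add(feature)
    # Order composite events by their number of immediate predecessors;
    # adequate for fully layered toy graphs like the one below, not in general.
    ordered = sorted(groups.items(), key=lambda kv: len(kv[0][0]))
    return [frozenset(feats) for _, feats in ordered]

# A /wi/-like toy: [round] and [high] in parallel, then [front] and [high],
# with every event in the first layer preceding every event in the second.
events = {"e1": "round", "e2": "high", "e3": "front", "e4": "high"}
precedence = {(a, b) for a in ("e1", "e2") for b in ("e3", "e4")}
print(fuse_parallel(events, precedence))
```

The output is a sequence of two feature bundles, which is about as segment-like as the representation gets; removing some of the precedence edges would block the fusion.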

References

Baier, R., Idsardi, W. J., & Lidz, J. (2007). Two-month-olds are sensitive to lip rounding in dynamic and static speech events. In Proceedings of the International Conference on Auditory-Visual Speech Processing.

Bertoncini, J., & Mehler, J. (1981). Syllables as units in infant speech perception. Infant Behavior and Development, 4, 247-260.

Bregman, A. (1990). Auditory Scene Analysis. MIT Press.

Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138-1141.

Lewkowicz, D. J. (2004). Perception of serial order in infants. Developmental Science, 7(2), 175–184.

Mielke, J. (2008). The Emergence of Distinctive Features. Oxford University Press.

O’Seaghdha, P. G., Chen, J. Y., & Chen, T. M. (2010). Proximate units in word production: Phonological encoding begins with syllables in Mandarin Chinese but with segments in English. Cognition, 115(2), 282-302.

Papillon, M. (2018). Precedence Graphs for Phonology: Analysis of Vowel Harmony and Word Tones. ms.

Qu, Q., Damian, M. F., & Kazanina, N. (2012). Sound-sized segments are significant for Mandarin speakers. Proceedings of the National Academy of Sciences, 109(35), 14265-14270.

Sundara, M., Ngon, C., Skoruppa, K., Feldman, N. H., Onario, G. M., Morgan, J. L., & Peperkamp, S. (2018). Young infants’ discrimination of subtle phonetic contrasts. Cognition, 178, 57–66.

Tuesday, April 9, 2019

Two cipher examples

I'm not convinced that I'm getting my point about "arbitrary" across, so maybe we should try some toy examples from a couple of ciphers. Let's encipher "Colorless green ideas" in a couple of ways.

1. Rot13: "Colorless green ideas" ⇒ "Pbybeyrff terra vqrnf". This method is familiar to old Usenet denizens. It rotates each letter 13 places through the 26-letter Latin alphabet (a⇒n, b⇒o, ... m⇒z, n⇒a, o⇒b, ...); because 13 is exactly half of 26, applying it twice brings every letter back to where it started, so the method is its own inverse. That is, you decode a rot13 message by doing rot13 on it a second time. This is a special case of a Caesar cipher. Such ciphers are not very arbitrary, as they mostly preserve alphabetic letter order, but they "wrap" the alphabet around into a circle (like color in the visual system), with "z" being followed by "a". In a rotation cipher, once you figure out one of the letter codes, you've got them all: if "s" maps to "e", then "t" maps to "f", and so on.
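A rotation cipher is only a few lines of Python. This sketch (the function names are illustrative) implements a general Caesar shift, treats rot13 as the shift-13 case, and shows that a single known letter pair is enough to recover the whole key.

```python
import string

LOWER = string.ascii_lowercase

def rotate(text: str, shift: int) -> str:
    """Caesar cipher: move each letter `shift` places around the 26-letter
    circle, keeping case; non-letters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha() and ch.lower() in LOWER:
            new = LOWER[(LOWER.index(ch.lower()) + shift) % 26]
            out.append(new.upper() if ch.isupper() else new)
        else:
            out.append(ch)
    return "".join(out)

def rot13(text: str) -> str:
    return rotate(text, 13)

msg = "Colorless green ideas"
print(rot13(msg))                  # Pbybeyrff terra vqrnf
assert rot13(rot13(msg)) == msg    # rot13 is its own inverse

# One known letter pair fixes the whole key: if "s" maps to "e" ...
shift = (LOWER.index("e") - LOWER.index("s")) % 26
print(rotate("st", shift))         # ef
```

Python's standard library also ships rot13 as a text codec (`codecs.encode(msg, "rot_13")`); the explicit version is spelled out here to make the wrap-around at "z" visible.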

2. Scrambled alphabet cipher: Randomly permute the 26 letters, for example A..Z ⇒ PAYRQUVKMZBCLOFSITXJNEHDGW. This is a letter-based codebook. It is arbitrary, at least from the individual letter perspective, as it won't preserve alphabetic order; it encodes "Colorless" as "Yfcftcqxx". So knowing one letter mapping (c⇒y) won't let you automatically determine the others.

But this cipher does preserve various other properties, such as capitalization, number of distinct atomic symbols, spaces between words, message length, doubled letters, and sequential order in general.
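These preserved properties can be checked mechanically. The Python sketch below (function names are illustrative) compares a plaintext and a ciphertext taken from the examples above. It uses each word's letter pattern, the positions of repeated letters, which any one-to-one letter substitution leaves unchanged; this is the classic foothold for breaking such ciphers by hand.

```python
def letter_pattern(word: str) -> tuple:
    """Replace each letter by the index of its first occurrence
    (case-insensitive), e.g. 'Colorless' -> (0, 1, 2, 1, 3, 2, 4, 5, 5)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen))
                 for ch in word.lower())

def preserved_properties(plain: str, cipher: str) -> dict:
    """Which structural properties survive the encryption?"""
    pw, cw = plain.split(" "), cipher.split(" ")
    return {
        "length": len(plain) == len(cipher),
        "word lengths": [len(w) for w in pw] == [len(w) for w in cw],
        "capitalization": [c.isupper() for c in plain]
                          == [c.isupper() for c in cipher],
        "letter patterns": [letter_pattern(w) for w in pw]
                           == [letter_pattern(w) for w in cw],
    }

print(preserved_properties("Colorless", "Yfcftcqxx"))
print(preserved_properties("Colorless green ideas", "Pbybeyrff terra vqrnf"))
```

Every property comes out True for both ciphers: the letters are disguised, but the shape of the message is not.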

Even word-based codebooks tend to preserve sequential order. That is, the message is input word by word from the beginning of the message to the end. But more sophisticated methods are possible, for example padding the message with irrelevant words. It's less common to see the letters of individual words scrambled, but we could do that for words of varying lengths, say by reversing words of length 2 (the permutation 21), so that "to" would be encoded as "to" ⇒ "ot" ⇒ "fj". Words of length three might be scrambled as 312, length four as 2431, and so on, choosing a random permutation for each word length. Adding this encryption technique will break apart some doubled letters. But the word order would still be preserved across the encryption.
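The per-length scrambling can be sketched in Python as follows. Two assumptions are made here beyond the text: a permutation such as 312 is read as "output letter 3, then letter 1, then letter 2", and word lengths other than 2, 3 and 4 are left unscrambled to keep the sketch short. The names PERMS, scramble_word and scramble_message are hypothetical.

```python
KEY = "PAYRQUVKMZBCLOFSITXJNEHDGW"   # substitution key from the text
UP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SUB = str.maketrans(UP + UP.lower(), KEY + KEY.lower())

# Letter-order permutations by word length, in the notation of the text.
# Single-digit positions only, so this works for words under 10 letters.
PERMS = {2: "21", 3: "312", 4: "2431"}

def scramble_word(word: str) -> str:
    perm = PERMS.get(len(word))
    if perm is None:              # assumption: other lengths unchanged
        return word
    return "".join(word[int(i) - 1] for i in perm)

def scramble_message(text: str) -> str:
    # Words are still processed left to right, so word order survives.
    return " ".join(scramble_word(w) for w in text.split(" "))

print(scramble_word("to"), scramble_word("to").translate(SUB))   # ot fj
print(scramble_word("tool"))   # olot: the doubled "oo" is broken up
```

Letter order inside words is now disturbed and some doubled letters are split, yet the sequence of words, and therefore the word-length pattern of the message, passes through untouched.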

These toy examples are just to show that "arbitrary" vs "systematic" isn't an all-or-nothing thing in a mapping. You have to consider all sorts of properties of the input and output representations and see which properties are being systematically preserved (or approximately preserved) across the mapping, and which are not. Temporal relations (like sequencing) are particularly important in this respect.