Elsevier

Neurocomputing

Volume 71, Issues 10–12, June 2008, Pages 2023-2028

Visual music and musical vision

https://doi.org/10.1016/j.neucom.2008.01.025

Abstract

This paper aims to bridge human hearing and vision from the viewpoint of database search for images or music. The semantic content of an image can be illustrated with music or conversely images can be associated with a piece of music. The theoretical basis of the bridge is synaesthesia, a property of human perception. A prototype cross-media retrieval system is built, using principles established in the neuroscientific study of synaesthesia.

Introduction

The word synaesthesia [1], [2], meaning “joined sensation”, describes an involuntary physical experience in which the stimulation of one sensory modality reliably causes a perception in one or more different modalities.

Synaesthesia has been known for centuries: in 1694, John Locke reported that a blind mathematician, Nicholas Saunderson, associated the colour red with the sound of a trumpet. Synaesthesia was first described scientifically by Galton in Nature in 1880, and it was also reported in early books on audition colorée (1890) and Farbenhören (1927). Recent research has shown that synaesthesia affects mental activities such as abstraction, metaphor and even the evolution of languages. As the neurologist Richard Cytowic notes, synaesthesia is a window into an enormous expanse of the mind.

There are many types of synaesthesia, as almost any two senses can be combined. For example, sights can be associated with sounds, sounds can be associated with tastes and so on. Some people experience vivid colours when listening to music, or have strong tactile sensations, such as tingling, when hearing noises. Recent research reports that about 19 types of synaesthesia have been recorded, of which colour-grapheme and colour-auditory synaesthesia are the most common.

In colour-grapheme synaesthesia, the experience of hearing, or even thinking about, letters, numbers, words or shapes produces a highly specific colour perception in simple patterns. Patricia Lynne Duffy reported a case in which common letters had colours and the colour of a word was dominated by the colour of its first letter.

In colour-auditory synaesthesia, the experience of hearing voices, music or random noise can produce colours, textures and shapes. Early research in psychology sought associations between images or colours and music, because sound waves and light waves are received by different sense organs yet can evoke very similar feelings. Pythagoras, the Greek philosopher and mathematician, built the Pythagorean musical scale and attempted to link it with colours. Using a prism, Newton studied light waves. He then observed that both light waves and musical tones involve vibrations that can be measured and thus arranged in a scale; in this sense, colour and sound are at least analogous.

It has also been widely observed that aesthetic connections exist between images and music. Whistler gave one of his paintings a musical title, "Nocturne in Blue and Gold, Old Battersea Bridge." In everyday life, a person who sees blue might detect the "scent" of blue, perhaps feelings of hopelessness and suffering: the colour blue evokes the feelings. Meanwhile, Blues music is often associated with sadness. It is therefore natural to ask how a link might be built between Blues music, the colour blue and feelings of sadness.

More interestingly, how could we build a computer system to interpret the relationship between the ears and the eyes? For instance, given an image depicting tragedy, the system should represent it with a piece of sad music, such as the Blues, rather than a piece of Spanish Corrida music.

There could be many ways to simulate synaesthesia in a computer system. One of them is cross-media retrieval. Such a system needs the following components: first, extract features separately from the different kinds of multimedia information, such as images and music; second, map all features to a common feature space (and/or develop a similarity metric) so that they can be compared; third, introduce a human–computer interaction mechanism to cluster features and train the system interactively. A user of the resulting cross-media retrieval system supplies a piece of music and obtains a set of images related to it, or conversely supplies an image and obtains a set of related musical pieces. One application is the choice of the "most suitable" background music for a home page.
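The three components described above can be sketched in code. The following is a minimal, hypothetical illustration, not the authors' implementation: the projection matrices stand in for mappings into a common feature space that a real system would learn interactively, and all dimensions and data are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(features, W):
    """Map raw media features into the common feature space."""
    return features @ W

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, database_vecs, top_k=3):
    """Rank database items by similarity to the query in the common space."""
    scores = [cosine_similarity(query_vec, v) for v in database_vecs]
    order = np.argsort(scores)[::-1]
    return [(int(i), scores[i]) for i in order[:top_k]]

# Toy setup: a 20-dim music query is matched against five 50-dim image
# feature vectors, all projected into an 8-dim common space.
W_mus = rng.standard_normal((20, 8))   # music features -> common space
W_img = rng.standard_normal((50, 8))   # image features -> common space
music_query = project(rng.standard_normal(20), W_mus)
image_db = [project(rng.standard_normal(50), W_img) for _ in range(5)]

results = retrieve(music_query, image_db)  # ranked (index, score) pairs
```

In a full system the projections would be trained from user feedback rather than fixed at random; the sketch only shows the query-time flow from media features to a ranked result list.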

The remainder of this paper introduces a sample synaesthesia system based on cross-media retrieval and is organised as follows: Section 2 briefly reviews content-based image/music retrieval, Section 3 introduces a platform, Section 4 reports preliminary results and Section 5 concludes.


Content-based image/music retrieval

As far as we know, there is as yet no computer system for representing music by images or vice versa. Some related work can be found in [3], but it is not in a computerised environment. Microsoft Windows Media Player represents sounds by colourful moving graphical patterns, such as curves or particles. It generates the patterns based on the rhythm; however, the creation of a picture inspired by music, with semantic objects and/or scenes in it, is still beyond the state of the art.

Is it

Platform

Fig. 1 shows a framework for a cross-media retrieval system, which takes a piece of Media B (e.g., music) as the input query and returns a set of Media A items (e.g., images). In this framework, features are extracted from both kinds of media and compared in pairs (when features are compared in pairs, a fusion stage can also be added). Alternatively, all features from both media are mapped to the same space and compared within this space.
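The first alternative in the framework, pairwise comparison followed by fusion, can be sketched as a weighted late fusion of per-pair similarity scores. The feature names and weights below are purely illustrative assumptions, not those of the actual platform.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two plain feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def fused_score(image_feats, music_feats, weights):
    """Late fusion: compare corresponding feature pairs, then combine
    the per-pair similarities with a weighted sum."""
    return sum(w * cosine(image_feats[name], music_feats[name])
               for name, w in weights.items())

# Toy comparable feature pairs, e.g. an image "energy" statistic matched
# against a music "energy" statistic (hypothetical names).
image_feats = {"energy": [0.8, 0.1, 0.3], "rhythm_proxy": [0.2, 0.9]}
music_feats = {"energy": [0.7, 0.2, 0.4], "rhythm_proxy": [0.1, 0.8]}
weights = {"energy": 0.6, "rhythm_proxy": 0.4}

score = fused_score(image_feats, music_feats, weights)
```

The fusion weights would normally be tuned, for example through the interactive training mentioned in the introduction; here they are fixed for illustration.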

In this paper, we only built a simple sample

Experiments

We used the three music samples as queries to search the two image databases, respectively. MFCC coefficients were used in all experiments to represent music, while for each combination different image features were employed, as stated in the captions of the following figures.
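To give a feel for the kind of music representation involved, the following is a simplified stand-in for MFCC extraction, assuming only numpy: frame the signal, take the magnitude spectrum per frame, pool the spectrum into a fixed number of log-energy bands, and average over frames. Real MFCCs additionally apply a mel-scaled filterbank and a discrete cosine transform; this sketch only illustrates the overall shape (waveform in, fixed-length feature vector out) and is not the paper's actual feature pipeline.

```python
import numpy as np

def band_log_energies(signal, frame_len=256, hop=128, n_bands=13):
    """Frame the signal and return per-band log energies averaged over frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    window = np.hanning(frame_len)
    feats = []
    for frame in frames:
        spectrum = np.abs(np.fft.rfft(frame * window)) ** 2
        bands = np.array_split(spectrum, n_bands)  # linear bands, not mel
        feats.append([np.log(b.sum() + 1e-10) for b in bands])
    return np.mean(feats, axis=0)  # (n_bands,) feature vector

# Toy input: one second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
feature_vector = band_log_energies(np.sin(2 * np.pi * 440 * t))
```

Each music clip is thereby reduced to a short numeric vector that can be compared against image feature vectors in the retrieval framework.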

Six groups of sample retrieval results are given in the figures. Experiments 1–4 are on I-A, and Experiments 5 and 6 are on I-B.

Conclusions

Synaesthesia, a property of the human brain, has been a popular research topic in cognitive science for years. Synaesthetes form a large part of the population, yet we often do not realise how usual such involuntary physical experiences are. In synaesthesia, a perception in one sense modality, e.g., hearing, causes a perception in another modality, e.g., vision. Synaesthesia suggests that it is possible to construct a bridge between different media, and is thus a promising


References (3)

  • Lawrence E. Marks, The Unity of the Senses: Interrelations Among the Modalities (1978)


Xuelong Li holds a permanent post in the School of Computer Science and Information Systems, Birkbeck College, University of London, London, UK and is a Visiting Professor with Tianjin University, Tianjin, China. His research interests include cognitive computing, digital image/video processing, analysis, retrieval, and indexing, pattern recognition, biometrics, and visual surveillance. His research activities are partly sponsored by the EPSRC, the British Council, Royal Society, etc. He has around a hundred scientific publications. Dr. Li is an associate editor of IEEE T-CSVT, T-IP, T-SMC-B, and T-SMC-C. He is also an editor of four books, an editorial board member of several other journals, including Neurocomputing, and a guest coeditor of special issues. He is the recipient of several best paper awards and nominations. He has served as a chair or a cochair of a dozen conferences and a programme committee member for more than eight conferences. He is a member of the IEEE and several of its technical committees. He is the Chair of IEEE-SMC Technical Committee on Cognitive Computing.

Dacheng Tao received the B.Eng. degree from the University of Science and Technology of China (USTC), the M.Phil degree from the Chinese University of Hong Kong, and the Ph.D degree from the University of London. He is currently an assistant professor at the Hong Kong Polytechnic University. His research interests include artificial intelligence, computer vision, data mining, information theory, machine learning, and visual surveillance. He has published extensively in IEEE T-PAMI, T-KDE, T-IP, T-MM, T-CSVT, T-SMC, IEEE CVPR, ICDM, ACM Multimedia, KDD, etc. Previously he gained several Meritorious Awards from the Int'l Interdisciplinary Contest in Modeling, which is the highest level mathematical modeling contest in the world, organized by COMAP. He is an editor of two books and a guest editor of six journals. He is an associate editor of the Neurocomputing (Elsevier) journal. He co-chaired the Special Session on Information Security at the IEEE ICMLC and the Workshop on Knowledge Discovery and Data Mining from Multimedia Data and Multimedia Applications at IEEE ICDM.

Stephen J. Maybank received the B.A. degree in mathematics from King's College Cambridge in 1976 and the Ph.D degree in computer science from Birkbeck College, University of London in 1988. He was a research scientist at GEC from 1980 to 1995, first at MCCS, Frimley, and then, from 1989, at the GEC Marconi Hirst Research Centre in London. In 1995, he became a lecturer in the Department of Computer Science at the University of Reading and, in 2004, he became a professor in the School of Computer Science and Information Systems at Birkbeck College, University of London. His research interests include camera calibration, visual surveillance, tracking, filtering, applications of projective geometry to computer vision and applications of probability, statistics, and information theory to computer vision. He is the author of more than 85 scientific publications and one book. He is a senior member of the IEEE.

Yuan Yuan is currently a Lecturer at Aston University, United Kingdom. She received her B.Eng. degree from the University of Science and Technology of China and Ph.D. degree from the University of Bath, United Kingdom. She has around thirty scientific publications in journals and conferences on visual information processing, compression, retrieval, etc. She is an associate editor of the International Journal of Image and Graphics (World Scientific) and an editorial board member of the Journal of Multimedia (Academy Publisher). She has served on the program committees of many IEEE/ACM conferences and is a reviewer for several IEEE transactions and other international journals and conferences. She is a member of the IEEE and the IEEE Signal Processing Society.
