1 Introduction

Kandinsky, a pioneer of abstract painting, left many works that appear to aim at depicting music [1, 2]. Music is a highly abstract form of expression, yet it can evoke various abstract images in listeners [3]. Kandinsky’s purpose in depicting music with colors is thought to be not merely to have viewers see his works as abstract paintings, but to let them feel the kinds of abstract images that arise when listening to music. However, viewers without sufficient knowledge of art cannot easily derive such abstract images from abstract paintings [4].

To enhance the abstract images that viewers form from abstract paintings, the authors have proposed a music generation system that utilizes viewers’ gazes [5, 6]. Using a gaze detection device, the system tracks the gaze of a viewer looking at an abstract painting; the gaze moves over the painting and tends to stay at certain points. At each point where the gaze stays, the color at that point is converted to sound, so that as the gaze moves, music consisting of the converted sounds in time series is generated. The authors’ system is thus expected to prompt the viewer of an abstract painting to imagine the various abstract images the painter intended to express.

Concerning music composition using still images such as drawings and paintings, Xenakis developed a system called UPIC [7, 8], which scans a still image so that lines and points in the image are converted to sounds by a computer. UPIC’s algorithm assigns the vertical coordinate of the image to pitch and the horizontal coordinate to the timeline. However, treating the horizontal coordinate of a still image as a timeline is not necessarily a natural way to give temporal structure to the image. Another example of combining paintings and music is Iura’s work entitled “Map” [9], in which the color at the position pointed to by the user’s mouse, used as a substitute for the viewer’s gaze, is converted to sound. However, to our knowledge, Iura has not explored the effectiveness of his proposed system.

This paper explores, through subjective tests, whether the authors’ music generation system can enhance viewers’ abstract imagination when viewing abstract paintings such as Kandinsky’s works.

2 Gaze Based Music Generation System

The authors’ music generation system detects the gaze of a person viewing an abstract painting using a gaze detection device such as an eye tracker. The gaze tends to stay at certain points in the painting for certain durations. At each such staying point, color information and shape information about the gazed figure are obtained. Figure 1 illustrates the authors’ gaze based music generation system, which operates in the following four steps.

Fig. 1. Gaze based music generation system

  1. The gaze of a person viewing the abstract painting is detected by a gaze detection device such as the eye tracker, and each gazed point is tracked within the painting. The tracked gaze points are smoothed by simple averaging over the gaze points in the 15 frames preceding the current frame, which removes noise caused by blinking and false detections (a code sketch of this smoothing appears after this list).

  2. Gazed regions (objects or figures) are extracted from the painting using the gazed points. Specifically, as shown in Fig. 2, the gazed region, which consists of multiple pixels, is obtained by finding, in the neighborhood of the gazed point, a region whose colors are similar to that of the gazed point. The color similarity is measured by the Euclidean distance D = √((R − R′)² + (G − G′)² + (B − B′)²) between the color P(R, G, B) of the gazed point and the color P′(R′, G′, B′) of a pixel in the neighborhood, where the color components range from 0 to 255. If D is smaller than 30, the neighboring pixel is judged to have a color similar to that of the gazed point (see the region-growing sketch after this list).

    Fig. 2. Gazed region obtained by the gazed point in the abstract painting

  3. Key, chord, and melody are determined from the averaged color of the gazed region, using the authors’ previously proposed method [5, 6], which converts color to sound based on the correspondence between tonality and the colors that people with synesthesia perceive. The music tempo is determined by the area of the gazed region, and the left-right sound position by the centroid of that region (a parameter-mapping sketch appears after this list).

  4. The parameters determined in step 3 are converted to MIDI (Musical Instrument Digital Interface) messages and sent to a software synthesizer, so that music (a sound sequence) is generated (the same sketch also illustrates the MIDI output).
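The smoothing in step 1 can be illustrated with a minimal Python sketch. The authors’ implementation is not published, so the class below is an assumption made for illustration; it simply averages the current gaze sample with the samples from the 15 preceding frames, as described above.

```python
from collections import deque

# Sketch of the smoothing in step 1 (assumed implementation): each raw
# gaze sample is averaged with the samples from the 15 frames preceding
# the current frame to suppress blinks and false detections.
WINDOW = 1 + 15  # the current frame plus the 15 preceding frames

class GazeSmoother:
    def __init__(self):
        self.history = deque(maxlen=WINDOW)

    def smooth(self, x, y):
        """Append the latest raw gaze point and return the averaged point."""
        self.history.append((x, y))
        n = len(self.history)
        avg_x = sum(px for px, _ in self.history) / n
        avg_y = sum(py for _, py in self.history) / n
        return avg_x, avg_y
```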
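Step 2 can be sketched as a region-growing procedure. The paper specifies only the color-distance criterion (Euclidean RGB distance below 30), so the 4-connected flood fill and the NumPy image layout below are assumptions.

```python
import numpy as np

def color_distance(p, q):
    """Euclidean distance D between two RGB colors (components in 0..255)."""
    return float(np.linalg.norm(p.astype(float) - q.astype(float)))

def gazed_region(image, gx, gy, threshold=30.0):
    """Grow the gazed region around the gazed point (gx, gy).

    image: H x W x 3 uint8 RGB array.  A neighboring pixel joins the
    region if the Euclidean distance between its color and the color of
    the gazed point is below the threshold (30 in the paper).  The
    4-connected search order is an assumption.
    """
    h, w, _ = image.shape
    seed = image[gy, gx]
    visited = np.zeros((h, w), dtype=bool)
    stack, region = [(gx, gy)], []
    while stack:
        x, y = stack.pop()
        if x < 0 or y < 0 or x >= w or y >= h or visited[y, x]:
            continue
        visited[y, x] = True
        if color_distance(image[y, x], seed) >= threshold:
            continue
        region.append((x, y))
        stack += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return region  # list of (x, y) pixel coordinates in the gazed region
```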
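Steps 3 and 4 can be sketched as follows. The synesthesia-based mapping from the averaged color to key, chord, and melody is defined in [5, 6] and is not reproduced here; the tempo and pan scalings below are hypothetical, and the use of the mido library for MIDI output is an assumption about the software-synthesizer interface.

```python
import time
import mido
import numpy as np

def region_to_parameters(image, region, width):
    """Map gazed-region features to musical parameters (hypothetical scaling).

    averaged color -> key/chord/melody via the correspondence of [5, 6]
                      (not shown here)
    area           -> tempo in BPM
    centroid x     -> stereo pan (MIDI CC 10, 0 = left, 127 = right)
    """
    area = len(region)
    centroid_x = sum(x for x, _ in region) / area
    avg_color = np.mean([image[y, x] for x, y in region], axis=0)
    tempo_bpm = max(60, 180 - area // 500)        # assumed scaling
    pan = int(round(centroid_x / (width - 1) * 127))
    return avg_color, tempo_bpm, pan

def play_note(port, note, pan, tempo_bpm):
    """Send one note of the generated sound sequence to a software synthesizer."""
    port.send(mido.Message('control_change', control=10, value=pan))
    port.send(mido.Message('note_on', note=note, velocity=80))
    time.sleep(60.0 / tempo_bpm)                  # one beat per gazed point
    port.send(mido.Message('note_off', note=note))

# Usage (assumes the synthesizer exposes a MIDI output port):
# port = mido.open_output()
# color, bpm, pan = region_to_parameters(image, region, image.shape[1])
# play_note(port, note=60, pan=pan, tempo_bpm=bpm)  # note chosen from color
```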

3 Experiments

As described in Sect. 1, the authors’ music generation system is expected to prompt the viewer of an abstract painting to imagine abstract images such as those people feel when listening to music. Experiments were conducted to explore whether the proposed system can actually prompt such abstract images. The experiments compare two cases: viewers see abstract paintings while hearing the music generated by the authors’ system, or without hearing any music. Subjects are asked to verbalize what they feel during the experiments, and their utterances are recorded and analyzed by protocol analysis [10]. Details of the experiments are as follows.

3.1 Subjects

Nineteen (19) male and female students participated in the experiments. All of the subjects had normal eyesight and hearing.

3.2 Abstract Paintings

Eight abstract paintings by Kandinsky (Table 1) were used in the experiments.

Table 1. Kandinsky’s abstract paintings presented to subjects

3.3 Equipment

A 15-inch LCD display was used to present each of the eight abstract paintings to each of the 19 subjects. A digital image of each painting was displayed at the center of the display against a black background. A headset (SONY MDR-Z30) was used to let each subject hear the music generated by the authors’ system, and a Tobii REX (Tobii Technology AB) was used as the eye tracker.

3.4 Procedure

The procedure of the experiments is as follows.

  1. Before the experiment, each subject is instructed to express aloud what he/she thinks and how he/she feels, so that the uttered words can be recorded.

  2. Four of the eight abstract paintings are presented to each subject without music, and the other four are presented while playing the music generated from the detected gaze. The display order of the eight paintings and the assignment to the “with music” or “without music” condition are randomized for each subject. Each painting is displayed for at least 30 s, until the subject performs the operation that terminates the display. The subject’s utterances are recorded while the painting is displayed.

  3. Unless the displayed painting is the last one, return to step 2 after a 5 s interval; otherwise, the experiment for that subject ends.

4 Results

4.1 Abstract Image and Concrete Idea

The words uttered by the nineteen subjects were analyzed. This paper defines an “abstract image” as an “image without concreteness”, as distinct from associations with objective and/or realistic things. That is, abstract images include subjective images such as impressions and remarks, as well as vague imaginings that correspond to nouns denoting intangible things. Accordingly, the subjects’ utterances are classified into the following two categories.

  • Category A “Concrete Idea”: Concrete, tangible things such as specific objects.

  • Category B “Abstract Image”: Abstract imaginings or intangible things, expressed for example by adjectives and impressions.

Examples of uttered words classified into Categories A and B are listed in Table 2.

Table 2. Examples of words classified into the two categories (Category A: Concrete Idea; Category B: Abstract Image).

4.2 Numbers of Uttered Words in Categories A and B

To explore whether the authors’ music generation system can enhance viewers’ abstract imagination, we compare the numbers of utterances classified into Category A and Category B in the “without music” and “with music” cases.

The numbers of uttered words in each of the two categories in the “with music” and “without music” cases are shown in Fig. 3. Each number in Fig. 3 is the mean of the total numbers of words uttered by the 19 subjects over the eight abstract paintings. In the “without music” case, the number of utterances classified into Category A is larger than that of Category B, whereas in the “with music (hearing gaze based music)” case, Category B is larger than Category A. A t-test shows that the p-value is less than 0.05 for the difference between the Category A and Category B counts in both the “without music” and the “hearing gaze based music” cases. These results support the claim that the authors’ music generation system can enhance viewers’ abstract images compared with the “without music” situation.
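The comparison can be reproduced, for example, with SciPy. The arrays below are placeholders rather than the experimental data, and the paired form of the t-test is an assumption, since the paper does not state which variant was used.

```python
import numpy as np
from scipy import stats

# Placeholder per-subject counts of Category A and Category B utterances
# in one condition (NOT the actual experimental data).
category_a = np.array([7, 5, 9, 6, 8, 4, 7, 6, 5, 8, 9, 6, 7, 5, 6, 8, 7, 6, 5])
category_b = np.array([4, 3, 6, 2, 5, 3, 4, 5, 3, 4, 6, 3, 5, 2, 4, 5, 4, 3, 2])

# Paired t-test over the 19 subjects (assumed test variant).
t_stat, p_value = stats.ttest_rel(category_a, category_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # difference is significant if p < 0.05
```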

Fig. 3. Mean numbers of uttered words in each category in case of “without music” and “with music (hearing gaze based music)”.

4.3 Numbers of Uttered Words in Each Abstract Painting

The numbers of utterances classified into Category A and Category B for each of the eight abstract paintings are examined in the “without music” and “with music (hearing gaze based music)” cases. Figures 4 and 5 show the mean of the total numbers of words uttered by the 19 subjects for each of the eight paintings in the “without music” and “with music” cases, respectively. As shown in Figs. 4 and 5, except for painting #5, the mean numbers for Category B in the “with music (hearing gaze based music)” case are larger than in the “without music” case. In addition, as shown in Fig. 4, in the “without music” case the mean numbers of utterances in Category B for paintings #5 and #6 are larger than those in Category A.

Fig. 4. “Without music”: Mean numbers of uttered words for each abstract painting.

Fig. 5. “With music (hearing gaze based music)”: Mean numbers of uttered words for each abstract painting.

5 Discussion

As described in Sect. 4.2, in the “without music” case the number of utterances classified into Category A is larger than that of Category B, whereas in the “with music (hearing gaze based music)” case Category B is larger than Category A. This suggests that the authors’ gaze based music generation system can enhance viewers’ abstract imagination.

As described in Sect. 4.3, only for abstract painting #5 is the number of utterances in Category B larger in the “without music” case than in the “with music (hearing gaze based music)” case, while for the other paintings, Category B utterances are larger in the “with music” case. On the other hand, in the “without music” case, only for #5 and #6 are the numbers of uttered words in Category B larger than those in Category A. These phenomena could indicate that the contents and/or features of an abstract painting influence the effect of the authors’ music generation system. In particular, paintings #5 and #6, for which abstract imagination was enhanced even in the “without music” case, depict only circles and only rectangles, respectively, while the other paintings depict various shapes. Compared with the other paintings, these two paintings also use colors with lower brightness and saturation. This could mean that what is depicted in an abstract painting affects viewers’ abstract imagination.

The above results could imply that we need to explore relationships between the contents of abstract paintings and music generated by the authors’ system.

6 Conclusion

This paper has explored the effectiveness of the authors’ gaze based music generation system in prompting abstract imagination in viewers of abstract paintings. Experiments with 19 subjects and eight abstract paintings were conducted for two cases: the subjects viewed the paintings without hearing any music, or while hearing the gaze based music generated by the authors’ system. The experimental results imply that hearing gaze based music can enhance viewers’ abstract imagination. Remaining issues include exploring the relationship between the contents of abstract paintings and the music generated by the authors’ system.