1 Introduction

Onomatopoeia refers to a word that phonetically imitates, resembles, or suggests the sound (or the motion that accompanies the sound) that it describes. Common examples include animal noises such as “oink”, “meow”, “roar” and “chirp” [16]. Onomatopoeia is widely used in printed comics, which have no aural feedback, to enrich the static image panels [4]. However, it is used less in spoken form, especially in the Indo-European languages. The extent of its use differs between languages and cultures; Koreans, in particular, use onomatopoeia very frequently in both written (not only in comics) and spoken forms. It is often shown in captions, even alongside the actual sound, in entertainment shows (in Korea [6]) and games [8] as a way to dramatize, emphasize, exaggerate and draw attention to the situation [19].

On the other hand, one of the important goals in virtual reality (VR) is to provide a rich content experience by giving users a high level of immersion and presence (the feeling of being situated in the virtual environment [5, 13]). A variety of elements can affect and improve the sense of presence, including the first-person viewpoint, visual realism, interaction, shielding from the operating environment, and physical immersion, to name just a few.

Based on such observations and research results, we explore whether the use of onomatopoeia, associated with sound feedback, could bring about a similar effect, i.e. an enhancement of presence and immersion in VR. Showing onomatopoeia is not physically possible in the real world. However, there have been many instances of purely virtual worlds, or of properly mixed-in physically unrealistic effects (e.g. showing the motion profile of a flying ball in a tennis game [1]), that still induce the suspension of disbelief and do not get in the way of, or even enhance, the level of immersion and presence [5, 13]. On the other hand, we can also expect that overusing onomatopoeia can be distracting and too unrealistic, and can break the sense of immersion and illusion [12].

In the following, we present a pilot experiment and its results, comparing users’ subjective experience, presence and immersion in two virtual worlds, each configured in two test conditions: (1) sound feedback without onomatopoeia and (2) sound feedback with onomatopoeia. First, we briefly review related research in the next section.

2 Related Work

Several works have artificially added “text (or glyphs)” and localized “sound word animation” to images [9] and videos [15] with the intent of adding effects such as dynamics and excitement. As for the use of onomatopoeia, perhaps due to cultural orientation, there have been only a few studies, mainly from Japan, and these were applied to images or computer graphics content rather than to immersive virtual reality. For example, Wang et al. developed a method for automatically transforming non-verbal video sounds into animated sound words and positioning them near the sound source objects in the video for visualization. Furthermore, they conducted a user study showing that animated sound words helped clarify the sound source and made watching the video more enjoyable [15]. Yamamoto et al. developed a method to transform environmental sound into the corresponding onomatopoeia and visualized it in various ways (e.g. varying fonts and sizes to reflect sound qualities and loudness) [18]. In a similar vein, Fukusato and Morishima considered the estimation and depiction of onomatopoeia in computer-generated animation based on physical parameters [3]. More recently, Shimoda and Yani applied a deep neural network to label an image with the appropriate onomatopoeia [12].

Among the many presence-enhancing elements, we take note of the role of attention and affordance [10, 11, 17]. The use of onomatopoeia to direct the user’s attention can be regarded as a similar approach. Illusion effects and multimodal effects for enriching VR content are also related approaches [2, 7, 20].

3 Experiment

3.1 Experiment Design

A usability experiment was carried out to assess the projected merits of using onomatopoeia, as an addition to the regular sound effects, in directing user attention, providing affordance, and thereby enhancing immersion and presence. The assessment was made over two different virtual reality scenes. Thus, the two factors were the use of onomatopoeia and the scene type, each with two levels, making the experiment a 2 x 2 within-subject repeated-measures design.

3.2 Experiment Task

The experimental task required the subject to experience two different scenes, “Animal Farm” and “Busy Kitchen” (see Fig. 1), and to assess them guided by a subjective survey. Each scene type was configured in two test conditions: (A) one with ordinary sound effects only (denoted “Sound-only”), e.g. from the animals (cow, sheep, chicken, pig), cooking objects (kettle, boiling pot, operating mixer, baking oven), and other miscellaneous objects (flag on a pole, clock on the wall, running water from a faucet, ringing phone), and (B) the other with the added onomatopoeia (denoted “With-words”). Therefore, as a within-subject experiment, there were four virtual environments for the subject to experience and assess, presented in a balanced order: (1) Farm/Sound-only, (2) Farm/With-words, (3) Kitchen/Sound-only, (4) Kitchen/With-words. To minimize any learning effect, the respective scenes (Farm and Kitchen) were slightly reconfigured between (1) and (2), and between (3) and (4). For the Farm, the number and kinds of animals making sounds were varied and distributed differently (the total staying the same). For the Kitchen, the types of food prepared, several types of cooking apparatus, and the time of day (one for breakfast and the other for supper preparation) were changed. The total number of sound-making objects was about 12 in the Farm and 15 in the Kitchen, distributed evenly throughout the scene (see Fig. 2). For condition (B), the onomatopoeias were positioned right next to or above the pertaining objects, with sufficient contrast for visibility. The actual sound words (onomatopoeia) used in the experiment were all common and registered in the standard Korean dictionary.

Fig. 1. The two test virtual environments: “Animal Farm” (left) and “Busy Kitchen” (right), “Sound-only” (above) and “With-words” (below).

Fig. 2. The locations of the sound-making objects (above: Animal Farm; below: Busy Kitchen).

3.3 Experiment Procedure

Thirty-six people (18 females and 18 males) aged between 19 and 38 (mean = 23.4, SD = 3.3) participated in the study. Each subject experienced and freely navigated around the four test VR scenes (for 5 min each) in an order balanced across the content theme (Farm/Kitchen) and the experimental factor (with or without the onomatopoeia). The subject was told to count the total number of notable objects (e.g. animals, kitchen objects) and of sound-making objects, and also to identify which objects, or how many of them, were making sounds. These measures were collected to evaluate quantitatively the effect of the onomatopoeia on user focus/attention and awareness.
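The paper does not state the exact balancing scheme that was used. As an illustration only, the sketch below shows one common way to balance the presentation order of four conditions over 36 subjects, namely a balanced Latin square; the function names and the assignment of subjects to rows are assumptions made for this example, not the authors' procedure.

```python
# Illustrative sketch: counterbalancing four conditions with a balanced
# Latin square. The scheme and all names here are assumptions; the paper
# only says that the order was "balanced".

CONDITIONS = [
    "Farm/Sound-only",
    "Farm/With-words",
    "Kitchen/Sound-only",
    "Kitchen/With-words",
]


def balanced_latin_square(n):
    """Return n orderings of range(n) for an even n.

    Each condition appears once in every position, and each condition
    immediately precedes every other condition exactly once.
    """
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every further row shifts the first row by one (mod n).
    return [[(c + shift) % n for c in first] for shift in range(n)]


def assign_orders(num_subjects, conditions):
    """Cycle through the rows of the square, one row per subject."""
    square = balanced_latin_square(len(conditions))
    return [
        [conditions[i] for i in square[s % len(square)]]
        for s in range(num_subjects)
    ]


orders = assign_orders(36, CONDITIONS)
for subject_id, order in enumerate(orders[:4], start=1):
    print(subject_id, order)
```

With 36 subjects and four rows, each ordering is used by nine subjects, so every condition is seen equally often in every serial position.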

After experiencing each scene (2 conditions x 2 content types), the subject answered a survey about general usability (fatigue, sickness, naturalness), content perception (liveliness, realism, affordance and attention), presence/immersion, and enjoyment/preference level, all on a 7-point Likert scale (see Table 1). Finally, the subjects were asked about various situations in the respective scenes (see Table 2) and asked to describe the experienced virtual space in words. These descriptions were linguistically analyzed for richness of expression, similar to the approach in [14].

Table 1. The subjective survey assessing a variety of aspects of user experience (all answered on a 7-point Likert scale).
Table 2. Object attention questions (for Busy Kitchen - Breakfast scene).

The testing platform was implemented with Unity3D and run on a desktop PC with the HTC VIVE headset and its interaction controller (for navigation). Further procedural details are omitted due to space restrictions. Our basic expectation was that both quantitative performance (e.g. more accurate assessment of the objects in the scene) and subjective responses (e.g. preference, perceiving the content as rich, and a higher level of awareness, presence, and immersion) would be better when onomatopoeia was added to the sound.

4 Results and Discussion

See Fig. 3.

Fig. 3. The levels of various aspects of user experience for the four conditions: (1) Farm/Sound-only, (2) Farm/With-words, (3) Kitchen/Sound-only, (4) Kitchen/With-words.

4.1 General Usability (Fatigue, Naturalness, Sickness)

ANOVA/t-test was applied for the analysis of these dependent variables (corresponding responses to the survey in Table 1). A statistically significant difference was found in the fatigue level (F1) when onomatopoeias were used (vs. not using) for the Farm scene (t(35) = −2.935, p = 0.006*), but not for the Kitchen. The Farm containing relatively more onomatopoeias per unit area (and thus also for the momentary visual span) seems to have caused this difference.
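For reference, a statistic reported as t(35) is consistent with a paired comparison over 36 subjects (35 degrees of freedom). The sketch below shows how such a paired t statistic can be computed from per-subject ratings. The ratings are hypothetical placeholders rather than the study data, and the subtraction order (Sound-only minus With-words) is an assumption, since the paper does not state which order produced the reported signs.

```python
# Illustrative sketch: paired-samples t statistic for a within-subject
# comparison. The ratings below are hypothetical, NOT the study data.
import math
import statistics


def paired_t(sound_only, with_words):
    """Return (t, degrees_of_freedom) for paired ratings.

    Differences are taken as sound_only - with_words (an assumed
    convention), so a negative t means higher ratings with words.
    """
    if len(sound_only) != len(with_words):
        raise ValueError("both conditions need one rating per subject")
    diffs = [a - b for a, b in zip(sound_only, with_words)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical 7-point fatigue ratings from six subjects.
fatigue_sound_only = [2, 3, 2, 4, 3, 2]
fatigue_with_words = [3, 4, 2, 5, 4, 4]

t_stat, df = paired_t(fatigue_sound_only, fatigue_with_words)
print(f"t({df}) = {t_stat:.3f}")
```

The p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package.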

As expected, subjects tended to feel that the scene was relatively unnatural (N1) when the onomatopoeias were used for the Farm scene, although the difference was only marginal (t(35) = 1.898, p = 0.066); again, this was not the case for the Kitchen. Considering that the overall realism did not suffer despite the use of the sound words (see Sect. 4.2), this result seems to stem from scene- and object-specific characteristics (e.g. animals making occasional sounds/text vs. cooking appliances making constant sounds).

Sickness levels were low (below 2 out of 7) as the virtual environment incurred no substantial navigation (just slow-paced local exploration) and no differences were found between Sound-only and With-words.

4.2 Content Perception (Realism, Liveliness)

We had projected that the additive use of onomatopoeia carried a risk of lowering the realism (R1, R2), since such a thing obviously does not exist in the real world. However, no significant effect was found on this variable. Aside from the usual suspension of disbelief, we attribute this partially to the fact that most people are already accustomed to using and seeing onomatopoeia in everyday life, comics and other forms of media (at least in Korea).

ANOVA/t-test found that users generally felt the scenes to be more dynamic and livelier (D1) when the onomatopoeias were used in the Animal Farm (t(35) = −2.076, p = .045), but not in the Busy Kitchen. Animals, compared to kitchen objects, moved and were expected to move freely, and the onomatopoeia helped the users feel improved dynamics and liveliness. In contrast, the kitchen objects were mostly static to begin with, so it is not surprising that the added sound words were ineffective there. We believe that the dynamism and liveliness can increase further for moving objects if we also animate the words in tune with the object motion. The perceived liveliness might improve even for non-moving objects if the onomatopoeia is animated in sync with the sounds made.

Even though the scenes were not interactive, users reported that they were led to try to interact with the objects in both contents, implying a strong sense of affordance when the onomatopoeia was used. Unfortunately, no statistically significant effects were found for the two relevant questions on object attention and affordance (A1 and A2). However, other measures of object attention did show a statistically significant difference (see Sect. 4.4).

4.3 Immersion and Presence

ANOVA/t-test found a significant difference in the user-felt immersion and presence (P1 and P3 collectively) for the Kitchen scenes (Breakfast: p = .002; Supper: p = .027), but not for the Farm scenes. In the Busy Kitchen scenes, the user-felt immersion and presence were higher when onomatopoeias were used. Note that subjects were observed to feel a substantial level of affordance and interest (in the possibility of interaction) through the onomatopoeia (see Sect. 4.2). This seems to have led the subjects to perceive higher immersion and presence. In the Farm scene, on the other hand, subjects felt that the animals were arranged too densely and that there were too many additive texts per unit area, which raised the level of fatigue and eventually distracted them and broke the sense of immersion/presence. The onomatopoeia also did not make the scene feel interactive because of the very nature of the content type (there is not much to do with animals, whereas kitchen objects have clear functional purposes). In contrast, subjects reported that it was much easier to focus on, attend to and localize the objects in the less crowded Kitchen scene with the onomatopoeia (see Sect. 4.2). In fact, prior research indicates a strong positive correlation between presence and affordance/attention [17], and a negative one between presence and distraction [17].

4.4 Situation Awareness

Situation awareness was evaluated in two ways: (1) the accuracy in counting and naming objects (all objects or the sound-making ones) and (2) a linguistic analysis of the subject’s scene descriptions (evaluated based on the richness of the vocabulary and expressions). In both the Animal Farm and Busy Kitchen scenarios, subjects were generally more accurate in counting and identifying objects and in grasping features of objects when there was onomatopoeia, both for all objects and for the sound-making ones (we omit the exact statistical figures). In fact, when there was no onomatopoeia, the counts were significantly lower than the actual numbers. Although no effect was found for questions A1 and A2 (see Sect. 4.2), based on this result we believe the subjects had an object-wise understanding of the scene.
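Since the exact scoring and statistics are omitted, the following sketch only illustrates one plausible way to score counting accuracy: the signed error (reported count minus actual count, so undercounting is negative) and the absolute error, averaged per condition. The reported counts are hypothetical, and the actual count of 15 simply follows the approximate number of sound-making objects given for the Kitchen.

```python
# Illustrative sketch: scoring object-count accuracy per condition.
# The scoring rule and all reported counts are assumptions for this example.
import statistics


def count_errors(reported_counts, actual_count):
    """Return mean signed error and mean absolute error of reported counts.

    A negative mean signed error indicates undercounting on average.
    """
    signed = [reported - actual_count for reported in reported_counts]
    return {
        "mean_signed_error": statistics.mean(signed),
        "mean_abs_error": statistics.mean(abs(e) for e in signed),
    }


ACTUAL_SOUND_OBJECTS = 15  # approximate Kitchen figure from the text

# Hypothetical counts reported by three subjects in each condition.
reported_sound_only = [11, 12, 14]
reported_with_words = [15, 14, 16]

sound_only_score = count_errors(reported_sound_only, ACTUAL_SOUND_OBJECTS)
with_words_score = count_errors(reported_with_words, ACTUAL_SOUND_OBJECTS)

print("Sound-only:", sound_only_score)
print("With-words:", with_words_score)
```

Because the design is within-subject, per-subject errors from the two conditions could then be compared with a paired test.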

Subjects were also asked to freely write out and describe the scenes they experienced. We applied neuro-linguistic programming to code the subjective experience [14]. In general, the subjects’ descriptions were richer when onomatopoeia accompanied the sound effects. For example, the descriptions were longer and changed from a simple depiction to a more refined story (see Table 3). In addition, parts of the descriptions were imagined, using colorful vocabulary, and the perceptual position changed from the third person to the first (see Table 4).

Table 3. Examples of Farm scene description from a subject (Sound-only and With-words conditions) – originally written in Korean and translated to English.
Table 4. Examples of Kitchen scene description from a subject (Sound-only and With-words conditions) – originally written in Korean and translated to English.

4.5 Enjoyment/Preference

ANOVA/t-test found mixed results for user enjoyment/preference as well. For example, subjects showed a significantly greater willingness/preference for the content with onomatopoeia in the Busy Kitchen, but not in the Animal Farm. As posited, this seems to relate to the difference in object affordance and the attention paid to the objects, which raised the subjects’ interest and desire to interact.

5 Conclusion and Future Work

In this paper, we showed that adding onomatopoeia to the sound feedback could improve the user experience in virtual reality. Our experimental results indicate that the judicious use of onomatopoeia can indeed help direct user attention, offer object affordance and thereby enhance the user experience and even the sense of presence and immersion, without degrading the perceived realism. In summary, the important factors for an effective application of onomatopoeia to sounds were: (1) having a scene (and associated objects) with an interactive theme and a high degree of functionality, and (2) not crowding the scene with too many additive texts.

In the future, we would like to investigate how to present the onomatopoeias in various ways, e.g. varying their position, pose, size, animation and timing, and how these choices might affect the user experience in virtual and mixed reality. Using mimetic words to emphasize object movement is another consideration along this line. The ultimate goal is to derive specific guidelines for utilizing onomatopoeia in VR/MR. Finally, how onomatopoeia can help convey the experience to the hearing-impaired, and how it can be associated with other modalities such as haptic and visual ones (e.g. object animation), are further interesting topics for future work.