1 Introduction

Museums need to guide the behavior of visitors, for example to lead them to appropriate exhibited works or to avoid congestion. Therefore, many guidance systems are used to induce human behavior. Typical examples of guidance systems in public spaces are large-screen displays and loudspeakers for audio announcements. It is now also possible to present information individually through smartphones and various wearable devices. In all of these cases, the user must gather the information that s/he needs in order to select her/his behavior.

Many of these conventional guidance systems rely on words and signals, so their meaning must be interpreted before the appropriate behavior can be selected. Moreover, such systems risk going unnoticed or, conversely, being too intrusive in quiet spaces such as museums. In addition, explicitly presented information can ruin the mood of an exhibition.

Psychological studies have shown that human behavior is altered by ambient information. For example, the lighting of an environment affects the content of conversations and speaking volume [1, 2]. The arrangement of color in a classroom greatly influences students' achievement [3]. It is also becoming clear that environmental lighting and temperature affect the amount of food consumed [4, 5], and that the temperature of an object that people touch affects how graciously they behave [6]. These findings show that seemingly meaningless information greatly affects human emotions, judgment, and behavior [7] and suggest that controlling environmental information can control human emotions and behavior.

In this study, we focused on implicitly inducing human behavior in public spaces with environmental sound, such as environmental noise and background music (BGM). These sounds significantly affect pleasant/unpleasant feelings [8]. We therefore hypothesized that human behavior could be induced implicitly by creating comfortable/uncomfortable sound fields. This effect is evoked without the listener consciously attending to the sound or interpreting its meaning. An induction method using environmental sound can thus balance the competing goals of guidance efficiency and preserving the mood of a museum. Recently, sound presentation systems that can create sound fields within a narrow area have been developed [9–11]. Such systems can be used to create local sound fields that induce appropriate human behavior in appropriate places. In order to create sound fields that divide a space without a physical barrier, we constructed a system that can focus sound onto a narrow area by using directional loudspeakers. Using this system, we conducted an experiment to examine whether it can generate an acoustic field at a narrow target position and how accurately it outputs the intended acoustic pressure. In addition, we tested whether humans were unconsciously guided by this system.

2 Inducing Human Behavior by Presenting Sensory Stimuli

Conventional guidance techniques indicate appropriate routes or behaviors with explicit auditory or visual information, such as characters and symbols. For example, guidance information can be presented through mobile/wearable devices that users carry [12, 13] or through projectors embedded in the environment [14]. While methods that use visual stimuli can present a large amount of information, the user must constantly pay attention to the presented information. Auditory information is often used for announcements in public spaces, including museums. Sound requires less attention than visual information in order to grasp the content. However, it is difficult to present different information to different individuals because the same sound is usually broadcast to the entire space through a loudspeaker.

Intuitive guidance techniques that do not use explicit symbols have also been studied. For example, Yoshikawa et al. proposed a system that directs pedestrians to one side of a passage by creating a vection field with a lenticular lens [15]. However, because this system affects everyone in the space, it is difficult to induce the behavior of only a particular person. Haptic interfaces have also been used to realize intuitive guidance. For example, Amemiya and Maeda proposed a navigation system that uses a perceptual attraction force [16]. Because of the characteristics of human perception, asymmetric acceleration is perceived as a force pulling in one direction; their method exploits this to generate a sustained directional force for navigation by repeating one cycle of motion, even though the device is ungrounded. Narumi et al. constructed a system called Thermotaxis, which controls the positions of people in public spaces with thermal stimulation, based on the finding that temperature sensations are closely related to pleasure-displeasure feelings [17, 18]. Thermotaxis presents a virtual thermal field in which the temperature presented through a wearable device changes according to the position of each user. They showed that the positions of users, and the physical and psychological distance between them, can be controlled with the temperature of the thermal stimuli [19]. In addition to thermal stimuli, haptic stimuli strongly affect emotions. Sakurai et al. proposed a method to evoke multiple emotions by presenting a wide variety of haptic stimuli, such as thermal, vibration, and pressure stimulation [20, 21]. Human behavior changes as a result of the emotional changes that occur in response to various sensory stimuli, because emotional states largely influence judgment and behavior [7].

However, these methods require a wearable device in order to present personalized information or stimuli. Wearing such a device increases the physical load on the user, who may then pay attention to the device itself rather than to the stimuli presented through it.

As an example of a method that induces behavior through other senses, Maeda et al. proposed a system that steers walking courses with electrical vestibular stimulation [22]. The system controls the pedestrian's walking direction by passing a weak electric current between electrodes placed behind the ears, which alters the perceived direction of gravitational acceleration. While this approach does not require the user to direct attention to the stimuli, the induced force is so strong that it is difficult for the user to change direction by themselves. Moreover, the safety of long-term use of this method has not been confirmed.

In this study, we aimed to realize a method for inducing human behavior unconsciously without a wearable device. Assuming use in a museum, we focused on the auditory sense. As mentioned above, auditory information can be conveyed to humans without demanding as much attention as visual information. However, conventional methods that use audition provide explicit information through spoken announcements. Therefore, as with visual symbols, the meaning of the presented information must be interpreted.

However, sounds that carry no explicit meaning have also been shown to greatly influence the pleasant/unpleasant feelings of humans and to have various psychological effects [8]. In recent years, some approaches have tried to influence human behavior based on these findings. For example, there have been attempts to reduce the crime rate by playing classical music in public spaces such as stations [23]. In addition, a high-frequency sound called mosquito noise, which can be heard only by the young, is used in convenience stores to suppress youth misbehavior [24].

Methods to manipulate behavior with sound have also been explored in the field of art. For instance, Hein published a work called "Invisible Labyrinth," in which a participant walks around an exhibition space while wearing headphones [25]. Noise is played through the headphones, and its volume changes according to the participant's position in the space. Because the participants try to avoid hearing the unpleasant noise, they move as if there were an invisible labyrinth in the otherwise empty exhibition space. Although this approach requires wearing a device and does not necessarily operate unconsciously, these observations suggest that it is possible to guide a person to a specific position by affecting their feelings through sound. Based on these observations, and in order to realize a method for inducing the position/movement of users, we propose a system that divides a space into comfortable and uncomfortable areas by controlling the acoustic environment at arbitrary positions in the space.

3 Acoustic Research System for Inducing Human Behavior

3.1 Directional and Local Sound Presentation

Sound consists of vibrations transmitted through the air and normally has no locality. Recently, studies have attempted to synthesize sound fields by presenting sound only at particular locations.

Many studies on synthesizing sound fields in real environments have adopted methods that create focal points of sound by controlling the reproduction delay and amplification of the outputs from linear speaker arrays [9]. These methods can reproduce high-definition sound fields. However, they have several drawbacks: they are very expensive because they require up to a few hundred loudspeakers, and the installation locations of the speaker arrays are limited.
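The focusing principle behind such array-based methods can be sketched briefly: each speaker's output is delayed so that all wavefronts arrive at a chosen focal point at the same time. The following Python snippet is our own illustration of this delay-and-sum idea, not the actual implementation of [9]; the array geometry and speed of sound are assumed values.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s, assumed speed of sound at room temperature


def focusing_delays(speaker_positions, focal_point):
    """Delay-and-sum focusing: delay each speaker so that all outputs
    arrive at the focal point simultaneously (illustrative sketch).

    speaker_positions: (N, 3) array of speaker coordinates in meters.
    focal_point: (3,) target coordinates in meters.
    Returns per-speaker delays in seconds and 1/r amplitude weights.
    """
    speaker_positions = np.asarray(speaker_positions, dtype=float)
    focal_point = np.asarray(focal_point, dtype=float)

    distances = np.linalg.norm(speaker_positions - focal_point, axis=1)
    # The farthest speaker fires first; nearer speakers are delayed so
    # every wavefront reaches the focal point at the same instant.
    delays = (distances.max() - distances) / SOUND_SPEED
    # Simple 1/r compensation so each contribution has a similar amplitude.
    gains = distances.min() / distances
    return delays, gains


# Example: a 16-element linear array along the x-axis with 10-cm spacing,
# focusing 1.5 m in front of the array center (assumed geometry).
speakers = np.array([[0.1 * i, 0.0, 0.0] for i in range(16)])
delays, gains = focusing_delays(speakers, focal_point=[0.75, 1.5, 0.0])
print(np.round(delays * 1e3, 3))  # per-speaker delays in milliseconds
```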

Another way to present sound locally is to use directional loudspeakers, such as parametric speakers. Parametric speakers achieve strong directivity and reflectivity by using ultrasound as the carrier [10]. Ikefuji et al. developed a system for forming stereophonic sound [11] that consists of units containing multiple parametric speakers facing in multiple directions. The ultrasound output from a unit reflects off the floor, wall surfaces, ceiling, and mirrors set in the space and then travels to the listener. The positions where the ultrasound is reflected are perceived as the sound sources. In addition, stereophonic sound can be reproduced by constructing several such audio image locations.
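The reflection-based presentation described in [11] can be illustrated geometrically: treating the floor as a mirror, the point where the ultrasound beam should strike the floor so that its reflection reaches the listener is found from the speaker's mirror image below the floor plane. The sketch below is our own construction with assumed coordinates and a flat-floor assumption, not code from [11].

```python
import numpy as np


def floor_reflection_point(speaker_pos, listener_pos, floor_z=0.0):
    """Find where a beam from the speaker must hit a flat floor (z = floor_z)
    so that the specular reflection reaches the listener.

    Mirror-image method: reflect the speaker across the floor plane and
    intersect the line (mirrored speaker -> listener) with the floor.
    The listener perceives the sound as coming from this reflection point.
    """
    speaker_pos = np.asarray(speaker_pos, dtype=float)
    listener_pos = np.asarray(listener_pos, dtype=float)

    mirrored = speaker_pos.copy()
    mirrored[2] = 2.0 * floor_z - speaker_pos[2]  # reflect z across the floor

    # Parameter of the point on the segment mirrored -> listener where z == floor_z.
    t = (floor_z - mirrored[2]) / (listener_pos[2] - mirrored[2])
    return mirrored + t * (listener_pos - mirrored)


# Example with assumed coordinates (meters): speaker on the ceiling at 2.5 m,
# listener's ears at 1.6 m.
print(floor_reflection_point([0.0, 0.0, 2.5], [2.0, 1.0, 1.6]))
```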

Some studies have used the high directivity of sound irradiated from parametric speakers to provide new experiences. For example, Kimura et al. produced VITA, a space-filling display system that visualizes sound beams from a unit containing multiple parametric speakers, enabling various spatial sound interactions with visual feedback [26]. Ueta et al. proposed a system named Juke Cylinder that makes users feel as if their body were made up of various instruments by irradiating the sound of the instruments onto the user's hand from parametric speakers incorporated in the chassis of the system [27].

The high directivity of parametric speakers enables local presentation of sound. Because of their low cost and improving acoustic quality, parametric speakers are commonly used for movement guidance in stations and for descriptions of exhibited works in museums. In these cases, the direction of the parametric speakers is fixed so that sound is presented in a certain direction. If the direction of the parametric speakers is controlled dynamically, sound can be presented locally anywhere. This approach has the advantages of low cost and high flexibility for use in different situations.

Based on the studies and considerations described above, we constructed an acoustic augmented reality (AR) system that induces human behavior with parametric speakers.

3.2 Acoustic AR System to Create a Local Sound Field

We propose a system capable of making the listener feel that there is an acoustic field by presenting different sounds with parametric speakers according to the position of the listener (Fig. 1). The system consists of two parametric speakers (Tristate Inc.), two pan-tilt camera platforms, a depth sensor (Microsoft Kinect), an audio interface with A/D and D/A conversion (M-AUDIO Profire 610), and a laptop PC. The target area where the sound was presented was sandwiched between the two parametric speakers, which were mounted on the ceiling. Each parametric speaker rested on a pan-tilt camera platform whose direction was controlled with servo motors (Futaba FP-S3101). Hereafter, a speaker and its platform are collectively called a speaker unit. The speaker units were connected by wire to the audio interface attached to the PC.

Fig. 1. The configuration of the acoustic AR system

The system turns the parametric speakers toward the listener's head in order to present sound depending on the listener's position. First, the depth sensor detects the position of the listener's head. Next, the angle of each camera platform is adjusted according to the detected coordinates so that the parametric speakers face the listener's head. The audio source from the audio interface is branched and sent to each speaker unit, and the type of sound and the sound pressure of each output are controlled depending on the position of the listener. In this way, by changing the type or loudness of the sound according to the listener's position, the system makes the listener feel that there is a sound field as he/she moves within the target space.
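A minimal sketch of this control loop, under our own assumptions, is shown below. The functions read_head_position(), set_servo_angles(), and set_channel_gain() are hypothetical stand-ins for the actual Kinect, servo, and audio-interface drivers, which we do not reproduce; the speaker coordinates are also assumed.

```python
import math

# Assumed ceiling positions of the two speaker units in meters (illustrative).
SPEAKER_POSITIONS = [(0.0, 0.0, 2.5), (2.0, 0.0, 2.5)]


def pan_tilt_angles(speaker_pos, head_pos):
    """Pan (azimuth) and tilt (elevation) angles, in degrees, that aim a
    ceiling-mounted speaker at the listener's head."""
    dx = head_pos[0] - speaker_pos[0]
    dy = head_pos[1] - speaker_pos[1]
    dz = head_pos[2] - speaker_pos[2]  # negative, since the head is below the speaker
    pan = math.degrees(math.atan2(dy, dx))
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan, tilt


# Hypothetical main loop (the driver calls are placeholders, not real APIs):
# while True:
#     head = read_head_position()              # (x, y, z) from the depth sensor
#     for i, speaker in enumerate(SPEAKER_POSITIONS):
#         pan, tilt = pan_tilt_angles(speaker, head)
#         set_servo_angles(i, pan, tilt)       # orient the pan-tilt platform
#         set_channel_gain(i, gain_for(head))  # choose sound/loudness by position

print(pan_tilt_angles(SPEAKER_POSITIONS[0], (1.0, 1.0, 1.6)))
```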

3.3 An Experiment to Investigate the Effectiveness of the AR System for Generating Virtual Sound Fields in a Space

In order to test whether our proposed system can generate several virtual sound fields in which different sounds are provided to individuals sharing a space, we evaluated its ability to generate an acoustic field at a narrow target position. In addition, we evaluated how well the generated acoustic field was localized in order to investigate whether the system can confine the acoustic field to the target position.

Figure 2 shows the setup of the performance evaluation. The target area for generating the sound fields was set to a 200-cm square between the speaker units. Sound pressure was measured at 35 points spaced at 50-cm intervals vertically and horizontally within the target area; hereafter, these 35 points are called measuring points. Ultrasound was irradiated toward 9 of these points, spaced at 100-cm intervals vertically and horizontally; hereafter, these 9 points are called target points. While sound was output toward each target point, the sound pressure was measured at every measuring point. Through this procedure, we examined whether the sound pressure increased locally only at the intended target point. Table 1 shows the equipment and conditions adopted in this experiment.

Fig. 2. Experimental setup

Table 1. Equipment and conditions adopted in this experiment

3.4 Results and Discussion

Figure 3 shows the sound pressure level distribution for each of the 9 target points, obtained by linearly interpolating the sound pressure values at the 35 measuring points. The average sound pressure at the 9 target points was 55 dB.

Fig. 3. Sound pressure level distribution during sound output to each target point using the constructed acoustic AR system
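As a rough illustration of how such a distribution map can be produced, the following sketch (our own, not the authors' analysis code) linearly interpolates sound pressure levels measured on a coarse grid onto a finer grid; a 5 × 5 example grid and placeholder SPL values are assumed, not the measured data.

```python
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

# Assumed example: measuring points on a 50-cm grid with placeholder
# sound pressure levels in dB (NOT the actual measured data).
points = np.array([(x, y) for x in range(0, 201, 50) for y in range(0, 201, 50)])
spl = 45.0 + 10.0 * np.exp(-((points[:, 0] - 100) ** 2 +
                             (points[:, 1] - 100) ** 2) / (2 * 60.0 ** 2))

# Linear interpolation onto a 1-cm grid for visualization.
xi, yi = np.meshgrid(np.arange(0, 201), np.arange(0, 201))
zi = griddata(points, spl, (xi, yi), method="linear")

plt.imshow(zi, origin="lower", extent=(0, 200, 0, 200), cmap="viridis")
plt.colorbar(label="Sound pressure level [dB]")
plt.xlabel("x [cm]")
plt.ylabel("y [cm]")
plt.title("Interpolated sound pressure distribution (illustrative)")
plt.show()
```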

The average sound pressure at positions 100 cm away from each target point was 47 dB, a drop of 8.0 dB on average. According to an index of acoustic perception, a sound of about 55 dB is perceived as loud and annoying, whereas a sound of about 45 dB can be heard but is not bothersome [28]. Based on this index, our system could present different sounds to individuals standing about 100 cm apart.

The sound pressure changed more gradually in the x-direction of the experimental area than in the y-direction. We attribute this to the shallow incident angle of the presented sound waves with respect to the x-y plane, which caused the sound to spread over target points adjacent in the x-direction. Mounting the speakers at a higher position would mitigate this problem, as the simple calculation below illustrates.
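This is a small trigonometric sketch of our own; the speaker heights and horizontal distances are assumed values used only to show that raising the speaker steepens the incident angle and thus narrows the beam footprint around the target point.

```python
import math


def incident_angle_deg(speaker_height_m, horizontal_distance_m):
    """Angle between the sound beam and the floor (x-y plane) for a
    speaker aimed at a point on the floor."""
    return math.degrees(math.atan2(speaker_height_m, horizontal_distance_m))


# Assumed dimensions: a higher mounting position yields a steeper incident
# angle, confining the beam footprint more tightly around the target point.
print(incident_angle_deg(2.0, 2.0))  # ~45 degrees
print(incident_angle_deg(3.0, 2.0))  # ~56 degrees
```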

The maximum difference among the sound pressures at the 9 target points was 4.7 dB, indicating that the variation in sound pressure across target points fell within this value anywhere in the area. Based on Fig. 3, possible reasons for this variation are the distance-dependent decay of the parametric speakers' output and misalignment in the directional control of the camera platforms. The former could be addressed by placing the parametric speakers so that the distances to the target points do not differ greatly; the latter could be mitigated by more accurate calibration or angle control of the camera platforms.

The results suggested the effectiveness of our system in generating a virtual sound field.

4 Inducing Human Behavior with the Acoustic AR System

4.1 Experimental Hypothesis, Settings, and Procedures

We investigated whether human behavior can be induced by dividing a space into comfortable/uncomfortable areas with the acoustic AR system. In this experiment, the system divided a space into a high sound pressure area and a low sound pressure area, making people in the space feel that the two areas are different places. Sounds are perceived as pleasant or unpleasant through psychological effects that differ among people [8, 29], and whether the presented sound is pleasant or unpleasant also affects human behavior and behavioral cognition [30, 31]. These findings suggest that, when a pleasant sound is presented, the high sound pressure area is relatively comfortable, whereas, when an unpleasant sound is presented, the low sound pressure area is relatively comfortable. We hypothesized that humans stay in these comfortable areas longer than in the other areas.

Figure 4 shows the environment of this experiment. A space in front of four pictures, which were projected by a single-focus projector, was set as the movement area for a participant. The details of the four pictures are described later. The movement area was divided into 7 areas (hereafter referred to as point-areas) at 50-cm intervals in the horizontal direction, and the time that the participant stayed in each area was measured.
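A minimal sketch of how a participant's position can be mapped to a point-area and to a high or low sound pressure output is given below; the point-area indices designated as low pressure and the gain values are illustrative assumptions, not the exact settings used in the experiment.

```python
def point_area(x_cm, area_width_cm=50.0, num_areas=7):
    """Map the participant's horizontal position (cm from the left edge of
    the movement area) to one of the 7 point-areas (indices 0-6)."""
    idx = int(x_cm // area_width_cm)
    return max(0, min(num_areas - 1, idx))


def playback_gain(x_cm, low_pressure_areas):
    """Return a low output gain inside the designated low sound pressure
    point-areas and a high gain elsewhere (gain values are illustrative)."""
    return 0.1 if point_area(x_cm) in low_pressure_areas else 1.0


# Example: assuming one distribution places the low pressure region at
# point-areas 2-4 (an illustrative choice, not the actual distribution).
print(playback_gain(130.0, low_pressure_areas={2, 3, 4}))  # area 2 -> 0.1
print(playback_gain(300.0, low_pressure_areas={2, 3, 4}))  # area 6 -> 1.0
```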

Fig. 4. Experimental environment for inducing the position of participants (Left: conditions for sound fields; Right: experimental environment)

This experiment used the following two types of sound to induce the position of the participant: white noise, which was expected to be perceived as unpleasant (hereafter referred to as the WN condition), and jazz music, which was expected to be perceived as pleasant (hereafter referred to as the BGM condition). In addition, a silent condition was set up for comparison with the two types of sound (hereafter referred to as the S condition). Table 2 shows the conditions of each sound.

Table 2. Experimental conditions for sounds used in the experiment

Figure 4 (left) shows the sound field distributions used in the experiment.

In order to prevent the participants from guessing the purpose of this experiment, they were given a dummy task: viewing four pictures of tarot cards and choosing their favorite and second favorite pictures. This task was set so that the participants would move around the narrow experimental space over repeated trials without questioning the procedure. Sixty-eight tarot card designs were used so that preferences would be dispersed across pictures of comparable visual quality.

Each participant performed the above task in 15 trials in total: 6 trials in the WN condition (2 trials for each sound field distribution shown in Fig. 4), 6 trials in the BGM condition (2 trials for each distribution), and 3 trials in the S condition. Each trial lasted 1 min, and the time that the participant stayed in each area was measured. After the 15 trials, the participants listened through headphones to the noise and jazz music used in the experiment and rated the comfort of each sound on a 7-point Likert scale. In addition, they were asked whether they had noticed any difference among the trials. The participants were 9 males and 1 female, all in their twenties.

4.2 Results

We investigated the effect of each acoustic space on inducing human behavior in the WN and BGM conditions. For each participant, we calculated the average time spent in the low sound pressure area over the 6 trials of the WN condition and over the 6 trials of the BGM condition, as well as the average time spent in point-areas 3, 4, and 5 over the 3 trials of the S condition. These per-participant averages were then averaged over all participants (Fig. 5).

Fig. 5. The average staying time in the low sound pressure area in each sound condition

The average staying times were as follows: 35.8 s (WN condition), 25.7 s (BGM condition), and 26.5 s (S condition). t-Tests showed a marginally significant difference between the average staying times under the WN and S conditions (p < 0.10). There was no significant difference between the BGM and S conditions (p = 0.31).
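For reference, such a comparison can be reproduced with a paired t-test over the per-participant average staying times, as in the sketch below; the arrays hold hypothetical values for 10 participants (the actual data, and whether the original test was paired, are not given here).

```python
from scipy import stats

# Hypothetical per-participant average staying times [s] in the low sound
# pressure area (10 participants); these are NOT the actual experimental data.
wn_times = [38.1, 33.5, 40.2, 29.8, 36.4, 34.0, 41.3, 30.9, 37.5, 36.3]
s_times = [27.0, 25.1, 29.3, 24.8, 26.0, 28.2, 25.5, 27.4, 26.1, 25.6]

# Paired comparison: every participant experienced both conditions.
t_stat, p_value = stats.ttest_rel(wn_times, s_times)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```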

The participants rated the white noise used in the WN condition as rather uncomfortable (average score 2.8) and the jazz music used in the BGM condition as rather comfortable (average score 5.3). According to the questionnaire administered after all trials, 7 participants noticed the difference in sound pressure, and 2 participants vaguely felt that the sound pressure differed. In addition, 1 participant reported having moved to the low sound pressure area while viewing the presented pictures.

4.3 Discussion

This study showed that the time spent in the low sound pressure area increased in the WN condition. Because the white noise was rated as uncomfortable, this result is consistent with the hypothesis that participants felt relatively comfortable in the low sound pressure area while the uncomfortable sound was presented and therefore stayed there longer.

In contrast, the average time spent in the low sound pressure area decreased by about 1 s in the BGM condition. Because the jazz music was rated as comfortable, this tendency is consistent with the hypothesis that participants would spend less time in the low sound pressure area when a comfortable sound was presented. However, the difference in staying time was not significant. This might be because the difference in the presented sound pressure was not large enough to weaken the pleasant feeling evoked by the comfortable BGM in the low sound pressure area.

For a method that induces behavior with comfortable sounds, selecting the type of sound according to the context of the user's experience is important. In this experiment, the sounds were not related to the place or the setting. Specific sounds are preferred in specific contexts: for example, loud techno would be preferred in a discotheque, whereas calm classical music would suit an expensive restaurant. Similarly, whether the presented sound matches the context of the experience could influence how comfortable it feels.

Moreover, we attempted to induce human behavior with only two sound pressure levels that differed greatly from each other. However, as the above examples suggest, high sound pressure does not always produce feelings of comfort. It is possible that grading the presented sound pressure into several levels could guide the user to the area with the most appropriate sound pressure.

The size of the sound fields could also affect the induction of human behavior. Because each area in the movement space was only 50 cm wide, participants may have passed through a low sound pressure area without noticing the change in sound pressure relative to the surrounding areas, depending on their walking pace. Therefore, how the size of the acoustic field affects the induction of human behavior with our proposed method needs to be investigated.

5 Conclusion

In this paper, we proposed a method to induce human behavior, without requiring the interpretation of presented information, by dividing a space into a comfortable area and an uncomfortable area with acoustic field generation techniques.

First, we constructed a nonwearable system that presents sound locally and creates virtual acoustic fields by using multiple directional loudspeakers. We tested whether the system could generate acoustic fields at narrow target positions. To confirm that the intended sound field could be generated and controlled at each location, we also examined how well the generated acoustic field was localized by comparing acoustic pressure measurements at the target position and at other positions. This test showed that the sound pressure dropped by 8.0 dB on average at a distance of 100 cm from the location where the sound was presented, and that the variation in sound pressure across target points remained small throughout the 200-cm square area. These results suggest that the system can present sound locally with accurate control of acoustic pressure.

Next, we investigated whether dividing a space into comfortable/uncomfortable areas with the acoustic AR system induced human behavior. The time that participants stayed in the low sound pressure area increased when an uncomfortable sound was presented. This suggests that our system could change the behavior of visitors and increase their sojourn time at a target position. Although no inducing effect was observed when a comfortable sound was presented, human behavior could possibly be induced by tuning the conditions of the acoustic field or by considering the relationship between the users' experience and the sound.

Although this study targeted a confined narrow area, our proposed method has the potential to induce not only the position of visitors but also their direction and trajectory. In addition, by adapting the method to multiple users, it could interactively change the distance between people. In the future, we will examine whether the system can induce human behavior in a highly public area, such as an exhibition space, by constructing acoustic fields that suit the exhibition content.