1 Introduction

Traditional human–computer interaction (HCI) studies have treated users as rational individuals who can set aside their emotions to interact efficiently and rationally with computers. However, a growing number of psychology and technology researchers hold a different view: emotion plays a critical role in every human–computer-related activity [1,2,3]. Moreover, not only can HCI itself induce different emotional states in users, but environmental characteristics can also convey emotional information (e.g., pleasant sounds make users feel pleasant). Researchers have found that emotional information from the surrounding environment can affect basic cognitive activities, such as early visual perception [4], as well as high-level cognitive processing, such as problem solving [5], sentence understanding [6], and purchase decisions [7]. The extensive influence of emotional information on cognitive activities implies that common HCI behaviors can also be affected by emotion, because HCI activities involve many kinds of simple and complex cognitive processes. Taking web searching as an example, when individuals search for information online, they first have to detect the information, identify it, and then select the relevant information. Researchers have found that information selection can be strongly affected by emotional contexts [8]. For example, under negative emotional contexts individuals were more likely to perceive negative information [9]. Nevertheless, it is still unclear whether such a modulation effect can occur in the initial stage of the searching process (e.g., information detection). Hence, in the present study we conducted three cognitive experiments to test how emotional information from the surrounding environment affects information detection performance.

2 Related Research

2.1 Emotion Classification

Emotions are the mechanism by which individuals respond appropriately to external inputs and internal changes. Some researchers believe there are six basic emotions (fear, anger, sadness, joy, disgust, and surprise) that are shared by all humans [10]. Psychologists and neuroscientists have explored many methods to test whether different types of emotion, such as fear or happiness, are processed identically, but the results are rather inconsistent [11,12,13]. Some studies found that negative information, e.g., fearful or angry expressions, is processed more effectively than neutral or happy information [11, 14], while others supported the superiority of happiness [12].

In contrast to the basic-emotion classification, researchers have also described emotions along continuous dimensions, such as valence and arousal [15]. Emotional valence refers to the extent to which a stimulus is perceived as positive or negative, and arousal corresponds to the intensity of the stimulus. Generally, valence determines the direction of behavior, either approach or withdrawal, while arousal may strengthen that behavior. Researchers have proposed that emotional processing is a bimodal process [16] and that sounds and pictures can evoke the same emotional responses. Therefore, the present study tested whether emotional contexts created by sounds or pictures have the same influence on the information detection process.

2.2 Emotion Affects Information Processing

Emotion affects information processing in myriad ways. Researchers have found that emotional information itself can affect how quickly we gain access to it [17, 18]. For example, information with a highly negative valence, such as a fearful face, can be detected more quickly than information with neutral or positive valence [18]. Similar findings were obtained when individuals were required to search for a fearful face among a number of neutral or happy faces [13]. Researchers have explained this phenomenon with threat superiority theory, according to which threat-related information is important for survival and therefore has priority in processing. Moreover, studies have further indicated that an emotional stimulus can affect the processing of information presented shortly afterward [4]. For example, researchers applying a rapid serial visual presentation (RSVP) paradigm found that, when an emotional stimulus was presented as a distractor, the target that followed it could hardly be detected [19]. Such impairments in information processing were also found in information identification or recognition tasks [20, 21] and persisted even when the emotional stimulus had no relationship to the task goals [19]. The explanation researchers offered was that not enough attention could be allocated to subsequently presented targets, because attention was first captured by the preceding emotional stimuli. However, how emotional contexts affect information detection when visual information is presented briefly and requires little attention has received little investigation.

On the other hand, researchers have also found that the way emotional contexts modulate subsequent task performance depends largely on the emotional valence of the contexts. For example, individuals under negative emotional contexts (e.g., a gossip condition) but not neutral contexts are more likely to detect negative information [9]. Nevertheless, research on how emotional information acts on visual perception has produced inconsistent results. Some researchers found that neutral information can facilitate a subsequent visual perception task [22], while others reported that fearful or happy stimuli can boost subsequent visual processing [4, 23]. These contradictory findings reveal the need to test how different emotional contexts, e.g., neutral or fearful, act on visual detection.

2.3 Present Study

Here, the aim was to test how environmental characteristics (i.e., emotional contexts) influence common HCI behaviors such as information detection. In the present study, three psychological experiments were conducted to examine how emotional contexts affect the subsequent detection of briefly presented visual information, e.g., face images. In all experiments, after exposure to emotional contexts created by ambient sounds or pictures, participants were required to detect a face image presented briefly and followed by colorful patterns serving as mask stimuli. Three questions were raised.

  1. Can emotional contexts affect face detection performance when the emotional contexts are not related to the task goals?
  2. Can emotional contexts created by ambient sounds or pictures play the same role in detection performance?
  3. Can different emotional contexts, e.g., neutral or fearful, have the same influence on detection performance?

3 Method

3.1 Participants

A total of 30 participants (mean age = 24.34 years) took part in the study: 10 (7 female, 3 male) in Experiment 1, 10 (8 female, 2 male) in Experiment 2, and 10 (6 female, 4 male) in Experiment 3. None declared any hearing impairments, and all had normal or corrected-to-normal vision. All participants were naive to the purpose of the experiments and were paid after completing them.

3.2 Stimuli

All stimuli were presented using MATLAB (The MathWorks, Natick, MA) together with the Psychophysics Toolbox extensions. Face stimuli were photographs of 8 actors (4 male and 4 female) with the most recognizable expressions, selected from the NimStim face-stimulus set. Both neutral and fearful expressions were used in Experiment 1; only neutral expressions were used in Experiments 2 and 3. All hair and nonfacial features were removed, leaving only the central face area (1.53° × 1.86°). The face image was placed at a random position along an imaginary ring 2° from fixation. Sound stimuli in Experiment 1 were 12 human voices speaking the vowel /a/ to express fearful or neutral emotions. Sound stimuli in Experiment 2 were 12 digital music samples with fearful or neutral emotions, taken from the Chinese sound database. All sounds were presented at a sample rate of 44.1 kHz. Eight pictures selected from the International Affective Picture System (IAPS) were used as emotional stimuli in Experiment 3. The visual stimuli used to mask the face images were scrambled colorful patterns subtending 6° × 6°. All stimuli were presented against a gray background at a viewing distance of 58.5 cm.
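For readers reproducing the display geometry, the sketch below shows one way to convert the visual angles listed above into pixel sizes given the 58.5 cm viewing distance; the physical screen width and resolution are hypothetical placeholders, not values reported in the study.

    % Minimal sketch: convert visual angle (deg) to pixels for stimulus sizing.
    viewDistCm    = 58.5;   % viewing distance reported in the study
    screenWidthCm = 40;     % hypothetical physical screen width
    screenWidthPx = 1280;   % hypothetical horizontal resolution
    pxPerCm = screenWidthPx / screenWidthCm;

    deg2px = @(deg) round(2 * viewDistCm * tand(deg / 2) * pxPerCm);

    faceWidthPx    = deg2px(1.53);  % face area width  (1.53 deg)
    faceHeightPx   = deg2px(1.86);  % face area height (1.86 deg)
    eccentricityPx = deg2px(2);     % radius of the imaginary ring around fixation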

3.3 Procedure and Data Analysis

Experiment 1.

Each trial began with a human voice lasting 600–800 ms; participants were required only to listen passively to the sound while viewing the central fixation mark (0.98° × 0.98°). Then, 100 ms after the sound, a face image was presented for 15 ms, followed by a mask stimulus for 150 ms. When the mask disappeared, the central fixation mark was presented on the screen, and observers pressed the left or right arrow key to indicate whether a human face had been presented before the mask. Observers had to respond within 1500 ms. The intertrial interval was 800–1000 ms (see Fig. 1).

Fig. 1. Illustration of the stimulus sequence in Experiment 1
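To make the trial timeline concrete, the following is a simplified single-trial sketch using standard Psychophysics Toolbox calls. Variables such as win, pahandle, voiceWave, soundDur, faceTex, maskTex, faceRect, fixX, and fixY are assumed to have been set up beforehand, and the WaitSecs-based timing is only approximate (a frame-accurate implementation would schedule flips against the display refresh).

    % One trial of Experiment 1, simplified (assumes prior setup with
    % Screen('OpenWindow'), PsychPortAudio('Open'), Screen('MakeTexture')).
    PsychPortAudio('FillBuffer', pahandle, voiceWave);   % fearful or neutral voice
    PsychPortAudio('Start', pahandle);                   % play the 600-800 ms sound
    WaitSecs(soundDur + 0.100);                          % 100 ms gap after the sound

    Screen('DrawTexture', win, faceTex, [], faceRect);   % face at random ring position
    Screen('Flip', win);
    WaitSecs(0.015);                                     % ~15 ms face presentation

    Screen('DrawTexture', win, maskTex);                 % colorful pattern mask
    Screen('Flip', win);
    WaitSecs(0.150);                                     % 150 ms mask

    Screen('DrawText', win, '+', fixX, fixY);            % fixation during response
    onset = Screen('Flip', win);
    resp = NaN;
    while GetSecs - onset < 1.5                          % respond within 1500 ms
        [down, ~, keyCode] = KbCheck;
        if down && keyCode(KbName('LeftArrow')),  resp = 1; break; end
        if down && keyCode(KbName('RightArrow')), resp = 0; break; end
    end
    WaitSecs(0.8 + 0.2 * rand);                          % 800-1000 ms intertrial interval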

Each participant completed 192 trials, 48 in each of four conditions created by crossing the emotional valence of the human voice (fearful vs. neutral) with the emotional valence of the to-be-detected face (fearful vs. neutral). In each condition, 24 trials served as catch trials, in which scrambled face images rather than human faces were presented for 15 ms. All trials were presented in a new random order for each participant (a sketch of the trial-list construction follows).
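As an illustration of this 2 × 2 design with catch trials, a randomized trial list for one participant could be built as follows; the column coding (0/1) and variable names are hypothetical.

    % Build the 192-trial list: 2 (voice: 0 = neutral, 1 = fear) x
    % 2 (face: 0 = neutral, 1 = fear) x 48 trials, half of them catch trials
    % in which a scrambled face replaces the intact face.
    [voice, face, isCatch] = ndgrid(0:1, 0:1, [zeros(1,24) ones(1,24)]);
    trials = [voice(:) face(:) isCatch(:)];        % 192 x 3 design matrix
    trials = trials(randperm(size(trials,1)), :);  % new random order per participant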

Experiment 2.

The procedure in Experiment 2 was the same as in Experiment 1, except that digital music rather than human voices was used as the sound stimuli and only neutral faces were presented. Participants completed 96 trials; half were presented with fearful sounds and the other half with neutral sounds.

Experiment 3.

The procedure for Experiment 3 was the same as that for Experiment 1, except that emotional pictures rather than human voices were presented as the emotional context stimuli. As in Experiment 2, only neutral faces were presented. Participants again completed 96 trials; half were presented with fearful pictures and the other half with neutral pictures. In both Experiments 2 and 3, the number of catch trials per condition was half that in Experiment 1.

For all experiments, observers’ response times and sensitivity (A′) were collected for further analysis. A′ was calculated from observers’ hit rate (H) and false-alarm rate (F) in each test condition using the following equation [24].

$$ A^{\prime} = 0.5 + \frac{(H - F)(1 + H - F)}{4H(1 - F)} $$
(1)
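A minimal sketch of Eq. (1) in MATLAB, computing A′ from a condition's hit and false-alarm rates; the example rates are hypothetical, and this form of the formula assumes H ≥ F.

    % Sensitivity A' from hit rate H and false-alarm rate F, following Eq. (1).
    aprime = @(H, F) 0.5 + ((H - F) .* (1 + H - F)) ./ (4 .* H .* (1 - F));

    aprime(0.85, 0.20)   % example: H = .85, F = .20 -> A' of roughly 0.89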

After finishing the experimental task, participants rated the emotional sounds (Experiments 1 and 2) or pictures (Experiment 3) on 9-point scales of emotional valence and arousal. For valence, 1 represented the most negative emotion and 9 the most positive; for arousal, 1 represented the lowest arousal level and 9 the highest.

4 Results

In Experiment 1, A′ values were entered into a 2 × 2 repeated-measures analysis of variance with the emotional valence of the face (fearful vs. neutral) and of the human voice (fearful vs. neutral) as within-subjects factors. Results revealed a significant interaction effect, F(1, 9) = 8.30, p = .018, ηp² = .48. Simple-effect analysis further showed that, when neutral faces were masked by colorful patterns, observers’ sensitivity was higher when fearful voices were presented than when neutral voices were presented, t(9) = 3.46, p = .007, d = 1.10. However, when fearful faces were masked by colorful patterns, observers’ sensitivity did not differ between the fearful-voice and neutral-voice conditions, t(9) = .46, p = .653. Furthermore, we computed Pearson’s correlations between the sensitivity differences for neutral faces (fearful-voice condition minus neutral-voice condition) and the rating differences in emotional valence (fearful voices minus neutral voices). The results revealed a marginally significant negative correlation, r = −.56, p = .092, suggesting that the larger the difference in valence ratings, the smaller the sensitivity difference between neutral faces following fearful voices and those following neutral voices. A similar Pearson’s correlation between the sensitivity differences for neutral faces and the rating differences in arousal was not significant, r = .23, p = .415. When response times were entered into the same analysis, no main effects or interaction effects were found, all ps > .05 (see Table 1). The results of Experiment 1 implied that only neutral face detection performance could be modulated by the emotional contexts, e.g., fearful human voices. Therefore, Experiments 2 and 3 examined whether emotional contexts created by digital music or pictures influence neutral face detection in the same way as human voices.
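A sketch of this analysis is given below, assuming Aprime is a 10 × 4 per-participant matrix of A′ values and valenceDiff is a 10 × 1 vector of valence-rating differences; the variable names, column order, and table layout are hypothetical, and the repeated-measures functions require the Statistics and Machine Learning Toolbox.

    % Columns of Aprime (assumed): FF = fear voice/fear face, FN = fear voice/neutral face,
    % NF = neutral voice/fear face, NN = neutral voice/neutral face.
    tbl    = array2table(Aprime, 'VariableNames', {'FF','FN','NF','NN'});
    within = table(categorical({'fear';'fear';'neutral';'neutral'}), ...
                   categorical({'fear';'neutral';'fear';'neutral'}), ...
                   'VariableNames', {'Voice','Face'});
    rm = fitrm(tbl, 'FF-NN ~ 1', 'WithinDesign', within);
    ranova(rm, 'WithinModel', 'Voice*Face')              % 2 x 2 repeated-measures ANOVA

    [~, p, ~, stats] = ttest(Aprime(:,2), Aprime(:,4));  % simple effect: neutral faces,
                                                         % fearful vs. neutral voices
    [r, pr] = corr(Aprime(:,2) - Aprime(:,4), valenceDiff); % Pearson correlation with
                                                            % valence-rating differences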

Table 1. Mean RTs and A’ results across three experiments

In Experiment 2, observers’ response times and A′ were compared between the fearful-sound and neutral-sound conditions, and no significant differences were found, t(9) = .42, p = .684 for response times and t(9) = .68, p = .513 for A′. Although observers’ subjective rating differences in emotional valence (neutral sounds minus fearful sounds) were significantly larger for the digital music than for the human voices, p = .013, no emotional modulation effect on response times or sensitivity for neutral faces was found.

In Experiment 3, detection times for neutral faces following the fearful pictures did not differ from those following the neutral pictures, t(9) = −.81, p = .439. Similarly, observers’ sensitivities for neutral faces did not differ, t(9) = .34, p = .741. Again, although observers’ subjective rating differences in emotional valence (neutral stimuli minus fearful stimuli) were significantly larger for the emotional pictures than for the human voices, p < .001, we still failed to find an emotional modulation effect on observers’ detection of neutral faces.

5 Discussion

Previous studies have suggested that threat-related information, because of its significance for survival, is processed with priority [13, 14, 18]. In this study, we found that emotional contexts created by ambient sounds could affect the visual detection of face images presented shortly after the sounds. This modulation effect was specific to neutral, not fearful, face images. Moreover, it existed only when human voices, but not digital music, were provided, suggesting a stimulus specificity of emotional modulation of face detection. Our study extends previous findings about emotional contexts acting on simple feature detection [4] and further indicates that emotional contexts can affect the detection of complex stimuli (e.g., face images), even when the emotional contexts have no direct relationship with the current goals.

In the present study, we found that emotional contexts, especially ambient sounds conveying fear, enhanced rather than impaired neutral face detection sensitivity. A possible reason is that the face presentation time was too short for participants to pay much attention to the masked face; thus, whether or not attention was captured by the emotional stimuli could not affect subsequent visual processing. Furthermore, neuroimaging studies have found that, when one processes emotional information, especially threat-related information, the amygdala (a brain area responsible for emotion processing) is activated automatically [24]. Studies have further found that the amygdala has bidirectional connections with visual processing areas, such as face-processing areas, e.g., the fusiform areas [25]. Thus, fearful ambient sounds may activate the amygdala, which sends feedback to the face-processing areas and enhances the neural representations of the faces. However, no sensitivity enhancement for fearful faces was found in Experiment 1, possibly because individuals are typically already good at detecting fearful information owing to its significance for survival [13, 14, 18]. Therefore, it is possible that fearful face detection performance cannot be further enhanced by emotional contexts such as fearful ambient sounds.

Previous studies have suggested that each emotional stimulus can be described in terms of valence and arousal [15]. It is therefore also desirable to determine whether emotional influences on face detection are driven mainly by valence or by arousal. Some studies have found that emotional modulation of information processing depends largely on arousal level rather than valence [26]. For example, visual search was more efficient for a happy face with a high arousal level than for a corresponding angry face [12], and emotional stimuli with different arousal levels affected simulated driving performance, e.g., braking times [27]. However, contrary to these findings, we found a marginal correlation between face detection sensitivity and individuals’ subjective evaluation of emotional valence. Additionally, a small number of studies do support a modulatory role of emotional valence in subsequent visual performance [4, 28]. For example, hazard perception can be impaired when threat-related rather than neutral stimuli are presented beforehand [22]. Although the emotional modulation of visual perception found in this study differed from that in previous studies, it can still be concluded that emotional valence can affect subsequent visual perception. Nevertheless, arousal levels for fearful stimuli were generally higher than for neutral ones, so the role of arousal in modulating visual perception cannot be ruled out. More research is needed to test the effects of additional valence types (e.g., fear, happiness, and anger) on subsequent visual perception.

Emotional meaning can be conveyed by different kinds of stimuli, such as speaking voices, affective speech melody, environmental sounds, and visual scenes. Although individuals’ emotional states can be induced by different types of emotional stimuli, the brain areas responsible for processing these stimuli are quite different [16]. On the other hand, human voices and faces are both important for emotional understanding and social communication, and the brain areas responsible for them have been found to overlap or to be directly linked [29, 30]. Such close links between the regions processing human voices and faces could make the effect of human voices on face detection different from that of digital music, which may explain why, in the present study, only human voices, rather than digital music, were found to modulate face detection.

Researchers have proposed the concept of implicit HCI, suggesting that not only the behaviors of users but also the surrounding environment can convey information that is important for HCI activity [31]. For example, when the output display of a smartphone is designed, designers should consider the environment in which users are currently involved: in a dynamic environment it is better to present output such as characters in a large font, while in a static environment the font can be smaller so that more information can be shown. Emotional information is also a kind of environmental information, and the influence of emotional contexts on HCI behaviors cannot be ignored. More importantly, HCI activity can sometimes be significantly impaired under certain emotional contexts, such as threat alerts. Threat-related information is typically thought to act on HCI behavior much like the fight-or-flight response under stress, so fearful stimuli of moderate intensity may boost rather than impair human behavior. This may explain why an enhancement in face detection performance was found in a fear-related emotional context.

6 Conclusions

Emotional contexts created by fearful ambient sounds enhanced the detection of neutral but not fearful faces. This modulation effect was found only when human voices were used as the ambient sounds. These findings show once again that the influence of environmental characteristics (e.g., emotional contexts) on common HCI behaviors, such as information detection, cannot be ignored.