1 Introduction

Rapid urbanization and the densely populated urban environment of modern society have led to increased stress and negative health outcomes for urban dwellers (Peen et al. 2010). In response to these issues, there is a growing demand for improving health and happiness in urban life, prompting a re-examination of the importance of contact with nature (Hartig et al. 2014; White et al. 2019). The beneficial impact of natural environments on health has been widely studied under the concepts of “natural healing” or “natural restoration.” Numerous studies have demonstrated the positive impact of exposure to nature on mental and physical health (Corazon et al. 2019; Ratcliffe 2021; Wilkie and Davinson 2021).

In these studies, the Perceived Restorativeness Scale (PRS) developed by Hartig et al. (1997) is frequently used to measure the restorative qualities of natural environments, along with the Perceived Restorativeness Soundscape Scale (PRSS) proposed by Payne (2013), which is particularly relevant to soundscape research. These scales have proven effective in quantifying the contributions of natural and sound environments to psychological restoration, making them highly pertinent to our research. However, experiencing the actual natural environment requires overcoming physical limitations such as time, space, and climate (Keniger et al. 2013). Consequently, not all individuals have the opportunity for adequate exposure to nature. In controlled environments such as hospitals and prisons, contact with nature is limited or impossible, restricting the healing benefits of natural environments (White et al. 2019). In this context, virtual natural exposure can provide benefits related to mental health and well-being to those who cannot access the outdoors (Berto 2014). With the anticipated rise in demand for nature-based preventive healthcare and treatment solutions, the virtual natural environment offers a potential solution for stress relief and other health issues (Nukarinen et al. 2022).

The development of immersive virtual reality (VR) technology using head-mounted displays (HMDs) has revealed new possibilities for replicating interactions with natural environments. Research suggests that greater perceived immersion enhances positive experiences in natural environments (e.g., forests or urban parks) and contributes to psychophysiological restoration (Bohil et al. 2011). VR has the potential to offer those without direct access to nature the opportunity to experience its beauty and healing effects, serving as a space for mental and emotional restoration. In addition, VR natural environments, free from physical constraints, can overcome limitations of time, space, and climate (Dascal et al. 2017). This allows urban residents to experience the tranquility of nature in a comfortable and safe environment, relieving urban stress and fostering positive emotions (Kothgassner et al. 2019; Chirico and Gaggioli 2019). Physiologically, virtual natural environments have been shown to reduce stress, as evidenced by heart rate variability (HRV) indicators (Blum et al. 2019; Rockstroh et al. 2020; Kim et al. 2021; Weibel et al. 2023).

To use the virtual environment as a restoration space, an effective acoustic environment must be designed. The concept of “soundscape” is relevant here (Schafer 1993; Kang and Schulte-Fortkamp 2016): it interprets the sound environment in terms of human perception and treats sound as a resource for meeting human needs and desires in sound environment design. In psychophysiological terms, positive soundscapes have been shown to enhance positive emotions and induce changes in physiological responses (Annerstedt et al. 2013; Medvedev et al. 2015; Li and Kang 2019; Shu and Ma 2020; Li et al. 2021). Some studies suggest that restoration is possible not only in nature but also in urban environments, through subjective responses to soundscapes (Karmanov and Hamel 2008; Jo and Jeon 2021).

For example, Deng et al. (2020) examined audiovisual elements in a laboratory setting and found that auditory stimuli, such as the sounds of natural water, shrubs, flowers, and flying birds, were the most mentally restorative and preferred experiences. Jeon et al. (2010) evaluated water sounds as positive factors in acoustic environments, while Southon et al. (2017) found that bird sounds contributed to positive perceptions of landscapes. However, research on audiovisual interactions to design and implement virtual natural environments as psychophysiological restoration spaces is still insufficient, and interpretation at the perception model level is also lacking.

Accurately constructing the soundscape of virtual natural environments, based on an understanding of how soundscape perception shapes user experience, is crucial for promoting users’ psychophysiological restoration. Some studies have indicated the need for a deeper understanding and evaluation of the qualitative aspects of virtual natural environments (Aletta et al. 2016). In particular, there is growing research interest in evaluating how the virtual representation of nature is perceived within virtual reality (Depledge et al. 2011).

Overall, most studies have primarily focused on measuring users’ subjective experiences and satisfaction, overlooking the evaluation of naturalness based on perceived sensory dimensions and their association with actual physiological responses. This research gap obstructs the accurate evaluation and effective utilization of VR natural environments. Therefore, there is a need to emphasize the evaluation of VR natural environment perception based on audiovisual naturalness. Grahn and Stigsdotter (2010) utilized the Perceived Sensory Dimension (PSD) scale comprising eight items to investigate the relationship between perceptions of urban green spaces and stress restoration. Their study analyzed the impact of each PSD dimension on users’ stress restoration, offering important insights into how the design of green spaces influences mental health. Similarly, Stigsdotter et al. (2017) employed the PSD scale to examine how forest design can improve mental health.

Xiang et al. (2022) conducted field surveys to examine the relationships among PSD, Perceived Affective Quality (PAQ), and satisfaction in urban green spaces. They derived a perception-based satisfaction model and proposed a spatial design plan. By applying the PSD framework, these studies have greatly improved the understanding of how environmental sensory factors affect mental health. However, most of them have focused on real environments, and research on PSD perception in VR environments remains limited. In addition, there is insufficient research on how PSD perceptions actually influence users’ physiological responses. Therefore, more comprehensive research using the PSD framework is needed to identify how cognitive and emotional responses in VR environments relate to the resulting physiological and psychological responses.

To bridge this research gap, the current study investigates how audiovisual elements affect perceived naturalness and other perceptual responses, and how these in turn influence psychophysiological restoration in humans. The primary objective is to design and build a virtual natural environment that can serve as a restoration space and to propose a restoration model that can be applied directly in the design process. Accordingly, the research objectives of this study are as follows:

RO1

Investigate the differences in psychophysiological responses according to the audiovisual elements of soundscape and PSD in the virtual natural environment.

RO2

Explore the nature and extent of the relationship between environmental perception and psychophysiological response.

RO3

Identify perceived factors related to naturalness that contribute to differences in psychophysiological responses across natural environments.

2 Methods

2.1 Subjects

This study focused on students residing in Seoul who were enrolled in universities or graduate schools in the city. The goal was to minimize data variation in subjective and psychophysiological responses by controlling for age, occupation, and socio-cultural factors. The age range of the subjects was between 21 and 30 years (mean age: 23.2 ± 2.6 years). Subjects were recruited through advertisements posted on school bulletin boards and university-specific social network services. The recruitment notice targeted subjects with mild levels of depression, stress, and anxiety to examine the impact of environmental experiences on psychophysiological restoration. Accordingly, subjects with severe levels of depression, stress, and anxiety requiring medical treatment were excluded. Although 46 subjects initially participated, 42 subjects were selected on the day of evaluation through a screening process based on health status (e.g., sleep deprivation, fatigue, personal health issues) and the exclusion of outliers in response data (e.g., those experiencing stress stimulation or motion sickness during the VR experience).

2.2 Audiovisual stimuli

The study utilized four 3D-modeled visual stimuli representing accessible natural environments: green, lake, stream, and waterfall areas. The Nature Manufacture’s Forest Environment-Dynamic Nature 3D model was employed, and High-Definition Render Pipeline (HDRP) settings were used to enhance immersion and realism. All virtual reality forest environments featured a clear sky with the same number of clouds and identical terrain.

Figure 1a illustrates the views and detailed visual characteristics observed by the subjects at each point. Visual parameters were calculated to quantitatively express the amount of vegetation (wood, grass, etc.), waterfront (lake, stream, waterfall), and sky elements in the virtual natural environment using a web-based tool called the Image Color Summarizer (Krzywinski n.d.). The color ratio was determined by capturing the main point of view from the subject’s perspective (as shown in Appendix B, Fig. 1), removing the black shading using Photoshop, and manually designating the green, sky, and water areas into three layers.
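The ratio computation can be sketched as a pixel count over the three manually designated layers. This is a minimal illustration, not the Image Color Summarizer’s algorithm: it assumes a hypothetical per-pixel label mask (`"green"`, `"sky"`, `"water"`, or `None` for removed shading) and, as a further assumption, uses the full frame as the denominator.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical label mask: one label per pixel of the captured view.
Mask = Sequence[Sequence[Optional[str]]]


def visual_ratios(mask: Mask, layers=("green", "sky", "water")) -> dict:
    """Percentage of pixels assigned to each layer.

    Assumption: the denominator is every pixel in the frame, including
    pixels left unassigned (None) after the shading was removed.
    """
    counts = Counter(label for row in mask for label in row)
    total = sum(len(row) for row in mask)
    return {layer: 100.0 * counts.get(layer, 0) / total for layer in layers}


# Usage with a toy 4 x 4 mask (values are illustrative only).
toy_mask = [
    ["sky", "sky", "sky", "sky"],
    ["green", "green", "sky", "green"],
    ["green", "water", "water", None],
    ["water", "water", "water", "green"],
]
print(visual_ratios(toy_mask))  # {'green': 31.25, 'sky': 31.25, 'water': 31.25}
```

If the study normalized by labeled pixels only, the denominator would change to `sum(counts[l] for l in layers)`.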

Luminance was calculated using the standard formula 0.299 × red + 0.587 × green + 0.114 × blue (ITU 2011). The green ratio was highest in the green area, and the water ratio was highest in the waterfall area; the lake and stream areas showed similar color ratio distributions. Luminance ranged from 73.4 to 77.0 cd/m², with minimal differences across locations.
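A minimal sketch of the weighted-sum formula applied to a captured view, assuming the frame is available as (R, G, B) tuples on the 0–255 scale. The result is on the pixel-value scale; mapping it to photometric units (cd/m²) depends on display calibration, which this sketch does not model.

```python
def luma(r: float, g: float, b: float) -> float:
    """Weighted sum 0.299 R + 0.587 G + 0.114 B for one pixel."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_luma(pixels) -> float:
    """Average luma over an iterable of (R, G, B) tuples on the 0-255 scale."""
    values = [luma(r, g, b) for r, g, b in pixels]
    if not values:
        raise ValueError("no pixels supplied")
    return sum(values) / len(values)


# Usage: a pure-red and a black pixel average to 0.299 * 255 / 2.
print(mean_luma([(255, 0, 0), (0, 0, 0)]))
```

Because the three coefficients sum to 1, a neutral gray pixel keeps its value, which makes the weighting easy to sanity-check.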

Fig. 1

a Evaluation locations showcasing various forest characteristics and visual ratios. G, S, and W represent the visual ratios of green, sky, and water areas, respectively, while L denotes luminance. b Mel-spectrograms of the five sound source types used in the study

For the sound sources, 16 combinations were created by pairing each of the four virtual forest environments (green, lake, stream, and waterfall areas) with four biophony conditions (none, insects, birds, and birds and insects). The lake, stream, and waterfall areas each included their own water sound, whereas the green area had none. Recordings of real environments were used for all sound sources. For bird sounds, recordings of the Eurasian Blackcap, which inhabits both urban and natural areas in Korea, were used. The insect sounds combined at least three insect types, including cicadas, katydids, and mole crickets. Based on previous studies and reports (Morinaga et al. 2003; Resource Systems Group, Inc. 2013; Hao et al. 2016), the equivalent continuous sound pressure level (LAeq) was set at 50, 55, and 60 dBA for the lake, stream, and waterfall sources, respectively, and at 50 dBA for the bird and insect sounds.
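Setting each source to its target LAeq amounts to measuring the equivalent level of the recording and applying a linear gain. The sketch below assumes the samples are already A-weighted sound pressures in pascals (the A-weighting filter itself is not implemented) and uses the 20 µPa reference pressure; the helper names are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)


def leq_db(pressure_pa):
    """Equivalent continuous level (dB) of a sampled pressure signal in Pa.

    Assumes the samples are already A-weighted, so the result is LAeq.
    """
    mean_square = sum(p * p for p in pressure_pa) / len(pressure_pa)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


def gain_to_target(pressure_pa, target_db):
    """Linear amplitude gain that moves the signal's Leq to target_db."""
    return 10.0 ** ((target_db - leq_db(pressure_pa)) / 20.0)


# Usage: a hypothetical recording at 60 dB scaled down to a 50 dBA target.
recording = [0.02, -0.02] * 500  # RMS 0.02 Pa -> 60 dB
g = gain_to_target(recording, 50.0)
scaled = [g * p for p in recording]
print(round(leq_db(recording), 2), round(leq_db(scaled), 2))  # 60.0 50.0
```

The gain uses a factor of 20 in the exponent because it scales amplitude, while levels are defined on squared pressure.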

Table 1 presents the values analyzed with Artemis version 13.1 software (HEAD acoustics) for sound sources re-recorded within the Unity environment through a head and torso simulator (Type 4100, Brüel and Kjær), so that directional, distance, and spatial components were captured. In Table 1, “Site” denotes the location (G, L, S, and W for the green, lake, stream, and waterfall areas) followed by a number for the biophony condition: 1 (no species), 2 (insects), 3 (birds), and 4 (birds and insects). The parameters in Table 1 include LAeq (A-weighted equivalent continuous sound pressure level), LCeq − LAeq (difference between the C-weighted and A-weighted equivalent levels, used to evaluate low-frequency content), and LA10 − LA90 (difference between the A-weighted levels exceeded for 10% and 90% of the measurement time, indicating the temporal variability of the sound).
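The two level-difference descriptors can be computed from a time series of short-term levels. This sketch assumes equally spaced short-term A- and C-weighted levels (the sampling interval is not stated in the text), uses the nearest-rank method for the percentile levels, and energy-averages levels rather than taking their arithmetic mean.

```python
import math


def exceeded_level(levels_db, percent):
    """L_N: level exceeded for `percent` % of the time (nearest-rank method)."""
    ordered = sorted(levels_db, reverse=True)  # highest level first
    rank = max(1, math.ceil(percent / 100.0 * len(ordered)))
    return ordered[rank - 1]


def energetic_mean(levels_db):
    """Equivalent level of equally spaced short-term levels (energy average)."""
    energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(energy)


def la10_minus_la90(a_levels_db):
    """Temporal variability: LA10 - LA90."""
    return exceeded_level(a_levels_db, 10) - exceeded_level(a_levels_db, 90)


def lc_minus_la(c_levels_db, a_levels_db):
    """LCeq - LAeq: larger values indicate relatively more low-frequency energy."""
    return energetic_mean(c_levels_db) - energetic_mean(a_levels_db)


# Usage with hypothetical one-second levels.
a_levels = [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
c_levels = [level + 6 for level in a_levels]
print(la10_minus_la90(a_levels))  # 8
print(round(lc_minus_la(c_levels, a_levels), 2))  # 6.0
```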

Loudness refers to the subjective perception of sound intensity, while sharpness describes the balance between high- and low-frequency content derived from the specific loudness distribution, increasing with high-frequency components in the acoustic environment. Fluctuation strength measures the degree of slow sound fluctuation, reflecting how the ear perceives changes in sound intensity (Zwicker and Fastl 2013). Roughness evaluates the perceived roughness of a sound, caused by rapid amplitude fluctuations. Tonality assesses the prominence of tonal components in the sound (Moore et al. 2012). The analysis indicates that the sound level of the water sources increases with their flow rate, from the lake to the stream to the waterfall. In addition, adding biophony sources increases LA10 − LA90, loudness, and tonality while decreasing LCeq − LAeq. Among the biophony sources, birdsong in particular increased sharpness. These physical indicators confirm that the audiovisual combinations differ measurably, enabling a detailed examination of the audiovisual characteristics of the various virtual natural environments.

Table 1 Mean values of acoustic parameters for each evaluation area over three minutes

In this study, the VR environments were created using Unity 3D software. Visual information was presented through a head-mounted display (HTC VIVE Pro 2, HTC) with a resolution of 2448 × 2448 pixels per eye and a 120-degree horizontal viewing angle. The direction of each sound was adjusted in real time according to the subject’s head rotation, using the Oculus Desktop Audio function (Meta), integrated through the Unity Package Manager, together with the head tracker embedded in the HMD. Sound was delivered to the subjects through open-type headphones (HD-650, Sennheiser).

2.3 Questionnaires

The questionnaire administered before starting the experiment consisted of six parts: (1) demographic information (gender and age), (2) the Perceived Stress Scale (PSS; 10 items, 5-point Likert scale, 0–4 points) (Cohen et al. 1994), (3) the State-Trait Anxiety Inventory-Form Y (STAI-Y; 20 items, 4-point Likert scale, 0–3 points), (4) the Patient Health Questionnaire (PHQ-9; 9 items, 4-point Likert scale, 0–3 points), (5) the World Health Organization’s Quality of Life Questionnaire (WHOQOL-BREF; 26 items, 5-point Likert scale, 1–5 points) (Min et al. 2000), and (6) the subjects’ frequency of visiting natural spaces in the past year (5-point Likert scale, 1–5 points).
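As an illustration of how one of these instruments is scored, the sketch below totals the PSS-10. Reverse-scoring of the four positively worded items (4, 5, 7, 8) is part of standard PSS-10 scoring; the low/moderate/high bands are commonly used cut-offs and are an assumption here, not a detail stated in this study.

```python
REVERSED_ITEMS = {4, 5, 7, 8}  # positively worded PSS-10 items (1-based numbering)


def pss10_total(responses):
    """Total PSS-10 score (0-40) from ten responses on the 0-4 scale."""
    if len(responses) != 10 or any(not 0 <= r <= 4 for r in responses):
        raise ValueError("expected ten responses scored 0-4")
    return sum(4 - r if item in REVERSED_ITEMS else r
               for item, r in enumerate(responses, start=1))


def pss10_band(total):
    """Commonly used bands (assumed): 0-13 low, 14-26 moderate, 27-40 high."""
    if total <= 13:
        return "low"
    return "moderate" if total <= 26 else "high"


# Usage with one hypothetical respondent.
answers = [3, 2, 3, 1, 2, 3, 1, 2, 3, 2]
score = pss10_total(answers)
print(score, pss10_band(score))  # 26 moderate
```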

During the experiment, subjects completed five questionnaires for each stimulus in the following order: source identification, PAQ, satisfaction and immersion, the Short-version Revised Restoration Scale (SRRS) developed by Han (2003), and PSD. Details of each questionnaire are provided in Appendix A, Table 1. The evaluations comprised: (1) six items assessing audiovisual dominance for each stimulus; (2) eight representative affective qualities (pleasant, calm, eventful, vibrant, annoying, chaotic, uneventful, and monotonous) for evaluating natural soundscapes; (3) visual, auditory, and overall satisfaction and immersion ratings for the virtual natural environment; (4) psychological restoration responses to the natural environment, assessed with the eight-item SRRS; and (5) eight items (social, spatial, natural, refuge, prospect, serene, cultural, and richness in species) based on Grahn and Stigsdotter’s (2010) PSD for the natural environment (Appendix A, Table 2).

The audiovisual satisfaction and dominance items were included to assess the immersive quality and overall effectiveness of the virtual environments, clarifying how well the visual and auditory elements work together to create a satisfying experience. The eight affective qualities were chosen because they are key dimensions of soundscape perception: they capture the emotional responses elicited by the natural soundscapes, which is crucial for designing soundscapes that promote psychological well-being. Axelsson et al.’s (2010) soundscape model identifies pleasantness and eventfulness as the two primary dimensions of a soundscape. Pleasantness is the hedonic value of the sound environment, while eventfulness refers to the diversity of environmental sounds and their changes over time. Following method A of ISO 12913-3 (2019), pleasantness and eventfulness are calculated as follows:

$$\text{Pleasantness} = (p - a) + \cos 45^\circ\,(ca - ch) + \cos 45^\circ\,(v - m) \tag{1}$$

$$\text{Eventfulness} = (e - u) + \cos 45^\circ\,(ch - ca) + \cos 45^\circ\,(v - m) \tag{2}$$

where \(e\) is eventful, \(u\) is uneventful, \(p\) is pleasant, \(a\) is annoying, \(ca\) is calm, \(ch\) is chaotic, \(v\) is vibrant, and \(m\) is monotonous.
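Equations (1) and (2) translate directly into code. The function below is a minimal sketch with hypothetical names; it takes one subject’s eight attribute ratings and returns unscaled pleasantness and eventfulness values in the form of Eqs. (1) and (2).

```python
import math

COS45 = math.cos(math.radians(45))


def pleasantness_eventfulness(p, a, ca, ch, v, m, e, u):
    """Return (Pleasantness, Eventfulness) per Eqs. (1) and (2).

    Arguments are one subject's ratings for pleasant (p), annoying (a),
    calm (ca), chaotic (ch), vibrant (v), monotonous (m), eventful (e),
    and uneventful (u), all on the same Likert scale.
    """
    pleasantness = (p - a) + COS45 * (ca - ch) + COS45 * (v - m)
    eventfulness = (e - u) + COS45 * (ch - ca) + COS45 * (v - m)
    return pleasantness, eventfulness


# Usage: hypothetical ratings for a calm, pleasant soundscape on a 1-5 scale.
P, E = pleasantness_eventfulness(p=4, a=1, ca=5, ch=1, v=2, m=2, e=1, u=4)
print(round(P, 2), round(E, 2))
```

Keyword arguments are used in the example because the eight attributes are easy to misorder positionally.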

Visual, auditory, and overall satisfaction evaluations for the virtual natural environment were included to provide a comprehensive understanding of participant satisfaction. They help identify the elements that contribute most to a positive experience, informing improvements in virtual environment design to enhance restorative benefits. The SRRS, which incorporates Kaplan and Kaplan’s (1989) and Ulrich’s (1993) restorative environment theories, offers a broader perspective than other restoration scales and has been validated as reliable and effective (Liu et al. 2022). We used the SRRS to evaluate the potential restorative effects of each environment as part of the psychological restoration response. The SRRS index was calculated as the mean of the eight items, each rated on a 9-point scale.

Table 2 presents the demographic characteristics of the subjects who participated in the experiment. The mental health questionnaires assessed each subject’s status during the month preceding the experiment. Many subjects showed moderate stress levels, with PSS scores above 13 points. The subjects’ stress, anxiety, depression, and quality of life were characterized with the PSS, STAI-Y, PHQ-9, and WHOQOL-BREF, respectively. Notably, approximately 78.6% of the 42 subjects reported infrequent exposure to natural environments, visiting them only once or twice a month or less.

Table 2 Subjects’ demographic characteristics

2.4 Procedure

The evaluation procedure and session structure are shown in Fig. 2. Prior to the experiment, a survey was conducted to gather demographic information and examine the psychological state of the subjects. Over two days, subjects experienced 8 out of the total 16 stimuli each day, with each stimulus presented for three minutes. On the first day, 8 stimuli were randomly selected and presented in a randomized order for each participant. The remaining 8 stimuli were presented in a randomized order on the second day. To ensure consistency in the experimental environment, instructions, and data collection procedures (including questionnaires and HRV measurements), all aspects of the experimental procedure, apart from the randomization of stimuli, were kept identical across the two days. This approach ensured that all stimuli were evenly distributed across the two days while minimizing potential biases caused by order effects or familiarity through randomization.
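One way to implement this allocation is to shuffle all 16 stimuli once per participant and split the permutation in half: the first half is a random subset presented in random order on day 1, and the remaining half is likewise in random order for day 2. The stimulus labels (G1–W4) follow the site codes used in Table 1; the seeding scheme is an assumption.

```python
import random

# Site codes as in Table 1: G, L, S, W areas x biophony conditions 1-4.
STIMULI = [f"{site}{condition}" for site in "GLSW" for condition in range(1, 5)]


def two_day_schedule(stimuli, seed=None):
    """Return (day1, day2): a random half of the stimuli and the remainder,
    each already in random presentation order."""
    rng = random.Random(seed)
    order = list(stimuli)
    rng.shuffle(order)  # one uniform permutation per participant
    half = len(order) // 2
    return order[:half], order[half:]


# Usage: independent schedules for two hypothetical participants.
for participant in (1, 2):
    day1, day2 = two_day_schedule(STIMULI, seed=participant)
    print(participant, day1, day2)
```

Because a uniform permutation randomizes both the day-1 subset and the order within each day, no separate shuffle is needed for day 2.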

Subjects were advised to get sufficient sleep and rest the day before the experiment and to avoid consuming caffeine, tobacco, or any substance that could affect physiological responses on the day of the experiment. Prior to the experiment, subjects received a written explanation of the purpose of the study and information about the survey items. They were fully informed about the survey and physiological response data collection, and written consent was obtained from all subjects.

A questionnaire related to individual health conditions was administered through a Google survey, and all responses were collected before the start of the experiment.

Fig. 2

Experimental process

On both days, psychophysiological responses were investigated as follows. Subjects received a brief training session on the pre-distributed instructions to ensure they understood the questionnaire items. After the explanation, subjects were equipped with HRV measurement hardware and VR devices (HMD and headphones). The normal operation of each device and any signal interference were checked. Subjects were allowed to take breaks freely upon request, without a specific schedule, with the stimulus presentation paused during breaks. Typically, subjects took breaks of approximately 10 min each day.

In this study, HRV was selected and measured as a physiological indicator to investigate the subjects’ physiological restoration responses to different audiovisual environmental experiences. HRV is a widely used measure that provides an objective and reliable evaluation of autonomic nervous system activity. It reflects fluctuations in heart rate over time, which are influenced by both internal and external environmental changes.

We administered a computerized mental arithmetic task (MAT) for three minutes to induce stress in subjects before their VR experience. Following the approach of Jeon et al. (2023), this allowed psychophysiological restoration responses to be examined relative to an induced stress state. For instance, subjects were asked to repeatedly subtract 34 from 2000, answering each step within five seconds, and received feedback on their answers. Incorrect or unanswered questions within the time limit were repeated, and after 20 consecutive correct answers, a more difficult question was posed to sustain stress. To randomize the task, subjects were instructed to subtract random numbers between 11 and 49, excluding multiples of 5, starting from 2000, which maintained a high cognitive load and ensured stress induction.
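The core bookkeeping of this task can be sketched as below. It is a hypothetical reconstruction: the five-second time limit and the rule for making questions harder after 20 consecutive correct answers are not specified in enough detail to model, and drawing a fresh subtrahend after every correct answer (rather than once per session) is an assumption.

```python
import random


def draw_subtrahend(rng):
    """Random integer from 11 to 49, excluding multiples of 5."""
    while True:
        n = rng.randint(11, 49)
        if n % 5 != 0:
            return n


class SerialSubtraction:
    """Hypothetical MAT state: running value, current subtrahend, streak.

    Timing and the difficulty step are not modelled here.
    """

    def __init__(self, start=2000, seed=None):
        self.value = start
        self.rng = random.Random(seed)
        self.subtrahend = draw_subtrahend(self.rng)
        self.streak = 0

    def answer(self, response):
        """Return True if correct; a wrong answer repeats the same question."""
        correct = response == self.value - self.subtrahend
        if correct:
            self.value -= self.subtrahend
            self.subtrahend = draw_subtrahend(self.rng)
            self.streak += 1
        else:
            self.streak = 0
        return correct


# Usage: show the first question of a hypothetical session.
task = SerialSubtraction(seed=1)
print(f"{task.value} - {task.subtrahend} = ?")
```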

Data analysis was conducted using SPSS Statistics (IBM, version 27) and SPSS Amos (IBM, version 29). All response data were tested for normality with the Shapiro-Wilk and Kolmogorov-Smirnov tests and for equality of variances with Levene’s test. Parametric statistics were applied to data that met the normality criteria, and nonparametric statistics to non-normal data. Analysis of variance (ANOVA) was performed to determine the statistical significance of differences in physiological restoration effects among the 16 spatial classifications. Spearman’s rank correlation (Spearman’s rho) was used to examine the relationships between PSD responses to the stimuli and the subjects’ perceptual and physiological responses. Structural equation modeling (SEM) was conducted to investigate the effect of naturalness on psychophysiological restoration responses in virtual forest environments. Goodness of fit for the SEM was evaluated using the following indices and criteria: CMIN/DF (3 or lower), goodness-of-fit index (GFI, 0.9 or higher), standardized root mean square residual (SRMR, 0.08 or lower), root mean square error of approximation (RMSEA, 0.08 or lower), and comparative fit index (CFI, 0.9 or higher). Indirect effects within the SEM were examined using the Sobel test (Sobel 1982) and bootstrap analysis (Preacher and Hayes 2008). The bootstrap analysis used the maximum likelihood method with 2000 resamples and 95% confidence intervals.
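For reference, the Sobel test compares the indirect effect a·b with its first-order standard error. The sketch below uses hypothetical path coefficients and standard errors and a two-tailed normal p-value; it does not reproduce the bootstrap procedure run in SPSS Amos.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel z and two-tailed p for the indirect effect a*b.

    a: path X -> M; b: path M -> Y controlling for X; se_*: standard errors.
    """
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Usage with hypothetical estimates (not values from this study).
z, p = sobel_test(a=0.50, se_a=0.10, b=0.40, se_b=0.10)
print(round(z, 2), round(p, 4))
```

The Sobel test assumes a normally distributed product a·b, which is often violated in small samples; this is why bootstrap intervals are commonly reported alongside it.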

3 Results

3.1 Virtual soundscape characteristics

The perception of soundscapes and landscapes in the virtual natural environment was examined by categorizing responses into perceived elements and affective quality. The elements recognized by subjects for each stimulus were divided into visual and auditory categories. Visual elements included birds, sky, vegetation, and water areas, while auditory elements comprised water sounds, bird songs, and insect sounds. The responses to dominant visual and auditory sources in the virtual natural environment are presented in Fig. 3. The dominance percentage represents the proportion of subjects who rated an environmental factor 3 or higher on a 5-point Likert scale (Jeon and Jo 2020). Because no auditory elements were presented in G1, its sound-source dominance was excluded from Fig. 3a.
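The dominance percentage defined above reduces to a simple count. The function name is hypothetical, the inclusive threshold of 3 follows the description in the text, and the ratings are illustrative.

```python
def dominance_percentage(ratings, threshold=3):
    """Percentage of subjects whose 1-5 dominance rating is >= threshold."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return 100.0 * sum(r >= threshold for r in ratings) / len(ratings)


# Usage: hypothetical water-sound dominance ratings from ten subjects.
water_ratings = [5, 4, 3, 2, 4, 1, 3, 5, 2, 4]
print(dominance_percentage(water_ratings))  # 70.0
```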

As shown in Fig. 3a, in environments with only one sound stimulus, that particular stimulus was predominantly recognized. When two or more sound stimuli were presented, they were generally recognized equally, but the perception of water sounds increased in the following order: Lake (55%) – Stream (58%) – Waterfall (63%).

Fig. 3

Identification of dominant (a) sound sources and (b) visual sources for environmental perceptions in different stimuli

Next, Pearson correlation analysis was conducted to examine the relationship between the identification of visual sources in Fig. 3b and the visual ratios presented in Fig. 1a. Here, the green ratio is the proportion of green area within a stimulus, calculated by image analysis, whereas vegetation is the subjective perception of the amount of vegetation, assessed through the visual dominance questionnaire. Likewise, the water and sky ratios are objective measurements of their respective areas, and the visual water and sky elements are subjective perceptions. Across the virtual natural environments, significant correlations were observed between the green ratio and perceived vegetation (r = 0.33, p < 0.01), the water ratio and the perceived visual water element (r = 0.84, p < 0.01), and the sky ratio and the perceived sky element (r = 0.36, p < 0.01). These findings indicate that visual identification responses were consistent with the visual parameters across the virtual natural environments, with perception tracking changes in the water ratio more closely than changes in the green and sky ratios (Table 3).

Table 3 Pearson correlation between physical parameters and dominance of audiovisual elements for selected stimuli

Pearson correlation analysis was also conducted to examine the relationship between the identification of sound sources and the acoustic parameters presented in Table 1. Water sounds exhibited significant positive correlations with LAeq (r = 0.75, p < 0.01) and loudness (r = 0.73, p < 0.01), indicating their prominence and higher perceived volume. In contrast, water sounds were negatively correlated with LA10 − LA90 (r = −0.53, p < 0.01) and LCeq − LAeq (r = −0.71, p < 0.01), reflecting their stable and less variable acoustic properties. Additional negative correlations with sharpness (r = −0.48, p < 0.01), fluctuation strength (r = −0.36, p < 0.01), roughness (r = −0.23, p < 0.01), and tonality (r = −0.09, p < 0.05) suggest that water sounds are perceived as smoother, less fluctuating, and less tonal than the other stimuli. Bird sounds, on the other hand, showed significant positive correlations with LAeq (r = 0.15, p < 0.01) and LA10 − LA90 (r = 0.51, p < 0.01), indicating their dynamic and variable nature. Furthermore, bird sounds were positively associated with sharpness (r = 0.21, p < 0.01), roughness (r = 0.75, p < 0.01), fluctuation strength (r = 0.57, p < 0.01), and tonality (r = 0.19, p < 0.01), highlighting their textural richness and tonal qualities. Insect sounds showed distinct characteristics, with positive correlations for LA10 − LA90 (r = 0.31, p < 0.01), fluctuation strength (r = 0.35, p < 0.01), roughness (r = 0.45, p < 0.01), and tonality (r = 0.15, p < 0.01), indicating their fluctuating and tonal nature. Negative correlations with LCeq − LAeq (r = −0.15, p < 0.01) and loudness (r = −0.08, p < 0.05) suggest that insect sounds are quieter and less intense than the other sound sources.

Pleasantness and eventfulness were calculated from the subjects’ ratings of the affective soundscape qualities using Eqs. (1) and (2). Their relationship is plotted in Fig. 4 to illustrate the characteristics of each sound environment.

Fig. 4

Pleasantness–Eventfulness dimensions based on soundscape attributes for the 16 combinations of audiovisual stimuli

Pleasantness scores for the audiovisual stimuli ranged from 1.39 to 3.11, whereas eventfulness scores spanned a wider range, from −5.75 to 1.72. Except for S4, W1, W2, W3, and W4, which had positive eventfulness values, the environments fell in the Calm area, characterized by negative eventfulness and positive pleasantness. S4, W1, W2, W3, and W4 were perceived as more eventful, placing them in the Vibrant area, likely owing to the relatively high average sound pressure level and loudness of their audio elements. This indicates that, in the virtual natural environments, eventfulness varied over a wider range than pleasantness in response to the visual and sound types.
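The quadrant assignment in Fig. 4 can be expressed as a sign test on the two coordinates. In this sketch, the labels follow the circumplex model (Vibrant, Calm, Chaotic, Monotonous), and the treatment of points lying exactly on an axis is an arbitrary assumption.

```python
def soundscape_quadrant(pleasantness, eventfulness):
    """Quadrant of the pleasantness-eventfulness plane.

    Points exactly on an axis are assigned arbitrarily (assumption):
    P = 0 counts as pleasant and E = 0 as uneventful.
    """
    if pleasantness >= 0:
        return "Vibrant" if eventfulness > 0 else "Calm"
    return "Chaotic" if eventfulness > 0 else "Monotonous"


# Usage with hypothetical (P, E) points inside the reported ranges.
print(soundscape_quadrant(2.0, -3.5), soundscape_quadrant(2.8, 1.2))  # Calm Vibrant
```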

We also compared environmental perception differences according to biophony by location. Since eventfulness and pleasantness did not satisfy normality, we used the Mann-Whitney U test, a nonparametric statistical technique. Table 4 shows the differences in eventfulness and pleasantness according to changes in biophony in the same waterfront environment, suggesting how pleasantness varies with perceived changes in eventfulness within the natural environment.
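The Mann-Whitney U test compares the rank sums of two independent samples. The pure-Python sketch below uses mid-ranks for ties and a normal approximation with tie-corrected variance and no continuity correction. It is an illustration only; `scipy.stats.mannwhitneyu` is the usual library route and can also compute exact p-values for small samples.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U for sample x, z, p). Ties receive mid-ranks and the variance
    uses the standard tie correction; no continuity correction is applied.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a difference
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p


# Usage with hypothetical per-subject pleasantness scores for two conditions.
without_birds = [1.2, 2.0, 1.5, 2.4, 1.8]
with_birds = [2.6, 3.1, 2.2, 2.9, 3.4]
print(mann_whitney_u(without_birds, with_birds))
```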

Table 4 Comparison of locations according to biophony in virtual natural environments

Compared with environments without biophony, environments with biophony showed significantly higher pleasantness and eventfulness, except in the waterfall area. In particular, adding bird sounds to the green and lake areas significantly increased pleasantness relative to G1 and L1. In the waterfall area, however, biophony did not affect perception, likely because of the high sound pressure level of the waterfall sound. Thus, the perceptual effect of biophony was not significant in environments dominated by water sounds.

3.2 Psychological and physiological responses

Figure 5 illustrates the subjects’ audiovisual satisfaction, immersion, and psychological restoration scores for each stimulus. ANOVA indicated significant differences in auditory satisfaction, overall satisfaction, and immersion between G1 and the other stimuli. These differences can be attributed to low auditory satisfaction in G1, where no auditory stimulus was presented, which in turn lowered overall satisfaction, immersion, and psychological response. By contrast, among the other stimuli, immersion ranged from 6.4 to 7.1 and SRRS scores from 7.0 to 7.4, with no significant differences in audiovisual satisfaction, immersion, or psychological response.

Fig. 5

Audiovisual, overall satisfaction, immersion, and SRRS mean rating scores for each stimulus. Error bars represent the 95% confidence intervals

Figure 6 shows the physiological responses through several HRV indicators, arranged in the order of the experimental sequence, with mean values plotted for each stimulus phase. The data were divided into “Intra” (during the stimulus experience) and “Post” (after the stimulus experience). ANOVA was conducted on the six measured HRV parameters across the experimental sequence. The virtual natural environment experience produced (a) statistically significant increases in stress resistance and TP of 14.70% and 5.31%, respectively, (b) a 9.51% decrease in the stress index, and (c) HR and HF values similar to the baseline values measured before the stress task. These findings suggest that experiencing virtual nature improved autonomic nervous system regulation, raising stress resistance, stabilizing heart rate, and lowering the stress index, which indicates a positive physiological restoration response in virtual forest environments. Responses during and after the stimulus did not differ significantly except for LF, suggesting that physiological restoration continued after the stimulus ended.
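The time-domain quantities behind some of these indicators can be sketched from a series of RR intervals. The helpers below are hypothetical: SDNN uses the sample standard deviation, mean HR is derived from the mean RR interval, and the reference period for the percentage changes (for example, the stress task) is an assumption, since the text does not state it. Frequency-domain indicators (TP, LF, HF) and the stress index require spectral or device-specific methods and are not shown.

```python
import statistics


def sdnn_ms(rr_ms):
    """SDNN: sample standard deviation of normal-to-normal RR intervals (ms)."""
    return statistics.stdev(rr_ms)


def mean_hr_bpm(rr_ms):
    """Mean heart rate (beats per minute) derived from the mean RR interval."""
    return 60000.0 / statistics.mean(rr_ms)


def percent_change(reference, value):
    """Relative change of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Usage with hypothetical RR series (ms) for a stress phase and a VR phase.
stress_rr = [780, 790, 770, 785, 775]
vr_rr = [820, 850, 800, 840, 810]
print(round(percent_change(sdnn_ms(stress_rr), sdnn_ms(vr_rr)), 1))
print(round(mean_hr_bpm(stress_rr), 1), round(mean_hr_bpm(vr_rr), 1))
```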

Fig. 6

Differences in HRV across experimental sequences. a SDNN (ms), b TSI, c HR (BPM), d TP (ms²), e LF (ms²), f HF (ms²). *p < 0.05, **p < 0.01. Error bars represent the 95% confidence intervals

In addition, ANOVA was performed on the “Intra” and “Post” data to compare restoration responses among stimuli. The differences in HRV restoration responses among stimuli were not significant, suggesting that physiological responses may not be highly sensitive to changes in audiovisual stimuli within virtual natural environments.

3.3 Relation between virtual nature and human responses

Table 5 presents the results of the correlation analysis between PSD and subjects’ responses to the virtual environment. To examine how closely responses to the virtual natural environment matched the perception of nature, this study compared eight PSD items with subjects’ environmental responses. Among the audio sources, water sounds were significantly correlated with the perception of nature. Biophony was associated with perceiving the environment as rich in species and with a strong sense of connection with the space. Among the visual elements, a higher perceived amount of sky was associated with higher perceived prospect, refuge, and serenity. The visual presence of birds was significantly correlated with nature, prospect, and refuge, and the visual movement of birds correlated more strongly with richness in species than bird sounds did. Therefore, visually presenting the movement of animals and plants appears important for enhancing the sense of nature in virtual environments.
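
Table 5 reports Spearman’s rho, a rank correlation well suited to ordinal rating data such as PSD items. The minimal standard-library sketch below computes rho as the Pearson correlation of average ranks, which is the usual way to handle tied ratings. The inputs are hypothetical ratings, not study data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical ratings: perceived sky (PSD item) versus perceived prospect.
sky = [2, 3, 3, 5, 6, 7]
prospect = [1, 3, 4, 4, 6, 7]
print(round(spearman_rho(sky, prospect), 3))
```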

Regarding the PAQ, pleasantness showed a significant positive relationship with most PSD items (the exception being culture, which had low significance levels), while eventfulness showed a significant positive relationship with richness in species and a significant negative relationship with serenity. This pattern suggests that increasing audiovisual natural features enhanced the perception of richness in species and reduced perceived serenity, making the virtual environment seem more eventful. For immersion, overall satisfaction, and SRRS, the PSD items nature, prospect, space, and refuge showed relatively strong significant positive correlations compared with the other items, with r = 0.29. Finally, physiological responses showed no significant associations with PSD items, although the social and space items showed marginal positive relationships with the HRV change indicators. This may indicate that social and spatial elements in a virtual natural environment increased subjects’ psychological comfort and stability, resulting in higher HRV. These results highlight the importance of these elements in virtual environment design and the need for further research on how to use them in VR to enhance user immersion.

Table 5 Spearman’s rho between subjects’ responses and PSD

4 Discussion

4.1 Effects of audiovisual elements on naturalness

Previous studies have suggested that audiovisual elements in virtual natural environments correlate with perceptions of real natural environments (North and North 2018). Our analysis of subjects’ responses and PSD showed that certain factors had stronger effects than others. Among the sound elements, water and biophony sounds were significantly associated with the perception of naturalness, consistent with Emfield and Neider (2014). In addition, the psychological responses revealed that environments without sound could reduce immersion, overall satisfaction, and the psychological restoration response, even when they were visually satisfying. Therefore, it is crucial to provide sound sources that match the environment when developing a restorative environment.

The analysis of visual elements indicated that subjects’ perceptions of the sky correlated with their perceptions of prospect, refuge, and serenity. The visual element of birds showed a significant correlation with subjects’ sense of nature, prospect, and refuge, similar to findings by Annerstedt et al. (2013). Pleasantness showed a significant positive relationship with most PSD items, while eventfulness showed a significant positive relationship with richness in species and a significant negative relationship with serenity. These results indicate that both pleasantness and eventfulness play an important role in enhancing naturalness in virtual environments.

Additionally, it is important to consider the characteristics of our sample, which included subjects with mild depression and stress. This population characteristic might have influenced the results, as individuals with mild depression or stress may exhibit different baseline levels of psychological and physiological responses compared to the general population. Studies have shown that natural and restorative environments are associated with reduced stress, depression, and anxiety symptoms (Pun et al. 2018) and lower morbidity rates (Maas et al. 2009). In our study, subjects with mild depression or stress may have been particularly responsive to the restorative virtual environments, which could explain the significant improvements observed in pleasantness and satisfaction. Similarly, their HRV parameters might have shown more pronounced changes in response to the virtual environments, reflecting their greater need for stress reduction and relaxation.

The results of this study are consistent with previous research but also highlight the significant impact of audiovisual elements in virtual natural environments on the perception of real natural environments. Therefore, these factors should be considered when configuring virtual natural environments to enhance user immersion.

4.2 Psychophysiological restoration model

This study investigated how perceptions of audiovisual virtual natural environments, created using VR technology, affect psychophysiological restoration compared with real natural environments. To determine whether 3D virtual environments can replicate the restorative effects of real nature, an SEM approach was employed, as shown in Fig. 7. In the SEM model, the SRRS items were used as observed variables for psychological restoration, while ∆SDNN, ∆TSI, and ∆LF served as observed variables for physiological restoration. Standardized path loadings for the SEM model are presented in Appendix 3 (Table 1), and the unstandardized coefficients (estimate), standard errors (SE), critical ratios (CR), and significance levels (p-value) are summarized in Appendix 3 (Table 2).

Fig. 7

Psychophysiological restoration SEM in the virtual natural environment

The SEM analysis revealed that the Pleasantness → Immersion → Satisfaction path had the highest coefficient of determination for psychological and physiological restoration. Additionally, the perception of eventfulness in natural elements contributed positively to satisfaction, enhancing the overall restorative effect. This path underscores the importance of immersion in shaping the perception of virtual environments, particularly through satisfaction, which is crucial for psychophysiological restoration. In essence, the pleasantness experienced in immersive soundscapes that simulate natural environments appears to foster mental relaxation, enhance stress resistance, reduce stress levels, and contribute to the reduction in ∆LF associated with fatigue. The path from PSD to Eventfulness and Pleasantness shows how sensory inputs shape these affective qualities. The direct effects of Pleasantness on psychological and physiological restoration indicate that immediate pleasantness can contribute to restoration independently of immersion and satisfaction. Similarly, the direct effects of Immersion on psychological and physiological restoration suggest that deep engagement with the environment has a significant restorative effect, supporting previous findings by Schutte et al. (2017) on the importance of immersion for physiological restoration. Although the path from Eventfulness to Psychological Restoration is weaker, it suggests that eventful environments can also provide psychological benefits through mental stimulation and engagement. Similar findings have been reported in earlier SEM frameworks for real natural environments, which identified pleasantness, impression, and satisfaction as key factors in psychophysiological restoration (Korpela et al. 2002; Pasanen et al. 2014). Berto (2005) highlighted the association between satisfaction with real natural environments and mental restoration, while Schutte et al. (2017) emphasized the role of immersion in physiological restoration. Consequently, the SEM results for virtual natural environments not only complement existing models for real natural settings but also extend our understanding of the factors influencing psychophysiological restoration in virtual environments. These results suggest that virtual natural environments can effectively simulate the restorative effects of real natural environments.
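
In a path model, the effect of a variable such as Pleasantness on Satisfaction splits into two parts. The direct effect is its own path coefficient. The indirect effect runs through mediators such as Immersion. The sketch below applies the standard path-tracing rule for recursive (acyclic) models: the total effect is the sum, over all directed paths, of the products of the coefficients along each path. The graph and coefficients are hypothetical and are not the values in Fig. 7 or Appendix 3.

```python
def total_effect(edges, source, target):
    """Sum of products of path coefficients over all directed paths source -> target.

    edges: dict mapping (from_node, to_node) -> path coefficient.
    Assumes a recursive (acyclic) path model; cycles would recurse forever.
    """
    children = {}
    for (a, b), coef in edges.items():
        children.setdefault(a, []).append((b, coef))

    def walk(node):
        if node == target:
            return 1.0
        # Each outgoing edge contributes its coefficient times the effect downstream.
        return sum(coef * walk(nxt) for nxt, coef in children.get(node, []))

    return walk(source)


def indirect_effect(edges, source, target):
    """Total effect minus the direct path coefficient (0 if no direct path)."""
    return total_effect(edges, source, target) - edges.get((source, target), 0.0)


# Hypothetical standardized coefficients for a fragment of a restoration model.
edges = {
    ("Pleasantness", "Immersion"): 0.6,
    ("Immersion", "Satisfaction"): 0.5,
    ("Pleasantness", "Satisfaction"): 0.2,
}
print(total_effect(edges, "Pleasantness", "Satisfaction"))
print(indirect_effect(edges, "Pleasantness", "Satisfaction"))
```

Here the indirect route (0.6 × 0.5) outweighs the direct path (0.2). This is the kind of decomposition that distinguishes a mediated Pleasantness → Immersion → Satisfaction effect from a direct one.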

4.3 Effects of PSD on perceptual-physiological responses

Table 6 presents the direct relationships between the parameters of the physiological restoration model and the psychophysiological restoration responses examined in this study. Overall, psychological responses were significantly associated with cognitive indicators, and physiological indicators were significantly associated with one another. However, no significant associations were observed between physiological and subjective responses. A closer examination showed that higher satisfaction was associated with increased stress resistance and lower stress levels, and that greater pleasantness was associated with increased ∆TP, ∆LF, and ∆HF values and a reduction in heart rate. These findings suggest that a high-satisfaction environment may play an important role in enhancing stress resistance: increased perceived satisfaction may strengthen coping abilities against stressors, while higher pleasantness in the scenery may stimulate the autonomic nervous system. Nevertheless, no clear overall relationship emerged between psychological and physiological responses, which may be explained by several factors.

Table 6 Spearman’s rho correlation between subjects’ responses

First, stimuli in a virtual natural environment might elicit different physiological responses from those experienced in a real natural environment. Even when audiovisual elements achieve a high degree of immersion, aspects such as air quality, tactile sensations, and olfactory inputs may differ from those in nature. This cognitive dissonance between sensory channels can cause discrepancies between physiological and psychological responses (Barfield et al. 1995; Depledge et al. 2011; Anderson et al. 2017). Second, virtual natural environments often provide relatively monotonous and predictable stimuli compared with the diverse stimuli of real natural environments, potentially leading to more stable physiological responses. This study aimed to examine whether emotional restoration and autonomic nervous system recovery during VR-based restoration experiences affect psychological and emotional stability. The findings indicated that each cluster showed a tendency similar to the subjective survey responses and HRV baseline results.

Previous studies (Lüddecke and Felnhofer 2022; Weibel et al. 2023) have shown that virtual natural environments can induce physiological restoration reactions similar to those experienced in real natural environments. These results indicate that virtual natural environments can be a beneficial alternative for rest and restoration, particularly for individuals in modern societies who spend much of their time indoors. While virtual natural environments cannot entirely replace the actual natural environment, the positive correlations observed between naturalness, immersion, environmental awareness, and satisfaction indicate that these factors can aid psychophysiological restoration. A virtual natural environment designed with soundscape concepts can effectively support emotional, mental, and physiological restoration for those unable to access actual natural environments.

5 Conclusion

This study validated the psychophysiological restoration effects in subjects exposed to a virtual natural environment designed around soundscape concepts in a laboratory setting. The main findings on how audiovisual stimuli influence human perception and physiological responses are as follows. First, the inclusion of natural sound sources significantly enhanced subjects’ psychological responses, sense of immersion, and overall satisfaction. This influence depended on the characteristics of the soundscape, including variations in sound level and quality, and on environmental perception. Psychological responses were assessed using the SRRS, immersion was measured using items related to audiovisual dominance and engagement, and overall satisfaction was evaluated through questions on visual, auditory, and general satisfaction. Environmental perception, particularly the perception of natural sound sources, was captured through affective qualities such as pleasantness and eventfulness, as measured by the PAQ. Second, analysis of HRV parameters revealed a consistent pattern of positive physiological restoration across all virtual natural environments examined. This evidence supports the potential for physiological restoration in virtual settings, offering a valuable alternative for urban residents who have limited access to natural settings. Finally, the SEM framework developed for this study highlighted the crucial roles of positive perception, high immersion, and pleasantness in promoting effective psychophysiological restoration within virtual natural environments.

Although the present study reveals important findings, it has several limitations. First, we did not separate or analyze the metrics collected on the first and second experimental days. As a result, it is unclear whether a habituation effect, in which participants exhibit diminished psychophysiological responses after repeated exposure to the stimuli, was present. Habituation could influence the interpretation of psychophysiological metrics, as responses may decrease once participants become accustomed to the stimuli. While randomizing the order of stimulus presentation aimed to mitigate order effects, the inability to control habituation remains a limitation of the study. Future research should separate data by experimental day and analyze potential differences to clarify whether habituation effects emerge in similar experimental setups. Second, the HRV analysis of physiological changes revealed only minimal differences between stimuli. This may be due to the relatively short experimental period, which did not sufficiently capture changes in HRV over time; future research should therefore include follow-up observations over longer periods. Third, this study provided stimulation in a controlled VR environment, and it remains to be determined whether the short-term psychological and emotional restoration effects observed are sustained in the long run. Additionally, this experiment did not control for subjects’ aversion or motion sickness symptoms when using HMDs for the first time, although these factors are crucial for understanding the reactions of individuals who may struggle to adapt to the VR environment. Fourth, the model fit was less than ideal, with a CMIN(χ²)/DF value of 4.478, exceeding the generally accepted threshold of 3, which suggests that the model could be improved. In addition, the psychoacoustic metrics were not fully integrated into the SEM analysis. Future research should refine the model and improve fit indices by using larger, more diverse samples while incorporating these metrics more comprehensively to better understand their impact on psychological and physiological restoration. Fifth, the Mental Arithmetic Task (MAT) used to induce stress may have affected subjects differently, influencing their responses to the VR environments. Because rest periods were provided freely upon request rather than on a specific schedule, the habituation effect might not have been fully controlled. Future studies should implement more structured rest periods and additional measures to account for this effect. Finally, the findings are based on a specific set of virtual environments and stimuli, which may not encompass the full range of possible environmental conditions. Future research should explore a broader range of virtual environments and stimuli to validate and extend these findings. It is also necessary to investigate changes in physiological responses to various audiovisual combinations using diverse physiological indicators such as brain waves and blood analysis. Additionally, evaluating environmental awareness across multiple sensory levels, including olfactory and tactile senses, will help identify restorative factors not observed in this study.

By quantitatively examining the relationship between psychophysiological restoration responses and environmental factors, this study highlights a novel approach to designing therapeutic virtual environments that use natural elements. Grounded in human responses, these insights can guide the development of virtual spaces that promote psychological and physiological health and well-being. This study’s integration of advanced analytical techniques and comprehensive evaluation of audiovisual stimuli distinguishes it from previous research and offers new methodologies for optimizing virtual environment design.