1 Introduction

The immersive and spatial nature of 3D virtual reality (VR) makes it an attractive medium for various types of training—including physical exercise (Lee and Kim 2018; Neumann et al. 2018). Typical methods of physical training include, e.g., understanding and remembering exercise instructions illustrated on paper, or watching a trainer’s motion from the third-person viewpoint and mimicking it. VR may be particularly suited for teaching physical exercises because it can provide the first-person viewpoint and convey the spatial/proprioceptive sense of the required movements, e.g., as if enacted by one’s own body parts (Yang and Kim 2002; Cannavò et al. 2018). Consequently, many VR-based physical training systems have indeed been developed and shown to be effective (Ahir et al. 2020).

Postural instability has emerged (although it is not yet universally accepted) as one of the competing hypotheses for the cause of cybersickness (CS), or VR sickness (Riccio and Stoffregen 1991; Keshavarz et al. 2015; Stoffregen et al. 2008). This theory postulates that a VR user can become sick in provocative and unfamiliar situations (such as being immersed in a VR space) in which one does not possess (or has not yet learned) the strategies or skills to maintain a stable posture and balance (Riccio and Stoffregen 1991; So et al. 2007; Dennison and D’zmura 2018). Losing balance is often seen as a consequence (rather than a cause) of CS, as one major symptom is dizziness, and it has even been used as a measure of CS (Chang et al. 2020; Weech et al. 2019). However, there is also some evidence that visually induced sickness (like CS) can be predicted by one’s postural instability (Smart et al. 2002). This led us to investigate VR-based balance training as one possible way to increase tolerance to CS, focusing on a particular, clearly observable, and relevant physical ability. In this context, it is important that the proposed balance training for CS is itself situated in the VR environment. If shown to be effective, the proposed method would also further corroborate the postural instability theory.

This paper presents a series of two human-subject experiments that examined the long-term trends of user balance learning and tolerance to CS under different experimental conditions. We assessed the potential inter-relationship between one’s balance ability and the level of CS. The transfer effect was also investigated by having the trained users tested for CS tolerance in new VR content. The main contributions are as follows:

  • Balance training can be effective in developing the tolerance to sickness from visual motion.

  • Immersive balance training is more effective for developing tolerance to CS than non-immersive training and mere extended exposure to VR.

  • The effect of balance training can be transferred to other VR content (not used for training).

The structure of this paper is as follows: Sect. 2 gives an overview of related research. Section 3 details the first experiment (a preliminary pilot study), comparing three training methods over a 2-week period with respect to any change in CS. Section 4 presents the second, similar experiment (the main study), which focuses on the two effective conditions found in Experiment 1, with 1 week of training. Section 5 provides the experimental findings and an in-depth discussion of their implications, and lastly, Sect. 6 concludes the paper.

2 Related works

2.1 Balance training in VR

The human body, as an articulated and complex skeletal structure, is inherently mechanically unstable (Pai and Patton 1997; Lee and Chou 2006). Maintaining balance (around the center of mass) is a complex process involving multiple bodily systems. The vestibular, proprioceptive, and visual channels detect and gather balance/pose-related information; the brain integrates this information to coordinate and generate motor responses, establishing the center of pressure through the muscles and joints with constant adjustments that counteract external perturbations to the body (Peterka 2018; Lee and Farley 1998; Lafond et al. 2004).

There are numerous physical training routines to improve one’s balance by strengthening and improving the capabilities of the aforementioned subsystems (Brachman et al. 2017). Conventional balance training usually involves following paper or live instructions from the second/third-person point of view. Virtual reality (VR) can be an effective medium, offering the first-person perspective and a sense of personal space that enhance the understanding of the various training poses and workouts (Cannavò et al. 2018; Pastel et al. 2022). Gamification can further provide the motivation and impetus to facilitate the training process (Dietz et al. 2022; Tuveri et al. 2016). However, the effect of balance training (VR-based or not) on CS has not been investigated much, despite the fact that the two are often touted to be closely related (Riccio and Stoffregen 1991; Haran and Keshner 2009; Rine et al. 1999).

2.2 Cybersickness

Cybersickness (CS) refers to the unpleasant symptoms experienced when using immersive or VR simulators, especially with navigational content. Typical symptoms include disorientation, headache, nausea, and ocular strain (LaViola 2000; Kennedy et al. 1993). The leading explanation for CS is the “sensory mismatch theory”, which attributes CS to conflicting interpretations of the user’s motion between the visual and vestibular senses (LaViola 2000; Rebenitsch and Owen 2016). That is, the aforementioned unpleasant symptoms arise when virtual/visual motion is perceived by the human visual system while the vestibular senses detect no physical motion. Note that the visual and vestibular systems are neurally coupled (Grüsser and Grüsser-Cornehls 1972).

To combat these symptoms, several studies have focused on reducing the amount of or neutralizing the visual motion information to minimize the sensory mismatch (Fernandes and Feiner 2016; Park et al. 2022; Keshavarz et al. 2014). For instance, Fernandes and Feiner (2016) developed a dynamic size-shifting field-of-view (FOV) in response to the speed/angular velocity of users or content. When the user’s motion accelerates, the FOV is reduced, which in turn reduces the extent of the visual stimulation and, ultimately, the sickness. In a similar vein, blurring the peripheral visual field has been proposed to minimize the visual stimulation (Budhiraja et al. 2017; Lin et al. 2020; Caputo et al. 2021). Park et al. (2022), Yang et al. (2023) have proposed neutralizing the visual motion stimuli by simultaneously presenting the reverse optical flow.

Another popular theory is the rest frame theory (Harm et al. 1998), which points to the absence of reference object(s) (objects in the VR content that are not moving with respect to the user). The rest frame is thought to help the user maintain one’s balance and be aware of the ground (or gravity) direction (Hemmerich et al. 2020; Wienrich et al. 2018; Harm et al. 1998). One interesting remedy to CS is the inclusion of the virtual nose, which can be considered as a rest frame object (Wienrich et al. 2018; Wittinghill et al. 2015; Cao 2017).

Alternatively, a potential strategy for addressing CS might involve methods to alleviate the immediate symptoms and enhance users’ physical well-being, rather than directly targeting the root cause. These can include, e.g., supplying a fresh breeze (Igoshina et al. 2022; Kim et al. 2023), providing pleasant music or calming aural feedback (Keshavarz and Hecht 2014; Kourtesis et al. 2023; Joo et al. 2023), and reducing the weight of the headset (Kim et al. 2023). These measures can be regarded as a cognitive distraction as a way to reduce CS by preventing users from focusing on the sickness-inducing VR content (Kourtesis et al. 2023).

One newer hypothesis for the cause of CS is the “postural instability theory” (Riccio and Stoffregen 1991; Li et al. 2018), which suggests that the inability to maintain balance due to external factors, such as unfamiliar, provocative, and challenging situations, can induce sickness. Note that this does not preclude imbalance also being a typical after-effect of the sickness. The theory is based on various studies that have observed a strong correlation between one’s balancing ability (measured beforehand) and the extent of the subsequent motion sickness (Riccio and Stoffregen 1991; Li et al. 2018; Smart et al. 2002). This theory is also in line with the rest frame theory, i.e., the lack of an object indicating the direction of gravity and helping one maintain balance could be seen as a provocative situation for the user (Hemmerich et al. 2020; Wienrich et al. 2018; Harm et al. 1998).

Based on all these studies, one can posit that performing balance training while navigating in immersive VR would make the user even more unstable and exacerbate the CS. In turn, this could make the balance training itself even harder (Imaizumi et al. 2020; Horlings et al. 2009). Nevertheless, provided that the user can endure the training, its effect can eventually ease and break this vicious cycle. We further hypothesize that immersive feedback will be an important factor, as maintaining and training balance involves the visual channel and spatial awareness, which are difficult to fully provide in non-immersive and 2D-oriented media.

On a related note, the length of time exposed to a virtual environment is known to affect CS (Dużmańska et al. 2018). Stanney et al. (2003) found high correlations between exposure time and CS, with longer exposure times increasing the risk of CS. On the other hand, there is also the opposite view that people may build up resistance or adapt to CS over time (or through frequent exposures) (Dużmańska et al. 2018). Thus, the exact relationship between extended exposure and CS symptoms is not firmly established.

To our knowledge, no prior work has applied balance training as a way to build tolerance to CS. Note that, as with any external stimulation, mere repeated and prolonged exposure to VR can in itself certainly have a desensitizing or habituating effect on CS (Palmisano and Constable 2022). However, we expect it to be a relatively time-consuming method whose effect recedes quickly (compared to active training), and little is known about whether there is any transfer effect to other VR content (Palmisano and Constable 2022; Dużmańska et al. 2018; Adhanom et al. 2022; Smither et al. 2008). Considering these aspects, the formulated hypotheses are as follows:

  • H1: The training effect for CS through balance training (whether VR-based or not), if any, will be greater than that from mere extended exposure to the same VR content.

  • H2: The training effect for balancing and developing CS tolerance will be greater with the use of immersive VR than with the non-VR 2D environment.

  • H3: There will be a transfer effect such that tolerance to CS developed by balance training is conveyed to newly exposed VR contents.

  • H4: If balance training enhances tolerance to CS, this partly serves as evidence for the postural instability theory, which suggests that imbalance is a potential cause of CS.

3 Experiment 1: VRT versus VRO versus 2DT

The purpose of this study was a preliminary exploration of any effect of trained and enhanced balance ability on individuals’ tolerance to CS. As such a tolerance may be affected by a range of possible factors, including the training method, duration, and media type, the results were used to solidify the design of the more focused follow-up second experiment with a larger participant pool.

3.1 Experimental design

The balance training may occur in either a non-immersive environment or a VR environment, using sickness-eliciting content (i.e., navigation). We hypothesized that, given the same content, the effects of balance training on tolerance to CS would be stronger if the training occurred in the VR environment rather than in the non-immersive environment. Furthermore, we expected that this trained tolerance would transfer even to VR content that is new and different from that used for training.

On the other hand, to differentiate the effect (if any) of the training method/media type from that of mere exposure to VR, subjects were also tested under the same sickness-eliciting VR contents without any balance training. Humans can become habituated, desensitized, and tolerant to CS after long exposure to various VR stimuli (Fransson et al. 2019; Dużmańska et al. 2018). Thus, EXP1 was designed as a two-factor mixed-model study with repeated measures. The first factor was a between-subject factor with three training methods (see Fig. 1):

  • VRO: only exposure/just watching a sickness-eliciting navigation content using a VR headset, but without any balance training.

  • VRT: watching a sickness-eliciting navigation content using a VR headset while carrying out a balance training routine (i.e., a combined effect of balance training and extended VR exposure).

  • 2DT: watching a sickness-eliciting navigation content on a 2D projection-based display while carrying out a balance training routine.

Fig. 1
figure 1

The overview of the first experiment, conducted over 2 weeks and under three conditions. Participants were divided into between-subject groups according to the three different training conditions: a virtual reality-based balance training (VRT); b virtual reality exposure only (VRO); and c 2D projection display-based balance training (2DT). The two VR training contents used were d a jet fighter flight through a forest for the first week (EW1), which was relatively less CS-inducing, and e a wild roller-coaster ride for the second week (EW2), which was more CS-inducing. To assess the transfer effects of the balance training, participants were exposed to two other unexperienced transfer VR contents: f a rollercoaster ride [completely different from (e)], and g space exploration, on the first and last days of each respective week

As the effects of training may take time, the experiment was conducted over 2 weeks, but in two separate weekly segments: EXP1’s Week 1 (EW1) and Week 2 (EW2). Note that 2 weeks of balance training was deemed sufficient because marked progress is usually attainable in that time frame (Rasool and George 2007; Szczerbik et al. 2021). Thus, the time (days) constituted the second and within-subject factor.

EW1 proceeded over 4 days, and the participants were trained while watching the sickness-eliciting navigation content, which induced only a relatively moderate/lesser degree of CS to start the overall training gently (not too abruptly). After a 3-day break, EW2 was conducted with a duration of 5 days. Due to the possible learning effect and getting accustomed to the same content after repeated exposures, a new and more dynamic content (i.e., inducing more severe CS) was used. Although it is difficult to exactly quantify the difference in the induced-sickness levels, Fig. 2, which shows the navigation motion profiles of the respective content, makes it reasonably clear that the content from EW2 is likely to induce a much more severe level of CS.

Fig. 2
figure 2

The navigational path profiles of the two training VR contents in Experiment 1. a EW1 had a relatively simple profile, which was likely to induce CS only to a relatively little extent; b EW2 had a more complex profile, thus more likely to induce a higher level of CS

In summary, there were two mixed-model, longitudinal experiments, each designed as a two-factor, \(3 \times 2\), mixed design with repeated measures. While experimental tasks were carried out and data measured daily during the 4-day/5-day periods of EW1/EW2, the analysis focused solely on the differences between the first and last days of each week (making it a two-factor study).

Fig. 3
figure 3

The overall process for Experiment 1: Training with EW1 spanned 4 days, whereas that with EW2 lasted for 5 days, with a 3-day rest period between them. Balance tests were conducted on the first and last days of each week to assess the effects before and after the balance training sessions, taking into account physical fatigue. Additionally, transfer content was presented on the first and last days of each week, and the test for this content was conducted last on the last day of each week

3.2 Experimental setup and task

In both EW1 and EW2, except for the training contents used, the experimental task (training/exposing procedure) was the same (see Fig. 3).

For the 2DT and VRT groups, participants engaged in a balance training routine known as the “one leg stand” (also known as the Flamingo test; Uzunkulaoğlu et al. 2019; Marcori et al. 2022). In the 2DT setup, balance training was performed while participants viewed navigational content on a 60-inch projection display from a distance of 1.5 m (see Fig. 1c). In contrast, participants in the VRT group watched the same content through a Meta Quest 2 VR headset, which has a field of view of \(104 \times 98\)°.

The balance training routine during the 3-min viewing period was structured as follows: ready/rest (30 s), training (30 s), rest (30 s), training (30 s), rest (30 s), training (30 s). Instructions for the training, such as when to raise the leg or rest, were delivered through a visible user interface integrated into the system.

On the other hand, the VRO group did not perform any balance training, i.e., participants only experienced the same VR content using the VR headset while standing on two feet.

As already indicated, two VR contents (inducing different levels of CS) were used—the lesser sickness-eliciting one in EW1 (flying through a forest trail, Fig. 1d) and the more sickness-eliciting one in EW2 (wild roller coaster ride, Fig. 1e). Experimenting with the new, more difficult (sickening) content also allowed us to examine the user’s behavior and performance after a week of training.

The navigation path contained several types of motion: forward translation and pitch/yaw rotations and turns at varied speeds and accelerations (see Fig. 2). Moreover, to mitigate the learning effect as much as possible, not only were different VR contents used between EW1 and EW2, but adjustments were also made within the same content: e.g., the content was subtly altered by changing the mood of the surrounding environment (e.g., dawn, midday, evening, night, cloudy, and rainy) while ensuring that the sickness-inducing level remained consistent across days.
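To give a concrete sense of how such a motion profile (as in Fig. 2) can be derived, the sketch below computes speed and yaw-rate from sampled path data. This is purely illustrative; the paper does not specify its exact profile-extraction procedure, and the function name and sampling assumptions here are hypothetical.

```python
import numpy as np

def motion_profile(positions, yaw_deg, dt):
    """Derive speed and yaw-rate profiles from sampled navigation data.

    positions: (n, 3) world-space positions; yaw_deg: (n,) heading in
    degrees; dt: sampling interval in seconds. (Illustrative sketch only.)
    """
    positions = np.asarray(positions, dtype=float)
    # Speed: frame-to-frame displacement magnitude over the time step
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    # Yaw rate: unwrap heading to avoid 360-degree jumps, then differentiate
    yaw_rate = np.abs(np.diff(np.unwrap(np.radians(yaw_deg)))) / dt
    return speed, np.degrees(yaw_rate)
```

Plotting these two arrays over time yields a profile comparable to the translational and rotational components shown for the two training contents.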

The training was conducted twice daily for 4 days in EW1 and 5 days in EW2. Note that participants were free to put their foot back down anytime if they felt they were in danger of falling down (or for any reason, e.g., not being able to maintain balance or due to too much sickness) but were asked to resume and continue in their best way. The experiment helper stood by to prevent the participant from completely falling down. The participant was also free to stop the experiment at any time, although there was no such case.

3.3 Dependent variables

Three dependent variables of main interest were changes in (1) balance performance, (2) the level of CS over time, and (3) whether tolerance to CS was developed as a transfer effect. First, quantitative balance performance was measured in three ways:

  • Maintenance time: To assess changes in the participants’ balance ability, the “one leg stand with eyes closed” test (Bohannon et al. 1984) was administered on the first and last days of each week (see Fig. 3). Maintenance time, in seconds, was measured until the raised foot was placed back on the ground.

  • Number of foot-downs: The number of times the participant put their foot down during the training process was manually counted.

  • Center of mass variability: The extent of the deviation of the standing body from the reference center of mass was computed by analyzing the participant’s 2D pose data extracted from recorded video using PoseNet (Papandreou et al. 2017, 2018). Specifically, the variation in the midpoint of the screen space locations of the right and left hips was used to estimate this measure.
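The hip-midpoint variability measure above can be sketched as follows. This is a minimal illustration: the keypoint arrays stand in for PoseNet’s per-frame output, and the specific spread statistic (standard deviation of the Euclidean deviation from the mean midpoint) is an assumption, as the paper does not specify the exact formula.

```python
import numpy as np

def com_variability(left_hip, right_hip):
    """Estimate center-of-mass variability from 2D hip keypoints.

    left_hip, right_hip: arrays of shape (n_frames, 2) holding screen-space
    (x, y) coordinates per video frame, e.g., as extracted by PoseNet.
    """
    mid = (np.asarray(left_hip) + np.asarray(right_hip)) / 2.0  # hip midpoint per frame
    deviations = mid - mid.mean(axis=0)       # deviation from the mean midpoint
    # Spread of the per-frame deviations; larger values indicate more sway
    return float(np.linalg.norm(deviations, axis=1).std())
```

A perfectly still stance yields a variability of zero, and larger body sway produces proportionally larger values, so the measure decreases as balance improves.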

As for the level of CS, the 16-question Simulator Sickness Questionnaire (SSQ) was used (Kennedy et al. 1993). The SSQ yields three subscores for the symptom clusters of nausea (SSQ-N), oculomotor discomfort (SSQ-O), and disorientation (SSQ-D), plus a total score (SSQ-T). However, the SSQ only asks about the existence of certain symptoms; thus, it is not possible to assess their probable cause, e.g., whether they stem from the visual motion or from the balancing act. Thus, in addition to the “Original” SSQ, two revised versions, “Visual” and “Balance”, were made and used. Each revised questionnaire asked about the same symptoms but also about what the participants thought the source might be, i.e., the visual motion or the balancing act.
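For reference, the standard SSQ scoring sums each symptom cluster’s 0–3 ratings and scales the raw sums by fixed weights. The sketch below follows our reading of Kennedy et al. (1993); the item-to-cluster index mapping and weights should be verified against the original paper before reuse.

```python
# SSQ scoring sketch. Each of the 16 symptoms is rated 0-3; items may
# belong to more than one cluster. Indices follow Kennedy et al.'s
# item order (0 = general discomfort, ..., 15 = burping).
N_ITEMS = [0, 5, 6, 7, 8, 14, 15]    # nausea-cluster items
O_ITEMS = [0, 1, 2, 3, 4, 8, 10]     # oculomotor-cluster items
D_ITEMS = [4, 7, 9, 10, 11, 12, 13]  # disorientation-cluster items

def ssq_scores(ratings):
    """ratings: list of 16 ints in 0..3, in Kennedy et al.'s item order."""
    n_raw = sum(ratings[i] for i in N_ITEMS)
    o_raw = sum(ratings[i] for i in O_ITEMS)
    d_raw = sum(ratings[i] for i in D_ITEMS)
    return {
        "SSQ-N": n_raw * 9.54,
        "SSQ-O": o_raw * 7.58,
        "SSQ-D": d_raw * 13.92,
        "SSQ-T": (n_raw + o_raw + d_raw) * 3.74,  # total uses the raw sums
    }
```

The “Visual” and “Balance” variants used in this study keep the same items and scoring but additionally record the attributed source of each symptom.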

Lastly, to confirm the transfer effects (related to H3), namely whether the tolerance to CS developed through balance training was effective even with new VR content, the participants’ CS levels (using the “Original” SSQ) were measured with completely different, sickness-inducing VR contents. These were the EW1 transfer VR content, a rollercoaster ride (see Fig. 1f), and the EW2 transfer VR content, space exploration (see Fig. 1g). It should be noted that the transfer content also lasted 3 min, and participants in all conditions viewed it while standing (on both feet) and wearing a VR headset, without engaging in any balance training.

3.4 Participants

Participants were recruited through the university’s online community. The first round of participants was surveyed for their self-reported sensitivity to motion sickness using the MSSQ-short (Golding 2006; Nesbitt et al. 2017) and for their familiarity or prior experience with VR systems. We notified the potential participants of the need to carry out balance training (one leg stand) for about 10–15 min per day for 2 weeks and asked them to excuse themselves if they deemed it beyond their physical capabilities. Participants at the extreme ends of reported sensitivity were also excluded, as our study targeted participants in the middle of the sensitivity spectrum.

Fifteen final participants (all male, aged 19 to 33, \(M = 25.6\), \(SD = 2.19\)) were selected and placed in three between-subject groups (5 each) for VRT, VRO, and 2DT such that their MSSQ score variations were similar and within a reasonable range (see Table 1). All participants had at least some experience in using VR applications (mostly game playing and video watching) but had no prior balance training experience. The subjects were paid 16 USD per hour for their participation (a total of about $120 for the whole 2 weeks). All 15 subjects finished the experiment over the 2 weeks without giving up in the middle.

Table 1 Demographic comparison of the three participant groups

3.5 Experimental procedure

The participants first filled out the consent form, were briefed about the procedure of the experiment, and had the experimental tasks explained to them. Five to ten minutes were given for the participants to familiarize themselves with the balancing task while watching the content through the monitor or the headset. In particular, the participants were given detailed instructions on how to carefully respond to the three types of SSQs and to think deeply about the probable causes of the symptoms as best they could. The helper assisted each subject in positioning themselves in front of the monitor on the floor (with cushioned walls) or in donning and adjusting the headset. The helper also stood by to prevent the subject from falling down.

On each day, the participants in 2DT and VRT performed the balance training routines (described in Sect. 3.2). Participants chose for themselves which foot to stand on and which to lift. This protocol was designed considering that the similar one-leg-stand with eyes closed test lasts around 30 s on average (Hong-sun et al. 2019), and our own pilot test (with four males) indicated that exceeding 1 min often led to muscle strain. Meanwhile, the VRO group just watched the VR content for 3 min in a normal standing pose.

After the respective treatments, participants rested and filled out the survey. After experiencing all treatments, informal post-briefings were conducted. It should be noted that participants were free to stop the experiment at any time for any reason, and the experiment received approval from the Institutional Review Board (No. 2023-0143-01).

3.6 Results

Considering the \(3 \times 2\) mixed design and the collected longitudinal data being both continuous and non-parametric, the nparLD method (Noguchi et al. 2012) was utilized to evaluate the statistical effects of the factors and their interactions. Pairwise comparisons for the training method (the between-subject factor) were conducted using the Kruskal–Wallis test, while those for the day (the within-subject factor) were assessed using the Wilcoxon signed-rank test. Bonferroni correction was applied to all tests, with a 5% significance level. As EW1 and EW2 were conducted in different settings, they were analyzed separately.
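While nparLD itself is an R package, the pairwise-comparison portion of this analysis pipeline can be sketched in Python with SciPy. The function name, group labels, and data below are hypothetical; the sketch only shows how the between-factor (Kruskal–Wallis) and within-factor (Wilcoxon signed-rank, Bonferroni-corrected) tests fit together.

```python
from scipy.stats import kruskal, wilcoxon

def pairwise_tests(groups_day1, paired_days):
    """groups_day1: dict name -> SSQ scores on one day (between-subject);
    paired_days: dict name -> (first_day_scores, last_day_scores) per group.
    Returns the Kruskal-Wallis p-value and Bonferroni-corrected Wilcoxon
    p-values for the within-subject (day) comparisons."""
    # Between-subject factor (training method): Kruskal-Wallis across groups
    _, p_between = kruskal(*groups_day1.values())
    # Within-subject factor (day): Wilcoxon signed-rank per group
    raw = {g: wilcoxon(d1, dlast).pvalue for g, (d1, dlast) in paired_days.items()}
    m = len(raw)  # number of within-group comparisons to correct for
    corrected = {g: min(1.0, p * m) for g, p in raw.items()}
    return p_between, corrected
```

Note that this simple sketch does not reproduce nparLD’s rank-based omnibus F-tests (the ANOVA-type statistics reported in Table 3a); it covers only the follow-up pairwise comparisons.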

3.6.1 Change in sickness levels

The primary focus of this study was to alleviate CS caused by visual mismatch through balance training. There is a possibility that physical challenges from the balancing act could produce symptoms similar to many CS symptoms assessed by the SSQ (e.g., “disorientation” from trying to stand on one foot). We acknowledge the inherent difficulty in objectively and correctly judging the sources of the CS symptoms and any possible interaction between these factors (participants were allowed to attribute a given symptom to both the visual stimulation and the balance exercises). Thus, as described earlier, two additional versions of the “Original” SSQ, termed “Visual” and “Balance”, were prepared. Such measures may enable participants to differentiate and report whether the CS symptoms were induced by the visual stimulation and/or the balance training. Nevertheless, given that the Original SSQ is widely used, and as such has been more thoroughly validated than our revised versions, this section primarily reports the statistical analysis results based on the Original SSQ. The results derived from the modified versions (Visual and Balance) are presented in the “Appendix”.

Trends in CS score over EW1/EW2 are shown in Table 2, Figs. 4 and 5.

Fig. 4
figure 4

Changes in the SSQ scores (total, nausea, oculomotor, and disorientation) over the 4-day period in EW1

Fig. 5
figure 5

Changes in the SSQ scores (total, nausea, oculomotor, and disorientation) over the 5-day period in EW2

Table 2 Trends in the level of CS over 2 weeks for Experiment 1
Table 3 Results for the “training content” in Experiment 1 were analyzed using the nparLD methods for the omnibus test to assess the effects of the factors

In EW1, the nparLD revealed significant differences at the CS level for all SSQ items in relation to Days (\(p <.001\)), but there were no effects of Training Methods and interaction effects (\(p >.05\)), as shown in Table 3.

On the other hand, the pairwise comparisons with respect to the within-subject factor of Days (Day 1 vs. Day 4) showed significant reductions in the level of CS for both the VRT group (Day 1 > Day 4; \(p <.05\)) and the 2DT group (Day 1 > Day 4; \(p <.05\)) for all SSQ items. However, the VRO group showed a significant reduction only in nausea (Day 1 > Day 4; \(p =.045\)*), while the reductions in the other items did not reach statistical significance (see Table 3b). It should be noted that the VRO group was only exposed to the immersive/visual simulation without any balance training. These results indicate the possibility that balance training is significantly effective in developing CS tolerance (supporting H1). Meanwhile, the pairwise comparison for the between-subject factor of Training Methods revealed no significant differences on the first and last days.

For EW2, due to the high complexity of its content (see Fig. 2), overall CS levels increased as expected (see Table 2). Statistical analysis indicates that significant effects of Days were observed across all SSQ items (\(p <.001\)***), while interaction effects were significant for all SSQ items except SSQ-O (\(p =.076\)). However, Training Methods showed a significant effect only for SSQ-D (\(p =.016\)*) (see Table 3a).

Regarding the pairwise comparisons on Days, the SSQ scores on the last day were significantly lower than those on the first day for all training methods (see Table 3b). Notably, increased tolerance to CS from simple sustained exposure and habituation to VR (i.e., VRO) was observed. This effect did not occur in EW1, where a significant difference was observed only in SSQ-N (\(p =.045\)*) and not in the other categories, leading us to posit that the exposure effect without balance training requires a relatively longer duration, supporting H1.

The pairwise comparison for the Training Methods showed several significant differences: SSQ-T (VRT < VRO*; 2DT < VRO**), SSQ-O (2DT < VRO**), and SSQ-D (VRT < VRO*; 2DT < VRO***). It is noteworthy that VRO, in particular, exhibited significantly higher levels compared to the other training methods on the first day. This was presumably because this group had neither prior balance training nor increased tolerance to sickness from the sustained exposure in EW1. Interestingly, VRT, which initially started with the highest CS levels in EW1, showed lower levels than VRO on the first day of EW2.

3.6.2 Balance performance

To relate the potential effect of balance training to tolerance for CS, three measures were taken: (1) the duration of participants’ balance maintenance; (2) the number of foot-downs, i.e., instances where participants had to place the raised foot back on the ground (indicating balance failure); and (3) the variability in their centers of mass. The first was measured before and after the training sessions, while the latter two were measured during the training sessions (see Fig. 3).

Table 4 shows the trend in balance performance during EW1 and EW2, along with the statistical results for the within-subject factor. Note that the between-subject factor, i.e., the Training Methods, is not presented due to the limited number of participants in each subject group.

Balance maintenance was measured using the “one leg stand with eyes closed” test (Hongsun et al. 2018). The Wilcoxon signed-rank test showed no statistical differences in VRT and 2DT among the tested days; however, when comparing the first day (EW1-1) and the final day (EW2-5), we observed a relatively large increase of 27 s in the average time for VRT, whereas the increase was only 2 s for 2DT. These findings indicate that the VR environment may have had a more significant impact on developing balance abilities compared to the 2D environment.

For the number of foot-downs, the Wilcoxon signed-rank test revealed that EW2-1 was significantly larger than EW2-5 in VRT (\(p =.049\)), indicating that VR environments can enhance balance performance. The lack of significance in EW1 may be attributed to its relatively less CS-inducing content, resulting in a lower number of foot-downs. Moreover, in the 2DT setup, where participants did not wear a VR headset, the visible real environment (e.g., wall, floor) might have helped the participants maintain their balance.

Table 4 Results of balance performance over 2 weeks in VRT and 2DT for Experiment 1, showing average values (standard deviation)

The center of mass variability showed significant differences in most comparisons. In EW1, both groups decreased significantly on EW1-4 compared to EW1-1 (VRT: \(p =.002\)**; 2DT: \(p =.005\)**). However, in EW2, while there was a significant decrease in VRT (EW2-1 > EW2-5; \(p =.001\)**), 2DT showed a non-significant increase (EW2-1 < EW2-5; \(p =.47\)). We believe this is because EW1’s training content was quite monotonous; its simplicity helped participants maintain a stable center of balance, making it difficult for any factor to have an effect.

Overall, these findings support our second hypothesis (H2), suggesting that immersive training can be more effective in improving balance compared to a 2D or non-immersive environment.

3.6.3 Balance and sickness correlation

The Pearson correlation coefficient test was used to further investigate the relationship between balance performance and the reduction in sickness. We hypothesized the following correlations: (1) sickness scores and balance maintenance time would be negatively correlated; (2) sickness and the number of balance failures would be positively correlated; and (3) sickness and center of mass variability would be positively correlated. The results are summarized in Table 5, and they are mostly consistent with our expectations.

Table 5 In Experiment 1, the Pearson’s correlation test results between SSQ’s total score and three balance performances

Statistically significant correlations were found between the SSQ’s total sickness scores and two of the balance measures, the number of balance failures and the center of mass variability (for both VRT and 2DT), over EW1 and EW2. VRT showed higher correlation coefficients than 2DT in both measures: number of foot downs (\(r = 0.612 > 0.267\)) and center of mass variability (\(r = 0.305 > 0.269\)). For the balance maintenance time, no significant correlation was found; however, it is worth noting that while VRT showed the expected negative correlation (\(r = -0.295\)), 2DT showed a near-zero correlation (\(r = 0.001\)).
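The correlation analysis follows the standard Pearson procedure, which can be sketched as follows with hypothetical paired observations (not the study’s data), e.g., foot-down counts against SSQ totals:

```python
from scipy.stats import pearsonr

# Hypothetical paired observations: more foot downs alongside higher
# SSQ totals (illustrative values only).
foot_downs = [2, 5, 3, 8, 6, 4, 7, 1, 9, 5]
ssq_total  = [12, 30, 18, 45, 33, 25, 40, 10, 50, 28]

r, p = pearsonr(foot_downs, ssq_total)
print(f"r = {r:.3f}, p = {p:.4f}")  # a positive correlation, as hypothesized
```

A positive \(r\) here matches hypothesis (2) above: more balance failures co-occurring with higher sickness scores.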

In summary, as balance performance improves, there is an associated increase in tolerance to sickness in both VRT and 2DT, which supports H4. Additionally, the correlation values suggest that the training effect in VRT was more significant compared to 2DT, supporting H2.

3.6.4 Transfer effect

The true test for any effect of the balance training on CS would be observing how the balance-trained participants perform on completely different VR contents, i.e., the “transfer” contents (see Fig. 1f, g). Assessments were conducted twice: on the first and last days of each week. The descriptive and statistical results are shown in Tables 6 and 7.

Table 6 Changes in the SSQ scores measured on the first and last days of each week using the “transfer” VR content
Table 7 Statistical results for “transfer” content in Experiment 1

The nparLD test revealed that there were no significant differences in EW1. In the case of EW2, significant effects were observed for the Days factor across all SSQ items, but no significant effects were found for Training Methods or their interaction (see Table 7a).

Pairwise comparison tests revealed no significant differences in the between-factor analysis (see Table 7b). However, significant differences were observed within factors (i.e., first day vs. last day) for VRT and 2DT. Specifically, VRT showed significant reductions in SSQ-T (\(p =.029\)*) and SSQ-D (\(p =.029\)*) on the last day compared to the first day, while 2DT demonstrated a significant reduction in SSQ-D (\(p =.049\)*).

This suggests that a transfer effect was developed only by the VR-based balance training. It aligns with the finding that the VRO group, which received no training in EW1, showed much higher CS levels early in EW2 (when switched to the new training content) than VRT. Thus, these results support our hypotheses (H2, H3) that long-term balance training in an immersive VR environment is more effective than mere exposure to VR content for an equal amount of time. Note that VRT inherently entails extended exposure to VR as well.

4 Experiment 2: VRT versus VRO

EXP1 had several limitations. The training lasted 2 weeks, but the simplicity of the path in the EW1 content hindered the manifestation of any training effect. Although the EW2 content was more challenging, participants could predict their path from the rollercoaster rails, leading to less engagement with the virtual/surrounding environment. The inclusion of the 2D projection display-based training (2DT) in EXP1 was perhaps not appropriate in the first place. For one, as the experimental results confirmed, it was difficult to elicit CS: even though a “large” projection was used, the field of view was fixed at only 60° (i.e., the imagery was not view-angle dependent), and the real world was visible in the periphery. The balancing ability was also notably higher because the training occurred non-immersively, that is, in the real world rather than in the unfamiliar VR space. Furthermore, the number of participants was not sufficient to establish strong validity.

Considering these issues, the second experiment (EXP2) was conducted with more participants to further explore and compare only the effects of VR-based training (VRT) and VR exposure (VRO) over a week (5 days). Specifically, EXP2 was designed as a two-factor, \(2 \times 2\) mixed-model study with repeated measures (between-subject: training method; within-subject: days).

4.1 Participants

For EXP2, 28 new male participants (ages 19 to 31, \(M = 23.96\), \(SD = 3.16\)) were recruited from the University. The recruitment process was the same as in EXP1, with no one from EXP1 involved. These participants were divided into two groups of 14 each: VRT and VRO. The division was based on their maximum balance ability, measured in seconds (VRT: \(M = 40.57\), \(SD = 26.31\); VRO: \(M = 38.64\), \(SD = 40.10\)) and their sensitivity to CS (MSSQ) (Golding 2006) (VRT: \(M = 16.20\), \(SD = 8.63\); VRO: \(M = 16.24\), \(SD = 7.84\)), ensuring the groups were as balanced as possible (see Table 8).

Table 8 Demographics of the two participant groups in Experiment 2: VRT versus VRO

This experiment received another approval from the University’s IRB (No. 2023-0296-02). Participants were paid $80 for their participation.

4.2 Procedures and measures

The procedure was mostly similar to that of EXP1, but EXP2 was conducted over 1 week (refer to the second week’s procedure of EXP1 in Fig. 3). Participants were asked to perform the “one leg stand with eyes closed” test to measure their balance performance. They then experienced the transfer content in a seated position, eliminating any influence of balance effects, to more effectively evaluate the tolerance to CS developed from balance training in another VR content. The CS levels were reported using the “Original” SSQ (the before-training state for the transfer content). After a break, participants engaged in the training VR content, with or without balance training depending on the condition, and again evaluated their level of CS using the Original, Visual, and Balance SSQ. The training (or mere exposure) was repeated twice daily. On the 2nd to 4th days, participants engaged only in the training VR content twice daily and evaluated their CS levels. On the last day, they performed the balance test again, completed the training content twice as usual, and finally experienced the transfer content in a seated position (the after-training state for the transfer content). Note that sufficient breaks were provided between all steps (i.e., balance test, training, and transfer test). This concluded the second experiment over 1 week.

A new VR training content was used—the “Whale belly exploration” (Joo et al. 2024), which did not have any indicator for the upcoming path (in contrast to the railed rollercoaster content used in EXP1). For the transfer content, the same space exploration content (EW2) was used. Note that both contents lasted 3 min.

4.3 Results

4.3.1 CS tolerance and transfer effects

Similar statistical methods, as applied in EXP1, were used for the experimental data analysis. However, since there were two between-subject groups, VRT and VRO, the pairwise comparison was conducted using the Mann–Whitney U test. The “Original” SSQ scores over 1 week are shown in Table 9 and Fig. 6. Similarly, those from the other versions of the SSQ are presented in the “Appendix”.
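For illustration, the between-group pairwise comparison can be sketched as below. The day-5 SSQ totals for the two independent groups are hypothetical values, not the study’s data:

```python
from scipy.stats import mannwhitneyu

# Hypothetical day-5 SSQ total scores for two independent groups
# (illustrative values only).
vrt = [8, 10, 6, 12, 9, 7, 11, 5]
vro = [15, 20, 12, 25, 18, 16, 22, 14]

# Unpaired, non-parametric comparison, appropriate for independent
# between-subject groups, as used in the paper.
u, p = mannwhitneyu(vrt, vro, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
```

The Mann–Whitney U test replaces the Wilcoxon signed-rank test here because VRT and VRO are different participants, so the samples cannot be paired.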

Fig. 6
figure 6

Changes in the average SSQ’s total scores and scores for three symptoms (Nausea, Oculomotor, and Disorientation) in Experiment 2

Table 9 The SSQ score trends for training and transfer contents in Experiment 2

Similar to EXP1, a decreasing trend was observed in both groups, with VRT showing overall lower levels of CS compared to VRO. Given the mixed-model design, the nparLD methods were again applied. Only the Days effect was significant for all SSQ items, indicating that extended exposure time to VR decreased CS (see Table 10). Moreover, the interaction effect was significant for the SSQ-D item (\(p =.039\)*). In the pairwise comparisons, although no significant effects were observed in the between-factor analysis, significant effects were found within factors for both the VRT and VRO groups. Specifically, CS scores on day 1 were significantly higher than those on day 5 across all SSQ items (see Table 10b).

Table 10 Results for training content and transfer content in Experiment 2

Regarding the transfer content, the nparLD test revealed significant effects of the Days factor on all SSQ items, as well as significant interaction effects for SSQ-T (\(p =.044\)*) and SSQ-D (\(p =.049\)*). In the pairwise comparison, a significant decrease in CS was observed only in the VRT from day 1 to day 5 for all SSQ items: SSQ-T (\(p =.005\)**), SSQ-N (\(p =.006\)**), SSQ-O (\(p =.011\)*), and SSQ-D (\(p =.023\)*). This finding indicates that a significant transfer effect developed only in VRT, which accompanied the balance training (supporting H3).

4.3.2 Balance and cybersickness

With respect to the potential relationship between the trained balance ability and the extent of cybersickness, the statistical analysis focused particularly on these variables between the first and last days of the training experiment. As for the balance maintenance time, the Shapiro–Wilk test revealed that both VRT (\(p =.119\)) and VRO (\(p =.057\)) followed a normal distribution; thus, Student’s t-test was used. The VRT group showed an average time of 28.14 s (\(SD = 22.49\)) on the first day, which increased to 73.78 s (\(SD = 46.95\)) on the last day, a statistically significant change (\(p =.002\)**). In contrast, the VRO group started with an average time of 24.21 s (\(SD = 19.41\)) and showed only a slight increase to 29.90 s (\(SD = 28.05\)) by the end of the week, with no statistical significance (\(p =.243\)).

The trends in the center of mass variability, another indicator of the trained balance ability, over 1 week are summarized in Table 11 and Fig. 6b. The Shapiro–Wilk test confirmed that only the VRT (\(p =.146\)) followed a normal distribution, unlike the VRO (\(p <.001\)). Student’s t-test revealed a significant difference in the VRT from day 1 to day 5 (\(p =.049\)*), whereas the Wilcoxon signed-rank test showed no significant difference in the VRO (\(p =.526\)). These findings suggest that while increased exposure to VR alone developed one’s ability to maintain bodily stability to some degree, accompanying balance training further enhanced this improvement.
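The test-selection procedure used in the paper (normality check, then a parametric or non-parametric paired test) can be sketched as follows, with hypothetical day-1 and day-5 measurements:

```python
from scipy.stats import shapiro, ttest_rel, wilcoxon

def paired_compare(a, b, alpha=0.05):
    """Choose the paired test by normality, mirroring the paper's
    procedure: Student's t-test if both samples pass the Shapiro-Wilk
    test, otherwise the Wilcoxon signed-rank test."""
    if shapiro(a).pvalue > alpha and shapiro(b).pvalue > alpha:
        return "t-test", ttest_rel(a, b).pvalue
    return "wilcoxon", wilcoxon(a, b).pvalue

# Hypothetical balance maintenance times (s) on day 1 and day 5
# (illustrative values only).
day1 = [30.1, 22.4, 41.0, 18.7, 27.5, 35.2, 24.9, 31.8]
day5 = [55.3, 48.1, 80.2, 39.6, 60.0, 71.4, 45.5, 66.7]
test_name, p = paired_compare(day1, day5)
print(test_name, round(p, 4))
```

Because the appropriate test depends on the normality of each sample, the same helper can fall back to the non-parametric Wilcoxon test, as the paper did for the VRO data.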

Table 11 Center of mass variability results for 1 week in Experiment 2

Correlation analyses were performed between the center of mass variability and the level of CS. The Pearson correlation test indicated a moderate positive relationship in the VRT (\(r = 0.344\), \(p =.004\)**), suggesting that greater body stability (i.e., lower variability) is associated with lower CS. In contrast, the VRO showed a positive but weak relationship (\(r = 0.090\)) with no statistical significance (\(p =.459\)). These findings further support our assumption that enhancing balance ability through training improves tolerance to CS (related to H1 and H4).

5 Discussion

5.1 The effect of balance training on cybersickness

As discussed at length in Sect. 2.2, several theories have been proposed as to why and how CS occurs, such as sensory mismatch (LaViola 2000; Rebenitsch and Owen 2016), lack of a rest frame (Harm et al. 1998), and postural instability (Riccio and Stoffregen 1991; Li et al. 2018; Smart et al. 2002). All such factors are plausible and debatable at the same time. While the proposal of immersive balance training for developing tolerance to CS hinges on postural instability in particular, it neither discounts the effects of the other factors nor conflicts with them.

Our two experiments have shown significant reductions in CS symptoms in all the treatments. This trend was also observed, albeit to a lesser extent, in the VRO treatment that did not include training. These results indicate that repeated exposure to VR content reduced sickness (Adhanom et al. 2022; Palmisano and Constable 2022), and it is difficult to deny that this effect may have contributed to the reduction observed in the other conditions as well. On the other hand, balance training may have reduced the sickness by acting as a cognitive distraction (Kourtesis et al. 2023; Venkatakrishnan et al. 2023). However, cognitive distraction alone cannot easily explain the transfer effect, and the same is true for mere exposure to a particular VR content.

Palmisano and Constable (2022) have shown that repeated exposure to VR content can significantly reduce CS. However, this improvement was observed only for the very content the participants were exposed to, and it was not shown whether the effect extended to other VR contents. In contrast, our experiment confirmed the transfer effect of the balance training to a completely different content: only the VRT group, which engaged in immersive balance training, experienced significantly reduced CS in the transfer contents. This is the critical finding that identifies the training (and the improved physical/mental capability) as the main driver of the sickness reduction, more so than the exposure itself or distraction. It also signifies the practical potential of the approach.

5.2 The potential of balance training on cybersickness

One representative experiment attempting to validate the postural instability theory of Riccio and Stoffregen (1991) showed decreasing sickness levels in a provocative situation when the subject adopted a wider, more stable stance (Dennison and D’Zmura 2016). In contrast, the experiment in this work went the other way: the participants were purposely placed in an unstable position (one leg stand), which, according to the same theory, might be expected to increase the sickness. One important difference, however, is that the participants were also instructed to “learn” and train how to maintain their balance. Indeed, instead of increased sickness, our results clearly show the reduction and even the transfer effects, singling out the very effect of the “training”.

Interestingly, Menshikova et al. (2017) compared figure skaters, soccer players, and wushu fighters, and found that figure skaters showed the most resilience to CS. Thus, innate or learned balancing capability seems related to tolerance to CS. Ritter et al. (2023) studied the VR-based (safe) training of balance beam performance with gymnastics beginners. Among other findings, the work showed that the participants generally performed worse in VR than in the real world. This indirectly suggests that, for improving CS tolerance through balance training, the training environment will be important. Likewise, our results point in a similar direction, i.e., VRT being more effective than 2DT and even VRO.

As for 2DT, the level of CS arising from the visual motion must have been lower to begin with compared to that in VR. The visual content had a substantially smaller field of view (approximately 100° for VRT vs. 60° for 2DT), and real objects such as walls and the floor may have served as reference points. These aspects can diminish the training effect in 2DT as well. On a similar note, training for a spatial task (of which balancing, or even withstanding CS from visual motion, could be examples) in a 2D-oriented environment has shown a negative transfer effect to the corresponding 3D VR environment (Pausch et al. 1997).

Even though our study seems to show that extended exposure to VR does have an effect on building tolerance to CS, in light of the related work, its firm establishment is still debatable. Even if it were established, we believe that its effect is weaker and less long-lasting than that of balance training, in which the user makes a conscious effort to encode the relevant information into one’s proprioceptive and muscular control system. How long the training effect can be sustained would be a topic of future research.

5.3 Limitations and future works

Our study is limited in several aspects. CS is a truly multifactorial issue, involving gender, age, the nature of the tasks undertaken, the type of feedback and multimodality (Feng et al. 2016; Peng et al. 2020), and the types of devices used (Kourtesis et al. 2023; Kim et al. 2008; Chang et al. 2020). Our work investigated only one such probable factor, i.e., balancing capability. While most factors mentioned above are known to influence the level of CS in one way or another, the variance from individual differences is relatively large (Tian et al. 2022; Chang et al. 2020; Howard and Van Zandt 2021). Balancing capability can be considered a more predictable control factor (Arcioni et al. 2019; Chardonnet and Mérienne 2017), and training for it is also expected to be much less dependent on the immersive training environment (content genre). Note that the training process can be further expedited by employing multimodal feedback, guidance features, and gamification (Dietz et al. 2022; Juras et al. 2018; Prasertsakul et al. 2018).

Another limitation of our study is the relatively small number of participants in each training method group. The participant pool was also limited to a specific demographic, i.e., young adult males, which further restricts the generalizability of our findings to other populations. It is, therefore, premature to extend our claims to a broader audience or diverse subject groups. Future studies should address these limitations by conducting larger-scale experiments with a more diverse participant pool in terms of age, gender, and background. Such studies should also consider employing a wider variety of sickness-inducing or “provocative” VR content to evaluate the effectiveness of the training methods under different scenarios. In addition, as there may be more fitting balance training routines, these new VR contents may involve interaction techniques to guide such balancing acts more effectively, as demonstrated in Yang and Kim (2002). This would provide more comprehensive insights and increase the robustness and applicability of the findings across varied contexts.

6 Conclusion

In this paper, we conducted two experiments to observe the relationship between balance training and the development of CS tolerance under different experimental conditions. The findings indicate that enhancing balance performance leads to increased tolerance for CS. The study also corroborated the greater effectiveness of balance training in immersive environments compared to non-immersive settings. Furthermore, the improvement in balance ability demonstrated sustainable effects, enabling individuals to tolerate CS in newly encountered VR content as well.

Although our findings are still preliminary, this study is the first of its kind. If further validated with continued in-depth and larger-scale studies (e.g., including various postures such as seated, supine, and prone), we hope to be able to design and recommend a standard VR-based balance training program for building tolerance to CS for active yet sickness-sensitive “wannabe” VR users (while also improving one’s fitness as a bonus).