1 Introduction

Contrary to acute stress, which represents an adaptive reaction (Chrousos and Gold 1992), chronic stress is defined as the prolonged psychological and physiological response to situations or stimuli that are perceived as threatening or overwhelming (Agorastos and Chrousos 2022). Across the lifespan, and particularly in childhood (Davis and Soistmann 2022), chronic stress is linked to a range of health issues, including obesity, inflammation, immune dysregulation (e.g., Oh et al. 2018; Piątkowska-Chmiel et al. 2023), as well as anxiety disorders and depression (e.g., Thorsén et al. 2022). As there is a global increase in distress, especially among children and adolescents (De Figueiredo et al. 2021), addressing chronic stress in therapy is key.

Among various stress coping techniques, biofeedback (BF) is commonly used for alleviating stress (Yu et al. 2018). BF describes a method of simultaneously feeding back physiological signals (e.g., heart rate, HR, or electrodermal activity) to the individual via visual and/or auditory channels. This approach stimulates interoceptive self-regulatory processes and allows for voluntary control over bodily responses, thereby decreasing stress and increasing self-efficacy (Blum et al. 2019). While past reviews and meta-analytic studies show that BF may be effectively used in adults (Goessl et al. 2017; Pizzoli et al. 2021) and pediatric populations (Thabrew et al. 2018; Umaç and Semerci 2023), there are several barriers to its implementation, particularly for younger patients (Lüddecke and Felnhofer 2022). For example, the visual display of signals has been criticized for being complex, abstract, hard to understand or insignificant to the individual (Blum et al. 2019; Kothgassner et al. 2022). Such unappealing content may lead to a lack of motivation and possibly to difficulties in sustained attention (Gaume et al. 2016).

In response to these drawbacks, virtual reality (VR) has repeatedly been proposed as a viable remedy. VR provides contextually rich, dynamic content and, in its fully immersive form via a head-mounted display (HMD), allows users to immerse themselves in the task, thus preventing mind-wandering and decreases in motivation (Lüddecke and Felnhofer 2022). Furthermore, enhancing VR environments with gamification elements like narratives and incentives may contribute to overcoming said limitations (Kothgassner et al. 2022). Similarly, resorting to natural environments may also be beneficial. Being exposed to nature has been shown to replenish attentional resources (Kaplan 1995) and increase relaxation, an effect that was replicated for VR nature scenarios (e.g., Knaust et al. 2022).

Generally, past data supports the use of VR-BF, with positive effects on self-reported anxiety and HR (Kothgassner et al. 2022) as well as motivation, user experience and attentional focus (Lüddecke and Felnhofer 2022). Yet, most research is on adults. Only a few studies have tested HMD-based VR-BF in children and adolescents: two (Bossenbroek et al. 2020; van Rooij et al. 2016) successfully implemented a diaphragmatic breathing VR-BF in 8–17-year-olds with anxiety and disruptive classroom behaviors, and one (Skalski et al. 2021) tested hemoencephalographic VR-BF in 7–15-year-olds with ADHD. However, small sample sizes and a lack of active control groups render these findings preliminary. Although recent protocols (Orgil et al. 2023) and abstracts (Recker et al. 2023; Savaş et al. 2023) suggest pending studies, existing data does not yet permit solid conclusions about the feasibility and benefit of VR-BF for younger age groups.

Consequently, this randomized controlled clinical trial set out to evaluate a self-developed VR-BF in children and adolescents with stress-related disorders. Our main aim was to compare a VR-BF training to a standard 2D-BF (active control) to investigate whether the modality (immersive VR) and gamification elements are superior in their effects on chronic stress, clinical symptoms, the ability to relax, health-related quality of life, and the occurrence of side effects.

2 Methods

2.1 Trial design

This trial was a multicenter RCT, with an experimental treatment group receiving VR-based BF-training (VR-BF), and a parallel active control group receiving a standard BF-training (2D-BF) on a 2D-computer screen. As several meta-reviews (e.g., Darling et al. 2020; Thabrew et al. 2018; Umaç and Semerci 2023) have established good effectiveness of 2D-BF in pediatric patients, particularly in the context of treating anxiety and depression, we did not aim at evaluating the method per se. Instead, we chose an active BF comparator to examine the specific effect of BF-modality (2D vs. fully immersive 3D) and gamification (narrative, incentives, progression of play) in BF treatment. At all trial centers, participating children and adolescents continued to receive their usual medical treatments. To evaluate treatment effects, patients completed three assessments: baseline (T0), post-treatment (T1) and 3-month follow-up (T2). Participants and investigators were not blinded to the type of treatment. The trial was registered in OSF and in the German Clinical Trials Registry (OSF: osf.io/387wq, DRKS00033887, https://drks.de/search/de/trial/DRKS00033887), and it was approved by the study centers’ local ethics committees (Vienna: 1495/2019, Grieskirchen: 1145/2019). No changes were made to methods after trial commencement.

2.2 Participants

Children and adolescents were recruited between fall 2019 and winter 2022 from three Austrian healthcare institutions: (1) the Department of Pediatrics and Adolescent Medicine, (2) the Department of Child and Adolescent Psychiatry (both Medical University of Vienna), and (3) the Department of Child and Adolescent Psychiatry and Psychotherapeutic Medicine in Grieskirchen. Participants consisted of day-care patients, inpatients and outpatients, who were enrolled by clinical psychologists either during their check-up visits or during their inpatient stay and day-care. Inclusion criteria were: (1) 8–18 years of age, (2a) a primary diagnosis of mild to moderate depressive disorder (ICD 10, F32.0 or F32.1), and/or anxiety disorder (F40-F41), or (2b) a primary diagnosis of a chronic somatic illness (e.g., chronic inflammatory bowel disease) combined with a secondary diagnosis of mild to moderate depressive disorder and/or anxiety disorder. Patients’ diagnoses were extracted from the electronic databases at each participating center. Diagnoses were established a priori as part of the routine treatment by clinical psychologists/psychiatrists who were not part of the study team. Exclusion criteria were (1) estimated cognitive impairment (IQ < 70) and/or a diagnosis of intellectual disability, (2) a diagnosis (or history) of schizophrenia, schizotypal or delusional disorder (F20-29), bipolar affective disorder (F31), a manic episode (F30), or an acute crisis (suicidal self-injury, suicidality), (3) insufficient German proficiency, (4) poor visual acuity, and (5) motion sickness (kinetosis).

None of the patients were enrolled in any other clinical trials during the duration of this trial; also, they received no remuneration for their participation. For each patient, a legal guardian (a parent in all cases) provided written informed consent. Additionally, written consents (for patients ≥ 14 years of age) or assents (< 14 years of age) were obtained from participating children and adolescents. Patients meeting the eligibility criteria were randomly allocated in a 1:1 ratio to either the VR-BF treatment group (VR-BF) or the standard 2D-BF treatment group (2D-BF). A stratified allocation scheme (considering age and sex) was used. Allocation was performed by study staff. Assessments at T0, T1 and T2, as well as treatments, were all carried out by trained clinical psychologists, who were not blinded to the treatment allocation of the participants.

2.3 Interventions

Both interventions consisted of 10 individual 50–60-min on-site sessions, which were held once a week. Clinical psychologists were trained for these purposes and received supervision from the principal investigators on a regular basis.

The two interventions (VR-BF and 2D-BF) both included the following treatment elements: (1) psychoeducation (e.g., about the emotional, behavioral, cognitive and physiological correlates of stress; stress-coping strategies); (2) identification of individual treatment goals related to the training content (e.g., “being more relaxed with peers”, “experiencing less anxiety before and during an exam”); (3) introduction and practice of effective relaxation techniques such as Progressive Muscle Relaxation by Jacobson (PMR; Jacobson 1929), Autogenic Training (AT; Schultz 1987), targeted breathing (e.g., resonant breathing, diaphragmatic breathing; Fried 1993), and positive imagery techniques such as fantasy journeys and stories (e.g., Captain Nemo Story; Petermann and Petermann 2018); (4) specific training runs (similar to Aggensteiner et al. 2024); (5) selection and transfer of a preferred relaxation method into daily life; (6) discussion of these transfer exercises at the beginning of each session; and (7) the conclusion of the treatment. Per session, the time structure was as follows: psychoeducation and pre-discussions (10 min), introduction/conjoint practice of relaxation technique (10–15 min), training runs in VR/with 2D-BF (20–25 min), recap, feedback, and home exercises (10–15 min).

In the training runs in VR and 2D, a baseline was assessed for 10–20 s, which subsequently served as an individual threshold for patients’ heart rate (HR). Then, we used three forms of feedback: (1) The first run directly fed back the HR to patients. This also contained information about the threshold for down-regulating HR. Additionally, patients were asked to up-regulate HR (e.g., by hyperventilating). (2) The second run included a dynamic animation of an object (see below). Again, HR had to be down-regulated to a given threshold. (3) The third run required patients to reach the HR threshold without visual feedback. This provided a basis for transferring learnings into daily life.
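The threshold logic of the three training runs can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used in the trial (for which the text refers to Lenz et al. 2020): it assumes that the threshold is the mean HR over the baseline window and that the target counts as met when HR is below that threshold. The names `Run`, `baseline_threshold` and `feedback` are hypothetical.

```python
from enum import Enum


class Run(Enum):
    DIRECT = 1    # run 1: HR value is shown together with the threshold
    ANIMATED = 2  # run 2: HR drives an animated object
    BLIND = 3     # run 3: no visual feedback


def baseline_threshold(hr_samples_bpm):
    """Mean HR (bpm) over the 10-20 s baseline window (assumed rule)."""
    if not hr_samples_bpm:
        raise ValueError("baseline window contains no HR samples")
    return sum(hr_samples_bpm) / len(hr_samples_bpm)


def feedback(run, hr_bpm, threshold_bpm):
    """Return what the patient would see, plus whether the target is met."""
    target_met = hr_bpm < threshold_bpm
    if run is Run.DIRECT:
        shown = f"HR {hr_bpm:.0f} bpm, threshold {threshold_bpm:.0f} bpm"
    elif run is Run.ANIMATED:
        shown = "animation advances" if target_met else "animation pauses"
    else:
        shown = None  # the patient has to rely on interoception alone
    return shown, target_met
```

In the trial, psychologists could also set the threshold individually, so a computed baseline would at most serve as a starting value.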

2.3.1 VR-biofeedback treatment group (VR-BF)

We developed the VR-BF game ‘Conquer Catharsis’ using Unreal Engine 4 for programming and a custom-made APP for mapping the participants’ real-time HR onto the virtual environment (for technical details, such as the algorithm used for HR feedback, see Lenz et al. 2020). HR was measured via a chest strap POLAR V800 sensor (Polar, Finland, sampling rate: 1000 Hz) and connected to a tablet (HUAWEI MediaPad T5) via Bluetooth; the APP relayed the HR changes via Wi-Fi to the VR. Psychologists could individually set the HR threshold via the APP. Also, they were able to track what participants saw and did in VR on a separate screen.

The VR-BF ‘Conquer Catharsis’ consists of a nature environment, i.e., a fictitious island which is displayed via a 360° HMD (HTC Vive, Taiwan) and may be freely explored using full body tracking and handheld controllers for teleportation and interaction with virtual objects. Parts of the island only become accessible upon successfully completing mini-games (8 in total). Each of these mini-games is solved by relaxing, i.e., by down-regulating the HR below the threshold for 10 s. For example, one mini-game involves the task of raising a platform from under the water to continue to the other shore. With each step of relaxation, parts of the platform progressively begin to appear above the surface (Fig. 1). However, progress is lost if HR levels increase. Per session, one mini-game was played. In the last two sessions, participants chose their favorite mini-game(s) to consolidate acquired relaxation techniques. Exercises were completed both in a sitting (in a chair) and a standing position.

Fig. 1 Heart-rate based changes in scenery of the VR-biofeedback game ‘Conquer Catharsis’
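The completion rule of a mini-game (HR below the threshold for 10 s, with progress lost when HR increases) might be modelled as in the sketch below. It is an illustration only, not the game's implementation: the class `MiniGame` is hypothetical, and it assumes that any reading at or above the threshold resets progress completely, whereas the text only states that progress is lost.

```python
class MiniGame:
    """Solved once HR stays below the threshold for `hold_s` seconds."""

    def __init__(self, threshold_bpm, hold_s=10.0):
        self.threshold_bpm = threshold_bpm
        self.hold_s = hold_s
        self.held_s = 0.0      # seconds spent below threshold so far
        self.solved = False

    def update(self, hr_bpm, dt_s):
        """Feed one HR reading covering `dt_s` seconds; return progress in [0, 1]."""
        if self.solved:
            return 1.0
        if hr_bpm < self.threshold_bpm:
            self.held_s += dt_s
        else:
            self.held_s = 0.0  # assumed: a rise above threshold resets progress
        if self.held_s >= self.hold_s:
            self.solved = True
        return min(self.held_s / self.hold_s, 1.0)
```

The returned progress value could drive the visible scenery, for example the share of the platform that has emerged from the water.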

2.3.2 Standard biofeedback treatment group (2D-BF)

For 2D-BF, the Schuhfried Biofeedback Xpert system (Schuhfried, Mödling, Austria) was used alongside a conventional 2D computer screen to display the content. Participants wore the multi-sensor (sampling rate for HR: 1000 Hz), which simultaneously assesses HR, temperature, and electrodermal activity (EDA), on the index finger of their non-dominant hand. For the purposes of this study, predominantly exercises with HR were chosen to establish comparability with the VR-BF group. The exercises used in the 2D-BF group resembled the VR mini-games and included, for instance, down-regulating HR to make apples grow on a tree, to morph a caterpillar into a butterfly, or to let the sun rise over a natural beach scene. As in the VR-BF group, 2D-BF participants were alternately standing and sitting.

2.4 Outcomes

All primary and secondary outcomes were assessed by trained clinical psychologists at baseline (T0, before the start of the intervention), post-intervention (T1, after the last session) and at a 3-month follow up (T2, 3 months after the last session). Participants and their caregivers were assessed separately on-site.

2.4.1 Primary outcome

2.4.1.1 Chronic stress

The German 10-item version of the Perceived Stress Scale (PSS, Klein et al. 2016), adapted for children, was used to assess the primary outcome chronic stress, including two subscales, helplessness and self-efficacy, at T0, T1 and T2. The PSS (Cohen et al. 1983) is one of the most widely used, psychometrically sound self-report instruments measuring subjectively perceived stress due to uncontrollable and overwhelming life experiences in the past month. The 10 questions (e.g., “How often have you been surprised by unexpected events in the last month?”) are answered on a 5-point Likert scale (0 = never to 4 = very often). Alongside a global factor, the PSS has a two-factor structure (Davis and Turner-Cobb 2023), with two subscales: Perceived Helplessness and Perceived Self-Efficacy.
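For illustration, scoring of a 10-item PSS with two subscales could look like the sketch below. The item-to-subscale mapping is an assumption based on the commonly described PSS-10 structure (six negatively worded items for helplessness, four positively worded items for self-efficacy, the latter reverse-scored for the total). It is not taken from this trial, and the child-adapted German version may differ. The function name `score_pss10` is hypothetical.

```python
HELPLESSNESS_ITEMS = (1, 2, 3, 6, 9, 10)  # negatively worded items (assumed mapping)
SELF_EFFICACY_ITEMS = (4, 5, 7, 8)        # positively worded items (assumed mapping)


def score_pss10(responses):
    """Score a PSS-10 answer sheet given as {item number 1..10: rating 0..4}."""
    if set(responses) != set(range(1, 11)):
        raise ValueError("exactly items 1..10 are required")
    if any(rating not in range(5) for rating in responses.values()):
        raise ValueError("ratings must be integers from 0 to 4")
    helplessness = sum(responses[i] for i in HELPLESSNESS_ITEMS)
    self_efficacy = sum(responses[i] for i in SELF_EFFICACY_ITEMS)
    # For the global stress score, positively worded items are reversed (4 - rating).
    total = helplessness + sum(4 - responses[i] for i in SELF_EFFICACY_ITEMS)
    return {"helplessness": helplessness,
            "self_efficacy": self_efficacy,
            "total": total}
```

Under this scheme a higher self-efficacy score reflects more perceived self-efficacy, whereas a higher total reflects more perceived stress.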

2.4.2 Secondary outcomes

2.4.2.1 Symptom severity

To assess clinically relevant symptom severity, patients completed the Strengths and Difficulties Questionnaire (SDQ, Goodman et al. 2000; https://www.sdqinfo.org) at T0, T1 and T2. The SDQ is among the most used screening instruments worldwide for assessing children’s and adolescents’ mental health status. It consists of 25 items on a 3-point Likert scale (0 = not applicable, 2 = definitely applicable) and the following subscales: (1) Emotional Symptoms (item example: “Many worries or often seems worried”), (2) Conduct Problems (e.g., “Often fights with other youth or bullies them”), (3) Hyperactivity/Inattention (e.g., “Restless, overactive, cannot stay still for long”), (4) Peer Relationship Problems (e.g., “Would rather be alone than with other youth”) and (5) Prosocial Behavior (e.g., “Helpful if someone is hurt, upset or feeling ill”).

2.4.2.2 Quality of life

The German KINDLR questionnaire (Ravens-Sieberer and Bullinger 2000; https://www.kindl.org/) was used at T0, T1 and T2 for assessing proxy (parent) reports of health-related quality of life (HrQoL) in children and adolescents. Age-adjusted versions reliably and validly (Ravens-Sieberer and Bullinger 2003) assess HrQoL with 24 items on a 5-point Likert scale (1 = never, 5 = always). HrQoL is measured with regard to (1) Somatic Well-Being, (2) Psychological Well-Being, (3) Self-Worth, (4) Family, (5) Friends, and (6) School.

2.4.2.3 Ability to relax

Participants’ global ability to relax was measured with a single item in patients (“How well can you relax in the following situations?”) on a visual analogue scale (VAS, 100 mm, endpoints: 0 = not at all–100 = very well) at T0, T1 and T2, and for three areas of interest: (1) in Everyday Life, (2) at School, and (3) with Peers.

2.4.2.4 Evaluation of training

To evaluate patients’ perceptions and experiences of the treatment, patients answered four questions on a VAS (100 mm, 0 = not at all–100 = very much): (1) Did the training help you? (Helped), (2) Were you bored during the training? (Boring), (3) Did you enjoy the training? (Fun), (4) How did you like the training? (Engaged).

2.4.2.5 Side effects

Based on recommendations (Ioannidis et al. 2004), investigators also recorded all side effects and adverse reactions related to the technology used (VR-BF: HMD, POLAR chest belt; 2D-BF: finger sensor, desktop monitor), such as vertigo, nausea, or fatigue.

2.5 Sample size

Sample size was calculated a priori. Based on meta-analyses in pediatric populations (Darling et al. 2020; Umaç and Semerci 2023), we expected medium effect sizes for within-effects for the VR-BF group (Cohen’s f of 0.25), which requires N = 36 for the overall sample. For our rmANOVAs, we expected small to moderate effect sizes (Cohen’s f of 0.20–0.25) for the condition × time interaction effects, which requires N = 28–42 for the total sample (i.e., 14 and 16 per group, respectively) to reach statistical significance. Power analysis using G*Power (Faul et al. 2009) with α = 0.05 and 1−β = 0.80 indicated that our study was sufficiently powered.
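Because the sample size calculation is stated in Cohen’s f while the results are reported as partial η², the standard conversion between the two (f² = η² / (1 − η²)) may help relate the planned to the observed effect sizes. The sketch below only performs this conversion. It does not reproduce the G*Power calculation, which additionally depends on settings such as the assumed correlation among repeated measures. The function names are hypothetical.

```python
import math


def f_to_partial_eta_sq(f):
    """Convert Cohen's f to (partial) eta squared: eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)


def partial_eta_sq_to_f(eta_sq):
    """Convert (partial) eta squared to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1 - eta_sq))
```

For example, the planned f of 0.25 corresponds to a partial η² of about 0.059, and an f of 0.20 to about 0.038.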

2.6 Statistical methods

All statistical analyses were conducted using IBM SPSS 26. To examine the effects of group (2 levels: VR-BF vs. 2D-BF) and time (3 levels: pre-treatment, post-treatment, 3-month follow-up) on the primary outcome chronic stress, including the two subscales helplessness and self-efficacy, a repeated measures ANOVA was conducted. Time was treated as a within-subjects factor, and group as a between-subjects factor. Interaction effects between time and group were also assessed to determine whether changes over time differed between groups. Bonferroni-corrected simple effects analyses were applied where interactions were significant. The same procedure was used to analyze secondary outcomes: symptom severity scores and relaxation capacity.
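To make the structure of this analysis concrete, the sketch below computes the F ratios of a two-way mixed ANOVA (group as between-subjects factor, time as within-subjects factor) from first principles. It is an illustration in Python, not the SPSS procedure used in the trial: it assumes equal group sizes and complete data, and it omits p-values, sphericity corrections and the Bonferroni-corrected simple effects. The function name `mixed_anova` and the toy data are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def mixed_anova(data):
    """Two-way mixed ANOVA: group (between) x time (within).

    `data` maps each group label to a list of participants; each participant
    is a list of scores, one per time point. Groups must be equal in size.
    """
    groups = list(data.values())
    a, n, b = len(groups), len(groups[0]), len(groups[0][0])
    if any(len(g) != n or any(len(s) != b for s in g) for g in groups):
        raise ValueError("balanced, complete data required")

    grand = mean(y for g in groups for s in g for y in s)
    group_m = [mean(y for s in g for y in s) for g in groups]
    time_m = [mean(s[t] for g in groups for s in g) for t in range(b)]
    cell_m = [[mean(s[t] for s in g) for t in range(b)] for g in groups]

    # Sums of squares for the between-subjects part ...
    ss_group = n * b * sum((m - grand) ** 2 for m in group_m)
    ss_subj = b * sum((mean(s) - group_m[i]) ** 2
                      for i, g in enumerate(groups) for s in g)
    # ... and for the within-subjects part.
    ss_time = a * n * sum((m - grand) ** 2 for m in time_m)
    ss_inter = n * sum((cell_m[i][t] - group_m[i] - time_m[t] + grand) ** 2
                       for i in range(a) for t in range(b))
    ss_error = sum((s[t] - cell_m[i][t] - mean(s) + group_m[i]) ** 2
                   for i, g in enumerate(groups) for s in g for t in range(b))

    df_subj, df_error = a * (n - 1), a * (n - 1) * (b - 1)

    def effect(ss, df, ss_err, df_err):
        return {"F": (ss / df) / (ss_err / df_err),
                "df": (df, df_err),
                "partial_eta_sq": ss / (ss + ss_err)}

    return {
        "group": effect(ss_group, a - 1, ss_subj, df_subj),
        "time": effect(ss_time, b - 1, ss_error, df_error),
        "group_x_time": effect(ss_inter, (a - 1) * (b - 1), ss_error, df_error),
    }


# Toy data (invented for illustration): two participants per group, three time points.
toy = {"VR-BF": [[10, 8, 6], [12, 10, 8]], "2D-BF": [[11, 9, 9], [13, 9, 7]]}
result = mixed_anova(toy)
```

The group effect is tested against the variability between participants within groups, whereas time and the group × time interaction are tested against the within-subject error term. Given an F value and its degrees of freedom, a p-value could be obtained from the F distribution, for instance with `scipy.stats.f.sf`.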

3 Results

3.1 Sample

In total, 51 patients were included in this trial between fall 2019 and winter 2022. Due to the measures taken in response to the COVID-19 crisis, the study had to be suspended briefly between April 2020 and November 2020, with no detriment to follow-up assessments of already concluded treatments (for patient flow see Fig. 2).

Fig. 2 CONSORT patient flow diagram

In sum, 39 participants (VR-BF = 19; 2D-BF = 20) with a mean age of M = 13.34 (SD = 1.99, range: 9–18) in the overall sample (p = 0.301 for age-comparisons between VR-BF vs. 2D-BF) completed the baseline assessment (T0) and the treatment (10 sessions), as well as post-treatment (T1) evaluations; one 2D-BF patient failed to complete the 3-month follow-up (T2), resulting in VR-BF = 19 and 2D-BF = 19 complete data sets. The trial ended in April 2023, after completion of the last follow-up assessments. Regarding the clinical sample characteristics, there was a group difference in type of hospitalization (χ² = 10.812, p = 0.004). Day-care (50%) and outpatient (40%) patients were more common in the 2D-BF group, while the VR-BF group had a higher proportion of inpatients (58%). However, there was no significant difference between the three types of hospitalization at baseline regarding perceived chronic stress (F = 1.507, p = 0.234). Descriptive, non-significant differences were also observed in the distribution of diagnoses. In the VR-BF group, 57% had a primary diagnosis of anxiety or depressive disorder without comorbidity, compared to 30% in the 2D-BF group. Comorbid anxiety and depression were present in 21% of VR-BF patients and 40% of 2D-BF patients, respectively. Additionally, a primary somatic illness (e.g., inflammatory bowel disease, urological disease, dilated cardiomyopathy) with comorbid anxiety or depressive disorder was diagnosed in 21% (VR-BF) and 30% (2D-BF) of participants. Depressive disorders across both groups were mild to moderate, with no cases of severe depression. For demographic and clinical characteristics of the final sample (N = 39) see Table 1.

Table 1 Demographic and clinical characteristics of the sample

3.2 Primary outcome: chronic stress

We found a significant decrease of chronic stress (F = 15.669, p = 0.001, par. η² = 0.303) for both groups, but no condition × time interaction effect (F = 0.357, p = 0.701, par. η² = 0.010), and no effect between VR-BF and 2D-BF (F = 0.133, p = 0.718, par. η² = 0.004). A comparable result can be reported for both PSS subscales. There was a significant decrease regarding Perceived Helplessness in both groups (F = 8.580, p = 0.001, par. η² = 0.192), but neither a condition × time interaction (F = 0.160, p = 0.852, par. η² = 0.004) nor a between-group effect (F = 0.002, p = 0.965, par. η² = 0.000) was found. Perceived Self-Efficacy increased during both treatments (F = 12.257, p = 0.001, par. η² = 0.036), but there was again no condition × time interaction (F = 1.362, p = 0.263, par. η² = 0.010) and no significant between-group effect (F = 0.337, p = 0.377, par. η² = 0.022). Results are depicted in Fig. 3.

Fig. 3 Chronic stress self-ratings (means ± SEM) at pre-treatment, post-treatment and 3-month follow-up
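Partial η² can be recovered from an F value and its degrees of freedom via η²p = (F · df1) / (F · df1 + df2), which offers a simple plausibility check of reported effect sizes. The degrees of freedom are not reported in the text. The sketch below assumes df1 = 2 and df2 = 72 for the time effect, as would follow from three time points, two groups and 38 complete data sets. The function name is hypothetical.

```python
def partial_eta_sq_from_f(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Time effects reported for the primary outcome (df values are assumptions).
chronic_stress = partial_eta_sq_from_f(15.669, 2, 72)
helplessness = partial_eta_sq_from_f(8.580, 2, 72)
```

Under these assumed degrees of freedom, the time effects for chronic stress and Perceived Helplessness come out at about 0.303 and 0.192, matching the reported values.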

3.3 Secondary outcomes: symptom severity

Our results indicate a significant decrease of Emotional Symptoms (F = 5.042, p = 0.009, par. η² = 0.123) in both groups, but no condition × time interaction (F = 0.062, p = 0.940, par. η² = 0.002) or between-subject effect was found (F = 0.238, p = 0.628, par. η² = 0.007). There was neither a pre-post effect (F = 0.755, p = 0.474, par. η² = 0.021) nor an interaction effect (F = 0.636, p = 0.391, par. η² = 0.017) regarding Conduct Problems, but a between-subject effect in this subscale (F = 10.280, p = 0.003, par. η² = 0.222), indicating more conduct problems in the 2D-BF group over all three time points. Further, no pre-post effect (F = 1.322, p = 0.273, par. η² = 0.035) and no interaction effect (F = 0.351, p = 0.697, par. η² = 0.010) were found regarding Hyperactivity/Inattention. Again, there was a group effect indicating higher levels of hyperactivity in the 2D-BF group (F = 6.031, p = 0.019, par. η² = 0.143) over all three time points. Moreover, there was a significant decrease of Peer Relationship Problems (F = 6.293, p = 0.003, par. η² = 0.149); however, there was no condition × time (F = 0.790, p = 0.458, par. η² = 0.021) or between-subject effect (F = 1.393, p = 0.246, par. η² = 0.037) regarding peer problems. There was no effect of time on Prosocial Behavior (F = 0.180, p = 0.863, par. η² = 0.005) and no interaction effect (F = 0.778, p = 0.459, par. η² = 0.021), but higher reported prosocial behavior over all three time points in the VR-BF group (F = 23.523, p = 0.001, par. η² = 0.395).

3.4 Secondary outcomes: health-related quality of life

Table 2 illustrates the rmANOVA results of the HrQoL scale, indicating an increase of HrQoL in the Psychological well-being, Self-worth, and Peer- and School-related QoL subdomains for both treatment groups. However, there were no other between-group effects in the pre-post or the 3-month follow-up regarding HrQoL, and no significant interaction effects.

Table 2 Results regarding HrQoL (pre-, post- and 3-month follow-up)

3.5 Secondary outcomes: training evaluation and ability to relax

Patients’ evaluations of the training showed no significant differences between VR-BF and 2D-BF in the VAS (see Fig. 4A). Additionally, there was no significant effect regarding the ability to relax in Everyday Life (time: F = 3.950, p = 0.054, par. η² = 0.096, condition × time: F = 1.344, p = 0.254, par. η² = 0.035, between-subject: F = 0.164, p = 0.688, par. η² = 0.004). There was a significant increase (F = 6.864, p = 0.013, par. η² = 0.156) regarding the ability to relax at School, but there was no condition × time interaction effect (F = 1.743, p = 0.195, par. η² = 0.045) and no significant effect between the groups (F = 0.981, p = 0.328, par. η² = 0.026). Also, there was a significant pre-post effect regarding the ability to relax with Peers (F = 8.749, p = 0.005, par. η² = 0.191). There was no condition × time effect (F = 0.750, p = 0.392, par. η² = 0.020) and no between-group effect (F = 0.057, p = 0.392, par. η² = 0.002); see Fig. 4B–D.

Fig. 4 Patient ratings after treatment and ability to relax (means ± SEM) at pre-treatment and post-treatment

3.6 Secondary outcomes: side-effects

Investigators detected no technology-related or treatment-related side effects, such as vertigo, nausea, or fatigue, in either group. Consequently, no analyses were conducted. Also, no patients dropped out of the study due to adverse events.

4 Discussion

This multicenter randomized controlled clinical trial was designed to compare a self-developed VR-BF for children and adolescents with stress-related disorders such as anxiety and depression to a standard 2D-BF (active control). The aim was to investigate whether the modality (fully immersive VR vs. 2D) and gamification elements are superior in their effects on chronic stress, clinical symptoms, the ability to relax, health-related quality of life (HrQoL), and the occurrence of side effects.

Both treatment groups showed stable improvements post-treatment and at a 3-month follow-up in our primary outcome, chronic stress, with the subscales helplessness and self-efficacy (small to large effect sizes: par. η² = 0.036–0.303), and in the secondary outcomes emotional symptoms (par. η² = 0.123) and peer relationship problems (par. η² = 0.149), the ability to relax at school (par. η² = 0.156) and with peers (par. η² = 0.191), and the HrQoL subdomains psychological well-being, self-worth, peer- and school-related QoL (large effect sizes: par. η² = 0.139–0.890). As no between-group differences emerged in any outcome, our fully immersive, gamified VR-BF treatment was neither superior nor inferior to standard 2D-BF. This aligns with recent findings, including a scoping review of 18 BF studies in adults (Lüddecke and Felnhofer 2022), which concluded that VR-BF and 2D-BF were equally effective in reducing stress and anxiety across various disorders. Similarly, a meta-analysis of seven studies in adults with anxiety disorders (Kothgassner et al. 2022) found no advantage of VR-BF over 2D-BF. The assumption that VR, combined with gamification elements, as applied in our study, would particularly benefit children by enhancing motivation and attentional focus was not supported. Participants in both the VR-BF and 2D-BF groups rated their treatments as equally enjoyable, engaging, and helpful. Consequently, aspects of our VR-BF approach may require critical reevaluation to better harness the potential of VR and gamification elements.

Several elements of our VR-BF game ‘Conquer Catharsis’, such as the narrative (participants are told that they are stranded on the island and have to find their way to the top to be rescued), the progression of play (solving one mini-task clears the path to the next one) and incentives (participants are rewarded by receiving access to more parts of the island), could be emphasized even more by including leaderboards or a map of the island to track progress (see Dalmina et al. 2019). Adopting a co-design approach that actively involves target patients could ensure VR content is better tailored to users’ needs and preferences, making it more engaging and effective (Bevan Jones et al. 2020). Also, a broader range of physiological signals could be integrated in VR-BF. We decided to use HR as our primary parameter, as the POLAR chest belt provided an easy-to-use solution. Yet, recent VR publications (Bossenbroek et al. 2020; Recker et al. 2023) suggest that especially respiratory BF may be a promising modality for children and adolescents.

Furthermore, the lack of an observed advantage for VR-BF may be attributable to mediating and moderating factors such as participants’ age, disorder type, and treatment settings. However, small subgroup sizes and limited statistical power precluded confirmation of these effects. Generally, research suggests that a child’s developmental stage may impact the effectiveness of BF interventions (Culbert et al. 1996). For instance, younger children, with their shorter attention spans, benefit from shorter BF sessions and find it easier to use modalities that are more directly influenced, such as electromyography (EMG), compared to less manipulable ones like HR, especially when combined with game-based mechanisms. By contrast, children aged 9 or 10 years can typically engage with sessions lasting 40–50 min and more abstract materials (Culbert et al. 1996). To enhance methodological rigor, future studies should consider narrowing the age range to include either children or adolescents exclusively. This would allow VR gamification elements to be better tailored to the developmental level and preferences of each group.

Another key consideration is the variation in diagnoses and treatment settings. Despite our efforts to render the groups comparable on key variables, random allocation of participants across three study centers led to slight imbalances between the VR-BF and 2D-BF groups. Specifically, inpatients were overrepresented in the VR-BF group, indicating that this group experienced a more intensive treatment context, including hospitalization, separation from family, and a more rigorous therapeutic regime compared to the 2D-BF group. Additionally, while the groups were similar in terms of internalizing and externalizing problems and overall symptom severity, differences emerged in prosocial behaviors and conduct problems. These differences are likely not due to age, sex, or treatment setting and may instead reflect subtle variations in parental perceptions of symptom burden.

Future VR-BF studies in children should incorporate a broader range of measures to provide a more nuanced understanding of the specific impact of BF on key outcomes. For example, the self-report measures used here may have been insufficiently sensitive to detect group differences in stress reactivity and chronic stress levels. Incorporating objective measures, such as physiological signals, could allow more in-depth analyses (see Weibel et al. 2023). Additionally, the transfer of skills warrants a more systematic evaluation. While participants were asked about practicing skills at home, we did not formally assess this aspect. Incorporating patient self-reports besides proxy reports regarding training frequency, and an operationalization of skill transfer, as well as implementing in-session assessments of physiological reactivity would all constitute valuable additions to future research in this field.

Despite its lack of superiority over 2D-BF, using VR-BF in children and adolescents with stress-related disorders shows promise. VR-BF had a marked positive effect on key outcomes like chronic stress, emotional symptoms, peer relationships, well-being, self-worth and peer- and school-related QoL. This was observed both in the short term and in the long term at 3 months post-treatment. In sum, this study represents the first randomized controlled trial (RCT), aside from one involving ADHD patients (Skalski et al. 2021), to examine fully immersive VR-BF training in children and adolescents. It contributes to a deeper understanding of the use of HMD-based, gamified VR in biofeedback (BF) treatment. Notably, the inclusion of an active control group allows for more robust conclusions about modality-specific effects. Also, we made the effort of documenting side effects like nausea or vertigo, as past studies tended to lack an assessment of adverse events (see Kothgassner et al. 2022). No side effects were detected in the standard 2D-BF or, more importantly, in the VR-BF. This finding is encouraging, particularly with regard to the use of HMDs in children. Since commercially available HMDs are designed for adults, feasibility in children has been a subject of some debate (e.g., Lauer et al. 2021; Newbutt et al. 2020). In our trial, none of the participants reported any discomfort wearing the HMD.

4.1 Limitations

Potential limitations include a relatively broad age range (9–18 years), different diagnoses (i.e., anxiety disorders and/or depression, and primary diagnoses of somatic illnesses) and varying treatment settings. These factors could not be addressed statistically due to a lack of power. However, age did not differ between the two treatment groups, and symptom severity was comparable (see our pre-trial screening CBCL, Table 1). Yet, in the VR-BF group, inpatients were overrepresented. Future studies should thus strive to increase homogeneity of groups and decrease potential moderating or mediating influences. Furthermore, this study compared two different BF-systems with differing algorithms: a commercially available system (Schuhfried BF Xpert) and a custom-made VR program (‘Conquer Catharsis’, Lenz et al. 2020). While such comparisons are not uncommon in BF-research (Lüddecke and Felnhofer 2022), they introduce some variability in feedback routines and graphical outputs, potentially serving as a source of bias. To minimize this variability, both our systems used the same physiological parameter (HR) with the same sampling rate and incorporated comparable tasks in both groups.

4.2 Implications for clinical practice

As the implementation of VR involves considerable effort, its failure to demonstrate superiority over standard 2D-BF raises the question of its clinical benefit. Due to limited data, it is difficult to discern why VR-BF does not live up to expectations. Possibly, the benefits of this interactive, engaging and dynamic medium may only be fully realized in a different context. For instance, promising developments encompass smartphone APPs, which allow not only accessible and continuous tracking of physiological signals but also self-directed skill acquisition (e.g., Khaleghi et al. 2024). Similarly, the implementation of Augmented Reality (AR) transfers BF tasks into those settings that are significant to the individual (e.g., their home environment), thereby facilitating the transfer of BF learnings (e.g., Connelly et al. 2023). Finally, incorporating other modes of feedback, like tactile feedback, may offer an additional advantage over conventional solutions that primarily rely on visual channels. In any case, more robust evidence is needed, and future studies will lead the way.