1 Introduction and background

Scholars have discussed the concept of virtual reality (VR) since Weinbaum (1935). In his short story Pygmalion’s Spectacles, Weinbaum describes the use of technology in glasses to enter interactive worlds. Broderick (1982) used the term virtual reality in his novel The Judas Mandala. Since the beginning of the 21st century, VR has been adopted in many industries and disciplines. In particular, the COVID-19 pandemic and the subsequent lockdowns led to immediate digitisation activities in schools and higher education, which pushed forward learning technologies, including VR, in unprecedented ways (Stracke et al. 2022a, b). The use of VR in education remains in its infancy but is steadily increasing owing to the improved quality and availability of affordable devices. Whereas VR was long an object of fascination in the form of computer-generated worlds viewed on two-dimensional screens, the technology has now become transparent to users and enables new ways of experiencing frameless virtual environments (Przybylka 2022). Thus, VR has developed a potential for structural integration in higher education. In this systematic literature review (SLR), we summarise and analyse, for the first time in a systematic manner, the scientific literature indexed in Web of Science on learning with immersive virtual reality (IVR) in higher education.

Learning in VR environments has already been partly explored in other SLRs. For instance, one SLR on language learning found that VR positively influenced the motivation and vocabulary acquisition of language learners (Alizadeh and Cowie 2022). Another SLR focused on the educational benefits of IVR applications, but only in STEM disciplines (K-12 and higher education) (Pellas et al. 2020). Pellas et al. (2021) focused on immersive VR in higher education and K-12, but did not follow a strict definition of IVR and thus combined and analysed all types of VR, including desktop-based solutions. Two other SLRs investigated immersive learning experiences only for health profession students (Jiang et al. 2022; Qiao et al. 2021). To our knowledge, therefore, only three SLRs have explicitly addressed IVR in general (Radianti et al. 2020; Hamilton et al. 2020; Di Natale et al. 2020); they are detailed as follows.

First, Radianti et al. (2020) explored the application of IVR to higher education. In their semi-automatic article selection, they did not distinguish among types of VR and used many inclusion keywords. The results revealed high levels of interest in VR for educational purposes, covering 18 application domains. However, many of the selected VR applications were in the experimental stage and only partially integrated into regular curricula. Moreover, few applications were based on specific learning theories, and evaluations frequently focused on usability instead of learning outcomes. The review highlighted that although VR is regularly used for specific skills training in fields such as engineering and computer science, its application remains sporadic and experimental in most domains. The article pointed out significant gaps in systematic, design-oriented studies grounded in learning theories and in detailed descriptions of how VR is integrated into curricula. The authors recommended that future research address these gaps by emphasising learning outcomes and exploring the practical adoption of VR in regular teaching practice. They concluded that while VR holds promise for higher education, more systematic application and integration based on best practices and learning theories are required to realise its full potential.

Second, Hamilton et al. (2020) reviewed 29 experimental studies on the impact of IVR on education. Most studies reported significant benefits of IVR, such as improved engagement and knowledge retention. Other studies, however, found no significant differences between IVR and non-immersive VR methods, and two of them reported negative effects. Many studies employed brief interventions and did not assess long-term retention, which limits the understanding of the sustained impact of IVR. The review also observed a focus on science subjects and highlighted methodological weaknesses in the evaluation of learning outcomes. The authors called for rigorous methods and well-designed interventions to fully realise the potential of IVR in education.

Lastly, Di Natale et al. (2020) is the only one of these SLRs that was also among the articles selected for the current SLR. The authors examined the use and effectiveness of IVR in educational settings over the preceding decade, a period in which VR technology evolved significantly. They proposed that IVR has significant potential to enhance learning and engagement because its immersive and interactive nature provides realistic experiences. However, the reviewed studies produced mixed results regarding its impact on learning outcomes: some reported positive effects on achievement and engagement, whereas others found no significant benefits. The review stated that IVR can positively influence motivational outcomes, such as student interest and satisfaction, although the consistency of these measurements varies. A major challenge associated with IVR is cybersickness, with symptoms such as discomfort, headache, dizziness and nausea, which hinders its wider acceptance and feasibility in education. The SLR highlighted the need for rigorous methodological approaches in future research, in particular aligning learning methods with appropriate assessment measures. It recommended exploring affordable and accessible immersive tools, such as 360° videos, and called for further investigation into the cognitive and affective processes involved in learning with IVR. The article underscored the promise of IVR in education but emphasised the need for additional research to optimise its use and address existing challenges.

These SLRs found that VR technologies have the potential to revolutionise traditional learning methods by providing dynamic and immersive environments that bridge the gap between physical and virtual laboratories and classrooms. Unsurprisingly, this approach has gained attention in higher education due to its potential to improve learning outcomes, motivation and skill acquisition. According to UKAuthority, 96% of universities and 79% of colleges in the United Kingdom use augmented reality (AR) or VR (Say 2019). Whether these figures differentiate between levels of immersion is unknown; nevertheless, they demonstrate that the terms and technologies are broadly accepted in education.

Although interest in the utilisation of immersive learning in higher education is increasing, no commonly agreed and fixed definitions of the terms non-immersive, semi-immersive and immersive VR currently exist, to the best of our knowledge. Therefore, we intend to address this research gap.

In particular, the current literature lacks a systematic analysis of the relationship between IVR and (higher) education, as the review of the abovementioned SLRs demonstrates. Radianti et al. (2020) and Di Natale et al. (2020) did not distinguish between types of VR, while Hamilton et al. (2020) limited their selection to experimental studies comparing immersive and non-immersive VR. In addition, Di Natale et al. (2020) included K–12 education but filtered the 1,080 collected articles very strictly, which left only 18 studies, including studies on semi-immersive VR such as 360° videos. Therefore, we conclude that a systematic review of the scientific literature on the use of IVR in higher education in general is still lacking. The current SLR intends to fill this research gap for IVR in higher education, a topic whose importance is expected to increase in the near future. Moreover, the results can inform future research and serve as a framework for differentiating and classifying theoretical concepts and practical approaches.

2 Levels of immersion in VR and learning design for IVR

The use of the term VR in the scientific literature is diverse and sometimes misleading, as it may refer to different input and output technologies, such as a computer screen or a head-mounted display (HMD). The understanding of the term VR also differs between experts and non-experts. In an initial overview, we noted that publications on the application of VR in higher education use the term VR less precisely and relate it more to the kind of application than to its technical specifics. The broadest definition describes VR as ‘a computer-generated world’ (Pan and Hamilton 2018) including visual, auditory and haptic elements. However, this definition can be considered overly simplistic (Slater 2018) because it neglects the multiple roles of perception and interaction channels. Consequently, VR can be categorised by the level of immersion experienced by users. Immersion denotes the extent to which users are willing to perceive the virtual environment as an alternative reality (Wilkinson et al. 2021). With their Reality-Virtuality Continuum, Milgram and Kishino (1994) define VR in relation to Reality, Augmented Reality and Augmented Virtuality. While these definitions are accepted in the VR community, they are less relevant and not broadly used in application domains such as higher education.

For example, we found publications on applications in higher education that use 360° images or videos with very limited interaction and still denote them as VR. From an expert point of view, these applications rarely or never fulfil the technical requirements for VR. The present SLR shows the current state of VR in higher education from the perspective of educators. Because such applications are accepted as VR in higher education, we follow the broad clustering proposed by Bamodu and Ye (2013) and differentiate the level of immersion into non-immersive, semi-immersive and immersive VR. As these terms are frequently interchanged and confused, we provide brief definitions of our differentiation and a graphical representation (Fig. 1, based on Salatino et al. 2023).

Fig. 1 Three types of virtual reality

Non-immersive VR refers to VR experiences in which users view virtual content on a conventional display and remain aware of a visible boundary and frame between the virtual and the real world. Interactions occur indirectly via classical devices such as a mouse, keyboard or joystick. Strictly speaking, this class is not true VR; however, the term VR is often used for this setting and is thus often misleading.

Semi-immersive VR still frames the virtual environment but uses large-scale projection surfaces, making visible boundaries and frames less apparent to users. Additionally, interactions occur via freehand gestures or tangible interfaces.

Immersive VR eliminates visible boundaries and frames, allowing users to be fully enveloped in the virtual environment. Cave automatic virtual environments (CAVEs) employ a multi-projector configuration that surrounds the user; these systems are still used in professional settings. Currently, HMDs are used more frequently and are affordable for private use. In immersive VR, the visualisation continuously adapts to head movements, including head rotation and translation. Input is performed via freehand gestures or special 3D interfaces such as controllers or haptic gloves.

Immersion depends not only on the perception of and interaction in the virtual world or on the technology used, but also on the quality of use. Rather than interacting with 2D interface elements, users can experience in-depth immersion if they can interact naturally with their surroundings. As shown by Hepperle and Wölfel (2023), many differences exist between conventional displays and immersive VR. Therefore, the level of interaction plays a fundamental role in the ability of a technology to immerse participants in virtual worlds. For example, passively viewing 360° camera footage through an HMD is not considered fully immersive because users are only spectators (although this view is not shared by everybody and is still debated).
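The three levels and the interaction caveat can be summarised as a small classification rule. The following Python sketch is our own illustration, not a formal definition from the cited literature: the names `Display`, `Immersion`, `Setup`, `immersion_level` and `fully_immersive` are hypothetical, and treating an HMD or CAVE without head tracking as semi-immersive is an assumption, since the definitions above do not cover that case.

```python
from dataclasses import dataclass
from enum import Enum


class Display(Enum):
    CONVENTIONAL = "conventional screen"
    LARGE_PROJECTION = "large-scale projection surface"
    CAVE = "multi-projector CAVE"
    HMD = "head-mounted display"


class Immersion(Enum):
    NON = "non-immersive"
    SEMI = "semi-immersive"
    IMMERSIVE = "immersive"


@dataclass(frozen=True)
class Setup:
    display: Display
    head_tracked: bool  # view adapts to head rotation and translation
    interactive: bool   # users act on the environment instead of only watching


def immersion_level(setup: Setup) -> Immersion:
    """Map a hardware setup to one of the three levels of immersion."""
    if setup.display is Display.CONVENTIONAL:
        return Immersion.NON  # visible frame, indirect input (mouse, keyboard, joystick)
    if setup.display is Display.LARGE_PROJECTION:
        return Immersion.SEMI  # frame still present but less apparent
    # CAVE or HMD: frameless only if the view follows head movements.
    # Assumption (not covered by the text): without head tracking, fall back to semi-immersive.
    return Immersion.IMMERSIVE if setup.head_tracked else Immersion.SEMI


def fully_immersive(setup: Setup) -> bool:
    """Immersive hardware alone is not enough: passive viewing is excluded."""
    return immersion_level(setup) is Immersion.IMMERSIVE and setup.interactive


# Passive 360° video in a head-tracked HMD: immersive hardware, but not fully immersive.
video_360 = Setup(Display.HMD, head_tracked=True, interactive=False)
print(immersion_level(video_360).value, fully_immersive(video_360))  # immersive False
```

Keeping the hardware level separate from the interaction flag mirrors the point above that immersion depends both on the technology and on the quality of use.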

The sense of presence is the central aspect of VR; it describes the subjective perception of users of actually being in a virtual environment. This trick of the senses results from the illusions of place and plausibility (Slater et al. 2022). These subjective misperceptions derive from the objective, technological degree of immersion and its implementation (ibid.) and contribute to a convincing and realistic VR experience. For the present literature review, articles are selected based on their relevance to immersive VR. The concept of immersion has become increasingly differentiated over time and is a key selection criterion in the analysis process.

With regard to pedagogical approaches, integrating technology into the learning process should not be limited to mere application but should deliberately pursue educational objectives and pedagogical principles (Wölfel 2023, p. 331). The learning design of IVR applications should begin with an in-depth analysis of requirements covering all educational levels to ensure an effective and successful learning experience (Stracke 2019). Future challenges in (higher) education and in the design of appropriate learning approaches and scenarios include a combination of open education (ibid.), open educational resources (Tlili et al. 2022; Stracke et al. 2023a, b), artificial intelligence (Bozkurt et al. 2023), digital twins (Eisenträger et al. 2018), and avatars and robots (Huang et al. 2023).

This systematic literature review (SLR) focuses on immersive VR (IVR) and its use in higher education. We pose the following research questions:

  • RQ1: How can scientific literature on the use of IVR in higher education be categorised and clustered based on formal aspects and research design?

  • RQ2: What are the current results and outcomes of the scientific research on the use of IVR in higher education?

3 Methodology

The systematic review strictly followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement and its procedures (Moher et al. 2009; Page et al. 2021) as well as the predefined research protocol of the current study, which was adopted from Stracke et al. (2023a, b). The PRISMA statement and procedures require four review phases for the collection and selection of articles for a systematic review:

  1. Identification,

  2. Screening,

  3. Eligibility, and

  4. Included.

For the first phase, identification, we searched the Web of Science database (owned by the private company Clarivate Analytics), which is regarded as the most stringent indexing service for scientific journal articles. On 10 November 2022, we collected all records matching the predefined search term in the Advanced Search Query Builder without any further limitations:

$$ \mathbf{TS} = \left( \left( \text{``virtual reality''} \right)\ \mathbf{AND}\ \left( \text{``higher education''} \right) \right). $$

Thus, all records published up to 10 November 2022 were collected without time restriction. Because only one source was searched, no duplicates had to be removed.
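The Boolean logic of this Topic (TS) query can be approximated as follows: a record is retrieved if both exact phrases occur somewhere in its title, abstract or keyword fields. The Python sketch below is a simplified illustration only; the record field names are hypothetical, and the actual Web of Science search engine applies further processing (for example lemmatisation and field-specific indexing) that is not modelled here.

```python
import re

# Hypothetical field names standing in for the fields covered by a Topic search.
TOPIC_FIELDS = ("title", "abstract", "author_keywords", "keywords_plus")

# Exact phrases, case-insensitive, tolerant of line breaks or repeated spaces.
PHRASES = (
    re.compile(r"\bvirtual\s+reality\b", re.IGNORECASE),
    re.compile(r"\bhigher\s+education\b", re.IGNORECASE),
)


def matches_topic_query(record: dict[str, str]) -> bool:
    """Approximate TS=(("virtual reality") AND ("higher education"))."""
    # The " | " separator keeps a phrase from spanning two different fields.
    topic_text = " | ".join(record.get(field, "") for field in TOPIC_FIELDS)
    return all(phrase.search(topic_text) for phrase in PHRASES)


print(matches_topic_query({"title": "Immersive Virtual Reality in Higher Education"}))  # True
print(matches_topic_query({"title": "Virtual reality for K-12 classrooms"}))  # False
```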

In the second phase, screening, the collected records were reviewed based on their titles and abstracts and filtered using the formal inclusion and exclusion criteria (Table 1), both of which had to be satisfied.

Table 1 Inclusion and exclusion criteria for the formal screening of collected records

In the third phase, eligibility, the remaining articles were reviewed based on their full texts using content-related inclusion and exclusion criteria (Table 2), both of which had to be satisfied, following our definition of immersive VR in Sect. 2.

Table 2 Exclusion and inclusion criteria for the content-related screening of collected articles

Finally, in the fourth phase, included, the remaining articles were checked and selected for the systematic review and its in-depth analysis and subsequent discussion related to the research questions.
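The four phases can be read as a sequence of filters over the collected records. The following Python sketch is a hypothetical illustration of that flow, not the tooling used in this review: `prisma_select` and its two predicate arguments are our own names, the criteria in Tables 1 and 2 were applied by human reviewers rather than by code, and the toy records are invented.

```python
from typing import Callable, Iterable, TypeVar

R = TypeVar("R")


def prisma_select(
    records: Iterable[R],
    meets_formal_criteria: Callable[[R], bool],
    meets_content_criteria: Callable[[R], bool],
) -> tuple[list[R], dict[str, int]]:
    """Run the four review phases and report how many records survive each one."""
    identified = list(records)  # 1. Identification: single database, so no de-duplication
    screened = [r for r in identified if meets_formal_criteria(r)]  # 2. Screening (titles/abstracts, Table 1)
    eligible = [r for r in screened if meets_content_criteria(r)]  # 3. Eligibility (full texts, Table 2)
    included = eligible  # 4. Included: final check, modelled here as no further filtering
    counts = {
        "identified": len(identified),
        "screened": len(screened),
        "eligible": len(eligible),
        "included": len(included),
    }
    return included, counts


# Invented toy records: one passes, one fails the formal check, one fails the content check.
toy = [
    {"id": 1, "english": True, "immersive": True},
    {"id": 2, "english": False, "immersive": True},
    {"id": 3, "english": True, "immersive": False},
]
included, counts = prisma_select(toy, lambda r: r["english"], lambda r: r["immersive"])
print(counts)  # {'identified': 3, 'screened': 2, 'eligible': 1, 'included': 1}
```

Reporting a count per phase corresponds to the numbers shown in a PRISMA flow diagram such as Fig. 2.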

4 Results

We present the results of the SLR, beginning with the results of the standardised and predefined selection process. We then analyse specific aspects of the selected articles and outline the results, especially characteristics that can be described by objective values.

During the selection process, we identified 50 articles out of 291 initial records that met the inclusion criteria (Fig. 2).

Fig. 2 Results of the review phases in the selection of the articles

In total, we selected 50 articles for in-depth analysis (Sect. 4) and subsequent discussion (Sect. 5). All articles were compared and analysed using the analysis categories that we had developed in consensus beforehand (Table 3).

Table 3 Analysis categories for the selected articles

The annex “Immersive Virtual Reality in Higher Education: A Systematic Review of the Scientific Literature – The Analysis Categories and Results” provides the full overview and key results of the analysis and is published with a DOI (Stracke et al. 2025).

After this overview of the selection process, we summarise the formal aspects of the selected articles (Sect. 4.1) and analyse their research design (Sect. 4.2). This section does not detail the SLRs and discussion papers, because their small numbers (only six records each) would not provide substantial information. In addition, the SLRs are meta-articles that do not contribute their own applications or examples of IVR in higher education, while the discussion papers present specific qualitative research that cannot be compared. Therefore, the SLRs and discussion papers are discussed in full in Sect. 5.

4.1 Formal analysis of articles

Although the search had no time restriction, all 50 articles selected through the PRISMA procedure were published between 2017 and 2022 (Fig. 3).

Fig. 3 Number of articles by publication year

Of the 50 articles, six are SLRs and another six are discussion papers. The remaining 38 articles are study articles, which report 44 studies in total (Table 4). Thirty-four (89%) of the 38 study articles report only one study, two articles present two studies and another two articles present three studies.

Table 4 Publication types of the selected articles
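The counts reported above are internally consistent, as the following Python check shows; it only recomputes numbers stated in the text and adds no new data.

```python
# Publication types as reported in the text and Table 4.
articles = {"SLR": 6, "discussion paper": 6, "study article": 38}
# Number of study articles, keyed by how many studies each one reports.
articles_by_study_count = {1: 34, 2: 2, 3: 2}

total_articles = sum(articles.values())
total_study_articles = sum(articles_by_study_count.values())
total_studies = sum(n * k for n, k in articles_by_study_count.items())
single_study_share = round(100 * articles_by_study_count[1] / total_study_articles)

print(total_articles, total_study_articles, total_studies, single_study_share)  # 50 38 44 89
```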

We first clustered the selected articles using different criteria to obtain an overview of international activities related to IVR in higher education.

To identify activities in different regions, we determined the countries of the authors based on their affiliations (Table 5). If an author is affiliated with institutions in multiple countries, only the first country is considered.

Table 5 Countries of the authors

Considering only the countries of the first authors yields a different picture (Table 6).

Table 6 Countries of the first main authors
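The first-listed-country rule and the first-author view can be illustrated with a short Python sketch. The author records below are invented placeholders, not data from this review, and the function names are hypothetical; the sketch counts each author entry per article, whereas the text does not state whether authors appearing in several articles were de-duplicated.

```python
from collections import Counter

# Invented articles: each author is given as the list of countries of their affiliations.
articles = [
    [["Germany", "Austria"], ["Germany"], ["Italy"]],
    [["Australia"], ["United Kingdom", "Germany"]],
]


def countries_of_all_authors(articles: list[list[list[str]]]) -> Counter:
    """Count every author entry once, using only the first country of their affiliations."""
    return Counter(author[0] for article in articles for author in article)


def countries_of_first_authors(articles: list[list[list[str]]]) -> Counter:
    """Count only the first author of each article (again by first-listed country)."""
    return Counter(article[0][0] for article in articles)


print(countries_of_all_authors(articles))
print(countries_of_first_authors(articles))
```

The two views can diverge considerably, which is why Tables 5 and 6 differ.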

In addition to countries, we also identified which disciplines are currently driving IVR in higher education (Table 7).

Table 7 Disciplines of the authors

Medicine is the leading discipline (43 authors), followed by Education (30 authors) and Engineering (23 authors). Computer Science and IT are represented by 17 and 11 authors, respectively, and Psychology by 14 authors. Educational Technology and Artificial Intelligence (AI) in Education are represented by six and five authors, respectively. Eighteen authors belong to other disciplines (Communication: 4, Design: 3, Marketing: 3, Environment: 3, Life Sciences: 1, Natural Sciences: 1, Technical Sciences: 1, Mathematics: 1 and Languages: 1). We were unable to identify the discipline of 45 authors due to missing or unclear disciplinary affiliations.

Considering only the disciplines of the first authors once again yields a different picture (Table 8).

Table 8 Disciplines of the first main authors

4.2 Research design of the selected articles

The following analysis addresses the research design of the 44 studies described in the 38 study articles (SLRs and discussion papers were excluded because they do not present studies).

4.2.1 Disciplines and application contents

The studies present IVR scenarios from various research disciplines; that is, their research questions stem from these disciplinary backgrounds, and IVR is used to investigate them (Table 9).

Table 9 Disciplines of the 44 study articles

Education and Medicine are the most represented research disciplines, with 11 studies each, followed by Computer Science with seven studies. Biology and Engineering are represented by five studies each, and Geography and Physics by two studies each. Eight other disciplines are represented by one study each (i.e. Environment, Art, Chemistry, Marketing, Life Sciences, Digital Media, Architecture and diverse disciplines).

The results demonstrate that researchers of various disciplines use IVR to address and answer their scientific questions while focusing on higher education environments.

Although the research disciplines describe the scientific background of the scenarios, the application contents may differ. For example, the research background may be art history (Education), while the application content is a crime scene investigation in a historical art museum.

The application contents of the selected studies span a diverse array of topics (Table 10). Programming and Anatomy lead with four studies each, followed by 3D development and Virtual construction with three studies each. Environmental biology, Presentation skills, Neuroanatomy, Crime scene investigation and Immersive technologies are each represented by two studies. Most topics are explored in a single study, and the application topics of these 18 studies are extremely diverse. They include 3D painting in IVR, orthodontic bracket bonding, cognitive load in interactive learning environments, computational fluid dynamics, washing machine construction and even the management of chemotherapy drugs. Other specialised topics include research on crystallographic networks, cybersickness, dentistry, lab safety and learning Italian. Topics such as first aid, gene sequencing, geomorphology, computational thinking and hypsography are also included. In addition, organic chemistry, design education, physiotherapy and the investigation of schizophrenia are each addressed in a single study.

Table 10 Application contents of the 44 studies

4.2.2 Research environment

The studies describe the research environment only partially and with few details. In total, 32% (14 of 44) are experiments conducted within a regular course (four with voluntary participation, one with mandatory participation and nine without details about potential compulsion), while 22 studies (50%) are conducted in specialised laboratory environments outside the curriculum (21 with voluntary participation and one without details about potential compulsion). Five studies (11%) do not clarify whether the experiments were conducted during regular courses or extracurricularly under controlled conditions (one with voluntary participation and four without details about potential compulsion). In addition, one study combined a voluntary field trip with a design study and a prototype study.

4.2.3 Target groups and research participants

The analysis of the articles also led to the identification of learners, teachers and developers as target groups. Most studies (39) focus on learners as the main target group, while six studies emphasise the role of teachers (Table 11). Beyond these two core target groups, three studies target employees, professionals and experts, who were addressed through interviews. Only two studies give relevant consideration to IVR developers. Six studies address two target groups (three focusing on learners and teachers, one on learners and experts, one on learners and professionals, and one on teachers and developers), and one study addresses three target groups (learners, teachers and developers). Finally, two studies remain unclear: one study does not indicate its target group, while another selected the wrong population (trainees) for the intended target group (coal mining experts).

Table 11 Target groups of the 44 studies

Evaluations can be conducted qualitatively or quantitatively. Qualitative expert interviews and observations can provide valuable results, whereas quantitative survey-based evaluations generally require at least 20 participants per test group to allow statistically meaningful interpretations; as shown in Sect. 4.2.4, most studies use surveys as their research instrument. The number of participants varies widely across the studies (Table 12).

Table 12 Number of participants in the 44 studies
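To illustrate why 20 participants per group is a lower bound rather than a guarantee, the common normal-approximation formula for comparing the means of two equally sized groups gives the required number per group as n = 2 (z_{1−α/2} + z_{1−β})² / d², where d is the standardised effect size, α the two-sided significance level and 1 − β the desired power. The Python sketch below uses only the standard library; the chosen values of d, α and power are illustrative assumptions, not values taken from the reviewed studies, and an exact t-test calculation yields slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (0.8, 0.5):  # conventionally "large" and "medium" effects
    print(d, n_per_group(d))
```

Under these assumptions, a large effect (d = 0.8) already needs about 25 participants per group and a medium effect (d = 0.5) about 63, so smaller samples can only detect rather large differences.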

A quantitative evaluation may require even more participants for statistical validity if participants are divided into groups. Two studies lack information about grouping, and 13 studies (30%) use only a single group (Fig. 4). Meanwhile, 29 studies (66%) report that participants were divided into groups. About half of these (15 of 29) involve a control group (four with additional conditions), eight compare two conditions and six split participants in other ways (four into different target groups, and two into many groups with the same participants and conditions). Of the 29 studies that split participants into groups, only four compare groups with more than 50 participants, and they involve only two groups each (78 versus 72, 60 versus 60, 51 versus 51 and 65 versus 29).

Fig. 4 Group handling in the studies
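The grouping figures add up as follows; this Python check only recomputes the counts given in the text (the percentages are rounded to whole numbers, as in the text).

```python
# Group handling of the 44 studies, as reported in the text (Fig. 4).
grouping = {"no information": 2, "single group": 13, "divided into groups": 29}
# How the 29 studies with several groups were split.
split_types = {"with control group": 15, "two conditions compared": 8, "other splits": 6}

total_studies = sum(grouping.values())
percent = {label: round(100 * n / total_studies) for label, n in grouping.items()}

print(total_studies, sum(split_types.values()), percent["single group"], percent["divided into groups"])  # 44 29 30 66
```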

4.2.4 Research methodologies and instruments

Of the 44 studies (reported in the 38 study articles), 66% (n = 29) follow a mixed-methods research design, mainly combining a survey with a test or an interview. The remaining 15 studies use a single research method: 18% (n = 8) use only a post-survey, while 9% (n = 4) combine different surveys (pre and post). In addition, one study uses a post-test (on 14 brain structures), one study analyses log and task-processing data and one study compares only the time required to complete given tasks.
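The breakdown of research methodologies is consistent with the total of 44 studies, as this Python check of the reported numbers shows.

```python
# Research methodologies of the 44 studies, as reported in the text.
methods = {
    "mixed methods": 29,
    "post-survey only": 8,
    "pre- and post-surveys": 4,
    "post-test only": 1,
    "log and task data only": 1,
    "task time only": 1,
}

total = sum(methods.values())
single_method = total - methods["mixed methods"]
percent = {name: round(100 * n / total) for name, n in methods.items()}

print(total, single_method, percent["mixed methods"], percent["post-survey only"], percent["pre- and post-surveys"])  # 44 15 66 18 9
```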

Several studies use multiple research instruments; therefore, the counts in Table 13 do not sum to 44 (100%). Specifically, 35 studies (80%) used post-surveys, only one of them for a longitudinal analysis, while pre-surveys were used only half as often. Twelve studies conducted interviews (four pre-design, four post-design, two pre-implementation and two post-implementation). The category ‘Others’ includes observations and feedback (two studies) as well as a group task, focus group interviews, workshops, demographics and learning outcome analysis (one study each).

Table 13 Research instruments in the analysed 44 studies

Notably, several studies mention research instruments but do not report details or results; thus, these instruments were excluded from this list.

4.2.5 Interaction and collaboration

In the selection criteria, we differentiate between non-immersive, semi-immersive and fully immersive applications (see Sect. 2), and the current SLR focuses solely on fully immersive VR applications. To assess interactivity, we rely on the descriptions of the applications in the articles: ‘low’ interaction signifies an atypical case in which users engage only minimally with the virtual environment (when no real interaction with the environment is possible). It contrasts with ‘normal’ interaction, which is the commonly reported level. Of the 44 studies, 42 describe the level of interaction, while two provide no information about it. According to these descriptions, the level of interaction in or with the virtual world is low (nearly absent) in seven of the 42 studies (Table 14) and normal in 34 studies. Only one study describes a highly interactive IVR experience (Agbo et al. 2022); for the remaining two studies, the level of interaction is unclear. The distinction between non-interactive and interactive IVR can also be described in terms of different degrees of freedom. We do not consider 360° videos viewed in an HMD as normally interactive IVR, but as IVR with low or no interaction.

Table 14 Interaction and collaboration in the 44 studies
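The interaction counts can be reconciled with the 44 studies as follows. The share of low-interaction studies (about 17% of the 42 studies with a description) is derived here for illustration and is not stated in the text.

```python
# Interaction levels as reported for the 44 studies (Table 14).
interaction_levels = {"low": 7, "normal": 34, "high": 1}
TOTAL_STUDIES = 44

described = sum(interaction_levels.values())
unreported = TOTAL_STUDIES - described
low_share = round(100 * interaction_levels["low"] / described)

print(described, unreported, low_share)  # 42 2 17
```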

Most studies (40 of 44) present single-player IVR applications without collaborative aspects, and only two studies consider collaborative use cases, one with a normal level of collaboration (Neroni et al. 2021) and one with a high level of collaboration (Jochecová et al. 2022).

5 Discussion

In this section, we discuss the results, following a structure similar to that of Sect. 4. In particular, we highlight outcomes of the current in-depth SLR of the 50 selected articles that can guide future research on IVR.

5.1 Formal aspects of the articles

We discuss the results related to the formal aspects of the 50 articles, including the six SLRs and six discussion papers.

We observe that the selected articles strongly focus on empirical studies (Sect. 4.2.4). This trend is apparent across the selection (Sect. 4.1), as the number of such articles increased strongly in 2021 and 2022 compared with previous years. This finding indicates an ongoing and active generation of new data and shows that several questions related to the use of IVR in higher education remain unanswered.

The creation of IVR applications for specific application areas can be complex. If a specific effect is targeted, such as improved knowledge acquisition, the technical aspects need to be planned with regard to the application area. Therefore, we expected to find studies and discussion papers that build on an SLR as the basis of their study design or discussion. However, none of the 50 selected and analysed articles provides such a combination.

A total of 50 articles appears small for an international search and may indicate that research on IVR is only beginning. Alternatively, because the selection criteria focus on English-language publications, the selection does not cover national publications in other languages. Additionally, the implementation of IVR in higher education is a comparatively big step, whereas earlier work on VR (mainly non-immersive) tended to address technical challenges, interaction and immersion.

We mainly identified empirical studies among the selected articles, while the six SLRs include all types of VR and AR for higher education and do not specifically highlight immersive VR applications. This finding indicates that integrating IVR into higher education is a new and evolving field of research. Although the articles mainly emphasise earlier technological aspects, creating added value currently appears to be a priority. Papagiannis (2017) describes this development (with regard to AR) as a technology shifting from overlay to entryway; in other words, the technology becomes transparent, less prominent and more natural, so that users can focus on the tasks it enables.

The dominance of English as the new lingua franca in global research and in indexed scientific journals (Stracke 2020) is also evident in the current SLR. Among the 291 collected records, we excluded four articles that were not written in English. This dominance benefits countries whose native language is English. Further articles may therefore have been published in other languages, albeit with limited international impact.

For the analysis, we clustered the articles by country to obtain an overview of international activities in the targeted field of IVR in higher education. The reported author affiliations served as the basis of the analysis. Therefore, the clusters do not depict the nationality of the authors but the countries in which they worked and published. Of the 218 authors, only 24 are from the Global South, and these come from only two countries (20 from Australia and four from South Africa). No author comes from a country at the lowest level of economic development. The picture for the main authors is nearly the same: only six are from the Global South (five from Australia and one from South Africa). This result demonstrates that IVR is mainly researched and used in higher education in (better) developed countries. The main reasons could be the infrastructure and resources that IVR requires.

In addition, using IVR in higher education requires teachers to create or maintain content. Developing individual applications for every course is expensive and complex and is therefore generally beyond what teachers can provide. Consequently, methodologies and applications that can be reused across multiple contexts are needed. To capture possible transfer between IVR experiences, and thus generalisation in design, we separately clustered the study articles that include more than one study. The count dropped sharply from 105 authors from 30 countries to 20 authors from 10 countries.

We offer two explanations for this result. First, IVR in higher education may still be in its infancy, and research has only begun to transfer solutions from one scenario to others. Second, creating and evaluating IVR scenarios are complex processes, even when focusing on design only and leaving development aside. This complexity makes the evaluation and comparison of multiple scenarios relatively difficult, which may therefore be feasible only for research groups with extensive prior research. For research groups originating in other fields (e.g. didactics) rather than IVR, conducting comparative IVR studies may take time. For these groups, developing IVR applications may also be more time-consuming, and managing multiple studies within a single research project is challenging. Nevertheless, the comparison points to a need for research aimed at identifying generic requirements and developing generic solutions to increase and improve the implementation of IVR in higher education.

5.2 Systematic literature reviews

The six SLRs cover review periods ending in 2020 at the latest, and some end in 2019. The numbers of analysed articles (n) vary widely, ranging from n = 149 by Luo et al. (2021), n = 113 by Lu et al. (2022) and n = 46 by Pellas et al. (2021) down to only n = 18 by Di Natale et al. (2020) and only n = 9 by Nesenbergs et al. (2020). Papanastasiou et al. (2018) claim to present a ‘brief, representative and non-exhaustive review of the current research studies’ but do not report the number of reviewed articles. Only three of the six SLRs follow a standardised protocol (Di Natale et al. 2020; Lu et al. 2022; Nesenbergs et al. 2020), in all three cases PRISMA, as does the current SLR.

Compared with the presented SLR on IVR in higher education, none of the six SLRs focuses on the same topic. First, they mainly address different educational sectors, and four SLRs cover a broad spectrum: three (Di Natale et al. 2020; Luo et al. 2021; Pellas et al. 2021) address not only higher education but also K-12, while the fourth (Papanastasiou et al. 2018) also includes tertiary education. Second, the research topics of the six SLRs mainly differ from that of the current SLR: Lu et al. (2022) concentrate on usability research in educational technologies, while Luo et al. (2021), Nesenbergs et al. (2020) and Pellas et al. (2021) examine all types of VR without any specific differentiation for IVR. Third, only two of the six (Di Natale et al. 2020; Pellas et al. 2021) claim to specifically analyse IVR, and both are summarised in Sect. 1. However, neither is specific to IVR in higher education. Di Natale et al. (2020) also address K-12 and discuss only 18 studies out of the 1,080 articles originally collected, after applying (very) strict exclusion criteria. Pellas et al. (2021) combine higher education and K-12 but do not follow a strict IVR definition and combine and analyse any type of VR.

5.3 Discussion papers

The six discussion papers form a diverse group: rather than presenting results from experimental studies, they report collected experiences and reflections. Their findings are therefore of mixed quality and must be reviewed carefully.

Kluge et al. (2022a, b) report conducting a survey on extended reality experiences and 11 interviews but do not report the content or results of the interviews. Moreover, the survey results are based on an online questionnaire that represents the subjective opinions of anonymous respondents without proven IVR experience. Suen et al. (2020) discuss the websites of VR providers and e-mail exchanges with VR managers. Their results take a light, non-specific view of VR and therefore provide no insights into IVR experiences. Fromm et al. (2021) offer results from workshops in which users tried out VR applications hands-on. The participants were students and teachers from various disciplines, and the reported VR potentials are derived from opinions collected during interviews. Two discussion papers present qualitative usability studies: Skosana et al. (2022) summarise internal pilot testing with five project members, while Çoban and Kayserili (2021) examine the presence perceptions and experiences of 43 teachers, but only in relation to technological and design characteristics. Marks and Thomas (2021) provide an overview of the costs of the VR laboratory of the University of Sydney between 2017 and 2019. They collected feedback through a survey of 295 students, which was mainly positive. However, the cost details are outdated and the survey details are limited.

Discussion papers engage in theoretical discussion or develop concepts, but the selected discussion papers do not implement IVR experiments; thus, they present only individual estimations without practical IVR use. The diversity of the discussion papers and their mixed quality make it difficult to generalise their findings, and the current SLR cannot answer its research questions on the basis of these papers. Nevertheless, they can point to further research requirements on specific aspects.

5.4 Research design of the study articles

This section discusses the research design of the studies, that is, of the 38 study articles that represent 44 studies (excluding the SLRs and discussion papers).

5.4.1 Research disciplines and application contents

In terms of research disciplines, the authors come mainly from Medicine and Education as well as from STEAM and Geography backgrounds, whereas Chemistry and Agriculture, which offer many opportunities for IVR integration, are underrepresented. This finding indicates that IVR is used across various disciplines in higher education and has potential for a broad spectrum of content. In addition, the content of IVR applications is very diverse and ranges from programming and computational thinking, biology, anatomy and medicine to engineering. Not all application contents relate to the research disciplines of the authors, especially when the authors come from education research backgrounds. This shows that the main drive for research on IVR use in higher education comes from authors in mainly educational disciplines rather than in IVR technology disciplines. This finding contrasts with other recent developments, such as the use of AI in education (Zawacki-Richter et al. 2019; Stracke et al. 2024a, b).

5.4.2 Research environments

The descriptions of the research environments are surprisingly sparse: they typically lack basic details and, in the majority of cases, do not permit replication and validation of the studies. Furthermore, the majority of IVR studies are carried out in specialised laboratories without integration into regular study programs and courses, which leads to artificial situations and potential bias in the research results.

In other words, the majority of IVR research is not integrated into higher education and its regular study programs and courses; instead, it remains in an exploratory phase. Moreover, most learning and teaching processes in higher education that use IVR are not a standard part of academic learning design but remain exceptions confined to extraordinary situations.

5.4.3 Target groups and research participants

For the analysis, we clustered the target groups of the studies into learners, teachers, experts and developers (Table 11). We expected the studies to mainly address learners, because learners are crucial to estimating whether a learning scenario can create added value in terms of knowledge transfer.

Developers and technical assistants are currently a crucial resource. With growing experience and easier tools, teachers may be able to provide more scenarios without technical support; however, special, outstanding scenarios may continue to require technical support in the future.

Teachers are currently underrepresented in the presented studies. The scenarios used by learners need to be updated in terms of learning content or adapted to learning groups. If teachers cannot update or adapt the content independently, then teaching with VR is possible only with thorough pre-planning and an explanation of the necessary changes to developers. The developers then integrate the changes, which teachers need to confirm before use. The analysis of target groups illustrates the need for further research into enabling teachers to manage IVR scenarios, e.g. with dedicated authoring systems.

To reach a break-even point at which IVR has advantages over traditional lectures, further research is required that confirms and measures improved learning outcomes while reducing the effort for teachers and developers.

However, the analysis of the quantitative and qualitative evaluations in Sect. 4.2.3 indicates problems with statistical interpretation. We often find small numbers of participants (Table 12): 23% (10 out of 44) of the studies do not reach the minimum sample size for quantitative analysis. In addition, five studies mention that evaluations were conducted with participants but do not report their results. Conversely, 66% (29 out of 44) of the studies involved more than 21 participants and may thus be statistically sound. However, the participant numbers are even lower in the few studies that compared different groups (Fig. 4), which strongly limits statistical validity. In addition, the failure of other studies to report participant numbers and grouping impedes the assessment of the validity of their research outcomes and claims.

5.4.4 Research methodologies and instruments

Regarding methodology, the majority of studies follow a mixed-methods design that combines a survey with a test or an interview. They mainly use IVR in regular course settings or in extracurricular sessions with more than 21 and up to 100 participants and collect data primarily through pre- and post-treatment surveys of learners and interviews. Only a few conduct post-treatment tests. In other words, the majority of studies evaluate scenarios by questioning users, and only a few of them collect quantitative measures. Furthermore, only one study each uses a post-test, analyses log and task-processing data or compares task completion times, which indicates that current IVR research is mostly not evidence-based.

In relation to research instruments, the majority of studies use multiple instruments (Table 13), with a strong focus on post-surveys. Regarding the surveys, we observed that when IVR is compared with other approaches, it is mostly compared with classical, low-technology solutions, and users are asked whether they prefer IVR. The surveys show that most participants have little or no previous experience with IVR. IVR therefore likely benefits in the survey results from its novelty and high level of interactivity. The common research methods can confirm positive effects related to IVR acceptance, but research on how experienced users assess IVR, or on how IVR acceptance changes after habituation, is lacking.

Notably, several studies mention research instruments but do not report details or results, so these instruments are excluded from this listing. This lack of reporting impedes the assessment of the validity of research outcomes.

5.4.5 Interaction and collaboration

Regarding interaction, the majority of studies describe a normal level of interaction on average. Regarding collaboration, the majority of studies focus on single-player IVR applications without collaborative aspects (Table 13). These findings indicate a certain homogeneity in the applications of the analysed studies, with a common level of interactivity and a low level of collaboration.

Interaction in IVR drives immersion, presence and engagement (McMahan et al. 2012). The majority of studies (34 out of 44) do not describe interaction with the IVR medium in detail, which suggests a normal level of interactivity (e.g. navigating and interacting by point-and-click). As described in the Introduction, full immersion is an inclusion criterion for the current SLR whereas an interactive experience is not; nevertheless, interaction is an essential factor for fully immersive experiences. However, apart from Agbo et al. (2022), researchers in the selected studies rarely use the extraordinary opportunities for rich interaction. One reason could be that rich interaction is considered to demand considerable effort while not being required for, or not contributing to, learning outcomes.

Additionally, collaboration drives presence and is assumed to support the acquisition of soft skills, such as communication, problem-solving and creativity, and to increase productivity within IVR applications (Jackson and Fagan 2000; van der Meer et al. 2023). The majority of studies (40 out of 44) describe single-player applications without collaborative aspects. One study indicates a need for collaboration and exchange but moves collaboration, in the form of group discussions, to the physical world. One possible explanation is that the technical complexity of developing multiplayer applications represents a significant hurdle for researchers.

The level of interaction and the prevalence of single-player applications in current research can be attributed to two combined factors, namely, the technical complexity of IVR applications and the backgrounds of the researchers (Table 7), who are typically not experts in technology-enhanced learning and media design. Consequently, these authors may be less familiar with the specific challenges and opportunities of IVR technologies. Lowering the technical barrier, so that the development of and research on IVR applications become more accessible and standardised, would widen the field of research. To maintain a technologically sophisticated standard and the reproducibility of research, one potential solution is to foster the adoption of open-source, modular development approaches for research, such as guidelines and toolkits (e.g. RealityFlow) (Murray 2022). These approaches provide essential, prefabricated components while offering sufficient flexibility and data security. In addition, researchers could benefit from collaborating with experts in technology-enhanced learning and design to implement and evaluate IVR in meaningful and effective media designs and deployments.

5.5 Research outcomes of the study articles

This section discusses the outcomes of the selected studies (38 study articles that represent 44 studies, excluding the SLRs and discussion papers).

5.5.1 Design of IVR applications for higher education

In general, high expectations are attached to the potential of IVR for education. Many publications claim that IVR offers numerous possibilities for cost reduction, visualisation and engagement. However, these expectations are not always met, occasionally because of design pitfalls. Five studies address expectations of IVR, two of which directly examine its potential for education. Specifically, Jochecová et al. (2022) describe how teachers perceive the potential of collaborative IVR: the teachers supposed that student motivation would increase through the ability to visualise complex phenomena and through the virtual transportation of students. Miller et al. (2021) propose that IVR may promote persistence towards degree completion among certain student groups. Three further studies regard IVR as a means of reducing costs in education. The most convincing is that of Mayne and Green (2020), who report that IVR is less costly than setting up a real crime scene for students to investigate and commend the repeatability of a crime scene setup in IVR. Vergara-Rodriguez et al. (2022) claim that IVR can reduce energy costs and enable a new form of dynamic and interactive learning, but also note potentially higher costs through the increased need for infrastructure and support. Paszkiewicz et al. (2021) agree with the potential cost reduction and additionally cite a potential increase in the safety and efficiency of employee activities. However, these studies cannot verify these expectations with sound and verifiable results, so the highlighted potentials of IVR are suggested but not proven.

To turn expectations into reality, the development of IVR applications for higher education needs to consider many factors, which creates a need for a comprehensive development framework. Two articles opt for self-developed frameworks as development guidelines. The first is Paszkiewicz et al. (2021), who developed a methodology for creating IVR training dedicated to realising the concepts of Industry 4.0. The second is Solmaz and Van Gerven (2022), who propose a development methodology for implementing interactive fluid dynamics simulations in cross-platform environments such as desktop and IVR. These studies show that the research and design field of IVR applications for higher education continues to use custom development frameworks tailored to specific application contents and use scenarios.

A number of studies report nausea or disorientation, known as motion or simulation sickness (Mayne and Green 2020; Birt and Cowling 2018). It has been attributed to the excessive use of snow and spark brushes in the 3D painting software Google Tilt Brush, to headset incompatibility with glasses and to inadequate configuration of eye distance (Ho et al. 2019). Obukhov et al. (2022) observed adaptation with continued use. This phenomenon is not new, and various methods can be used to pre-empt it, such as chewing gum while using VR headsets, implementing a vignette to narrow the view or offering different modes of locomotion inside the VR medium. These measures should be considered during IVR development.

5.5.2 Deployment of IVR media in higher education

Implementing a new learning medium in any higher education setting brings challenges. The selected study articles also examine these challenges and propose various solutions. For example, Kluge et al. (2022a, b) provide a holistic description of the aspects relevant to the successful implementation of IVR learning. They recommend a pre-defined framework for hardware distribution and ongoing support, clear guidelines and approval pathways for suitable teaching content and the provision of expert support. In addition, Ho et al. (2019) point to the need for sufficient space and equipment when using IVR media. Jochecová et al. (2022) agree that students' IVR exploration requires sufficient space and cite the need for a reasonably fixed assignment for students. They also recommend gamification approaches within IVR settings, and the majority of the teachers in their study stated a need for guidance in the IVR environment, especially for first-time users. Young et al. (2020) derive eight user-centred design guidelines for effective lesson planning and educational software development in higher education, covering focus, provocation, stimulation, collaboration, control, digital life, learner skills and multimodality. In summary, the four studies conclude that IVR environments need more deployment support than traditional educational settings without IVR.

To integrate IVR successfully into day-to-day higher education, it needs to be considered in the context of other media. While Lamb and Etopio (2019) combine IVR with another medium (in this case, writing), the majority of studies rely exclusively on IVR without additional media. This indicates that research continues to consider IVR applications in isolation, without support from other media. Further research into the benefits and challenges of combining IVR with other media is therefore required.

5.5.3 Evaluation of IVR applications to higher education

To evaluate IVR applications in higher education contexts, the study articles used different frameworks. The Cognitive Load Scale, proposed by Sweller et al. (1998), is a psychometric tool designed to measure cognitive load in educational contexts and is recognised as particularly relevant to e-learning and virtual learning environments. The scale initially included three major components, namely, intrinsic (inherent difficulty of the material being learned), extraneous (difficulty arising from the way information is presented to learners) and germane (processing and construction of schemas) cognitive load. The scale has been under discussion since its proposal. Six articles present results related to cognitive load theory. Yang et al. (2022) present students' perceptions of task load in IVR according to their level of expertise and case difficulty. They use IVR to provide scenarios optimised for users with different levels of expertise: mental load is balanced by offering repetitive scenarios to less experienced users and increasing variability as user expertise grows. Kee and Zhang (2022) report that participants were more engaged by the IVR environment but were also hindered in their cognitive processing. Agbo et al. (2022) report improved cognitive benefits in their IVR application. Tugtekin and Odabasi (2022) report that applying multimedia design principles in IVR does not lead to higher cognitive load or decreased learning, and therefore propose drawing on the principles described by Mayer (2014) when designing IVR environments. While these articles draw on cognitive load theory, Andersen and Makransky (2020) extend the Cognitive Load Scale for virtual environments with sub-scales of extraneous load, forming the new Multidimensional Cognitive Load Scale for Virtual Environments.

Unfortunately, none of the articles mentioned above recognises or reflects the updated version of cognitive load theory, which no longer treats germane load as an independent component (Sweller et al. 2019) and, according to recent studies, rejects the one-dimensional Cognitive Load Scale (Leppink 2020). This omission impedes the assessment of the validity of their claims. Nevertheless, the popularity of cognitive load theory in explaining some of the variation observed during IVR implementation in higher education is notable, although all earlier evidence-based results must be questioned in light of the update to the underlying theory.

Only one article uses a different, established framework to evaluate its application: Cabero-Almenara et al. (2022) find that the technology acceptance model of Davis (1989) is effective in determining students' acceptance of, and intention to use, IVR learning environments.

Neroni et al. (2021) do not draw on any of these frameworks for evaluation; instead, they demonstrate how the IVR environment enables direct observation of users and their problem-solving process by recording activities in the environment. They state that this approach also permits the objective evaluation of proposed solutions through physics simulations. Such observation is also common in usability testing, which seven other studies address. Campos et al. (2022) report positive reception of the IVR learning environment by students, as well as their desire for more flexibility in using it, because using it meant extra work for them. Cabero-Almenara et al. (2022), Mayne and Green (2020), Young et al. (2020) and Huang et al. (2022) report good usability, student satisfaction and acceptance of IVR environments, whereas Nicolaidou et al. (2021) report lower usability than for a mobile application. Huang et al. (2022) report no significant difference in usability between users with and without prior IVR experience, while Segura et al. (2019) highlight differences in how game element disposition and game mechanics were experienced. They also report that acceptance depended on user age: ‘the kids highlighted the attractive look and provided fun’, while older users were most critical of the game mechanics and the level of difficulty. Marques et al. (2022) acknowledge that they should have performed usability tests to determine ease of use, usefulness, perceived quality and sense of realism. Regarding the sense of realism, Kamińska et al. (2017) describe the need for realistic surroundings to maintain immersion.

These articles demonstrate that no single dominant research framework leads and informs the design and evaluation of IVR environments for higher education and related research. Usability research on IVR appears to be at an early stage. It could therefore benefit from the research experience of other disciplines and sectors that have established validated frameworks and measures for testing and improving usability. In the future, involving IVR designers, developers, students and teachers could help studies gain broader insights into the easy design and use of IVR environments in higher education.

5.5.4 Motivation and engagement

The majority of the studies conducted interviews and surveys in the research design to evaluate various scenarios, while a few only used post-treatment knowledge tests or physical measures. Evaluation typically focuses on the impact of IVR on the learning process of participants.

Primarily, the results illustrate that IVR can increase motivation and engagement (e.g. Nicolaidou et al. 2021; Kee and Zhang 2022; Neroni et al. 2021) and that this effect may not be limited to users wearing headsets but may extend to students who assist as spectators (Young et al. 2020). One explanation may be that IVR provides psychological and emotional experiences for users, although difficulties in concentration caused by the technology and the social situation also occur (Young et al. 2020). The studies also indicate that IVR can evoke both positive and negative emotions (Ślósarz et al. 2022) and suggest that the intensity of these emotions depends on the design of the IVR environment and the self-esteem of students.

Participants had little or no prior experience with IVR, particularly when they were mainly students, so IVR was a novelty for them. Increased motivation could therefore be expected in comparative evaluations of IVR and classic methods, and the possibility that this increase is partly due to enthusiasm for the new technology cannot be ruled out. Additionally, IVR scenarios occasionally feature gaming aspects or mini games (Agbo et al. 2022), which may confound the effects of IVR.

Compared with classical videos, improved attitudes and increased enjoyment of learning the subject (Sung et al. 2020), increased cognitive empathy with people suffering from schizophrenia (Marques et al. 2022) and increased interest and confidence among students (Ho et al. 2019) have been reported. Compared with a familiar, more interactive e-assessment in a learning management system (e.g. Moodle), however, no significant differences were found in performance or ease of use (Al-Azawei et al. 2019).

The majority of scenarios are evaluated only once; improvements based on evaluation results and comparisons across versions of the same scenario are rare. Such iteration requires repeating the cycle of requirement analysis, design, development and evaluation, which is time-consuming and especially difficult in funded projects, because the majority of scenarios address specific applications. Moreover, we found no results on how motivation or engagement changes with repeated, regular use of IVR in education. Emotions influence learning outcomes (Ali and Tan 2022), so future research into the different emotional dimensions, conditions and factors related to the use of IVR in higher education is required.

5.5.5 Learning outcomes

A number of studies could not show an improved learning effect with IVR (Fitton et al. 2020; Al-Azawei et al. 2019; van Ginkel et al. 2019). Sung et al. (2020) even find that IVR leads to lower knowledge-based performance than video learning.

The majority of the studies report that IVR supports learning: it was found to enhance knowledge retention beyond increases in engagement and motivation (Pickering et al. 2021) and to facilitate exploratory activities, reflective observation and active experimentation (Kee and Zhang 2022). In Medicine and Healthcare, IVR is found to enhance the training of visual and muscle memory through better, more realistic interaction mechanics and spatial awareness (Obukhov et al. 2022; Huang et al. 2022), resulting, for example, in reduced soft tissue damage during bone drilling (Benjamin and Sabri 2021). IVR is also used to support imaging descriptions but is less effective for spatial location tasks in medical scans (Lopez et al. 2021). In the natural sciences, IVR improves performance in a 3D vector physics course (Campos et al. 2022), chemotherapy drug administration (Wang et al. 2022) and organic chemistry (Miller et al. 2021). Many studies report positive effects for very specific application problems, such as an improved understanding of programming concepts (Segura et al. 2019), foreign language learning compared with mobile applications (Nicolaidou et al. 2021) and learning in mechanical and electrical engineering (Kamińska et al. 2017).

The studies also report effects on research skills in general. Text combined with IVR improved scientific writing skills (Lamb and Etopio 2019). One study reports that IVR improved computational thinking and problem-solving skills (Agbo et al. 2022). Furthermore, IVR can be used for training oral presentation skills and addressing public speaking anxiety (Boetje and van Ginkel 2020). In contrast to Sung et al. (2020), Pande et al. (2021) find that IVR is advantageous compared with videos because it supports long-term retention and sustained interest, although perceived learning benefits diminish over time.

These studies address different and mainly very specific applications. On the one hand, the results demonstrate that there is interest in using IVR for specific needs. On the other hand, this specificity makes it difficult to derive generalisable findings. We identified more studies that report positive effects of IVR on learning than studies that do not, but this imbalance may be due to the file drawer effect; that is, negative effects might be reported less frequently if they are regarded as insufficient intermediate results that require improvement prior to publication. Another possible explanation is a novelty effect associated with first-time IVR use.

5.6 Summary

With regard to the research questions, the scientific literature on the use of IVR in higher education can be categorised and clustered based on formal aspects, such as publication year and type as well as the countries and disciplines of the authors. It can also be categorised by research design, including research disciplines and application contents, research environments, target groups and participants, research methodologies and instruments, and interaction and collaboration (RQ1). From this categorisation, we conclude that IVR research is still in its infancy and lacks sound and standardised research frameworks and designs as well as repeated and comparative studies for validation.

Regarding the current results and outcomes of the scientific research on the use of IVR in higher education (RQ2), we conclude that the design, implementation and use of IVR require careful consideration of many influencing conditions and factors related to the design of IVR applications, the deployment of IVR media, the evaluation of IVR applications, motivation and engagement and, in particular, learning outcomes. We identify a wide diversity of reported research and learning outcomes, including contradictory results, which typically stem from specific and isolated IVR applications lacking meaningful scenarios and integration into higher education.

The learning aspects associated with IVR reveal both the potential and the limitations of this technology. Numerous studies emphasise the positive impact of IVR on student engagement, deepening of knowledge, promotion of exploratory activities and support for specific learning processes, as indicated by increased learning enjoyment, high levels of engagement and complete immersion in IVR environments. We must emphasise, however, that none of the analysed studies provides sound and validated causal relationships: all reported learning outcomes and effects are based on correlations that could be caused or mediated by other factors.

However, implementing IVR does not automatically result in enhanced learning, because a number of studies report no substantial difference in learning outcomes between IVR and traditional learning methods. This finding underlines the importance of carefully examining the specific conditions under which IVR is applied and of meaningfully integrating the technology into the curriculum. Although IVR has the potential to increase engagement and introduce new methods for interactive, exploratory learning, several challenges must be considered, such as physical side effects, the need for technology acceptance and individual differences among learners in IVR environments. For this reason, a nuanced approach is crucial to ensure that IVR remains an enriching and effective learning medium.

In summary, it is notable and surprising that the majority of the 50 articles do not present relevant, evidence-based results obtained through a sound methodology and reported in an appropriate format. In particular, discussions and results related to pedagogical aspects are lacking. Many studies comment on the potential benefits of IVR environments, such as reduced costs and higher repeatability. They also point to a certain degree of increased student engagement when learning with IVR, but only anecdotally. Moreover, testing the usability of IVR environments is (still) not a fundamental part of IVR research in educational scenarios: only 7 out of 44 studies explored it. The studies demonstrated that IVR can influence emotions and may support visualisation and learning; in addition, haptic feedback seems to be especially helpful for mechanical tasks. However, they also report that learning in IVR is associated with cognitive (over)load and negative effects such as cybersickness. To alleviate these challenges, several studies propose concepts for IVR deployment and design (invariably labelled "innovative"), but only two evaluate and discuss them with participants.

Finally, the current SLR is the first systematic scoping review of all studies in Web of Science that focus on the use of IVR in higher education in general. It offers a unique overview of all scientific articles on IVR in higher education without any limitation in time, scope or topic.

5.7 Limitations

VR research has witnessed a surge in scholarly attention, with a notable focus on the spectrum of non-immersive and immersive experiences within VR environments. Despite the growing literature, a prevailing issue is the lack of consistent differentiation or explicit delineation between non-immersive and immersive VR in scientific discourse. This conceptual ambiguity undermines the precise operationalisation of IVR and thus affects the interpretation of research outcomes.

A similar phenomenon can be observed in the description of IVR applications. Key aspects, such as the degree of immersion, interaction and collaboration, are described very differently or not at all. The reason may be that no established model currently exists (despite some proposals, such as Bisswang et al. 2023) that would enable a standardised classification of IVR applications regarding these aspects. The lack of such a model, the inconsistent descriptions and the differing fundamental understandings make it difficult to draw well-founded interpretations of research outcomes and to compare different studies and IVR applications with one another.

With regard to relevance and meaningfulness, further limitations stem from the analysed articles themselves. Assessing their validity is difficult due to missing information about participants and the use (or non-use) of HMDs. They typically do not meet the general requirements for sound reporting related to the IVR environment, identity, verifiability, advantages and learning outcomes.

Another limitation is the restriction to English-language articles; however, only four articles in other languages (Spanish, French and Russian) were excluded from the original record, so this restriction is unlikely to have strongly influenced the analysis and results.

In addition, we collected and analysed research items from only one source, Web of Science, because we consider it the most stringent, restrictive and rigorous indexing service for scientific publications, demanding long-term quality checks and reports with documented peer review from included and indexed journals.

Finally, during the submission and review process, three SLRs were published later in 2024, so we could no longer include them in our discussion. Moreover, they would not have fulfilled our criteria: Cabrera-Duffaut et al. (2024) restrict their SLR to VR for competency development without distinguishing between different VR types, so it is not restricted to IVR. Wiepke and Heinemann (2024) restrict their SLR to user factors and their contribution to the sense of presence in IVR. Muzata et al. (2024) restrict their SLR to engineering education.

6 Conclusions

This study is the first to conduct an SLR on IVR in higher education covering all studies indexed in Web of Science. Strictly following the standardised PRISMA protocol, we provide an in-depth analysis of 50 articles to answer the following RQs: How can scientific literature on the use of IVR in higher education be categorised and clustered based on formal aspects and research design? (RQ1) and What are the current results and outcomes of the scientific research on the use of IVR in higher education? (RQ2).

Our first key result is the categorisation and clustering of the scientific literature on the use of IVR in higher education according to formal aspects and research design. We find that IVR research in higher education is still in its infancy and lacks sound and standardised research frameworks and designs as well as repeated and comparative studies for validation.

The second key result is the analysis and discussion of the current results and outcomes of the scientific research on IVR in higher education: We demonstrate the diversity of reported research and learning outcomes with contradictory results, which are frequently due to specific and isolated IVR applications without meaningful scenarios and integration into higher education.

The third key result is that the current SLR is the first one to focus on the use of IVR in general. We offer a unique overview of all scientific articles on IVR in higher education indexed in Web of Science without any limitation in time, scope or topic. Notably, the majority of the 50 articles do not present evidence-based and validated results; in particular, discussions and results related to pedagogical aspects are lacking.

Thus, the presented results and their detailed discussions reveal that IVR research is only beginning and that identifying sound and valid research findings is difficult. Even the term IVR lacks a consensus definition; we therefore propose reserving it for fully immersive VR applications (in contrast to non-immersive and semi-immersive VR applications).

Based on these findings, we advocate the creation of a comprehensive framework for the development, deployment and evaluation of IVR applications in higher education, which could facilitate and standardise IVR research.

In summary, we hope that our article contributes to the scientific community and facilitates and supports further research towards a better understanding of the potential use of IVR in higher education and the improvement of immersive teaching and learning.