1 Background

1.1 Critical thinking

The development of critical thinking is essential for health professional students to analyse a clinical scenario to make well-informed, safe and effective judgements in complex environments (Carbogim et al. 2018; Chan 2013). The term “critical thinking” has been synonymously used with decision-making, problem-solving, and in the case of health professional education (HPE), clinical reasoning and clinical judgment (Carter et al. 2022; Dissen 2023). Although critical thinking is essential in health disciplines, there has been increasing concern about the development of critical thinking skills in health professional students (Audétat et al. 2013; Koivisto et al. 2018).

Barriers to developing critical thinking include the institutional de-prioritisation of infrastructure to support educators’ modelling of critical thinking; the use of a fixed, didactic, teacher-directed curriculum delivery rather than a heutagogical, socially constructed and student-determined approach to learning; limited development of critical thinking dispositions such as truth-seeking, open-mindedness, inquisitive thinking and reflective judgement; and access to real-world learning experiences with discipline-specific context (Blaschke and Hase 2015; Dwyer 2023; Facione 1990).

While developing critical thinking remains one of the major unsolved problems in pedagogy (Kuhn and Dean 2004; Larsson 2017), two principles are consistently represented in the literature. Firstly, critical thinking is a transformational process. This is facilitated by active, experiential learning and reflective thinking (Dewey 1910), creating opportunities for the learner to engage in self-directed, self-disciplined, self-monitored and self-corrective thinking (Paul and Elder 2006). Secondly, critical thinking can be developed independently or with others. Vygotsky (1978) identified that critical thinking is constructed through social interactions with ‘more capable peers’. A Zone of Proximal Development is created by collaboratively working on a task that the learner could not perform independently, over time shifting to performing the same task without assistance.

1.2 Theoretical framework

Social constructivism, situated learning, and heutagogy are three theoretical frameworks that have been identified to facilitate the development of critical thinking in HPE. Social constructivism highlights the interdependence of the social context and the learner’s knowledge of the problem to be solved (Thomas et al. 2014; Vygotsky 1978). The learner actively participates in meaningful, authentic experiences with others who may complement or challenge worldviews and contextual frameworks (Thomas et al. 2014). One approach to enable social constructivism in learning is ‘co-design’, which facilitates collaborative engagement, creativity and designing a meaningful solution to make sense of the problem (Treasure-Jones and Joynes 2018). Intentional co-design in HPE develops knowledge and skills including critical thinking, confidence, building health professional-client relationships and enjoyment in the learning activity (Abbonizio et al. 2024; O’Connor et al. 2021).

Situated learning complements social constructivism, advocating for continuous, active (legitimate) learning within a social process (peripheral participation) while becoming familiar with a way of doing and knowing in an authentic learning environment (Lave and Wenger 1991; Nicolini et al. 2016). This can be achieved through Communities of Practice, which represent the social learning spaces where people develop a communal repertoire of knowledge and practices (Lave and Wenger 1991; Wenger 1999, 2004), or through communities of inquiry in more formal educational settings.

Heutagogy has been referred to as both a form (Blaschke 2012) and study (Hase and Kenyon 2000) of self-determined learning. It extends from pedagogy and andragogy as learners progress in maturity and autonomy (Canning 2010). Firstly, a heutagogical approach embraces learner agency, or the ability of learners to choose their own pathway to learning, which may be non-linear and out of sync with that of the facilitator or curriculum (Blaschke 2021; Blaschke and Hase 2019; Hase 2009; Hase and Kenyon 2000). Secondly, heutagogy promotes self-reflection, where the learner reflects on the problem-solving process, actions and outcomes, and how these influence their beliefs and actions (Blaschke 2012). Thirdly, heutagogy considers the learner’s confidence in their competency (Hase and Kenyon 2000, 2007) and their ability to take action (Blaschke and Hase 2016; Cochrane et al. 2018).

1.3 Immersive extended reality and 360-degree virtual environments

Alongside theoretical frameworks that support the development of critical thinking, it is important to consider recent advances in technology-enhanced learning and the pedagogical implications of facilitating critical thinking. Immersive extended reality (XR), including virtual reality and virtual environments, is increasingly being used to facilitate skills including critical thinking (Jans et al. 2023). Affordances of mobile immersive extended reality (mXR) align with key concepts outlined in social constructivist, situated learning and heutagogical frameworks. These include improved accessibility (compared to tethered, high-fidelity methods), authentic learning, collaborative practice, confidence and self-efficacy in clinical skills, feedback on student performance, information literacy, self-reflection, motivation, engagement, repetitive practice for skill improvement, safe application of skills, and scalability (Stretton et al. 2024) (Fig. 1).

Fig. 1
figure 1

Mobile extended reality affordances and heutagogical principles. Adapted from Blaschke and Hase (2019) & Stretton et al. (2024).

mXR can be experienced in 360-degree virtual environments. Omnidirectional panoramic images or videos allow the learner to pan and tilt in an uninterrupted circle by shifting the position of a phone, a low-cost phone-enabled VR headset (e.g., Google© Cardboard or Merge) or a head-mounted display (e.g., Oculus© Rift or Apple© Vision Pro). Because the learner has the agency to look around and explore, 360-degree virtual environments are more immersive than traditional 2D media, though less immersive than a high-fidelity VR learning experience (Rupp et al. 2019). 360-degree virtual environments are an emerging tool in HPE as they enable learners to affordably conceptualise, produce and edit clinical environments that represent authentic clinical experiences (Baysan et al. 2023; Evens et al. 2023). 360-degree scenarios are typically between two and 15 min long (Baysan et al. 2023; Evens et al. 2023) and have beneficial affordances in learning performance, problem-solving, self-confidence, motivation, satisfaction, attention, situational awareness, and reflective and skills-based knowledge (Baysan et al. 2023; Blair et al. 2021; Snelson and Hsu 2020). Interaction with the virtual environment can be enhanced with the inclusion of hotspots, behaviour-triggered images, audio, or advancement between scenes based on actions or responses to questions (Evens et al. 2023; Snelson and Hsu 2020). Limitations, however, include viewers’ interactive movements being limited to the head and neck, an inability to view objects in 3D, difficulty incorporating multiple users in the same environment, and some reports of cybersickness related to low resolution and refresh rates (Baysan et al. 2023).

While it has been demonstrated that mXR can positively facilitate critical thinking in health profession education programmes, a recent systematic review highlighted that investigations were limited to five health disciplines, focused on emergency or critical response, and only a small number directly measured critical thinking (Stretton et al. 2024).

This paper reports upon the first iteration of a larger educational design research (EDR) project investigating how mobile immersive extended reality (mXR) facilitates critical thinking in health professional education (HPE).

2 Methods (design and construction phase)

2.1 Population

The source population included 123 second-year graduate entry Doctorate of Physiotherapy (DPT) students enrolled in a strength and conditioning subject in a large Australian metropolitan university. Human ethics was approved by the university ethics committee in August 2022 (Reference Number 2022-23676-30979-3).

2.2 Procedure

Educational design research (EDR) is an iterative approach that explores and analyses current literature, theoretical frameworks and stakeholder involvement to inform the design and construction of an intervention that is evaluated and reflected on for the maturation of subsequent iterations (McKenney and Reeves 2019) (Fig. 2). An initial literature review (Analysis Phase) (Stretton et al. 2024), and findings from focus groups (Exploration Phase) (Stretton and Cochrane 2023) informed the Design and Construction Phases along with co-designing the learning activity with the subject coordinators to align with the subject learning outcomes (February to August 2023). Subsequent correspondence and meetings informed the development of an overview recruitment trailer, the initial session with the participants, a video template, and evaluation session planning. Learning management system (Canvas) announcements directed potential student participants to the recruitment trailer, plain language statement, and consent form.

Fig. 2
figure 2

First iteration timeframe (design, construction, evaluation and reflection phases) in the educational design research (EDR)

In the Evaluation Phase, all students enrolled in the strength and conditioning subject attended an Initial Session that outlined instructions for the group learning task to be developed over six weeks (September to October 2023). In week one, groups of four to five students self-selected one of 25 target populations (case scenarios) and co-designed and recorded a two-sentence case scenario audio biography for a browser-based virtual environment (https://www.seekbeak.com). Students were provided with a templated PowerPoint video demonstrating the most appropriate exercise (Table 1), which the researcher uploaded to the virtual environment on behalf of the students. Students could include additional “hotspots” that provided further scenario-based information in the virtual environment as they (a) met the client in the reception, (b) determined the best exercise in the gym, and (c) highlighted key elements in a co-designed video (see Fig. 3). Aligned with the course learning outcomes, students were encouraged to include the aims and benefits of exercise, prerequisites, precautions and contraindications, exercise setup, trick movements and exercise principles (frequency, intensity, time, type, volume, and progression).

While content assistance was available to all students from the subject coordinators, and technical assistance from the primary researcher, only those who had accessed the plain language statement and signed the consent form were invited to participate in the research component of the Initial Session (an additional hour). This included the completion of a pre-evaluation survey and a pre-test critical thinking survey (Health Sciences Reasoning Test, HSRT-N).

In week six, a one-hour Evaluation Session was scheduled for all students to exchange target group scenarios and provide peer feedback. Students could review these in a browser on their laptop or using personal mobile phones with low-fidelity Merge headsets (https://mergeedu.com/headset). Research participants were then asked to complete a post-evaluation survey, the System Usability Scale (SUS), and a post-test HSRT-N (additional one hour).

Table 1 PowerPoint slide template for exercise
Fig. 3
figure 3

Example Seekbeak scenario, moving from (a) the reception area, where students are “introduced” to the client, to (b) the high-performance studio, where they can observe clinical clues and exercise(s), which (c) include key considerations and a video based on a provided template

2.3 Measures

2.3.1 Pre-evaluation survey

At the Initial Session (week one), participants were asked to complete an online form (Qualtrics©) with demographic data for baseline analysis including age, gender, ethnicity, and previous use of mobile phones, 360-degree virtual environments and augmented and virtual reality.

2.3.2 Health science reasoning test (HSRT-N)

Participants completed the Health Science Reasoning Test (HSRT-N) at both the Initial Session (week one) and the Evaluation Session (week six). The HSRT uses the same critical thinking subscales as the more generic California Critical Thinking Skills Test (CCTST) (Facione and Facione 2023) originally described in the Delphi study (Facione 1990).

The HSRT-N is a 33-item multiple-choice assessment that takes approximately 50 min to complete and is specifically designed to evaluate the critical thinking skills of health professional students and professionals (Huhn et al. 2011; Saghafi et al. 2024). The vignettes and associated questions are developed by Insight Assessment© with a health science context, but no prior knowledge is required to correctly answer the questions (Flowers et al. 2020). Participants were encouraged to complete the test on a laptop, though the scenario was formatted so it could alternatively be completed on a mobile phone.

The HSRT has reported internal consistency (Kuder-Richardson 20) ranging from 0.77 to 0.84 (Cazzell and Anderson 2016; Forneris 2015) with an overall internal consistency of 0.81 (Facione and Facione 2023). Construct validity has been confirmed in a physiotherapy population by Huhn et al. (2011), who were able to discriminate between experts and novices (p = .008).

An overall HSRT-N score, as well as scale scores in the areas of analysis, interpretation, inference, evaluation, explanation, induction, deduction, and numeracy are provided on completion (Huhn et al. 2011). The overall and sub-scale scores are rated out of 100 and categorised as either not manifested (50–62), weak (62–71), moderate (72–80), strong (81–88) or superior (89–100) (Facione and Facione 2023). Currently, there is no published data on what constitutes an important change score on the HSRT-N (Huhn et al. 2013).
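As an illustrative aside (not part of the study’s methods), the banding above can be expressed as a small Python helper. Note that the published bands overlap at 62; this sketch arbitrarily assigns a score of 62 to the “weak” band.

```python
def hsrt_category(score: float) -> str:
    """Map an HSRT-N overall or sub-scale score (rated out of 100) to the
    descriptive bands reported by Facione and Facione (2023).
    The published bands overlap at 62; 62 is treated here as 'weak'."""
    if score >= 89:
        return "superior"
    if score >= 81:
        return "strong"
    if score >= 72:
        return "moderate"
    if score >= 62:
        return "weak"
    return "not manifested"
```

Under this banding, the cohort means reported below (83.48 pre-test, 87.00 post-test) both fall in the “strong” band.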

2.3.3 Post-evaluation survey

At the Evaluation Session (week six), all students (including those not in the study) were asked to provide peer feedback on each other’s target population virtual environment. Research participants completed a post-evaluation Survey, the System Usability Scale (SUS) and a post-test HSRT-N. The online post-evaluation survey (Qualtrics©) focused on the use of the headset, co-design of the virtual environment, and overall experience of the learning experience. Twelve questions were answered on a five-point Likert scale.

2.3.4 System Usability Scale (SUS)

The System Usability Scale (SUS) is a 10-item questionnaire rated on a five-point Likert scale, requiring very little time to administer (Brooke 1986). The SUS has been used to measure the usability of commercial products, clinical interventions and, more recently, HPE (Escalada-Hernandez et al. 2024; Yoo et al. 2024). The SUS has reported reliability and validity, including concurrent validity (Bangor et al. 2008). While the Technology Acceptance Model (TAM) has also been utilised to guide perceived usability (Yoo et al. 2024; Zlamal et al. 2022), the SUS was selected as the project investigates variables beyond just usability (i.e. effectiveness, efficiency and satisfaction), and could be used with a small sample size (Cheah et al. 2023; Lewis and Sauro 2009).

2.3.5 Peer feedback

Feedback has a positive impact on student learning and achievement, especially cognitive and motor skills compared to motivational and behavioural outcomes (Hattie and Timperley 2007; Wisniewski et al. 2020). All students were asked to provide peer feedback on the virtual environments of other groups. This was facilitated by an online survey (Qualtrics) with questions answered on a five-point Likert scale. Questions related to usability, engagement, and critical thinking, as well as identification of “best element” and element “requiring improvement”.

2.4 Data analysis

The HSRT-N overall score and sub-scores were collated for each participant and compiled as means and standard deviations. The HSRT-N overall percentile was also compared with an aggregate sample of the ‘HSRT-N Graduate Physical Therapy’ comparison group from Insight Assessment©. For example, if a test taker had a 60th percentile score, roughly 59 people out of 100 in the comparison group would score lower than this test taker and 40 people out of 100 would score higher (Facione and Facione 2023).

Paired and independent samples t-test analyses were used to evaluate the HSRT-N mean test scores between pre-test and post-test overall and sub-scores. An alpha level of 0.05 was used for all statistical tests. The effect size of change scores in the HSRT-N was calculated for all pairwise comparisons of change scores using Cohen’s d = (mean 1 − mean 2)/SDpooled. Effect sizes were defined as small (0.00–0.29), moderate (0.30–0.79), and large (> 0.8) (Cohen 1988). All data were analysed using IBM SPSS Statistics for Mac© version 29.0.1.0 (IBM Corp, Armonk, NY) and Microsoft Excel© software. All other quantitative data were collated as mean values and standard deviations.
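For readers without SPSS, the paired-samples t statistic and Cohen’s d with pooled standard deviation, as defined above, can be sketched in a few lines of Python. The data here are hypothetical, for illustration only, not the study’s dataset.

```python
from statistics import mean, stdev

def paired_t(pre: list[float], post: list[float]) -> float:
    """t statistic for a paired-samples t-test (df = n - 1)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / n ** 0.5)

def cohens_d(pre: list[float], post: list[float]) -> float:
    """Cohen's d = (mean2 - mean1) / SD_pooled, for equal group sizes."""
    sd_pooled = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(post) - mean(pre)) / sd_pooled

# Hypothetical pre/post scores for four participants (illustration only)
pre = [80.0, 85.0, 90.0, 75.0]
post = [85.0, 88.0, 92.0, 80.0]
print(round(paired_t(pre, post), 2), round(cohens_d(pre, post), 2))
```

The t statistic would then be compared against the t distribution with n − 1 degrees of freedom at the 0.05 alpha level.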

The odd-numbered questions on the SUS are scored more favourably if rated “strongly agree”, and the even-numbered questions more favourably if rated “strongly disagree”. Raw SUS scores can be converted into percentile ranks with indicative grades and levels of acceptability (Sauro 2018). The average SUS score (at the 50th percentile) is 68, with scores over 68 considered above average and anything under 68 below average (Sauro 2018).
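The standard SUS scoring rule described above amounts to the following Python sketch, assuming responses are supplied in questionnaire order on the 1 (strongly disagree) to 5 (strongly agree) scale:

```python
def sus_score(responses: list[int]) -> float:
    """Convert ten SUS Likert responses (1-5) to a 0-100 SUS score.
    Odd-numbered items contribute (response - 1); even-numbered items
    contribute (5 - response); the sum is multiplied by 2.5."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = sum(r - 1 if i % 2 == 0 else 5 - r
                for i, r in enumerate(responses))
    return total * 2.5
```

For example, a uniformly neutral response (all 3s) yields the midpoint score of 50, below the reported average of 68.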

Post-evaluation qualitative data focusing on the use of headsets, co-design and the primary researcher’s key reflections were exported to Excel©. The dataset was read systematically from start to end with semantic (explicitly expressed) and latent (implicit or conceptual) code labels conceptualised and assigned before refining, defining and thematic mapping (Braun and Clarke 2022).

3 Results

3.1 Evaluation of population

From the source population, 74 students completed a consent form, with 46 completing the pre-evaluation survey (37% of the source population) and 25 completing the post-evaluation surveys (20% of the source population; 45% attrition). Participant ages ranged between 22 and 26 years (mean 25, SD 1.24), and most participants identified as female (n = 18; 72%), with six (24%) identifying as male and one (4%) preferring not to answer. The majority of participants were Australian (n = 16; 64%), with the remainder identifying as Asian (n = 9; 36%).

In the pre-evaluation survey, the majority of participants indicated that they used their mobile phone for educational purposes daily (n = 17; 68%) or two-to-three times a week (n = 4; 16%). The majority of participants had rarely used 360-degree virtual environments (n = 14; 56%) or never (n = 4; 16%). Similarly, most participants had rarely used virtual or augmented reality (n = 16; 64%) or never (n = 8; 32%) before the introduction of this learning task.

3.2 Critical thinking (HSRT-N)

Analysis of the HSRT-N test results suggests that the co-design of clinical virtual environments had a positive effect on critical thinking scores, with a mean pre-test score of 83.48 (SD = 6.42) compared with a mean post-test score of 87.00 (SD = 4.79).

Comparison of HSRT-N total pre-test and post-test mean sub-scores showed no statistically significant change in the sub-scores for evaluation, explanation and induction. However, there was a statistically significant increase in the mean sub-score for analysis, interpretation, inference, deduction and numeracy (p <.05) (Table 2).

Table 2 Health science reasoning test: mean, significance and effect size

Even at the pre-test level (55th percentile), analysis of mean HSRT-N scores showed that most test takers scored well against their HSRT-N Graduate Physical Therapy comparison group. The mean HSRT-N improved over the six weeks to a post-test level at the 75th percentile. This would equate to roughly 74 Graduate Physical Therapy students out of 100 scoring lower than the cohort of participants in this study (Facione and Facione 2023).

3.3 System Usability Scale (SUS)

The overall SUS mean was 65.20, indicating that usability was close to the average (N = 25, M = 65.20, SD = 15.59) (Table 3). Of the items in the SUS where a higher mean indicated more positive responses (i.e., the odd-numbered questions), participants indicated that they quickly learnt how to use the browser-based virtual platform (https://www.seekbeak.com) (n = 25, M = 3.92, SD = 1.00), and found it easy to use (n = 25, M = 3.80, SD = 0.87). They were confident in the use of the virtual environment (n = 25, M = 3.64, SD = 0.76), found the items well integrated (n = 25, M = 3.52, SD = 0.82), and would like to use it more frequently (n = 25, M = 3.48, SD = 0.87).

Of the items in the SUS where a lower mean indicated more positive responses (i.e., the even-numbered questions), participants indicated that: there was a lot to learn to use the virtual platform (n = 25, M = 2.04, SD = 0.89), they would need support to use it (n = 25, M = 2.28, SD = 1.10), there were some inconsistencies in the system (n = 25, M = 2.52, SD = 0.77), it was unnecessarily complex (n = 25, M = 2.68, SD = 1.11), and it was cumbersome (n = 25, M = 2.76, SD = 1.13).

Table 3 Results of post-evaluation system usability scale (SUS)

3.4 Use of headsets, co-design and overall experience

Participants were asked questions related to the use of the Merge headset in the post-evaluation survey. Participants agreed that the use of the headsets added to the learning experience (n = 12, M = 4.17, SD = 0.83) and aided critical thinking (n = 11, M = 4.00, SD = 0.63). However, the headset provoked cybersickness in some participants (n = 11, M = 3.36, SD = 1.36).

Participants indicated that co-designing the virtual environment with their peers did not have an impact on their development of critical thinking (n = 25, M = 3.12, SD = 1.05), though trended towards a positive impact on their reasoning (n = 25, M = 3.52, SD = 0.92) and knowledge (n = 25, M = 3.76, SD = 1.01).

Participants agreed that they had a sufficient overview of the purpose of the learning experience (n = 25, M = 3.92, SD = 0.91), equipment and resources (n = 25, M = 4.28, SD = 0.84), support (n = 25, M = 4.36, SD = 0.64), and feedback (n = 25, M = 4.00, SD = 0.82), and that the task trended towards being a better learning experience than conventional approaches (n = 25, M = 3.56, SD = 0.77). Participants also indicated they would recommend the use of mXR for future learning (n = 25, M = 3.96, SD = 0.89) (Table 4).

Table 4 Post-evaluation of use of headsets, co-design, and overall experience

Participants reflected on their own learning experiences when using the mXR. The most pleasing elements of the learning experience were being able to navigate new spaces virtually, having creative licence in the development of their scenario, the use of mixed reality, and co-designing with peers. The most challenging elements included clarity on how to upload the objects, time management, navigating the environment, the use of the headsets (e.g., large phones not able to fit into the headset), cybersickness, and a lack of clarity of the purpose of the learning task (Table 5).

Table 5 Pleasing and challenging elements of learning experience

3.5 Evaluation of peer feedback

During the post-evaluation session, students were asked to share the virtual environments with other groups and provide feedback on the presentation of the target population using an online form. Participants either “Strongly agreed” or “Agreed” that their peer virtual environments were easy to navigate, made good use of the virtual environment, had a high level of engagement, and provided more authentic learning compared to other modes of learning experiences. The challenge to critical thinking was rated slightly less than the other items, though was positively trending (Table 6).

Table 6 Peer feedback on 360 virtual environments

Participants were asked to identify the best element and an element for improvement. Students were mostly complimentary of their peers’ videos, especially when they included potential trick movements when completing exercises, additional tips, FITT VP, and the benefits of exercise. Students were also positive when audio accompanied the biography or exercise instructions in the video.

Areas for improvement included audio for engagement, less text in the video slides, more elements to challenge critical thinking, progressions, outcome measures, and editing considerations (e.g., including frontal and sagittal videos or location of objects in the virtual environment).

4 Discussion

4.1 Critical thinking (HSRT-N)

Developing critical thinking attributes in health professional students is essential to enhance their ability to analyse a scenario to inform effective clinical practice (Carbogim et al. 2018; Chan 2013). This study has demonstrated that the co-design of virtual environments facilitates the development of critical thinking in physiotherapy HPE, specifically improving the ability to analyse, interpret, and make inferences and deductions in unfamiliar environments.

However, there was no statistically significant improvement in evaluation, explanation, or induction according to the Health Science Reasoning Test (HSRT-N). The reduced ability to assess the credibility of claims or strength of arguments (Evaluation) may be impacted by the current climate of misinformation, generative artificial intelligence, and deepfakes. Participants in this study ranged between 22 and 26 years of age, placing them within Generation Z, which comprises 18% of Australians (Australian Bureau of Statistics 2022) and represents the majority of students in higher education today (Basinger et al. 2021). While Gen Z feel more confident in identifying false or misleading information than other generations (Poynter Institute for Media Studies 2022), they frequently base conclusions on surface-level features of [mis]information and are not adequately taught how to judge credibility (Breakstone et al. 2021).

Critical thinking itself is not inherently dangerous, as it positively enables a health professional’s problem-solving, decision-making, creativity, communication and self-reflection. However, potential negative impacts need to be considered, such as epistemological engagement, intuitive judgement, and emotional and biased thinking (Dwyer 2023), i.e. susceptibility to misinformation. This has heightened alongside increasing political, social and health-related concerns, and may present to the current Gen Z cohort as unchecked facts in social media and generative artificial intelligence.

Health professional students should be encouraged to develop their critical thinking by actively engaging in forming well-constructed questions, maturing truth-seeking strategies while formulating comprehensive analyses of information, and evaluating findings and outcomes as they mature prior knowledge.

It was anticipated that participants would have developed reasoning skills in their entry bachelor programmes before commencing the Doctor of Physiotherapy. However, graduate entry students, despite starting as experienced learners, face the challenge of applying critical thinking within a shortened degree timeframe (Macdiarmid et al. 2024). This may be further compounded by the need to disassociate from didactic foundational learning experiences in their undergraduate programme and be open to alternative methods of teaching to elevate critical thinking as a health professional. The development of critical thinking is progressive over the course of the degree, even for graduate entry programmes (Furze et al. 2015). Graduate entry students’ enjoyment of active learning and alternative approaches to learning can be utilised to enhance their development and learning experiences (Berg et al. 2021). Future iterations may integrate engagement with nuanced statements that are neither entirely true nor false to better prepare students for complex real-world HPE experiences (Schvaneveldt et al. 2022).

This study did not show a statistically significant difference in Explanation, the ability to provide evidence, reasoning, assumptions, or rationale for judgements (Facione and Facione 2023). Future iterations could integrate the practical structure of Toulmin’s Argument Model and evidence-based reasoning to scaffold well-supported explanations (Ju and Choi 2017); Socratic questioning built into the template or a reflective logbook could further facilitate critical thinking and explanation skills (Hu 2023).

Both deductive and inductive reasoning are important to developing critical thinking in HPE (Karlsen et al. 2021). The improvement in deductive reasoning in this study may reflect the participants’ familiarisation with a learning process that is explained through the demonstration of its application to clinical situations (Lin et al. 2023). However, improvement in Inductive Reasoning was not observed. Inductive reasoning draws on prior experience, knowledge and empirical observations to analyse patterns, and then draws conclusion(s) to make a reasoned judgement of what may happen in an unfamiliar situation (Facione and Facione 2023; Lin et al. 2023). Although participants may have developed an ability to form conclusions from observations in previous non-health professional degrees, the transference of this skill to forming reasoned judgements from specific health observations may be limited in this second-year DPT cohort. Future iterations may integrate deductive reasoning further by providing structured logical problems where students apply general rules or theoretical principles to specific real-world scenarios to reach conclusions.

Participants received the HSRT-N results immediately from the online portal (https://insightassessment.com), appealing to Gen Z participants who expect instant feedback and access to the content (Abril 2024). Feedback should be personalised, explicit, and explain why and how the development of the virtual environment could be carried out differently, rather than focus on what went wrong (Abril 2024; Basinger et al. 2021; Cragun et al. 2024). While this first iteration included an option for participants to contact the researcher as needed, future iterations could benefit from scheduled “check-ins” that would focus on expectations for the week, while also providing an opportunity to ask questions (Abril 2024).

Beyond the HSRT-N, participants indicated that co-designing the virtual environment did not develop their ‘critical thinking’; however, it did have some impact on ‘clinical reasoning’. Neither term was defined in the participant information, though this cohort may be more familiar with the latter (clinical reasoning) and may see its impact on knowledge and application more directly when responding to the survey.

4.2 Co-design

Participants indicated that co-design did not have an impact on the learning experience or development of critical thinking, though this could be the result of prior involvement in co-design during their bachelor’s degrees. By starting with the co-design of the scenarios, learners begin with curiosity as they discern the knowledge required and reflect on the learning process and its application to [clinical] practice (Blaschke 2012; Blaschke and Hase 2016; Canning and Callan 2010; Hase 2009; Hase and Kenyon 2007). Co-design could be enhanced in future iterations with self-selected group tasks that stimulate both heutagogical and social constructivist practice with “more knowledgeable others” (Oliver 2000; Thomas et al. 2014).

Mapping “clinical clues” with decision points similar to a “choose-your-own-adventure”(CYOA) approach would enhance learner agency (heutagogy) while co-designing the virtual environment. While using CYOA to facilitate critical thinking is yet to be fully explored in HPE, preliminary studies indicate improvement in engagement and satisfaction in learning, confidence, and developing clinical decision-making in preparation for unexpected situations (Jogerst et al. 2022; Litten and Stewart 2023; Thomas et al. 2022). Consequences of the CYOA pathway choice would then be presented, either providing positive feedback on the correct option, or reflective questions to assist learning before returning to re-evaluate the virtual scene, developing a learner’s capability and ability to act on the information presented (Blaschke and Hase 2016; Cochrane et al. 2018).

Future iterations could also promote self-reflection (heutagogy) by incorporating a review of both (a) the problem and its resulting actions and outcomes, and (b) how the problem-solving process influences learners’ own beliefs and actions, supported by a reflective journal with prompting questions based on the expectations for the week.

4.3 System Usability Scale (SUS) and post-evaluation survey

The combined results of the System Usability Scale (SUS) and the post-evaluation survey produced some conflicting findings. This may be a result of the design of the SUS (alternating positive and negative statements) and its focus on the use of the virtual headset, which was novel for most (96%) participants. However, the mixed responses to usability are comparable to the findings of Saab et al. (2023), who reported that while the use of virtual reality clinical scenarios promoted clinical decision-making and critical thinking, mXR was initially confusing until participants worked out how to use it. The inclusion of a brief tutorial video and bullet-point instructions may positively impact the usability of the virtual headset.

4.4 Limitations

While the study advocates for mXR to facilitate critical thinking in HPE, some limitations exist. Firstly, the number of participants (n = 25) represents only 20% of the source population in the Doctor of Physiotherapy programme, thereby limiting generalisability. Although it was anticipated that the population would not be representative of students new to tertiary study, the age range was narrow (22–26 years) for a post-graduate degree; the findings could, however, be generalisable to both undergraduate and post-graduate student cohorts. As this is the first iteration of Educational Design Research, realised design principles cannot be reported at this point of the project, though some suggestions for future iterations have been presented above. In addition, next steps may integrate design principles into other health education programmes and variations of mXR. This iteration was limited to six weeks in a semester; future development could consider the value of consolidating or expanding this timeframe for engagement, critical thinking skills and dispositional development. mXR-facilitated critical thinking may be intentionally scaffolded across the health programme to supplement student clinical experiences (i.e. orientation, traumatic and/or complex scenarios), the value of which accrediting professional bodies could consider toward competency and registration requirements (i.e. clinical hours).

5 Conclusion

This paper presents findings from the first iteration of a larger educational design research project. The study demonstrated that critical thinking improved when a heutagogical, social constructivist approach was used to co-design a virtual environment for health professional education. Some elements of critical thinking may be influenced by the inherent perceptions of a Generation Z cohort and by prior exposure to the development of these elements in previous degrees. The usability and learning experience of immersive mobile extended reality for health professional education are encouraging, and suggestions for future iterations have been presented.