1 Introduction

Suicide now ranks as the second leading cause of death among adolescents [1]. Though non-suicidal self-injury (NSSI; deliberate destruction of body tissue without suicidal intent and not socially sanctioned) remains distinct from a suicide attempt [2, 3], over time, NSSI increases risk of suicidal behavior [4, 5]. The National Institute of Mental Health (NIMH) recognizes that enhancing clinicians’ ability to identify persons considering suicide could help prevent more suicides [6]. A high priority area for mental health professionals involves reducing the burden and mortality associated with suicidality through research on early detection, assessment, and interventions [7]. Identifying youth who engage in NSSI and adequately assessing and managing self-injury represents a strategy to prevent suicidal behavior among adolescents. Non-mental health care providers, including primary care providers (PCPs), are positioned to lead such important public health interventions to prevent youth suicide [8]. However, research has shown that only about 1 in 4 PCPs routinely inquires about and addresses NSSI among youth [9]. Further, relative to other areas of mental health care, pediatric PCPs feel least prepared to address and want more training on NSSI [9].

To address gaps in training, this project develops simulated training for non-mental health clinicians and validates its efficacy in a realistic virtual environment enhanced with tools for reflective learning. Ultimately, the goal is to help reduce NSSI and suicidal behavior among adolescents. We propose that the training intervention will improve clinicians’ skills in identifying youth who engage in NSSI and increase the frequency and thoroughness with which they assess NSSI and suicide risk. Upon completion of the prototype simulated environment, a large-scale controlled clinical trial will be designed to validate the effectiveness of the training intervention in reducing NSSI and suicidality.

2 Study Design

Residents within two pediatric residency programs in Florida will complete the didactic training and simulation practice. Residents in one pediatric residency program in Texas will participate in the didactic training only. We will collect pre-training and 1-month post-training assessments from all residents regarding their knowledge, attitudes, skills, behavioral control and intentions, and behavior changes. In addition, a subset of residents will complete 3- and 6-month follow-up assessments to determine preliminary longitudinal effects of the training. Data collection using additional methods that supplement provider self-reports will occur in Florida. We hypothesize that training in the proposed virtual learning environment (VLE) will result, on average, in improved attitudes and increased knowledge and behavioral control and intentions, leading to positive behavior changes among clinicians and, ultimately, decreased incidence of NSSI and suicidal behavior among youth. Analyses will be based on before-after comparisons via paired t-tests for demonstrating benefits of the training, as well as generalized multiple regression with random and fixed effects.
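As an illustration of the before-after comparison, the following is a minimal sketch of the paired t statistic computed on pre- and post-training scores. The scores are invented placeholders rather than study data, and the function name `paired_t_statistic` is hypothetical; the regression models with random and fixed effects are not shown.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for paired pre/post measurements."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if len(pre) < 2:
        raise ValueError("need at least two pairs")
    # Work on within-person differences so each resident is their own control.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical knowledge scores for five residents (illustrative only).
pre_scores = [10, 12, 9, 11, 13]
post_scores = [12, 15, 10, 14, 15]

t_stat, dof = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A p-value would then be read from the t distribution with `dof` degrees of freedom, for example with a statistics package.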

2.1 Suicidality and NSSI Among Adolescents

Suicide rates for adolescent males and females in the U.S. have steadily increased since 1999 [1]. In 2014, 5504 adolescents aged 10 to 24 died by suicide, which represented more deaths than the next eight leading causes of death combined [1]. The rates of suicidal behavior among youth heighten the public health significance of the problem. Suicidal ideation and suicide attempts increase precipitously during adolescence, peaking during mid-adolescence (15 to 18 years) [10, 11]. For every death by suicide, 100 to 200 adolescents attempted to take their own lives, compared to a rate of 4 attempts to 1 death among the elderly population [12]. In 2015, almost 18% of high school students seriously considered attempting suicide, and 9% actually attempted suicide during the previous 12 months [13].

NSSI and suicide attempts differ, yet sometimes co-occur [3]. Repetitive NSSI increases risk of suicide by reducing fears of pain and injury over time, removing a barrier to completing suicide [3, 4, 14]. Prior research involving team members found that NSSI represented the most important factor to distinguish youth from the general community who attempted suicide from those who only considered suicide [5]. Approximately 18% of adolescents have engaged in NSSI [15], with an age of onset usually between 13 and 15 [16]. Adolescents most often engage in NSSI to regulate overwhelming, negative emotions such as anger, anxiety, or frustration [17,18,19]. NSSI reduces negative emotions, making this coping behavior highly reinforcing [20]. Many adolescents who self-injure report weekly episodes of behaviors such as cutting, burning, scraping, or erasing the skin [21]. Factors distinguishing NSSI from a suicide attempt include intent or purpose, frequency, interpersonal and intrapersonal consequences, severity or lethality of methods used, number of methods used, and cognitive state during self-harm [3]. Still, some adolescents engage in NSSI as a strategy to avoid or suppress suicidal thoughts [22]. Thus, we hypothesize that addressing NSSI and suicidal ideation will represent highly effective strategies for preventing youth suicide.

2.2 Clinician Education, Perceptions, and Behavior

Despite increased awareness of suicide and NSSI as major public health problems, many healthcare professionals who have frequent contact with high-risk patients lack adequate training in specialized assessment techniques and treatment approaches [23]. The Joint Commission stated: “Clinicians in emergency, primary, and behavioral health care settings particularly have a crucial role in detecting suicide ideation and assuring appropriate evaluation” [24 (p. 1)]. However, clinicians’ knowledge, comfort, and skills determine their ability to provide appropriate care to distressed adolescents [25]. The Commission and U.S. government’s National Strategy for Suicide Prevention recommend educating all staff in patient care settings to identify and respond to patients with suicidal ideation toward a goal of “zero suicides” [24, 26]. Researchers, including members of this team, also have highlighted the need for clinician training on NSSI [9, 27, 28].

As discussed in a review article [23], healthcare providers with pediatric patient populations encounter distressed and suicidal adolescents, and thus can play a major role in NSSI and suicide prevention. Research shows that 20% to 41% of adolescents who present to PCPs have high levels of emotional distress and/or suicidal ideation, yet PCPs identify fewer than half (24%–45%) of these young people [29,30,31]. Given the prevalence of NSSI, non-mental health care providers will likely confront self-injurious behavior among adolescents [9]. Therefore, they would benefit from enhanced skills related to engaging youth in conversations about NSSI; assessing the history, context, and functions of the behavior; and referring patients to mental health specialists [9, 27, 28]. Research suggests non-mental health clinicians are often unprepared to address NSSI, and thus need additional training [32,33,34]. Research also suggests medical staff may feel more frustrated, burdened, impatient, and unsympathetic toward patients who engage in NSSI, compared to other patients [35]. Applying Weiner’s Attributional Model of Helping Behavior [36], members of the research team found that clinicians attributed more control, stability, and internal locus to a self-injuring patient, resulting in less willingness to help [35]. Unhelpful attitudes of healthcare providers toward individuals who self-injure remain common [37,38,39]. For example, Friedman et al. [40] found that 77% of emergency clinicians felt NSSI was about seeking attention. These researchers also found that just 13% of clinicians strongly agreed healthcare providers should consider people who engage in NSSI at risk of suicide, and 92% thought training in managing NSSI was important [40]. Further, these attitudes likely contribute to adolescents’ fears of being labeled as an “attention seeker,” “stupid,” or “crazy” and their reluctance to seek help [41]. Negative attitudes held by clinicians, and a lack of confidence and competence to address self-injury, may compromise a therapeutic relationship with an adolescent and increase risk of suicide [42]. Receipt of care among adolescents who self-injure remains low [41, 43,44,45]. However, training clinicians to screen for and assess NSSI should facilitate increased NSSI self-disclosure and receipt of needed care [9].

3 Virtual Learning Environments and NSSI

The virtual environment research used for this study augments existing didactic training that is already implemented in programs of study. Residents will apply the knowledge they acquired during the didactic session and practice new skills (e.g., identifying and assessing NSSI) by engaging with an adolescent patient avatar who self-injures within the NSSI_VLE (Virtual Learning Environment) simulation. At the end of each practice session, residents will receive feedback on their interactions. Some feedback will be automated, e.g., the time the resident talked versus listened and the time they displayed negative nonverbal interactions; some will be human-based annotations, e.g., whether the dialogue was empathetic or shut down communication. This feedback, built into the VLE, can be used for either self- or guided-reflection with the goal of improved interactions when working with patients [46]. Each resident will have 10 to 15 minutes to interact with the patient avatar, followed by a 5- to 10-minute feedback session. The simulation is easily portable: the participant can use a PC, Linux, or Mac laptop computer, and the subject-matter expert who observes and tags events and the human-in-the-loop who guides the avatar’s behaviors can be anywhere in the world. Moreover, the experience is intense and contextually correct in terms of avatar appearance and behaviors. Thus, the NSSI_VLE has the potential for greater dissemination and impact on clinical practices than does the use of real standardized patients and the employment of programmed, rather than human-guided, behaviors in computer simulations.
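The automated talk-versus-listen feedback can be pictured with a short sketch. It assumes the session audio has already been split into speaker-labeled segments of the form (speaker, start, end) in seconds; that input format, the speaker labels, and the function names are hypothetical and are not taken from the NSSI_VLE software.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum speaking time in seconds per speaker from (speaker, start, end) tuples."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(speaker, start, end)}")
        totals[speaker] += end - start
    return dict(totals)


def talk_share(totals, speaker):
    """Fraction of all speaking time attributed to one speaker (0.0 if nobody spoke)."""
    total = sum(totals.values())
    return totals.get(speaker, 0.0) / total if total else 0.0


# Hypothetical segments from a short exchange (illustrative only).
segments = [
    ("resident", 0.0, 30.0),
    ("avatar", 30.0, 50.0),
    ("resident", 50.0, 60.0),
    ("avatar", 60.0, 100.0),
]

totals = talk_time_by_speaker(segments)
resident_share = talk_share(totals, "resident")
print(f"Resident talked {resident_share:.0%} of the time")
```

A share reported this way gives the resident a single number to reflect on, without any judgment about which ratio is appropriate for a given conversation.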

The underlying technology and paradigm for the NSSI_VLE were designed and developed over the last decade by an interdisciplinary team of faculty members, research staff, and students from computer science, education and modeling & simulation [47]. The system can create and deliver an interactive virtual reality-based learning experience – all systems built in this environment run with and without a head-mounted display (HMD). The initial and dominant use of the system we will adapt involved preparing pre-service teachers and honing the skills of in-service teachers. At present, the system is deployed at universities and school districts across the United States and internationally. In this context, the teacher enters a room that looks like a classroom, including props, whiteboards, and students [48]. However, students are avatars in a virtual classroom typically projected on a TV monitor or laptop screen (other options include full surround displays and low-cost HMDs such as the Vive). As evidenced in prior studies, delivering experiences on a laptop places no constraints on a user’s sense of immersion, nor on the learning that occurs. The students represent a range of personalities, from passive to aggressive and dependent to independent. Avatar students are “puppeteered” by a single trained human operator, which makes the experience realistic, as the operator can adapt to specific actions of the teacher [49].

The efficacy and effectiveness of the underlying system have been demonstrated in a series of studies funded by the Bill & Melinda Gates Foundation from 2012 to 2016. Studies show that rehearsing skills in the simulated classroom can change targeted behaviors, that these learned behaviors transfer to the real world, and that, in the case of teaching, behavior changes have positive effects on student success [50, 51]. The broader impacts of the paradigm have been demonstrated in law enforcement preparation (interviewing and de-escalation skills), protective strategies for students (resisting peer pressure and providing support for others to do the same) [52], social skills for children with autism, peer tutoring, and job interviewing (as interviewer and interviewee).

3.1 The VLE Paradigm

In the existing paradigm, there are three kinds of users: interactors, participants, and observers. A scenario is chosen by a participant or their coach and then initiated by a primary interactor, whose role is to provide genuine human interaction that is mediated by virtual characters. While the software supports multiple interactors (one primary and the others helpers), we rarely use more than a single interactor. That interactor, trained in improvisation, controls one or more avatars (typically between one and six), using gestures to initiate behaviors and their own voice, morphed to match that of the currently inhabited avatar, to enable genuine human-to-human conversations. The second type of user is a participant, who is the learner interacting with the avatar(s). As with interactors, there can be multiple participants, but in the vast majority of cases we have just one or two. The third class of user is an observer, who can be passive or active. An active observer is a subject-matter expert (SME) who annotates events that are captured in a video that shows the avatars and the participants. Those annotations, also called tags, can be simple built-in phrases such as “Elicits a Personal Response” or “Does Not Appear to Display Empathy,” each of which is associated with the beginning of a sequence of relevant frames in the video. Such annotations can be made on-line (as the events occur), off-line (as part of after-action review), or as a combination in which on-line annotations are altered, expanded with comments, or removed upon further examination. All such annotations should ideally be based on objective criteria, but they have the potential for bias arising from the SME’s subjective opinions. Other annotations can be objective and involve no human input. In previous teacher training applications, these have included teacher versus student talk time and the percentage of teacher-student interaction time spent with each individual student. All such annotations should be made for the purpose of learning, either through self- or guided-reflection.
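The tagging workflow can be sketched as a small data structure. This is a hedged illustration only: the `Tag` fields, the helper names, the frame numbers, and the review comment are assumptions made for the example, while the two tag labels are the built-in phrases quoted in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tag:
    label: str          # built-in phrase chosen by the SME
    start_frame: int    # first video frame of the tagged sequence
    made_online: bool   # True if tagged while the session was running
    comment: str = ""   # optional note added during after-action review


def add_comment(tags, index, comment):
    """Return a new list in which one tag carries a review comment."""
    updated = list(tags)
    updated[index] = replace(tags[index], comment=comment)
    return updated


def remove_tag(tags, index):
    """Return a new list without the tag at the given position."""
    return [tag for i, tag in enumerate(tags) if i != index]


def tags_in_review_order(tags):
    """Order tags by where they start in the video, for playback during reflection."""
    return sorted(tags, key=lambda tag: tag.start_frame)


# Two on-line tags, then an off-line revision (all values are illustrative).
session_tags = [
    Tag("Does Not Appear to Display Empathy", start_frame=900, made_online=True),
    Tag("Elicits a Personal Response", start_frame=450, made_online=True),
]
session_tags = add_comment(session_tags, 0, "Resident interrupted the patient.")
review = tags_in_review_order(session_tags)
for tag in review:
    print(tag.start_frame, tag.label, tag.comment)
```

Keeping tags immutable and returning new lists is one way to preserve the original on-line annotations alongside their revised versions, should both be wanted for review.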

3.2 Pilot NSSI Simulation Application

In preparation for this development effort, we have worked with clinical psychologist Dr. Nicholas Westers, from Children’s Medical Center Dallas, who has pioneered the development of, and begun to evaluate, a didactic training program on adolescent NSSI for pediatric residents. Our current effort is a small, internally funded study that tests the NSSI_VLE concept, using two patient avatars to enhance clinicians’ capacity and willingness to engage in helpful conversations with youth about NSSI and to assess suicide risk. Results of this study are preliminary but promising: participants report that they feel better prepared and expect to be more comfortable dealing with NSSI than they were prior to the training. Figures 1 and 2 show the two scenes that are being used in this pilot study. In keeping with our paradigm, we develop detailed profiles of these subjects, including their family situations and the stressors that may make them prone to NSSI behaviors. The following are brief summaries of those profiles.

Fig. 1. NSSI Interview with Kasi

Fig. 2. NSSI Interview with Alex

Figure 1 shows Kasi, a 17-year-old female in the 12th grade at a private school. She typically earns A’s and B’s, but has been struggling this year academically, earning B’s and C’s and failing one or two classes. She lives with her biological parents and four other siblings (she is the oldest). She has used alcohol and marijuana in the past, having gotten drunk on a few occasions, often in the context of her NSSI. She is sexually active with her boyfriend and feels guilty about this. Her Patient Health Questionnaire (PHQ-9) score is a 22 (i.e., severe depression) and she also experiences significant levels of anxiety. Although she has thought about suicide, including sometimes when she self-injures, she has never acted on these thoughts (i.e., she has never attempted suicide). She first started to self-injure because some of her friends disclosed their own NSSI as a coping strategy, so she tried it for herself and realized it worked well to cope with her emotions or punish herself.

Figure 2 shows Alex, a 14-year-old male in the 8th grade at a public school. He typically earns mostly B’s with occasional A’s. He lives with both biological parents and his 8-year-old sister. He has denied ever having used any alcohol or drugs, has never had sex, and does not exhibit symptoms of an eating disorder. His PHQ-9 score is an 8 (i.e., mild depression). His girlfriend since 7th grade broke up with him five months ago.
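The severity labels attached to the two PHQ-9 scores follow the commonly published PHQ-9 cutpoints. The sketch below encodes those bands as an assumption; they are not restated in this paper beyond the two labels above, and the function name is hypothetical.

```python
def phq9_severity(score):
    """Map a PHQ-9 total (0 to 27) to a severity label using the commonly cited bands."""
    if not isinstance(score, int) or not 0 <= score <= 27:
        raise ValueError("PHQ-9 totals are whole numbers from 0 to 27")
    if score <= 4:
        return "minimal"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "moderate"
    if score <= 19:
        return "moderately severe"
    return "severe"


print(phq9_severity(22))  # Kasi's profile score
print(phq9_severity(8))   # Alex's profile score
```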

As is evident, the cases are quite different and, if these youths self-injure, the frequency and the means of doing so are likely to differ. Moreover, their responses, including their openness to share, may be quite different. This situation can be intimidating for PCPs, who are primarily trained to deal with more objective medical matters. The goal, as stated, is to provide experiences through which PCPs can hone their skills much as a pilot does, gaining confidence even for situations that occur rarely in their practice but are nonetheless important to their performance as first-line health professionals.

Regarding this approach, researchers have suggested the need to include virtual patients in medical education to support competency-based education [53], and have shown equivalent improvements in skills after trainings with virtual patients as compared with live standardized patients [54]. Interactive trainings prove most effective in changing clinicians’ behavior [55, 56]. Role-playing with feedback allows clinicians to practice skills and enhances clinicians’ capacity to address diverse adolescent health issues [57, 58], including suicidality [59, 60]. This project builds on identified best practices for enhancing clinicians’ knowledge, skills, and behaviors by incorporating an interactive VR experience using role-playing and feedback with patient avatars.

4 Digital Puppetry Development

Given the complexity of communication in these adolescent interview cases, creating a digital puppetry interface that would allow interactors to perform nuanced non-verbal behavior was a significant challenge which required multiple iterations to achieve the necessary granularity of expression and control. Multiple combinations of technologies were tested to find a solution that allowed for precision control and quality performance across distributed networks.

4.1 Gross Body Movement

Initial iterations of the digital puppetry system assumed a one-to-one real-time motion tracking paradigm using a combination of infrared head tracking and a Microsoft Kinect device [61] to control gross body movement and posture of the virtual avatar. However, sending the motion capture data in real time over a distributed network created visual quality issues. Imprecise tracking and lag, combined with imperfect occlusion models in the environment, frequently made avatar gross body motion appear unnatural. In the worst cases, avatar limbs would spasm, jerk, or extend through inanimate objects in the environment. Efforts to constrain possible motion to avoid those visual errors were unsuccessful because they limited the range of motion in a way that made performance of nuanced non-verbal expression inconsistent and difficult to achieve. Thus, the one-to-one real-time motion tracking paradigm was abandoned and replaced by a gesture-based puppetry paradigm.

In the gesture-based puppetry paradigm, the full range of motion for each avatar was pre-defined, modeled, and mapped to geographic triggers that the interactor can use to manipulate the body posture of the avatar in real-time without motion capture. The range of motion for each avatar was defined as a body pose palette which was collaboratively designed by SMEs, interactors, and an animator using video reference materials from clinical case studies. Examples from the body pose palettes of Kasi and Alex can be seen in Figs. 3 and 4.

Fig. 3. Body pose palette samples for Kasi

Fig. 4. Body pose palette samples for Alex

The virtual environment allows the interactor to calibrate the geographic trigger for each body pose. Thus, while limited to a defined set of pre-determined poses, the interactor can customize the trigger points to suit individual preferences and physical requirements which allows for more natural and intuitive motion control while avoiding the visual errors produced by motion capture. However, even improved, this paradigm was found to be insufficient to produce the desired level of natural motion. Using the geographic trigger method, we found that ten to twelve poses were the upper limits of what interactors could reliably navigate for each avatar. Interactors quickly became frustrated with the limitations of the palette and expressed a need for more subtle motion between more communicative larger pose choices.
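One way to picture calibrated geographic triggers is as points in the interactor's tracking space, each tied to a pose. The sketch below is an assumption-laden illustration, not the system's implementation: the nearest-trigger-within-a-radius rule, the `PosePalette` class, the radius value, and the pose names are all hypothetical.

```python
import math


class PosePalette:
    """Maps calibrated trigger points in tracking space to named body poses."""

    def __init__(self, activation_radius=0.15):
        self.activation_radius = activation_radius  # assumed units: meters
        self._triggers = {}

    def calibrate(self, pose_name, position):
        """Set or move the trigger point for a pose to suit the interactor."""
        self._triggers[pose_name] = tuple(position)

    def select_pose(self, position, current_pose=None):
        """Return the pose whose trigger is nearest, if within the radius.

        Outside every trigger's radius the current pose is kept, so small
        tracking jitter does not cause the avatar to change posture.
        """
        best_name, best_distance = None, float("inf")
        for name, trigger in self._triggers.items():
            distance = math.dist(position, trigger)
            if distance < best_distance:
                best_name, best_distance = name, distance
        if best_distance <= self.activation_radius:
            return best_name
        return current_pose


palette = PosePalette()
palette.calibrate("neutral", (0.0, 0.0, 0.0))
palette.calibrate("arms_crossed", (0.5, 0.0, 0.0))

pose = palette.select_pose((0.02, 0.0, 0.0))
pose = palette.select_pose((0.25, 0.0, 0.0), current_pose=pose)  # between triggers
print(pose)
```

With ten to twelve poses per avatar, a scheme of this kind stays navigable only if trigger points are spaced well beyond the activation radius, which is consistent with the upper limit reported above.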

To address the need for these more subtle and transitional body postures, three key frames of animation were added to each body pose in the palette, along with gesture pacing controls. Using the pacing control, interactors can now scroll through a range of motion within each body pose. We have found that this subtle motion contained within the body pose range allows for more authentic conversational movement. Additionally, the head controls for the avatar were separated from the body poses and returned to one-to-one infrared tracking based on real-time interactor head movement. In this way, interactors can use head position to alter the emotional and status cues of the body poses, expanding the expressive potential of the set. Motion constraints were also placed on the head tracking to prevent avatar head motions that would not be physically possible. If incompatible tracking data is sent to the system, the head position reverts to the body pose default, which prevents most visible errors. Thus, the gesture-based body pose control system, in combination with infrared head tracking and key frame animation control, was found to be optimal for precision puppetry of the avatars.
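The two mechanisms in this paragraph, scrolling through key frames with a pacing control and falling back to a pose default when head tracking is implausible, can be sketched as follows. The piecewise-linear blend, the representation of a key frame as a list of joint angles, and the angle limits are assumptions made for illustration; the paper does not specify them.

```python
def blend_key_frames(key_frames, pacing):
    """Piecewise-linear blend across key frames for a pacing value in [0, 1]."""
    if len(key_frames) < 2:
        raise ValueError("need at least two key frames")
    pacing = min(max(pacing, 0.0), 1.0)          # clamp the control value
    scaled = pacing * (len(key_frames) - 1)      # position along the frame sequence
    index = min(int(scaled), len(key_frames) - 2)
    weight = scaled - index                      # 0.0 at frame index, 1.0 at the next
    start, end = key_frames[index], key_frames[index + 1]
    return [a + (b - a) * weight for a, b in zip(start, end)]


def head_orientation(tracked, pose_default, limits):
    """Use tracked (yaw, pitch, roll) unless any angle falls outside its limits."""
    for angle, (low, high) in zip(tracked, limits):
        if not low <= angle <= high:             # also rejects NaN readings
            return pose_default
    return tracked


# Three key frames for one pose, each a list of two joint angles in degrees
# (values are illustrative only).
key_frames = [[0.0, 10.0], [5.0, 20.0], [10.0, 10.0]]
quarter = blend_key_frames(key_frames, 0.25)

# Assumed head limits in degrees for yaw, pitch, and roll.
limits = [(-80.0, 80.0), (-60.0, 60.0), (-40.0, 40.0)]
head = head_orientation((170.0, 5.0, 0.0), (0.0, -10.0, 0.0), limits)
print(quarter, head)
```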

4.2 Facial Expressions

Similar to the development process for gross body poses, the development cycle for a digital puppetry system for facial expression transitioned from a one-to-one motion capture paradigm to a pre-defined range of triggered expressions. Initially, we attempted direct motion capture using facial markers. Unfortunately, we experienced the same issues of imprecise control, instances of tracking failure, and lag. Next, we attempted to design generalized facial expressions, based on theories of universal facial expression characteristics, in the hope that this set could be applied to all avatars in the system and triggered using a joystick control. While the joystick control was much more reliable than the motion capture control and eliminated the lag and visual errors, we found the resulting facial expressions insufficient to communicate the nuanced emotional states required by the interview situation. Thus, instead of creating a general set of universal facial expressions, we developed specific facial expression palettes for individual avatars to cover the range of expression expected in these interview situations. Examples of the facial expression palettes can be seen in Fig. 5.
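A joystick-triggered expression palette can be illustrated by dividing the stick's direction into equal sectors, one per expression, with a dead zone around the center that selects a neutral face. This mapping, the dead-zone size, and the expression names are assumptions for the sketch; the paper does not describe how joystick input is mapped to expressions.

```python
import math


def expression_from_joystick(x, y, palette, dead_zone=0.3, neutral="neutral"):
    """Pick an expression from the palette based on joystick direction.

    The palette is laid out counterclockwise, with its first entry centered
    on the positive x axis. Inside the dead zone the neutral face is used.
    """
    if not palette:
        raise ValueError("palette must contain at least one expression")
    if math.hypot(x, y) < dead_zone:
        return neutral
    angle = math.atan2(y, x) % (2 * math.pi)     # direction in [0, 2*pi)
    sector = 2 * math.pi / len(palette)
    # Shift by half a sector so each expression is centered on its direction.
    index = int(((angle + sector / 2) % (2 * math.pi)) // sector)
    return palette[min(index, len(palette) - 1)]


# Hypothetical four-expression palette for one avatar.
kasi_palette = ["guarded", "sad", "irritated", "relieved"]

print(expression_from_joystick(0.0, 0.1, kasi_palette))   # inside the dead zone
print(expression_from_joystick(0.0, 1.0, kasi_palette))   # stick pushed up
```

Because each avatar supplies its own palette list, the same control scheme can drive avatar-specific expressions rather than one universal set.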

Fig. 5. Facial expression palettes for Kasi and Alex

The effect of using these standardized palettes for body and facial expression is that the performance of the human interactor is contained within these standardized physical expressions while allowing the interactor the freedom to respond to interview cues from the learner. This feature provides an added benefit over motion capture techniques in that the puppetry performance can be both responsive and standardized. The interactor has sufficient precision to respond authentically to the learner, and yet responses are bounded within a standardized set of expressions that can be based upon authentic sources and input from SMEs.

5 Conclusion

When addressing complex interpersonal situations such as conversations between physicians and adolescents engaging in NSSI, gesture-based digital puppetry provides a way to create a responsive virtual environment for authentic rehearsal. We foresee VLEs enhancing the capacity of clinicians who must communicate effectively about sensitive topics and build trusting relationships with young people. This paradigm can be used to better prepare physicians through scenario rehearsal that approximates real practice and through informed reflection that identifies types of interaction, reinforcing those that SMEs deem effective and highlighting those that they view as ineffective.

This project also takes advantage of research we are carrying out on the affective states of participants. Initial work has focused on body posture and, separately, on facial gestures, with studies investigating alternative means of feedback regarding body poses [62] and the challenges posed by hand-to-face occlusions [63]. We have also studied various late versus early fusion strategies for multimodal data, typically involving vocalizations (sounds) but not verbalizations (words). This work is ongoing but shows great promise for reflective feedback and even automated changes to avatar behaviors.