1 Introduction

1.1 Medical Application Background

Current trainees in pediatric residency programs are often unable to achieve competency in many necessary procedures before completing their medical education [1,2,3,4,5]. One such procedure is neonatal intubation (the placement of a flexible plastic tube through the mouth down into the trachea in order to maintain an open airway), which remains a critical skill for any general pediatrician responsible for delivering babies or providing nursery coverage. In the pediatric medical literature, intubation proficiency is defined as the ability of a provider to successfully intubate more than 80% of the time [6]. A recent study [7] found that an average of 8 to 10 intubation opportunities may be required to achieve this competency level. The same study also demonstrated that trainees’ intubation opportunities have fallen from more than 30 per trainee over a 3-year training period to fewer than 3.

Thus, it can be argued that the decrease in intubation proficiency may be directly attributed to the decrease in exposure of pediatric residents to neonatal intubation. There are multiple factors contributing to this decrease in intubation opportunities, including the recent restriction of resident duty hours, the increased use of non-invasive mechanical ventilation for neonates [8, 9], the new recommendations of the Neonatal Resuscitation Program regarding management of non-vigorous infants with meconium-stained fluid, and the expansion of non-physician providers (such as neonatal nurse practitioners) in many academic medical institutions [10].

Given these current limitations on intubation opportunities, there is a significant need for the development of new techniques to teach novice providers the skill of neonatal intubation. Many programs are relying on learners’ experiences in simulation labs to compensate for the decreased opportunities to intubate real patients. However, the learning process is heavily dependent upon the provider’s understanding of the intraoral anatomy as seen in real life (as opposed to the views obtained with current simulation manikins) [11].

Although every intubation attempt performed by a learner is supervised by a competent intubator, the process of learning this skill is constrained by the fact that the supervisor cannot see what the novice intubator is viewing. To overcome this limitation, several teaching medical centers have begun to utilize video laryngoscopy systems in which the supervisor has access to the same view as the learner. Currently available video laryngoscope systems, such as the Glidescope (Verathon Medical, Bothell, WA) and Storz C-MAC D-blade (Karl Storz, Tuttlingen, Germany), are prohibitively expensive ($22,000–$55,000), contain fragile fiber optic components, and require the operator to learn the skill using non-standard equipment and techniques. It is uncertain whether skills learned on these costly non-standard systems will translate to clinical environments that do not have access to such highly specialized intubation tools.

Another concern with using traditional video laryngoscopy is that it requires the intubator to turn his or her head towards the video monitor, thus interrupting direct line-of-sight visualization. Video laryngoscopy also presents a deep intra-oral view of the airway that bears little resemblance to the typical unassisted view. These two differences create a new challenge to gaining competency in direct laryngoscopy (with the focus being on a direct line-of-sight view) while using video laryngoscopy as a learning tool (with the focus being on an enhanced glottic view without a direct line-of-sight view).

1.2 Proposed Solution and Contribution

In this research project, Lenovo Research and the Duke NICU (a member of the National Institute of Child Health and Human Development [NICHD] Neonatal Research Network) jointly developed an Augmented Reality-Assisted Laryngoscopy (ARAL) system using a head-mounted device (HMD) also known as “smart glasses”. Medical providers with limited intubation experience wore these smart glasses while performing intubations on an infant manikin with a high-resolution camera attached to the laryngoscope blade. An enhanced image of the patient’s airway was projected onto the glasses’ visual field, giving the intubator improved glottic visualization while still maintaining focus on the direct line-of-sight view of the larynx.

It has been previously demonstrated that coaching by a supervisor viewing the video images, either on the same screen or on a different device, increases the likelihood of successful intubations and shortens intubation times for novice providers on intubation manikins [12]. By making both the enhanced glottic view and the direct line-of-sight view continuously available to trainees, the ARAL system combines the benefits of both direct and video laryngoscopy and may help enhance trainees’ ability to subsequently perform direct laryngoscopy successfully without the use of AR.

The approach of using an AR HMD to provide views of live camera feeds in order to assist health care providers in performing medical procedures is novel and can be expanded to many other areas of medicine. The advantages of maintaining the direct line-of-sight view along with the enhanced glottic visualization, as well as the shared supervisor view, have the potential to improve efficacy (rate of successful intubation) and efficiency (time to intubate) for both trainees and experienced providers.

The objectives of this paper are to explore the design issues and tradeoffs associated with the ARAL system. The results of a recently completed pilot study exploring the use of the described ARAL system are currently being prepared for publication. We will discuss the effectiveness of the improved glottic visualization, mechanisms for improving the supervisor experience in order to facilitate coaching, the flexibility of displaying the glottic visualization window in the AR HMD, and the users’ behavior and preferences when switching their visual focus between the glottic visualization window and the direct line-of-sight view while performing the intubation.

1.3 Paper Organization

This paper has six sections: Introduction; System Design and User Experience; Pilot Study and User Survey; Pilot Study Results; User Survey Results; and Conclusion.

2 System Design and User Experience

In this section, we describe the system design and user experience of the ARAL system.

2.1 Application Scenario and Hardware Arrangements

Figure 1 graphically depicts the application scenario. A standard neonatal intubation manikin and disposable laryngoscope (BritePro Solo, FlexiCare, UK) are used in the teaching program today and are carried forward in the ARAL system. Onto the laryngoscope we add an HD resolution (1280 × 720) camera by means of a clip-on adapter, described in detail below. The camera used in this paper is a TD-B20903-76 (Misumi Electronics Corp., Taiwan), a small (3.5 mm diameter, 13 mm long) camera designed for the medical industry. The camera is connected to a Capture PC/laptop through a cabled USB connection. The Capture PC performs some video processing and encoding, and the real-time video content is then streamed from the Capture PC to the client devices (AR HMD and/or tablet) over a Wi-Fi link using the Real-time Transport Protocol (RTP). Consequently, the live video stream can be displayed on the AR HMD headset with minimal latency. A supervisor can use the display on the laptop, or connect with another device such as a tablet, to also see the intubation camera video stream and provide verbal coaching to the intubator.

Fig. 1. Pilot scenario of AR-Assisted Laryngoscopy
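To make this data path concrete, the sketch below shows one way the Capture PC could read frames from the USB laryngoscope camera, H.264-encode them, and send them as an RTP stream over Wi-Fi. The paper does not prescribe a particular software stack; the use of OpenCV with a GStreamer backend, as well as the client address, port, and encoder settings, are assumptions made for illustration only.

```python
import cv2

# Open the USB laryngoscope camera (device index 0 is an assumption).
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

# Hypothetical GStreamer pipeline: H.264-encode frames with low-latency settings
# and packetize them as RTP over UDP to a client (e.g., the AR HMD or tablet).
# Requires OpenCV built with GStreamer support; address and port are placeholders.
pipeline = (
    "appsrc ! videoconvert ! "
    "x264enc tune=zerolatency speed-preset=ultrafast bitrate=2000 ! "
    "rtph264pay config-interval=1 pt=96 ! "
    "udpsink host=192.168.1.50 port=5000"
)
writer = cv2.VideoWriter(pipeline, cv2.CAP_GSTREAMER, 0, 30.0, (1280, 720))

while cap.isOpened() and writer.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (1280, 720))  # ensure the frame matches the writer size
    writer.write(frame)                     # push the frame into the RTP stream

cap.release()
writer.release()
```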

We developed an adapter (Fig. 2) to hold the camera and attach it to the laryngoscope. The adapter is small and lightweight, and supports the Miller-type BritePro Solo laryngoscope blades (sizes 00 through 2); minor adjustments may be required to support other Miller-type blades. Impact on the intubator is minimal, as the camera/adapter unit weighs less than 10 grams, permits direct line-of-sight visualization, and allows for quick attachment to the laryngoscope blade. In preliminary studies, we observed that the camera-adapter unit created a view similar to that of commercially available video intubation systems. More importantly, unlike these latter systems, neither the adapter nor the camera enters the oral cavity, given that the camera does not extend past the intubation handle.

Fig. 2. Adapter to hold the camera in the laryngoscope blade: (a) photograph of the adapter in place on an assembled laryngoscope; (b) CAD model of the adapter, in which the ramp enables support for a range of blade sizes and the key slot ensures proper camera orientation; (c) detailed view of the key, which is attached to the camera tube

In this paper, the AR HMD that we used was the ODG R-7 Smartglasses (Osterhout Design Group, USA). These glasses have two HD resolution (1280 × 720) displays, one for each eye. During development and when working with early test users, we found that the buttons on the underside of the temples of the smart glasses were frequently pressed by accident, typically as the user put the glasses on or took them off. To provide a better user experience, we disabled or limited the functionality of these buttons. Additionally, in a medical setting the health care provider frequently has their hands occupied by the medical procedure and cannot operate controls such as these. As such, settings for the display of the camera stream in the AR HMD were provided on the Capture PC and could be operated by an assistant at the verbal instruction of the intubator.

2.2 Software Arrangements and User Experience

A key user experience requirement for this solution is an easy and seamless connection mechanism. In a medical environment, having the equipment turn on and just work is the expected behavior. In our solution, there are several challenges: the battery life of the smart glasses is short, the boot time is long, and Wi-Fi networks can be difficult and finicky to set up. At this early stage of our work, we have implemented a limited number of features to address these issues. First, we have addressed the Wi-Fi client-server connection discovery problem through the use of a quick response (QR) code. The Capture PC, which also acts as an RTP server, generates a QR code with the information necessary to connect, including the network name and the IP address. The camera integrated in the smart glasses is used to read the QR code and initiate the connection. We also include the network name and IP address in human-readable form alongside the QR code, which can help the user diagnose any problems. Second, we automatically launch our application on the smart glasses when they have finished booting the operating system, and the application starts in the QR scanning mode. These two components helped make the early testing and pilot study possible, but are insufficient as a final solution to the challenges listed above.
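As an illustration of the QR-based discovery step, the sketch below shows how the Capture PC might encode its connection details into a QR code for the smart glasses to scan. The payload format, network name, and port are hypothetical, and the third-party qrcode package is assumed; the paper does not specify the actual encoding.

```python
import socket
import qrcode  # third-party "qrcode" package (pip install qrcode[pil])

# Connection details advertised by the Capture PC (RTP server).
ssid = "ARAL-CapturePC"                          # hypothetical Wi-Fi network name
ip = socket.gethostbyname(socket.gethostname())  # simplification; may need the Wi-Fi interface address
port = 5000                                      # placeholder RTP/control port

# Assumed payload format; the smart glasses app would parse this after scanning.
payload = f"ARAL;ssid={ssid};ip={ip};port={port}"

qrcode.make(payload).save("connect_qr.png")

# Human-readable fallback shown next to the QR code for troubleshooting.
print(f"Network: {ssid}  Address: {ip}:{port}")
```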

Our solution includes a pan-and-zoom feature that enables us to focus on a region of interest in the camera field of view. As shown in Fig. 3, the camera will see some of the exterior of the manikin’s face and lips (subfigure (a)). Additionally, much of the field of view of the camera will be of the laryngoscope blade itself (subfigure (b)), due to the location of the adapter-camera assembly, which is rooted in the desire to permit direct line-of-sight visualization and prevent the adapter/camera from entering the oral cavity. The pan-and-zoom feature enables our solution to primarily display the enhanced glottic view, without showing much of the patient’s face or the laryngoscope blade, but still showing enough of these visual features to provide a frame of reference to the intubator.

Fig. 3. Camera field of view
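A minimal sketch of the digital pan-and-zoom described above, assuming frames are processed with OpenCV on the Capture PC; the default zoom factor and region-of-interest center are illustrative values rather than the settings used in the ARAL system.

```python
import cv2

def pan_zoom(frame, zoom=2.0, cx=0.5, cy=0.6):
    """Crop a region of interest centered at normalized (cx, cy) and scale it
    back to the full frame size, yielding a digital pan-and-zoom of the view."""
    h, w = frame.shape[:2]
    crop_w, crop_h = int(w / zoom), int(h / zoom)
    # Clamp the crop window so it stays inside the frame.
    x0 = min(max(int(cx * w - crop_w // 2), 0), w - crop_w)
    y0 = min(max(int(cy * h - crop_h // 2), 0), h - crop_h)
    roi = frame[y0:y0 + crop_h, x0:x0 + crop_w]
    return cv2.resize(roi, (w, h), interpolation=cv2.INTER_LINEAR)
```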

Another feature of our solution that proved to be profoundly important during early testing was a manual camera settings interface, in particular a setting for exposure. In the intubation scenario, the camera view captures a large range of brightness. Additionally, the camera view inside the oral cavity includes reflective surfaces (either plastic in the manikin or oral secretions in a patient). In our experience, the auto-exposure algorithms available in many cameras did not properly adjust to the conditions seen during intubation, so the manual exposure setting was an important inclusion. Eventually, an auto-exposure algorithm dedicated to this scenario would be a necessary improvement.
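For illustration, manual exposure can be requested through OpenCV's camera property interface as sketched below; the exact property values are driver- and backend-dependent, and the numbers shown are placeholders rather than the settings used in our system.

```python
import cv2

cap = cv2.VideoCapture(0)  # laryngoscope camera (device index is an assumption)

# Switch from auto to manual exposure. The "magic" values vary by backend:
# many V4L2 drivers use 0.25 (or 1) for manual and 0.75 (or 3) for auto.
cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)

# Manual exposure value; the valid range and units depend on the camera driver,
# so this number would be tuned interactively for the intubation scene.
cap.set(cv2.CAP_PROP_EXPOSURE, -6)
```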

We also have two features that are speculative in nature. First, we support the ability to resize and reposition the glottic visualization window in the display field of the smart glasses. The hypothesized benefit of our ARAL system is that the wearer of the headset can see both the patient and the environment with their direct line-of-sight as well as the camera view in the headset display. If the glottic visualization window showing the camera view takes up too much of the visual field, the direct line-of-sight view may be compromised. Second, we support the use of either both displays (binocular, one in each eye) or only one (monocular, in either the left or the right eye). Rather than reducing the size of the glottic visualization window, the window can be shown to only one eye. This introduces binocular rivalry, allowing the wearer to choose to focus their visual attention on the glottic visualization window in one eye or on the direct line-of-sight view available to the other eye. In the case of intubation, where a direct binocular view into the oral cavity is partially obstructed, we hypothesize that this monocular approach may be a productive one. However, the optics of the ODG R-7 glasses used in this paper provide only a 30° field of view, so the resize and reposition feature was not found to be useful. As AR HMD optics improve in resolution and field of view, we suspect this feature may become useful.
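The sketch below illustrates one way such view settings could be represented and mapped to a pixel rectangle on the per-eye display; the field names, default values, and the per-eye resolution constant are assumptions made for illustration, not the actual ARAL configuration scheme.

```python
from dataclasses import dataclass

DISPLAY_W, DISPLAY_H = 1280, 720  # per-eye resolution of the ODG R-7

@dataclass
class GlotticWindowConfig:
    """Hypothetical per-user settings for the glottic visualization window."""
    scale: float = 0.4      # fraction of the display width occupied by the window
    anchor_x: float = 0.75  # normalized window center within the display
    anchor_y: float = 0.25
    eye: str = "both"       # "left", "right", or "both" (monocular vs. binocular)

    def pixel_rect(self, aspect=16 / 9):
        """Return (x, y, width, height) of the window in display pixels."""
        w = int(self.scale * DISPLAY_W)
        h = int(w / aspect)
        x = int(self.anchor_x * DISPLAY_W - w / 2)
        y = int(self.anchor_y * DISPLAY_H - h / 2)
        return x, y, w, h

# Example: a small window in the upper-right of the right-eye display only.
config = GlotticWindowConfig(scale=0.35, anchor_x=0.8, anchor_y=0.2, eye="right")
print(config.pixel_rect())
```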

One additional pedagogical feature implemented in this solution is an instructor telestrator function. As described in Sect. 2.1, the camera video stream is shown on a display separate from the HMD, such as the Supervisor Tablet in Fig. 1. This allows the experienced intubator to see what the learner is seeing and to provide specific verbal coaching. The instructor can also use a finger or a stylus to draw on top of the video stream, and the trainee sees these annotations on the HMD display in real time. This visual instruction thus supplements the verbal feedback being given to the trainee during the actual intubation attempt.
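As a rough sketch of the telestration overlay, the function below draws the supervisor's strokes onto each video frame before it is displayed on the HMD; the wire format (lists of normalized points) and the stroke color and thickness are assumptions for illustration.

```python
import cv2
import numpy as np

def overlay_telestration(frame, strokes):
    """Draw supervisor strokes on a video frame.

    Each stroke is a list of (x, y) points normalized to [0, 1], assumed to be
    received from the supervisor tablet over the network."""
    h, w = frame.shape[:2]
    for stroke in strokes:
        pts = np.array([(int(x * w), int(y * h)) for x, y in stroke], dtype=np.int32)
        cv2.polylines(frame, [pts], isClosed=False, color=(0, 0, 255), thickness=3)
    return frame
```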

3 Pilot Study and User Survey

3.1 Selection of Subjects

Duke NICU nurses were selected as the subjects for this study. The selection of these subjects was based on the assumption that although these individuals have theoretical knowledge of the intubation process and an understanding of intraoral anatomy, they do not have hands-on experience (with either a manikin or a real patient). This approach allows us to study this technology as it is designed to be used, as an educational tool for novice providers.

3.2 Subject Recruitment and Compensation

Subjects were recruited during their nursing shifts in the hospital. If they volunteered to participate, the study would take place at their convenience (usually during one of their scheduled breaks in the unit’s conference room). There was no compensation for participation in this study.

3.3 Pilot Study

Forty-five test subjects were randomly assigned to one of three groups (15 providers in each group). The first group intubated an infant manikin using direct laryngoscopy (DL), the standard intubation technique; the camera and adapter were left attached to the laryngoscope so that the same obstruction was in place, but the camera was not powered on. The second group intubated an infant manikin using indirect video laryngoscopy (IL); these providers relied solely on a video stream of the mouth and airway displayed on the Capture PC laptop placed beside the manikin, without the use of the AR HMD. The third group intubated an infant manikin using AR-assisted video laryngoscopy (AR); these providers were able to view the manikin directly (by peering beneath the smart glasses frame) while simultaneously using the video stream projected onto the glasses to supplement their view. All three groups were given verbal coaching during their attempts by an expert intubator, who was able to view the video stream in real time while assisting those in the second and third groups.

Each participant attempted intubation five times. The outcome of each attempt was documented as either successful (endotracheal tube placed in the airway in less than 30 s), unsuccessful due to time (endotracheal tube placed in the airway within 30 to 60 s), unsuccessful due to attempt being aborted at 60 s, or unsuccessful due to esophageal intubation (endotracheal tube placed in the esophagus instead of the airway). The time required to acquire (or visually identify) the airway as well as the time required to intubate were also recorded.

The preliminary results of this pilot study are described in Sect. 4 with a more detailed analysis being readied for publication in the near future.

3.4 User Survey

The following nine questions were listed on a survey generated through SurveyMonkey, and the nurses who joined the pilot study were asked to answer anonymously. Eight of the questions offered 5-point Likert-scale answer options (strongly agree, agree, neutral, disagree, and strongly disagree), and an additional comment box was provided in case respondents wanted to offer any other opinions. Question 8 was an open-ended question.

  • Q1. It is easier to identify the magnified airway with the smart glasses compared to just using my eyes.

  • Q2. It is advantageous that the supervisor can see the same view that I see on the smart glasses so that she can give me better verbal guidance and instructions.

  • Q3. It is advantageous that the instructor can mark on the view of the smart glasses so she can circle the airway location for me.

  • Q4. I would like to have the ability to turn off the smart glasses display if necessary so that I can focus on the direct view of the manikin.

  • Q5. I would like to have the flexibility of changing the size and position of the view in the smart glasses.

  • Q6. I would like the ability to change the zoom factor of the camera and make the airway even larger.

  • Q7. It would be nice if the smart glasses had a bigger field of view (display size) as the technology continues to improve.

  • Q8. While performing intubation using the smart glasses, did you use the see-through capability of the glasses to look directly at the manikin for any purpose (e.g., to insert the tube into the mouth)? Or was everything done by looking at the smart glasses display of the camera stream? Feel free to elaborate.

  • Q9. (Assuming you did look at the manikin directly) It is easy to switch my attention between looking at the smart glasses display and directly at the manikin.

Questions 1, 2 and 3 solicit opinions on the general utility and effectiveness of the enhanced glottic view, the shared view between trainee and instructor, and the telestrator function. Questions 5, 6, and 7 solicit opinions on AR view configurations. Questions 4, 8 and 9 solicit opinions on users’ visual focus behavior and preferences when they switch between the two available views (the direct line-of-sight view and the camera view). Note that following the collection of the data from the pilot study described in the previous section, all participants were allowed to experience the AR solution regardless of assigned group. This allowed the test subjects in the DL and IL groups to also answer questions about the AR-assisted view. The results of the survey are described in Sect. 5.

4 Pilot Study Results

The overall outcomes for the three groups (DL, IL and AR-assisted) are reflected in Table 1 and Fig. 4. As illustrated, the success rate for participants in the DL group was significantly lower than the success rate of participants in the IL and AR groups. The largest contributor to this disparity was the number of esophageal intubations – over a quarter of the providers in the DL group intubated the manikin’s esophagus, while there were no esophageal intubations in the IL or AR groups. This can be attributed to the specific verbal coaching that is afforded by the live video stream – the expert intubator was able to identify the esophagus and the airway for the novice provider, thus preventing malposition of the endotracheal tube.

Table 1. Overall outcomes for the three groups (DL, IL and AR)
Fig. 4. Overall outcomes for the three groups (DL, IL and AR)

The time required to intubate was also improved with the use of video laryngoscopy (both indirect and AR-assisted). The average time to complete one intubation (successful or otherwise) in the DL group was 36.59 s, compared to 26.17 s in the IL group and 26.31 s in the AR group. This improvement in speed is directly related to the intubator’s ability to visually identify the airway – participants in the DL group acquired the airway in 18.00 s (on average), compared to 5.06 s and 4.65 s in the IL and AR groups respectively.

In summary, the pilot study illustrates the utility of video laryngoscopy in improving intubation proficiency in a simulation environment. We hypothesize that the ARAL system will prove to be both more efficacious and efficient in teaching new providers than indirect video laryngoscopy when this technology is used in real patients. In an effort to assess cross-transference of intubation skills acquired through the use of the ARAL system, we plan to conduct additional studies examining the success rates of providers trained with the use of the ARAL system and subsequent attempts with direct laryngoscopy.

5 User Survey Results

A user survey was distributed to the test subjects several weeks after the pilot study, and each test subject filled it out on a voluntary basis. Altogether, 26 subjects completed the survey: 7 from the DL group, 7 from the IL group, and 12 from the AR group. Table 2 lists the survey responses for the Likert-scale questions in tabular form.

Table 2. User survey responses

5.1 General Utility of Solution

In this section we focus on the questions having to do with the general utility or effectiveness of the AR-assisted solution. Question 1 (Fig. 5), question 2 (Fig. 6), and question 3 (Fig. 7) show that the majority of the respondents agreed that the improved glottic view, the shared glottic view, and the telestration feature made learning and performing the intubation easier, suggesting that the ARAL system is an effective teaching tool.

Fig. 5. Effectiveness of magnified airway (improved glottic view) with the AR smart glasses

Fig. 6. Effectiveness of shared glottic view

Fig. 7. Effectiveness of telestration feature

5.2 View Configuration

We also asked a series of questions about potential desire for user configuration of the display of the camera stream in the smart glasses. Question 5 (Fig. 8), question 6 (Fig. 9), and question 7 (Fig. 10) show the results of the survey in this view configuration category. The majority of the respondents liked the idea of having flexibility to configure the size, position, and zoom factors of the AR view, and they expressed a desire to have a larger field of view once more advanced technology becomes available, though the preferences were not as strong as those regarding the general utility questions.

Fig. 8. AR display size and position configuration

Fig. 9. AR display zoom factor configuration

Fig. 10. Desire for better field of view in AR smart glasses

5.3 Switching Visual Attention Between Views

The third category of survey questions dealt with the users’ behavior and preferences between the two available views – the direct view of the manikin and their hands under or through the smart glasses, and the enhanced glottic view of the camera stream available in the display of the smart glasses – while performing the intubation task. Question 4 (Fig. 11), question 8 (Table 3), and question 9 (Fig. 12) assess these aspects. The majority of the respondents would like the ability to turn off the AR view when necessary to focus on the direct view. While many of the respondents believed that they could easily switch between the two views, a sizable number were not so sure (or, as was noted in the comments, at least wanted more practice before considering it “easy”), and several believed that switching views was hard.

Fig. 11. The capability to turn off the AR view

Table 3. Open-ended responses about the use of see-through AR display technology
Fig. 12. Ease of switching visual focus between direct and enhanced views

Table 3 lists all the answers to Question 8 (on the use of the direct view). We collected 19 responses, while 7 respondents chose not to comment. One respondent said he or she did not know, 5 respondents (26%) reported that they used only the AR view, and 13 respondents (68%) reported that they used both the AR view and the direct view. Among the latter: four respondents used the direct view to insert the intubation tube into the mouth at the beginning; one respondent used the direct view for intubation and only looked at the AR view when the instructor delivered verbal advice; two respondents looked at the direct view only to double-check what they saw in the AR view; and six respondents stated that they used the direct view without giving further details.

6 Conclusion

This pilot study illustrates the utility of both indirect and AR-assisted video laryngoscopy in improving intubation proficiency in a simulation environment. We hypothesize that AR-assisted video laryngoscopy will prove to be more efficacious than indirect video laryngoscopy when this technology is used on real patients. A manikin provides a static environment in which to practice intubations without patient movement, oral secretions, the differing anatomies of individual infants, and the other challenges that are unique to live patients. Future projects are planned to test this hypothesis.

The user survey supports the effectiveness of the magnified airway with the AR smart glasses, shared video stream view between trainees and the expert instructor, and telestrator capability. Users of this technology also like the flexibility of configuring the size, position, and zoom factor of the AR view, and they would like to have an even larger field of view once technology advances further. For the static manikin simulation, the majority of medical participants used both the AR view and the direct view. Most users also found it feasible to switch their attention between the two views. Once again, when facing the challenges during intubation that are unique to live patients, we hypothesize that the direct line-of-sight view will be needed more so than when intubating a manikin.

The approach of using an AR HMD to provide live camera feeds to assist health care providers in performing medical procedures is novel and can be expanded to many other areas of medicine. The advantages of maintaining the direct line-of-sight view in addition to the telestrator capabilities have the potential to improve efficacy and efficiency in many medical fields.