1 Introduction

Several wearable augmented reality (AR) displays have emerged in recent years, such as Google Glass and Microsoft HoloLens. These devices allow users to see the real world with additional visual information superimposed on top. Although such technologies are intended for the general user, they also pose exciting opportunities for individuals with low vision. Visually impaired individuals may not be able to identify essential details in a scene, and a digital visual aid based on augmented reality may help provide key information for tasks such as navigation, identifying people, and reading signs. The cool factor of general-purpose devices may also reduce the stigma associated with specialized assistive technologies. For instance, Google Glass has been used to compensate for colour blindness [1] and to provide edge enhancement for low-vision wearers [2].

Still, several of these technologies are costly and not widely available. It may take several years before such devices are commonplace. The limited availability means that large user groups around the world are unable to experiment, develop ideas, and design new applications to suit their needs. This study therefore focused on low-cost devices. Experimentation was conducted with one of the less expensive see-through displays on the market, namely the EPSON BT200, the inexpensive Google Cardboard [3], and a cheap homemade head-mounted augmented reality display built from a smartphone and scrap materials that can be found around the home.

2 Background

Visual impairment takes many forms. Examples include no visual perception, various levels of visual acuity [4], tunnel vision, colour blindness [5,6,7], and nystagmus. Research into digital visual aids goes back several decades to the early work by Peli et al. [8, 9], which mostly focused on various filters to enhance face recognition among visually impaired individuals. Over the decades, various aspects of reduced vision have been addressed. Everingham et al. [10] experimented with a head-mounted device that helped with the classification of scenes. Harper et al. [11] developed a digital visual aid that performed magnification of the scene. Colour blindness is a topic that has drawn the attention of several researchers. Approaches for enhancing colour images such that they are more easily perceivable by individuals with reduced colour perception have been proposed and also implemented in a wearable device [1]. Various approaches addressing tunnel vision have focused on condensing wide fields of view and displaying these in the narrow field of view where the user can perceive visual stimuli [12].

Edge enhancement is also a recurring topic in the research literature [2, 13, 14], where several attempts at enhancing the view using wearable displays have been explored. In particular, edge detection can be used to highlight important features such as the edges of the steps of stairs to prevent a visually impaired person from falling, or to highlight pedestrian crossing zones for safer navigation in busy traffic with moving vehicles. Others have proposed to provide depth cues [15] where objects closer to the viewer are given a brighter colour than objects further away. A more direct approach is obstacle detection and identification, for instance, using stereoscopic vision [16] and laser range scanners [17].

Although quite a few exciting solutions to various aspects of low vision have been proposed, there are comparatively fewer studies on what visually impaired individuals want and what they actually need. One exception is the qualitative study by Cimarolli et al. [18], which emphasized visually impaired individuals’ need for social interaction and getting around. Similar findings were identified in [19], which more specifically identified the needs as being able to recognize faces and text in various physical contexts. Text is especially important when travelling and utilizing public transport, finding locations such as shops and offices, and identifying specific products within shops. The detection of text and digits is well researched [20, 21], and several studies have specifically focused on wearable devices capable of recognizing text in the wild intended for visually impaired users [22, 23]. The recognition of people is important for visually impaired individuals to participate in social settings and be involved in society in general. Faces are the most widely used cue for recognizing individuals. However, faces can be hard to identify from a distance for individuals with low visual acuity, and impossible for individuals without vision. Surprisingly, there are very few studies on face recognition applied to low-vision aids, despite the fact that the research field of face recognition is vast and its algorithms are well developed [24].

Another issue is the desire to be “normal” and not to stand out [19, 25]. It has been found that older individuals with reduced function tend to abandon their assistive aids [25]. Generally, people have a desire to look cool and blend in [25], while assistive technologies can be stigmatizing. The long-term goal of this research endeavour is to achieve invisible assistive technology that does not draw attention. The alternative view to assistive technology is universal design, where one non-stigmatizing solution is used by all, addressing, for instance, readable language [26,27,28], dyslexia [29,30,31,32,33], motor disabilities [34,35,36,37], and low vision [38, 39].

3 Wearable AR-Display Evaluations

Three augmented reality displays were evaluated to assess their suitability for prototyping and evaluating AR-based assistive technologies for users with low vision: the commercial EPSON BT200 see-through display, Google Cardboard, and a homemade do-it-yourself (DIY) AR device.

3.1 Commercial See-Through Display

First, the suitability of a set of commercial display glasses was evaluated, namely the EPSON BT200 see-through mobile viewer. This display kit is relatively inexpensive (approximately 800 Euro) and has therefore been used by researchers [40,41,42]. The glasses have a display area in the middle of each lens that reflects the display image to the viewer, with the light source embedded in the frames. A standalone handheld Android unit with buttons and a touch pad controls the device. The device can be worn together with eyeglasses and is thus usable both by individuals with uncorrected vision and by those wearing corrective lenses. A tinted sunscreen in front of the glasses filters bright external light. The device is intended for entertainment purposes.

Figure 1 shows visual results of simple tests performed with the kit. The immediate impression is that the display area is too small to be perceivable by anyone with low visual acuity (see Fig. 1a and b). The documentation states that the display has a viewing angle of 23°. Some low-vision users may be able to perceive icons and simple symbols if the entire display is used for displaying such symbols.

Fig. 1. EPSON BT200 see-through mobile viewer.

Figure 1c shows how the semi-transparent display areas cause large shadows in the centre of the visual fields. These shadows are especially noticeable when the device is switched off. This shadow is likely to be visually disturbing to users when focusing on the real-world scene. However, Fig. 1d shows that the display is visible even in very bright lighting conditions, such as when looking towards the sky. Overall, the small display, with its shadow obstructing the important part of the view, makes this device unsuitable as a platform for developing and experimenting with low-vision aids.

3.2 Google Cardboard

Google Cardboard [43, 44] has received much attention as it can provide relatively powerful virtual reality experiences at moderate cost. While other virtual reality headsets are based on specialized hardware, Google Cardboard simply relies on an ordinary smartphone for computation, networking, sensing, sound, and display [43]. The Cardboard framework thus consists of a simple headset and software. The name Cardboard stems from the simple proof-of-concept headset built from cardboard and a set of lenses that allow the eyes to focus on the nearby display. Moreover, Cardboard comes with an open API, and new Cardboard applications are added regularly.

Cardboard can also be used for AR applications. To test this, a simple 20 Euro plastic Cardboard-compatible headset was used (see Fig. 2). It has adjustable lenses, straps to hold it to the head, and an opening allowing the camera to capture the scene. Figure 2 (bottom right) shows how the camera view on the display appears through the headset.

Fig. 2. Variation on Google Cardboard for AR using the mobile camera.

A simple experiment was conducted with the mobile phone in camera monitor mode. The test was performed walking around wearing the headset and relying only on the live video captured by the camera. The results were less than optimal. The camera update is too slow to be practical, with a noticeable lag of a fraction of a second. Moreover, the dynamic range is low and the camera response slow: the camera takes a long time to adjust when walking from a dark area to a light area and vice versa, or when turning the head rapidly. In conclusion, the limitations of current smartphone cameras make Cardboard unsuitable for real-time, real-world AR aids for the visually impaired.

3.3 Homemade AR-Device

The proposed approach applies a similar technique to that of many existing augmented reality systems [45] where visual elements are superimposed on the worldview via a heads-up display [46, 47]. Heads-up displays often exploit the Pepper's ghost effect [48], traditionally used to create the illusion of ghosts via transparent glass reflections.

A simple system can be built using a smartphone, as smartphones are commonplace, affordable, and easy to program. The display of the smartphone is placed perpendicular to the viewing direction of the user (see Fig. 3). A flat transparent plastic plate is positioned at an angle of 45° relative to the smartphone display and viewer such that the image displayed on the smartphone is reflected into the eyes of the user. The user sees a combination of the real-world view behind the transparent plate and the smartphone display's image reflected via the plate.

Fig. 3. The augmented reality device utilizing the Pepper's ghost effect. The real scene is viewed through the plastic film, and the virtual scene shown on the smartphone display is reflected into the same view via the transparent and reflective film positioned at a 45° angle.

Various approaches were explored using various household items. One solution involved a plastic detergent container with the open end facing the world and the back end cut out for the eyes (see Fig. 4). This plastic container had sufficient stiffness to carry the mobile phone. An Apple iPhone 6 was used in this prototype. Enough space was left for the device to be worn with eyeglasses, as individuals with low vision may wear eyeglasses to correct their vision. A bracket to hold the smartphone in place was created at the top of the container, with a hole cut out such that the display was visible when the phone was placed display-down.

Fig. 4. The homemade wearable augmented reality display. (a) The overall device, (b) the augmented information is not visible to onlookers, (c) and (d) augmented information as seen by the user, (e) the augmented image from the smartphone, and (f) the camera-mirror fixture. (Color figure online)

A hole was also made at the location of the smartphone camera where a small mirror can be fixed, allowing the user's front view to be captured by the smartphone camera. Note that this feature is not explored herein. A rectangular, stiff, and transparent plastic sheet, cut out from the packaging of a cake box, was fixed inside the plastic container at a 45-degree angle below the mobile phone.

This setup works well inside buildings with moderate light intensity. However, outdoors during daytime, daylight is much stronger than the smartphone display. Therefore, for outdoor use, several layers of coloured plastic sheets were placed over the opening facing the view. The brightness of the outside light is thereby reduced relative to the light from the smartphone display, making it easier to view the information. Many individuals with low vision use sunglasses outside, often sunglasses that also block disturbing light coming in from the sides. The configuration is thus consistent with the viewing environment preferred by many low-vision individuals.

The amount of information displayed should be minimized so as not to disturb the real-world view. Black is used as the background on the display as it is not reflected into the viewers’ eyes. Information is highlighted in bright colours to make the information clearly visible. Figure 5 shows the augmented views used in the examples in Fig. 4. The remainder of this paper will focus on applications of this homemade AR-device.

Fig. 5. Augmented information as displayed on the smartphone. Only the non-black visual elements are reflected and perceivable by the user when superimposed on the view. (Color figure online)

In conclusion, the homemade AR device is able to augment information across most of the field of view, and the real-world background is viewed directly and thus exhibits no lag. It therefore appears more suitable than the other two technologies for visually impaired users.

4 Example Applications and Techniques

Several applications have been explored with wearable visual aids, such as text recognition [19] and edge enhancement [15]. Edge enhancement requires calibration to ensure that the enhanced image overlaps the view; other applications do not. Transportation, recognizing text, and recognizing faces are key challenges for low-vision individuals [9], and the following examples address these without requiring calibration.

4.1 Navigation Aid

Projecting key map information may help a low-vision individual navigate a city without losing track of traffic. Figure 4c shows a sketch of how the device could be used to track the progress of reaching a target on the map, while seeing the scene.

Figure 5a shows the information displayed on the smartphone. As mentioned, the black background is not reflected and hence not visible to the user. The important information, that is, the roads and the road names, is displayed in white, which gives maximum visibility to the user. A green arrow indicates the user's position and orientation relative to the map. The display also shows how less important information, such as the current time and the battery level of the device, can be included. In this example, it was rendered in red and in a smaller text size in order not to draw attention away from the main information. Figure 4b shows the same view with the key information in red; practical tests suggest that red may be difficult to see under varying lighting conditions. Note that the user will not be able to perceive the information if the scene is very bright.
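
As a concrete illustration, such an overlay can be composed programmatically. The following is a minimal sketch in Python with Pillow; the layout coordinates, sizes, and file name are illustrative assumptions, not taken from the paper:

```python
from PIL import Image, ImageDraw

# Compose a navigation overlay following the colour scheme described above.
# Coordinates, sizes, and filenames are illustrative assumptions.
W, H = 1334, 750                           # iPhone 6 display, landscape
img = Image.new("RGB", (W, H), "black")    # black is invisible when reflected
draw = ImageDraw.Draw(img)

# Key information in white for maximum visibility: a road and its name.
draw.line([(200, 700), (600, 100)], fill="white", width=8)
draw.text((620, 90), "Main St", fill="white")

# The user's position and orientation as a green arrow.
draw.polygon([(380, 420), (420, 420), (400, 370)], fill="green")

# Less important information (time, battery) small and in red.
draw.text((1150, 700), "12:45  78%", fill="red")

img.save("nav_overlay.png")  # shown full screen and reflected to the user
```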

The user does not have to look down to inspect the content of a navigation device or smartphone, which would create dangerous situations by shifting visual attention away from the traffic. The augmented reality display may also be used to show local public transport information in real time, such as arriving buses, or to indicate nearby shops. Text or faces recognized in the scene can be displayed to the viewer in sufficiently large text.

4.2 Face Recognition

Figure 4d shows a sketch of how the device could be used to recognize people. The device is pointed towards the person the user is looking at through the viewfinder. The camera captures the image of the person, face recognition software identifies the person, and the name is displayed as a textual cue. Alternatively, a familiar photograph of the person can be displayed. Other modalities can also be used such as audio.
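
The paper does not specify the recognition software used. As an indication of how such a pipeline might look, the sketch below uses the open-source Python face_recognition library; the reference photo filename and name are hypothetical:

```python
import face_recognition  # open-source library; the paper's software is unspecified

# One reference photo per known person (hypothetical filename).
known_names = ["Alice"]
known_encodings = [
    face_recognition.face_encodings(
        face_recognition.load_image_file("alice.jpg"))[0],
]

def identify(frame):
    """Return the name of the first recognized face in a camera frame, or None."""
    for encoding in face_recognition.face_encodings(frame):
        matches = face_recognition.compare_faces(known_encodings, encoding)
        for name, is_match in zip(known_names, matches):
            if is_match:
                return name  # displayed as a large textual cue on the overlay
    return None
```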

Figure 5d illustrates the view as displayed by the smartphone, using large red text on a black background. Note that the text is positioned towards the bottom right of the display in order not to completely overshadow the face. Moreover, the scene is likely to be darker towards the bottom than in the middle, especially if the person stands by a window or with the sky as a background.

5 Augmenting 3D Sketches

Sketching is a useful tool for exploring and communicating ideas [49]. Also, hand-drawn sketches signal unfinished work [50]. Sketches are usually associated with 2D drawings or flat drawings of 3D objects and scenes [51]. However, they have also been extended to the panoramic domain [52,53,54] where the observer gets a three-dimensional experience of being immersed inside the sketch. For instance, panoramic views are used in Google street view [55] and for creating richer museum experiences [56], often with additional technology such as RFID [57]. A method for sketching augmented reality visual aids is demonstrated next.

5.1 Sketching Panoramas

The approach involves superimposing on the view a 3D sketch that changes according to the movement of the head. This is achieved by treating the 3D sketch as a panoramic image. The panoramic image is sketched using the PanoramaGrid grid paper proposed in [53]. This grid paper allows the designer to draw sketches directly in the equirectangular space, which represents the viewing sphere around the viewer. The panoramic sketch is thus twice as wide as it is tall, since the horizontal axis represents the longitude from 0° to 360° around the viewer, and the vertical axis represents the tilt, or latitude, from −90° to 90° below and above the viewer [58, 59]. By tracing the lines of different colours on the grid paper, the designer “moves” in the x, y, and z dimensions of 3D space.
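
The pixel-to-direction mapping that the grid paper encodes can be written out explicitly. A minimal sketch, assuming the usual equirectangular conventions (left edge at longitude 0°, top row at latitude +90°):

```python
import numpy as np

def equirect_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit 3D viewing direction.

    u in [0, width) spans longitude 0..360 deg around the viewer;
    v in [0, height) spans latitude +90 (top) to -90 (bottom) deg.
    """
    lon = (u / width) * 2.0 * np.pi        # 0 .. 2*pi
    lat = (0.5 - v / height) * np.pi       # +pi/2 .. -pi/2
    return np.array([np.cos(lat) * np.sin(lon),   # x: right
                     np.sin(lat),                 # y: up
                     np.cos(lat) * np.cos(lon)])  # z: forward
```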

The sketches were made in a simple sketching software package (Microsoft Paint). Figure 6 (top) illustrates how the sketch is drawn on the grid paper. The grid paper has a cube configuration with four vertical planes at 90° to each other (their centre lines are separated horizontally by 90°). These four planes are represented by the cyan, magenta, yellow, and green grids, where the green grid wraps around the edges to achieve a full 360° panorama. In addition, two horizontal planes representing the floor and the ceiling are represented by the two black grids. The simple sketch contains the handwritten word “LOOK” aligned with the magenta gridlines and three filled hand-drawn squares aligned with the yellow gridlines. Since the magenta and yellow gridlines represent two planes perpendicular to each other, the word “LOOK” and the three squares also appear perpendicular to each other when viewed as a panorama.

Fig. 6. Making a panoramic sketch: (a) sketching by tracing panoramic gridlines by hand, (b) panoramic sketch inverted, (c) inverted panoramic sketch binarized and subjected to a morphological closing operator for gridline removal. (Color figure online)

5.2 Post-processing Sketches

Next, the sketches were inverted such that the background became black and the sketch white in order for the sketch and not the background to be visible to the viewer. Figure 6 (middle) shows the example sketch inverted with the gridlines included for illustrative purposes, and Fig. 6 (bottom) shows a binarized [60] inverted image where the gridlines were removed with a morphological closing operator. The open source image-processing framework ImageJ [61] was used to perform these post-processing operations.
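
The paper performed these steps in ImageJ [61]. One plausible reconstruction of the pipeline in Python with OpenCV is sketched below; the threshold and kernel size are assumptions that would need tuning to the actual scans:

```python
import cv2

# Invert so the background becomes black and the strokes bright.
sketch = cv2.imread("panorama_sketch.png", cv2.IMREAD_GRAYSCALE)
inverted = 255 - sketch

# Binarize: with a suitable threshold the fainter coloured gridlines
# already drop out, leaving the (now bright) hand-drawn strokes.
_, binary = cv2.threshold(inverted, 128, 255, cv2.THRESH_BINARY)

# Morphological closing (as named in the paper) fills small gaps left in
# the strokes where gridlines crossed them. Depending on the foreground
# polarity convention, an opening may be needed instead.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
cleaned = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
cv2.imwrite("panorama_clean.png", cleaned)
```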

The inverted sketches were mirrored across the horizontal axis to counterbalance the mirroring that occurs when the displayed image is reflected to the user via the transparent plate. Figure 7 illustrates the effect of mirroring the panoramic sketch across the horizontal axis.
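
In code, this counter-mirroring is a single vertical flip; for instance, continuing the OpenCV sketch above:

```python
import cv2

# Flip across the horizontal axis to counter the Pepper's ghost reflection.
mirrored = cv2.flip(cv2.imread("panorama_clean.png"), 0)  # flipCode 0: vertical
cv2.imwrite("panorama_mirrored.png", mirrored)
```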

Fig. 7. Panoramic renderings of a panoramic sketch (left), mirrored around the horizontal axis to ensure correct viewing (right).

5.3 Rendering Panoramic Sketches

The panoramic images were rendered using the freely available FSPviewer panoramic viewer [62]. Alternatively, the sketches could be viewed in a smartphone panoramic viewer where the viewing tilt and direction are controlled by the orientation and tilt of the mobile handset. This allows the sketch to be updated in conjunction with the head movements of the user. Note that this strategy does not allow the user to move around the scene (translation). However, the purpose is to give a convincing experience, not actual functionality.

A suitable horizontal viewing range had to be set in the panoramic rendering software. The horizontal viewing angle parameter controls the size of the viewport onto the panorama, specified as a horizontal angle. The front opening of the viewing box was 7.5 cm high and 16.8 cm wide, and the depth of the box was 14 cm. The viewing angle can be found simply as 2·atan(W/(2D)), where W is the width of the opening and D is the depth of the viewing chamber (the distance from the front opening to the back opening).

The vertical viewing range of the opening was therefore approximately 30°, while the horizontal viewing range was approximately 60°. The vertical and horizontal viewing ranges covered by the reflected image of the smartphone were 34° and 58°, respectively, as the iPhone 6 display measures 10.4 × 5.6 cm and the distance from the smartphone to the viewer was 9.5 cm. This distance is calculated as the sum of the distance from the smartphone to the mirror and the distance from the mirror to the viewer (see Fig. 8). The images were thus rendered with a viewing angle of 60°.
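
These figures are easy to verify. A small sketch applying the 2·atan(W/(2D)) formula to the dimensions given above (the results round to the approximate values stated in the text):

```python
import math

def viewing_angle_deg(width_cm, distance_cm):
    """Angular range subtended by an opening or display: 2*atan(W / (2*D))."""
    return math.degrees(2 * math.atan(width_cm / (2 * distance_cm)))

# Front opening of the viewing box (16.8 x 7.5 cm, 14 cm deep):
print(viewing_angle_deg(16.8, 14))   # ~61.9 deg horizontal (~60 in the text)
print(viewing_angle_deg(7.5, 14))    # ~30.0 deg vertical
# Reflected iPhone 6 display (10.4 x 5.6 cm) at 9.5 cm:
print(viewing_angle_deg(10.4, 9.5))  # ~57.3 deg horizontal (stated as 58)
print(viewing_angle_deg(5.6, 9.5))   # ~32.8 deg vertical (stated as 34)
```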

Fig. 8. Measuring the distance to the display as the sum of the distance between the display and the mirror (D1) and the distance between the mirror and the observer (D2).

Figure 9 illustrates how the sketch in Fig. 6 is rendered with the view pointing in different directions with different tilts. Gridlines were included for reference. Figure 10 shows renderings of several panoramic sketches without gridlines. Note that these renderings were not mirrored across the horizontal axis for presentation purposes.
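
For readers without access to FSPviewer, the core of such a rendering step can be reproduced in a few lines. The sketch below is an assumed stand-in, not FSPviewer's implementation: it extracts a perspective viewport from an equirectangular panorama for a given yaw, pitch, and horizontal viewing angle:

```python
import cv2
import numpy as np

def render_view(pano, yaw_deg, pitch_deg, h_fov_deg, out_w=640, out_h=480):
    """Extract a perspective viewport from an equirectangular panorama.

    yaw/pitch orient the virtual camera; h_fov_deg is the horizontal
    viewing angle (60 deg for the headset geometry derived above).
    """
    pano_h, pano_w = pano.shape[:2]
    f = (out_w / 2.0) / np.tan(np.radians(h_fov_deg) / 2.0)  # focal length (px)

    # Ray through each output pixel (x right, y down, z forward).
    x, y = np.meshgrid(np.arange(out_w) - out_w / 2.0,
                       np.arange(out_h) - out_h / 2.0)
    z = np.full_like(x, f)

    # Rotate the rays: pitch about the x-axis, then yaw about the y-axis.
    p, q = np.radians(pitch_deg), np.radians(yaw_deg)
    y2 = y * np.cos(p) - z * np.sin(p)
    z2 = y * np.sin(p) + z * np.cos(p)
    x2 = x * np.cos(q) + z2 * np.sin(q)
    z3 = -x * np.sin(q) + z2 * np.cos(q)

    # Ray direction -> equirectangular pixel coordinates.
    lon = np.arctan2(x2, z3)                    # -pi .. pi, 0 = forward
    lat = np.arctan2(-y2, np.hypot(x2, z3))     # -pi/2 .. pi/2, up positive
    u = (lon / (2 * np.pi) + 0.5) * pano_w
    v = (0.5 - lat / np.pi) * pano_h
    # BORDER_WRAP handles the longitude seam (only approximately at the poles).
    return cv2.remap(pano, u.astype(np.float32), v.astype(np.float32),
                     cv2.INTER_LINEAR, borderMode=cv2.BORDER_WRAP)

# e.g. pano = cv2.imread("panorama_mirrored.png")
#      view = render_view(pano, yaw_deg=30, pitch_deg=-20, h_fov_deg=60)
```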

Fig. 9. Panoramic renderings of the panoramic sketch in Fig. 6 for various directions and tilts using FSPviewer. Gridlines are included and rendered with a horizontal viewing range of 70° for illustrative purposes.

Fig. 10. Panoramic renderings of three panoramic sketches without gridlines for various directions and tilts using FSPviewer. Rendered with a horizontal viewing range of 70°.

5.4 Stereoscopic Views

A trait of full vision is the ability to perceive depth through stereoscopic viewing, where each eye views the scene from a slightly offset vantage point. A common trait of reduced vision is the lack of depth perception, probably because fully working stereoscopic vision requires a complex and finely tuned visual system. Because of this, the focus herein is on monoscopic views, as it is assumed that the target user group may have varying degrees of depth vision.

However, to experiment with stereoscopic viewing, a set of stereoscopic renderings was created. These stereoscopic approximations are based on the renderings explained in the previous section, but with one instance copied to the left and one to the right side of the display. The left and right instances are thus displayed in front of the left and right eyes, respectively. Figure 11 illustrates this process. Note that, to be theoretically correct, the two renderings should differ depending on the distances in the scene. As it is difficult to infer depth information from a 2D drawing [63, 64], two identical renderings were used as a simple substitute.
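
In terms of the earlier rendering sketch, this zero-disparity approximation amounts to duplicating one viewport, as in the following fragment (reusing the hypothetical render_view above):

```python
import numpy as np

# Zero-disparity stereo pair: the same monoscopic rendering for both eyes,
# as described above (true stereo would render each eye from a slightly
# offset vantage point).
view = render_view(pano, yaw_deg=0, pitch_deg=-20, h_fov_deg=60,
                   out_w=320, out_h=480)
stereo_pair = np.hstack([view, view])   # left | right, identical instances
```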

Fig. 11. Stereoscopic panoramic renderings of panoramic sketches (not mirrored around the horizontal axis, and with a horizontal viewing range of 70° for simpler presentation).

5.5 Augmenting Panoramic Sketches

Figure 12 shows example sketches viewed using the homemade AR headset. The scenes were aligned manually and photographed indoors. An underground garage was chosen as it is large yet not too bright. Note that the views were rendered with a horizontal angular range of 48°, since the camera was located approximately 10 cm from the opening of the display.

Fig. 12. Panoramic sketches superimposed on the real-world views. The four examples in the middle and bottom rows show the outline of a zebra crossing and a virtual gate.

The photographs reveal that the information is easily perceivable. Moreover, the perspective projections of the sketches align quite well with the lines in the scene. Figure 12 (top left) shows the handwritten word “LOOK” occupying the entire viewfinder. The white text nearly occludes the background, which can only just be seen through the limited transparency. The top right image shows an arrow pointing towards one of the perpendicular squares positioned high up in the scene.

The following four images show a zebra crossing and a virtual gate. The first image (middle left) is obtained with medium mobile phone display intensity and a horizontal angular range of 70° with the camera further away, while the three other images are obtained with maximum intensity and a horizontal angular range of 48°. The first image depicts the scene looking ahead revealing both the virtual gate and the zebra crossing. The three other images depict looking down showing the zebra crossing from various orientations.

5.6 Lighting Conditions

Finally, some simple tests were conducted to assess the behaviour of the headset under various lighting conditions (see Fig. 13). Unsurprisingly, there was no problem perceiving the reflected information indoors. However, it was not possible to perceive the information outside on a bright day, except when looking towards darker regions (see Fig. 13c). A simple test with coloured filters was therefore performed.

Fig. 13. Perceiving AR-display information: the balance between the intensity of the display and the scene lighting. (Color figure online)

The filters were made from a green transparent plastic sheet. Four identical sheets were cut out in the size of the headset opening, allowing various levels of filtering to be explored. Figure 13a shows the four layers of film placed on the front of the headset. Figure 13b shows that two filter layers had no effect when looking at the sky, while four layers did make the information visible when looking at the sky (see Fig. 13d) or at the ground (see Fig. 13e). Figure 13f shows that the indoor views remained perceivable even when using the filter.

Although the filter helped block out bright light, making it possible to perceive the displayed information, it also made the scene unclear. The main reason for this was the low optical quality of the plastic sheets, which were not intended for optical purposes. The results would be better with a more suitable material, such as that used in sunglass lenses. However, there is still the problem of moving between bright and dark locations, as it is not practical to add or remove the filter on the fly while walking around. One possible remedy is a controllable screen that blocks out light according to the level of background light. Moreover, the intensity of the light emitted from the display should be adjusted automatically to match the given lighting conditions.
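
As an illustration of the last point, the display intensity could be driven by an ambient light sensor. The sketch below is purely hypothetical: read_ambient_lux() stands in for whatever sensor API the platform provides, and the tuning constants are assumptions:

```python
import math

def display_brightness(ambient_lux, min_level=0.2, max_level=1.0):
    """Map ambient light to a relative display intensity (hypothetical tuning).

    Bright scenes push the display towards full intensity so the reflected
    overlay stays visible; dark scenes dim it to avoid glare.
    """
    # Log-scale mapping: ~100 lux (indoors) -> min, ~10000 lux (daylight) -> max.
    t = (math.log10(max(ambient_lux, 1.0)) - 2.0) / 2.0
    return min_level + (max_level - min_level) * min(max(t, 0.0), 1.0)

# e.g. set_display_level(display_brightness(read_ambient_lux()))
# where read_ambient_lux() / set_display_level() are hypothetical platform calls.
```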

6 Conclusions

Prototyping of AR applications with the visually impaired as the target group was explored. The experimentation revealed that the display of the commercial see-through glasses was too small and obstructive, and that a smartphone-camera-based Cardboard system responded too slowly to be practical. Instead, a method for prototyping simple wearable augmented reality displays was demonstrated, and examples of how sketches can be combined with the real world were illustrated. The proposed approach allows experimentation with augmented reality visual aids. Augmented reality may not be useful for individuals with very low vision, as they often find visual stimuli stressful and may prefer to receive information via other modalities, especially audio. Future work will focus on improving the DIY display by making it smaller and moving the weight of the mobile handset closer to the body to make it more practical in use.