
1 Introduction

Virtual Reality (VR) is rapidly growing in popularity due to the availability of affordable consumer-ready products [1]. Although initially branded as a technology for entertainment and simulation, it is now being used for office work [2, 3], collaboration [4], and training and education [5]. These new applications have created a need for efficient and effective text entry techniques for VR, as entering text is an essential part of these experiences. Many recent works have attempted to meet this need by developing novel text entry techniques for VR or by customizing existing ones. However, it is often difficult to understand how these techniques work and to extract meaningful average performance data from this body of work, because the techniques were evaluated under different experimental conditions and reported different performance metrics. This makes it difficult for researchers to use and apply these findings, leading to the re-exploration of design philosophies and, as a result, slowing down the overall development process. To address this, this paper reviews the existing text entry techniques for VR. It categorizes these techniques based on their input mechanism and discusses their strengths, limitations, and performance. It also provides design recommendations to help researchers develop more user-friendly and effective text entry techniques. This work does not cover speech recognition.

2 Physical Techniques

Adapting the standard physical Qwerty keyboard for VR is difficult for several reasons. Numerous studies have established that both novice and expert typists look down at the keyboard to verify their hand position [6, 7]. Since users wearing a Head-Mounted Display (HMD) cannot see their hands (Fig. 1), it is almost impossible for them to enter text as quickly and accurately as in the real world. The size of the keyboard and the need for a supporting surface also make physical Qwerty harder to use in scenarios where users are required to move around. Although some have proposed miniature Qwerty layouts, the smaller key size makes touch-typing impossible, which further reduces entry speed and accuracy [8]. This section reviews all techniques that use a physical keyboard or keypad in VR. Table 1 presents the entry speed and error rates [9] of these techniques as reported in the literature.
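Since the comparisons in this review are expressed in words per minute (wpm), it may help to recall how that figure is usually derived. The sketch below follows the convention common in the text entry literature, where a "word" is five characters (spaces included) and one character is subtracted because timing starts at the first keystroke. Whether every study summarized here used exactly this convention is not stated, and the function name is illustrative only.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Entry speed in wpm under the five-characters-per-word convention.

    One character is subtracted because timing usually starts when the
    first character is entered.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if not transcribed:
        return 0.0
    return (len(transcribed) - 1) / seconds * 60 / 5


# A 43-character phrase transcribed in 15 seconds
phrase = "the quick brown fox jumps over the lazy dog"
print(round(words_per_minute(phrase, 15.0), 1))  # 33.6
```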

Fig. 1. A user entering text in VR using a physical Qwerty keyboard.

Table 1. Physical text entry techniques with their entry speed in words per minute (wpm) and error rate (%). “Char” signifies “character”. Sensors are not counted as auxiliary devices.

2.1 Physical Qwerty

Many researchers have used a physical Qwerty keyboard or its variants, particularly Qwertz and Azerty, in VR to capitalize on their widespread use and familiarity [10,11,12]. Most of these techniques track the keyboard and hands with external depth cameras and display virtual representations of them in VR [2, 10, 13, 14], so that users can type on the physical keyboard while watching its animated representation. In an evaluation, this approach yielded 34.0 wpm and a 12% error rate [13].

Bovet et al. [14] used the Logitech BRIDGE SDK [15] with an HTC Vive Pro HMD [16] and a Logitech G810 Orion Spectrum keyboard to display an animated representation of the keyboard and hands in VR. In a user study, the average entry speed with this method increased from an initial 34.5 wpm to 44.4 wpm. The method also received negligible mental and physical demand scores. Grubert et al. [2] argued that this method can retain at least 50% of users’ desktop typing skills. Its effectiveness, however, depends on the reliability of its tracking sensors.

Kim and Kim [17] tracked the keyboard and hand positions to display a visual representation of the keyboard and hands. Unlike the previous approach, they did not track the fingers; instead, they displayed finger positions based on the last keypress. That is, they calibrated finger positions on the assumption that specific fingers are used to press specific keys, which is cheaper but less accurate than using depth cameras. In an evaluation, users retained at least 60% of their desktop typing speed and 80% of their desktop accuracy with this method.
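The keypress-based finger estimation described above can be illustrated with a small sketch. It assumes the conventional touch-typing assignment of keys to fingers and a simplified staggered key grid; both are assumptions for illustration, since the exact mapping and geometry used by Kim and Kim [17] are not given here. On each keypress, only the finger assumed to have pressed that key is moved; the others keep their last known position.

```python
# Touch-typing assignment of keys to fingers. This is an assumption: the
# exact mapping used by Kim and Kim [17] is not given in this review.
FINGER_KEYS = {
    "left_little": "qaz", "left_ring": "wsx", "left_middle": "edc",
    "left_index": "rfvtgb", "right_index": "yhnujm", "right_middle": "ik,",
    "right_ring": "ol.", "right_little": "p;/",
}
KEY_TO_FINGER = {key: finger for finger, keys in FINGER_KEYS.items() for key in keys}
HOME_KEYS = {"left_little": "a", "left_ring": "s", "left_middle": "d", "left_index": "f",
             "right_index": "j", "right_middle": "k", "right_ring": "l", "right_little": ";"}

ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
ROW_STAGGER = [0.0, 0.25, 0.75]  # approximate horizontal offset of each row, in key widths


def key_center(key):
    """(x, y) of a key centre in key-width units; y grows downwards."""
    for y, (row, stagger) in enumerate(zip(ROWS, ROW_STAGGER)):
        if key in row:
            return (row.index(key) + stagger, float(y))
    raise KeyError(key)


def on_keypress(fingers, key):
    """Move only the finger assumed to have pressed `key`; others stay put."""
    finger = KEY_TO_FINGER[key]
    fingers[finger] = key_center(key)
    return finger


fingers = {finger: key_center(k) for finger, k in HOME_KEYS.items()}
for ch in "hello":
    on_keypress(fingers, ch)
print(fingers["right_index"], fingers["right_ring"])  # (5.25, 1.0) (8.0, 0.0)
```

The trade-off noted in the text is visible here: no finger is ever observed directly, so a user who presses a key with a non-standard finger is shown an incorrect hand pose.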

A different method embedded a live video stream of the real world into the virtual world to afford the use of a physical Qwerty keyboard in VR [11, 12]. It displayed either a full or a partial view of the keyboard and hands. In a user study, the full view yielded 36.7 wpm and a 10.4% error rate [12]. A partial view that displayed only the keyboard and the hands yielded comparable results. McGill et al. [12] also proposed a partial blending option that displays parts of the keyboard but did not evaluate its performance. Although this method demonstrates competitive speed and accuracy, it breaks immersion by displaying the real world. It also requires additional cameras, increasing the cost of development and installation. To remedy this, Walker et al. [18] proposed a software solution for predicting finger positions. They argued that when expert users are provided with adequate visual feedback, they can use their sense of proprioception to correctly (re)position their hands on the keyboard for the next key. They used a decoder [19] for auto-correction. In a user study, this approach reached 43.7 wpm and a 92.6% accuracy rate. Although the performance of this approach is promising, its reliance on a decoder can potentially frustrate users. Its dependence on proprioception can also increase cognitive load.

This section shows that physical Qwerty remains an essential method for text entry in immersive environments. Researchers have proposed various sensors, cameras, and decoders to enable reasonably fast and accurate text entry in VR with physical Qwerty, but these may not be feasible in all scenarios or affordable for low-income groups.

2.2 Mobile Keypads

Some immersive systems allow users to move around within a limited space. Since physical Qwerty restricts this ability, researchers have proposed using mobile keypads.

Bowman et al. [20] and González et al. [21] used Twiddler in VR. Twiddler is a 12-key chorded keypad with which users enter characters by pressing either a single key or a combination of keys simultaneously (called a chord). This approach was evaluated in a user study in which a virtual layout of the keypad was displayed in VR. In the study, Twiddler yielded 3.0 wpm and an 82% accuracy rate. Although Twiddler allows users to move around, it is substantially slower than the other techniques. Besides, learning the chords requires extensive training, which may discourage many from trying it [6].
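Chorded entry of this kind can be sketched as a small state machine: keys pressed together accumulate into a chord, and a character is emitted only when every key has been released. The chord table below is hypothetical (keys are simply numbered 0 to 11) and does not reproduce the real Twiddler layout.

```python
# Hypothetical chord table over keys numbered 0-11; the real Twiddler
# layout is different and is not reproduced here.
CHORDS = {
    frozenset({0}): "a",
    frozenset({1}): "e",
    frozenset({0, 1}): "t",
    frozenset({0, 4}): "h",
    frozenset({2, 5, 8}): " ",
}


class ChordReader:
    """Accumulates simultaneously held keys and emits a character on full release."""

    def __init__(self, table):
        self.table = table
        self.held = set()    # keys currently pressed
        self.chord = set()   # every key pressed since the chord started

    def press(self, key):
        self.held.add(key)
        self.chord.add(key)

    def release(self, key):
        self.held.discard(key)
        if self.held:        # other keys still down: chord not finished
            return None
        chord, self.chord = frozenset(self.chord), set()
        return self.table.get(chord)  # None for an unassigned chord


reader = ChordReader(CHORDS)
reader.press(0)
reader.press(1)
print(reader.release(0), reader.release(1))  # None t
```

Even this toy table hints at why training is needed: the user must memorize which key combination produces each character, with no visible label for multi-key chords.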

González et al. [21] also evaluated a 9-key mobile keypad with three characters on each key. They embossed the keys to provide haptic feedback. To enter a character with this keypad, users press the key containing it once or multiple times, depending on the character’s position on the key, as in Multi-tap [6]. For instance, to enter the letter “c”, users press the “2” key, which contains the letters “a”, “b”, and “c”, three times. This method yielded 12.1 wpm and a 95% accuracy rate. Although relatively slow, it could be useful for short text entry tasks, such as entering a password or a search keyword.
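Multi-tap decoding can be sketched in a few lines. The sketch assumes the standard phone-keypad letter groups, which may differ from the grouping on González et al.’s keypad, and represents the timeout that commits a letter as a space in the input string. The timeout is needed when two consecutive letters share a key, as in “ll”.

```python
from itertools import groupby

# Standard phone-keypad letter groups (an assumption: the keypad evaluated
# by González et al. [21] may group the letters differently).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def multitap_decode(presses: str) -> str:
    """Decode Multi-tap input.

    Consecutive presses of the same key cycle through its letters; a space
    marks a pause (timeout) that commits the current letter.
    """
    out = []
    for segment in presses.split(" "):
        for key, run in groupby(segment):
            letters = KEYPAD[key]
            taps = len(list(run))
            out.append(letters[(taps - 1) % len(letters)])  # wrap around
    return "".join(out)


print(multitap_decode("222"))             # c
print(multitap_decode("4433555 555666"))  # hello
```

The example shows why Multi-tap is slow by design: “hello” needs 13 key presses plus one timeout.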

2.3 Discussion

As Table 1 shows, Knierim et al. [10], Grubert et al. [11], Hoppe et al. [13], Bovet et al. [14], and Lin et al. [22] all investigated physical Qwerty with an animated keyboard and hands as visual feedback, yet reported diverse results, summarized in Table 2. Bovet et al.’s [14] study yielded the highest entry speed (roughly 16% faster than the next fastest result). Since the baseline condition of this study also yielded a relatively high entry speed, this is likely due to a more experienced sample. The sophisticated tracking apparatus optimized for text entry in VR used in this study may also have contributed. This presumably affected accuracy as well, since Bovet et al. [14] and Lin et al. [22] reported substantially lower error rates than the others (0.4% vs. > 7.6%).

Table 2. Studies investigating physical Qwerty with animated keyboard and hands. Best and worst results are highlighted in bold and italic, respectively.

Unfortunately, Table 1 does not clearly indicate whether users type faster with an animated representation of the keyboard and hands. In the studies conducted by Hoppe et al. [13] and Lin et al. [22], users were faster without any visual feedback than when an animated keyboard and hands were displayed (34.0 wpm vs. 31.2 wpm [13]; 28.1 wpm vs. 27.4 wpm [22]). However, in the studies conducted by Knierim et al. [10] and Grubert et al. [11], users were substantially faster with visual feedback (31.8 wpm vs. 37.5 wpm [10]; 26.1 wpm vs. 34.4 wpm [11]). We speculate that this is due to the unreliability of the feedback provided in the former studies, which reported that the visual feedback was not always accurate or available. For example, Hoppe et al. [13] used a Leap Motion [24] to track hands and reported that it was able to display the hands only about 70% of the time.

Walker et al. [18] and Grubert et al. [11] both provided visual feedback on keypress. However, Walker et al. reported a 40% higher entry speed than Grubert et al. This is likely because Walker et al. used a decoder to enhance the accuracy of their system and included an extensive training session before the actual study. Further, in addition to highlighting the currently pressed key as Grubert et al. [11] did, Walker et al.’s [18] system also highlighted other recently pressed keys to aid users’ sense of proprioception. These factors could also help explain its 45% lower error rate (15.2% [11] vs. 8.4% [18]). In fact, Walker et al.’s system yielded an even lower error rate with auto-correction (2.6%).

Studies comparing visual feedback through animated fingertips and full hand representations found that fingertips yield a slightly higher accuracy rate than full hands [10, 11]. This gain in accuracy is likely because a full hand obstructs the keys more than fingertips do. Interestingly, Grubert et al. [11] found this difference to be statistically significant (p < .05), while Knierim et al. [10] did not.

Table 3 presents the error rate metrics reported in the literature. Most studies reported either the Error Rate (ER) or the Total Error Rate (TER) metric. However, text entry studies often use different methods to calculate the same error metric. For instance, Knierim et al. [10] used the Minimum String Distance (MSD) algorithm to count the total number of errors, whereas Rajanna et al. [25] counted the total number of backspaces. A previous work [9] analyzed these performance metrics and demonstrated how different metrics yield different results. Unfortunately, most studies do not report how they calculate errors.
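To make the contrast between the two counting methods concrete, the sketch below computes MSD as the Levenshtein distance between the presented and transcribed strings and normalizes it by the longer string, a common way of expressing an MSD-based error rate. Unlike a backspace count, which measures corrective effort, this figure only reflects errors left in the final text. The exact normalization used by each cited study is not stated, so treat this as an illustration rather than their procedure.

```python
def msd(a: str, b: str) -> int:
    """Minimum string distance (Levenshtein distance): the fewest character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def msd_error_rate(presented: str, transcribed: str) -> float:
    """MSD error rate in percent, normalised by the longer of the two strings."""
    longest = max(len(presented), len(transcribed))
    return 100.0 * msd(presented, transcribed) / longest if longest else 0.0


print(msd("quickly", "qucehkly"), msd_error_rate("quickly", "qucehkly"))  # 3 37.5
```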

Table 3. Error rate metrics reported in the literature.

3 Virtual Qwerty

Numerous virtual text entry techniques have been developed for VR to enable mobility and eliminate the dependence on physical keyboards and keypads. Most of these techniques use the Qwerty layout or its variants to facilitate the transfer of knowledge from physical to virtual keyboards. These techniques exploit a variety of interaction methods, including head pointing, finger, wrist, and hand gestures, game controllers, touch, eye gazing, and handwriting. The following sections review these techniques. Table 4 presents the entry speed and error rates [9] of these techniques as reported in the literature.

Table 4. Virtual text entry techniques with their entry speed in words per minute (wpm) and error rate (%). “Char” signifies “character”. Sensors are not counted as auxiliary devices.

3.1 Head Pointing

Since head pointing is the default interaction method for most HMDs, it has been widely used for text entry. Head pointing techniques cast a ray into the scene that is controlled by head movements, and a virtual Qwerty keyboard floats in front of the user (Fig. 2). To enter a character, users first move the cursor over a key, then select it by either dwelling on it for a predetermined amount of time (a timeout period) [26] or pressing a controller key [26, 27].
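The pointing step described above reduces to a ray-plane intersection followed by a grid lookup. The sketch below assumes a flat keyboard plane facing the user at a fixed depth, with unstaggered rows and hypothetical dimensions (0.1 m keys, 1 m away); real systems take the ray origin and direction from the HMD pose and may use different geometry.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def ray_plane_hit(origin, direction, plane_z):
    """Intersect a head ray with the plane z = plane_z.

    Returns the (x, y) hit point, or None if the ray is parallel to the
    plane or points away from it.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-9:
        return None
    t = (plane_z - oz) / dz
    if t <= 0:
        return None
    return (ox + t * dx, oy + t * dy)


def key_under_cursor(hit, top_left=(-0.5, 0.15), key_size=0.1):
    """Map a hit point on the keyboard plane to a key, or None if it misses."""
    if hit is None:
        return None
    col = int((hit[0] - top_left[0]) // key_size)
    row = int((top_left[1] - hit[1]) // key_size)  # rows run downwards
    if 0 <= row < len(ROWS) and 0 <= col < len(ROWS[row]):
        return ROWS[row][col]
    return None


# Head at the origin, keyboard 1 m ahead; looking slightly left of centre lands on "g".
print(key_under_cursor(ray_plane_hit((0.0, 0.0, 0.0), (-0.05, 0.0, 1.0), plane_z=1.0)))
```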

Fig. 2. Text entry through head pointing [27].

Yu et al. [26] evaluated a head pointing technique that used a 400 ms dwell time. It yielded an entry speed of 10.6 wpm and a 95.8% accuracy rate. Since dwelling can limit the overall entry speed, Majaranta et al. [28] suggested using a customizable dwell time. Yu et al. [26] also evaluated an approach in which users select characters by pressing a controller key instead of dwelling. It reached 15.6 wpm and a 98% accuracy rate by the 6th session, suggesting that head pointing is faster with keypress than with dwelling. Although these approaches could cause physical discomfort, since they force users to constantly move their heads, they could be effective for short text entry tasks, such as entering a password or a search keyword [26].
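Dwell-based selection can be sketched as a per-frame timer. The dwell time is a constructor parameter, in the spirit of the customizable dwell time suggested above, with 400 ms as the default to match Yu et al.’s setting. The class name and the 100 ms frame interval in the example are hypothetical.

```python
class DwellSelector:
    """Dwell-based selection: a key is selected once the cursor has stayed on it
    for `dwell_ms` milliseconds."""

    def __init__(self, dwell_ms: float = 400):
        self.dwell_ms = dwell_ms
        self.current = None     # key under the cursor
        self.entered_at = 0.0   # time the cursor entered that key (ms)
        self.fired = False      # stops one dwell from selecting the key repeatedly

    def update(self, key, now_ms):
        """Call once per frame with the key under the cursor (or None)."""
        if key != self.current:
            self.current, self.entered_at, self.fired = key, now_ms, False
            return None
        if key is not None and not self.fired and now_ms - self.entered_at >= self.dwell_ms:
            self.fired = True
            return key
        return None


selector = DwellSelector(dwell_ms=400)
frames = [("a", t) for t in range(0, 600, 100)] + [("b", t) for t in range(600, 1100, 100)]
print([selector.update(k, t) for k, t in frames])
# [None, None, None, None, 'a', None, None, None, None, None, 'b']
```

Because every character costs at least one full dwell period, the dwell time places a hard ceiling on entry speed, which is consistent with the keypress variant being faster.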

3.2 Finger, Wrist, and Hand Gestures

Many have explored text entry with finger, wrist, and hand gestures in VR. These methods usually use external cameras, sensors, and gloves to track fingers, wrists, and hands.

Bowman et al. [29] developed a digital glove that maps the Qwerty layout to the fingers. For example, it maps the home-row letters “a”, “s”, “d”, and “f” to the left little, ring, middle, and index fingers, respectively. With this approach, users first rotate their hands to select a row, then enter a character by pinching the thumb and another finger. In a user study, this method reached an average entry speed of 6.1 wpm and a 90% accuracy rate [20, 21]. Its lack of haptic feedback may have contributed to its relatively low speed and accuracy. The KITTY keyboard [30] uses a similar approach, but instead of hand rotation, it uses the thumb, with six degrees of freedom, to select a row. Three positions on the front and three on the back of the thumb are assigned to different rows. For example, a pinch between the left little finger and the middle inner position of the thumb enters the letter “a”. This method also lacks haptic feedback. Wu et al. [31] designed a different approach that uses micro speakers to simulate haptic feedback on keypress. With this approach, users wear a data glove on each hand and enter a character by bending a finger beyond a predetermined threshold.
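The rotate-then-pinch scheme can be sketched as two lookups: hand rotation picks a Qwerty row, and the pinched finger picks a column. The rotation thresholds, the sign convention, and the omission of the inner columns (for example “g” and “h”) are simplifying assumptions, not details of Bowman et al.’s glove.

```python
# Qwerty rows and a finger-to-column assignment for a pinch glove. The
# rotation thresholds and the omission of the inner columns (e.g. "g", "h")
# are simplifying assumptions, not details of Bowman et al.'s glove.
ROWS = {"top": "qwertyuiop", "home": "asdfghjkl;", "bottom": "zxcvbnm,./"}
LEFT_FINGERS = ["little", "ring", "middle", "index"]   # columns 0-3
RIGHT_FINGERS = ["index", "middle", "ring", "little"]  # columns 6-9


def row_from_rotation(degrees: float, threshold: float = 30.0) -> str:
    """Select a row from hand rotation: rotate past +threshold for the top row,
    past -threshold for the bottom row, otherwise the home row."""
    if degrees > threshold:
        return "top"
    if degrees < -threshold:
        return "bottom"
    return "home"


def pinch_to_char(hand: str, finger: str, row: str) -> str:
    """Character entered by pinching the thumb against `finger` on `hand`."""
    letters = ROWS[row]
    if hand == "left":
        return letters[LEFT_FINGERS.index(finger)]
    return letters[6 + RIGHT_FINGERS.index(finger)]


print(pinch_to_char("left", "little", row_from_rotation(0)))    # a
print(pinch_to_char("right", "little", row_from_rotation(45)))  # p
```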

The main challenge of these techniques is that they demand substantial time and effort to master. The digital gloves make them costly for both manufacturers and consumers and prevent users from using their hands for secondary tasks. Further, they can strain the fingers when used for extended periods.

Ishii et al. [32] proposed a fist-pointer method, where hand movements and fist gestures are used to select characters. First, users move the pointer by moving the hand in a thumbs-up position. Once the pointer is over the intended character, they select it by folding the thumb. Ishii et al., however, did not evaluate this approach.

Some have also explored mid-air techniques that enable users to select characters by performing hand gestures and finger postures [27, 33]. These techniques, too, do not provide haptic feedback. In an evaluation, this approach yielded a relatively low 9.8 wpm and a 92.5% accuracy rate. It was also mentally and physically demanding.

3.3 Controllers

Some techniques enable users to enter text using handheld controllers augmented with motion trackers.

Speicher et al. [27] evaluated four such techniques. The first uses a controller as a laser pointer: users move the cursor over a keyboard by pointing the controller, then select the character under the cursor by pressing a key. Speicher et al. allowed users to hold a controller in each hand to facilitate bimanual input. This technique yielded on average 15.4 wpm and a 99% accuracy rate. The second uses a controller as a stylus: users tap the controller on a character to enter it. It yielded 12.7 wpm and a 98.1% accuracy rate. The third uses a controller as a joystick: users navigate the cursor over a keyboard by pressing the four directional keys (i.e., the four edges of a touchpad), then select the character under the cursor by pressing a key. It yielded 5.3 wpm and a 77.2% accuracy rate. The fourth is identical to the third but uses continuous cursor control instead of discrete movement. It yielded 8.4 wpm and an 87.8% accuracy rate. While the first two techniques are competitive in terms of speed and accuracy, they can cause physical stress in extended use. Results revealed that the latter two techniques were the least physically demanding but the most frustrating due to their slower entry speed.
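The discrete joystick technique can be sketched as a cursor stepping across a Qwerty grid. The start key, the unstaggered grid, and the clamping at the edges are assumptions made for illustration; the exact behavior of Speicher et al.’s implementation is not described here.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move_cursor(row, col, direction):
    """One discrete step; the cursor is clamped to the keyboard edges."""
    dr, dc = MOVES[direction]
    row = min(max(row + dr, 0), len(ROWS) - 1)
    col = min(max(col + dc, 0), len(ROWS[row]) - 1)  # rows have different lengths
    return row, col


def type_with_dpad(commands, start=(1, 4)):
    """Replay directional presses and 'select' presses; return the typed text."""
    row, col = start
    typed = []
    for command in commands:
        if command == "select":
            typed.append(ROWS[row][col])
        else:
            row, col = move_cursor(row, col, command)
    return "".join(typed)


# Starting on "g", typing "hi" takes six presses.
print(type_with_dpad(["right", "select", "up", "right", "right", "select"]))  # hi
```

The press count grows with the distance between consecutive letters, which helps explain why this technique was the slowest of the four.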

Min et al. [34] designed an ambiguous Qwerty keyboard that arranges the keys into a 3 × 3 grid (Fig. 3). With this approach, users first select a cell by pressing a button, then select the target character by pressing the button once or multiple times, depending on the position of the character in the cell (as in Multi-tap [6]). For instance, to enter the letter “p”, users first select the top-right cell, then press the button twice. Owing to its smaller size, this keyboard saves space, leaving extra room for work, which is important in immersive environments [8]. This method has not yet been evaluated.

Fig. 3. An ambiguous Qwerty keyboard that arranges the keys into a 3 × 3 grid [34].

3.4 Touch-Based Techniques

Some have leveraged the popularity of touch-based interaction for text entry in VR.

Gugenheimer et al. [35] attached a 17.78 cm (7-inch) capacitive touchpad to the back of an HMD. Users enter text by selecting characters on a floating virtual keyboard using the touchpad. The authors argued that users can use their sense of proprioception to select the correct keys. In an informal evaluation, this approach reached 10 wpm. However, it requires users to interact with a touchpad on the back of the HMD, limiting its use to a few scenarios. It can also cause physical stress when used for an extended period.

Kim and Kim [36] used the hover functionality of a Samsung Galaxy S4 smartphone [37] to enable text entry through its touchscreen. This approach detects the finger with the hover sensor and displays its position over a virtual keyboard. Users select a character by either touching the touchscreen or moving the finger beyond the range of the sensor. The former approach enables users to reposition their fingers to correct a selection before leaving the touchscreen. In a user study, these methods reached 7.8 and 9.0 wpm and 79.5% and 92.6% accuracy rates, respectively. The latter approach was more accurate since it enabled users to correct their selections, but it also caused additional physical stress. Although the entry speed of these techniques is relatively low, the concept of using hover for text entry in VR is promising.

3.5 Eye Gazing

Many recent HMDs are equipped with eye trackers, making it possible to use eye movements to control the cursor.

Rajanna and Hansen [25] studied gaze typing in VR with flat and curved virtual keyboards. Users selected characters by either dwelling on a virtual key for 520 ms or pressing a controller key. The former approach yielded 9.4 and 7.5 wpm, and the latter 10.2 and 9.2 wpm, for the flat and curved keyboards, respectively. Accuracy rates for both exceeded 99%. No significant difference was identified between the curved and flat keyboards.

Ma et al. [38] combined a Brain-Computer Interface (BCI) with eye gaze for text entry in VR, using electrical signals from the brain together with eye gaze to determine cursor position and selection. Unlike most methods reviewed in this work, this method did not use Qwerty; instead, it used an alphabetic layout with 3 rows of 8 characters each (e.g., the first row contains the letters from “a” to “h”). An informal study reported a relatively low entry speed of 10.0 wpm. Yet, this approach could be useful to users with physical disabilities.

3.6 Word-Level Techniques

Word-level text entry techniques have also been explored in VR [26, 39].

Yu et al. [26] investigated gesture typing [40], where users press and hold a controller key to indicate the start of a gesture, perform the gesture using head movements, then release the key to indicate its end. In a user study, this approach reached 24.7 wpm with a 94.2% accuracy rate by the 8th session. The accuracy of this approach, however, depends on the efficiency of its decoder. Further, it has a high physical demand, since users must draw gestures with expressive head movements.
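The decoder’s role can be illustrated with a deliberately simplified shape-matching sketch: the gesture path and each word’s “ideal” path through the key centers are resampled to the same number of evenly spaced points, and the word with the smallest mean point distance wins. Practical gesture-typing decoders, presumably including the one used by Yu et al., also use location information and a language model, so this is not their method. The sketch further assumes that the head-pointing cursor trace has already been projected onto 2D keyboard coordinates (in key widths); the key geometry and lexicon are hypothetical.

```python
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
STAGGER = [0.0, 0.25, 0.75]
KEY_XY = {ch: (col + STAGGER[r], float(r))
          for r, row in enumerate(ROWS) for col, ch in enumerate(row)}


def resample(points, n=32):
    """Resample a polyline to n points evenly spaced along its length."""
    pts = [tuple(p) for p in points]
    total = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    if total == 0:
        return [pts[0]] * n
    step = total / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    out += [pts[-1]] * (n - len(out))  # pad if rounding left us short
    return out[:n]


def shape_distance(a, b):
    """Mean point-to-point distance between two equally resampled paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def decode_gesture(path, lexicon):
    """Return the word whose key-centre template is closest to the gesture path."""
    gesture = resample(path)
    return min(lexicon, key=lambda w: shape_distance(gesture, resample([KEY_XY[c] for c in w])))


lexicon = ["hello", "help", "world", "word", "text", "vr"]
# A gesture passing slightly off the key centres of w-o-r-l-d
path = [(x + 0.1, y - 0.1) for x, y in (KEY_XY[c] for c in "world")]
print(decode_gesture(path, lexicon))  # world
```

Similar words such as “hello” and “help” produce nearly identical templates, which illustrates why the accuracy of gesture typing depends so heavily on the decoder.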

Poupyrev et al. [41] and González et al. [21] explored handwriting in VR, where users write on a physical tablet using a stylus and the output is displayed on a virtual notepad. This method, however, yielded a low entry speed and accuracy rate: 2.3 wpm and 77%, respectively [21].

4 Novel Virtual Techniques

Many have designed novel keyboard and keypad layouts to facilitate text entry in VR. This section reviews all such techniques.

4.1 Circular and Cubic Layouts

González et al. [21] evaluated a circular layout that arranges the letters in alphabetical order. The keyboard is displayed on a tablet. Users first select a character on the circle using the stylus, then confirm the selection by dragging it to the center of the circle. This approach yielded an entry speed of 4.4 wpm and a 98% accuracy rate.

Yu et al. [42] developed PizzaText, which divides a circle into 7 slices, each containing 4 characters (Fig. 4). Users interact with this layout using the dual thumbsticks of a game controller [43]: the right thumbstick moves around the circular keyboard, and the left thumbstick selects characters. In a user study, this approach reached 15.9 wpm and a 94.6% accuracy rate by the end of the 5th session. Yu et al. [42] evaluated this layout in three different sizes but did not find a significant difference between them in terms of speed or accuracy.
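The two-thumbstick interaction can be sketched by converting each stick deflection into an angle. In this sketch, the right stick’s angle picks one of the 7 slices, and pushing the left stick up, right, down, or left picks one of the slice’s 4 characters. The character order, the dead zone, and the direction-to-character scheme are all assumptions; the actual PizzaText assignment is not given here.

```python
import math

# Hypothetical assignment of 28 characters to 7 slices of 4; the actual
# PizzaText ordering is not given in this review.
CHARS = "abcdefghijklmnopqrstuvwxyz ."
SLICES = [CHARS[i:i + 4] for i in range(0, len(CHARS), 4)]


def stick_angle(x, y, dead_zone=0.5):
    """Clockwise angle in degrees from 'up', or None inside the dead zone."""
    if math.hypot(x, y) < dead_zone:
        return None
    return math.degrees(math.atan2(x, y)) % 360


def slice_index(x, y):
    """Right thumbstick: which slice the stick points at."""
    angle = stick_angle(x, y)
    if angle is None:
        return None
    return int(angle // (360 / len(SLICES))) % len(SLICES)  # guard against 360.0


def char_in_slice(slice_idx, x, y):
    """Left thumbstick: up, right, down, left pick the 1st to 4th character."""
    angle = stick_angle(x, y)
    if slice_idx is None or angle is None:
        return None
    return SLICES[slice_idx][int(((angle + 45) % 360) // 90)]


print(char_in_slice(slice_index(0, 1), 1, 0))    # b  (slice "abcd", push right)
print(char_in_slice(slice_index(0, -1), 0, -1))  # o  (slice "mnop", push down)
```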

Fig. 4. With the PizzaText keyboard [42], users traverse the keyboard using the right thumbstick and select characters using the left thumbstick.

4.2 3D Layouts

Most keyboards for VR are 2D, although virtual environments are 3D. The Cubic keyboard [44] is a 3D keyboard that arranges the letters in a 3 × 3 × 3 (H × W × D) array. It has 27 cells: 26 for the letters of the English alphabet and a blank cell at the center. Users navigate through the cells with a controller to select a character. In a pilot study, this approach yielded a competitive entry speed of 21.7 wpm, which warrants further exploration of 3D keyboard layouts for VR.
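The 3 × 3 × 3 structure can be made concrete by indexing the cells with (H, W, D) coordinates and leaving the center empty. The alphabetical filling order below is an assumption for illustration, not the Cubic keyboard’s actual layout. The step count assumes the cursor moves along one axis at a time.

```python
import string
from itertools import product

# Fill the 3 x 3 x 3 grid in (H, W, D) order, leaving the centre cell blank.
# The alphabetical filling order is an assumption, not the Cubic keyboard's layout.
CELLS = [cell for cell in product(range(3), repeat=3) if cell != (1, 1, 1)]
LETTER_AT = dict(zip(CELLS, string.ascii_lowercase))
CELL_OF = {letter: cell for cell, letter in LETTER_AT.items()}


def move(cell, step):
    """Move the cursor by `step` along each axis, clamped to the grid."""
    return tuple(min(max(c + s, 0), 2) for c, s in zip(cell, step))


def steps_between(a: str, b: str) -> int:
    """Single-axis steps needed to travel from letter a to letter b."""
    return sum(abs(p - q) for p, q in zip(CELL_OF[a], CELL_OF[b]))


print(len(CELLS), LETTER_AT[(0, 0, 0)], LETTER_AT[(2, 2, 2)], steps_between("a", "z"))  # 26 a z 6
```

With at most six single-axis steps between any two letters, such a layout keeps navigation distances short.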

4.3 Hand-Based Techniques

Some have proposed hand-based approaches that map characters onto fingers to reduce the reliance on external hardware.

The BlueTap keyboard [45] maps the letters onto the fingers in alphabetical order, with at most 4 characters per finger (Fig. 5). Users tap on different parts of the fingers with the thumbs to select characters. This approach uses a wrist-worn camera to detect the taps.

Fig. 5. From left, the BlueTap keyboard [45] and the standard 12-key mobile keypad [46] mapped onto the fingers.

Prätorius et al. [46] proposed another approach that maps the standard mobile keypad onto the index, middle, and ring fingers. Each knuckle or fingertip holds up to 4 letters (Fig. 5). As in Multi-tap [6], users tap on the knuckle or fingertip once or multiple times, depending on the position of the character. This approach uses a wrist-worn camera and an accelerometer to detect the taps. In a pilot study, it yielded 10 wpm.

Ogitani et al. [47] designed a 12-key mobile keypad layout that maps up to 4 characters to each key. With this layout, users tap on the key containing the target character, then swipe in the direction of the character to enter it (similar to an existing tablet keyboard [48]). They evaluated two techniques using this keypad. The first projects the keypad onto the palm, and users select characters with the index finger of the other hand. The second displays the keypad in mid-air, and users select characters with their hands. Both techniques display an animated representation of the hand in the virtual world. These techniques yielded 5.6 wpm and 8.2 wpm, respectively. Interestingly, the authors reported that a projected Qwerty keyboard yielded a higher entry speed than the keypad, but they did not provide further insights into this. We speculate that this could be because all participants were familiar with the Qwerty layout.
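The tap-then-swipe selection can be sketched as classifying the finger’s displacement after touching a key by its dominant axis. The letter groups, the placement of letters in the four swipe directions, and the 2 cm minimum swipe length are all assumptions for illustration. In this sketch, a bare tap without a swipe enters nothing.

```python
import math

# Phone-keypad letter groups placed in the four swipe directions. Both the
# grouping and the direction order are assumptions for illustration.
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
DIRECTIONS = ["up", "right", "down", "left"]


def swipe_direction(dx, dy, min_swipe=0.02):
    """Classify a swipe by its dominant axis (y points up); None for a bare tap."""
    if math.hypot(dx, dy) < min_swipe:
        return None
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"


def flick_char(key, dx, dy):
    """Character selected by touching `key` and swiping by (dx, dy)."""
    direction = swipe_direction(dx, dy)
    if direction is None:
        return None
    letters = KEYS[key]
    index = DIRECTIONS.index(direction)
    return letters[index] if index < len(letters) else None


print(flick_char("7", -0.05, 0.0), flick_char("2", 0.0, 0.05))  # s a
```

Unlike Multi-tap, every character here costs exactly one touch and one swipe, but each still requires two distinct actions.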

Although hand-based techniques do not require external devices and free up virtual real estate, they do not provide haptic feedback and can cause physical stress. Besides, these techniques require users to perform multiple actions to enter a single character, which is inherently slow and can therefore degrade the overall text entry experience.

4.4 Discussion

Table 4 shows that both Speicher et al. [27] and Yu et al. [26] investigated head pointing coupled with a controller, but Speicher et al.’s entry speed was about 35% lower than Yu et al.’s (10.2 wpm vs. 15.6 wpm). We believe that the use of a predictive system contributed to this, since Yu et al. augmented their system with predictive features, including word suggestion and completion. Both studies reported comparable error rates.

Eye gazing [25] and head pointing [26] coupled with dwelling yielded comparable entry speeds (9.4 wpm and 10.6 wpm). Head pointing was slightly faster (eye gazing was about 11% slower), most probably due to its shorter dwell time (400 ms vs. 550 ms). The similarity between the two is further established when each is coupled with a controller: eye gazing [25] and head pointing [27] both yielded 10.2 wpm. This is most probably because both techniques move the cursor in a similar way, one with the head and the other with the eyes. However, it is worth noting that eye gazing has a lower physical demand than head pointing. Since Yu et al. [26] showed that word-based input using head gestures reaches 24.7 wpm, compared with 15.6 wpm for character-level head pointing with a controller, it would be worth investigating how this method performs with eye gazing.

Both Speicher et al. [27] and Yu et al. [42] used directional control: the former used a directional pad with Qwerty, and the latter used thumbsticks with a circular keyboard. Although the techniques differ, it is worth mentioning that Qwerty with directional control yielded a 67% lower entry speed than the circular keyboard (5.3 wpm vs. 15.9 wpm). This is likely due to the compact circular layout, which is well suited to navigation with directional control. However, the higher error rate of the circular keyboard (5.5% vs. 2.8%) is likely due to users’ unfamiliarity with the novel layout.

Interestingly, a smartphone virtual Qwerty [36] yielded a 26% lower entry speed than an embossed mobile keypad [21] (9 wpm vs. 12.1 wpm), even though the keypad used Multi-tap. We believe this could be attributed to users’ familiarity with feature phones at that time (2009) and the haptic feedback afforded by the embossed keys.

Table 5 presents the error rate metrics reported in the literature. Most studies reported either the Error Rate (ER) or the Total Error Rate (TER) metric. The studies that reported TER also reported the Corrected Error Rate (CER) and Uncorrected Error Rate (UER) metrics. Only one study reported CER alone [25]. However, text entry studies often use different methods to calculate the same error metric. Knierim et al. [10], for instance, used the Minimum String Distance (MSD) algorithm to count the total number of errors, whereas Rajanna et al. [25] counted the total number of backspaces. A prior work [9] analyzed these metrics and showed how different metrics yield different results. Nevertheless, most studies do not report how they calculate errors.
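The relationship between TER, CER, and UER can be shown with the character classes commonly used to define them in the text entry literature. These classes are correct characters (C), incorrect characters left unfixed (INF), and incorrect characters that were fixed (IF). The sketch assumes these counts have already been extracted from a logged input stream, which is the hard part in practice. Note that IF counts erased erroneous characters, which is not the same as the number of backspaces pressed.

```python
from dataclasses import dataclass


@dataclass
class CharacterCounts:
    correct: int              # C: correct characters in the transcribed text
    incorrect_not_fixed: int  # INF: errors remaining in the transcribed text
    incorrect_fixed: int      # IF: erroneous characters that were later erased


def error_rates(c: CharacterCounts) -> dict:
    """Corrected, uncorrected, and total error rates in percent."""
    total = c.correct + c.incorrect_not_fixed + c.incorrect_fixed
    if total == 0:
        return {"CER": 0.0, "UER": 0.0, "TER": 0.0}
    cer = 100.0 * c.incorrect_fixed / total
    uer = 100.0 * c.incorrect_not_fixed / total
    return {"CER": cer, "UER": uer, "TER": cer + uer}


print(error_rates(CharacterCounts(correct=45, incorrect_not_fixed=1, incorrect_fixed=4)))
# {'CER': 8.0, 'UER': 2.0, 'TER': 10.0}
```

By construction, TER is the sum of CER and UER, so a study reporting TER can report the other two at no extra cost.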

Table 5. Error rate metrics reported in the literature.

5 Hand Representation

Grubert et al. [11] investigated the effects of hand representation on text entry in four conditions: none, animated hands, fingertips, and a video inlay of the hands. They did not find a significant effect of hand representation on entry speed. However, fingertips and video inlay were significantly more accurate, and the task load scores for video inlay were significantly lower. McGill et al. [12] reported similar results. Knierim et al. [10] also compared hand representations in four conditions: none, realistic hands, abstract hands, and fingertips. Surprisingly, they found that hand representation did not affect entry speed or accuracy for experienced typists but did affect entry speed for inexperienced ones. Specifically, inexperienced typists were significantly slower with no hand representation than with abstract hands.

A different study [50] compared male, female, and robotic hand representations. Results revealed that female participants preferred female hands over male and robotic hands, while male participants were mostly neutral.

6 Conclusion

This paper categorized the existing text entry techniques for VR based on their input mechanism. It discussed their strengths, limitations, and overall performance. It also highlighted important design considerations for the development of more effective text entry techniques. The goal of this paper is to help researchers understand how these techniques work, compare their performance, and identify and address the gaps in this body of work.

6.1 Future Work

This work highlighted that most existing text entry techniques for VR are adaptations of techniques that were designed and optimized for other form factors. As a result, these techniques fail to address all the needs and challenges of this paradigm [27]. In VR, speed and accuracy alone do not fully reflect the effectiveness of a text entry method; usability, learnability, fatigue, and space requirements must also be taken into consideration. Further, as we continue to seek solutions for virtual office spaces, it is important to consider methods that are not only efficient but also portable.

Due to the lack of an effective alternative, physical Qwerty remains an important tool in VR [8]. Although it is substantially faster than most alternatives, it compromises mobility. Further, it relies on expensive sensors to track hands and fingers for visual feedback, which hinders its widespread use. The reliability of these sensors also poses a challenge [13, 27], as most popular tracking devices are error-prone [47]. Hence, cheaper and more reliable tracking devices are needed for physical Qwerty to be fully embraced in VR. Embedding live video streams of the keyboard and hands in VR is an effective alternative [11, 12]. However, further investigation is needed to identify how much of the real world the video stream can show without compromising immersion. Methods for seamlessly blending the virtual and real worlds must also be explored.

In addition, the effects of various keyboard properties have not yet been fully studied. Yu et al. [42] investigated circular keyboards of different sizes, and Rajanna and Hansen [25] investigated keyboards of different shapes for gaze typing. However, the effects of the size, position, type (3D vs. 2D), and shape (flat vs. curved) of different types of keyboards in virtual space remain unexplored.

Good user experience is an important ingredient in successful technologies. This review showed that different approaches can be effective in different scenarios [21, 27]. Hence, the possibility of using different text entry solutions for different scenarios must be explored. For instance, users could use a pointing-based or an ambiguous technique for short-term text entry, then switch to a physical keyboard for heavy text entry sessions. It is also essential that we design alternative text entry techniques specifically for 3D environments, such as the Cubic keyboard [44]. Circular and touch-based keyboards also demand further exploration [36, 42].