1 Introduction

Ultra-small touch screen devices (henceforth referred to as ultra-small devices), such as smartwatches, must be small and lightweight so that they can be worn on the body comfortably. For this reason, the input area of such devices is limited, and thus ultra-small devices are more prone to occlusion and the fat finger problem [16] than smartphones or tablets. As a result, users often have difficulty selecting the correct keys and thus entering text, which makes text entry on ultra-small devices impractical. To address this problem, we present Flickey (Fig. 1), a flick-based QWERTY software keyboard for ultra-small devices. The flick-based selection mechanism of Flickey, in combination with its callout technique, allows users to select a tiny key on the small keyboard more easily than with a tap.

In this work, we developed a prototype of Flickey and conducted a comparative experiment with two existing keyboards under three size conditions to investigate the performance and usability of Flickey. The results suggest that Flickey shows high performance when the size of the keyboard becomes small.

2 Related Work

Numerous text entry methods for ultra-small devices have been explored. Among them, many researchers have adopted a QWERTY layout on ultra-small devices in designing their input methods; this is because many users are familiar with the QWERTY layout, and thus the learning cost of input methods can be lowered. For example, in ZoomBoard [14], users use touch gestures which trigger iterative zooming (i.e., visual magnification) until a certain level of zoom is reached, where users can press a key easily with their finger. In Swipeboard [3], users use a swipe gesture to select a key, which eliminates ambiguous selection. WatchWriter [6] uses gesture typing to enable users to enter a word per gesture. SplitBoard [8] splits a QWERTY keyboard into two parts to increase the size of each key. DriftBoard [15] is a panning-based input technique using a movable QWERTY keyboard and a fixed cursor point. ZShift [11] uses a callout to display a copy of the occluded area to eliminate the occlusion caused by a finger, thus enabling text entry using a QWERTY layout on ultra-small devices.

By contrast, some keyboards do not use a QWERTY layout. Komnios and Dunlop [9] used a specialized keyboard with six large keys and adopted alternative/next word predictions based on a dictionary to realize text entry on ultra-small devices. DragKeys [4] uses a specialized keyboard that consists of eight circularly arranged large keys, each of which contains multiple small keys. To enter a character, users select a large key with a dragging gesture and then select a small key within the selected large key using another dragging gesture.

In this work, we use a QWERTY layout to reduce the learning cost and adopt the same approach as ZShift, which uses a callout to avoid the occlusion problem. In addition, we use a flick as a key selection trigger to eliminate ambiguous selection.

Fig. 1. Flickey. (a) A user entering text using Flickey with his/her index finger. (b) Flickey compared with a 10 JPY coin (diameter: 20 mm).

3 Flickey

Flickey is a flick-based QWERTY software keyboard that uses a callout technique (similar to ZShift [11]) and adopts a flick as a key selection trigger. This flick-based selection mechanism, in combination with the callout technique, allows users to select a tiny key on the small keyboard more easily than with a tap.

Fig. 2. Text entry procedure of Flickey. To enter a character, (a) users select the column of the key with a touch-down. (b) If the selected column is not the intended one, users can change the selected column by moving their finger to the right or left. (c) Users select the key by a touch-up or a flick to enter the character.

Figure 2 illustrates the text entry procedure of Flickey, which involves two steps for entering a character. First, users select the column of the key by a touch-down (Fig. 2a). If the column selected at the first touch-down is not the intended one, users can change the selected column by moving their finger to the right or left (Fig. 2b). Second, users select a key by a touch-up or a flick (Fig. 2c): the middle row is selected by a touch-up, and the upper row or lower row is selected by a flick-up or flick-down, respectively.

This design allows users to select a tiny key on the small keyboard more easily than with a tap. In the first step, users concentrate on selecting the horizontal position of the key, which eliminates concern over the vertical position of the finger; in the second step, users simply have to select the row of the key with a touch-up, flick-up, or flick-down, which eliminates the need for precise vertical positioning. In addition, Flickey displays the current key selection in a callout placed above the keyboard to remedy the fat finger problem.
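
The two-step procedure can be sketched in Python as follows. This is a minimal illustration rather than the prototype's implementation: the function names, the left-aligned row layout, and the use of a vertical displacement threshold to detect a flick are all assumptions introduced here.

```python
# Hypothetical sketch of Flickey's two-step key selection.
# The row alignment and the threshold-based flick test are assumptions.

ROWS = {
    "upper": "qwertyuiop",
    "middle": "asdfghjkl",
    "lower": "zxcvbnm",
}


def select_column(x: float, keyboard_width: float, n_columns: int = 10) -> int:
    """Step 1: map the horizontal touch position to a column index."""
    column = int(x / keyboard_width * n_columns)
    # Clamp so that touches on the keyboard edges still select a column.
    return min(max(column, 0), n_columns - 1)


def select_row(dy: float, flick_threshold: float) -> str:
    """Step 2: classify the release as a touch-up, flick-up, or flick-down.

    dy is the vertical finger displacement at release; negative values
    point upward, following the usual screen coordinate convention.
    """
    if dy <= -flick_threshold:
        return "upper"   # flick-up
    if dy >= flick_threshold:
        return "lower"   # flick-down
    return "middle"      # plain touch-up


def select_key(x: float, dy: float, keyboard_width: float,
               flick_threshold: float):
    """Return the selected character, or None if the row has no key there."""
    row = ROWS[select_row(dy, flick_threshold)]
    column = select_column(x, keyboard_width)
    return row[column] if column < len(row) else None
```

For instance, with a keyboard width of 10 units and a threshold of 2 units, `select_key(0.5, 0.0, 10.0, 2.0)` selects the leftmost key of the middle row. Only the horizontal position at release and the sign of the vertical movement matter, which is the property the design relies on.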

Fig. 3. Smartphone attached in a landscape orientation with respect to the non-dominant hand.

4 Evaluation

We conducted an experiment to evaluate the performance and usability of Flickey, comparing it with two existing keyboards: ZoomBoard [14] and ZShift [11]. Similar to the approach of previous studies [11, 14], the participants performed text entry tasks using the three keyboards under three size conditions (Fig. 4), which allowed us to investigate their performance on a wide variety of ultra-small devices (e.g., bracelet-style devices and smartwatches).

4.1 Participants

We recruited five participants (four males and one female) aged between 21 and 22 years. All the participants majored in computer science, were right-handed, and were familiar with the QWERTY layout. Each participant received 1,640 JPY (approximately 15 USD) after completion of the experiment.

4.2 Apparatus

We implemented the three keyboards (i.e., ZoomBoard [14], ZShift [11], and ours) on an iPhone 5 smartphone (iOS 8.3, 4 in., 1,136 \(\times \) 640 pixels, 326 ppi). Similar to the approaches in [11, 14], we used a smartphone for the experiment because its touch screen is more accurate than the touch screen of existing smartwatches. The smartphone was attached in a landscape orientation to the non-dominant hand of the participant using a Velcro strap (D&M Co., Ltd.; knee wrap; 842XUD2786 BLK M) as shown in Fig. 3.

Fig. 4. Layout of the keyboards used in the experiment.

Figure 4 shows the layout of the keyboards used in the experiment. The small size of the keyboards was determined first, and then the medium (small \(\times \) 1.33) and large (small \(\times \) 1.77) sizes were determined in relation to the small size, which is the same approach as in a previous study [11]. All the keyboards are smaller than the smartphone’s QWERTY keyboard (the dimension of the small keyboard is approximately 1/20 (0.054\(\times \)) of the dimension of the QWERTY keyboard on the iPhone 6).
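
The relation between the three size conditions can be illustrated with a short Python sketch. It assumes that the 16.5 mm reported for the small keyboard in the discussion is the dimension being scaled, and it uses the 326 ppi of the experimental device to convert millimeters to pixels; the names below are hypothetical.

```python
# Illustrative only: derives the medium and large sizes from the small size.
# SMALL_MM being the scaled dimension is an assumption of this sketch.

SMALL_MM = 16.5                      # small keyboard size reported in the paper
SCALE = {"small": 1.0, "medium": 1.33, "large": 1.77}
PPI = 326                            # pixel density of the iPhone 5 screen
MM_PER_INCH = 25.4


def keyboard_sizes_mm(small_mm: float = SMALL_MM) -> dict:
    """Return the keyboard size in millimeters for each size condition."""
    return {name: small_mm * factor for name, factor in SCALE.items()}


def mm_to_pixels(mm: float, ppi: float = PPI) -> float:
    """Convert a physical length in millimeters to screen pixels."""
    return mm / MM_PER_INCH * ppi


if __name__ == "__main__":
    for name, size_mm in keyboard_sizes_mm().items():
        print(f"{name}: {size_mm:.1f} mm, {mm_to_pixels(size_mm):.0f} px")
```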

4.3 Procedure and Task

The experiment was conducted in a calm office environment. First, the purpose of the experiment was explained to the participants. In addition, they were informed that they could abort the experiment and take a break at any time. The participants were required to sign a consent form and answer a demographic questionnaire. Then, we measured the width of the index finger of each participant’s dominant hand using a digital caliper; for the measurement, the digital caliper was aligned with the distal interphalangeal joint (Fig. 5). The average width obtained was 14.3 mm (Standard Deviation = 0.8 mm), which matches the standard size for Japanese people [10].

Fig. 5. Measurement position for the index finger.

We explained the three keyboards to the participants through a short demonstration, after which the participants practiced each keyboard. First, they entered the short phrase “tsukuba taro” using each keyboard. Next, to practice the delete gesture (a left swipe on the keyboard), which deletes one character, they deleted the entered phrase. They then practiced the space gesture (a right swipe on the keyboard), which enters a space between words. Finally, they entered another short phrase, “taro”. The participants practiced the three keyboards in the same order in which they evaluated them. The practice took approximately 6 min.

After the practice, the participants entered five phrases (five trials) for one keyboard and one size in a session. The participants were instructed to enter the phrases as quickly and accurately as possible and to correct mistakes when they entered a wrong phrase. One session was performed for each condition (i.e., nine sessions were performed in total). Therefore, each participant performed 45 trials (\(=\) 3 keyboards \(\times \) 3 sizes \(\times \) 5 phrases). The conditions were presented to the participants in a random order without redundancy to counterbalance possible biases caused by the order of the conditions. The phrases were also presented to the participants in a random order. The phrases were chosen from the phrase set provided by MacKenzie et al. [12], which contains 500 phrases in English. After each session was completed, the participants were asked to report their impressions regarding the selection of the targets to the experimenter and to respond to the System Usability Scale (SUS) questionnaire [1, 2]. In this experiment, we used the Japanese version of SUS [5] because all the participants were Japanese. The participants were also asked to respond to the NASA Task Load Index (NASA-TLX) questionnaire [7], for which we used the Japanese version [13] for the same reason. Then, the participants were asked to take a break of 3 min to reduce the effects of fatigue.
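
The session structure described above can be sketched as follows. This is a hedged illustration of one way to generate a random condition order without repetition and to draw five phrases per session; the function names are hypothetical, and the phrase list is a placeholder for the phrase set of [12].

```python
import itertools
import random

KEYBOARDS = ("ZoomBoard", "ZShift", "Flickey")
SIZES = ("small", "medium", "large")


def session_order(rng: random.Random) -> list:
    """Return all nine (keyboard, size) conditions in random order.

    Shuffling the full Cartesian product guarantees that every condition
    appears exactly once, i.e., without redundancy.
    """
    conditions = list(itertools.product(KEYBOARDS, SIZES))
    rng.shuffle(conditions)
    return conditions


def trial_plan(phrases: list, rng: random.Random,
               phrases_per_session: int = 5) -> list:
    """Pair each session with randomly chosen phrases (assumed procedure)."""
    return [
        (keyboard, size, rng.sample(phrases, phrases_per_session))
        for keyboard, size in session_order(rng)
    ]


if __name__ == "__main__":
    rng = random.Random()
    placeholder_phrases = [f"phrase {i}" for i in range(500)]
    plan = trial_plan(placeholder_phrases, rng)
    print(sum(len(p) for _, _, p in plan), "trials in", len(plan), "sessions")
```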

After all sessions were complete, the participants were given a questionnaire to report their impressions of each condition. The duration of the experiment was approximately 120 min.

Table 1. Text entry speed (WPM). Standard deviations are shown in parentheses.
Fig. 6. Text entry speed (WPM).

Table 2. Character error rate (CER).

4.4 Result and Analysis

Text Entry Speed. We used Words Per Minute (WPM) as an index of text entry speed and analyzed the results using a one-way repeated measure analysis of variance (ANOVA). Table 1 and Fig. 6 show the text entry speed and the result of the ANOVA. The result indicated a significant main effect of keyboard under all size conditions. We used Tukey’s test with a significance level of 0.05 for post hoc analysis. The result revealed the following: (1) under the small condition, significant differences were found between ZoomBoard and ZShift (p < 0.05) and between ZoomBoard and Flickey (p < 0.01), (2) under the medium condition, a significant difference was found between ZoomBoard and ZShift (p < 0.001), and (3) under the large condition, significant differences were found between ZoomBoard and ZShift (p < 0.001) and between ZShift and Flickey (p < 0.001).
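
For reference, WPM can be computed as in the sketch below. The exact formula used in the analysis is not stated here, so the sketch assumes the convention common in text entry research, in which a word is defined as five characters and the first character is excluded from the timing; the function name is hypothetical.

```python
def words_per_minute(transcribed: str, seconds: float,
                     chars_per_word: int = 5) -> float:
    """Compute text entry speed in words per minute.

    Assumes the common convention WPM = (|T| - 1) / S * 60 / 5, where |T| is
    the length of the transcribed text and S is the entry time in seconds.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) - 1) / seconds * 60 / chars_per_word
```

Under this convention, entering the 12-character practice phrase “tsukuba taro” in 15 s corresponds to approximately 8.8 WPM.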

In the questionnaire, one participant stated that “I can enter text using Flickey accurately even if the size of the keyboard is small.” The results support this comment: the text entry speed of Flickey under the small condition was equivalent to that under the large condition. ZoomBoard and ZShift achieved faster text entry as the keyboard became larger, whereas Flickey achieved similar speeds for all sizes. One participant stated that “I could not operate the small size Flickey and the large size Flickey in the same way” and two participants stated that “I have to move my finger much farther than I thought to enter a key under the large condition.” This is because we changed the threshold for detecting a flick according to the size of the keyboard, which changed the usability. Thus, the text entry speed could be improved by optimizing the thresholds.

Error Rate. Table 2 shows the error rate. We used the character error rate (CER) as the index of the error rate [17] in this analysis; it is calculated as the Damerau-Levenshtein distance between the submitted text and the reference text, normalized by the number of characters in the reference text. This index indicates whether the participants corrected their wrong input. The CER was so low under all conditions that we concluded that the participants corrected their wrong input as instructed. We also analyzed the results using a one-way repeated measure ANOVA. The result indicated a significant main effect of keyboard under the small condition. We used Tukey’s test with a significance level of 0.05 for post hoc analysis. The result revealed that significant differences were found under the small condition between ZoomBoard and ZShift (p < 0.01) and between ZShift and Flickey (p < 0.01).
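
The CER computation can be sketched as follows. The sketch uses the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and transpositions of adjacent characters; whether the analysis used this variant or the unrestricted one is not stated, so this choice and the function names are assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                      # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def character_error_rate(submitted: str, reference: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return osa_distance(submitted, reference) / len(reference)
```

Because the distance is measured on the submitted text, errors that a participant corrected before submitting do not contribute to the CER.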

Table 3. Corrected error rate (Cerr). SDs are shown in parentheses.
Table 4. Usability (SUS). SDs are shown in parentheses.

We also used the corrected error rate (Cerr) as another index of the error rate [18]; it is the percentage of wrong inputs among all inputs. We analyzed the results using a one-way repeated measure ANOVA. Table 3 shows the Cerrs and the result of the ANOVA. The result indicated a significant main effect of keyboard under the large condition. We used Tukey’s test with a significance level of 0.05 for post hoc analysis. The result revealed that a significant difference was found under the large condition between ZoomBoard and Flickey (p < 0.05).
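
Following the definition given above, the Cerr can be expressed as a simple ratio. The sketch below assumes that the number of wrong inputs and the total number of inputs have already been counted from the input log; how an individual input is classified as wrong is not specified here, and the function name is hypothetical.

```python
def corrected_error_rate(wrong_inputs: int, total_inputs: int) -> float:
    """Percentage of wrong inputs among all inputs (assumed counting scheme)."""
    if total_inputs <= 0:
        raise ValueError("total_inputs must be positive")
    if not 0 <= wrong_inputs <= total_inputs:
        raise ValueError("wrong_inputs must lie between 0 and total_inputs")
    return 100.0 * wrong_inputs / total_inputs
```

Unlike the CER, this index also reflects errors that were corrected before the phrase was submitted, which is why a keyboard can show a low CER and a high Cerr at the same time.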

We analyzed the cause of the errors under the Flickey condition. We found that the errors involved selecting a key row shifted by one from the correct one. In other words, the participants selected a wrong key row because they accidentally moved their finger too far when they selected a key row. The reason for this might be that the content of the callout changes discretely in response to the current selection of a key row. Under the ZShift condition, the participants can recognize their finger position accurately when selecting a key because the content of the callout changes continuously in response to the movement of their finger. In contrast, under the Flickey condition, the participants cannot recognize their actual finger position because the content of the callout changes discretely.

Usability and Workload. We analyzed the results of SUS using a one-way repeated measure ANOVA. Table 4 shows the scores of SUS and the result of the ANOVA. The result indicated a significant main effect of keyboard under the large condition. We used Tukey’s test with a significance level of 0.05 for post hoc analysis. The result revealed that a significant difference was found under the large condition between ZShift and Flickey (p < 0.05).

Table 5. Workload (NASA-TLX). SDs are shown in parentheses.
Fig. 7. Workload (NASA-TLX).

We analyzed the results of NASA-TLX using a one-way repeated measure ANOVA. Table 5 and Fig. 7 show the scores of NASA-TLX and the result of the ANOVA. The result indicated no significant difference between any conditions.

The above results suggest the following:

  • ZShift achieved higher SUS scores as the size of keyboard became larger. This means that the participants could use ZShift as a standard QWERTY keyboard as the size of the keyboard became larger. Comments in the questionnaire support this observation. Two participants stated that “I could use ZShift as a standard QWERTY keyboard when the size of the keyboard was large.”

  • ZoomBoard achieved similar scores for both SUS and NASA-TLX under all size conditions. This means that the usability does not change even if the size of the keyboard changes. Comments in the questionnaire support this observation. Two participants stated that “The input procedure of ZoomBoard was a little bit troublesome because I always had to tap twice to enter a key.” One participant stated that “I had to look intensively at the keyboard at all times because the layout of the keyboard changed for every zoom.”

  • Flickey achieved higher NASA-TLX scores under the large condition than under the small condition. This suggests that the flick-based input procedure is unnecessarily complicated under the large condition, where the participants could have selected a key directly without a flick. This result suggests that Flickey is useful when the size of the keyboard is particularly small, as under the small condition.

4.5 Discussion

With the small keyboard (16.5 mm), Flickey achieved good text entry speed (ZoomBoard: 7.5 WPM, ZShift: 8.5 WPM, Flickey: 8.7 WPM). However, the Cerr of Flickey tended to be higher than that of the other keyboards. Nevertheless, the participants could enter text correctly (i.e., the CER was quite low) because they corrected their errors. Moreover, the SUS and NASA-TLX scores of Flickey were similar to those of the other keyboards under the small condition. These results suggest that Flickey can be used practically on ultra-small devices even though its Cerr is high.

Furthermore, the text entry speed of Flickey was high despite its high Cerr. This suggests that the text entry speed of Flickey could improve further if the Cerr is decreased. Therefore, we will explore interaction designs to decrease the Cerr of Flickey.

5 Future Work

The results of our experiment revealed several potential improvements. In our current implementation, the threshold for detecting a flick changes according to the size of the keyboard, which leads to a change in usability. Therefore, we will conduct further experiments to explore the most suitable thresholds for detecting a flick under each size condition. Furthermore, the Cerr of Flickey tended to be higher than that of the other keyboards, which we attribute to the content of the callout changing discretely in response to the current selection of a key row. To address this problem, we will modify Flickey so that the content of the callout changes continuously in response to the users’ drag operation. After these improvements, we will conduct further experiments to evaluate the performance of Flickey.

In our experiment, we used a smartphone rather than a smartwatch. This choice may have influenced the results, especially those related to the performance around the edge of the screen. Therefore, in the immediate future, we will conduct the above experiment using a real smartwatch.

6 Conclusion

We presented a flick-based QWERTY software keyboard called Flickey for ultra-small devices. Flickey enables users to enter text on ultra-small devices because the flick-based selection mechanism of Flickey, in combination with its callout technique, eliminates ambiguous selection, which allows users to select a tiny key on a small keyboard. To investigate the text entry performance and usability of Flickey, we developed a prototype of Flickey and conducted a comparative experiment with two existing keyboards. As a result, with a keyboard size of 16.5 mm, Flickey achieved a good performance (ZoomBoard: 7.5 WPM, ZShift: 8.5 WPM, Flickey: 8.7 WPM). The results suggest that Flickey shows high performance when the size of the keyboard becomes small, and thus Flickey could be used practically on ultra-small devices.