1 Introduction

Paper-based work remains part of daily life even though working with digital documents has become common, and people often need to handle several digital and paper documents to accomplish a task. In this paper, we focus on an environment in which people work with paper and a pen while referring to digital information projected onto a table by a video projector. One use case is a high school student who studies physics with an exercise book, comparing her answers in the notebook with an answer key projected on the desk and correcting wrong answers with a pen. Another scenario is a person who wants to submit a change-of-address notification to the post office and fills in the paper form while referring to projected information that is aware of his/her condition. In such tasks, people check whether a presented number and a written one are identical, transcribe projected information onto a paper document, and so on. Gaze switching between the document and the information may then reduce task efficiency if finding the desired information takes too long, and may reduce precision if incorrect information is used because of the limits of human short-term memory.

In this paper, we report an investigation of where information should be projected on the desk for efficient and precise pen-based tasks. The rest of the paper is organized as follows. Section 2 briefly introduces related work on information annotation in augmented reality. Section 3 presents an experiment that explores the relationship between work performance and the placement of information. Section 4 discusses the results, and Sect. 5 concludes the paper.

2 Related Work

Augmented Reality (AR) technology presents information near physical objects or places, or connects the two with a linkage line, which allows information to be related to the physical world [1]. View management techniques have been proposed to improve the visibility of presented information in environments with multiple labels, i.e., pieces of information [2, 4], and on various backgrounds that differ in contrast, texture, etc. [3]. This work allows a system to find a presentation position at which a user can easily perceive and read the information; however, the condition in which a user is actively involved with the information has not yet been well considered. Here, “active use” refers to the use cases described above, e.g., referring to and transcribing presented information, which require tight hand-eye coordination.

3 Experiment

We investigated the relationship between the work performance and the placement of information.

3.1 Methodology

A transcription task was chosen, in which a subject was asked to transcribe characters from the projected information into a specified cell of a printed table. Figure 1 shows a scene of this experiment. A blank paper strip is placed at a specific position on the desk, and the instruction to the subject is projected onto it by a video projector (a). Every time the subject clicks a mouse, a new instruction is presented, and he/she transcribes the characters into the designated cell of the printed table. The printed table is placed in the center of the desk (c). The position of the instruction slip is randomly selected from 54 positions, which are set at every 30\(^\circ \) and at a distance of r (= 1, 5, 10, and 20 cm) as shown in the circle of (b). The variable r represents the distance between the transcription source (the paper strip) and the target (the printed table). It is defined as the distance between the two points at which the straight line connecting the centers of the paper strip and the printed table intersects the facing side of each sheet (Fig. 2). We do not use the distance between the centers of the two rectangles as r because we assume that information is presented directly on the desk rather than overlapping these sheets; r should therefore not depend on the size of the sheets.
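The edge-to-edge definition of r can be made concrete with a small geometric sketch. It assumes axis-aligned rectangles and that \(\theta \) is measured counter-clockwise from the positive x axis of the printed table; the function names and the sheet sizes in the usage note are hypothetical and are not taken from the experiment.

```python
import math


def half_extent(width, height, theta_deg):
    """Distance from a rectangle's center to its edge along direction theta."""
    t = math.radians(theta_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    candidates = []
    if c > 1e-12:  # the ray can leave through a vertical side
        candidates.append((width / 2.0) / c)
    if s > 1e-12:  # the ray can leave through a horizontal side
        candidates.append((height / 2.0) / s)
    return min(candidates)


def strip_center(r, theta_deg, table_size, strip_size, table_center=(0.0, 0.0)):
    """Center of the paper strip so that the edge-to-edge gap equals r.

    The gap is measured along the line connecting both centers, so the
    center-to-center distance is the table's half extent, plus r, plus
    the strip's half extent in the same direction.
    """
    d = (half_extent(table_size[0], table_size[1], theta_deg)
         + r
         + half_extent(strip_size[0], strip_size[1], theta_deg))
    t = math.radians(theta_deg)
    return (table_center[0] + d * math.cos(t),
            table_center[1] + d * math.sin(t))
```

For a hypothetical 20 cm × 10 cm table and a 6 cm × 2 cm strip, `strip_center(5.0, 0.0, (20.0, 10.0), (6.0, 2.0))` places the strip center 18 cm to the right of the table center, while the gap between the facing edges stays at 5 cm regardless of the sheet sizes.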

One session at a particular position on the desk consists of five transcription trials, and three sessions are performed at each position. Each subject therefore performs 162 (= 54 positions \(\times \) 3 sessions) sessions in total. To prevent subjects from recalling known words, 15 alphabetic characters are randomly generated for each trial. Ten right-handed subjects (three men and seven women in their 20s) participated in the experiment.

Fig. 1. A scene of the experiment.

Fig. 2. The definition of r and \(\theta \).

Two performance metrics are measured: (1) task completion time as an efficacy metric and (2) the number of errors as an effectiveness metric. Task completion time is defined as the duration of one session, i.e., from the beginning of the first trial to the end of the fifth trial, based on the timestamps generated by mouse clicks. The number of errors covers two types of error: transcription at an incorrect position and transcription of incorrect characters. Incorrect characters are counted when the subject did not notice the mistake during the session and the experimenter found it afterwards. In addition to these performance metrics, we analyze eye movement using an eye tracker (Arrington Research Inc. View Point EyeTracker) (d). The experimenter counted the number of eye movements by watching the video recorded by the eye tracker. After all trials, we also asked the participants for their five most and five least preferred positions for the task in order to capture subjective opinions.

3.2 Result

To examine visually how performance differs by position, the performance metrics are averaged and rendered as heat maps by interpolating the spatial data using inverse distance weighting (IDW) [5]. Figures 3(a) and (b) show the task completion time and the number of errors, respectively. Each heat map is overlaid on the area of the desk. The values are normalized between 0 (minimum) and 1 (maximum).
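As a minimal sketch of how such a heat map could be computed, the following code interpolates a value at an arbitrary desk position by IDW and rescales a list of values to the range from 0 to 1. The power parameter of 2 and the function names are assumptions made for illustration; the parameters actually used for the figures are not stated here.

```python
import math


def idw(samples, query, power=2.0):
    """Inverse distance weighting.

    samples: list of ((x, y), value) pairs measured on the desk.
    query:   (x, y) position at which to estimate the value.
    power:   distance exponent (2.0 is an assumed, commonly used default).
    """
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return value  # the query coincides with a measured position
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def normalize(values):
    """Min-max normalization to [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Evaluating `idw` on a regular grid over the desk and passing the resulting values through `normalize` gives one normalized map per metric.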

Fig. 3. Heat maps of (a) task completion time and (b) the number of errors. (Color figure online)

The average task completion time is about 73.9 s (SD: 6.6); the fastest and slowest average completion times per position are 65.00 s (at 5 cm and 90\(^\circ \)) and 100.13 s (at 20 cm and 300\(^\circ \)), respectively. The difference between these two positions is significant at the level of p < 0.05 (t(9) = −5.14). By contrast, the average number of errors per trial is 0.16 (SD: 0.09), ranging from 0.03 (at 10 cm and 0\(^\circ \)) to 0.40 (at 20 cm and 330\(^\circ \)), and this difference is not significant (p > 0.05, t(9) = −2.01). Generally, presentation above the printed table and close to the target shows better results (blue area in Fig. 3), while worse results are observed on the lower right side (red area in Fig. 3).
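The reported degrees of freedom (9 for ten subjects) are consistent with a paired comparison of per-subject values at two positions. Assuming that a paired t-test was used, the statistic could be computed as sketched below; the function name is hypothetical, and the toy numbers in the usage note are not experimental data.

```python
import math


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples.

    a, b: per-subject measurements at two positions, in the same
    subject order. Returns (t, df) with df = n - 1.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean / math.sqrt(var / n)
    return t, n - 1
```

With the toy input `paired_t([1.0, 1.0], [2.0, 4.0])`, the differences are −1 and −3, which gives t = −2 with one degree of freedom. With ten subjects, the same function yields df = 9.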

Regarding eye movement, we obtained data from five subjects and counted the number of eye movements. On average, 38.5 eye movements (SD: 3.7) per session were observed, ranging from 29.2 (at 20 cm and 270\(^\circ \)) to 45.7 (at 1 cm and 60\(^\circ \)); however, no clear difference in the number of eye movements across positions was observed. Rather, we found three types of gazing that may depend on the individual: (1) keeping the eyes on the instruction slip while transcribing after checking the position of the target cell, (2) transcribing while looking at the instruction slip and the target cell alternately, and (3) transcribing while gazing at the target cell after having looked at the instruction slip once.

For the subjective opinions on position, we assigned scores of 5, 4, 3, 2, and 1 to the five most preferred positions, starting from the most liked. Likewise, scores of \(-5, -4, \ldots , -1\) were assigned to the five least preferred positions, starting from the most disliked. Figure 4 shows the preference ratings, in which the numbers in rectangles and triangles indicate the rank of “like” and “dislike”, respectively. As with the performance metrics in Fig. 3, the lower side of the transcription target was not preferred, and the upper left part was preferred instead. We consider that both performance and preference stem from the characteristics of dominant-hand use.

Given that all subjects are right-handed, the low-performance and disliked positions are generally inside or near the forearm of the dominant side, i.e., closer to the body, while the preferred positions appear on the opposite side.
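The scoring of the preference answers described above can be sketched as follows. The sketch assumes that each subject's scores are simply summed per position and that positions are identified by (r, \(\theta \)) pairs; the aggregation rule and the function name are assumptions for illustration.

```python
from collections import defaultdict


def preference_scores(responses):
    """Aggregate like/dislike rankings into one score per position.

    responses: one (liked, disliked) pair per subject. `liked` lists up
    to five positions from most liked downward (scores 5..1), and
    `disliked` lists up to five positions from most disliked downward
    (scores -5..-1). Positions are hashable, e.g. (r_cm, theta_deg).
    """
    totals = defaultdict(int)
    for liked, disliked in responses:
        for rank, pos in enumerate(liked[:5]):
            totals[pos] += 5 - rank
        for rank, pos in enumerate(disliked[:5]):
            totals[pos] -= 5 - rank
    return dict(totals)
```

Sorting the resulting dictionary by score then gives the ranks of the most and least preferred positions.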

Fig. 4. Preference for the positions (5 most preferred (rectangle) and 5 least preferred (triangle) positions).

4 Discussion

4.1 Analysis of the Experimental Results

The distribution of task completion time (Fig. 3(a)) and the preference for the positions (Fig. 4) are consistent with each other in that presentation in an area above and close to the target (the printed table) was relatively good, while presentation in the bottom-right area was not. A likely reason is that, when the information is presented in the upper work area, it overlaps with neither the right hand holding the pen nor the left hand holding the paper used for tracing the printed table. Furthermore, we consider that presenting the information close to the work area let the subjects perform the transcription task in a natural posture, because no large head movement was necessary to see the information.

On the other hand, the areas that were reported as difficult places to work and that had long task completion times overlap with the right arm. The subjects also needed to look inward to see the information, which forced them to turn their necks and imposed a physical burden. Thus, we consider that they found these positions troublesome and that extra time was consumed.

Fig. 5. The breakdown of errors.

The effectiveness of the presentation is measured by the number of errors. As shown in Fig. 3(b), the bottom right of the desk shows a large number of errors on average; however, we found that the errors depend strongly on the individual user. The breakdown of the errors is shown in Fig. 5, in which confusion between “I” and “l” accounts for the largest portion of observed errors, followed by confusion between “6” and “8”. Judging from the subjects’ feedback, the experimental environment might have affected the precision of transcription through, for example, low readability of the font, low resolution of the projected information, and occlusion by the eye tracker. As described in Sect. 3.2, the number of errors in one session is at most 0.40 out of 75 characters (= 15 characters \(\times \) 5 trials), which corresponds to 0.53% of the input characters and is small enough to be ignored in most applications.

As discussed above, a small r was generally preferred except where the information overlapped with the right hand; however, Fig. 3(a) implies another exception. For example, the position at 1 cm and 0\(^\circ \) shows a longer task completion time than the positions at 5 cm and 10 cm with the same angle (0\(^\circ \)). As shown in Fig. 6(a), the information overlaps with the right hand when it is presented immediately next to the working area, 1 cm in this case. By contrast, in Fig. 6(b), no such overlap occurs because enough space is left between the working area and the information. Consequently, the requirements for the system can be listed as follows:

  • Present the information as close to the working area as possible.

  • Minimize the activity needed to refer to the source information.

  • Avoid the current or predicted working area.

Fig. 6. Too short a distance causes overlap (a), while presentation at an adequate distance avoids overlap (b).

4.2 Merging Two Performance Metrics

The two heatmaps in Fig. 3 represent different aspects of how information presentation affects task performance: task completion time indicates the efficacy, while the number of errors represents the effectiveness of the presentation. Merging these aspects with an appropriate blend ratio yields a single heatmap that reflects both. Figure 7 shows three merged heatmaps, in which task completion time is weighted by (a) 0.8, (b) 0.5, and (c) 0.2 and the error rate is weighted by 0.2, 0.5, and 0.8, respectively. The position can thus be determined from map (a) if an application emphasizes task completion time, whereas map (c) can be used when the correctness of work is important.
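One simple way to realize such a merge is a per-cell weighted sum of the two normalized maps, as sketched below. Treating the blend as a linear combination is an assumption, and the function name and the grid representation (lists of rows) are hypothetical.

```python
def blend(time_map, error_map, time_weight):
    """Per-cell weighted sum of two normalized heatmaps.

    time_map, error_map: grids (lists of rows) with values in [0, 1]
    and identical shapes. time_weight is the blend ratio for task
    completion time; the error map receives 1 - time_weight.
    """
    if not 0.0 <= time_weight <= 1.0:
        raise ValueError("time_weight must lie in [0, 1]")
    error_weight = 1.0 - time_weight
    return [
        [time_weight * t + error_weight * e for t, e in zip(t_row, e_row)]
        for t_row, e_row in zip(time_map, error_map)
    ]
```

Calling `blend(time_map, error_map, 0.8)`, `0.5`, and `0.2` would correspond to the weightings of maps (a), (b), and (c), respectively. Because both inputs are normalized so that lower values are better, lower values in the merged map also indicate better positions.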

Fig. 7. Merging two aspects of performance with different blend ratios. (Color figure online)

4.3 Applying to Other Tasks

In the experiment of this study, subjects transcribed a specific character string at a specific position. Since this task is an abstraction of transcription from reference information, we consider that the heatmap can be applied to similar work such as filling in an application form or copying the content of a booklet, and to collation work such as checking answers during self-study.

As a constraint in the experiment for creating the heatmap, we limited the type of information to static information that the user only refers to; however, real-world tasks often require interactivity, in which the user operates the presented information. Like static information, dynamic information should not interfere with the dominant hand and should be presented close to the work area so that the information necessary for the work can be referred to. On the other hand, the angle at which operation is easy for the user, i.e., ergonomically suitable, is unclear from the experimental results, so the heatmap cannot always be applied as it is. A further experiment is needed to understand the characteristics of the presentation from this aspect.

Furthermore, the generated heatmap is specific to right-handed users. We consider that it cannot simply be flipped horizontally for left-handed users. When writing characters or drawing lines from left to right, right-handed people often do not bend their wrists much (Fig. 8(a)), whereas left-handed people tend to bend the wrist inward and wrap the hand around the paper (Fig. 8(b)). We consider that there are two reasons for this posture:

  • It keeps the characters that the user has already written visible.

  • It is natural and comfortable to warp the wrist outward when drawing a line.

The area in which the left hand becomes a barrier differs from that of a right-handed person because of the bent elbow and the outward-warped wrist. The optimal information presentation position for a left-handed person is therefore not simply a mirror image about the y axis. As for right-handed people, a short distance to the target and non-interference with the dominant hand are necessary for left-handed people; however, the effect of angle should be investigated carefully.

Fig. 8. Difference in forearm posture between right-handed and left-handed people when writing characters from left to right.

4.4 Integration into a System

In this paper, we obtained a heatmap that shows the relationship between the position of information presentation and work efficiency as well as accuracy. Here, we describe how the map can be combined with other functional components to realize an entire system. Before the heatmap is applied, three areas are recognized: (1) the operational area within the work area, (2) free space in the work area, and (3) the area of the operating hand, i.e., the dominant hand. The work area is the area in which the task is carried out; in this study, it is the transcription target. The position of the operational area is treated as a “hot spot” from which the search for candidate presentation areas in the surrounding free space starts. In Fig. 7, the work area is drawn as white rectangles, and the candidate areas are drawn in blue. Free space is the area in which no object other than the target object exists. Finally, the area of the user’s dominant hand is detected as a dynamic obstacle and treated as the least suitable area for presentation; it is drawn in yellow to red in Fig. 7. The three types of areas are overlaid on the heatmap, and the free space with the highest possible performance is selected. Additionally, the preference for presentation shown in Fig. 4 can be consulted in determining the position of presentation.
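The selection step described above could be sketched as a search over a grid aligned with the heatmap. As a simplification, this sketch excludes the hand area entirely instead of giving it the lowest priority, and it assumes that lower heatmap values mean better performance; the grid representation, the mask inputs, and the function name are hypothetical.

```python
def best_cell(cost_map, free_mask, hand_mask):
    """Pick the free cell with the lowest cost that the hand does not cover.

    cost_map:  grid of normalized heatmap values (lower is better).
    free_mask: grid of booleans, True where no object occupies the desk.
    hand_mask: grid of booleans, True where the dominant hand is detected.
    Returns the (row, column) of the best cell, or None if none qualifies.
    """
    best = None
    for i, row in enumerate(cost_map):
        for j, cost in enumerate(row):
            if not free_mask[i][j] or hand_mask[i][j]:
                continue  # occupied by an object or covered by the hand
            if best is None or cost < best[0]:
                best = (cost, (i, j))
    return None if best is None else best[1]
```

Because the hand moves during the task, a system would need to re-run this search whenever the detected hand area changes. Restricting the search to cells near the “hot spot” would be a further refinement.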

5 Conclusion

In this paper, we explored suitable and unsuitable positions for presenting information on a desk with a projector. We focused on a task that involves the user’s active engagement with paper, i.e., transcribing characters from projected information into a specific cell of a printed table. The position of the information presentation is represented by the distance between the edge of the transcription target and the edge of the source information, and by the direction of the source information from the transcription target.

In the quantitative experiment, which treated the distance (r) and the direction (\(\theta \)) as variables, the task completion time and the error rate were measured as metrics for analysis and visualized as heatmaps. Eye movement was also observed. Generally, presentation above the printed table and close to the target showed better results (faster task completion and lower error rate), while worse results were observed on the lower right side, around the dominant hand. Regarding eye movement, no particular characteristic was found. Subjective preference for the position showed characteristics similar to those of the performance metrics, i.e., the lower side of the transcription target was not preferred, and the upper left part was preferred instead.

Although the result was obtained in a very limited environment, we consider it a good starting point for designing a projector-based information presentation system that supports existing paper tasks. One use case is a high school student who studies physics with an exercise book, comparing her answers in the notebook with an answer key projected on the desk and correcting wrong answers with a pen. A further experiment is needed to understand the effect of the presentation on the user’s subjective preference, especially from the ergonomic point of view.