Abstract
One of the main objectives of augmented reality (AR) is to seamlessly merge virtual information into the real world. However, errors in the underlying computational processes can directly affect user perception. Although several works investigate how rendering or interaction issues are perceived by users, little has been studied about how spatial registration errors affect user perception in AR systems, even though registration is one of the central problems of AR. In this work, we study how system errors of a three-point RANSAC pose estimation algorithm based on RGB-D cameras affect user perception, by applying psychophysical tests. With these user tests, we address how depth map and feature matching noise, among other issues, affect the perception of object registration.
1 Introduction
Augmented reality (AR) systems have become popular in recent years, with applications in several areas such as education [1, 2], health [3], industry [4] and entertainment [5]. The main idea behind AR is to combine virtual and real information in real time, registering 3D information [6]. This idea seems simple in theory; however, it involves several aspects, such as rendering the virtual objects and capturing real environment data in order to combine both pieces of information. Azuma [6] defines three characteristics of AR:
-
three-dimensional (3D) space composed of real and virtual elements, with predominance of the real over the virtual;
-
real-time human-computer interaction;
-
3D registration, with alignment between real and virtual elements.
Different errors may be generated by the system during human-computer interaction due to problems in rendering and capturing. These problems are commonly caused by technological limitations, such as resolution, processing power and input device noise, and by aspects of the real environment, such as lighting and the motion of real objects and the camera. Each of these system errors can affect, in a different way, the user's perception of whether the virtual information is real or not.
Several works have examined how rendering errors affect user perception [7–10]. Sanches et al. [11] studied the perception of system errors in video-based avatar rendering. Also, [12–14] studied perception in virtual reality systems with haptic interaction.
However, little has been investigated about how 3D registration errors affect the user's perception of virtual objects placed over real surfaces. We address this problem by studying different types of common errors in RGB-D registration based on the three-point RANSAC pose estimation method [15], which enables fast registration of rigid objects and is widely used.
Our main objective is to analyze system errors of 3D registration according to user perception, through subjective tests. Our contribution is an analysis of how system errors are perceived by users and of which errors affect perception more or less. This analysis indicates how high the different AR system errors can be while still achieving good user perception, which can help in the development of these interactive systems.
2 Related Work
User perception of the realism of virtual information is a central focus of several works. Gkioulekas et al. [7] presented a user perception study on the role of the phase function in translucent objects. Jarabo et al. [8] and Křivánek et al. [10] studied realism in global illumination rendering: Jarabo et al. [8] investigated the perception of complex dynamic scenes, such as crowd movements, and Křivánek et al. [10] investigated the impact of global illumination approximations on perception.
McDonnell et al. [9], Gelasca and Ebrahimi [16] and Sanches et al. [11] studied user perception related to human avatars, both synthesized and captured. McDonnell et al. [9] studied the effects of human avatar rendering on realism perception and pointed out the uncanny valley effect [17] in avatar rendering. Gelasca and Ebrahimi [16] and Sanches et al. [11], in turn, studied the artifacts of background segmentation in user videos; Sanches et al. [11], however, focused on AR and its impacts on user perception.
Several works in haptic interaction research also use objective evaluations with psychophysical models to determine user perception when a user is subjected to tactile stimuli [12–14], analyzing the tolerance to certain noise or errors in haptic signals.
As shown by the works presented above, several aspects related to perception in AR [6] have been studied in isolation, such as rendering or interaction. However, little has been studied about another central aspect of AR: the registration of real and virtual objects. Sanches et al. [11] directly studied perception in AR systems, yet their work relied on rendering aspects. Our work addresses this gap by investigating perception of AR registration, focusing on common system errors and analyzing how these errors affect perception in AR.
3 Registering Method and User Test Stimuli
Registration of the view in AR systems can be achieved by several approaches [18–20], and recently RGB-D cameras have become widely available. These cameras enable faster and easier pose estimation methods. One family of pose estimation methods based on these cameras is the three-point RANSAC approach [21], used in several works, such as Henry et al. [15]. These methods estimate the pose of a known scene or object by correlating known feature points with observed feature points, extracted using feature extraction methods such as SIFT [19] or SURF [20]. In this work, we use this three-point RANSAC pose estimation with SURF feature points in order to analyze the method's performance through system errors based on user perception.
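The core of such a method can be sketched as follows: repeatedly sample three 3D-3D correspondences, fit a rigid transform (e.g. via the Kabsch SVD solution), and keep the pose supported by the most inliers. This is a minimal illustrative sketch, not the authors' implementation; function names, the inlier threshold and iteration count are our own assumptions.

```python
import numpy as np

def rigid_transform(src, dst):
    # Least-squares rigid transform (Kabsch/SVD) mapping src points onto dst.
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def ransac_pose(model_pts, obs_pts, iters=200, thresh=0.02, seed=None):
    # Three-point RANSAC: fit a pose from minimal samples of 3 matches,
    # keep the hypothesis with the largest inlier set, then refine on it.
    rng = np.random.default_rng(seed)
    n = len(model_pts)
    best_inliers, best_pose = np.array([], dtype=int), None
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)
        R, t = rigid_transform(model_pts[idx], obs_pts[idx])
        err = np.linalg.norm(obs_pts - (model_pts @ R.T + t), axis=1)
        inliers = np.nonzero(err < thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers, best_pose = inliers, (R, t)
    if len(best_inliers) >= 3:    # refinement over all inliers
        best_pose = rigid_transform(model_pts[best_inliers], obs_pts[best_inliers])
    return best_pose, best_inliers
```

In practice the 3D observations come from back-projecting matched SURF keypoints through the depth map, so both depth noise and 2D matching noise feed directly into this estimator.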
Two common system errors in this method are the noise present in the RGB-D depth map, caused by the capturing system, and the two-dimensional (2D) position error of the detected feature points in the image, caused by mismatches or position errors of the feature matching algorithm. A third error is the clustering of the detected feature points, generated by occlusions of the object or by the original object texture. This clustering (Fig. 1(a)) can cause pose estimation errors: since the points are not spread over the whole object surface, small errors in the captured data have a larger effect on the pose error.
In order to analyze user perception, two controlled experiments were conducted. For these, 150 ground truth images were pre-rendered with different plane image poses, in order to decrease pose-related perception bias. For each error value, all 150 images were re-rendered with the result returned by our pose estimation algorithm with the error injected, as in Lepetit et al. [22]. In the user test, some of the 150 images are randomly shown with their ground truth images for each error value.
Experiment 1 was executed to analyze the effect of two types of errors on the registration in isolation, without error combination. Two tests were performed, varying the depth (DEPTH) and 2D position (POS_2D) noises with 7 different values each, as presented in Fig. 2. The depth map errors are injected as Gaussian random noise with mean \(\mu = 0\) and 7 standard deviation values \(\sigma \in \{0, 20, 40, 60, 80, 100, 120\}\) mm. For the 2D position noise, we similarly added two-dimensional Gaussian random noise with mean \(\mu = 0\) and 7 standard deviation values \(\sigma \in \{0, 16.6, 33.3, 50, 66.6, 83.3, 100\}\) pixels. This experiment indicated how these two errors are perceived by users, and how the perception differs from the computed absolute errors.
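The noise injection described above amounts to perturbing each depth sample and each detected 2D feature position with zero-mean Gaussian noise. A minimal sketch of such an injection step, with hypothetical function names of our own choosing, could look like:

```python
import numpy as np

# Experiment 1 noise levels taken from the text (standard deviations).
DEPTH_SIGMAS_MM = [0, 20, 40, 60, 80, 100, 120]
POS2D_SIGMAS_PIX = [0, 16.6, 33.3, 50, 66.6, 83.3, 100]

def add_depth_noise(depth_map, sigma_mm, seed=None):
    # Zero-mean Gaussian noise of std sigma_mm added to every depth sample.
    rng = np.random.default_rng(seed)
    return depth_map + rng.normal(0.0, sigma_mm, size=depth_map.shape)

def add_position_noise(points_2d, sigma_pix, seed=None):
    # Independent 2D Gaussian noise on each detected feature position.
    rng = np.random.default_rng(seed)
    return points_2d + rng.normal(0.0, sigma_pix, size=points_2d.shape)
```

With \(\sigma = 0\) both functions return the input unchanged, which matches the ground-truth condition of the experiment.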
Fig. 2. Experiment 1 - images and error examples. First row: depth map noise; second row: pose estimation result using the noise-added depth map; third row: 2D position noise; fourth row: pose estimation result. First column: the ground truth values and poses. Second to seventh columns: increasing noise values and the resulting poses, 20 to 120 mm of noise for the depth map and 16.6 to 100 pixels for the 2D position noise.
In Experiment 2, four variables were chosen: depth (DEPTH), 2D position (POS_2D), clustering type (CLUSTER) and the object rendered over the plane (OBJ). We varied the object, since one hypothesis was that the object can affect error perception by providing guides and cues to the error. Figure 3 shows some examples of pose estimation results with these combined errors.
In this test, we varied the depth map noise standard deviation over three values, \(\sigma \in \{0, 20, 60\}\) mm. The 2D position noise was likewise varied over three values, \(\sigma \in \{0, 16.6, 50\}\) pixels. For the clustering, we used two different types, spread and in-line clustering, as shown in Fig. 1. Lastly, we chose two different objects to be rendered: the Happy Buddha from the “Stanford 3D Scanning Repository”, an object that is tall but does not occupy all the space of the pose-estimated plane object; and a red grid that fits exactly around the plane object, as shown in Fig. 6.
Fig. 3. Experiment 2 - example images. First row: Buddha as object; second row: red grid as object. First column: ground truth pose; second column: depth map noise only; third column: 2D position noise only; fourth column: combined depth and position noise; fifth column: noise combined with line clustering (Fig. 1(a)).
Table 1 shows the user test stimuli used in the two user tests. It is worth pointing out that our objective here is to evaluate system errors and characteristics, not effects caused by the environment setting. Therefore, screen, lighting conditions and other physical environment factors were not chosen as stimuli.
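The full factorial design of Experiment 2 is simply the cross product of the four stimulus variables. As a sketch (using the levels stated in the text), the condition grid can be enumerated as:

```python
from itertools import product

# Stimulus levels for Experiment 2, as described in the text.
DEPTH = [0, 20, 60]            # depth noise std, mm
POS_2D = [0, 16.6, 50]         # 2D position noise std, pixels
CLUSTER = ["spread", "line"]   # feature point clustering type
OBJ = ["buddha", "grid"]       # rendered virtual object

# Every combination of the four variables: 3 * 3 * 2 * 2 = 36 conditions.
conditions = list(product(DEPTH, POS_2D, CLUSTER, OBJ))
```

This yields 36 distinct error conditions, each of which is rendered against the pre-computed ground truth images for the yes-no comparison task.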
4 User Test Setup
The experiments were conducted in controlled physical environments, using common video monitors and an automated system to collect users' opinions. A simple yes-no psychophysical task was used to compare two images, presented side by side on one screen: one image representing the ground truth and the other either containing noise or equal to the ground truth. Comparing the ground truth with itself avoids response bias, ensuring a balance between the two answers. The pairs of images were shown in random order (70 pairs in the first experiment and 114 in the second).
Each user was asked to select either the option “Equal” or the option “Different” by clicking one of the buttons located at the bottom of the screen after observing the two images. No response time limit was specified; however, users were instructed to answer quickly. Figure 4 presents an example of the environment used in the tests, with the screen and the two images, as well as the buttons to register the user's opinion. Seventeen subjects participated in each experiment, mostly students and professors.
To test for statistically significant differences in the answers across scenarios, we chose one-way ANOVA (Analysis of Variance) for the first experiment, aiming to analyze the effects of the two variables (DEPTH and POS_2D) separately; and four-way ANOVA for the second experiment. The four-way ANOVA allows analyzing the effect of each of the four variables (DEPTH, POS_2D, CLUSTER and OBJ) independently of the others (the main effects), as well as the effect of one variable when it depends on the level or levels of the other variables (the interaction effects, with analyses of two, three and four variables). After finding the main and interaction effects, the Tukey post-hoc test was used to compare pairs of means and to determine the causes of these effects in both experiments.
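As a reminder of what the one-way case computes, the F statistic is the ratio of between-group to within-group mean squares over the response values grouped by stimulus level. A minimal self-contained sketch (not the authors' analysis code, which likely used a statistics package):

```python
import numpy as np

def one_way_anova_F(groups):
    # F statistic for a one-way ANOVA: one array of responses per level.
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand = all_obs.mean()
    k, n = len(groups), len(all_obs)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

The p-value then follows from the F distribution with \((k-1, n-k)\) degrees of freedom; in practice a library routine such as `scipy.stats.f_oneway` computes both in one call.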
5 Results
Figure 5 shows the results of the first user experiment: the mean percentages of trials in which users perceived the errors inserted in the images, together with the ground truth and the computed pose errors. These data were processed with one-way ANOVA and the Tukey post-hoc test; the null hypothesis was rejected in both cases, depth (\(F = 30.97\), \(p = 0.0224 \times 10^{-30}\)) and 2D position (\(F = 26.75\), \(p = 0.0311 \times 10^{-26}\)).
From these graphs, it is possible to observe how the decrease in registration perception differs from the increase in the absolute computed error. The depth error values are perceived almost as an inverse exponential function (Fig. 5(a)), whereas the computed error increases almost linearly. We can deduce that depth map noise is easily perceived by users in this setup, since it creates rotation errors in the object pose estimation.
Two-dimensional position errors are also perceived differently from the computed errors (Fig. 5(c)). However, unlike depth map errors, perception of the position noise decreases more slowly while the computed absolute error increases almost exponentially. This could be due to the nature of the pose error produced by this noise: since only the position is varied and the depth map is clean in this test, the resulting error appears as a slide of the virtual object over the plane object, as seen in Fig. 5. As a result, small errors are only weakly perceived by users.
Results of the second test, in which four different variables were chosen as stimuli, were processed with four-way ANOVA and the Tukey post-hoc test. The full table of rejected null hypotheses, with their F and p-values, and the post-hoc analysis results are listed in Table 2; part of these significant results is illustrated by the graphs in Fig. 6.
One of the significant effects found is that the rendered object is a factor that changes user perception, since the null hypothesis of OBJ was rejected (\(F = 211.17\), \(p \approx 0.0\)) (Fig. 6(g)). This result shows that the virtual object to be included in the real world significantly affects registration perception, which can be explained by comparing the images in Fig. 3. Since the grid object is more closely aligned with the plane object whose pose is being estimated, the user has more cues to correlate both objects' poses, making it easier to find errors in the registration. This effect is especially clear for POS_2D errors, as shown in Fig. 6(f), where the difference in perception across errors decays with the grid OBJ.
One change from the first experiment was observed in the perception of the POS_2D error. In Experiment 1, POS_2D errors are less perceived by users; in Experiment 2, however, POS_2D perception decreases strongly as the error increases (Fig. 5(b)). This can be explained by the fact that POS_2D error perception can be strongly amplified when combined with other types of errors, as in Figs. 6(f) and 6(h). Figure 6(h) shows that DEPTH errors are strongly affected by POS_2D errors, meaning that combinations of both errors can generate worse perception.
Finally, one interesting finding is that CLUSTER-related errors did not strongly affect user perception, although the null hypothesis was rejected. In Fig. 6(c), it is possible to observe that the change in perception across our clustering types is not strong, and from Fig. 6(d), the CLUSTER type was significant only for middle values of depth errors. Moreover, the null hypothesis for the interaction effect of POS_2D and CLUSTER was not rejected, showing that clustering does not play a strong role in registration perception, at least at the level of clustering applied in this work (illustrated in Fig. 1).
6 Conclusion
We discussed how system errors in the registration method applied to AR systems based on RGB-D cameras can affect users' perception. Although this registration process is one of the central aspects of AR systems, little has been studied about its perception. Based on yes-no psychophysical tests and ANOVA analysis, we identified several characteristics of perception in AR, such as the role of the relation between the rendered object and the real object, which directly affects perception.
We also addressed how some system errors appear in the object pose and how this affects user perception. For instance, depth map errors alone are more readily perceived than feature point position errors; however, when combined with other errors, position errors cause a large decrease in perception. Another finding is that user perception does not track the absolute computed error, showing sudden decreases even with small changes in some errors.
Although our tests cover the major system errors in the RGB-D registration method, more tests need to be performed in future work in order to fully explore registration-related perception issues in AR: for example, how and why this real-virtual object relationship affects perception, or the effects of the test environment, such as screen resolution and illumination.
References
Santos, M., Chen, A., Taketomi, T., Yamamoto, G., Miyazaki, J., Kato, H.: Augmented reality learning experiences: survey of prototype design and evaluation. IEEE Trans. Learn. Technol. 7(1), 38–56 (2014)
Lee, K.: Augmented reality in education and training. TechTrends 56(2), 13–21 (2012)
Sielhorst, T., Feuerstein, M., Navab, N.: Advanced medical displays: a literature review of augmented reality. J. Disp. Technol. 4(4), 451–467 (2008)
Ong, S.K., Yuan, M.L., Nee, A.Y.C.: Augmented reality applications in manufacturing: a survey. Int. J. Prod. Res. 46, 2707–2742 (2008)
Thomas, B.H.: A survey of visual, mixed, and augmented reality gaming. Comput. Entertainment 10(3), 3:1–3:33 (2012)
Azuma, R.T.: A survey of augmented reality. Presence Teleoperators Virtual Environ. 6(4), 355–385 (1997)
Gkioulekas, I., Xiao, B., Zhao, S., Adelson, E.H., Zickler, T., Bala, K.: Understanding the role of phase function in translucent appearance. ACM Trans. Graph. 32(5), 147:1–147:19 (2013)
Jarabo, A., Eyck, T.V., Sundstedt, V., Bala, K., Gutierrez, D., O’Sullivan, C.: Crowd light: evaluating the perceived fidelity of illuminated dynamic scenes. Comput. Graph. Forum 31(2), 565–574 (2012)
McDonnell, R., Breidt, M., Bülthoff, H.H.: Render me real?: investigating the effect of render style on the perception of animated virtual humans. ACM Trans. Graph. 31(4), 91:1–91:11 (2012)
Křivánek, J., Ferwerda, J.A., Bala, K.: Effects of global illumination approximations on material appearance. ACM Trans. Graph. 29(4), 112:1–112:10 (2010)
Sanches, S., Silva, V., Nakamura, R., Tori, R.: Objective assessment of video segmentation quality for augmented reality. In: IEEE International Conference on Multimedia and Expo., pp. 1–6, July 2013
Steinbach, E., Hirche, S., Ernst, M., Brandi, F., Chaudhari, R., Kammerl, J., Vittorias, I.: Haptic communications. Proc. IEEE 100(4), 937–956 (2012)
Chaudhari, R., Steinbach, E., Hirche, S.: Towards an objective quality evaluation framework for haptic data reduction. In: IEEE World Haptics Conference, pp. 539–544 (2011)
Sakr, N., Georganas, N., Zhao, J.: A perceptual quality metric for haptic signals. In: Proceedings of the IEEE International Workshop on Haptic, Audio and Visual Environments and Games, pp. 27–32, Ottawa, Ontario, Canada (2007)
Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGB-D mapping: using kinect-style depth cameras for dense 3D modeling of indoor environments. Int. J. Robot. Res. 31(5), 647–663 (2012)
Gelasca, E.D., Ebrahimi, T.: On evaluating video object segmentation quality: a perceptually driven objective metric. IEEE J. Sel. Top. Sig. Proc. 3(2), 319–335 (2009)
MacDorman, K.F., Green, R.D., Ho, C.C., Koch, C.T.: Too real for comfort? Uncanny responses to computer generated faces. Comput. Hum. Behav. 25(3), 695–710 (2009)
Lima, J., Uchiyama, H., Teichrieb, V., Marchand, E.: Texture-less planar object detection and pose estimation using depth-assisted rectification of contours. In: 2012 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2012, pp. 297–298, November 2012
Lowe, D.: Object recognition from local scale-invariant features. In: The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1150–1157 (1999)
Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(N) solution to the PnP problem. Int. J. Comput. Vision 81(2), 155–166 (2009)
Acknowledgments
The authors would like to acknowledge the National Council for the Improvement of Higher Education Personnel (CAPES) for scholarships, and the Engineering Technology Development Foundation (FDTE) for funding of the project.
© 2015 Springer International Publishing Switzerland
Tokunaga, D.M. et al. (2015). Registration System Errors Perception in Augmented Reality Based on RGB-D Cameras. In: Shumaker, R., Lackey, S. (eds) Virtual, Augmented and Mixed Reality. VAMR 2015. Lecture Notes in Computer Science(), vol 9179. Springer, Cham. https://doi.org/10.1007/978-3-319-21067-4_14