Abstract
Augmented Reality (AR) has attracted growing attention from both industry and academia because it enhances the way we interact with the physical world. Compared with native AR apps, AR implemented with web technologies (Web AR) offers lightweight, cross-platform deployment that requires no prior download or installation. However, developing Web AR apps poses challenges, notably in computational efficiency and networking, and the limited capabilities of browsers, especially on mobile devices, make efficient web apps harder to build. Fortunately, several recent technical advances could change the status of Web AR. This paper presents an efficient implementation of a vision-based, multi-target Web AR app that runs at real-time frame rates in standard web browsers on mobile devices and PCs. The method is based on natural feature tracking (NFT), and several new web technologies are optimized for specific tasks. The proposed implementation uses an efficient and lightweight class of convolutional neural networks (CNNs) to classify image targets. It also uses an image registration method that eliminates the need for a database of feature point descriptors, which natural feature tracking methods usually require. Computation-intensive tasks, such as target extraction and pose estimation, run in separate threads, so the main thread, which handles HTML rendering, runs smoothly and is not blocked by them. To evaluate the proposed architecture, a prototype app was developed. The findings demonstrate that the app can track multiple image targets at real-time frame rates with stable interaction.
Ethics declarations
Conflict of interest
The author has no conflict of interest to declare.
About this article
Cite this article
Al-Zoube, M.A. Efficient vision-based multi-target augmented reality in the browser. Multimed Tools Appl 81, 14303–14320 (2022). https://doi.org/10.1007/s11042-022-12206-6
DOI: https://doi.org/10.1007/s11042-022-12206-6