Efficient vision-based multi-target augmented reality in the browser

Al-Zoube, Mohammed A.

doi:10.1007/s11042-022-12206-6

Published: 25 February 2022

Volume 81, pages 14303–14320, (2022)

Multimedia Tools and Applications

Mohammed A. Al-Zoube ORCID: orcid.org/0000-0003-0476-465X¹

Abstract

Augmented Reality (AR) has attracted growing attention from both industry and academia because it enhances the way we interact with the physical world. Compared with native AR apps, AR implemented with web technologies (Web AR) offers lightweight, cross-platform deployment that requires no download or installation in advance. However, developing Web AR apps raises challenges, notably computational efficiency and networking. The limited capabilities of the browser, especially on mobile devices, make it harder to build efficient web apps. Fortunately, several recent technical advances could change the status of Web AR. This paper presents an efficient implementation of a vision-based, multi-target Web AR app that runs at real-time frame rates in standard web browsers on mobile devices and PCs. The method is based on natural feature tracking (NFT), and several new web technologies are optimized for specific tasks. The proposed implementation uses an efficient and lightweight class of convolutional neural networks (CNN) to classify image targets. It also uses an image registration method that eliminates the need for a database of the feature points’ descriptors, which natural feature tracking methods usually require. Computation-intensive tasks, such as target extraction and pose estimation, run in separate threads, so the main thread, which handles HTML rendering, runs smoothly and is not blocked by them. To evaluate and validate the proposed architecture, a prototype app was developed. The findings demonstrate that the app can track multiple image targets at real-time frame rates with stable interaction.
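The threading idea in the abstract can be illustrated with a small TypeScript sketch. It is not the paper's implementation: the `WorkerLike` interface is a hypothetical stand-in modeled on the `postMessage`/`onmessage` pair of the Web Workers API, and the message shapes (`FrameRequest`, `PoseResult`), the class name, and the drop-stale-frames policy are all assumptions made for illustration. The sketch keeps at most one frame in flight and overwrites any older waiting frame, so the rendering thread never queues work behind the heavy computation.

```typescript
// Hypothetical stand-in for the part of a worker's interface used here,
// modeled on postMessage/onmessage from the Web Workers API.
interface WorkerLike<Req, Res> {
  postMessage(message: Req): void;
  onmessage: ((event: { data: Res }) => void) | null;
}

// Assumed message shapes; the paper does not specify these.
interface FrameRequest {
  frameId: number;
  pixels: Uint8ClampedArray; // RGBA camera frame
  width: number;
  height: number;
}

interface PoseResult {
  frameId: number;
  targetId: string | null; // null when no target was recognized
  pose: number[];          // e.g. a flattened transform; layout is an assumption
}

// Sends at most one frame at a time to the worker. While the worker is
// busy, only the most recent frame is remembered; older ones are dropped.
class LatestFrameDispatcher {
  private busy = false;
  private pending: FrameRequest | null = null;

  constructor(
    private readonly worker: WorkerLike<FrameRequest, PoseResult>,
    private readonly onResult: (result: PoseResult) => void
  ) {
    this.worker.onmessage = (event) => {
      this.busy = false;
      this.onResult(event.data);
      // If a newer frame arrived meanwhile, process it next.
      if (this.pending !== null) {
        const next = this.pending;
        this.pending = null;
        this.submit(next);
      }
    };
  }

  submit(frame: FrameRequest): void {
    if (this.busy) {
      this.pending = frame; // overwrite any older waiting frame
      return;
    }
    this.busy = true;
    this.worker.postMessage(frame);
  }
}
```

In a browser, the main thread would call `submit` once per camera frame and update the rendered overlay inside `onResult`. Dropping stale frames trades completeness for latency, which seems a reasonable choice for tracking, where only the newest pose matters.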

Author information

Authors and Affiliations

Department of Computer Graphics, Princess Sumaya University for Technology (PSUT), P.O. Box 1438, Al-Jubaiha, Amman, 11941, Jordan
Mohammed A. Al-Zoube

Corresponding author

Correspondence to Mohammed A. Al-Zoube.

Ethics declarations

Conflict of interest

The author has no conflict of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Al-Zoube, M.A. Efficient vision-based multi-target augmented reality in the browser. Multimed Tools Appl 81, 14303–14320 (2022). https://doi.org/10.1007/s11042-022-12206-6

Received: 15 March 2021
Revised: 28 May 2021
Accepted: 10 January 2022
Published: 25 February 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11042-022-12206-6
