Efficient vision-based multi-target augmented reality in the browser

Multimedia Tools and Applications

Abstract

Augmented Reality (AR) has attracted growing attention from both industry and academia because it enhances the way we interact with the physical world. Compared with native AR apps, implementing AR with web technologies (Web AR) offers lightweight, universal cross-platform deployment that requires no prior download or installation. However, developing Web AR apps poses challenges such as computational efficiency and networking. The limited capabilities of the browser, especially on mobile devices, make it harder to build efficient web apps. Fortunately, several recent technical advances could change the state of Web AR. This paper presents an efficient implementation of a vision-based, multi-target Web AR app that runs at real-time frame rates in standard web browsers on mobile devices and PCs. The app uses a method based on natural feature tracking (NFT), and several new web technologies are optimized for specific tasks. The proposed implementation uses an efficient, lightweight class of convolutional neural networks (CNNs) to classify image targets. It also uses an image registration method that eliminates the need for the database of feature-point descriptors that natural feature tracking methods usually require. Computation-intensive tasks, such as target extraction and pose estimation, run in separate threads, so the main thread, which handles HTML rendering, runs smoothly and is not blocked by them. To evaluate and validate the performance of the proposed architecture, a prototype app was developed. The results show that the app can track multiple image targets at real-time frame rates with stable interaction.
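The idea of keeping computation-intensive work off the rendering thread can be illustrated with a small producer/consumer sketch. This is not the author's implementation, which runs in the browser; it is a minimal sketch assuming Python threads and queues as stand-ins for the browser's main thread and a background worker, and `estimate_pose`, `pose_worker`, and `run` are hypothetical names. `estimate_pose` only sleeps to simulate a slow target-extraction or pose-estimation step. The loop hands each frame to the worker without waiting and renders with the most recent pose available, so a slow estimate delays the pose update rather than the frame.

```python
import queue
import threading
import time


def estimate_pose(frame_id: int) -> str:
    """Hypothetical stand-in for a computation-intensive task such as
    target extraction or pose estimation; it only sleeps to simulate work."""
    time.sleep(0.05)  # pretend the estimate takes about 50 ms
    return f"pose-for-frame-{frame_id}"


def pose_worker(frames: queue.Queue, results: queue.Queue, stop: threading.Event) -> None:
    """Background thread: take the next pending frame, compute its pose, publish it."""
    while not stop.is_set():
        try:
            frame_id = frames.get(timeout=0.01)
        except queue.Empty:
            continue
        results.put(estimate_pose(frame_id))


def run(num_frames: int = 30, frame_interval: float = 1 / 60):
    """Main 'render' loop; returns per-frame main-thread work times and the last pose seen."""
    frames: queue.Queue = queue.Queue(maxsize=1)  # at most one frame waiting for the worker
    results: queue.Queue = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(target=pose_worker, args=(frames, results, stop), daemon=True)
    worker.start()

    latest_pose = None
    frame_times = []
    for frame_id in range(num_frames):
        start = time.perf_counter()
        # Offer the current frame to the worker without waiting; if a frame is
        # already pending, skip this one rather than block the render loop.
        try:
            frames.put_nowait(frame_id)
        except queue.Full:
            pass
        # Collect any finished estimates without blocking; keep the newest.
        while True:
            try:
                latest_pose = results.get_nowait()
            except queue.Empty:
                break
        # Rendering would happen here using latest_pose (possibly stale or None).
        frame_times.append(time.perf_counter() - start)
        time.sleep(frame_interval)  # wait for the next display frame

    stop.set()
    worker.join()
    return frame_times, latest_pose


if __name__ == "__main__":
    times, pose = run()
    print(f"max main-thread work per frame: {max(times) * 1000:.2f} ms; last pose: {pose}")
```

Because Python threads share the global interpreter lock, this shows the scheduling pattern (the render loop never waits on pose estimation) rather than a parallel speedup. In a browser, the equivalent split is typically between the main thread and a Web Worker that exchange messages with `postMessage`.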

Figs. 1–8

Author information

Correspondence to Mohammed A. Al-Zoube.

Ethics declarations

Conflict of interest

The author has no conflict of interest to declare.

About this article

Cite this article

Al-Zoube, M.A. Efficient vision-based multi-target augmented reality in the browser. Multimed Tools Appl 81, 14303–14320 (2022). https://doi.org/10.1007/s11042-022-12206-6

  • DOI: https://doi.org/10.1007/s11042-022-12206-6
