Abstract
Mobile Augmented Reality (AR), which overlays digital content on the real-world scenes surrounding a user, enables immersive interactive experiences in which the real and virtual worlds are tightly coupled. Seamless and precise AR experiences require an image recognition system that can accurately recognize objects in the camera view with low system latency. However, due to the pervasiveness and severity of image distortions, an effective and robust image recognition solution for "in the wild" mobile AR remains elusive. In this article, we present CollabAR, an edge-assisted system that provides distortion-tolerant image recognition for mobile AR with imperceptible system latency. CollabAR incorporates both distortion-tolerant and collaborative image recognition modules in its design. The former enables distortion-adaptive image recognition to improve robustness against image distortions, while the latter exploits the spatial-temporal correlation among mobile AR users to improve recognition accuracy. Moreover, as it is difficult to collect a large-scale image distortion dataset, we propose a Cycle-Consistent Generative Adversarial Network-based data augmentation method to synthesize realistic image distortions. Our evaluation demonstrates that CollabAR achieves over 85% recognition accuracy for "in the wild" images with severe distortions, while reducing the end-to-end system latency to as low as 18.2 ms.
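The collaborative module described above fuses recognition results from spatially and temporally correlated users so that heavily distorted views carry less weight. The sketch below illustrates this idea in miniature: per-view class probabilities are averaged with weights derived from a per-image quality score. The function name, the quality scores, and the simple weighted-average fusion are illustrative assumptions, not the paper's actual ensemble scheme.

```python
import numpy as np

def aggregate_predictions(probs, quality):
    """Fuse per-view class probabilities, weighting each view by an
    image-quality score in [0, 1].

    Hypothetical weighting for illustration; CollabAR's actual
    multi-view aggregation may differ.
    """
    probs = np.asarray(probs, dtype=float)  # shape: (views, classes)
    w = np.asarray(quality, dtype=float)
    w = w / w.sum()                         # normalize view weights
    fused = w @ probs                       # quality-weighted average
    return fused / fused.sum()              # renormalize to a distribution

# Three AR users observe the same object; the severely blurred view
# (low quality score) contributes least to the fused decision.
views = [
    [0.6, 0.3, 0.1],  # sharp view, confident in class 0
    [0.5, 0.4, 0.1],  # sharp view
    [0.2, 0.7, 0.1],  # severely blurred view, misleading
]
fused = aggregate_predictions(views, quality=[0.9, 0.9, 0.2])
print(int(np.argmax(fused)))  # → 0: class 0 wins despite the distorted view
```

The same principle scales to the full system, where the weights would come from a learned distortion classifier rather than hand-set scores.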