research-article

CADTrack: Instructions and Support for Orientation Disambiguation of Near-Symmetrical Objects

Published: 01 November 2023

Abstract

Determining the correct orientation of objects can be critical for success in tasks such as assembly and quality assurance. In particular, near-symmetrical objects may require careful inspection of small visual features to disambiguate their orientation. We propose CADTrack, a digital assistant that provides instructions and support for tasks where object orientation matters but may be hard to disambiguate with the naked eye. Additionally, we present a deep learning pipeline for tracking the orientation of near-symmetrical objects. In contrast to existing approaches, which require labeled datasets produced through laborious data acquisition and annotation, CADTrack uses a digital model of the object to generate synthetic data and train a convolutional neural network. Furthermore, we extend the Mask R-CNN architecture with a confidence prediction branch to avoid errors caused by misleading orientation guidance. We evaluate CADTrack in a user study that compares our tracking-based instructions with other methods; the results confirm the benefits of our approach in terms of user preference and required effort.
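The two ideas in the abstract, labels that come for free from a CAD-driven renderer and a confidence score that suppresses misleading guidance, can be illustrated with a short, hypothetical sketch. It assumes an object that looks almost identical after a 180° rotation, so orientation reduces to two classes, and it assumes the network outputs a predicted class plus a separate confidence score in [0, 1]. The names `label_from_render_angle`, `Detection`, `guidance`, and the 0.8 threshold are illustrative assumptions, not CADTrack's actual interface; the extended Mask R-CNN itself is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical orientation classes for an object that looks almost the same
# after a 180-degree rotation about one axis (a two-fold near-symmetry).
ORIENTATIONS = ("correct", "rotated_180")


def label_from_render_angle(angle_deg: float) -> str:
    """Derive a training label from the rotation used to render a synthetic
    image of the CAD model, so no manual annotation is required.

    Angles in [-90, 90) relative to the reference pose map to "correct";
    all other angles map to "rotated_180".
    """
    a = angle_deg % 360.0  # normalize to [0, 360)
    return ORIENTATIONS[0] if a < 90.0 or a >= 270.0 else ORIENTATIONS[1]


@dataclass
class Detection:
    orientation: str   # predicted orientation class, one of ORIENTATIONS
    confidence: float  # score from an assumed confidence branch, in [0, 1]


def guidance(det: Detection, target: str = "correct", threshold: float = 0.8) -> str:
    """Turn one detection into an instruction for the user.

    Orientation guidance is withheld when the confidence score falls below
    the (hypothetical) threshold, because a wrong instruction can mislead.
    """
    if det.confidence < threshold:
        return "uncertain: inspect the object manually"
    if det.orientation == target:
        return "orientation OK"
    return "rotate the object by 180 degrees"


if __name__ == "__main__":
    print(label_from_render_angle(180))              # rotated_180
    print(guidance(Detection("rotated_180", 0.95)))  # rotate the object by 180 degrees
    print(guidance(Detection("correct", 0.55)))      # uncertain: inspect the object manually
```

In this sketch, falling back to manual inspection when confidence is low trades some automation for fewer wrong instructions; any real threshold would have to be tuned on data from the actual object and camera setup.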

Supplementary Material

Video (iss23main-p1176-p-video.mp4)
This is the presentation video for the paper CADTrack: Instructions and Support for Orientation Disambiguation of Near-Symmetrical Objects.


Information

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 7, Issue ISS
December 2023
482 pages
EISSN: 2573-0142
DOI: 10.1145/3554314
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2023
Published in PACMHCI Volume 7, Issue ISS

Author Tags

  1. dataset generation
  2. deep learning
  3. guidance
  4. industry 4.0
  5. near-symmetrical objects
  6. tracking
  7. user study

Qualifiers

  • Research-article

Funding Sources

  • Innovation Fund Denmark

Article Metrics

  • Total citations: 0
  • Total downloads: 104
  • Downloads (last 12 months): 38
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Feb 2025
