ABSTRACT
In this paper, we introduce a workflow detection system for a manufacturing line that uses depth images to preserve the privacy of workers. A depth camera is mounted on the ceiling, pointing straight down at the workers as they complete a workflow. The system was deployed in a real-life industrial process in which workers shape a metal sheet through a sequence of bending steps. We evaluated two classification approaches for identifying the workstep a worker is currently performing. The first approach detects the workflow through human activity recognition combined with detection of related objects in the scene (a tool table, a computer screen and a machine), using only the depth camera. Because the human body shape looks similar across different activities when seen from above, this approach performed poorly, reaching a precision of only 63.03%. The second approach detects the workflow through object classification and human localisation, and integrates the depth camera data with data from other sensor devices. It outperformed the first approach, reaching a precision of 85.42%. Within this second approach, two classification models were built using only data from the RealSense sensor, and two more were built that also include data from the bending machine. Each model has its own benefits in terms of precision, accuracy and performance; we explain them in the discussion section, together with the challenges the system faced. We also examine the results in detail and present future plans for the proposed detection system and its connected sensors.
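To make the idea behind the second approach more concrete, the following is a minimal sketch of how a worker's location could be combined with detected object locations and a machine signal to guess a workstep. It is not the paper's method: the paper's classifiers are learned models, whereas this is a hand-written nearest-object rule used purely for illustration. The `Box` type, the `guess_workstep` function, the `machine_active` flag and the workstep names in `STEP_BY_OBJECT` are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned bounding box in top-view image coordinates (hypothetical type)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def centre(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


# Hypothetical mapping from scene objects to worksteps; the step names are
# illustrative assumptions and do not come from the paper.
STEP_BY_OBJECT = {
    "tool_table": "prepare_tools",
    "computer_screen": "read_instructions",
    "machine": "bend_sheet",
}


def guess_workstep(
    worker: Box,
    objects: List[Box],
    machine_active: Optional[bool] = None,
) -> Optional[str]:
    """Guess the workstep from the object nearest to the worker.

    If an (assumed) machine signal reports activity, it overrides the
    vision-based guess. Returns None when no known object is detected.
    """
    if machine_active:
        return STEP_BY_OBJECT["machine"]

    known = [o for o in objects if o.label in STEP_BY_OBJECT]
    if not known:
        return None

    wx, wy = worker.centre()

    def distance(o: Box) -> float:
        ox, oy = o.centre()
        return hypot(ox - wx, oy - wy)

    nearest = min(known, key=distance)
    return STEP_BY_OBJECT[nearest.label]


scene = [
    Box("tool_table", 0, 0, 50, 50),
    Box("machine", 110, 150, 200, 220),
    Box("computer_screen", 300, 0, 340, 40),
]
worker_near_machine = Box("person", 100, 100, 140, 140)
print(guess_workstep(worker_near_machine, scene))
```

In this toy scene the worker stands closest to the machine, so the rule returns the machine-related step. A learned classifier, as used in the paper, would replace the fixed distance rule.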
Index Terms
- Privacy Preserving Workflow Detection for Manufacturing Using Neural Networks based Object Detection