Abstract
Multiple human 3D pose estimation is a useful but challenging task in computer vision applications. Ambiguities in estimating the 2D and 3D poses of multiple persons can be resolved by using multi-view frames, in which body parts that are occluded or self-occluded in one view may be visible in other camera views. However, when the cameras are moving and uncalibrated, associating the body parts of multiple humans across camera views is difficult. This paper presents novel methods for multiple human 3D pose estimation and pose association in multi-view camera frames in an uncalibrated camera setup, using an adversarial learning framework. The generator is a 3D pose estimation network that learns a mapping of distance and angular difference matrices between the 2D and 3D spaces. The discriminator tries to distinguish the predicted 3D poses from the ground truth, which forces the pose estimator to generate valid 3D poses. To increase the accuracy of the generator network, multi-view frames are used. The estimated 3D poses are associated across multi-view frames by a statistical method. The association and the relative rotation and translation of the cameras with respect to each other are also obtained. This step strengthens the generator network and removes ambiguities in the estimation of occluded or self-occluded body parts. The global 3D poses are then fed to the discriminator network in an attempt to fool it into classifying them as ground truth. Experimental results on multi-view, multi-person datasets (Campus, Shelf, Utrecht Multi-Person Motion (UMPM), and KTH Football 2) indicate that the proposed method achieves superior performance compared with other state-of-the-art methods, while requiring no calibration information a priori.
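Two ideas in the abstract can be illustrated with a short numerical sketch. It is not the paper's network or its statistical association method. It only shows, under stated assumptions, (1) that a matrix of pairwise joint distances is unchanged when a pose is rotated and translated, which is one reason such matrices are a convenient view-independent representation, and (2) how a relative rotation and translation between two estimates of the same 3D pose can be recovered with a standard rigid Procrustes (Kabsch) alignment. The function names, the five-joint toy pose, and the choice of Kabsch alignment are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distance_matrix(joints):
    """Return the J x J matrix of Euclidean distances for a (J, D) joint array."""
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def rigid_procrustes(source, target):
    """Least-squares rotation R and translation t with target ~= source @ R.T + t.

    Standard Kabsch solution; assumed here for illustration, not taken
    from the paper.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_c = source - src_mean
    tgt_c = target - tgt_mean

    # 3 x 3 cross-covariance between the centred point sets.
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t


# Hypothetical five-joint pose expressed in the frame of "camera A".
pose_a = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 1.0, 0.2],
    [-0.5, 1.0, -0.2],
    [0.0, 2.0, 0.1],
])

# The same pose in the frame of "camera B": rotated 30 degrees about z, then shifted.
theta = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([1.0, -2.0, 0.5])
pose_b = pose_a @ R_true.T + t_true

# Pairwise distances agree in both frames, so they do not depend on the camera pose.
same_distances = np.allclose(
    pairwise_distance_matrix(pose_a), pairwise_distance_matrix(pose_b)
)

# The relative camera motion is recovered from the two pose estimates.
R_est, t_est = rigid_procrustes(pose_a, pose_b)
print(same_distances, np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

With noisy pose estimates the recovered rotation and translation are only a least-squares fit, and the residual after alignment could serve as one possible score for deciding whether two poses from different views belong to the same person.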
Cite this article
Ershadi-Nasab, S., Kasaei, S. & Sanaei, E. Uncalibrated multi-view multiple humans association and 3D pose estimation by adversarial learning. Multimed Tools Appl 80, 2461–2488 (2021). https://doi.org/10.1007/s11042-020-09733-5
DOI: https://doi.org/10.1007/s11042-020-09733-5