Abstract
In this paper, we present BodyTrak, an intelligent sensing technology that estimates full-body poses from a wristband. It requires only one miniature RGB camera to capture body silhouettes, which a customized deep learning model uses to estimate the 3D positions of 14 joints on the arms, legs, torso, and head. We conducted a user study with 9 participants in which each participant performed 12 daily activities, such as walking, sitting, or exercising, in varying scenarios (wearing different clothes, outdoors/indoors) and with varying numbers of camera settings on the wrist. The results show that our system can infer the full-body pose (3D positions of 14 joints) with an average error of 6.9 cm using only one miniature RGB camera (11.5 mm x 9.5 mm) on the wrist pointing towards the body. Based on the results, we discuss the possible applications, challenges, and limitations of deploying our system in real-world scenarios.
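To make the reported figure concrete, the sketch below shows one common way an average joint error can be computed: the mean Euclidean distance between predicted and ground-truth 3D joint positions, taken over all frames and all 14 joints. The abstract does not state the exact error definition, so this metric, the function name `mean_joint_error`, and the toy data are assumptions for illustration only, not the authors' evaluation code.

```python
import math

NUM_JOINTS = 14  # joint count stated in the abstract


def mean_joint_error(predicted, ground_truth):
    """Mean Euclidean distance over all frames and joints.

    Assumed metric (not confirmed by the abstract). Each argument is a
    list of frames; each frame is a list of 14 (x, y, z) tuples in cm.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same number of frames")
    total = 0.0
    count = 0
    for pred_frame, true_frame in zip(predicted, ground_truth):
        if len(pred_frame) != NUM_JOINTS or len(true_frame) != NUM_JOINTS:
            raise ValueError("each frame must contain 14 joints")
        for pred_joint, true_joint in zip(pred_frame, true_frame):
            total += math.dist(pred_joint, true_joint)  # 3D distance for one joint
            count += 1
    if count == 0:
        raise ValueError("at least one frame is required")
    return total / count


# Hypothetical toy data: one frame in which every predicted joint is
# offset from the ground truth by (3, 4, 0) cm.
truth = [[(0.0, 0.0, 0.0)] * NUM_JOINTS]
pred = [[(3.0, 4.0, 0.0)] * NUM_JOINTS]
error_cm = mean_joint_error(pred, truth)
```

Under this assumed definition, an average error of 6.9 cm would mean that each estimated joint lies, on average, 6.9 cm from its reference position.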
Index Terms
- BodyTrak: Inferring Full-body Poses from Body Silhouettes Using a Miniature Camera on a Wristband