
BodyTrak: Inferring Full-body Poses from Body Silhouettes Using a Miniature Camera on a Wristband

Authors:
Hyunchul Lim, Cornell University, Ithaca, New York, USA
Yaxuan Li, McGill University, Montreal, Canada
Matthew Dressa, Cornell University, Ithaca, USA
Fang Hu, Shanghai Jiao Tong University, Shanghai, China
Jae Hoon Kim, Cornell University, Ithaca, USA
Ruidong Zhang, Cornell University, Ithaca, USA
Cheng Zhang, Cornell University, Ithaca, USA

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 6, Issue 3, Article No. 154, pp. 1–21. https://doi.org/10.1145/3552312

Published: 07 September 2022


Abstract

In this paper, we present BodyTrak, an intelligent sensing technology that can estimate full-body poses from a wristband. It requires only one miniature RGB camera to capture body silhouettes, which a customized deep learning model uses to estimate the 3D positions of 14 joints on the arms, legs, torso, and head. We conducted a user study with 9 participants, in which each participant performed 12 daily activities, such as walking, sitting, or exercising, in varying scenarios (wearing different clothes, outdoors and indoors) with different camera settings on the wrist. The results show that our system can infer the full-body pose (the 3D positions of 14 joints) with an average error of 6.9 cm using only one miniature RGB camera (11.5 mm × 9.5 mm) on the wrist pointing toward the body. Based on these results, we discuss possible applications, challenges, and limitations of deploying our system in real-world scenarios.
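The abstract reports accuracy as an average error over the 3D positions of 14 joints but does not define the metric. A common choice for this kind of result is the mean per-joint position error (MPJPE): the Euclidean distance between each predicted and ground-truth joint, averaged over joints and then over frames. The sketch below assumes that definition, assumes the model emits a flat vector of 14 × 3 coordinates ordered x, y, z per joint, and uses hypothetical function names; it is an illustration of the metric, not the authors' evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]

NUM_JOINTS = 14  # joints on the arms, legs, torso, and head, per the abstract


def to_joints(flat: Sequence[float]) -> List[Joint]:
    """Reshape a flat output of length 14 * 3 into (x, y, z) triples.

    Assumption (hypothetical layout): coordinates are stored joint by joint as x, y, z.
    """
    if len(flat) != NUM_JOINTS * 3:
        raise ValueError(f"expected {NUM_JOINTS * 3} values, got {len(flat)}")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def mean_joint_error(pred: Sequence[Joint], truth: Sequence[Joint]) -> float:
    """Mean Euclidean distance between predicted and ground-truth joints in one frame."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("poses must be non-empty and have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def average_error(
    preds: Sequence[Sequence[Joint]], truths: Sequence[Sequence[Joint]]
) -> float:
    """Average the per-frame joint error over a sequence of frames."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("frame sequences must be non-empty and of equal length")
    errors = [mean_joint_error(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy data in centimetres: every predicted joint is offset by (3, 4, 0) cm.
    truth = [(0.0, 0.0, 0.0)] * NUM_JOINTS
    pred = to_joints([3.0, 4.0, 0.0] * NUM_JOINTS)
    print(mean_joint_error(pred, truth))  # 5.0
    print(average_error([pred, truth], [truth, truth]))  # 2.5
```

Under this definition, an average error of 6.9 cm means that, across all evaluated frames, a predicted joint lies on average 6.9 cm (straight-line distance in 3D) from its ground-truth position.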


Index Terms

Human-centered computing → Ubiquitous and mobile computing → Ubiquitous and mobile devices

Published in

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 6, Issue 3
September 2022
1612 pages
EISSN: 2474-9567
DOI: 10.1145/3563014

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Motion Tracking
Pose Estimation
Smart devices
Wearable Technology
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 5
- Total Downloads: 695
- Downloads (last 12 months): 240
- Downloads (last 6 weeks): 32
