Enabling Gait Analysis in the Telemedicine Practice through Portable and Accurate 3D Human Pose Estimation

https://doi.org/10.1016/j.cmpb.2022.107016

Highlights

  • A portable hardware/software system was developed for human motion analysis at a distance

  • Limited discrepancies w.r.t. motion capture systems were found

  • The work lays the groundwork for remote gait analysis

Abstract

Human pose estimation (HPE) through deep learning-based software applications is a trending topic in markerless motion analysis. Thanks to the accuracy of state-of-the-art technology, HPE could enable gait analysis in telemedicine practice. However, delivering such a service at a distance requires the system to satisfy several different constraints at the same time: accuracy, portability, real-time performance, and privacy compliance. Existing solutions guarantee either accuracy and real-time performance (e.g., the widespread OpenPose software on well-equipped computing platforms) or portability and data privacy (e.g., light convolutional neural networks on mobile phones). We propose a portable and low-cost platform that implements real-time and accurate 3D HPE through embedded software on a low-power off-the-shelf computing device that guarantees privacy by default and by design. We present an extended evaluation of both the accuracy and the performance of the proposed solution, conducted with a marker-based motion capture system (i.e., Vicon) as ground truth. The results show that the platform achieves real-time performance and high accuracy, with a deviation below the error tolerance when compared to the marker-based motion capture system (e.g., an error of less than 5° on the estimated knee flexion difference over the entire gait cycle and a correlation of 0.91<ρ<0.99). We provide a proof-of-concept study showing that such portable technology, given its limited discrepancies with respect to the marker-based motion capture system and its working tolerance, could be used for gait analysis at a distance without leading to a different clinical interpretation.

Introduction

Human pose estimation (HPE) is a key component of human motion analysis from RGB images and videos [1]. Among its many application fields, such as sport performance analysis, human-computer interaction, and action recognition, there is a growing interest in applying this computer vision technology to the detection and analysis of pathological gait [2], [3], [4], [5], [6]. The interest is mainly due to the fact that, unlike other solutions such as multi-camera motion capture systems, multiple inertial measurement units (IMUs), force plates, and pressure insoles, HPE does not require participants to wear markers or specialized sensors and can be applied rapidly without a complicated setup.

Thanks to advances in convolutional neural networks (CNNs), there have been significant improvements in the accuracy of 3D HPE. Some techniques rely on training end-to-end CNNs to predict the 3D human pose from each image (i.e., video frame) [7]. Other techniques achieve comparable accuracy by estimating the 2D pose first and then lifting it to a view-invariant 3D pose [8].

On the other hand, long image-processing latencies (i.e., the execution time of the inference phase) force the system to work at a low frequency and, as a consequence, cause an appreciable decrease in the overall estimation accuracy [9]. As both direct (3D) and indirect (2D + Depth) methods are computationally demanding, they require well-equipped and advanced architectures (i.e., multi-core CPUs supported by a GPU accelerator) to achieve accuracy and real-time processing.
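
As an illustration of the indirect (2D + Depth) route, the following minimal sketch lifts a 2D keypoint and its depth value to a 3D point in the camera frame with the standard pinhole camera model. It is not the implementation evaluated in this work: the function name `backproject` and the intrinsic parameters (`fx`, `fy`, `cx`, `cy`) are assumptions introduced for the example, and the numeric values are hypothetical rather than those of any specific camera.

```python
from typing import Optional, Tuple


def backproject(
    u: float, v: float, depth_m: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Optional[Tuple[float, float, float]]:
    """Lift pixel (u, v) with depth in metres to camera-frame (x, y, z).

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Returns None when the depth reading is missing or invalid (<= 0).
    """
    if depth_m <= 0.0:
        return None  # depth sensors commonly report 0 for "no measurement"
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


if __name__ == "__main__":
    # Hypothetical intrinsics (focal lengths and principal point, in pixels).
    fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
    # A keypoint detected 300 px to the right of the principal point, 2 m away.
    print(backproject(620.0, 240.0, 2.0, fx, fy, cx, cy))
```

Keypoints with an invalid depth value are returned as `None` so that a later stage can decide how to handle them (for instance, by interpolation over neighbouring frames), rather than silently placing them at the camera origin.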

One of the main challenges is to adopt markerless gait analysis through CNN-based software (SW) applications in telemedicine. Telemedicine comprises medical care and professionals that adopt information and communication technologies to provide assessment, diagnosis, goal setting, treatment education, and monitoring through remote devices [10], [11]. For motion analysis, telehealth approaches are useful for evaluating and following up conditions that affect movement (for example, orthopedic or neurologic conditions, including fractures, amputations, stroke, Parkinson's disease, and multiple sclerosis). Furthermore, telehealth technologies for motion analysis may be useful in rehabilitation, both to assess body function limitations due to posture or movement alterations and to monitor the execution and effects of physical exercises in patients suffering from disabling conditions.

Nevertheless, delivering gait analysis at a distance through cameras and HPE software requires the system to satisfy, besides accuracy and real-time performance, portability and privacy compliance at the same time [12]. Addressing all these non-functional constraints in a seamless way requires the HPE software to execute on Internet-of-Things (IoT) devices at the edge [13], so that the video streams (i.e., the sensitive information) are processed close to the input sensor (i.e., the camera), while only the processing results are stored or sent over the communication network (e.g., Ethernet, WiFi) [14].

In this context, IoT devices for edge computing are resource-constrained architectures, in which memory and computing capability are bounded to guarantee portability, energy efficiency, and low power consumption. State-of-the-art software solutions for HPE either do not run on these devices due to memory limitations or achieve very low working frequencies (i.e., far from real-time), which can lead to very low accuracy in the kinetic gait analysis. Although some solutions have been proposed to apply HPE on mobile devices [16], they achieve real-time performance at the cost of coarse pose estimation (i.e., not suitable for clinical gait analysis) due to memory or computing power limitations [17].

In this work, we address the problem of satisfying multiple and different non-functional constraints like real-time computing, accuracy, privacy compliance, and portability for remote gait analysis. We present a portable embedded system (see Fig. 1(a)) that implements a real-time accurate 3D HPE SW application with a low-cost camera and a low-power computing device.

The main contributions of the work, which are the key elements of the proposed system, are the following:

  • We reimplemented a 2D HPE engine to take advantage of the heterogeneous computing elements (i.e., CPU and integrated GPU) of an off-the-shelf portable computing device (i.e., NVIDIA Jetson board) to achieve real-time computation at the edge.

  • We combined the inference phase with different image elaboration blocks to improve the accuracy while guaranteeing real-time performance and a reduced memory footprint of the software. We selected a set of spatio-temporal filters and an interpolation procedure to extend the HPE from 2D to 3D, and we implemented them through optimized parallel code to take advantage of multi-level parallelism on the device.

  • We present a quantitative analysis of the proposed platform accuracy in extrapolating both the 3D positions of human joints and the joint angles. We present an extended comparison with: (i) an infrared marker-based motion capture system (i.e., Vicon), and (ii) the OpenPose software application executed on a high-performance computing server to evaluate the applicability of the platform for gait analysis.
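
To make the notion of joint angles and of their comparison against a reference system concrete, the sketch below computes a knee flexion angle from three 3D keypoints (hip, knee, ankle) and compares two angle series with the root-mean-square error and the Pearson correlation coefficient. It is a simplified geometric illustration under stated assumptions, not the protocol used in this work: flexion is taken here as 180° minus the angle between the thigh and shank segments, whereas clinical marker-based protocols derive angles from anatomical reference frames. All function names are hypothetical.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def joint_angle_deg(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Included angle at joint b (degrees) between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.sqrt(sum(p * p for p in v1))
    n2 = math.sqrt(sum(q * q for q in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident keypoints: angle is undefined")
    cos_t = sum(p * q for p, q in zip(v1, v2)) / (n1 * n2)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def knee_flexion_deg(hip: Vec3, knee: Vec3, ankle: Vec3) -> float:
    """Assumed convention: 0 degrees for a fully extended (straight) leg."""
    return 180.0 - joint_angle_deg(hip, knee, ankle)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between two equally long series."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("constant series: correlation is undefined")
    return sum((p - mx) * (q - my) for p, q in zip(x, y)) / (sx * sy)
```

In such a comparison, the two series would be the knee flexion curves obtained from the markerless keypoints and from the reference system, resampled to the same instants of the gait cycle before computing the two metrics.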

The article is organized as follows. Section 2 presents the analysis of the related work. Section 3 presents the proposed portable HPE platform. Section 4 reports the experimental results. Section 5 presents the discussion of the obtained results, while Section 6 is devoted to the conclusions.

Section snippets

Related Work

Measuring gait variables using computer vision has been increasingly applied in recent years to assess mobility and fall risk [5], [18] as well as to identify gait features in parkinsonism and other neurological diseases [19], [20]. Quantifying gait pathology with commodity cameras increases access to quantitative motion analysis in clinics and at home and enables researchers to conduct large-scale studies of neurological and musculoskeletal disorders [21]. The same technology has been

Method

Fig. 2 shows an overview of the proposed platform (MAEVE). It consists of communicating and concurrent elaboration nodes that implement a pipeline of computer vision primitives and inference-based applications. The pipeline allows for efficient processing of the frame sequence (i.e., video stream), which is generated by an RGB-D camera and elaborated, in real time, on the heterogeneous computing elements of the embedded board (i.e., CPU and GPU). The result is a set of 3D keypoints (KP3D),

Experimental results

We evaluated MAEVE in terms of real-time computing performance on different off-the-shelf embedded boards. In every configuration, the input camera is an Intel RealSense D415 providing 1920x1080 RGB frames and 1280x720 depth frames at 60 frames per second (FPS). We then evaluated the accuracy of MAEVE in two ways. First, by comparing the absolute 3D coordinates of the joint keypoints extrapolated by MAEVE and the marker coordinates extrapolated by an infrared motion capture system during the

Discussion

In our experimental analysis, the state-of-the-art HPE software (i.e., OpenPose) has been shown to achieve very high accuracy at the cost of a high computational load and memory footprint. This is due to the fact that, to achieve high accuracy, the core of the pose estimation framework requires a deep convolutional neural network architecture. As a consequence, such an inference application either does not run on memory-bounded portable devices or, when run on the most efficient and

Conclusion

One of the main challenges of RGB-D-based 3D HPE is to achieve accuracy for clinical applications. Even more challenging is HPE applied in telemedicine practice, which requires the system to satisfy, besides accuracy, portability, real-time, and privacy compliance constraints at the same time. In this article, we presented MAEVE, a 3D HPE platform that runs in real time on an off-the-shelf portable computing board. The experimental results demonstrated that MAEVE achieves real time

Declaration of Competing Interest

We would like to confirm no conflict of interest, financial or other, exists. This manuscript is entirely original, has not been copyrighted, published, submitted, or accepted for publication elsewhere.

References (45)

  • Y. Guo et al., 3-D canonical pose estimation and abnormal gait recognition with a single RGB-D camera, IEEE Robotics and Automation Letters (2019)
  • G. Pavlakos et al., Coarse-to-fine volumetric prediction for single-image 3D human pose, Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • H.-S. Fang et al., Learning pose grammar to encode human body configuration for 3D pose estimation, Proc. of AAAI Conference on Artificial Intelligence (2018)
  • G. Wei et al., View invariant 3D human pose estimation, IEEE Transactions on Circuits and Systems for Video Technology (2020)
  • M. Abbas et al., D-SORM: A digital solution for remote monitoring based on the attitude of wearable devices, Computer Methods and Programs in Biomedicine (2021)
  • L. Yanicelli et al., Heart failure non-invasive home telemonitoring systems: A systematic review, Computer Methods and Programs in Biomedicine (2021)
  • J. Calvillo-Arbizu et al., Internet of things in health: Requirements, issues, and gaps, Computer Methods and Programs in Biomedicine (2021)
  • M. Sassi et al., Security and privacy protection in the e-health system: Remote monitoring of COVID-19 patients as a use case, Smart Innovation, Systems and Technologies (2022)
  • Z. Cao et al., Realtime multi-person 2D pose estimation using part affinity fields, Proc. of IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • J. Zhang et al., MobiPose: Real-time multi-person pose estimation on mobile devices, SenSys 2020 - Proc. of ACM Conference on Embedded Networked Sensor Systems (2020)
  • W. Yu et al., Human identification in health care systems using mobile edge computing, Transactions on Emerging Telecommunications Technologies (2020)
  • S. Rupprechter et al., A clinically interpretable computer-vision based method for quantifying gait in Parkinson's disease, Sensors (2021)