1 Introduction

Non-contact visual measurement using cameras can obtain various kinds of information through computer vision techniques. Geometrical information, such as shape and motion, has been one of the major topics. Shape and motion give useful information, but they are often not sufficient to understand the real world. For example, in human-robot interaction, knowing a person’s muscle activity helps a robot move cooperatively with that person. Motion planning for a robot picking up an object would also be easier if the object's physical properties, such as weight and compliance, were known in advance.

Physical information such as muscle activity and weight usually requires additional sensors, for example force sensors, outside the vision system, whereas shape and motion can be obtained directly from input images and camera parameters. Since additional sensors are not always available, especially in uncontrolled environments, it is beneficial if physical information can be obtained by non-contact visual measurement, which is easier to apply to various situations than systems that require contact with the target objects.

In this paper, we focus on observing muscle activity as one kind of physical information. If muscle activity can be measured by visual observation alone, it is useful for estimating the force generated by the muscle and can be a clue for recognizing the behavior and intention of the person. Since humans can estimate physical quantities from visual clues alone, this approach is similar to the human inference process.

Several visual features can be considered as clues for estimating the muscle activity of a person, such as body pose, articulated motion, and muscle bulging. We measure the skin shape to observe muscle bulging, since it is expected to indicate the muscle activity directly, whereas estimation from pose and motion needs information on contact with the environment.

The skin shape is affected not only by the muscle activity but also by the angle of the joint to which the muscle is attached. Even if a muscle is not activated, the length of the muscle changes according to the joint angle because the end point of the muscle moves with the bone. Therefore, it is necessary to consider both the muscle activity and the joint angle. The question we tackle in this paper is whether it is possible to predict muscle activity and joint angle from skin shape.

The proposed method is a learning-based approach. At the prediction step, the muscle activity and the joint angle are estimated only from the skin shape captured by a non-contact sensor. At the training step, a data set of skin shape, muscle activity and joint angle is obtained, and their relationship is learned. The skin shape is captured by using a range sensor (a.k.a. depth sensor). The muscle activity is measured by electromyograph (EMG) sensors, which are attached to the skin above the target muscles and record the electrical activity produced by the muscles. The joint angle is measured by a motion capture (mocap) system. Although the joint angle can be measured by a vision system without attaching markers to the human body, we used a mocap system for accuracy at the training step.

The contributions of this paper are summarized as follows.

  • It is demonstrated that visual measurement of skin shape can be used to estimate muscle activity and joint angle by a learning-based approach.

  • Skin regions that correspond to the muscles can be detected as active regions from the training data set.

  • Muscle activity and joint angle are successfully predicted from skin shapes by linear regression on the skin deformation in the active regions.

2 Related Work

Recent techniques of human motion analysis enable the estimation of internal joint torques and muscle tensions [1, 2]. The joint trajectories can be computed, for example, by using an optical motion capture system, which measures the trajectories of markers attached to a human subject [3]. The joint torques are obtained from the inverse dynamics computation of an articulated body system, which uses almost the same formulation as for a mechanical system [4]. It usually requires the inertial properties of the human subject, which are estimated from literature data or identified [5]. The joint torques actuated by muscles are extracted by subtracting the external joint torques driven by contact forces. Though the contact forces are measurable with force sensors or force plates, a multi-contact situation makes it difficult to measure them individually; a mathematical optimization is often formulated to estimate them [2]. Each muscle tension can be estimated from the actuated joint torque with mathematical optimization techniques [2, 6], or be obtained from the combination of a physiological muscle model [7, 8] and EMG sensors. The mathematical optimization evaluates several physical and physiological terms in order to obtain one unique solution; however, accurate estimation requires many reliable evaluation terms. Though EMG sensors measure individual muscle activation, in practice they cannot be attached to all the muscles. Methods using both mathematical optimization and EMG sensors have also been investigated [9].

The articulated motion of the human body can be estimated from 2D images without markers. The detection of 2D joint positions can be done using convolutional neural networks (ConvNets). Toshev et al. [10] first proposed a method based on ConvNets for detecting human pose, i.e., 2D key points representing joint locations, from a single image. Li et al. [11] used ConvNets to directly regress 3D human joints from an image. There are two main reasons for the improvements on accuracy of 3D human pose detection. In biomechanics and robotics, inverse kinematics has been well studied and used to generate human pose from mocap data by controlling joint angles. Previous approaches [12, 13] estimated 3D human pose from 2D key points by combining a statistical model and constraints such as joint limits [14], segment lengths [12] and symmetry. Some methods perform regression of joint angles or axis angles [15, 16] to estimate the angular skeletal pose using ConvNets, but the high nonlinearity prevents accurate prediction of joint locations. In this paper, it is assumed that the skeletal pose is given by mocap or by these methods, and we used a mocap system for accuracy in the experiments.

Skin deformation according to body pose is an important factor in generating a realistic model of the human body in computer graphics. The methods of modeling muscles are classified into three approaches: geometrically-based, physically-based, and data-driven approaches [17]. The muscle deformation is modeled from the range data obtained by a depth camera in [18]. The skin deformation is learned with respect to the pose and acceleration of body parts by kernel regression in [19]. Various parts of the body are modeled for graphics by simulating muscle and skin, such as the face [20], hand [21], lung [22] and upper body [23]. The relationship between skin shape and muscle force is learned in [24] to predict the force from skin shape, while the skeletal pose is assumed to be fixed.

The calculation of skin deformation is based on nonrigid surface registration techniques, which are classified into three categories in terms of the regularization that they use: smoothness regularization, isometric regularization and conformal regularization. First, early approaches [25,26,27,28] are based on smoothness regularization. These techniques are flexible enough to deform a template shape largely, but they are poor at preserving template details and mesh structures. Second, isometric (as-rigid-as-possible) regularization can preserve the original template details and is commonly used in automatic registration techniques [29,30,31]. The drawback of these methods is that they are incapable of handling models with different sizes or those which undergo large local stretching. Third, techniques based on conformal (as-similar-as-possible) regularization [32,33,34] have been proposed to achieve both flexibility in changing shapes and preservation of mesh structure. They are based on angle-preserving deformation.

3 Proposed Method

We consider the lower limb in this paper to simplify the situation, because muscles attached to joints with many degrees of freedom, such as the shoulder joint, are affected by multiple joints and by the muscles around them. The muscles of the lower limb are related to the ankle joint, and the movement can be controlled as a 1D motion, namely flexion/extension of the ankle. The motion of the ankle mainly depends on three muscles: the Gastrocnemius, Soleus, and Tibialis anterior muscles. Therefore, we analyze the relationship among the activity of the first two of these muscles, the ankle joint angle and the skin shape of the lower limb in this paper.

3.1 Data Acquisition

First, we describe how to acquire the data set for observing the lower limb. Figure 1 shows the setup of the experiment. The skin shape of the lower limb of a subject is observed by using three range sensors placed around the subject. The range sensors are based on a projector-camera system [35], and each captures the shape of the part visible from the sensor at 100 FPS. The whole shape of the lower limb is captured by using the three range sensors with a technique for reducing the crosstalk of multiple projected patterns [36]. The shape of the lower limb is reconstructed by merging the range scans by Poisson reconstruction [37]. Figure 2 shows examples of the skin shapes. The activity of the muscles is low at Pose 1. The subject is standing on the toe at Pose 2, and the bulging of the Gastrocnemius and Soleus muscles can be observed. The shape of the EMG sensors on the skin is removed from the range scans, and the skin shape is interpolated when merging them.

Fig. 1. The setup of the experiment: the skin shape of the lower limb of a subject is observed by using three range sensors placed around the subject. The joint angle of the ankle is measured by a mocap system. The muscle activities of the lower limb are measured by EMG sensors placed on the skin above the muscles.

Fig. 2. Two examples of skin shapes: the activity of the muscles is low at Pose 1. The subject is standing on the toe at Pose 2, and the bulging of the Gastrocnemius muscle can be observed.

The joint angle of the ankle is measured by a mocap system. The markers are attached to the knee, ankle and toe, and the angle of the ankle joint is calculated by solving inverse kinematics. We used the mocap system since it gives accurate and stable results; estimating the angle without markers is left for future work.
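As a rough illustration of how an ankle angle could be derived from the three markers, a minimal sketch is given here. It assumes a simplified model in which the angle is measured between the shank segment (ankle to knee) and the foot segment (ankle to toe) relative to a neutral pose; the function name, the 90° neutral offset and the sign convention are assumptions for illustration and do not reproduce the inverse kinematics solver actually used.

```python
import numpy as np

def ankle_angle_deg(knee, ankle, toe, neutral_deg=90.0):
    """Angle between shank and foot segments relative to an assumed neutral pose.

    Positive values correspond to plantar flexion (toe lowered), negative
    values to dorsiflexion. neutral_deg is a hypothetical offset.
    """
    shank = np.asarray(knee, dtype=float) - np.asarray(ankle, dtype=float)
    foot = np.asarray(toe, dtype=float) - np.asarray(ankle, dtype=float)
    cos_a = np.dot(shank, foot) / (np.linalg.norm(shank) * np.linalg.norm(foot))
    # Clip to guard against rounding slightly outside [-1, 1]
    inner = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return inner - neutral_deg
```

With the knee directly above the ankle and the toe pointing forward, this sketch returns 0°; lowering the toe increases the value.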

The muscle activities of the lower limb are calculated from the data measured by EMG sensors placed on the skin above the muscles. Two EMG sensors are used simultaneously, and they are placed on the Gastrocnemius and Soleus muscles. Figure 3 shows an example of the muscle activity and the ankle angle according to the motion of the lower limb. The muscle activity is defined as the integrated EMG signal normalized by the signal of the maximal voluntary contraction (MVC) [38]. If it is close to zero, the muscle is relaxed.
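The normalization described above can be sketched as follows. This is only an illustrative implementation: the sampling rate `fs`, the integration window `window_s`, the mean removal and the use of a moving average as the integration step are assumptions, since the processing parameters are not specified in the text.

```python
import numpy as np

def muscle_activity(emg, mvc_emg, fs=1000.0, window_s=0.1):
    """Integrated EMG normalized by the MVC level (0 = relaxed, about 1 = MVC).

    fs (sampling rate in Hz) and window_s (window length in seconds) are
    hypothetical values chosen for illustration.
    """
    n = max(1, int(round(fs * window_s)))
    kernel = np.ones(n) / n

    def integrate(signal):
        signal = np.asarray(signal, dtype=float)
        rectified = np.abs(signal - signal.mean())   # remove offset, rectify
        # Windowed integration, expressed as a moving average
        return np.convolve(rectified, kernel, mode="same")

    mvc_level = np.max(integrate(mvc_emg))
    return integrate(emg) / mvc_level
```

A value near zero then indicates a relaxed muscle, and values near one indicate activity close to the recorded MVC.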

Fig. 3. The trajectories of the muscle activity and the ankle angle for an example of the motion. The muscle activity is calculated from the data measured by an EMG sensor.

3.2 Calculating Deformation of Skin Shapes

The muscle activity affects the skin shape through the deformation of the muscle under the skin. Since the skin shapes are acquired by the range sensors frame by frame, the deformation of the skin shape must be calculated by finding the correspondence between the shapes. In this paper, we use a template shape to compare with each range scan. The template is constructed from one of the scans of a relaxed pose with low muscle activity.

In this paper, we used the as-conformal-as-possible approach [34] to deform the template model. This method constrains the local transformations of the model to be as close as possible to similarity transformations (scale + rotation), which allows us to fit the model to the target geometry in a flexible way while preserving the mesh structure with less distortion. The cost function is defined as follows.

$$\begin{aligned} E(X)\,=\,&w_{\mathrm{ASAP}}E_{\mathrm{ASAP}} + w_{\mathrm{Closest}} E_{\mathrm{Closest}} \nonumber \\&+ w_{\mathrm{Marker}} E_{\mathrm{Marker}}, \end{aligned}$$
(1)

where \(E_{\mathrm{ASAP}}\) constrains the deformation to be as similar as possible, and \(E_{\mathrm{Closest}}\) penalizes the distances between the closest points of the template and the target surface. \(E_{\mathrm{Marker}}\) is a positional constraint on the deformation that uses mocap markers to avoid shift during deformable registration. Four marker landmarks at the inner/outer knee and ankle are used. The energy is minimized by an alternating optimization technique in which the first step optimizes the vertex positions with fixed transformations and the second step optimizes the affine transformations with fixed vertex positions.

3.3 Detecting Active Regions Corresponding to Muscles

The part of the skin that actively deforms according to the motion is limited to a small number of regions of the whole skin surface. The proposed method detects the regions of the skin that deform largely according to muscle contractions in order to use them for predicting muscle activities and joint angles.

First, the part of the lower limb is extracted from the template model that is deformed to each range scan. The deformed templates are then aligned to the original template shape by a rigid transformation so as to minimize the deformation vectors. This alignment is necessary to reduce the effect of the error of the skeletal pose estimated by the mocap system.
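The rigid alignment step can be illustrated with a standard least-squares fit. The sketch below uses the Kabsch algorithm, which is one common way to find the rigid transformation minimizing the squared distances between corresponding vertices; the text does not state which solver was actually used, so this choice and the function name are assumptions.

```python
import numpy as np

def rigid_align(deformed, template):
    """Rigidly align deformed (N x 3) to template (N x 3), vertex i matching vertex i.

    Returns a rotated and translated copy of deformed that minimizes the sum
    of squared vertex distances to template (Kabsch algorithm).
    """
    P = np.asarray(deformed, dtype=float)
    Q = np.asarray(template, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)            # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # rotation taking P onto Q
    return (P - p_mean) @ R.T + q_mean
```

Because the deformed template shares its vertex indexing with the original template, no correspondence search is needed in this sketch.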

Fig. 4. The displacement caused by deformation for each vertex is calculated based on the normal vector at the vertex.

Second, the displacement caused by deformation \(r_i^{(t)}\) of vertex i at sample t is calculated as follows.

$$\begin{aligned} r_i^{(t)} = \varvec{n}_i^{(t)} \cdot (\varvec{p}_i^{(t)} - \varvec{q}_i^{(t)}), \end{aligned}$$
(2)

where \(\varvec{p}_i^{(t)}\) and \(\varvec{q}_i^{(t)}\) are the vertex positions of the deformed and original model, respectively. \(\varvec{n}_i^{(t)}\) is the normal vector at the vertex \(\varvec{q}_i^{(t)}\) as shown in Fig. 4.
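Equation (2) amounts to a per-vertex dot product, which can be sketched as follows. The array layout (one row per vertex) and the normalization of the normals are assumptions made for this illustration.

```python
import numpy as np

def normal_displacement(p, q, n):
    """Signed displacement r_i = n_i . (p_i - q_i) for every vertex, as in Eq. (2).

    p: deformed vertex positions, q: original vertex positions,
    n: normal vectors at q. All arrays have shape (N, 3).
    """
    n = np.asarray(n, dtype=float)
    # Normalize in case the supplied normals are not unit length (assumption)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return np.einsum("ij,ij->i", n, diff)
```

A positive value means the skin moved outward along the normal (bulging), and a negative value means it moved inward.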

Fig. 5. The variance of the displacement for all vertices is calculated by using the training data set. The red parts indicate that the variance is large, and the regions of Gastrocnemius and Soleus muscles can be recognized from the variance. (Color figure online)

The active regions are assumed to deform largely according to the muscle activity and the joint angle. Therefore, if the shape is captured for various poses and muscle states, the variance of the displacement in these regions is expected to be large. Figure 5 shows the variance of the displacement calculated for all vertices by using the training data set. The red parts indicate that the variance is large, and the regions of the Gastrocnemius and Soleus muscles can be recognized from the variance.

Fig. 6. The red regions are regarded as active by choosing the vertices with the top 25% variance. The number of vertices is reduced to 25% of the original template by decimating the model. (Color figure online)

The vertices with the top 25% variance are chosen as the active regions that are used for prediction. Figure 6 shows the regions chosen as active. In this paper, the number of vertices is reduced to 25% of the original template by decimating the model in order to lower the number of degrees of freedom. For predicting the activity of the Gastrocnemius and Soleus muscles, the regions around the calf are used based on the knowledge of biomechanics, while all the regions are used for predicting the ankle angle. The sets of active vertices are defined as \(V_{\text{ act }}\) for predicting the muscle activities and \(V_{\text{ ang }}\) for predicting the ankle angle, respectively.
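The variance-based selection can be sketched as below. The matrix layout (one row per training sample, one column per vertex) and the function name are assumptions; the additional restriction to the calf region for \(V_{\text{ act }}\) is based on anatomical knowledge and is not reproduced here.

```python
import numpy as np

def select_active_vertices(R, top_fraction=0.25):
    """Return the indices of the vertices with the largest displacement variance.

    R has shape (T, N): displacement of N vertices over T training samples.
    top_fraction=0.25 corresponds to the top 25% described in the text.
    """
    R = np.asarray(R, dtype=float)
    variance = np.var(R, axis=0)                  # variance over samples, per vertex
    k = max(1, int(round(top_fraction * R.shape[1])))
    order = np.argsort(variance)[::-1]            # descending variance
    return np.sort(order[:k])
```

The returned index set plays the role of the active region from which the columns of the regression matrix are taken.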

3.4 Predicting Muscle Activity and Joint Angle from Skin Shape

To learn the relationship between skin shape and muscle activity, a muscle model is assumed. Each muscle force along the fiber direction is often modeled as the sum of an active and a passive part, as in Hill-type models [7, 8]. The passive part depends only on the elastic property of each muscle, whereas the active part is generated by the muscle activity. The active component can be represented by the product of the activity level, a length-dependent function, a velocity-dependent function, and a constant related to the maximum muscle contraction [39, 40]. In this paper, we assume quasi-static muscle contraction and a linear elastic isotropic property for the muscle. Under this assumption, the local displacement as well as the active stress can be considered to be linear with respect to the muscle activity level. When the change of the relative angle between the normal direction at each point of the skin surface and the fiber direction of the nearest muscle is also assumed to be small, the displacement of each skin vertex is approximately linear in the corresponding muscle activity level.
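The assumed muscle model can be written compactly as follows. This is only a sketch of a generic Hill-type formulation: the symbols \(F\), \(F_{\mathrm{P}}\), \(a\), \(f_{\mathrm{L}}\), \(f_{\mathrm{V}}\), \(F_{0}\), \(l\) and \(k_i\) are introduced here for illustration and are not part of the original notation.

$$\begin{aligned} F = F_{\mathrm{P}}(l) + a \, f_{\mathrm{L}}(l) \, f_{\mathrm{V}}(\dot{l}) \, F_{0}, \qquad r_i \approx k_i \, a, \end{aligned}$$

where \(F\) is the muscle force along the fiber direction, \(F_{\mathrm{P}}\) is the passive part, \(a\) is the activity level, \(f_{\mathrm{L}}\) and \(f_{\mathrm{V}}\) are the length-dependent and velocity-dependent functions, and \(F_{0}\) is the constant related to the maximum contraction. Under quasi-static contraction \(f_{\mathrm{V}}\) can be treated as constant, so for a given muscle length \(l\) the active part is linear in \(a\). The second relation expresses the resulting linear approximation for the displacement \(r_i\) of skin vertex i, with \(k_i\) an unknown per-vertex constant.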

Based on the above assumption, the following linear model between the skin shape, muscle activity and ankle angle is considered.

$$\begin{aligned}&\varvec{y} = \varvec{X} \varvec{\omega }, \qquad \varvec{y} = \begin{bmatrix} \vdots \\ y^{(t)} \\ y^{(t+1)} \\ \vdots \end{bmatrix}, \qquad \varvec{\omega } = \begin{bmatrix} \vdots \\ \omega _{V(i)} \\ \omega _{V(i+1)} \\ \vdots \end{bmatrix}, \\&\varvec{X} = \begin{bmatrix} \ddots&\vdots&\vdots&\\ \cdots&r_{V(i)}^{(t)}&r_{V(i+1)}^{(t)}&\cdots \\ \cdots&r_{V(i)}^{(t+1)}&r_{V(i+1)}^{(t+1)}&\cdots \\&\vdots&\vdots&\ddots \end{bmatrix}\nonumber \end{aligned}$$
(3)

\(y^{(t)}\) is the t-th sample of one of the three targets, namely one of the two muscle activities or the angle of the ankle joint, which are measured by the EMG sensors and the mocap system. \(\omega _{V(i)}\) is the weight of the i-th vertex V(i) in the active region, which is estimated from the training data set. The active region is \(V_{\text{ act }}\) for predicting the muscle activities and \(V_{\text{ ang }}\) for predicting the ankle angle. \(r_{V(i)}^{(t)}\) is the displacement calculated for vertex V(i) in the t-th sample. The weight vector \(\varvec{\omega }\) is estimated as the least-squares solution separately for each of the three targets: the two muscle activities and the ankle angle. In the prediction step, the displacement is obtained by calculating the deformation, and the muscle activities and the ankle angle are estimated by using the weight vectors.
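The training and prediction steps for the linear model of Eq. (3) can be sketched as follows. The function names are hypothetical, and `np.linalg.lstsq` is used here as one standard least-squares solver; the text does not specify the solver or any regularization.

```python
import numpy as np

def fit_weights(X, y):
    """Least-squares estimate of the weight vector in y = X w, as in Eq. (3).

    X: (T, |V|) displacements of the active vertices over T training samples.
    y: (T,) measured target (one muscle activity or the ankle angle).
    """
    w, *_ = np.linalg.lstsq(np.asarray(X, dtype=float),
                            np.asarray(y, dtype=float), rcond=None)
    return w

def predict(X_new, w):
    """Predict the target for new displacement rows using the learned weights."""
    return np.asarray(X_new, dtype=float) @ w
```

One weight vector would be fitted per target, so three calls to `fit_weights` cover the two muscle activities and the ankle angle.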

Fig. 7. In the sequence for validation, the subject stands up on the toe by moving the ankle from 20\(^\circ \) of dorsiflexion to 40\(^\circ \) of plantar flexion slowly in five seconds. The middle row is the captured shape, and the bottom row is the skeleton reconstructed from the predicted angles of the ankle joint. The red part of the shape is used for predicting the muscle activities and the ankle angle. (Color figure online)

Fig. 8. The distribution of the activity of the Gastrocnemius muscle and the ankle angle in the data used for training the parameters.

4 Experiments

As the training data set, we acquired 8K samples of skin shape, muscle activity and joint angle. The angle of the ankle changes from 20\(^\circ \) of dorsiflexion to 40\(^\circ \) of plantar flexion during the acquisition. Figure 8 shows the distribution of the muscle activity and the ankle angle in the data used for training the parameters. The data are captured so that the samples are distributed two-dimensionally over the value ranges of both the muscle activity and the ankle angle.

The decimated template consists of 3.7K vertices, while the original template shape has 15K vertices. The active regions used for prediction consist of 138 and 176 vertices for estimating the muscle activities and the ankle angle, respectively.

As the validation data, we use a sequence in which the subject stands up on the toe by moving the ankle from 20\(^\circ \) of dorsiflexion to 40\(^\circ \) of plantar flexion slowly in five seconds. Figure 7 shows the captured shapes and the skeleton reconstructed from the predicted angles of the ankle joint. The red part of the shapes is used for predicting the muscle activities and the ankle angle. Although the shape of the foot and ankle is not included in the prediction, the proposed method is able to calculate the angles. The distribution of the muscle activity and the ankle angle is shown in Fig. 3.

The predicted results of two muscle activities and the ankle angle are shown in Figs. 9, 10 and 11. The red lines are the results measured by EMG sensors and the mocap system. The blue lines are the predicted results. 500 samples are captured in five seconds, and the average errors are 9.0% and 6.7% of the MVCs for the activities of Gastrocnemius and Soleus muscles, and 1.8\(^\circ \) for the ankle angle, respectively.
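An average error of this kind can be computed as sketched below. The text does not state which error measure was used, so the mean absolute error over the sequence is an assumption made for illustration.

```python
import numpy as np

def mean_absolute_error(predicted, measured):
    """Average absolute difference between predicted and measured values.

    For muscle activities normalized by the MVC, multiplying the result by
    100 gives the error as a percentage of the MVC.
    """
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(predicted - measured)))
```

For the ankle angle, the same function applied to angles in degrees gives the error in degrees.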

Fig. 9. The prediction result of the activity of Gastrocnemius muscle. (Color figure online)

Fig. 10. The prediction result of the activity of Soleus muscle. (Color figure online)

Fig. 11. The prediction result of the joint angle of the ankle. (Color figure online)

Fig. 12. The red regions indicate that the vertices have large weights for prediction. As for the Gastrocnemius muscle, the large weights exist around the upper part of the calf, where the bulge of the Gastrocnemius muscle is visible on the skin. The weights for the Soleus muscle are similar to those for the Gastrocnemius muscle, but there are large weights at the side of the lower limb, where the Soleus muscle is visible on the skin. The weights for predicting the ankle angle have large values on both the front and back sides of the lower limb. (Color figure online)

The weight vector \(\varvec{\omega }\) indicates how much each vertex affects the prediction of the muscle activities and the ankle angle. Figure 12 shows the magnitude of the weight for each vertex. The red regions have large weights. With regard to the Gastrocnemius muscle, the large weights exist around the upper part of the calf, where the bulge of the Gastrocnemius muscle is visible on the skin. The weights for the Soleus muscle are similar to those for the Gastrocnemius muscle, since both muscles are related to the motion of lowering the toe. However, there are large weights for the Soleus muscle at the side of the lower limb, where the Soleus muscle is visible on the skin. The weights for predicting the ankle angle have large values on both the front and back sides of the lower limb. This indicates that it is important to observe the whole lower limb to predict the ankle angle, which is reasonable because the ankle angle is affected by the muscles on both sides, mainly by the Gastrocnemius, Soleus and Tibialis anterior muscles.

The Gastrocnemius and Soleus muscles cooperate with respect to the ankle joint, and it is difficult to discriminate between them by inverse kinematics, inverse dynamics, or mathematical optimization. Since the proposed method observes the deformation of the individual muscles, the activity of the cooperating muscles can be uniquely estimated. This is one of the contributions that is meaningful from the viewpoint of biomechanics.

5 Conclusion

In this paper, we have proposed a method of predicting the muscle activity and joint angle of the human body from skin shape. Both factors need to be considered simultaneously, since both the muscle activity and the joint angle affect the skin shape. The proposed method is a learning-based approach that uses a data set of the skin shape, the muscle activity and the joint angle, and trains a linear regressor for predicting muscle activity and joint angle from skin shape. The active regions corresponding to the muscles are extracted from the training data, and the weight parameters for prediction are learned for the active regions. In this paper, we chose a simple situation of a lower limb in which the ankle moves one-dimensionally to lower the toe. The muscle activity and joint angle are successfully predicted even in the case where both factors change simultaneously. The learned weights are reasonable from the viewpoint of biomechanics, which indicates that the skin shape gives useful information for inferring the state of the muscle and joint.

The approach in this paper requires a training data set to learn the regressor. Since it is costly to obtain the data for many subjects, the next step is to study a scalable approach that can predict the state of muscle and joint for many people even if they are not included in the training data set. One of the promising approaches is prediction based on a biomechanical model, in which the muscle structure of each individual is estimated from the skin shape. A model-based approach is expected to reduce the cost of collecting individual data. Additionally, we plan to apply the proposed method to different parts of the body in future work.