1 Introduction

In Europe there are 370,000–740,000 out-of-hospital cardiac arrests every year, with a survival rate as low as 7.6% [1]. Many are witnessed by a bystander who may not be skilled in cardiopulmonary resuscitation (CPR), so there is a need for guided assistance to ensure the provision of quality CPR. The importance of quality CPR has been confirmed in many publications [2,3,4].

Smartphone applications for communicating with the emergency unit and sending the GPS location already exist, for example the Hjelp 113-GPS app by the Norwegian Air Ambulance. Our group (Engan et al.) has earlier proposed an application for dispatcher communication which detects the compression rate [5]. Another important CPR quality metric is the compression depth, which is crucial for generating sufficient circulation [6]. Providing the dispatcher with depth information can therefore improve CPR quality and possibly save lives.

Previously, an accelerometer has been used to estimate the compression depth with the purpose of providing feedback in emergency or training situations [7,8,9]. This requires the smartphone to be held in the hand of the bystander or placed on the chest of the patient during CPR. Since it is very important to maintain the phone connection between the bystander and the dispatcher, we believe that placing the smartphone next to the patient and using the camera to perform the measurements is better suited for emergency situations. This ensures that the microphone and loudspeaker are not covered and that the phone connection is not interrupted by accidentally pressing a button. To our knowledge, no attempt has been made to estimate the compression depth from a smartphone camera with the intention of providing information to the dispatcher in an emergency situation. In this paper we investigate this problem and propose a system that uses the front camera of a smartphone to estimate the compression depth. Figure 1 gives an overview of the proposed system, which uses Accumulative Difference Images (ADIs) [10] for motion segmentation both to detect the bystander position in the frame and to estimate the compression depth. These steps are further explained in Sect. 3.

Fig. 1. Proposed system for detection of compression depth. Top: detecting bystander and regions of interest (ROIs). Bottom: detection of compression depth. (Color figure online)

2 Modelling of Scene

Modelling of the scene is necessary both to estimate the bystander’s position in world coordinates and to compensate for the camera angle and position relative to the bystander.

2.1 Image to World Coordinates

We can find a model for the connection between world coordinates and image coordinates by calibration of the camera. By using camera coordinates for the world points it is sufficient to use the internal camera matrix K. The radial distortion must also be found and compensated for. Then we have

$$\begin{aligned} {\lambda \begin{bmatrix}x_c\\y_c\\1\end{bmatrix}}=K P_0{\begin{bmatrix}x_{w}\\y_{w}\\z_{w}\\1\end{bmatrix}}={\begin{bmatrix}\alpha&0&x_0 \\0&\beta&y_0 \\0&0&1 \end{bmatrix}}{\begin{bmatrix}1&0&0&0 \\0&1&0&0 \\0&0&1&0 \end{bmatrix}{\begin{bmatrix}x_{w}\\y_{w}\\z_{w}\\1\end{bmatrix}}} \end{aligned}$$
(1)

where \(\lambda =z_w\), \(P_0\) is a projection matrix, \(\alpha \) and \(\beta \) are the focal lengths of the camera, and \(x_0\) and \(y_0\) are the principal point offsets in pixels. The distance \(z_w\) can be expressed as \(z_w = z_{w0} + \varDelta z\), where \(z_{w0}\) is the distance between shoulders and ground and \(\varDelta z\) is the compression depth in the z-direction. Expanding Eq. (1) for \(\varDelta z \ll z_{w0}\) gives the following two expressions, linearized to first order in \(\varDelta z / z_{w0}\):

$$\begin{aligned} (y_c-y_0)=\beta \frac{y_w}{z_w}=\beta \frac{y_w}{z_{w0} + \varDelta z}=\beta \frac{y_w}{z_{w0}} \frac{1}{1 + \frac{\varDelta z}{z_{w0}}} \approx \beta \frac{y_w}{z_{w0}} \left( 1-\frac{\varDelta z}{z_{w0}}\right) \end{aligned}$$
(2)
$$\begin{aligned} (x_c-x_0)=\alpha \frac{x_w}{z_w}=\alpha \frac{x_w}{z_{w0} + \varDelta z}=\alpha \frac{x_w}{z_{w0}} \frac{1}{1 + \frac{\varDelta z}{z_{w0}}} \approx \alpha \frac{x_w}{z_{w0}} \left( 1-\frac{\varDelta z}{z_{w0}}\right) \end{aligned}$$
(3)
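
The quality of the linear approximation in Eqs. (2) and (3) can be checked numerically. The following Python sketch compares the exact pinhole projection with its first-order approximation. The focal length, the lateral offset and the sign convention for \(\varDelta z\) (negative when the shoulders move towards the camera) are illustrative assumptions, not calibrated values.

```python
def project_exact(y_w, z_w0, dz, beta, y_0=0.0):
    """Exact projection from Eq. (2): y_c = y_0 + beta * y_w / (z_w0 + dz)."""
    return y_0 + beta * y_w / (z_w0 + dz)


def project_linear(y_w, z_w0, dz, beta, y_0=0.0):
    """First-order approximation from Eq. (2), intended for |dz| << z_w0."""
    return y_0 + beta * (y_w / z_w0) * (1.0 - dz / z_w0)


# Illustrative values only (assumptions, not taken from a calibration).
beta = 500.0   # focal length in pixels (assumed)
y_w = 200.0    # lateral offset of a shoulder point in mm (assumed)
z_w0 = 800.0   # shoulder-to-ground distance in mm
dz = -50.0     # 50 mm compression; negative sign is an assumed convention

exact = project_exact(y_w, z_w0, dz, beta)
linear = project_linear(y_w, z_w0, dz, beta)
rel_err = abs(exact - linear) / abs(exact)
```

For these expressions the relative error of the approximation equals \((\varDelta z / z_{w0})^2\), which is about 0.4 % for a 50 mm compression at \(z_{w0}=800\) mm.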
Fig. 2. Model of scene. The ellipsoids in positions 1, 2 and 3 illustrate the shoulder positions when compressing 50 mm. L1 and L2 illustrate the blind spot problem as a consequence of the different motions. p.A, p.B, p.C and p.D show the possible camera positions for detections. The pink box shows the observed motion bands in the camera positions p.A, p.B and p.C. (Color figure online)

Fig. 3. Enlargement model for moving objects. The x-axis shows the observed size of the 45 mm square object in pixels and the y-axis shows the distance between the object and the camera. Enlargement in % for an object approaching 50 mm at 800 and 600 mm is marked.

Figure 2 shows a model of the scene. Ellipsoids 1, 2 and 3 illustrate the shoulder positions of the bystander. For illustration purposes, ellipsoids 2 and 3 are scaled relative to ellipsoid 1 according to the camera enlargement model for approaching objects. p.A, p.B, p.C and p.D are camera positions along the positive y-axis, where position p.D defines the limit for camera positions where the bystander’s shoulders are visible in the camera’s field of view (FOV). This limit is a function of the distance between ground and shoulders along the z-axis and is given by \(\frac{z_{w0}}{2}\). L1 and L2 represent motion vectors for the observed object enlargement in the image frame due to compression motions. The pink box is a zoomed-in area of C illustrating the observed motion band in different camera positions.

The position of the ellipsoid marked as 1 illustrates the bystander’s starting position, and 2 illustrates the new position if the compression motion is strictly in the z-direction and the compression depth, \(\varDelta z\), is 50 mm. The enlargement for approaching objects for different \(z_{w0}\) is found from Eqs. (2) and (3) and is illustrated using a 45 mm approaching object in Fig. 3. Since our method for detecting motion only captures changes in the contour of the bystander, a movement from shoulder position 1 to 2 observed by a camera positioned where L1 meets the ground line would be represented by the same values for \(x_c\) and \(y_c\). Thus, we would not be able to detect the change in the generated ADI. This position is further referred to as the blind spot and must be taken into account.

As shown in Fig. 2, a camera position where L1 meets the ground line is not possible, since the camera would be placed underneath the patient’s shoulder. Camera positions p.A, p.B and p.C should therefore have no problem avoiding the blind spot. Positions with a y-value greater than p.C need to be avoided, since the bystander’s shoulders are no longer guaranteed to be part of the image frame. If the compression motion were strictly in the z-direction, the detected motion band should increase for each displacement along the positive y-axis. This is not the case: the compression motion varies, but typically has a slightly positive component along the y-axis, illustrated by the red ellipsoid at position 3, where line L2 indicates an approximation to a typical motion vector. This causes the blind spot line to move to the other side of the indicated camera positions p.A, p.B and p.C. As a consequence, the detected motion band will shrink instead of increase as the camera is placed further along the positive y-axis. Since the y-value for \(L2>p.D\), the blind spot is not a problem; this is also true for a smaller bystander with \(z_{w0}<800\) mm. Equations (1) and (2), as well as Fig. 3, show that the linear model will change with \(z_{w0}\), which is bystander and patient dependent (length of arms, size of torso).

2.2 Camera Angle Model

The camera angle problem is illustrated in the zoomed-in area of circle C in Fig. 2 (pink box). Although the distance from the camera to the shoulders changes relatively little between positions p.A, p.B and p.C, the displacements cause large variations in the observed motion band. Since the compression movements will have small variations, the compensating model for displacement in the y-direction is estimated by observing detected motion bands in given positions and at given compression depths. As the red, green and blue lines in the pink box show, this reduction of the detected motion band is approximately linear, which was also the case when studying the different detection results. The compensating model for the displacement in the y-direction in the area between positions p.A and p.C is estimated to be:

$$\begin{aligned} ang_{corr} = 1 + 0.0026(act_{pos} - p.A) \end{aligned}$$
(4)

where \(ang_{corr}\) is the compensating factor for displacement along the positive y-axis and \(act_{pos}\) is the calculated position on the y-axis, based on the image-to-world conversion from Eq. (2). The model implies that a displacement from position p.A to p.C would mean a 26 % decrease in the detected motion band. If the camera is positioned closer to the patient than position p.A, the observed motion band would increase and the model would scale down the detections. This is not an issue here, since the optimal position p.A is next to the patient.
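
Equation (4) can be written as a one-line function. In the sketch below, the function and argument names are hypothetical, and positions are assumed to be given in millimetres along the y-axis, with 100 mm between p.A and p.C as in the recordings of D2.

```python
def angle_correction(act_pos_mm, p_a_mm):
    """Eq. (4): compensating factor for camera displacement along the y-axis.

    act_pos_mm: estimated camera position on the y-axis (mm, assumed unit)
    p_a_mm:     y-coordinate of the optimal position p.A (mm, assumed unit)
    """
    return 1.0 + 0.0026 * (act_pos_mm - p_a_mm)


# At p.A no correction is applied; 100 mm further out the factor is 1.26.
factor_at_pa = angle_correction(0.0, 0.0)
factor_at_pc = angle_correction(100.0, 0.0)
```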

3 Proposed System

In Fig. 1 the system for detection of compression depth is shown step by step. The figure is divided into two main sections: detection of bystander and regions of interest (ROIs) (top), and detection of compression depth (bottom). ADIs [10] are used to carry out both sections. ADI is a well-known method for motion segmentation and has earlier been used in many applications, such as object tracking [11], vehicle surveillance systems [12] and smoke detection [13].

3.1 Detection of Bystander by Motion Segmentation

In the following, let f indicate an \(N \times K\) video frame, where N is the number of rows and K is the number of columns, and f(r, c, k) corresponds to row r and column c in frame number k.

From experiments we found that using three subsequent frames from the middle section of the sequences was enough to generate an ADI that revealed the position of the bystander. Spatial de-noising is done by Gaussian smoothing and the images are corrected for lens distortion [14] prior to ADI generation. The ADI is initialized by generating an \(N \times K\) sized frame of zeros. The first of the three frames, \(k_0\), is the reference frame and the ADI, A(r, c), is found as:

$$\begin{aligned} A_k(r,c) = {\left\{ \begin{array}{ll} A_{k-1}(r,c)+ 1 &{} \text {if } \text {}|f(r,c,k_0)-f(r,c,k_0+i)| > T \\ A_{k-1}(r,c) &{}\text {otherwise} \end{array}\right. } \end{aligned}$$
(5)

where T is a threshold value and i is an index for the subsequent frames. Since two frames are compared with the reference frame, the resulting ADI used in detection of the bystander will consist of values from 0 to 2.
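
The accumulation in Eq. (5) can be sketched as follows. This is a minimal illustration, assuming grayscale frames stored as nested Python lists that are already smoothed and undistorted; the function name is hypothetical.

```python
def accumulate_adi(frames, threshold):
    """Accumulative difference image following Eq. (5).

    frames:    list of equally sized 2-D lists (grayscale intensities);
               frames[0] is the reference frame k_0
    threshold: the threshold T
    """
    ref = frames[0]
    n_rows, n_cols = len(ref), len(ref[0])
    adi = [[0] * n_cols for _ in range(n_rows)]  # initialised to zeros
    for frame in frames[1:]:
        for r in range(n_rows):
            for c in range(n_cols):
                # Count every frame that differs from the reference by more than T.
                if abs(ref[r][c] - frame[r][c]) > threshold:
                    adi[r][c] += 1
    return adi


# Three tiny 2x2 frames: the ADI values range from 0 to 2.
example_frames = [
    [[0, 0], [0, 0]],
    [[100, 0], [0, 0]],
    [[100, 100], [0, 0]],
]
example_adi = accumulate_adi(example_frames, threshold=50)
```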

The generated absolute ADI is further correlated with templates to find the position of the bystander. This is illustrated in Fig. 1(1B) and Fig. 1(1C). The templates used are scaled and resized versions of a template of a person’s head and shoulder contour created from an example sequence. To avoid higher correlation caused by thicker lines when the scale factor is above 1, a morphological skeletonization or thinning [15] of the scaled template is performed. The template position of the best match indicates the position of the bystander.
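
The template search can be illustrated with an exhaustive, unnormalized cross-correlation at a single scale. This is a simplified sketch: the use of a plain (unnormalized) correlation score and the function names are assumptions, and the scaling and thinning of the templates are omitted.

```python
def correlate_at(adi, template, r0, c0):
    """Sum of element-wise products with the template's top-left corner at (r0, c0)."""
    return sum(
        adi[r0 + i][c0 + j] * template[i][j]
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def best_match(adi, template):
    """Return ((row, col), score) of the highest correlation score."""
    t_rows, t_cols = len(template), len(template[0])
    best_pos, best_score = None, float("-inf")
    for r0 in range(len(adi) - t_rows + 1):
        for c0 in range(len(adi[0]) - t_cols + 1):
            score = correlate_at(adi, template, r0, c0)
            if score > best_score:
                best_pos, best_score = (r0, c0), score
    return best_pos, best_score


# Toy ADI containing one copy of the template at row 1, column 2.
toy_adi = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
toy_template = [[1, 1], [1, 0]]
match_pos, match_score = best_match(toy_adi, toy_template)
```

With several scaled templates, the same search would be repeated per template and the overall best score kept.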

3.2 Position Compensation

In the detection of compression depth, the information from the motion band in the shoulder areas is used. The desired camera position is when the bystander is centred in the image frame and the camera is placed close to the patient’s arm. If the camera is positioned elsewhere, compensation is needed. To compensate for position, the bystander’s shoulder points have to be detected. Starting from the first column, \(c_0\), of the template match square marked \(T_{size}\) in Fig. 1(1C), the columns for the detection center points are found as follows:

$$\begin{aligned} c1=c_0+(\frac{1}{6} \cdot K1), \qquad c2=c_0+(\frac{5}{6} \cdot K1) \end{aligned}$$
(6)

where K1 indicates the number of columns (width) of the matched template. Further the row number where the motion band starts is found by:

$$\begin{aligned} r_{i}=\min \{ r : A(r,c_i)\geqslant 1 \} \end{aligned}$$
(7)

where \(i=1,2\) indicates the two ROIs and r the row elements in the column \(c_i\). Together with c1 and c2 these rows define the detection center points \(p_1(c1,r1)\) and \(p_2(c2,r2)\). The points are marked with a red circle in Fig. 1(1C). \(p_1(c1,r1)\) and \(p_2(c2,r2)\) are then converted from image to world coordinates, \(w_1(x,y)\) and \(w_2(x,y)\) by solving Eqs. (2) and (3) for \(w_1(x,y)\) and \(w_2(x,y)\). The actual distance, \(d_{act, i}\), between the bystander and the camera is found by:

$$\begin{aligned} d_{act, i}=\sqrt{w_i(x)^2 + w_i(y)^2 + z_{w0} ^2} \end{aligned}$$
(8)

for \(i = 1,2\), which represent the two detection points, and where \(z_{w0}\) is illustrated in Fig. 2. The scaling factor for actual distance, \(dist_{corr,i}\), for each detection point is found by:

$$\begin{aligned} dist_{corr,i}=\frac{d_{act, i}}{z_{w0}} \end{aligned}$$
(9)

Further the compensating factor, \(ang_{corr}\), for the camera angle is found by using the model given in Eq. (4). The same compensating factor is used for both \(p_1(c1,r1)\) and \(p_2(c2,r2)\) since these points lie approximately on the same horizontal line in the image frame.
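
The position compensation in Eqs. (6)–(9) can be sketched as follows. Several details are assumptions made for illustration: column indices are rounded to integers, image columns map to \(x_c\) and rows to \(y_c\), the image-to-world conversion uses Eqs. (2) and (3) with \(\varDelta z = 0\), and all function names and calibration values are hypothetical.

```python
import math


def shoulder_columns(c0, k1):
    """Eq. (6): columns of the two detection centre points (rounded, assumed)."""
    return c0 + round(k1 / 6), c0 + round(5 * k1 / 6)


def first_motion_row(adi, col):
    """Eq. (7): smallest row index with A(r, col) >= 1, or None if no motion."""
    for r, row in enumerate(adi):
        if row[col] >= 1:
            return r
    return None


def image_to_world(col, row, z_w0, alpha, beta, x0, y0):
    """Invert Eqs. (2) and (3) with dz = 0 (assumed) to get (x_w, y_w) in mm."""
    return (col - x0) * z_w0 / alpha, (row - y0) * z_w0 / beta


def distance_correction(x_w, y_w, z_w0):
    """Eqs. (8) and (9): actual camera-to-shoulder distance divided by z_w0."""
    return math.sqrt(x_w ** 2 + y_w ** 2 + z_w0 ** 2) / z_w0


# Illustrative values: 60-column template starting at column 100,
# assumed focal lengths of 500 px and principal point (320, 240).
c1, c2 = shoulder_columns(100, 60)
x_w, y_w = image_to_world(420, 240, 800.0, 500.0, 500.0, 320.0, 240.0)
dist_corr = distance_correction(x_w, y_w, 800.0)
```

A point directly above the camera gives a factor of 1, and the factor grows as the detection point moves away from the optical axis.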

3.3 Detection of Compression Depth

For the dispatcher-bystander communication to be efficient, the dispatcher should address one problem at a time. The compression rate should therefore first be guided to the desired range (100–120 cpm). Detection of compression rate is described in [5]. Knowing that the compression rate is in the desired range also makes the compression motion more predictable and, in turn, the compression depth estimation less complicated.

The steps in detection of compression depth are shown in Fig. 1(2), and the compression depth is estimated every half second. Consider a video stream with 30 fps, providing \(\frac{30}{2}=15\) non-overlapping video frames in each compression depth estimation, \(I(r,c,l_s)\), where l is the estimation number and s is an index for the image number in this estimation. First, the images are spatially de-noised by Gaussian smoothing and corrected for lens distortion. Then \(I(r,c,l_1)\) is used as the reference frame and the other 14 frames are used to generate an ADI as shown in Eq. (5) and in Fig. 1(2A). For each new estimation, the ADI is first reset to zero.
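
Splitting the stream into non-overlapping half-second windows can be sketched as below. The generator name is hypothetical, and dropping an incomplete final window is an assumption.

```python
def estimation_windows(frames, fps=30, period_s=0.5):
    """Yield non-overlapping groups of frames, one group per depth estimation.

    With 30 fps and a 0.5 s period, each group holds 15 frames; the first
    frame of each group serves as the ADI reference frame.
    """
    size = int(fps * period_s)
    for start in range(0, len(frames) - size + 1, size):
        yield frames[start:start + size]


# 40 placeholder "frames" give two complete windows of 15 frames each.
windows = list(estimation_windows(list(range(40))))
```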

A reasonable width for the ROIs is found to be \(M_{ROI}=21\) columns when using an image frame size of \(N \times K =480 \times 640\). The vertical motion band along the head/arms is then avoided, but we still use enough columns to get a good average measurement of the motion band. An example is shown in Fig. 1(2B), where the ROIs are marked in red. Motion band vectors, \(m_{band, i}\), for the motion band size in columns, j, in the ROIs \(i=1,2\) are found by:

$$\begin{aligned} m_{band, i}(o)= \sum _{q=1}^ {N}{A(q,j)>1} \end{aligned}$$
(10)

where o is a vector index for the columns j used in ROI i, q represents the row number, and the comparison contributes 1 to the sum where it is true and 0 otherwise.

Furthermore, the mean of each vector is multiplied by its two compensating factors, for position in the image frame and for camera angle, providing the corrected pixel size of the motion bands, \(m_{mean, i}\):

$$\begin{aligned} m_{mean, i}= \frac{1}{M_{ROI}}\sum _{o=1}^ {M_{ROI}}{m_{band, i}(o)\cdot dist_{corr,i} \cdot ang_{corr}} \end{aligned}$$
(11)

which is used to find the combined detected motion band, \(m_{tot}\), for estimation l:

$$\begin{aligned} m_{tot}(l) =\frac{1}{2}(m_{mean, 1}+m_{mean, 2}) \end{aligned}$$
(12)
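
Equations (10)–(12) can be sketched as follows. Centring the 21-column ROI on the detection column \(c_i\) is an assumption, as are the function names; the ADI is a nested list as produced by Eq. (5).

```python
def roi_columns(center_col, width=21):
    """Columns of an ROI of the given odd width, centred on center_col (assumed)."""
    half = width // 2
    return list(range(center_col - half, center_col + half + 1))


def motion_band_per_column(adi, columns):
    """Eq. (10): for each column, count the rows where the ADI value exceeds 1."""
    return [sum(1 for row in adi if row[c] > 1) for c in columns]


def corrected_band_mean(band, dist_corr, ang_corr):
    """Eq. (11): mean motion band size scaled by both compensating factors."""
    return sum(b * dist_corr * ang_corr for b in band) / len(band)


def combined_band(m_mean_1, m_mean_2):
    """Eq. (12): average of the two shoulder ROIs."""
    return 0.5 * (m_mean_1 + m_mean_2)


# Toy 3x3 ADI: column counts of values > 1 are 1, 3 and 2.
toy = [[0, 2, 3], [2, 2, 0], [1, 5, 2]]
band = motion_band_per_column(toy, [0, 1, 2])
mean_band = corrected_band_mean(band, dist_corr=1.0, ang_corr=1.0)
```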

The last step is to filter the detections with a three-coefficient weighted FIR filter to remove some of the noise caused by random movements of the bystander. The filter was selected by experimenting with different filter orders and coefficient values to best suppress rapid changes without losing important information about changes in compression depth. \(CD_{det}(l)\) represents the compression depth detection for estimation l and is found by:

$$\begin{aligned} CD_{det}(l)=0.3\cdot m_{tot}(l)+0.35\cdot m_{tot}(l-1)+0.35\cdot m_{tot}(l-2) \end{aligned}$$
(13)
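
The filter in Eq. (13) can be sketched as below. How the first two estimations are handled is not specified by the equation; starting the output at the third estimation is an assumption, and the function name is hypothetical.

```python
def fir_filter(m_tot, coeffs=(0.3, 0.35, 0.35)):
    """Eq. (13): weighted sum of the current and the two previous estimations.

    Returns one value per estimation l >= 2 (0-based), so the output is two
    elements shorter than the input (assumed start-up handling).
    """
    b0, b1, b2 = coeffs
    return [
        b0 * m_tot[l] + b1 * m_tot[l - 1] + b2 * m_tot[l - 2]
        for l in range(2, len(m_tot))
    ]


# A single outlier of 10 pixels is attenuated to 3.0, 3.5 and 3.5.
smoothed = fir_filter([0.0, 0.0, 10.0, 0.0, 0.0])
```

The coefficients sum to 1, so a constant motion band passes through unchanged.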

4 Experiments and Datasets

All compressions are performed on a Resusci Anne QCPR manikin by the same bystander with \(z_{w0}=800\) mm. Resusci Anne QCPR measures, among other things, the compression depth with an accuracy of ±15 %, and these data are used as reference data in development and verification testing of the proposed system. The smartphone used for the recordings is an Xperia Z5 Compact (Sony, Japan).

The results are presented as Average error, \(\mu _{E} =\frac{1}{L}\sum _{l=1}^{L} | CD _{det}(l)- CD _{true}(l)|\), where L is the number of estimations and \(CD_{true}(l)\) is the reference signal, and as Performance, P, defined as the percentage of the time where \(| CD _{det}(l)- CD _{true}(l)|<10\) mm. According to the European Resuscitation Council Guidelines 2015 [16], 50–60 mm is the appropriate compression depth. A study by Stiell et al. [6] found that compression depths in the interval 40.3 to 55.3 mm provided the maximum survival rate, with the peak at 45.6 mm. Thus, the limit for accepted detection depths when calculating P is here chosen to be \(\pm 10\) mm.
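
Both metrics follow directly from their definitions. The sketch below assumes that the detections and the reference signal are equally long lists in millimetres; the function names and the example values are illustrative only.

```python
def average_error(cd_det, cd_true):
    """Mean absolute difference between detected and reference depth (mm)."""
    return sum(abs(d - t) for d, t in zip(cd_det, cd_true)) / len(cd_true)


def performance(cd_det, cd_true, tol_mm=10.0):
    """Percentage of estimations with an absolute error below tol_mm."""
    hits = sum(1 for d, t in zip(cd_det, cd_true) if abs(d - t) < tol_mm)
    return 100.0 * hits / len(cd_true)


# Made-up example: absolute errors are 0, 5, 15 and 15 mm.
det = [50.0, 40.0, 65.0, 30.0]
ref = [50.0, 45.0, 50.0, 45.0]
mu_e = average_error(det, ref)
p = performance(det, ref)
```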

Each test starts with a target compression depth of approximately 20 mm, and the target depth is gradually increased to 60 mm (the maximum compression depth of the Resusci Anne QCPR manikin) during the 80–90 s recordings. The compression rate is in the desired range (100–120 cpm) for all tests. The detection of the bystander and the corresponding shoulder areas is performed once, and thereafter used throughout the sequence. Two different ways of finding the bystander’s position are used: completely automatically, using the method described in Sect. 3.1, and manually, by visual inspection.

The camera is calibrated with the procedure described in [14], which is based on [17, 18]. The threshold used in generation of ADI is set to 50 and in the preprocessing of the images a Gaussian filter mask of size \(N=13\) with \(\sigma =3\) is used to reduce noise.
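
A one-dimensional Gaussian kernel with the stated size and \(\sigma\) can be generated as follows. Normalizing the kernel to unit sum and applying it separably (along rows, then along columns) are assumptions of this sketch, and the function name is hypothetical.

```python
import math


def gaussian_kernel(size=13, sigma=3.0):
    """Return a normalised 1-D Gaussian kernel of odd length `size`."""
    half = size // 2
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


kernel = gaussian_kernel()
```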

Modelling Experiment, Dataset 1

Equation (2) provides a theoretical conversion between pixels and mm. An experiment has been carried out to design a model for this conversion, since a person performing compressions has larger movements than the actual compression depth itself. Dataset 1, D1, consists of 6 recordings, where for each recording the phone is picked up and replaced at a point somewhere near the target of the optimal phone placement. The linear regression model for converting motion band in pixels to compression depth in mm is found to be:

$$\begin{aligned} CD_{conv}(l)= 2.7285 \cdot CD_{det}(l) - 13.9692 \end{aligned}$$
(14)

The data spread for D1 and the linear conversion model are shown in Fig. 4a.
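
The conversion in Eq. (14), together with an ordinary least-squares line fit of the kind that could produce such a model, can be sketched as follows. The fitting routine is a generic closed-form regression shown for illustration, not necessarily the procedure used for D1, and the function names are hypothetical.

```python
def pixels_to_mm(cd_det_px):
    """Eq. (14): convert a filtered motion band in pixels to depth in mm."""
    return 2.7285 * cd_det_px - 13.9692


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fitting points that lie exactly on the model recovers its coefficients.
xs = [10.0, 20.0, 30.0]
ys = [pixels_to_mm(x) for x in xs]
slope, intercept = fit_line(xs, ys)
```

With this model, a detected motion band of 20 pixels corresponds to a compression depth of about 40.6 mm.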

Fig. 4. (a): The spread of D1 and the connection between detected motion band in pixels and the actual compression depth at that time. The linear regression model is shown in purple. Different colors correspond to different recordings. (b): Scene for recording D2. The triangular system of black X’s marks the phone position for each recording. (Color figure online)

Verification Test, Dataset 2

Dataset 2, D2, consists of 9 recordings, each with the phone placed at a different position, marked with a black X in Fig. 4b. If we define the desired position as (0, p), where p represents position p.A in Fig. 2, these positions correspond to (–100, p), (–50, p), (0, p), (50, p), (100, p), (–50, p+50), (0, p+50), (50, p+50) and (0, p+100). The values of the coordinates are given in millimetres. As shown on the smartphone in the figure, the (0, p+100) position is close to the limit of where the shoulders are included in the image frame, and is therefore the furthest distance from the bystander used in the recordings of D2. The y-coordinates chosen for the D2 positions correspond to positions p.A, p.B and p.C in Fig. 2.

Fig. 5. Results for the verification test, arranged in the same triangular form as seen in Fig. 4b. Blue graphs represent the reference data, orange the results with automatic detection of the bystander’s shoulders and red the results with manual detection of the bystander’s shoulders. The x-axis shows the estimation number (one estimation every 0.5 s) and the y-axis shows the depth in millimetres. (Color figure online)

Table 1. Detection results for the verification test performed on D2. Results are given as Average error, \(\mu _E\), with \(\sigma \) given in parentheses, and Performance, P. Columns to the left: automatic detection of the bystander’s shoulder points. Columns to the right: manual detection of the bystander’s shoulder points.

5 Results and Discussion

Table 1 shows the results of the proposed system, where the model found from D1 is tested on D2. Automatic detection of the bystander gives poor results for position 2 and partly for position 1. By manually choosing the ROIs we get better results for positions 1–4, but poorer results for positions 5–9. The standard deviations, given in parentheses, reveal little or no significant difference between the two methods for each position. Figure 5 shows the results for each of the 9 positions in D2, arranged in the same triangular form as in Fig. 4b. The reference data are shown in blue, the results with automatic bystander detection in orange and the results with manual bystander detection in red. It can clearly be seen that the detection points chosen by the automatic detection of the bystander’s shoulder points for position 2 give poor detection results. The overall results indicate that, as a consequence of determining the ROIs only once, we might not have found suitable ROIs for the whole sequence, and that the detection results depend largely on the detection points chosen.

6 Conclusion and Future Work

The proposed system shows promising results for detection of compression depth by the use of a smartphone camera under the circumstances investigated in this paper. Although all tests are performed by only a single bystander with known distance between ground and shoulders, the model could be adapted for different distances.

In future work we will test the system with different bystanders of known size/arm length, and we will estimate the distance to the bystander when it is unknown. The latter is expected to be challenging, since a small bystander would appear similar to a big bystander further away.

Since the system is planned to be a part of an existing application for dispatcher feedback [5], the user could possibly type in some user information (height, weight, age) when downloading and installing the app. This information would not only be useful for estimating distance, but would also be relevant for the dispatcher. The system must also be able to track the bystander and to update the ROIs every 5 s or so during detection. Templates used to detect the bystander can here be developed from previously analysed ADIs. It could also be useful to use more of the information in the detected motion band when deciding the compression depth.