1 Introduction

Three-dimensional (3D) quantitative evaluation of spine deformities is fundamental for the assessment of scoliosis surgery. Clinical parameters can be computed from 3D reconstructions from biplanar radiographs, helping the surgeons with patients’ follow-up and in assessing surgery’s clinical outcome [1]. Many difficulties arise from the scoliotic deformation present in pre-operative radiographs. The vertebrae are overlapped with soft tissues in the X-ray projection, and the vertebral axial rotation changes the vertebra appearance in radiographs. For postoperative radiographs, the presence of metallic hooks, rods and screws masking the spinal structures is challenging for reconstruction methods. Accurate clinical parameters extraction was addressed with semi-automatic methods [2, 3], however, they require non-negligible user supervision and training. For postoperative radiographs, the mean 3D reconstruction time was superior to 12 min [3] limiting their use in clinical workflows. Moreover, the occultation due to spine instrumentation degrades the reproducibility of results [3]. A fast semi-automatic method [4] has been proposed for decreasing user interaction time by using a statistical model of a landmarks-based vertebra representation. This representation, however, was too simple to determine the Cobb and kyphosis/lordosis clinical angles, since it did not describe the 3D orientations of the vertebral endplates.

An automatic method providing an estimate of a detailed 3D reconstruction from both pre-and post-operative biplanar radiographs would be highly beneficial. For pre-operative, a method for spine detection from 2D images using deep neural networks [5] was proposed. This method, however, does not allow to recover 3D clinical parameters and is not adapted to process postoperative radiographs. The method proposed by [6] detects the spine silhouette in both images to avoid the occultation problem due to the instrumentation. This method relies on a statistical model to recover the vertical position of vertebrae, making the estimation of vertebral endplates’ positions and orientations dependent on the statistical model and not on the images information.

In this paper, we propose a method for the automatic 3D spine reconstruction by combining two statistical priors for 3D/2D registration: (1) 3D global spine and vertebrae shape priors using a PCA-based statistical shape model (SSM), and (2) vertebra appearance priors using convolutional neural networks (CNN) and considering simultaneously biplanar image information. To prevent the registration process from diverging for postoperative radiographs, the metallic structures were detected and removed with an image in-painting method. Clinical parameters could then be automatically determined at pre-and post-surgery times. The method accuracy was assessed for vertebrae poses and clinical parameters using an expert-generated 3D reconstruction.

2 Automatic 3D Spine Reconstruction

A 3D/2D registration of vertebrae, based on CNN, is regularized via an SSM of the spine (Fig. 1.D). The SSM is constructed from a set of parameters representing the shape and pose of vertebrae. This process is applied to pre-operative radiographs (Fig. 1.A). To apply it to postoperative radiographs, the instrumentation is first detected and removed by an in-painting algorithm (Fig. 1.B and C).

Fig. 1.
figure 1

Illustration of the method process: 3D automatic reconstructions from pre- and post- in-painted biplanar radiographs provided the clinical parameters to assess the surgery result.

2.1 Statistical Spine Model

An enriched geometrical representation composed of ellipses, lines, spheres and points described the shapes of the vertebrae and the pelvis. Pedicles and endplates were modeled by 3D ellipses (Fig. 1.D). As previously proposed in shape + pose models [7], these primitives’ positions were embedded in a local referential frame attached to vertebrae. The vertebra pose was composed of a translation vector from a global origin located at the middle of the two femoral heads, and three orientation parameters. The SSM training comprised a set of 470 subjects whereas 1016 parameters described the spine model (going from cervical seven to sacral endplate). The principal component analysis (PCA) provided the following linear generative model: \( s = \bar{s} + Bm \), where \( m \) was a vector of 236 parameters in latent space (describing 99% of data variance), \( B \) was the PCA basis, \( \bar{s} \) was the data mean and \( s \) was the resulting vector describing the current spine model. We construct a linear system of equations (Eq. 1) to solve for the optimal model parameters given vertebral body center (VBC) target locations (t). We noted \( B_{Q} \) the sub-matrix of \( B \) determined by the subset {Q} of location coordinates imposed via the detected locations by CNN (cf. Sect. 2.2). The parameter \( \lambda \) adjusted the tradeoff between target fitting and the model smoothness, and was determined experimentally.

$$ \hat{m} = \left( {B_{Q}^{t} B_{Q} + \lambda I} \right)^{ - 1} B_{Q}^{t} \left( {t - \bar{s}_{\text{Q}} } \right) $$
(1)

The resulting spine model \( \hat{s} = \bar{s} + B\hat{m} \) provided the geometric primitives’ parameters used as deformation handles for an as rigid as possible 3D deformation [8] of a mesh generic model for detailed morpho-realistic 3D visualization of vertebra surfaces.

2.2 Vertebrae 3D/2D Registration Based on Convolutional Neural Networks

Three CNNs distributed in thoracic high (C7–T5), thoracic low (T6–T12), and lumbar (L1–L5) spine regions, were trained to predict the vertebral body center (VBC) image locations in a patch by regressing its relative displacement vector from the patch intensities [5, 9]. We studied the coefficient of determination \( r^{2} \) of three regression models and it was the CNN (\( r^{2} \) = 0.9) which provided the best results compared to multi-linear regression (\( r^{2} \) = 0.5) or DNN (\( r^{2} \) = 0.8). For training dataset generation, a set of patches were sampled randomly around the actual VBC. Due to the characteristics of the imaging device used [1], there is a correspondence for the vertical coordinate, therefore the patch vertical displacement sampling was the same for both views. To consider simultaneously both lateral and frontal radiograph information, we use a multi-channel CNN input to learn a joint model (Fig. 2.A). Using both views improves the proximo-distal mean error prediction of the VBC location with 1.4 ± 2.6 mm while it was 2 ± 3 mm when using only the frontal view. Using a patch size of [91 × 91] for each view, the input was thus a 3D tensor of size 2 × 91 × 91. The proposed CNN architecture contained three convolutional layers (with a kernel size of 5 × 5) followed by a dense layer of 500 units with rectified linear activation functions (Fig. 2). Once trained, the CNNs predicted the 2D displacements of the vertebral centers for each view. Using the epipolar geometry, the 3D stereo-corresponding point was computed as the intersection of the two lines of projections (Fig. 2.B) determined with image calibrations. This set of 3D positions t was then used to fit the SSM to patient data by inferring the shape and pose of each vertebra using the Eq. (1).

Fig. 2.
figure 2

Multi-channel input convolutional neural network for 3D landmark location’s prediction

The strategy for 3D/2D registration was as follows. We initially placed the mean SSM spine model by detecting the origin of the pelvis, the sacral endplate and the femoral heads. These landmarks were detected with a multiresolution approach initialized from a coarse pelvis origin detection [5]. Then, from the positions given by the mean spine model, an inner loop of CNN predictions moved iteratively the VBC points to their optimal positions as the prediction tended to a null displacement. The main outer registration loop began by registering the lumbar vertebrae (areas with fewer soft tissue organs where there is the maximum probability of image information). The proximal thoracic vertebrae came next, and finally, all vertebral levels were fixed in the statistical spine model in order to progressively converge to the optimal 3D reconstruction.

2.3 Postoperative Image In-Painting

The postoperative image in-painting was composed of two steps: the automatic segmentation of the instrumentation and its replacement with pertinent information. The metallic instrumentation composed of two rods and screws appeared as large connected regions in radiographs. Noting that the instrumentation was locally brighter than its surroundings, a gradient-based edge segmentation algorithm was adopted. First, a median filter was applied to reduce the image noise while preserving the edges. A Sobel filter was then applied (Fig. 3.b) allowing to select the edges of the instrumentation when gradient amplitude was superior to a threshold of 10%. Mathematical morphological opening operation removed the potential outliers. A contour filling process created a mask M1 (Fig. 3.c) containing the spinal instrumentation, however, some elements like bones were still present. In order to remove the outliers inside the contours, we computed a mask M2, representing the background, by thresholding all the pixels under 40% of the intensity range (Fig. 3.d). The final mask was computed with the logical expression: M = M1 ^ (¬M2) (Fig. 3.e).

Fig. 3.
figure 3

(A) Segmentation steps in frontal (top) and lateral (bottom) views: (a) original, (b) Sobel filter, (c) contour fill mask M1, (d) darkest zone mask M2, (e) final mask M; (B) Postoperative biplanar images with metallic spine instrumentation (f) and in-painted images (g).

Image in-painting consists in filling missing information in images [10]. We used a diffusion-based in-painting algorithm as suggested by a study of different in-painting methods in the context of postoperative radiographs [11]. The diffusion-based in-painting algorithm filled the missing area with a diffusion of pixel values, which is consistent with the surrounding area (Fig. 3.g). Its main advantage is the low computational complexity. The algorithm used the heat diffusion equation to solve for the pixels’ intensity propagation of the surrounding area into the mask.

3 Experiments

The experiments used a set of 38 operated scoliotic patients (mean age 17 ± 3 years old, mean pre-operative Cobb angle 45°) for whom pairs of pre- and post-operative biplanar X-rays were collected retrospectively with ethics committee approval. Each dataset came with a 3D spine reconstruction and associated clinical parameters. The 3D reconstructions, serving as ground truth to evaluate the proposed automatic method, were performed by an expert using a semi-automatic method [2]. The proposed method was implemented in a homemade C++/OpenGL software, using the compute shaders GLSL language for fast CNN prediction. The average time for spine reconstruction was 12 s. The postoperative radiograph in-painting took 40 s of image processing with a non-optimized Matlab® implementation.

3.1 Vertebrae Location and Orientation

The first evaluation was the quantification of the 3D accuracy of the spine reconstruction on pre- and post-operative X-ray images. A referential frame Rv attached to each vertebra (Fig. 4) was used to compare the vertebra poses between the proposed automatic method and the expert’s reconstructions. Table 1 presents absolute mean (SD) errors of vertebrae XYZ locations and lateral, sagittal, axial orientations for pre- and post-operative radiographs. Since the metallic rods and screws occlude the spine structures and disturb the 3D/2D registration for postoperative radiographs, we calculated the errors with and without the in-painted step to quantify the gain in precision of the pose recovering as brought by in-painting. The reconstruction method used by the expert has a confidence interval (CI) for vertebral pose accuracy that was assessed by an inter-operator reproducibility study from pre-operative radiographs [2]. For pre-operative automatic reconstruction, the mean errors were similar to CI ranges. Without any occlusion from instrumentation, the translation errors were inferior to 1.5 mm. Nevertheless, the scoliotic deformations induce a vertebral axial rotation, and the axial orientation error was the largest (4.5°). In terms of mean residual 3D distance from postoperative images, the semi-automatic method of [4] attains 2.19 mm, and for the thoracic-lumbar instrumented level the automatic method of [6] attains ~3 mm and the proposed method attains 3.6 mm. Figure 4 details the location errors by vertebral levels from postoperative radiographs with and without applying the in-painting step. The presence of metallic structures can be viewed as artifacts inducing outlier image features. In our case, the area with metallic objects gave a poor CNN prediction for landmark locations. When radiographs were in-painted, the high intensities were replaced by a homogeneous gray level distribution. The automatic location of vertebrae became clearly more accurate for the vertebral levels generally instrumented (T4–L2) as presented in Fig. 4. Thus, the in-painting step enhanced the CNN-based landmark detection and the 3D/2D registration convergence had been improved. However, the thoracic region was still difficult due to higher density of organ soft tissues.

Fig. 4.
figure 4

(Right) Comparison of XYZ translation absolute mean errors for post-operative (dotted lines) and post-operative in-painted images (solid lines) for vertebral levels C7 to L5 (the levels generally instrumented during surgery are framed in red).

Table 1. Lateral (L), sagittal (S) and axial (A) orientations (°) and locations (mm) errors.

3.2 Clinical Parameters

The second evaluation was the computation of clinical parameters to quantify the surgery correction. The pre- and post-operative parameters, and the differences (post – pre) representing the surgical correction, were calculated from the reference reconstructions and the proposed automatic reconstructions. The absolute mean errors (SD) of automatic reconstructions versus the reference reconstructions were also calculated to assess the accuracy of the method. Table 2 reports the clinical parameters’ values of the kyphosis (K) T1/T2, the lordosis (L) L1/L5, the height between C7 and S1 (H C1–S1), the main Cobb angle, and the apical vertebral axial rotation (AVR). In this table, we see for example that the mean Cobb angle was −45° for pre-surgery and decreases towards −15.8° for post-surgery. The mean Cobb angle surgery correction (difference post-pre) was 28° from reference reconstructions and 27.3° from the automatic reconstructions which showed similar mean difference. Similarly, the mean surgical correction for the patient growth (height C7–S1) was 29.1 mm and 29.8 mm from manual and automatic reconstructions respectively. The semi-automatic method used to generate the reference had the clinical parameters CI values for both pre- [2] and post- [3] operative (Table 2). This method requires time-consuming user interventions (10–20 min depending of the scoliosis severity) when all vertebral shapes are fully adjusted. In our case, the automatic clinical measurements from pre-operative radiograph revealed that the mean absolute errors were all inferior to 5.6° for angular parameters.

Table 2. Mean (SD) of pre- and post-operative and post minus pre differences (Diff) for clinical parameters from the semi-automatic reference reconstruction (Ref) and proposed automatic reconstruction (Auto). The last two columns reported the absolute mean (SD) errors of automatic parameters (Auto Error) and confidence intervals (CI 95%) from literature.

The 3D/2D registration of the spine model was based on the vertebra center displacements, thus the other vertebra anatomical characteristics were estimated statistically but still provided a relevant set of clinical parameters. To our knowledge, this is the first time that a solution for automatic vertebrae 3D reconstruction from biplanar X-rays provided the vertebral endplate orientations to compute the fundamental Cobbs and the kyphosis/lordosis angles. The method also provided the 3D parameter of the vertebral axial rotation with a limited mean error of 4.6° and 3.8° for pre- and post-operative respectively. For surgeons, this parameter serves to quantify the neutral rotation restoration of the apical vertebra (the most rotated) between pre- and post-exams.

Since the proposed spine model incorporated a detailed 3D surface, we plan to use an additional automatic local deformation for vertebrae fine-tuning that will improve the angular parameters accuracy, which will also be possible for postoperative radiographs thank to the image in-painting.

4 Conclusions

This paper presented a method for automatic spine 3D reconstruction combining a statistical shape model of the spine and a set of CNNs for 3D/2D registration. A diffusion-based in-painting method served to remove the surgical instrumentation in order to feed the CNN with visible information containing fewer perturbations. The proposed approach provided a fast-automatic 3D quantification of the fundamental parameters needed for scoliosis surgery assessment, and should considerably speed-up the spine radiographs analysis at the pre- and post-operative times.