1 Introduction

Statistical shape models have important applications is various medical imaging tasks, such as image segmentation (e.g. [8]), hypothesis testing (e.g. [3]), anatomical reconstruction (e.g. [1]), and pathology detection (e.g. [15]). The utility of such models is influenced by a shape representation that facilitates statistical analysis. In this regard, landmarks are a popular choice as a light-weight representation—compared to embedding shapes in the image intensity values at voxels—that is easy to understand and implement while promoting visual communication of the results [14]. To perform shape statistics, landmarks should be defined consistently within a given population to refer to the same anatomical position on every shape instance; a concept known as correspondence. Such correspondences are often created manually, but this is time-/labor-intensive, requiring qualified specialists (e.g. radiologists), and cost-prohibitive for 3D images and large collections of imaging data.

Ad-hoc automated methods, such as choosing nearest points between surfaces, are prone to mismatches and result in substandard analytical results. Alternatively, other methods try to match parameterizations of surfaces while maintaining regularity. For example, the spherical harmonics point distribution model (SPHARM-PDM) [16] is a parameterization-based correspondence scheme that relies on a smooth one-to-one mapping from each shape instance to the unit sphere. The mapping to a sphere as well as the a-priori reliance on smoothness, rather than shape features in the population, is a shortcoming. Other automated approaches have introduced the statistics of the population itself to drive the matching of shape features. For instance, minimum description length (MDL) [7] optimizes point correspondences using an information content objective, but it too relies on intermediate spherical surface parameterizations, which places limitations on the types of shapes and the optimization process. Forgoing the parameterization, the particle-based shape modeling (PSM) approach has offered a flexible nonparametric and general framework to establish dense point correspondences across shape ensembles without constraining surface topology or parameterization [4, 10]. The PSM approach formulates a trade-off between geometric description of each shape and the statistical compactness of the population, formulated in terms of entropy.

Current PSM implementations rely on a Gaussian models in the shape space, which is (approximately) the vector space formed by the x-y-z coordinates of the collection of correspondences (modulo a similarity transform). However, the distribution of anatomical structures can be far more complex than the Gaussian assumes, and surface locations are not always indicative of their correspondence. To address these shortcomings, previous work (in the context of brain imaging) has included sulcal depth [11] and brain connectivity [12] as additional features in the PSM optimization. While promising, such approaches are tailored to particular data set, anatomy and application. Datar et al. [6] proposed geodesic distances to user-identified landmarks on individual shapes and hence were able to effectively guide entropy-based optimization to a compact statistical model for complex shapes. However, this predefined mapping requires a careful selection of hand-crafted features and their success heavily rely on expertise (and time) required to find such features consistently.

In this paper, we propose an automated feature-learning approach for establishing dense correspondence models to alleviate the need for engineered shape descriptors or guiding landmarks. The idea is motivated by recent advances in computer vision and computer graphics that use deep convolutional neural networks (CNNs) to learn shape features for establishing pairwise correspondences between shapes (e.g. [2, 5]). In leveraging the feature-based PSM with deep-learned features, this paper makes several contributions including: (1) effectively learning features to establish shape correspondence using direct surface geometry information rather than relying on predefined surface descriptors, (2) incorporating the deep-learned features in the optimization of dense correspondences to automatically find a useful set of dense correspondences across complex shape ensembles, and (3) thereby relaxing the linear/Gaussian assumption of the current PSM approach while minimizing the required domain expertise by learning shape-specific characteristics in a fully automated fashion. Experiments on synthetic and real medical data demonstrate that the proposed deep feature-based PSM compares favorably with existing state-of-the-art particle-based shape modeling approaches.

Fig. 1.
figure 1

Geodesic patch extraction at a surface point \(\mathbf {x}\): (a), (b) Sample patches on a human scapula mesh. (c), (d) Finding a point (\(\mathbf {x}'\)) at a geodesic distance \(\rho \) at a direction which makes angle \(\theta \) with first principal curvature direction (\(\mathbf {u}\)) in the tangent space of \(\mathbf {x}\). (e) Two channels of an input to the CNN, representing geodesic patches where every pixel corresponds to the signed normal distance of a point \(\mathbf {x}'\) in the patch to the tangent plane at \(\mathbf {x}\).

2 Methodology

In this paper, we use deep convolutional neural networks (CNNs) to learn local shape features for establishing shape correspondence. The goal is to make some of the hidden layers of the network respond similarly at corresponding locations across the given shape ensemble, and dissimilarly at noncorresponding ones. In this regard, we use a Siamese network setting similar to [5], which consists of two identical copies of the same deep CNN and is fed with pairs of corresponding (positives) and noncorresponding (negatives) local surface descriptors. As input to the deep neural network, we extract local surface geometrical characteristics using circular geodesic patches in the neighborhood of given point locations on a shape’s surface (see Fig. 1). These patches are then used as the input layer to the Siamese network. After training, one copy of the two identical deep CNNs in the Siamese network becomes a nonlinear filter bank (we use the last convolutional layer, before the final classification, as in Fig. 2) that is computed on all mesh vertices to produce feature maps on given surface. These feature maps are then used to aid correspondence optimization in the entropy-based PSM. In what follows, we further expand on details of patch extraction, Siamese network configuration, and the modified correspondence objective function to obtain a useful set of dense correspondences across anatomical shapes.

Geodesic patch extraction: The success of deep CNNs have been demonstrated on analyzing functions defined on Euclidean grid-like domains such as images. Nonetheless, non-Euclidean data, particularly surface geometry, does not directly benefit from these deep models in which operations such as convolution and pooling are not readily defined. Recently, local charting methods (e.g. [2]) have been proposed as a generalization of convolution to non-Euclidean domains where a patch operator is defined to extract local surface patches that are subsequently associated with predefined shape descriptors. Here we rely directly on surface geometry to compute such local shape descriptors. Specifically, the graph of surface distance to the tangent space can encode local geometrical properties around a point. Hence, we propose to use signed normal distance to the tangent space sampled in the geodesic neighborhood of a surface point \(\mathbf {x}\) as a local snapshot of surface geometry. We use principal curvature directions to define the local intrinsic coordinate system of the patch constructed at a surface point \(\mathbf {x}\). As illustrated in Fig. 1, neighboring points that lie in the geodesic ball \(\mathcal {B}(\mathbf {x}) = \{\mathbf {x}' : d_X(\mathbf {x}, \mathbf {x}') \le \rho _o\} \) with radius \(\rho _o > 0\) are sampled on a circular geodesic patch where each ring corresponds to a particular geodesic distance \(d_X(\mathbf {x}, \mathbf {x}') = \rho \) with \(\rho \in [0, \rho _o]\) and every ray originating from \(\mathbf {x}\) is perpendicular to the geodesic rings are inclined at an angle \(\theta \in [0,2\pi )\) to the first principal curvature direction at \(\mathbf {x}\). \(\rho _o\) is automatically set to \(5\%\) of maximum shape diameter in the ensemble. Principal curvature directions are estimated using least squares fitting of a quadratic [13] while enforcing the right-hand on local coordinates, with the normal direction representing the z-axis. To ensure consistent and smooth principal curvature directions, meshes are generated by triangulating the DT’s isosurface using the dynamic particle-based implicit surface sampling [9] that yields water-tight triangular meshes. However, the principal curvature directions are accurate only up to the sign of the vector. To address this ambiguity, we extract two patches per surface point (see Fig. 1e), one with bases defined by the positive sign of principal directions and other by the negative sign and use them as a two channel image for input to the neural network.

Fig. 2.
figure 2

The Siamese network architecture used to learn features for correspondence and sample feature maps extracted using trained network(s).

Correspondence feature learning: Given sets of corresponding points on shapes in the ensemble, we train a Siamese CNN to learn a binary class problem, where positive class indicates correspondence between pair of input patches and negative class suggests that patches possess different shape characteristics. We use the network settings of Chopra et al. [5], see Fig. 2. During experiments we found that using \(p_{drop} > 0\) does not impact classification performance, therefore, we set it to zero for our experiments. All layers use the softplus activation function for nonlinearity and weights are initialized using a zero mean normal distribution with 0.05 standard deviation. Choice of softplus activation ensures smooth output features, crucial for correspondence optimization (see Fig. 2) and faster training convergence, being softer version of ReLU. We subtract mean of the training data from all training patches (and same mean is subtracted from testing patches for feature extraction) so that the input data can align with the distribution of initial network weights (spanning the hypersphere).

Deep feature-based PSM: The proposed PSM uses a set of dynamic particle systems, one for each shape, in which particles interact with one another with mutually repelling forces to sample (and therefore describe) the surface geometry. Consider a cohort of shapes \(\mathcal {S}= \{\mathbf {z}_1, \mathbf {z}_2, ..., \mathbf {z}_N\}\) containing N surfaces, each with its own set of M corresponding particles \(\mathbf {z}_n = [\mathbf {z}_n^1, \mathbf {z}_n^2, ..., \mathbf {z}_n^M] \in \mathbb {R}^{dM}\) such that \(\mathbf {z}_n^m \in \mathbb {R}^d\) and ordering implies correspondence across shapes. This representation involves two random variables: a shape space variable \(\mathbf {Z}\in \mathbb {R}^{dM}\) and a position variable \(\mathbf {X}_n \in \mathbb {R}^d\) that encodes particle distribution on the n–th shape (configuration space). For groupwise modeling, shapes in \(\mathcal {S}\) should share the same world coordinate system. We use generalized Procrustes alignment to estimate a rigid transformation matrix \(\mathbf {T}_n\) per shape instance such that \(\mathbf {z}_n^m = \mathbf {T}_n\mathbf {x}_n^m\). Correspondences are established by minimizing a combined shape correspondence and surface sampling cost function \(Q = H(\mathbf {Z}) - \sum _{n=1}^N H(\mathbf {X}_n)\), where H is an entropy estimation assuming Gaussian distribution on the joint vector of features and positions and a nonparametric density estimator in the (3D) configuration space, as in [4]. In particular, correspondence entropy solely relies on particle positions to achieve compact ensemble representation in shape-feature space (first term) against a uniform distribution of particles on each surface for accurate shape representation (second term). The entropy-based PSM formulation is generalized to a more flexible notion of correspondence by minimizing the entropy of arbitrary, vector-valued functions of the particle position [6, 11, 12]. Thus, the proposed feature-based PSM extension modifies only the correspondence term \(H(\mathbf {Z})\) where the sampling term, \(H(\mathbf {X}_n)\), still constrains the particles to lie on the shape’s surface by maximizing the positional entropy.

In this paper, we propose to use features extracted from the trained CNN to optimize correspondences on shape ensembles to generate compact statistical shape models. We also keep the particle positions to guide optimization for locating corresponding points as extracted features only encode local geometric properties and require positional information, unlike geodesic distances which encode global shape characteristics. Hence, \(q = d + L\) where L is the number of features extracted from the trained network. Hence, we have \(\mathbf {z}_n^m = f(\mathbf {x}_n^m) = \left[ f_d^1(\mathbf {x}_n^m), ..., f_d^L(\mathbf {x}_n^m), (\mathbf {T}_n \mathbf {x}_n^m)^T \right] \) where \(f_d^l(\mathbf {x}_n^m)\) is the \(l{-}\)th deep feature extracted at the \(m{-}\)th particle on the n–th shape, which is defined by \(\mathbf {z}_n = [\mathbf {z}_n^1, ..., \mathbf {z}_n^M]^T \in \mathbb {R}^{qM}\). Let \(\mathbf {y}_n = \mathbf {z}_n - \varvec{\mu }\) with the mean . The centered data matrix is defined as \(\mathbf {Y}= [\mathbf {y}_1, ... , \mathbf {y}_N] \in \mathbb {R}^{qM \times N}\). Since \(N \ll qM\), feature space covariance is estimated in the dual space of dimension N which defines the hyperplane in \(\mathbb {R}^{qM}\) that the N–samples inhabit. With a Gaussian assumption in the lifted feature space, shape space entropy estimate becomes , where \(\mathbf {I}\) is the identity matrix in the \(qM{-}\)dimensional feature space and \(\alpha > 0\) is a regularization parameter that prevents nondominant modes (smallest eigenvalues) from dominating the optimization process. Let \(\mathbf {C}= \left( \mathbf {Y}^T\mathbf {Y}+ \alpha \mathbf {I}\right) ^{-1} = [\mathbf {c}_1, ..., \mathbf {c}_N]\). By the chain rule, the partial derivative of the correspondence entropy term with respect to the data vector \(\mathbf {z}_n\) becomes \(\partial H(\mathbf {Z}) / \partial \mathbf {z}_n = \mathbf {J}_n^T \mathbf {c}_n\), where \(\mathbf {J}_n\) is the Jacobian of the functional data for the n–th shape that has the structure of a block diagonal matrix with \(M \times M\) blocks, each with diagonal block of \(q \times d\) sub-matrices of the function gradients at each particle, which is computed numerically by projecting the correspondence points on the surface. This gradient is used to update point correspondences in accordance with the optimization described in [4].

3 Results and Discussion

Fig. 3.
figure 3

(a), (c) Side view of coffee bean samples to highlight the structural complexity, (b) Samples from Bean2 dataset, (d) Samples from Bean4 dataset, (e) Samples from Scapula dataset, left column – controls, right column – patients.

Two synthetic datasets and a real dataset of medical shapes is used to demonstrate the ability of Deep feature-based PSM in generating optimal statistical models for complex shapes. Proposed method is compared with Positional PSM – Cates et al. [4] and Geodesics PSM – Datar et al. [6], using mean shape from the optimized statistical models and following quantitative measures: Scree plot: We plot the percentage of variance w.r.t. PCA (Principal Component Analysis) modes to demonstrate the model compactness. Generalization: Reconstruction error of the correspondence model for an unseen shape instance, using PCA model built from training samples, is evaluated in a leave-one-out fashion. We plot mean of point-wise physical distance between test samples and their reconstruction w.r.t. number of PCA modes (lower is better). Specificity: Ability of the model to generate plausible shapes, quantified as the Euclidean distance between a sampled shape (from PCA model built using all training samples) and its closest training sample based on \(l_2{-}\)norm (lower is better). Average over 1000 samples is used.

Fig. 4.
figure 4

Bean2 results, (a) Mean shape, (b) Scree plot, (c) Generalization (mm), (d) Specificity (mm), voxel size \(=\)1 mm.

Fig. 5.
figure 5

Bean4 results, (a) Mean positional-PSM (b) Mean proposed method (c) Scree plot (d) Generalization (mm) (e) Specificity(mm), voxel size \(=\) 1 mm.

Proof-of-concept: First synthetic dataset (Bean2) contains 30 shapes that represent a coffee bean with a spatially varying structure. Second dataset (Bean4) comprises of 23 samples of more complex coffee bean shape, with a group of closely located thin structures collectively varying in position (see Fig. 3). In order to generate training samples for the Siamese network, we use an optimized statistical shape model with 3072 points for Bean2, obtained using Geodesic-PSM. Patches from six shapes were randomly selected for every correspondence point to accumulate training data. Siamese network with \(L=10\) output features yield optimal classification performance of 0.92 AUC (area under the curve) of the ROC (Receiver Operating Characteristics) curve, resulting in \(90\%\) True Positive Ratio (TPR) at an expense of \(20\%\) False Positive Ratio (FPR). Given that there are multiple regions with similar shape characteristics, which may lead to higher FPR, therefore 20% is a relatively small penalty. Moreover, using position information in Deep feature-based PSM will help in reducing impact of false positives. All 10 features are used in PSM to generate a shape model with 4096 correspondence points. Figure 4 presents mean shape and quantitative evaluation of the correspondence model. Results indicate compactness of the statistical shape model and better generalization and specificity performance over Geodesics-based PSM.

Fig. 6.
figure 6

Scapula results using the proposed method (a) Mean shape, (b) Scree plot, (c) Generalization (mm), (d) Specificity (mm), voxel size \(=\) 0.5 mm.

Bean4 shapes have similar characteristics as Bean2, therefore, we use the same trained Bean2 network to extract features for Bean4 shapes. Figure 2 presents samples of same feature on the two datasets. Comparative results on Bean4 data, presented in Fig. 5, clearly highlight the superiority of proposed method over positional-PSM in compactness of the resulting statistical shape model as well as it outperforms in its ability to generalize over unseen data and to produce plausible shape samples.

Real data: 20 Scapula shapes (10 controls and 10 patients) were retrospectively chosen from a shoulder CT database of controls and patients with osseous Hill-Sachs lesions. Samples were rigidly aligned w.r.t. the glenoidal plane and a set of 16 anatomical landmarks were defined manually. Reconstructed scapula meshes were then clipped to the glenoid, acromion and coracoid to model the areas of high geometric curvature related to constraint of the humeral head, Fig. 3 illustrates shape samples. Geodesic-PSM based shape model with 2432 points was used to generate training data. Using \(L=5\) output features resulted in optimal classification performance of AUC \(=\) 0.80. Figure 6 showcases superior results using proposed method, especially in generalization and specificity. It is important to note that proposed method is able to achieve errors of about 2 voxels using dominant modes (5 modes) in contrast to minimum 3 voxels from other methods.