1 Introduction

Image segmentation is a fundamental problem in computer vision, and has been widely used in medical analysis, remote sensing and video surveillance, etc. Among various applications, active contours model is popular in image segmentation. Active contours with edge-detection [4, 13, 14] and region information [6, 16, 17] are two classical methods for image segmentation by minimizing certain energies to match the object boundary. In real applications, the performance of active contour method is unsatisfactory because of the misleading or missing feature in images, therefore, researchers have developed many ways to model the prior of shapes and use the shape prior to restrict evolvement of contours.

The addition of shape prior information has significantly improve object segmentation results. Chan and Zhu [5] inspired by the work of [8] introduces a labelling function which together with the level set function for object segmentation. In [19], shape prior is used to simultaneously segment multiple, possible overlapping objects. In addition to single shape prior, statistical information of given shapes could be used for accurate segmentation results. In [1], using active contour method, a geometric shape prior and the Mumford-Shah functional as complementary terms to segment an object. Jiang et al. [12] incorporated the generic knowledge of object and object boundary information into salient object segmentation. Recently, leaning methods are the attractive topic because of the excellent ability to cope with image noise and clutter. [7] use deep Boltzmann machine to learn shape priors from a training set of shapes. [20] employed the sparse shape composition to represent object shapes and proposed an effective shape prior modeling for segmentation.

An unsupervised approach of shape prior modeling [23] was proposed, which used the group similarity of shapes to segment objects from multiple images and utilized low-rank to restrain the evolution of contour. However, the method could not perform well when the contours of similar objects in multiple images are transformed with large rotation or scaling. In other words, the different pose of objects contour would limit the performance of low-rank.

It is reasonable to assume that the vectorized object shapes in multiple images form a low-rank matrix. However, similar shapes without registration will increase the rank of matrix, therefore it would lead a poor constraint on group similarity during the curve evolution. In order to enforce low-rank on similar shapes further, a shape registration is introduced for collaborative segmentation. The contributions of this paper are as follows:

  1. 1.

    A generic framework for joint object segmentation and shape registration is developed. By minimizing the proposed energy functional, segmentation and registration procedures are carried out simultaneously.

  2. 2.

    Circulant structure is adopted to shape registration, and it could be effectively solved by using Fast Fourier Transform (FFT).

  3. 3.

    Note that the matrix which contains vectors of similar shapes would be low-rank. Collaborative segmentation in multiple images is well performed by using SVD based low-rank matrix approximation, and it is robust to various disturbing factors such as noises, partial occlusions, missing information, etc.

The rest of the paper is organized as follows: Sect. 2 introduces the basic theory of our method and describes the algorithm to solve our method. Section 3 demonstrates experimental results. Finally, Sect. 4 gives some conclusions.

2 The Proposed Method

2.1 Necessity of Registration

Given a set of shapes \({{C}_{1}}\cdots ,{{C}_{n}}\), each shape could be represented by using a set of land-marks, \(C_i=[x_{i,1}, \cdots , \) \(x_{i,p}, y_{i,1},\cdots ,y_{i,p}]^T\), where \((x_{i,j}, y_{i,j})\) is a landmark on the outline of the shape. It is assumedthat the object shapes have similar contours, \({{C}_{i}}\) could be approximately generated from \(C_1\) by certain affine transformation. Given a landmark \(q=\left( x,y \right) \), we have coordinate transformation \(q'=Mq+\tau \), where M and \(\tau \) are the transformation matrix and translation, respectively. Usually, M consists of rotation \(\theta \) and scale \(\sigma \). Therefore, for each point, the affine transformation can be represented as follows:

$$\begin{aligned} \begin{array}{*{35}{l}} \left[ \begin{matrix} {{x}'} \\ {{y}'} \\ \end{matrix} \right] &{} =\left[ \begin{matrix} \cos \theta &{} -\sin \theta \\ \sin \theta &{} \cos \theta \\ \end{matrix} \right] \left[ \begin{matrix} \sigma &{} 0 \\ 0 &{} \sigma \\ \end{matrix} \right] \left[ \begin{matrix} x \\ y \\ \end{matrix} \right] +\left[ \begin{matrix} {{\tau }_{x}} \\ {{\tau }_{y}} \\ \end{matrix} \right] \\ {} &{} =\left[ \begin{matrix} \sigma \cos \theta &{} -\sigma \sin \theta \\ \sigma \sin \theta &{} \sigma \cos \theta \\ \end{matrix} \right] \left[ \begin{matrix} x \\ y \\ \end{matrix} \right] +\left[ \begin{matrix} {{\tau }_{x}} \\ {{\tau }_{y}} \\ \end{matrix} \right] \\ \end{array} \end{aligned}$$
(1)

For simplicity, we can represent \({{C}_{i}}\) as \({{C}_{i}}=\varPhi {{w}_{i}}\), where

$$\begin{array}{*{35}{l}} \varPhi &{} =\left[ \begin{matrix} C_{1}^{x} &{} 0 &{} C_{1}^{y} &{} 0 &{} 1 &{} 0 \\ 0 &{} C_{1}^{x} &{} 0 &{} C_{1}^{y} &{} 0 &{} 1 \\ \end{matrix} \right] \\ {{w}_{i}} &{} ={{\left[ \begin{matrix} {{\sigma }_{i}}\cos {{\theta }_{i}},{{\sigma }_{i}}\sin {{\theta }_{i}},-{{\sigma }_{i}}\sin {{\theta }_{i}},{{\sigma }_{i}}\cos {{\theta }_{i}},{{\tau }_{ix}},{{\tau }_{iy}} \\ \end{matrix} \right] }^{T}} \\ \end{array}$$

Since \(\varPhi \) only depends on \(C_1\), we have \(\mathbf C=\varPhi \left[ {{w}_{1}},{{w}_{2}},\cdots ,{{w}_{n}} \right] \), where \(\mathbf C=\left[ {{C}_{1}},{{C}_{2}},\cdots ,{{C}_{n}} \right] \). Obviously, \(Rank(C)\le 6\). This implies that the rank of transformational matrix \([{{w}_{1}},{{w}_{2}},\cdots ,{{w}_{n}}]\) is at most 6. The lower the rank of \(\mathbf C\), the higher the level of the correlation or structure similarity among shapes in \(\mathbf C\).

Fig. 1.
figure 1

The diagram of the vector \(\mathbf p\). Let \(s_i\) be the distance from landmark (solid circle) to centroid (hollow circle), and d is the maximum of \(s_i\) which could be used for normalization, therefore \(p_i = s_i / d\).

Minimization of the rank of \(\mathbf C\) has two principal effects. One encourages shape registration from multiple images, and the other is shape regularization that eliminates the difference between different shapes and keeps the main parts of structure similarity among shapes. Due to various disturbing factors such as noises, occlusion, etc., it is difficult to balance shape registration and regularization by using low-rank matrix approximation. Note that similar shapes without registration would increase the rank of \(\mathbf C\). When some dissimilar shapes which could not generated from \(C_1\) through transformation, the rank of shape matrix starts to mushroom, even more than 6, thereby it could not constrain the curve evolution well. To handle this, we propose an algorithm for joint shape registration and low-rank for object segmentation. Inspired by the work of Yezzi et al. [21, 22], object segmentation and registration are closely related problems, and it is widely accepted that registration is helpful to object segmentation and vice versa.

After shape alignment, the original vector \(C_i\) is transformed into a new vector \(g(C_i)\), where g is the registration function. Therefore, the low-rank of aligned shape matrix \(\left[ g(C_1),g(C_2),\cdots ,g(C_n)\right] \in \mathbb {R}^{2p\times n}\) would encourage shape regularization and it could be used as a shape constraint.

2.2 Shape Registration

The role of registration is to eliminate the deformations of unaligned shapes, therefore it needs to cope with affine transformations, such as rotation, stretching and displacement. Here, an unadorned ideal is used to deal with displacement and stretching. First, the centroid of shape \(C_i\) is calculated and denoted by \(O_i\). Second, we compute the distance to the centroid of shape for each landmark, and use the maximum \(d_i\) to normalize all distances. As shown in Fig. 1, all nomalized distances are denoted by \(p_{0}, p_{1},\cdots , p_{n}\), where \(p_i\) denotes the nomalized distance from its centroid to ith landmark \((x_i,y_i)\). The first step is to eliminate the effect of displacement and second one for the stretching.

The main observation in this work is that rotation invariance can be solved by circulant shift. Let \(\mathbf p=[p_{0}, p_{1},\cdots , p_{n}]\) denote the vector formed by all nomalized distances of shape. An \(n\times n\) circulant matrix \(U(\mathbf p)\) is obtained from the vector \(\mathbf p\) by concatenating all possible cyclic shifts of \(\mathbf p\):

$$\begin{aligned} U(\mathbf p) = \begin{bmatrix} p_0&p_1&p_2&\cdots&p_{n-1} \\ p_{n-1}&p_0&p_1&\cdots&p_{n-2} \\ p_{n-2}&p_{n-1}&p_0&\cdots&p_{n-3} \\ \vdots&\vdots&\vdots&\cdots&\vdots \\ p_1&p_2&p_3&\cdots&p_0 \\ \end{bmatrix} \end{aligned}$$
(2)

Clearly, the first row is vector \(\mathbf p\), and the second row is \(\mathbf p\) shifted one element to the right. Given a vector \(\mathbf v\), the product \(U(\mathbf p)\mathbf v\) represents convolution of vectors \(\mathbf p\) and \(\mathbf v\) [10], and it can be computed in the Fourier domain, using

$$\begin{aligned} U(\mathbf {p})\mathbf {v}={{\mathcal {F}}^{-1}}(\mathcal {F}(\mathbf p) \odot \mathcal {F}(\mathbf v)) \end{aligned}$$
(3)

where \(\odot \) denotes the element-wise product, while \(\mathcal {F}^{-1}\) and \(\mathcal {F}\) is the Fourier transform and its inverse, respectively. Because the circulant matrix U is defined by \(\mathbf p\) and the element of \(U(\mathbf p)\) can be represented as \({{u}_{ij}}={{p}_{(j-i)\text { }mod\text { }n}}\), we never should explicitly store and compute the entire matrix U.

Rotation alignment between \(C_i\) and \(C_j\) can be represented as circulant shift of the curve, namely, \(C_i\) can be approximately generated from \(C_j\) through circulant shifting z elements, where z is the number of elements shifted. Therefore, the problem of rotation invariance is converted to find the accurate value of z. Let \(\mathbf p_i\) and \(\mathbf p_j\) denote the distance vectors of \(C_i\) and \(C_j\), respectively. Due to the circulant convolution of \({\mathbf p_i}\star {\mathbf p_j}\), we can quickly evaluate all locations of curve \({C_j}\) according to reference curve \({C_i}\) by using FFT [10, 11], with \(*\) denoting the complex-conjugate,

$$\begin{aligned} {{\mathbf k}^{gauss}}=\exp (-\frac{1}{\phi }(\Vert {\mathbf p_i}{{\Vert }^{2}}+\Vert {\mathbf p_j}{{\Vert }^{2}}\text { -}2{{\mathcal {F}}^{-1}}(\mathcal {F}({\mathbf p_i})\odot {{\mathcal {F}}^{*}}({\mathbf p_j}))) \end{aligned}$$
(4)

Here the Gaussian kernel function is used and \(\phi \) is the corresponding constant, and \(\mathbf k^{gauss}\) measures the similarity between the curve \(C_i\) and \(C_j\). Obviously, the optimal z is the index of the maximum of \(\mathbf {k}^{guass}\). In shape registration, there are 3 parameters: the shape centroid O, the maximum distance d, and the shifted number z. More details about registration are shown in Algorithm 1.

figure a

2.3 Segmentation with Shape Registration and Low-Rank

Given a group of images \(I_1, \cdots , I_n\), we want to find corresponding curves \(C_1,\cdots ,C_n\) to segment their objects. By using shape similarity during curve evolution, an energy function based on shape registration and low-rank is proposed:

$$\begin{aligned} \underset{{\{\delta }_{i}\},\{{C}_{i}\}}{\mathop {\min }}\,\sum \limits _{i=1}^{n}{{{f}_{i}}}({{C}_{i}})+\lambda \left\| \mathbf Y \right\| _* \end{aligned}$$
(5)

where \(\mathbf Y=\left[ {{\text {g}}}({{C}_{1}}),{{g}}({{C}_{2}}),\cdots ,{{g}}({{C}_{n}}) \right] \), and \(\Vert \mathbf Y\Vert _*\) is the nuclear norm, \(\lambda \) is a regularization parameter, the larger the \(\lambda \) the higher similarity, the smaller the \(\lambda \) the greater change. In registration of shape \(C_i\), the corresponding transformation parameters \(\delta _i=\{O_i,d_i,z_i\}\) could be calculated for each curves. Here, \(f_i(C_i)\) is the energy of active contour model, such as CV model [6], snake model [14], geodesic active contour [3, 9]. For simplicity, CV model is adopted as data term in the proposed model:

$$\begin{aligned} \begin{aligned} f_i(C_i)&= \int _{\varOmega _1}(I_i(\mathbf {x})-u_1)^2d\mathbf {x}+\int _{\varOmega _2}(I_i(\mathbf {x}))^2d\mathbf {x}\\&+\beta \text {length}(C_i) \end{aligned} \end{aligned}$$

where \(\varOmega _1\) and \(\varOmega _2\) denote the regions inside and outside. \(u_1\) and \(u_2\) are the mean intensities of \(\varOmega _1\) and \(\varOmega _2\), respectively. \(\beta \) is a constant.

Fig. 2.
figure 2

Segmentation results on Heart with various rotation. (a) Results of [14], the rank is 8. (b) Results of [23] with a low \(\lambda \), the rank is 8. (c) Results of [23] with a high \(\lambda \), the rank is 6. (d) Results of the proposed model with registration, the rank is 6.

To optimize the energy function in (5), the proximal gradient method [15] is used for the following category of problems:

$$\begin{aligned} \min \limits _\mathbf C F(\mathbf C) + \lambda \mathcal {R}(\mathbf Y) \end{aligned}$$
(6)

where \(\mathcal {R}(\mathbf Y)\) is \(\Vert \mathbf Y\Vert _*\). Therefore, the following iteration is applied:

$$\begin{aligned} \mathbf Y^{k+1}=\arg \min \limits _{\mathbf Y}\frac{1}{2}\Vert \mathbf Y - [ \mathbf Y^k-\frac{1}{\mu }\nabla F(\mathbf Y^k)]\Vert _F^2+\frac{\lambda }{\mu }\mathcal {R}(\mathbf Y) \end{aligned}$$
(7)

It could be solved by the singular value thresholding algorithm, and the final \(\mathbf C^{k+1}\) can be obtained by the inverse function of g:

$$\begin{aligned} \mathbf C^{k+1} = g^{-1}(\mathcal {D}_{\frac{\lambda }{\mu }}(\mathbf Y-\frac{1}{\mu }\nabla F(\mathbf Y^k))) \end{aligned}$$
(8)

where \(\mathcal {D}_{\frac{\lambda }{\mu }}\) is a singular value thresholding operator introduced by Cai et al. in [2].

Fig. 3.
figure 3

Segmentation results on Heart with various rotation. (a) Results of [14], the rank is 8. (b) Results of [23] with a low \(\lambda \), the rank is 8. (c) Results of [23] with a high \(\lambda \), the rank is 6. (d) Results of the proposed model with registration, the rank is 6.

3 Experimental Results

In this section, we evaluate our method on Heart image sequence as the synthesized dataset and on the real images of notebook. In our implementation, we also found that parameters \(\lambda \) in (6) is easier to choose than [23], because the effect of \(\lambda \) is reduced by shape registration.

To certify that registration can reduce the rank of shape matrix and improve the accuracy of segmentation, we rotate some images of Heart with different angles as shown in Fig. 2. The results of [9] without the similarity constraint aren’t correctly segmented. We select the regularization parameter \(\lambda =25\) which products fairly good results. However, the method of [23] fails to segment the shapes accurately and its rank of shape matrix is 8. With a high \(\lambda =50\), the resulting shapes of [23] are more similar to each other, and their object shapes lost some important detail. Our results with the registration are more robust compared to the results without such constraint and the rank of output is 6. The proposed method is also applied to the images of notebook with distinct rotations, scaling and misleading information. As shown in the bottom of Fig. 3, the rank of results is 3, much lower than [23] and [14]. It is clear that the proposed method is robust to deal with complex transformation of object shapes.

Table 1. Quantitative comparison. The values in each cell imply means/standard deviations of all frames in each image sequence.

As shown in Table 1, Dice coefficient (Dice) and Hausdorff Distance (HD) [18] are used to quantitatively evaluate the segmentation results. We can see that [23] could not yields satisfactory result neither low \(\lambda \) nor high \(\lambda \). Our method has a better segmentation result both in Dice and HD.

4 Conclusion

In this paper, we proposed a method that simultaneously estimates shape registration and object segmentation for group similar images. Shape registration was used to eliminate the deformations of objects, and rotation invariance could be represented by circulant convolution. Owe to the well-established theory of circulant matrices, rotation alignment was efficiently solved by using FFT. The shape matrix which formed by the curves of objects from multiple test images has low-rank property, and the low-rank can be used to regularize shapes. Experimental results show that the proposed method is able to cope with the group similar images more robustly.