1 Introduction

Non-rigid objects are ubiquitous in our surroundings: a walking pedestrian, a talking face, waving tree branches, a rippling water surface, and so on. Non-rigid structure-from-motion (NRSfM) aims to recover 3D non-rigid/deformable structure and camera motion from 2D correspondences across multiple images, and has been receiving increasing attention from the community.

This paper revisits NRSfM and focuses on the factorization framework for NRSfM originally proposed by Bregler et al. (2000), an important extension of the well-known Tomasi–Kanade factorization (Tomasi and Kanade 1992) from rigid to non-rigid scenes, which assumes that the non-rigid shape deformation follows a low-order linear combination model. To date, a large body of research has been devoted to this topic, and numerous methods/algorithms have been proposed. However, despite all these efforts, the problem remains a difficult and active research topic; no fully satisfactory solution is available today. For example, a 2004 textbook states (Hartley and Zisserman 2004, p. 444): “currently there is not a satisfactory solution”. In CVPR 2009, Paladini et al. (2009) write: “although this low-rank shape model has proved a successful representation, the NRSfM problem is inherently under-constrained”. In ICCV 2011, Gotardo and Martinez (2011b) note: “in the absence of any (extra) prior knowledge on 3D shape deformation, computing NRSfM is still a difficult, underconstrained problem” (see, e.g., Bartoli et al. 2008; Del Bue 2008; Torresani et al. 2008; Paladini et al. 2009; Akhter et al. 2008; Gotardo and Martinez 2011a).

One of the primary causes of this difficulty is the inherent basis ambiguity (the ambiguity between deformable shape bases and time-varying coefficients, or between trajectory bases and point-varying coefficients) of the non-rigid problem (Xiao et al. 2004). To overcome this, most existing works rely on introducing various kinds of prior knowledge into the problem at hand. They do so by assuming constraints on the non-rigid scene (Bartoli et al. 2008), on the non-rigid shape bases (Xiao et al. 2004), on the time-varying coefficients (Torresani et al. 2008), on the deformation (Olsen and Bartoli 2008), or on the camera motion (Gotardo and Martinez 2011a), etc. For instance, many methods require that the camera move smoothly, or that the deformation trajectory be slow and smooth (Akhter et al. 2008; Gotardo and Martinez 2011a). However, these additional constraints not only limit the practical applicability of the factorization framework, but also obscure a clear theoretical understanding of the problem. We would like to answer: in order to solve the non-rigid factorization problem effectively, are these extraneous priors really essential?

In this paper, we propose a novel and simple solution to non-rigid factorization. Our method does not assume any prior knowledge about the problem other than the low-rank constraint; hence it is “prior-free”. Nevertheless, it does not suffer from the basis-ambiguity difficulty, and is able to recover both camera motion and non-rigid shape accurately and reliably. Experiments on both synthetic and real benchmark datasets show that the proposed method, despite being prior-free, outperforms most other (often prior-based) linear factorization methods. Furthermore, we give a theoretical analysis of the uniqueness and degenerate cases of the solution.

To better present this paper, and to put our contributions in context, we briefly review recent progress in factorization-based NRSfM.

1.1 Related Works

As one of the significant problems in geometric computer vision, NRSfM has received considerable attention from the computer vision community. A great number of methods have been published, and most existing methods can be roughly classified into two categories. The first category comprises template-based NRSfM methods (e.g., Salzmann et al. 2008; Brunet et al. 2011; Perriollat et al. 2011), which assume a known 3D template (e.g., a mesh model) of the non-rigid surface. The second category comprises correspondence-based methods, where inter-frame (and often sparse) feature point correspondences are available. Most structure-from-motion techniques used for rigid scene reconstruction (e.g., Pollefeys et al. 2004; Snavely et al. 2008; Dai et al. 2010; Li 2010) rely on feature point correspondences.

In this paper, we focus on the second category of methods, i.e. the correspondence-based methods for non-rigid structure from motion. Moreover, in the following discussions regarding related works, we confine ourselves exclusively to the matrix-factorization framework (Tomasi and Kanade 1992; Bregler et al. 2000).

Ever since Bregler et al.’s (2000) seminal work, researchers have been actively applying the factorization framework to various non-rigid problems. However, they soon noticed that, unlike its rigid counterpart, the non-rigid factorization problem appeared to be much more difficult. Xiao et al. (2004) proved that the problem itself is indeed ill-posed or under-constrained, in the sense that, based on the orthonormality constraint alone, one cannot recover the non-rigid shape bases and the corresponding shape coefficients uniquely: there is always a fundamental ambiguity between the shape bases and the shape coefficients.

To resolve this ambiguity, Xiao et al. (2004) suggested adding extraneous “basis constraints” so as to make the system well-constrained. In the same spirit of adding extra priors to regularize an otherwise under-constrained problem, Torresani et al. (2008) introduced a Gaussian prior on the shape coefficients, and Del Bue (2008) introduced special shape priors. Akhter et al. (2008) proposed to use a fixed set of Discrete Cosine Transform (DCT) bases in the dual trajectory space rather than the shape space. Recently, Akhter et al. (2012) further extended this method to temporal and spatial bilinear bases. DCT-based methods have also been extended to the trajectory reconstruction problem (Park et al. 2010) and to the multiple-static-camera case (Zaheer et al. 2011; Angst and Pollefeys 2012). Temporally smooth deformation priors have also been used, such as in Bartoli et al. (2008) and Aanæs and Kahl (2002). Other priors impose special structure on the model, such as assuming a quadratic model (Fayad et al. 2010) or local rigidity (Taylor et al. 2010), or extend it to non-linear models (Rabaud and Belongie 2009; Gotardo and Martinez 2011b). Valmadre and Lucey (2012) introduced general trajectory priors for non-rigid reconstruction. Angst and Pollefeys (2012) provided a unified point of view on NRSfM factorization through modelling with tensor algebra. They also extended the tensor formulation to the case of a camera network, where multiple static affine cameras observe the same deforming and moving non-rigid object.

Akhter et al. (2009) made important theoretical progress, revealing that although the ambiguity in the shape bases is inherent, the 3D shape itself can be recovered uniquely, without ambiguity. In a slightly earlier paper, Hartley and Vidal (2008) proved a similar result under the perspective camera model. Despite the significance of these theoretical results, neither paper provided a practical algorithm for the problem, and our work aims to fill this gap.

In our implementations, we utilize the nuclear-norm (or trace-norm) heuristic, which is widely used for rank minimization in the field of Compressive Sensing (CS) (Candès and Plan 2010) and has been widely adopted in computer vision and machine learning. Potentially, this allows our method to be further improved, e.g., to run much faster or to tackle challenging cases such as those with missing data or outliers. Note that rank-minimization formulations have also been applied to the rigid structure-from-motion problem (Dai et al. 2010; Angst et al. 2011).

Part of this work was published as a conference version in CVPR 2012 (Dai et al. 2012). This journal version substantively extends the conference version, not only with deeper theoretical analysis but also with additional experiments.

1.2 Organization

In Sect. 2, we formulate the non-rigid structure-from-motion problem under the low-order linear combination model, with a discussion of the orthonormality constraint and the inherent basis ambiguity. In Sect. 3, we state the main theory for recovering the Euclidean corrective matrix, i.e., the null-space representation and the intersection theorem. Section 4 presents our algorithmic solution for motion and shape recovery, which consists of recovering the Euclidean corrective matrix, the camera motion, and the non-rigid shape, in sequence. Based on recent progress in affine-constrained rank minimization, Sect. 5 gives a theoretical analysis of these formulations and solutions, i.e., uniqueness, relaxation gap and degenerate cases. Section 6 presents the experimental results with corresponding analysis, while Sect. 7 closes the paper with future work.

2 Problem Statement

2.1 Formulation

The task of non-rigid factorization is to factorize an image measurement matrix \(\mathtt{W}\) as the product of a camera motion (projection) matrix \(\mathtt{M}\) and a non-rigid shape matrix \(\mathtt{S}\), such that \(\mathtt{W}=\mathtt{M}\mathtt{S}\). We assume the measurement matrix is already centralized; therefore the camera motions reduce to pure rotations (Bregler et al. 2000).

Let us consider a monocular camera (moving or static) observing a non-rigid shape containing \(P\) 3D points over \(F\) frames, and denote by \(\mathtt{S}_{ij}\) the \(j\)th 3D shape point in the \(i\)th frame, and by \(\mathbf{x}_{ij}=[u_{ij},v_{ij}]^T\) the image point of \(\mathtt{S}_{ij}\). Then

$$\begin{aligned} \mathbf{x}_{ij} = \mathtt{R}_i \mathtt{S}_{ij}, \end{aligned}$$
(1)

where \(\mathtt{R}_i \in {\mathbb{R}}^{2\times 3}\) stands for the first two rows of the \(i\)th camera rotation; hence \(\mathtt{R}_i\mathtt{R}_i^T=\mathtt{I}_{2}\), i.e., \(\mathtt{R}_i^T\) is a Stiefel matrix (Paladini et al. 2009).

Based on the low-order linear combination model, the non-rigid shape \(\mathtt{S}_i \in {\mathbb{R}}^{3\times P}\) can be represented as a linear combination of \(K\) shape bases \(\mathtt{B}_k \in {\mathbb{R}}^{3\times P}\) with time-varying shape coefficients \(c_{ik}\): \(\mathtt{S}_i = \sum_{k=1}^K c_{ik}\mathtt{B}_k\). Under the orthographic camera model, the coordinates of the 2D image points observed at frame \(i\) are given by \(\mathtt{W}_i= \mathtt{R}_i\mathtt{S}_i\), where \(\mathtt{W}_i = [\mathbf{x}_{i1} ~\ldots ~\mathbf{x}_{iP}]\). Using this representation and stacking all \(F\) frames of measurements and all \(P\) points in matrix form, we obtain:

$$\begin{aligned} \mathtt{W}&= \left[ \begin{array}{lll} \mathbf{x}_{11} & \cdots & \mathbf{x}_{1P} \\ \vdots & & \vdots \\ \mathbf{x}_{F1} & \cdots & \mathbf{x}_{FP} \end{array} \right] = \left[ \begin{array}{l} \mathtt{R}_1 \mathtt{S}_1\\ \vdots \\ \mathtt{R}_F \mathtt{S}_F \end{array} \right] \nonumber \\&= \left[ \begin{array}{lll} c_{11}\mathtt{R}_1 & \cdots & c_{1K}\mathtt{R}_1\\ \vdots & \ddots & \vdots \\ c_{F1}\mathtt{R}_F & \cdots & c_{FK}\mathtt{R}_F \end{array} \right] \left[ \begin{array}{l} \mathtt{B}_1 \\ \vdots \\ \mathtt{B}_K \end{array} \right] \\&= \mathtt{R} (\mathtt{C} \otimes \mathtt{I}_{3}) \mathtt{B} \doteq \mathtt{\Pi}\mathtt{B}. \nonumber \end{aligned}$$
(2)

In this formula, we call \(\mathtt{R} = \hbox{blkdiag}(\mathtt{R}_1,\ldots,\mathtt{R}_F) \in {\mathbb{R}}^{2F\times 3F}\) the camera motion (rotation) matrix. Since \(\mathtt{\Pi}\in {\mathbb{R}}^{2F\times 3K}\) and \(\mathtt{B}\in {\mathbb{R}}^{3K\times P}\), it is easy to see that \(\text{rank}(\mathtt{W})\le \min(\text{rank}(\mathtt{\Pi}),\text{rank}(\mathtt{B}))\le 3K\). In the general case (i.e., non-degenerate bases, noise free), the rank of \(\mathtt{W}\) is exactly \(\text{rank}(\mathtt{W})=3K\). Additionally, the non-rigid shape matrix \(\mathtt{S} = (\mathtt{C} \otimes \mathtt{I}_{3})\mathtt{B}\) is low rank too, as \(\text{rank}(\mathtt{S})\le \min(\text{rank}(\mathtt{C}\otimes \mathtt{I}_{3}),\text{rank}(\mathtt{B}))\le 3K\). Notice that \(\mathtt{S}\) is not a general rank-\(3K\) matrix but has a special structure due to its \(\mathtt{C} \otimes \mathtt{I}_{3}\) factor.
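To make the structure of Eq. (2) concrete, the following minimal numpy sketch (ours, with illustrative dimensions and variable names; not code from the paper) builds a synthetic noise-free \(\mathtt{W}=\mathtt{R}(\mathtt{C}\otimes \mathtt{I}_3)\mathtt{B}\) and verifies that its rank is \(3K\):

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 50, 40, 3   # frames, points, number of shape bases (illustrative)

# Orthographic cameras: R_i is the first two rows of a random rotation matrix.
Rs = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(F)]
R = np.zeros((2 * F, 3 * F))
for i, Ri in enumerate(Rs):
    R[2 * i:2 * i + 2, 3 * i:3 * i + 3] = Ri   # R = blkdiag(R_1, ..., R_F)

C = rng.standard_normal((F, K))       # time-varying shape coefficients
B = rng.standard_normal((3 * K, P))   # K stacked 3xP shape bases

S = np.kron(C, np.eye(3)) @ B         # S = (C ⊗ I_3) B, of size 3F x P
W = R @ S                             # measurement matrix, of size 2F x P

print(np.linalg.matrix_rank(W))       # prints 3K = 9 in the generic case
```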

2.2 Orthonormality Constraint

From a measurement matrix \(\mathtt{W}\) one can compute its rank-\(3K\) approximation \(\mathtt{W} = \hat{\mathtt{\Pi}}\hat{\mathtt{B}}\) via SVD (singular value decomposition). However, this decomposition is not unique, as any nonsingular matrix \(\mathtt{G} \in {\mathbb{R}}^{3K \times 3K}\) can be inserted between \(\hat{\mathtt{\Pi}}\) and \(\hat{\mathtt{B}}\) to obtain a new valid factorization \(\mathtt{W} = \hat{\mathtt{\Pi}}\hat{\mathtt{B}}=\hat{\mathtt{\Pi}}\mathtt{G} \mathtt{G}^{-1}\hat{\mathtt{B}}= \mathtt{\Pi}\mathtt{B}\).

A particular matrix \(\mathtt{G}\) that rectifies \(\hat{\mathtt{\Pi}}\) to the canonical Euclidean form is called the Euclidean corrective matrix, because once such a \(\mathtt{G}\) is determined, one obtains \(\mathtt{R}(\mathtt{C}\otimes \mathtt{I}_3)= \hat{\mathtt{\Pi}}\mathtt{G}\) and the shape bases \(\mathtt{B} = \mathtt{G}^{-1}\hat{\mathtt{B}}\).

Denoting the \(i\)th double rows of \(\hat{\mathtt{\Pi}}\) as \(\hat{\mathtt{\Pi}}_{2i-1:2i} \in {\mathbb{R}}^{2\times 3K}\), and the \(k\)th column-triplet of \(\mathtt{G}\) as \(\mathtt{G}_k \in {\mathbb{R}}^{3K\times 3}\), we have:

$$\begin{aligned} \hat{\mathtt{\Pi}}_{2i-1:2i} \mathtt{G}_k = c_{ik}\mathtt{R}_i, \quad i=1,\ldots,F, ~k=1,\ldots,K. \end{aligned}$$
(3)

Orthonormality constraints (i.e., rotation constraints) on the \(\mathtt{\Pi}\) matrix can be imposed to recover the Gram matrix \(\mathtt{Q}_k \in {\mathbb{R}}^{3K\times 3K}\), formed as \(\mathtt{Q}_k = \mathtt{G}_k\mathtt{G}_k^T\), via \(\hat{\mathtt{\Pi}}_{2i-1:2i} \mathtt{Q}_k \hat{\mathtt{\Pi}}_{2i-1:2i}^T = c_{ik}^2 \mathtt{I}_{2}\). Since \(c_{ik}\) is not known, one can only establish two linear equations over \(\mathtt{Q}_k\):

$$\begin{aligned}&\hat{\mathtt{\Pi}}_{2i-1} \mathtt{Q}_k \hat{\mathtt{\Pi}}_{2i-1}^{T} = \hat{\mathtt{\Pi}}_{2i} \mathtt{Q}_k \hat{\mathtt{\Pi}}_{2i}^{T}, \nonumber \\&\hat{\mathtt{\Pi}}_{2i-1} \mathtt{Q}_k \hat{\mathtt{\Pi}}_{2i}^{T} = 0. \end{aligned}$$
(4)

2.3 Inherent Ambiguity

In carrying out the above non-rigid factorization, Xiao et al. (2004) discovered that the solutions are fundamentally ambiguous, in the sense that one cannot expect to find the shape bases and shape coefficients uniquely. This inherent ambiguity largely explains why non-rigid factorization is fundamentally more difficult than its rigid counterpart.

Later, Akhter et al. (2009) showed that, quite surprisingly, this fundamental ambiguity does not necessarily lead to an ambiguous shape. They further proved that the orthonormality constraints alone are in fact sufficient to recover a unique (unambiguous) non-rigid shape, provided that a previously-overlooked rank-3 constraint on \(\mathtt{Q}_k\) (Eq. (4)) is accounted for. However, apart from its evident theoretical value, their paper did not propose any optimization algorithm (other than a local search algorithm due to Brand (2005)) to efficiently find the correct \(\mathtt{G}_k\). Instead, the authors argued that “the real difficulty in achieving good 3D reconstructions for non-rigid structures \(\ldots\) is not the ambiguity of the [basis] constraints, but the complexity of the underlying non-linear optimization”. In this paper, we challenge this argument by providing a simple yet efficient (optimization) solution to the NRSfM factorization.

3 Main Theory

In this section and the following one, we present our main theory and solution for solving the NRSfM factorization problem in a prior-free way. First, we prove that the Euclidean corrective matrix must lie in the intersection of a linear subspace and a rank-3 positive semi-definite matrix cone. Second, based on these theoretical results, we present our solution in sequence: estimating the Euclidean corrective matrix, recovering the camera rotations, and recovering the non-rigid shapes. In contrast to existing methods, which mainly rely on explicit basis representations, our method solves for the camera motion and non-rigid shape directly, without an explicit parameterization in terms of a basis matrix and coefficients, and thus without dealing with the basis ambiguity.

From now on, we assume that the measurement matrix \(\mathtt{W}\) has already been truncated to rank \(3K\) (e.g., by SVD), that the number of shape bases \(K\) has been estimated, and that all the shape bases are sufficiently generic and non-degenerate. We start with a known result on NRSfM factorization.

Theorem 1

All the solutions \(\mathtt{Q}_k\) to the linear system Eq. (4) form a linear subspace of dimensionality \(2K^2-K\).

This result is a direct consequence of the theorem of Xiao et al. (2004). It shows that the above linear system is inherently under-determined (rank deficient by \(2K^2-K\)), no matter how many frames are given. Note that when \(K=1\) (the rigid case), the dimension of the solution space is \(1\), suggesting that the rigid problem is not ambiguous at all.

On the other hand, this result also provides us with the true dimensionality of the solution space of \(\mathtt{Q}_k\), and \(\mathtt{Q}_k\) is precisely what we are after. However, this practical implication was not explicitly exploited in their paper.

In the following, we show how one can take advantage of this result and derive a practical algorithm that directly leads to a parameterization of this solution space. More precisely, we prove that the solution space of \(\mathtt{Q}_k\) is in fact the null-space of a certain matrix \(\mathtt{A}\) which can be obtained directly from the input image measurements. The practical usefulness of this representation is obvious.

3.1 Null-Space Representation

First, denote by \(\hbox{vec}(\cdot)\) the vectorization operator, and let \(\mathbf{q}_k = \hbox{vec}(\mathtt{Q}_k)\). Using \(\hbox{vec}(\mathtt{A} \mathtt{X} \mathtt{B}^T)=(\mathtt{B}\otimes \mathtt{A})\hbox{vec}(\mathtt{X})\), we rewrite the linear system Eq. (4) as:

$$\begin{aligned} \left[ \begin{array}{l} \hat{\mathtt{\Pi}}_{2i-1} \otimes \hat{\mathtt{\Pi}}_{2i-1} - \hat{\mathtt{\Pi}}_{2i} \otimes \hat{\mathtt{\Pi}}_{2i}\\ \hat{\mathtt{\Pi}}_{2i-1} \otimes \hat{\mathtt{\Pi}}_{2i} \end{array} \right] \mathbf{q}_k \doteq \mathtt{A}_i\mathbf{q}_k=\mathbf{0}, \end{aligned}$$
(5)

where \(\hat{\mathtt{\Pi}}_{2i-1}\) and \(\hat{\mathtt{\Pi}}_{2i}\) denote the \((2i-1)\)th and \((2i)\)th rows of \(\hat{\mathtt{\Pi}}\).

Stacking all such equations from all the frames (\(i=1,\dots,F\)), we then have

$$\begin{aligned} \mathtt{A} \mathbf{q}_k = \mathbf{0}, \end{aligned}$$
(6)

where \(\mathtt{A} = [\mathtt{A}_1^T, \mathtt{A}_2^T, \ldots,\mathtt{A}_F^T]^T\). This is a linear system of equations over the unknown \(9K^2\)-vector \(\mathbf{q}_k\).

Note that \(\mathbf{q}_k\) has \((3K)(3K+1)/2\) independent entries. It may appear that, given enough frames, i.e., when \(2F\ge (3K)(3K+1)/2\), \(\mathbf{q}_k\) could be computed via linear least squares. However, this is not the case, because all valid solutions reside in a \((2K^2-K)\)-dimensional space, as shown in Theorem 1. Moreover, as Eq. (6) shows, this solution space is nothing but the null-space of \(\mathtt{A}\). In addition, it is easy to verify that the minimum number of frames required to compute the null-space linearly is \(F\ge (5K^2+5K)/4\).

Finally, we stress that, since the matrix \(\mathtt{A}\) can be readily computed from the input measurements, finding a set of orthonormal bases that span the null-space is simple (e.g., by SVD of \(\mathtt{A}\)). We denote the basis of the null space as \([\phi_1,\phi_2,\dots,\phi_{2K^2-K}]\). This provides an explicit parametrization of the solution space. Specifically, we have reached:

Theorem 2

All the valid solutions \(\mathbf{q}_k=\hbox{vec}(\mathtt{Q}_k)\) to the linear system Eq. (6) can be parameterized as \(\mathbf{q}_k=\sum_{l=1}^{2K^2-K}\alpha_l\phi_l\).
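As a self-contained numerical illustration of Theorems 1 and 2 (ours; variable names are illustrative), the following numpy sketch builds \(\mathtt{A}\) from a rank-\(3K\) factor \(\hat{\mathtt{\Pi}}\) of a synthetic \(\mathtt{W}\), restricts to symmetric \(\mathtt{Q}_k\), and recovers the \((2K^2-K)\)-dimensional null-space basis \([\phi_1,\dots,\phi_{2K^2-K}]\) via SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 50, 40, 3
n = 3 * K

# Synthetic noise-free W = R (C ⊗ I_3) B, as in the earlier sketch.
Rs = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(F)]
C, B = rng.standard_normal((F, K)), rng.standard_normal((n, P))
W = np.vstack([Rs[i] @ np.kron(C[i], np.eye(3)) @ B for i in range(F)])

U, s, _ = np.linalg.svd(W, full_matrices=False)
Pi_hat = U[:, :n] * s[:n]                       # rank-3K factor of W

# Per-frame constraint rows of Eq. (5), stacked into A of Eq. (6).
rows = []
for i in range(F):
    a, b = Pi_hat[2 * i], Pi_hat[2 * i + 1]
    rows.append(np.kron(a, a) - np.kron(b, b))  # equal row norms
    rows.append(np.kron(a, b))                  # row orthogonality
A = np.vstack(rows)

# Restrict to symmetric Q_k by appending vec-symmetry constraints.
sym = []
for p in range(n):
    for q in range(p + 1, n):
        r = np.zeros(n * n)
        r[p * n + q], r[q * n + p] = 1.0, -1.0
        sym.append(r)
A_sym = np.vstack([A, np.array(sym)])

_, sv, Vt = np.linalg.svd(A_sym)
d = 2 * K**2 - K              # Theorem 1: dimension of the solution space
Phi = Vt[-d:]                 # rows span the null-space: phi_1, ..., phi_d
print(sv[-d - 1], sv[-d])     # a clear gap separates the zero singular values
```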

3.2 The Intersection Theorem

Combining all the preceding results, we now arrive at the central theorem of this paper:

Theorem 3

(Intersection theorem) Under non-degenerate and noise-free conditions, any correct solution \(\mathtt{Q}_k\) (i.e., the Gram matrix of a column-triplet \(\mathtt{G}_k\) of the Euclidean corrective matrix) must lie in the intersection of the \((2K^2-K)\)-dimensional null-space of \(\mathtt{A}\) and the rank-3 positive semi-definite matrix cone, i.e., \(\mathtt{Q}_k\) belongs to

$$\begin{aligned} \left\{ \mathtt{A}~\hbox{vec}(\mathtt{Q}_k)=\mathbf{0}\right\} \cap \left\{ \mathtt{Q}_k \succeq 0\right\} \cap \left\{ \text{rank}(\mathtt{Q}_k)=3\right\}. \end{aligned}$$
(7)

Proof

Denote by \(\mathtt{G}_k\) a column-triplet of a correct rectifying transform, and let \(\mathtt{Q}_k = \mathtt{G}_k \mathtt{G}_k^T\); then \(\text{rank}(\mathtt{Q}_k) = \text{rank}(\mathtt{G}_k) = 3\) and \(\mathtt{Q}_k \succeq 0\). Additionally, \(\hbox{vec}(\mathtt{Q}_k)\) lies in the null space of \(\mathtt{A}\), as \(\mathtt{Q}_k\) gives a rectification with zero error. Thus \(\mathtt{Q}_k\) is a solution to the equation system, which means that the equation system is well-defined.

Conversely, denote by \(\tilde{\mathtt{Q}}_k\) any solution to the equation system Eq. (7); then \(\text{rank}(\tilde{\mathtt{Q}}_k) = 3\), \(\tilde{\mathtt{Q}}_k \succeq 0\), and \(\hbox{vec}(\tilde{\mathtt{Q}}_k)\) lies in the null space of the system of linear constraints on the elements of \(\tilde{\mathtt{Q}}_k\). Thus every solution to the equation system satisfies the conditions for a rectifying transform and achieves perfect structure recovery; there is no difference between these solutions.

4 Solution

Armed with the above results (in particular Theorem 3), we are now ready to present our simple prior-free method for the NRSfM factorization problem.

Recall that our goal is to recover both the camera motion matrix \(\mathtt{R}\) and the non-rigid shape matrix \(\mathtt{S}\) from the image measurement matrix \(\mathtt{W}\), such that \(\mathtt{W}= \mathtt{R} \mathtt{S} =\mathtt{R} (\mathtt{C} \otimes \mathtt{I}_{3})\mathtt{B}\). Note that, due to the inherent basis ambiguity, it is hopeless to recover a unique \(\mathtt{B}\) or \(\mathtt{C}\). While in previous work many researchers chose to use pre-defined special shape bases \(\mathtt{B}\) or trajectory bases in the dual trajectory space (or to enforce arbitrary priors on the shape bases or shape coefficients) to pin down the undetermined degrees of freedom, in this work we show how one can directly estimate \(\mathtt{S}\) without fixing \(\mathtt{B}\) or \(\mathtt{C}\).

Our method consists of three steps applied in sequence: (1) estimate the (Gram matrix of the) Euclidean corrective matrix \(\mathtt{G}_k\); (2) estimate the camera rotations \(\mathtt{R}\); and (3) estimate the non-rigid shape \(\mathtt{S}\).

4.1 Step 1: Estimate \(\mathtt{G}_k\) by Trace-Norm Minimization

Our main intersection theorem (Theorem 3) naturally leads to a simple algorithm for solving for \(\mathtt{G}_k\): find the intersection of the aforementioned null-space and the rank-3 positive semi-definite matrix cone.

Because the rank function itself is numerically unstable (measurement noise can increase the numerical rank of \(\mathtt{Q}_k\) dramatically), we slightly relax the \(\text{rank}(\mathtt{Q}_k)=3\) condition to a rank-minimization problem, i.e., \(\min \text{rank}(\mathtt{Q}_k)\). (Under noise-free conditions, the two conditions are equivalent, with zero relaxation gap.) Note, however, that rank minimization is NP-hard in general and very difficult to solve exactly. We therefore further relax it to a nuclear-norm minimization, i.e., \(\min \Vert \mathtt{Q}_k\Vert_*\). The nuclear norm of a matrix is the tightest convex bound of its rank, and for a positive semi-definite matrix the nuclear norm reduces to the trace (Recht et al. 2010; Angst et al. 2011). This is the case for the problem in question, because \(\mathtt{Q}_k\) is positive semi-definite. Thus we have \(\Vert \mathtt{Q}_k\Vert_{*} = \hbox{trace}(\mathtt{Q}_k)\).

We then arrive at the following trace-norm minimization formulation for solving \(\mathtt{Q}_k\), the Gram matrix of the Euclidean corrective matrix:

$$\begin{aligned}&\min ~ \hbox{trace}\left(\mathtt{Q}_k\right),\quad \hbox{such that} \nonumber \\&\mathtt{Q}_k \succeq 0,\\&\mathtt{A}~\hbox{vec}(\mathtt{Q}_k)=\mathbf{0}. \nonumber \end{aligned}$$
(8)

To avoid the trivial solution \(\mathtt{Q}_k = \mathtt{0}\), we express \(\hbox{vec}(\mathtt{Q}_k)\) explicitly in the null-space representation, thus excluding the case of all-zero weights.
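For illustration, here is a hedged cvxpy sketch of Eq. (8) (our own code, continuing the previous sketch and reusing `Pi_hat`, `F`, and `K`; cvxpy is one possible off-the-shelf solver interface, not named in the paper). The linear constraints are written directly in the form of Eq. (4), and the all-zero solution is excluded by fixing the free scale \(c_{1k}^2\) to 1, which is our choice for this sketch rather than the paper's:

```python
import cvxpy as cp
import numpy as np

n = 3 * K
Q = cp.Variable((n, n), PSD=True)      # symmetric PSD variable Q_k

constraints = [Pi_hat[0:2] @ Q @ Pi_hat[0:2].T == np.eye(2)]  # scale fix (ours)
for i in range(F):
    a, b = Pi_hat[2 * i], Pi_hat[2 * i + 1]
    constraints += [a @ Q @ a == b @ Q @ b,   # equal row norms, Eq. (4)
                    a @ Q @ b == 0]           # row orthogonality, Eq. (4)

prob = cp.Problem(cp.Minimize(cp.trace(Q)), constraints)
prob.solve()

# Extract an (approximately) rank-3 factor G_k from the solved Q_k.
w, V = np.linalg.eigh(Q.value)
G_k = V[:, -3:] * np.sqrt(np.maximum(w[-3:], 0.0))   # of size 3K x 3
```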

It is easy to recognize that the above trace-norm minimization problem is a standard semi-definite program (SDP). Also note that this SDP is actually very small: it has a fixed size of \(2K^2-K\) unknowns, independent of the size of the measurement matrix. Thus it can be solved very efficiently by any off-the-shelf SDP solver. Once \(\mathtt{Q}_k\) is found, we use SVD to extract an exact rank-3 \(\mathtt{G}_k\). The solved \(\mathtt{G}_k\) can be used directly to find \(\mathtt{R}\) and then \(\mathtt{S}\). Alternatively, if higher accuracy is desired, one can further improve the numerical accuracy of \(\mathtt{G}_k\) by feeding it as an initial point to a non-linear refinement procedure, such as the following unconstrained minimization:

$$\begin{aligned} \min \sum_{i=1}^F\left[ \left( 1 - \frac{\hat{\mathtt{\Pi}}_{2i} \mathtt{G}_k \mathtt{G}_k^T \hat{\mathtt{\Pi}}_{2i}^T}{\hat{\mathtt{\Pi}}_{2i-1} \mathtt{G}_k \mathtt{G}_k^T \hat{\mathtt{\Pi}}_{2i-1}^T}\right)^{2} + \left( 2\frac{\hat{\mathtt{\Pi}}_{2i-1} \mathtt{G}_k \mathtt{G}_k^T \hat{\mathtt{\Pi}}_{2i}^T}{\hat{\mathtt{\Pi}}_{2i-1} \mathtt{G}_k \mathtt{G}_k^T \hat{\mathtt{\Pi}}_{2i-1}^T}\right)^{2}\right], \end{aligned}$$

where the objective function is nothing but the orthonormality condition, and this refinement is similar to a bundle-adjustment process (Triggs et al. 2000).
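A minimal sketch of this refinement (ours; the paper does not specify the optimizer) uses scipy's general-purpose BFGS on the objective above, initialized with the SDP solution `G_k` from the previous sketch:

```python
import numpy as np
from scipy.optimize import minimize

def refine_Gk(G0, Pi_hat):
    """Refine G_k by minimizing the normalized orthonormality residuals above."""
    F = Pi_hat.shape[0] // 2

    def cost(g):
        M = Pi_hat @ g.reshape(G0.shape)     # row pairs: c_ik-scaled R_i
        total = 0.0
        for i in range(F):
            a, b = M[2 * i], M[2 * i + 1]
            denom = a @ a                    # Pi_{2i-1} G G^T Pi_{2i-1}^T
            total += (1.0 - (b @ b) / denom) ** 2 \
                   + (2.0 * (a @ b) / denom) ** 2
        return total

    res = minimize(cost, G0.ravel(), method="BFGS")
    return res.x.reshape(G0.shape)

G_k = refine_Gk(G_k, Pi_hat)   # initialized from the SDP solution
```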

4.2 Step 2: Compute Rotation Matrix \(\mathtt{R}\)

Conventionally, once \(\mathtt{G}_k \in {\mathbb{R}}^{3K\times 3}\) is solved (w.l.o.g., denote it \(\mathtt{G}_1\)), which is merely a single column-triplet of the full Euclidean corrective matrix \(\mathtt{G} \in {\mathbb{R}}^{3K\times 3K}\), the common next step is to solve for the other \(K-1\) column-triplets \([\mathtt{G}_2,\dots,\mathtt{G}_K]\) and use them to populate the entire matrix \(\mathtt{G}\). Brand (2005) proposed a linear method to solve for the big \(\mathtt{G}\) (for the affine case). Because these \(\mathtt{G}_k\)s always have a rotation ambiguity, a Procrustes method must be employed subsequently to align them (cf. Xiao et al. 2004; Akhter et al. 2009). Once the big \(\mathtt{G}\) is obtained, one can compute the camera motion \(\mathtt{R}\), the shape coefficient matrix \(\mathtt{C}\) and the shape basis matrix \(\mathtt{B}\), and then reconstruct the non-rigid shape \(\mathtt{S}\). However, this Procrustes-based approach is not only rather involved, but also not numerically stable. More importantly, it is not necessary, as shown in Akhter et al. (2008).

In this work, we adopt a simpler approach that computes the camera motion \(\mathtt{R}\) directly (and later the non-rigid shape \(\mathtt{S}\) directly) from a single column-triplet \(\mathtt{G}_k\), without the need to fill in the big, full \(\mathtt{G}\) matrix. The method goes as follows. Once \(\mathtt{G}_k\) is solved, the rotation at every frame \(i=1,\ldots,F\) can be obtained from:

$$\begin{aligned} \hat{\mathtt{\Pi}}_{2i-1:2i} \mathtt{G}_k = c_{ik} \mathtt{R}_i, \quad i=1,\ldots,F. \end{aligned}$$
(9)

Note that we need not care about the unknown value of \(c_{ik}\). However, there is a sign ambiguity in the obtained rotation. Denote the estimated rotation at the \(f\)th frame as \(\mathtt{R}_f\) and the unit vector orthogonal to \(\mathtt{R}_f\) as \(\mathbf{r}_f\); then both \([\mathtt{R}_f; \mathbf{r}_f^T]\) and \([-\mathtt{R}_f; \mathbf{r}_f^T]\) are valid rotations. Different sign selections for the rotations result in recovered shapes differing in cheirality (Hartley and Zisserman 2004), i.e., \(\mathtt{S}_f = [\mathbf{S}_{fx}; \mathbf{S}_{fy}; \mathbf{S}_{fz}]\) versus \(\mathtt{S}_f = [-\mathbf{S}_{fx}; -\mathbf{S}_{fy}; \mathbf{S}_{fz}]\). This ambiguity can be fixed by constraining the rotation between any two consecutive frames to be within \(\pm 90^\circ\) (Akhter et al. 2009). Finally, the full motion matrix is formed as \(\mathtt{R}=\hbox{blkdiag}(\mathtt{R}_1,\mathtt{R}_2,\dots,\mathtt{R}_F)\).
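The following numpy sketch (ours, continuing the running example) recovers each \(\mathtt{R}_i\) from Eq. (9) by projecting \(\hat{\mathtt{\Pi}}_{2i-1:2i}\mathtt{G}_k\) onto the nearest matrix with orthonormal rows, and resolves the sign ambiguity by a simple frame-to-frame continuity test, a proxy for the \(\pm 90^\circ\) constraint:

```python
import numpy as np

def recover_rotations(Pi_hat, G_k):
    """Per-frame rotations from Pi_hat_{2i-1:2i} G_k = c_ik R_i, Eq. (9)."""
    F = Pi_hat.shape[0] // 2
    Rs_est = []
    for i in range(F):
        M = Pi_hat[2 * i:2 * i + 2] @ G_k        # 2x3, equals c_ik * R_i
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        Ri = U @ Vt                              # nearest R with R R^T = I_2
        # Sign ambiguity: keep the sign closest to the previous frame.
        if Rs_est and np.sum(Ri * Rs_est[-1]) < 0:
            Ri = -Ri
        Rs_est.append(Ri)
    return Rs_est

Rs_est = recover_rotations(Pi_hat, G_k)
```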

4.3 Step 3: Estimate \(\mathtt{S}\) by Rank Minimization

Now we show how to solve for the non-rigid shape matrix \(\mathtt{S}\). Most conventional methods do this indirectly: they first solve for the big \(\mathtt{G}\) matrix, and then use pre-selected special shape bases \(\mathtt{B}\) (such as the first \(K\) frames (Xiao et al. 2004) or DCT bases in the dual trajectory space (Akhter et al. 2008)), or assume that the shape coefficients are also DCT-expandable (Gotardo and Martinez 2011a); the corresponding coefficient matrix \(\mathtt{C}\) can then be determined, and with it the shape matrix \(\mathtt{S}=(\mathtt{C}\otimes \mathtt{I}_3)\mathtt{B}\).

In the following subsections, we will provide simpler, more direct methods for solving \(\mathtt{S}\).

4.3.1 Pseudo Inverse Method

Recall that our goal is to solve for \(\mathtt{S}\) from the equation \(\mathtt{W}=\mathtt{R}\mathtt{S}=\mathtt{R}(\mathtt{C}\otimes \mathtt{I}_3)\mathtt{B}\), given \(\mathtt{W}\) and \(\mathtt{R}\). This equation is under-determined because \(\mathtt{R}\) is a wide matrix of size \(2F\times 3F\): there is no unique solution for \(\mathtt{S}\), but an infinite family of solutions. However, we also notice that the low-order linear combination model, i.e. \(\mathtt{S} = (\mathtt{C} \otimes \mathtt{I}_3)\mathtt{B}\), immediately implies that \(\hbox{rank}(\mathtt{S}) \le 3K\).

However, special care must be taken: \(\mathtt{S}\) is not a general rank-\(3K\) matrix but has the special structure \(\mathtt{S} = (\mathtt{C} \otimes \mathtt{I}_3)\mathtt{B}\). Here we relax this constraint and solve for a general rank-\(3K\) matrix first.

Taking both of the above arguments into account, we conclude that a valid relaxed solution to the shape matrix \(\mathtt{S}\) must lie in the intersection of the low-rank matrix set \(\{\mathtt{S}: \hbox{rank}(\mathtt{S})\le 3K\}\) and the solution space of the linear equation \(\mathtt{W}=\mathtt{R}\mathtt{S}\). As usual, we equivalently transform the low-rank condition into a rank-minimization formulation (under the noise-free and non-degenerate case). The shape matrix \(\mathtt{S}\) must then be a solution to the following rank minimization problem:

$$\begin{aligned}&\min \hbox{rank}(\mathtt{S}), \quad \hbox{such that} \nonumber \\&\mathtt{W}=\mathtt{R}\mathtt{S}. \end{aligned}$$
(10)

This rank minimization formulation can be relaxed to nuclear norm minimization as:

$$\begin{aligned}&\min \Vert \mathtt{S}\Vert_{*}, \quad \hbox{such that} \nonumber \\&\mathtt{W}=\mathtt{R}\mathtt{S}. \end{aligned}$$
(11)

Remarks

We now make two important remarks: (1) the above rank minimization problem (in the context of NRSfM) accepts the ground truth shape matrix \(\mathtt{S}\) as a solution; (2) computationally, the above problem may be solved by the (unique) Moore–Penrose pseudo-inverse, i.e., \(\mathtt{S}^* =\mathtt{R}^{\dag}\mathtt{W} = \mathtt{R}^T \mathtt{W}\).

Remark 1 is simply the main conclusion of Akhter et al. (2009), which states that once \(\mathtt{R}\) is fixed, there is no ambiguity in finding a low rank shape matrix \(\mathtt{S}\) (assuming the shape is sufficiently non-degenerate).

Remark 2 was a bit surprising to the authors, as it suggests a simple linear, closed-form solution to our original rank-minimization problem (10), which is NP-hard to solve in general.

Fortunately, recent progress in compressive sensing has confirmed the correctness of our Remark 2. In particular, we use the following result due to Liu et al. (2013): the Moore–Penrose pseudo-inverse solution \(\mathtt{S} = \mathtt{R}^{\dag}\mathtt{W}\) is the unique minimizer of the nuclear norm minimization problem (11), and is also a minimizer of the rank minimization problem (10). The reader is referred to Liu et al. (2013) for a detailed proof. From this result, we see that the pseudo-inverse solution is indeed a valid solution satisfying the two necessary conditions (i.e., both the imaging equation and the relaxed low rank condition), though care must be taken when discussing the tightness of the approximation (i.e., the relaxation gap) as well as the existence and uniqueness (multiplicity) issues associated with both the original rank minimization problem and the relaxed nuclear norm minimization problem (to be presented in Sect. 5.3).
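For concreteness, a minimal NumPy sketch of the pseudo-inverse method follows (the function name is hypothetical); it exploits the fact that the block-diagonal motion matrix \(\mathtt{R}\) has orthonormal rows, so that \(\mathtt{R}^{\dag} = \mathtt{R}^T\):

```python
import numpy as np

def shape_by_pseudo_inverse(W, R):
    """Pseudo-inverse solution S* = pinv(R) @ W of W = R S.

    W : 2F x P stacked image measurements.
    R : 2F x 3F block-diagonal motion matrix with orthonormal rows,
        so that pinv(R) = R.T and the solution reduces to R.T @ W.
    """
    return R.T @ W
```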

We have numerically tested the pseudo-inverse method on both synthetic and real data. Judging only from the numerical results in terms of 3D shape-recovery accuracy, the pseudo-inverse solution outperforms Xiao et al.'s shape basis method by a large margin, and achieves performance comparable to metric projection (Paladini et al. 2009) and EM-PPCA (Torresani et al. 2008); the reader is referred to the second-to-last column of Table 2. Note, however, that these methods (including our pseudo-inverse method) are still inferior to the more recent DCT trajectory basis method (Akhter et al. 2008) and the CSF method (Gotardo and Martinez 2011a), both of which rely on a strong smoothness prior.

This inferiority prompted the authors to think further: perhaps the rank-\(3K\) condition enforced upon \(\mathtt{S}\) is not strong enough? Pondering this question led us to the following “block matrix method”, which gives more favorable performance and a theoretical guarantee.

4.3.2 Block Matrix Method

In the pseudo-inverse method, we mainly make use of the rank-\(3K\) condition on the deformable shape \(\mathtt{S}\), i.e., \(\hbox{rank}(\mathtt{S})\le 3K\). This \(3F\times P\) matrix \(\mathtt{S}\) is simply a stack of \(P\) 3D points \([X_i,Y_i,Z_i]^{T}\) over \(F\) frames.

However, we realize that, since in reality there are only \(K\) shape bases (rather than \(3K\)), the shape matrix \(\mathtt{S}\) is not a fully generic rank-\(3K\) matrix, but has the special block structure \(\mathtt{S} = (\mathtt{C} \otimes \mathtt{I}_3) \mathtt{B}\). Ignoring this special structure adds spurious degrees of freedom (hence ambiguities) to the problem. Next we show how to obtain a stronger (yet meaningful) rank-minimization formulation that respects the nature of the NRSfM factorization.

In particular, we re-arrange the rows of \(\mathtt{S}\) that correspond to the \(X\), \(Y\), and \(Z\) coordinates separately into an \(F\times 3P\) block matrix, denoted by \(\mathtt{S}^{\sharp}\) below:

$$\begin{aligned} \mathtt{S}^{\sharp}=\left[ \begin{array}{ccccccccc} X_{11} & \cdots & X_{1P} & Y_{11} & \cdots & Y_{1P} & Z_{11} & \cdots & Z_{1P}\\ \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ X_{F1} & \cdots & X_{FP} & Y_{F1} & \cdots & Y_{FP} & Z_{F1} & \cdots & Z_{FP} \end{array} \right]. \end{aligned}$$

Notice that under the linear combination model, the non-rigid shape is expressed as \(\mathtt{S}_i = \sum_{k=1}^K c_{ik} \mathtt{B}_k\), where

$$\begin{aligned} \mathtt{S}_i = \left[ \begin{array}{cccc} X_{i1} & X_{i2} & \cdots & X_{iP} \\ Y_{i1} & Y_{i2} & \cdots & Y_{iP} \\ Z_{i1} & Z_{i2} & \cdots & Z_{iP} \end{array} \right]. \end{aligned}$$

Thus, it is easy to see that the low-order constraint in the linear combination model is equivalently expressed as a low-rank condition on \(\mathtt{S}^{\sharp}\) rather than on \(\mathtt{S}\): we must have \(\hbox{rank}(\mathtt{S}^{\sharp})\le K\). Since the row rank and column rank of a matrix are equal, this condition shows that the non-rigid shape has a low-rank property (rank \(K\)) in both shape space and trajectory space. Note that this rank-\(K\) condition on \(\mathtt{S}^{\sharp}\) is much stronger than the above rank-\(3K\) condition on \(\mathtt{S}\): a rank-\(K\) condition on \(\mathtt{S}^\sharp\) guarantees a rank-\(3K\) condition on \(\mathtt{S}\), but the converse does not hold in general. The form of \(\mathtt{S}^{\sharp}\) captures the essence of the \(K\)-order linear combination model.

By defining proper row-selection matrices \(\mathtt{P}_X,\mathtt{P}_Y,\mathtt{P}_Z\in {\mathbb{R}}^{F\times 3F}\) (with 0–1 entries, similar to a row-permutation matrix), we can compactly represent the relationship between \(\mathtt{S}^{\sharp}\) and \(\mathtt{S}\) as:

$$\begin{aligned} \mathtt{S}^{\sharp} = [\mathtt{P}_{X} \,\, \mathtt{P}_{Y} \,\, \mathtt{P}_{Z}]\, (\mathtt{I}_{3}\otimes \mathtt{S}). \end{aligned}$$
(12)
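The rearrangement of Eq. (12) is easy to implement and to verify numerically. The following NumPy sketch (names and sizes are hypothetical) builds a synthetic \(K\)-basis shape and confirms that \(\hbox{rank}(\mathtt{S}^{\sharp}) = K\) while \(\hbox{rank}(\mathtt{S}) = 3K\):

```python
import numpy as np

def rearrange(S):
    """3F x P shape matrix S -> F x 3P block matrix S# of Eq. (12):
    row f of S# is [X_f1..X_fP, Y_f1..Y_fP, Z_f1..Z_fP]."""
    return np.hstack([S[0::3], S[1::3], S[2::3]])

# Sanity check on a random K-basis shape (sizes are illustrative).
F, P, K = 20, 30, 3
C = np.random.randn(F, K)          # time-varying coefficients
B = np.random.randn(3 * K, P)      # K stacked 3 x P shape bases
S = np.kron(C, np.eye(3)) @ B      # S = (C kron I_3) B, a 3F x P matrix
print(np.linalg.matrix_rank(rearrange(S)),
      np.linalg.matrix_rank(S))    # -> K, 3K (here 3, 9)
```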

Now, to solve for this re-arranged block shape matrix \(\mathtt{S}^{\sharp}\), we again use a rank-minimization formulation:

$$\begin{aligned}&\min \hbox{rank}(\mathtt{S}^{\sharp}), \quad \hbox{such that} \nonumber \\&\mathtt{W} =\mathtt{R} \mathtt{S},\\&\mathtt{S}^{\sharp}=[\mathtt{P}_X ~~\mathtt{P}_Y ~~\mathtt{P}_Z](\mathtt{I}_3\otimes \mathtt{S}). \nonumber \end{aligned}$$
(13)

We relax the above rank minimization model with the nuclear norm heuristics, i.e., \(\min \Vert \mathtt{S}^{\sharp}\Vert_{*}\), thus reaching the following nuclear norm minimization model:

$$\begin{aligned}&\min \Vert \mathtt{S}^{\sharp} \Vert_{*}, \quad \hbox{such that} \nonumber \\&\mathtt{W} =\mathtt{R} \mathtt{S},\\&\mathtt{S}^{\sharp}=[\mathtt{P}_X ~~\mathtt{P}_Y ~~\mathtt{P}_Z](\mathtt{I}_3\otimes \mathtt{S}). \nonumber \end{aligned}$$
(14)

In principle, this nuclear norm minimization problem may be solved by a standard SDP solver such as SeDuMi (Sturm 1999) or SDPT3 (Toh et al. 1999). However, unlike the case of Sect. 4.1, where the resulting SDP has a small and fixed size, here the SDP involves a matrix of size \(F\times 3P\), which renders an off-the-shelf solver very inefficient when either \(P\) or \(F\) is large.

Below, we give an efficient approximate numerical implementation, based on the fixed point continuation algorithm (Ma et al. 2011).

4.3.2.1 Fast Numerical Implementation

We first rewrite the above minimization problem (14) in its Lagrange multiplier form:

$$\begin{aligned}&\min \mu \Vert \mathtt{S}^{\sharp}\Vert_{*}+\frac{1}{2}\Vert \mathtt{W}-\mathtt{R}\mathtt{S}\Vert_{\mathrm{F}}^{2}, \quad \hbox{such that} \nonumber \\&\mathtt{S}^{\sharp}=[\mathtt{P}_{X} \, \mathtt{P}_{Y} \, \mathtt{P}_{Z}](\mathtt{I}_{3}\otimes \mathtt{S}), \end{aligned}$$
(15)

where \(\mu \) is the continuation (homotopy) parameter which diminishes as the algorithm iterates.

Next, the gradient of \(\frac{1}{2}\Vert \mathtt{W}-\mathtt{R}\mathtt{S}\Vert_{\mathrm{F}}^{2}\) with respect to \(\mathtt{S}^{\sharp}\) is obtained as:

$$\begin{aligned} g(\mathtt{S}^{\sharp})&= \frac{\partial \frac{1}{2} \Vert \mathtt{W} - \mathtt{R}\mathtt{S}\Vert_{\mathrm{F}}^2}{\partial \mathtt{S}^{\sharp}} \nonumber \\&= [\mathtt{P}_X ~~ \mathtt{P}_Y ~~ \mathtt{P}_Z]\bigl(\mathtt{I}_{3}\otimes (\mathtt{R}^T (\mathtt{R}\mathtt{S}-\mathtt{W}))\bigr). \end{aligned}$$
(16)

Then, we solve the minimization problem (15) via the following two-line iterative update (cf. Ma et al. 2011):

$$\begin{aligned} \left\{ \begin{array}{l} \mathtt{Y}^{(k)} = \mathtt{S}^{\sharp (k)}- \tau\, g(\mathtt{S}^{\sharp (k)}), \\ \mathtt{S}^{\sharp (k+1)} = {\mathcal{S}}_{\tau \mu} (\mathtt{Y}^{(k)}), \end{array} \right. \end{aligned}$$
(17)

where \(\tau\) is the step size of the gradient descent, and \({\mathcal{S}}_{\nu}(\cdot)\) is the matrix shrinkage operator (cf. Ma et al. 2011). Once we have recovered the re-arranged shape \(\mathtt{S}^{\sharp}\), we first project it to the nearest rank-\(K\) matrix (note: not \(3K\)) via SVD, then rearrange it back to \(\mathtt{S}\).

We thus arrive at the fixed point continuation based algorithm for non-rigid shape recovery, summarized in Algorithm 1, where \(\mathtt{S}^{\sharp (0)}\) is initialized by the pseudo-inverse method.

[Algorithm 1: fixed point continuation for non-rigid shape recovery]
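A minimal NumPy sketch of this procedure follows (the function names, parameter schedule, and stopping rule are illustrative assumptions, not the authors' exact settings); it implements the gradient of Eq. (16) via the row rearrangement, the shrinkage step of Eq. (17), the continuation on \(\mu\), and the final rank-\(K\) projection:

```python
import numpy as np

def rearrange(S):
    """3F x P shape matrix S -> F x 3P block matrix S# of Eq. (12)."""
    return np.hstack([S[0::3], S[1::3], S[2::3]])

def inv_rearrange(S_sharp):
    """Inverse of rearrange: F x 3P block matrix S# -> 3F x P shape matrix S."""
    F, P = S_sharp.shape[0], S_sharp.shape[1] // 3
    S = np.empty((3 * F, P))
    S[0::3], S[1::3], S[2::3] = S_sharp[:, :P], S_sharp[:, P:2 * P], S_sharp[:, 2 * P:]
    return S

def shrink(Y, nu):
    """Matrix shrinkage operator: soft-threshold the singular values of Y by nu."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - nu, 0.0)) @ Vt

def fpc_shape(W, R, K, mu=1.0, mu_min=1e-6, decay=0.25, tau=0.5, inner=50):
    """Fixed point continuation sketch for Eq. (15)."""
    S_sharp = rearrange(R.T @ W)          # pseudo-inverse initialization
    while mu > mu_min:
        for _ in range(inner):
            grad = rearrange(R.T @ (R @ inv_rearrange(S_sharp) - W))  # Eq. (16)
            S_sharp = shrink(S_sharp - tau * grad, tau * mu)          # Eq. (17)
        mu *= decay                        # continuation on mu
    # Project to the nearest rank-K matrix via SVD, then rearrange back to S.
    U, s, Vt = np.linalg.svd(S_sharp, full_matrices=False)
    return inv_rearrange((U[:, :K] * s[:K]) @ Vt[:K])
```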

A convergence analysis of the fixed point continuation algorithm for general matrix rank minimization is given in Goldfarb and Ma (2011). Currently, the main computation takes place in the singular value shrinkage (SVT) step, where an SVD is computed in each iteration. Recent progress in compressive sensing and matrix completion, such as SVT without SVD (Cai and Osher 2010) and the alternating direction method (ADM) (Lin et al. 2010; Yang and Yuan 2013), can be used to further speed up the implementation.

4.3.3 Closed-Form Solution with Smooth Deformation Constraint

In this subsection, we introduce a modified version of our pseudo-inverse method that incorporates a smoothness prior. Although this deviates from the main prior-free thesis of the present paper, our intention is to illustrate how sensible priors, if available, can be applied to further improve the quality of the solution.

By introducing a smooth deformation prior, we can formulate the non-rigid shape recovery problem as minimizing a data term evaluated on the image measurements plus a regularization (smoothness) term, reaching the following formulation:

$$\begin{aligned} \min_{\mathtt{S}}\ \frac{1}{2} \Vert \mathtt{W} - \mathtt{R}\mathtt{S}\Vert_{\mathrm{F}}^2 + \frac{1}{2}\lambda \Vert \mathtt{H} \mathtt{S}\Vert_{\mathrm{F}}^{2}, \end{aligned}$$
(18)

where the first term measures the reconstruction error of the recovered motion and non-rigid shape, while the second term measures the smoothness of the deformation. \(\mathtt{H}\) can be chosen with different patterns to characterize various kinds of smoothness, e.g., first-order smoothness as in Eq. (19), second-order smoothness, etc.

$$\begin{aligned} \mathtt{H}_{ij} = \left\{ \begin{array}{ll} 1, & j=i,\ i=1,\ldots,3(F-1), \\ -1, & j=i+3,\ i=1,\ldots,3(F-1),\\ 0, & \hbox{otherwise}. \end{array} \right. \end{aligned}$$
(19)

The optimization problem in Eq. (18) admits an analytical (closed-form) solution too, as shown below. First, we take the derivative of Eq. (18) with respect to \(\mathtt{S}\) and set it to zero:

$$\begin{aligned} (\mathtt{R}^T\mathtt{R} + \lambda \mathtt{H}^T\mathtt{H})\,\mathtt{S} = \mathtt{R}^T \mathtt{W}, \end{aligned}$$
(20)

which has the closed-form solution:

$$\begin{aligned} \mathtt{S} = (\mathtt{R}^T \mathtt{R} + \lambda \mathtt{H}^T\mathtt{H})^{\dag}\,\mathtt{R}^T\mathtt{W}. \end{aligned}$$
(21)

The motion matrix \(\mathtt{R}\) has full row rank \(2F\), so \(\mathtt{R}^T\mathtt{R}\) generally has rank \(2F\). The smoothness matrix \(\mathtt{H}\) is rank deficient too; \(\mathtt{H}^T\mathtt{H}\) has rank \(3F-3\) (for first-order smoothness). In the general case, \(\mathtt{R}^T\mathtt{R} + \lambda \mathtt{H}^T \mathtt{H}\) is full rank and thus invertible, so we can replace the pseudo-inverse with an ordinary matrix inverse, speeding up the implementation (a matrix inverse is much cheaper than a pseudo-inverse).
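A minimal NumPy sketch of this closed-form solution follows (function names are hypothetical); it builds the first-order \(\mathtt{H}\) of Eq. (19) and applies Eq. (21) via a linear solve, as suggested above, rather than forming an explicit (pseudo-)inverse:

```python
import numpy as np

def first_order_H(F):
    """First-order smoothness matrix of Eq. (19), size 3(F-1) x 3F:
    H @ S stacks the differences between consecutive frames of S."""
    H = np.zeros((3 * (F - 1), 3 * F))
    idx = np.arange(3 * (F - 1))
    H[idx, idx] = 1.0
    H[idx, idx + 3] = -1.0
    return H

def smooth_shape(W, R, lam):
    """Closed-form solution of Eq. (21) via a linear solve; for generic R and
    lam > 0 the matrix R^T R + lam H^T H is invertible (see text)."""
    F = R.shape[0] // 2
    H = first_order_H(F)
    return np.linalg.solve(R.T @ R + lam * (H.T @ H), R.T @ W)
```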

There is a trade-off parameter \(\lambda\) in the closed-form solution, which balances reconstruction fidelity against smoothness. When \(\lambda\) approaches 0, the solution approaches \(\mathtt{R}^{\dag} \mathtt{W}\), i.e., the pseudo-inverse solution. When \(\lambda\) is large, the solution approaches a rigid shape, which minimizes the combination of \(\Vert \mathtt{H} \mathtt{S}\Vert_{\mathrm{F}}^{2}\) and \(\Vert \mathtt{W} - \mathtt{R} \mathtt{S} \Vert_{\mathrm{F}}^{2}\). When \(\lambda\) approaches \(+\infty\), the solution approaches the trivial solution \(\mathtt{S} = \mathtt{0}\), which drives \(\Vert \mathtt{H} \mathtt{S}\Vert_{\mathrm{F}}^{2} \rightarrow 0\).

The reader may wonder how to select a good trade-off parameter. Here we give some empirical results. On the motion capture dataset “Stretch”, given the ground truth motion \(\mathtt{R}\) and noisy measurements \(\mathtt{W}\), we recover the deformable shape with the trade-off parameter \(\lambda\) ranging from \(10^{-15}\) to \(10^5\). The relationships between the normalized mean 3D error, the smoothness cost \(\Vert \mathtt{H} \mathtt{S}\Vert_{\mathrm{F}}^{2}/(F-1)\), the reprojection error on the measurements \(\sqrt{\Vert \mathtt{W} - \mathtt{R} \mathtt{S} \Vert_{\mathrm{F}}^{2}/(FP)}\), and the trade-off parameter \(\lambda\) are illustrated in Fig. 1. We observe a large interval of \(\lambda\) over which the normalized mean 3D error is almost constant, which suggests that the error is not sensitive to the selection of \(\lambda\). Meanwhile, we confirm that as \(\lambda\) approaches 0, the result approaches the pseudo-inverse solution \(\mathtt{S} = \mathtt{R}^{\dag}\mathtt{W}\), consistent with the theoretical analysis above.

Fig. 1 Performance evaluation under a varying trade-off parameter \(\lambda\) on the motion capture sequence Stretch: (a) normalized mean 3D error; (b) smoothness cost \(\Vert \mathtt{H} \mathtt{S}\Vert_{\mathrm{F}}^{2}/(F-1)\); and (c) reprojection error on the image measurements

4.3.4 Connecting the Block Matrix Method and the Pseudo-Inverse Method

In this part, we briefly discuss the relationship between our block matrix method and the pseudo-inverse method.

First, in this work we aim to solve the non-rigid structure from motion problem in a prior-free way by capturing the essential low-rank constraint in the linear combination model. Our first attempt was to utilize the low-rank property of the non-rigid shape matrix \(\mathtt{S}\) directly. This effort led us to the closed-form pseudo-inverse solution, which is itself an important step providing new insight into the problem. Specifically, by experimenting with the pseudo-inverse method we realized that it cannot fully capture the structured low-rank constraint of the NRSfM model. This motivated us to explore further, and we made the second discovery of the reshuffled shape matrix form \(\mathtt{S}^\sharp\). A low-rank constraint on \(\mathtt{S}^\sharp\) completely captures the low-order linear combination model in explaining the image measurements. Therefore, the pseudo-inverse method serves as the starting point of our prior-free solution to NRSfM.

Second, the block matrix method for non-rigid shape recovery does not currently admit a closed-form solution as the pseudo-inverse method does. Therefore, we proposed a fixed-point continuation based iterative method to solve it efficiently. The pseudo-inverse solution is used as the initialization of the block matrix method to accelerate convergence.

Third, in the theoretical analysis section, we provide further analysis of both the pseudo-inverse method and the block matrix method. According to this analysis, valid non-rigid shapes lie in an affine subspace, in which the pseudo-inverse solution achieves both nuclear norm and Frobenius norm minimization. The pseudo-inverse method explains away the image measurements with minimum Frobenius norm, and its solution appears to be a coplanar shape at each frame. This offers deeper insight into the NRSfM problem.

5 Theoretical Analysis

In the sections above, we presented the main theory and numerical implementations of our simple “prior-free” method for factorization-based NRSfM, in which we applied the nuclear norm heuristics to the rank minimization formulation for motion and shape recovery. A theoretical question then naturally arises: can our formulation solve the original problem exactly and uniquely? In this section, we give further theoretical analysis of the formulation, regarding the uniqueness of the rank minimization and the relaxation gap of the nuclear norm heuristics. Finally, we relate our theoretical results to existing work in NRSfM.

We first review the formulations of affine constrained rank minimization and nuclear norm minimization. Then we review theoretical results on the null space analysis of affine constrained rank minimization. Finally, we apply these results to the pseudo-inverse method and the block matrix method.

5.1 Affine Constrained Rank Minimization

5.1.1 Affine Rank Minimization

The affine rank minimization problem (ARMP) (Recht et al. 2010) aims at finding a matrix of minimum rank that satisfies a given system of linear equality constraints, i.e.,

$$\begin{aligned} \min \hbox{rank}(\mathtt{X}), \quad \hbox{such that} \quad {\mathcal{A}}(\mathtt{X})= \boldsymbol{b}, \end{aligned}$$
(22)

where \({\mathcal {A}}: {\mathbb {R}}^{n_1 \times n_2} \rightarrow {\mathbb {R}}^m\) is a linear mapping operator. The linear mapping can always be expressed in matrix form as:

$$\begin{aligned} {\mathcal{A}}(\mathtt{X}) = \mathtt{A}\, \hbox{vec}(\mathtt{X}), \end{aligned}$$
(23)

where \(\hbox{vec}(\cdot)\) denotes the vectorization of \(\mathtt{X}\), with its columns stacked in order on top of one another, and \(\mathtt{A} \in {\mathbb{R}}^{m\times n_1 n_2}\).

5.1.2 Nuclear Norm Minimization

The affine rank minimization problem (22) is NP-hard in general (Recht et al. 2010). A common trick is to replace the rank function with the nuclear norm of the matrix, defined as \(\Vert \mathtt{X}\Vert_{*} = \sum_{i=1}^r \sigma_i(\mathtt{X})\), where \(\sigma_i(\mathtt{X})\) is the \(i\)th singular value and \(r = \hbox{rank}(\mathtt{X})\).
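For reference, the nuclear norm is directly computable from the singular values; a one-line NumPy helper (name hypothetical):

```python
import numpy as np

def nuclear_norm(X):
    """||X||_* : the sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()
```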

With the nuclear norm heuristics, the above affine rank minimization problem is approximated by the following nuclear norm minimization formulation:

$$\begin{aligned} \min \Vert \mathtt{X}\Vert_{*}, \quad \hbox{such that} \quad {\mathcal{A}}(\mathtt{X})=\boldsymbol{b}, \end{aligned}$$
(24)

which is a convex optimization problem and can be recast as a semi-definite program (SDP).

5.2 Null Space Analysis

When we use the nuclear-norm relaxation to solve an original rank-minimization problem, a key theoretical question naturally arises: can the nuclear-norm relaxation always recover the exact rank-minimization solution, i.e., is the relaxation tight? Conventionally, such “relaxation gap” questions are answered through statistical analysis of incoherence conditions, the restricted isometry property (RIP), etc., developed in the field of compressive sensing. Recht et al. (2010) extended this to the matrix rank minimization problem and proposed using the RIP to characterize the conditions for unique recovery. However, the RIP is not invariant under invertible maps, and it relies crucially on particular stochastic assumptions about the measurement matrix (e.g., Gaussian entries).

In this section, we take a different and more general viewpoint on the problem. As with any relaxation method, while we need to characterize the properties of the nuclear-norm relaxation itself, we also need to analyze when the original rank-minimization problem admits a unique solution. In fact, in the ARMP setting, the ability to recover \(\mathtt{X}\) from \({\mathcal{A}}(\mathtt{X}) = \boldsymbol{b}\) depends exclusively on the null space of \({\mathcal{A}}\), which is invariant under invertible maps.

First, we review recent progress on the rank minimization problem. Based on the analysis of the null-space property of \({\mathcal{A}}\), Eldar et al. (2012) presented the following theorem regarding the uniqueness of the rank minimization problem.

Theorem 4

(Eldar et al. 2012) If \(\mathtt{X}\) and \(\mathtt{X}'\) each have rank \(r\), then \(\mathtt{X} - \mathtt{X}'\) has rank at most \(2r\). To guarantee that (22) reconstructs all rank-\(r\) matrices, a necessary and sufficient condition is that there are no matrices of rank \(2r\) (or less) in the null space of \({\mathcal{A}}\).

Here we provide our simplified proof.

Proof

Sufficiency: Suppose there are no matrices of rank \(2r\) or less in the null space of \({\mathcal{A}}\). If the rank minimization solution were not unique, i.e., two rank-\(r\) matrices \(\mathtt{X}_1\) and \(\mathtt{X}_2\) achieved identical measurements \({\mathcal{A}}(\mathtt{X}_1) = {\mathcal{A}}(\mathtt{X}_2)\), then \(\mathtt{X}_1 - \mathtt{X}_2\) would lie in the null space of \({\mathcal{A}}\) with \(\hbox{rank}(\mathtt{X}_1 - \mathtt{X}_2) \le 2r\), contradicting the assumption. Therefore, if there are no matrices of rank \(2r\) or less in the null space of \({\mathcal{A}}\), then (22) reconstructs all rank-\(r\) matrices.

Necessity: Suppose (22) reconstructs all rank-\(r\) matrices, i.e., the rank minimization solution is unique. If there were a matrix \(\mathtt{Y}\) of rank \(2r\) or less in the null space of \({\mathcal{A}}\), then we could always construct two rank-\(r\) matrices \(\mathtt{X}_1\) and \(\mathtt{X}_2\) such that \({\mathcal{A}}(\mathtt{X}_1) = {\mathcal{A}}(\mathtt{X}_2)\).

One direct construction is as follows. Denote the SVD of \(\mathtt{Y}\) as \(\mathtt{Y} = \mathtt{U}_Y \Sigma_Y \mathtt{V}_Y^T\), with \(\Sigma_Y = \hbox{diag}(\sigma_1, \sigma_2, \ldots, \sigma_{2r})\). By letting

$$\begin{aligned} \Sigma_{X_1} = \hbox{diag}(\sigma_1, \sigma_2, \ldots, \sigma_{r}, 0, \ldots, 0) \end{aligned}$$

and

$$\begin{aligned} \Sigma_{X_2} = \hbox{diag}(0,\ldots,0, -\sigma_{r+1}, -\sigma_{r+2}, \ldots, -\sigma_{2r}), \end{aligned}$$

we obtain \(\mathtt{X}_1 = \mathtt{U}_Y \Sigma_{X_1} \mathtt{V}_Y^T\) and \(\mathtt{X}_2 = \mathtt{U}_Y \Sigma_{X_2} \mathtt{V}_Y^T\) such that \(\hbox{rank}(\mathtt{X}_1) = \hbox{rank}(\mathtt{X}_2) = r\) and \({\mathcal{A}}(\mathtt{X}_1) = {\mathcal{A}}(\mathtt{X}_2)\). This contradicts the assumption. Therefore, we conclude that if (22) reconstructs all rank-\(r\) matrices, then there are no matrices \(\mathtt{Y}\) of rank \(2r\) or less in the null space of \({\mathcal{A}}\).
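The construction in the necessity argument is easy to check numerically; the following NumPy sketch (sizes hypothetical) splits a random rank-\(2r\) matrix \(\mathtt{Y}\) into two rank-\(r\) matrices whose difference is \(\mathtt{Y}\):

```python
import numpy as np

# Split a rank-2r matrix Y into rank-r matrices X1, X2 with X1 - X2 = Y.
r, n = 2, 8
Y = np.random.randn(n, 2 * r) @ np.random.randn(2 * r, n)    # generically rank 2r
U, s, Vt = np.linalg.svd(Y)
X1 = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]                   # top r singular triplets
X2 = -U[:, r:2 * r] @ np.diag(s[r:2 * r]) @ Vt[r:2 * r, :]   # minus the next r
assert np.allclose(X1 - X2, Y)
print(np.linalg.matrix_rank(X1), np.linalg.matrix_rank(X2))  # -> r, r
```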

Figure 2 illustrates the relationship between the rank-\(r\) manifold and the rank-\(2r\) manifold. Let \(\mathtt{X}_1\) and \(\mathtt{X}_2\) be two rank-\(r\) matrices; then \(\mathtt{Y}_1 = \mathtt{X}_1 - \mathtt{X}_2\) lies in the matrix manifold \(\{\hbox{rank}(\mathtt{Y}) \le 2r\}\). Note that, given a matrix \(\mathtt{Y}_2\) in this manifold, we can always construct pairs of rank-\(r\) matrices \(\mathtt{X}_3\) and \(\mathtt{X}_4\), or \(\mathtt{X}_5\) and \(\mathtt{X}_6\), such that \(\mathtt{Y}_2 = \mathtt{X}_3 - \mathtt{X}_4\) and \(\mathtt{Y}_2 = \mathtt{X}_5 - \mathtt{X}_6\). It is worth noting that the relationship is not necessarily full coverage: it is possible that some matrix \(\mathtt{X}\) with \(\hbox{rank}(\mathtt{X}) = r\) cannot be reached from any matrix \(\mathtt{Y}\) with \(\hbox{rank}(\mathtt{Y}) \le 2r\), \(\mathtt{Y} \ne 0\), such that \(\hbox{rank}(\mathtt{X} + \mathtt{Y}) = r\).

Fig. 2 Relationship between the rank-\(r\) matrix manifold and the rank-\(2r\) matrix manifold

The above analysis gives the condition for uniform recovery under the rank minimization formulation; for the nuclear norm relaxed version, we have the following result.

Theorem 5

(Oymak and Hassibi 2010; Oymak et al. 2011) Null space condition for low-rank recovery: let \({\mathcal{A}}\) be a linear measurement operator. Then one can recover all matrices \(\mathtt{X}_0\) with \(\hbox{rank}(\mathtt{X}_0) = r\) via nuclear norm minimization (24) if and only if, for any \(\mathtt{X} \in {\mathcal{N}}({\mathcal{A}})\), we have

$$\begin{aligned} \sum_{i=1}^r \sigma_i(\mathtt{X}) < \sum_{i=r+1}^{n_1} \sigma_i(\mathtt{X}). \end{aligned}$$
(25)

This theorem gives the uniform recovery condition for rank minimization via nuclear norm relaxation, i.e., the condition for recovering all low-rank matrices \(\mathtt{X}_0\) with \(\hbox{rank}(\mathtt{X}_0) = r\) such that \({\mathcal{A}}(\mathtt{X}) = {\mathcal{A}}(\mathtt{X}_0)\) via nuclear norm minimization. When this condition is not satisfied, it does not necessarily mean that we cannot recover a specific rank-\(r\) matrix \(\mathtt{X}\) by nuclear norm minimization. When nuclear norm minimization cannot recover the low-rank solution, there are two possibilities: (1) the nuclear norm minimizer in the solution space does not achieve the minimum rank, or (2) there are multiple rank minimizers and the nuclear norm minimizer is only one of them.

Theorem 5 is much stronger than Theorem 4. Given the condition \(\sum_{i=1}^r \sigma_i(\mathtt{X}) < \sum_{i=r+1}^{n_1} \sigma_i(\mathtt{X})\) for every nonzero \(\mathtt{X}\) in the null space, we must have \(\hbox{rank}(\mathtt{X}) > 2r\); thus there are no matrices of rank \(2r\) or less in the null space of \({\mathcal{A}}\), and the condition of Theorem 4 is satisfied automatically. In other words, to recover all rank-\(r\) matrices through nuclear norm minimization, the original rank minimization problem must have a unique solution, and the relaxation gap from rank minimization to nuclear norm minimization must be zero.
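In practice, condition (25) can be checked for any given element of the null space with a few lines of NumPy (helper name hypothetical):

```python
import numpy as np

def satisfies_condition_25(X, r):
    """Check the null space condition (25) for one matrix X in N(A):
    the top r singular values must be strictly outweighed by the rest."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[:r].sum() < s[r:].sum()
```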

5.3 Analysis of the Pseudo-Inverse Method

In this subsection and the following one, we apply the above theoretical results to the formulations of the pseudo-inverse method and the block matrix method, giving an analysis of uniqueness, the general low-rank solution, and the relaxation gap. Note that throughout the theoretical analysis, we assume noise-free image measurements and non-degenerate shapes.

By introducing the matrix vectorization operator and the Kronecker product, the shape recovery equation \(\mathtt{W} = \mathtt{R} \mathtt{S}\) can be equivalently expressed as:

$$\begin{aligned} \text{vec}(\mathtt{W}) = (\mathtt{I}_P \otimes \mathtt{R}) \text{vec}(\mathtt{S}) = \mathtt{R}^* \text{vec}(\mathtt{S}), \end{aligned}$$
(26)

where \(\mathtt{I}_P \in {\mathbb {R}}^{P\times P}\) denotes the identity matrix. Thus the deformable shape recovery problem is transformed to:

$$\begin{aligned}&\min \text{rank}(\mathtt{S}), \quad \text{such that} \nonumber \\&\text{vec}(\mathtt{W}) = \mathtt{R}^{*} \text{vec}(\mathtt{S}). \end{aligned}$$
(27)

It is easy to check that this problem falls into the family of affine rank minimization problems, Eq. (22), except that \(\mathtt{R}^{*}\) has a special block structure.
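As a quick sanity check on this vectorized form, the following sketch (our own illustration) verifies the identity behind Eq. (26) on random matrices; note that column-major (Fortran-order) vectorization must be used, consistent with \(\text{vec}(\mathtt{A}\mathtt{X}\mathtt{B}) = (\mathtt{B}^T \otimes \mathtt{A})\text{vec}(\mathtt{X})\).

```python
import numpy as np

# Verify vec(W) = (I_P kron R) vec(S) for random R and S, matching Eq. (26).
rng = np.random.default_rng(1)
F, P = 4, 6
R = rng.standard_normal((2 * F, 3 * F))   # stand-in for the motion matrix
S = rng.standard_normal((3 * F, P))

lhs = (R @ S).flatten(order="F")          # vec(W), column-major
rhs = np.kron(np.eye(P), R) @ S.flatten(order="F")
print(np.allclose(lhs, rhs))              # True
```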

In NRSfM, the motion matrix \(\mathtt{R} \in {\mathbb {R}}^{2F\times 3F}\) has full row rank and \(\mathtt{R}^{\dag } = \mathtt{R}^{T}\). Therefore the pseudo-inverse solution \(\mathtt{S}^{*} = \mathtt{R}^{\dag } \mathtt{W}\) lies in the range space of \(\mathtt{R}^T\). Denote by \({\mathcal {N}}(\mathtt{R})\) the null space of \(\mathtt{R}\); with a slight abuse of notation we also write \({\mathcal {N}}(\mathtt{R})\) for a matrix whose columns form a basis of this space. The solution space of the homogeneous equation \(\mathtt{R} \mathtt{S} = \mathtt{0}\) can then be expressed as \(\mathtt{S} = {\mathcal {N}}(\mathtt{R}) \mathtt{C}\), where \(\mathtt{C} \in {\mathbb {R}}^{F \times P}\) is an arbitrary coefficient matrix.

Given the pseudo-inverse solution \(\mathtt{S}^{*} = \mathtt{R}^{T} \mathtt{W}\) and the null space \({\mathcal {N}}(\mathtt{R})\), the complete set of solutions to \(\mathtt{W} = \mathtt{R} \mathtt{S}\) can be expressed as:

$$\begin{aligned} \mathtt{S} = \mathtt{R}^{T} \mathtt{W} + {\mathcal {N}}(\mathtt{R}) \mathtt{C}, \end{aligned}$$
(28)

which can be equivalently formulated as:

$$\begin{aligned} \mathtt{S} = \left[ \begin{array}{ll} \mathtt{R}^T & {\mathcal {N}}(\mathtt{R}) \end{array} \right] \left[ \begin{array}{l} \mathtt{W} \\ \mathtt{C} \end{array} \right] = \left[ \begin{array}{ll} \mathtt{R}^T & {\mathcal {N}}(\mathtt{R}) \end{array} \right] \mathtt{M} . \end{aligned}$$
(29)

Note that \(\mathtt{R}^T\) and \({\mathcal {N}}(\mathtt{R})\) are orthogonal to each other, i.e. \(\mathtt{R} {\mathcal {N}}(\mathtt{R}) = \mathtt{0}\), and their column spaces together span \({\mathbb {R}}^{3F}\). Therefore \([\mathtt{R}^T ~ {\mathcal {N}}(\mathtt{R})]\) is a full-rank (non-singular) \(3F \times 3F\) matrix and \(\text{rank}(\mathtt{S}) = \text{rank}(\mathtt{M})\). To make \(\mathtt{S}\) lie in a rank-\(3K\) subspace, i.e. \(\text{rank}(\mathtt{M}) = 3K\), \(\mathtt{C}\) has to lie in the row space of \(\mathtt{W}\), i.e. \(\mathtt{C} = \mathtt{D} \mathtt{W}\), where \(\mathtt{D} \in {\mathbb {R}}^{F \times 2F}\) is an arbitrary matrix.

Therefore, we conclude that for the rank-minimization formulation (10), there are always multiple solutions, lying in an affine subspace:

$$\begin{aligned} \mathtt{S} = \mathtt{R}^T \mathtt{W} + {\mathcal {N}}(\mathtt{R}) \mathtt{D} \mathtt{W}. \end{aligned}$$
(30)

From this implicit rank minimization formulation, we can regularize the solution subspace in order to obtain solutions with better or desired properties, such as smooth or sparse solutions.

5.3.1 Frobenius Norm Minimization and Nuclear Norm Minimization

Denote the SVD of \(\mathtt{S}^{*}=\mathtt{R}^T \mathtt{W}\) as \(\mathtt{S}^{*} = \mathtt{U}_1 \mathtt{D}_1 \mathtt{V}_1^T\), and the SVD of \({\mathcal {N}}(\mathtt{R}) \mathtt{C}\) as \({\mathcal {N}}(\mathtt{R}) \mathtt{C} = \mathtt{U}_2 \mathtt{D}_2 \mathtt{V}_2^T\). Since \(\mathtt{R}^T\) and \({\mathcal {N}}(\mathtt{R})\) span orthogonal spaces, \(\mathtt{U}_1\) is orthogonal to \(\mathtt{U}_2\). The squared Frobenius norm of any solution is therefore \(\Vert \mathtt{S}\Vert _F^2 = \sum \sigma _{i1}^2 + \sum \sigma _{i2}^2\), where \(\sigma _{i1}\) and \(\sigma _{i2}\) are the diagonal elements of \(\mathtt{D}_1\) and \(\mathtt{D}_2\) respectively. Hence the Frobenius norm is minimized exactly when \(\mathtt{C} = \mathtt{0}\), i.e. the pseudo-inverse solution \(\mathtt{S}^* = \mathtt{R}^T \mathtt{W}\) achieves Frobenius norm minimization. Additionally, Liu et al. (2013) showed that the pseudo-inverse solution \(\mathtt{S}^* = \mathtt{R}^T \mathtt{W}\) also achieves nuclear norm minimization.

Thus we have shown that, among the general solutions \(\mathtt{S} = \mathtt{R}^T \mathtt{W} + {\mathcal {N}}(\mathtt{R}) \mathtt{D} \mathtt{W}\), the pseudo-inverse solution \(\mathtt{S} = \mathtt{R}^T \mathtt{W}\) achieves both Frobenius norm minimization and nuclear norm minimization.
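The structure of Eqs. (28)–(30) and the norm-minimality of the pseudo-inverse member are easy to confirm numerically. Below is a minimal sketch (our own illustration), assuming a block-diagonal motion matrix with orthonormal \(2\times 3\) blocks so that \(\mathtt{R}\mathtt{R}^T = \mathtt{I}\) and \(\mathtt{R}^\dag = \mathtt{R}^T\).

```python
import numpy as np
from scipy.linalg import block_diag

# Build a block-diagonal R with orthonormal 2x3 blocks, a basis N of its null
# space, and random members of the solution family (30); check that every member
# reproduces W while the pseudo-inverse member minimizes both norms.
rng = np.random.default_rng(2)
F, P = 5, 10
Rf = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(F)]
R = block_diag(*Rf)                                        # 2F x 3F
N = block_diag(*[np.linalg.svd(r)[2][-1:].T for r in Rf])  # 3F x F, with R N = 0
W = R @ rng.standard_normal((3 * F, P))                    # synthetic measurements

S_pinv = R.T @ W                                           # pseudo-inverse solution
for _ in range(5):
    S = S_pinv + N @ rng.standard_normal((F, 2 * F)) @ W   # a member of (30)
    assert np.allclose(R @ S, W)
    assert np.linalg.norm(S, "fro") >= np.linalg.norm(S_pinv, "fro")
    assert np.linalg.norm(S, "nuc") >= np.linalg.norm(S_pinv, "nuc") - 1e-9
print("pseudo-inverse member attained the minimal Frobenius/nuclear norms")
```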

5.3.2 Degenerate Shape of Pseudo-Inverse Solution

For the pseudo-inverse solution \(\mathtt{S}^* = \mathtt{R}^T \mathtt{W}\), due to the block-diagonal structure of the rotation matrix \(\mathtt{R}\), for the \(f\)th frame we have \(\mathtt{S}_f = \mathtt{R}_f^T \mathtt{W}_f\), where \(\mathtt{R}_f \in {\mathbb {R}}^{2\times 3}\) denotes the rotation at the \(f\)th frame and \(\mathtt{W}_f\) the corresponding image measurement. It is then easy to see that \(\text{rank}(\mathtt{S}_f) = 2\), i.e., the recovered non-rigid shape degenerates to a coplanar configuration at each frame. This is a consequence of the Frobenius norm minimality of the pseudo-inverse solution, which explains the image measurements with minimal energy. We illustrate the degenerate shape of the pseudo-inverse solution on the Pickup sequence in Fig. 3, where each line in Fig. 3a corresponds to the projection of the non-rigid shape at one frame. Note that, after low rank projection, the solution becomes non-degenerate.

Fig. 3

Non-rigid shape recovery results on the Pickup sequence from the pseudo-inverse method; note that no rank-\(K\) projection has been enforced. We illustrate the recovered non-rigid shape at each frame; evidently, it degenerates to a coplanar configuration. (a) Non-rigid shapes at each frame, where the blue points show the shapes from the pseudo-inverse method and the red points show the ground truth 3D shapes. (b) Another viewpoint of the non-rigid shapes (Color figure online)
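The coplanarity claim can be checked in a few lines; a self-contained sketch (our own illustration):

```python
import numpy as np

# The per-frame pseudo-inverse shape S_f = R_f^T W_f has rank 2 (coplanar points),
# since R_f^T R_f is a rank-2 orthogonal projector.
rng = np.random.default_rng(3)
P = 20
Rf = np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]   # orthonormal 2x3 camera
Wf = Rf @ rng.standard_normal((3, P))                   # projection of a 3D shape
print(np.linalg.matrix_rank(Rf.T @ Wf))                 # prints 2
```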

5.3.3 Geometric Interpretation

For the rotation \(\mathtt{R}_f \in {\mathbb {R}}^{2\times 3}\) at each frame, the null space of \(\mathtt{R}_f\) is spanned by a vector \(\varvec{r}_f \in {\mathbb {R}}^{3\times 1}\) such that \(\mathtt{R}_f \varvec{r}_f = \varvec{0}\); \(\varvec{r}_f\) can be retrieved directly from the SVD of \(\mathtt{R}_f\). Due to the block-diagonal structure of \(\mathtt{R}\), \({\mathcal {N}}(\mathtt{R})\) is simply the block-diagonal matrix formed from \(\varvec{r}_1, \varvec{r}_2, \ldots , \varvec{r}_F\).

For the non-rigid shape at each frame \(f\), we have

$$\begin{aligned} \mathtt{S}_f = \mathtt{R}_f^T \mathtt{W}_f + \mathtt{C}_f \otimes \varvec{r}_f, \end{aligned}$$
(31)

where \(\mathtt{C}_f \in {\mathbb {R}}^{1\times P}\) denotes the \(f\)th row of \(\mathtt{D} \mathtt{W}\). Thus the non-rigid shape at each frame, \(\mathtt{S}_f\), can be interpreted as the sum of a rank-2 coplanar shape \(\mathtt{R}_f^T \mathtt{W}_f\) and a rank-1 component \(\mathtt{C}_f\otimes \varvec{r}_f\). The geometric interpretation is the following: once the rotations are recovered, we can back-project the image measurements to a coplanar shape \(\mathtt{R}_f^T \mathtt{W}_f\) in 3D space; what remains to be determined is the depth \(\mathtt{C}_f\) along the direction \(\varvec{r}_f\) orthogonal to that plane, chosen so as to satisfy the low rank constraint on the non-rigid shape. We give an illustration of this geometric interpretation in Fig. 4.

Fig. 4

The pseudo-inverse solution gives the coplanar degenerate shape, while the component in the null space gives the coefficients along the direction orthogonal to the degenerate plane. The problem can be viewed as follows: in a rotated coordinate frame (given by the recovered rotations), we know the planar projection of the 3D points onto the plane defined by \(\varvec{r}_1,\varvec{r}_2\), and we aim to infer the coordinate \(c_{ij}\) along \(\varvec{r}_3\) such that the whole non-rigid shape is low rank in shape space
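Eq. (31) is likewise easy to verify: any choice of the depth row \(\mathtt{C}_f\) leaves the reprojection unchanged. A small sketch (our own illustration):

```python
import numpy as np

# Decomposition of Eq. (31): coplanar part R_f^T W_f plus depth part C_f kron r_f
# along the null direction r_f of R_f; R_f S_f = W_f holds for any C_f.
rng = np.random.default_rng(4)
P = 8
Rf = np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]
rf = np.linalg.svd(Rf)[2][-1:].T          # 3x1 vector with R_f rf = 0
Wf = Rf @ rng.standard_normal((3, P))
Cf = rng.standard_normal((1, P))          # candidate depths (to be determined)

Sf = Rf.T @ Wf + np.kron(Cf, rf)          # 3 x P shape
print(np.allclose(Rf @ Sf, Wf))           # True for any C_f
```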

5.4 Analysis of the Block-Matrix-Method

We now give a further analysis of our block matrix method, based on the null space analysis of the affine rank minimization problem.

5.4.1 Formulation

In the block matrix method, we aim to find the rank-minimizing solution \(\mathtt{S}^\sharp \) subject to \(\mathtt{W} = \mathtt{R} \mathtt{S}\), where \(\mathtt{S}^\sharp = [\mathtt{P}_X~ \mathtt{P}_Y ~ \mathtt{P}_Z] (\mathtt{I}_3 \otimes \mathtt{S})\).

To facilitate the analysis, we introduce a slightly different arrangement of the non-rigid shape \(\mathtt{S}\) as:

$$\begin{aligned} \mathtt{S}^{\star }=\left[ \begin{array}{lllllll} X_{11} & Y_{11} & Z_{11} & \cdots & X_{1P} & Y_{1P} & Z_{1P}\\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ X_{F1} & Y_{F1} & Z_{F1} & \cdots & X_{FP} & Y_{FP} & Z_{FP}\\ \end{array} \right] . \end{aligned}$$

Note that this re-arranged shape matrix \(\mathtt{S}^{\star }\) is a column-wise permuted version of \(\mathtt{S}^\sharp \).

Denote the image measurement at frame \(f\) as

$$\begin{aligned} \mathtt{W}_f = \left[ u_{f1} ~ v_{f1} \ldots u_{fP} ~ v_{fP}\right] \in {\mathbb {R}}^{1\times 2P}, \end{aligned}$$
(32)

and let \(\mathtt{R}_f \in {\mathbb {R}}^{2\times 3}\) be the rotation at frame \(f\); then we have

$$\begin{aligned} \mathtt{W}_f = \varvec{e}_f^T\left( \mathtt{S}^{\star } \left( \mathtt{I}_P \otimes \mathtt{R}_f^T\right) \right) , \end{aligned}$$
(33)

where \(\varvec{e}_f \in {\mathbb {R}}^{F \times 1}\) is all zeros except for the \(f\)th component, which is 1. Using vectorization and the Kronecker product, we obtain

$$\begin{aligned} \text{vec}(\mathtt{W}_f) = \left( (\mathtt{I}_P \otimes \mathtt{R}_f) \otimes \varvec{e}_f^T\right) \text{vec} (\mathtt{S}^{\star }). \end{aligned}$$
(34)

Stacking the image measurements over all frames, we obtain the following relationship:

$$\begin{aligned} \left[ \begin{array}{l} \text{vec}(\mathtt{W}_1) \\ \vdots \\ \text{vec}(\mathtt{W}_f) \\ \vdots \\ \text{vec}(\mathtt{W}_F) \\ \end{array} \right] = \left[ \begin{array}{l} (\mathtt{I}_P \otimes \mathtt{R}_1) \otimes \varvec{e}_1^T \\ \vdots \\ (\mathtt{I}_P \otimes \mathtt{R}_f) \otimes \varvec{e}_f^T \\ \vdots \\ (\mathtt{I}_P \otimes \mathtt{R}_F) \otimes \varvec{e}_F^T \\ \end{array} \right] \text{vec}(\mathtt{S}^{\star }). \end{aligned}$$
(35)

The above equation can be compactly expressed as:

$$\begin{aligned} \varvec{w} = \mathtt{R}^{\star } \varvec{s}, \end{aligned}$$
(36)

where \(\varvec{w} \in {\mathbb {R}}^{2FP \times 1}\), \(\mathtt{R}^{\star } \in {\mathbb {R}}^{2FP\times 3FP}\) and \(\varvec{s} = \text{vec}(\mathtt{S}^\star ) \in {\mathbb {R}}^{3FP\times 1}\).
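As a sanity check on this construction, the following sketch (our own illustration) verifies Eq. (34) for one frame; column-major vectorization is assumed throughout.

```python
import numpy as np

# Verify vec(W_f) = ((I_P kron R_f) kron e_f^T) vec(S*) for a random frame,
# with S* the F x 3P re-arranged shape of Eqs. (33)-(35).
rng = np.random.default_rng(5)
F, P, f = 4, 5, 2
S_star = rng.standard_normal((F, 3 * P))
Rf = np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]
ef = np.zeros((F, 1)); ef[f] = 1.0

Wf = ef.T @ (S_star @ np.kron(np.eye(P), Rf.T))               # Eq. (33), 1 x 2P
lhs = Wf.flatten(order="F")                                   # vec(W_f)
rhs = np.kron(np.kron(np.eye(P), Rf), ef.T) @ S_star.flatten(order="F")
print(np.allclose(lhs, rhs))                                  # True
```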

Our target is to find the rank minimizer under the affine constraint \(\varvec{w} = \mathtt{R}^{\star } \varvec{s}\), i.e.

$$\begin{aligned}&\min \text{rank}(\mathtt{S}^{\star }), \quad \text{such that} \nonumber \\&\varvec{w} = \mathtt{R}^{\star } \text{vec}(\mathtt{S}^{\star }). \end{aligned}$$
(37)

This is a standard affine-constrained rank minimization problem, Eq. (22); thus we have arrived at an equivalent form of the block matrix method formulation. The nuclear norm heuristic can be utilized, reaching:

$$\begin{aligned}&\min \Vert \mathtt{S}^{\star }\Vert _{*}, \quad \text{such that} \nonumber \\&\varvec{w} = \mathtt{R}^{\star } \text{vec}(\mathtt{S}^{\star }). \end{aligned}$$
(38)

The uniqueness of the solution depends on the null space of \(\mathtt{R}^{\star }\).

Without the rank minimization constraint, the solution space of \(\text{vec}(\mathtt{S}^\star )\) can be expressed as:

$$\begin{aligned} \text{vec}(\mathtt{S}^\star ) = (\mathtt{R}^\star )^\dag \varvec{w} + {\mathcal {N}}(\mathtt{R}^\star ) \varvec{c}, \end{aligned}$$
(39)

where \(\varvec{c}\) is an arbitrary coefficient vector. In our NRSfM problem, \((\mathtt{R}^{\star })^{\dag } = (\mathtt{R}^{\star })^T\). Therefore, the solution system is:

$$\begin{aligned} \text{vec}(\mathtt{S}^\star ) = \left[ \begin{array}{ll} (\mathtt{R}^{\star })^{T} & {\mathcal {N}}(\mathtt{R}^\star ) \end{array} \right] \left[ \begin{array}{l} \varvec{w} \\ \varvec{c} \end{array} \right] . \end{aligned}$$
(40)

Our block matrix method aims to find the rank minimizer/nuclear norm minimizer in this solution space. Unlike in the pseudo-inverse case, no closed-form solution exists for the rank minimization/nuclear norm minimization formulation.
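For small instances, the relaxation (38) can nevertheless be prototyped with an off-the-shelf convex solver. The following is a toy sketch assuming the cvxpy package (with its bundled SCS solver) is available; it states the affine constraint frame-by-frame, which is equivalent to \(\varvec{w} = \mathtt{R}^\star \text{vec}(\mathtt{S}^\star )\), and it is only an illustration, not the solver used in our experiments.

```python
import numpy as np
import cvxpy as cp

# Toy instance of (38): minimize the nuclear norm of the re-arranged shape S*
# (F x 3P) subject to per-frame reprojection constraints.
rng = np.random.default_rng(6)
F, P, K = 6, 8, 1
Rf = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(F)]

S_true = rng.standard_normal((F, K)) @ rng.standard_normal((K, 3 * P))  # rank K
W = [S_true[f].reshape(P, 3) @ Rf[f].T for f in range(F)]               # P x 2 each

S = cp.Variable((F, 3 * P))
cons = [cp.reshape(S[f], (P, 3), order="C") @ Rf[f].T == W[f] for f in range(F)]
cp.Problem(cp.Minimize(cp.normNuc(S)), cons).solve()
print(np.linalg.svd(S.value, compute_uv=False)[:4].round(3))  # typically rapid decay
```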

5.4.2 Null Space Analysis

The uniqueness of the affine-constrained rank minimization problem depends on the null space of \(\mathtt{R}^\star \). Since each vector \(\varvec{r}_f\) (orthogonal to the rows of \(\mathtt{R}_f\)) can be computed from the rotation \(\mathtt{R}_f\), the null space of \(\mathtt{R}^\star \) is easily obtained as:

$$\begin{aligned} {\mathcal {N}}(\mathtt{R}^\star ) = \left[ \begin{array}{lll} (\mathtt{I}_P \otimes \varvec{r}_1)\otimes \varvec{e}_1&\ldots&(\mathtt{I}_P \otimes \varvec{r}_F)\otimes \varvec{e}_F \end{array} \right] . \end{aligned}$$
(41)

The matrix \({\mathcal {N}}(\mathtt{R}^\star )\) has size \(3FP \times FP\). Reshaping the \(i\)th basis vector of the null space \({\mathcal {N}}(\mathtt{R}^\star )\) into an \(F \times 3P\) matrix \(\mathtt{S}_i^\star \), the corresponding (un-rearranged) shape \(\mathtt{S}_i\) satisfies \(\mathtt{R} \mathtt{S}_i = \mathtt{0}\). After some basic algebraic derivation, the general solution to the homogeneous system \(\mathtt{R}^\star \text{vec}(\mathtt{S}^{\star }) = \mathtt{0}\) can be expressed in the form \(\sum _{i=1}^{FP} c_i \mathtt{S}_i^\star \) as:

$$\begin{aligned} \sum _{i=1}^{FP} c_i \mathtt{S}_i^\star&= \left[ \begin{array}{llll} c_{11} \varvec{r}_1^T & c_{12} \varvec{r}_1^T & \cdots & c_{1P} \varvec{r}_1^T \\ c_{21} \varvec{r}_2^T & c_{22} \varvec{r}_2^T & \cdots & c_{2P} \varvec{r}_2^T \\ \vdots & \vdots & \ddots & \vdots \\ c_{F1} \varvec{r}_F^T & c_{F2} \varvec{r}_F^T & \cdots & c_{FP} \varvec{r}_F^T \\ \end{array} \right] \nonumber \\&= \left[ \begin{array}{lll} \varvec{r}_1^T & & \\ & \ddots & \\ & & \varvec{r}_F^T \\ \end{array} \right] \left( \left[ \begin{array}{lll} c_{11} & \cdots & c_{1P} \\ \vdots & \ddots & \vdots \\ c_{F1} & \cdots & c_{FP} \\ \end{array} \right] \otimes \mathtt{I}_3 \right) . \end{aligned}$$
(42)

It is easy to observe that the null space is fully characterized by the camera rotation null vectors \(\varvec{r}_f\) and the coefficient matrix \(\mathtt{c} = [c_{fp}] \in {\mathbb {R}}^{F\times P}\). Additionally, we have:

$$\begin{aligned} \text{rank}\left( \sum _{i=1}^{FP} c_i \mathtt{S}_{i}^{\star }\right) = 3\, \text{rank}(\mathtt{c}). \end{aligned}$$
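A quick numeric check of this rank identity (our own sketch; the \(\varvec{r}_f\) are taken from random orthonormal camera blocks):

```python
import numpy as np

# rank(sum_i c_i S_i*) = 3 rank(c), generically: build the null-space element of
# Eq. (42) row by row, row f being [c_f1 r_f^T ... c_fP r_f^T].
rng = np.random.default_rng(7)
F, P, rank_c = 6, 7, 2
r = [np.linalg.svd(np.linalg.qr(rng.standard_normal((3, 3)))[0][:2])[2][-1]
     for _ in range(F)]                                    # null vectors r_f
c = rng.standard_normal((F, rank_c)) @ rng.standard_normal((rank_c, P))

Ns = np.vstack([np.kron(c[f], r[f]) for f in range(F)])    # F x 3P null element
print(np.linalg.matrix_rank(Ns), 3 * np.linalg.matrix_rank(c))   # prints 6 6
```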

5.4.3 Degenerate Cases

In the above, we have given the explicit form of the null space of \(\mathtt{R}^\star \) in Eq. (42). It is easy to observe that there even exist rank-1 solutions in the null space (obtained by letting only one \(c_{ij}\) be non-zero). According to Theorem 4, uniform recovery of the deformable shape \(\mathtt{S}^\star \) is therefore impossible, i.e., we cannot expect to recover all low rank shapes \(\mathtt{S}^\star \) such that \(\mathtt{R}^\star \text{vec}(\mathtt{S}^\star ) = \mathtt{R}^\star \text{vec}(\mathtt{S}_0^\star )\) and \(\text{rank}(\mathtt{S}^\star ) = \text{rank}(\mathtt{S}_0^\star )\), where \(\mathtt{S}_0^\star \) is the ground truth non-rigid shape to recover (in rearranged form).

This shows that the block matrix method cannot achieve uniform recovery of non-rigid shape. Note that the block matrix method equivalently encapsulates all the constraints of the linear combination model; this further informs us that, for the low order linear combination model, there are in general multiple realizations consistent with the image measurements. However, as we will analyze later, these results do not affect our success in real world non-rigid shape recovery.

Here we analyze two degenerate cases. Degenerate case (1): given recovered camera rotations \(\mathtt{R}_f\), we can recover the corresponding \(\varvec{r}_f\) through SVD. When \(\mathtt{c}\) has low rank, \(\sum _{i=1}^{FP} c_i \mathtt{S}_i^\star \) has low rank too, and we can generate an arbitrary low rank \(\mathtt{c}\) (by projecting a random \(\mathtt{c}\) onto a low rank subspace). As illustrated in the proof of Theorem 4, from such low rank shapes in the null space we can easily construct rank-\(K\) shapes \(\mathtt{S}_1^\star \) and \(\mathtt{S}_2^\star \) achieving identical measurements \(\mathtt{R}^\star \text{vec}(\mathtt{S}_1^\star ) = \mathtt{R}^\star \text{vec}(\mathtt{S}_2^\star ) = \varvec{w}\). Note that in this construction, although \(\mathtt{c}\) can be arbitrary, the constructed degenerate non-rigid shape \(\mathtt{S}^\star \) depends on the recovered camera motions \(\mathtt{R}_f\); a worked construction is sketched below.
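The following sketch (our own illustration of this construction) builds a rank-6 null-space element from a rank-2 coefficient matrix \(\mathtt{c}\) and splits its SVD into two distinct rank-3 shapes with identical image measurements.

```python
import numpy as np

# Degenerate case (1): two rank-3 re-arranged shapes S1 != S2 with identical
# measurements, obtained by splitting a rank-6 null-space element of Eq. (42).
rng = np.random.default_rng(8)
F, P = 8, 9
Rf = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(F)]
r = [np.linalg.svd(R)[2][-1] for R in Rf]                       # R_f r_f = 0
c = rng.standard_normal((F, 2)) @ rng.standard_normal((2, P))   # rank-2 coefficients

Z = np.vstack([np.kron(c[f], r[f]) for f in range(F)])          # null element, rank 6
U, s, Vt = np.linalg.svd(Z)
S1 = U[:, :3] @ np.diag(s[:3]) @ Vt[:3]                         # rank 3
S2 = S1 - Z                                                     # also rank 3

def measure(S_star):   # reproject a re-arranged shape frame by frame
    return np.vstack([S_star[f].reshape(P, 3) @ Rf[f].T for f in range(F)])

print(np.linalg.matrix_rank(S1), np.linalg.matrix_rank(S2))     # 3 3
print(np.allclose(measure(S1), measure(S2)))                    # True
```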

Degenerate case (2): given a non-rigid shape \(\mathtt{S}\), we can even construct camera motions \(\mathtt{R}_f\) such that there exist multiple low rank non-rigid shapes \(\mathtt{S}_1^\star , \mathtt{S}_2^\star \) achieving identical image measurements.

Theorem 6

Given a general low rank non-rigid shape \(\mathtt{S}\) with rearranged form \(\mathtt{S}^\star \) such that \(\text{rank}(\mathtt{S}^\star ) = K\), we can always construct a measuring system \(\mathtt{R}\) such that there are multiple low-rank solutions \(\mathtt{S}_1\) and \(\mathtt{S}_2\) with \(\mathtt{R} \mathtt{S}_1 = \mathtt{R} \mathtt{S}_2\) and \(\text{rank}(\mathtt{S}_1^\star ) = \text{rank}(\mathtt{S}_2^\star ) = K\).

Proof

Without loss of generality, assume that the first \(K\) rows of \(\mathtt{S}_1^\star \) are linearly independent; then each of the remaining \(F-K\) rows of \(\mathtt{S}_1^\star \) can be expressed as a linear combination of \((\mathtt{S}_1^\star )_k, k = 1,\ldots , K\), namely \((\mathtt{S}_1^\star )_f = \sum _{k=1}^K \alpha _{fk}(\mathtt{S}_1^\star )_k\). If, for some coefficients \(c_{fp}\), the rotation null vectors \(\varvec{r}_f\) satisfy the analogous relationship:

$$\begin{aligned} c_{fp} \varvec{r}_f = \sum _{k=1}^K c_{kp} \alpha _{fk} \varvec{r}_k, \end{aligned}$$

we can construct another shape as:

$$\begin{aligned} \mathtt{S}_2^\star = \mathtt{S}_1^\star + \left[ \begin{array}{llll} c_{11} \varvec{r}_1^T & c_{12} \varvec{r}_1^T & \cdots & c_{1P} \varvec{r}_1^T \\ c_{21} \varvec{r}_2^T & c_{22} \varvec{r}_2^T & \cdots & c_{2P} \varvec{r}_2^T \\ \vdots & \vdots & \ddots & \vdots \\ c_{F1} \varvec{r}_F^T & c_{F2} \varvec{r}_F^T & \cdots & c_{FP} \varvec{r}_F^T \\ \end{array} \right] . \end{aligned}$$

Thus for \(f=K+1,\ldots ,F\), we have

$$\begin{aligned} (\mathtt{S}_2^\star )_{fp}&= \sum _{k=1}^K \alpha _{fk}(\mathtt{S}_1^\star )_{kp} + \sum _{k=1}^K c_{kp} \alpha _{fk} \varvec{r}_k^T \\&= \sum _{k=1}^K \alpha _{fk} \left( (\mathtt{S}_1^\star )_{kp} + c_{kp}\varvec{r}_k^T\right) = \sum _{k=1}^K \alpha _{fk} (\mathtt{S}_2^\star )_{kp}. \end{aligned}$$

Therefore \(\mathtt{S}_2^\star \) is also a rank-\(K\) non-rigid shape; additionally, \(\mathtt{R} \mathtt{S}_1 = \mathtt{R} \mathtt{S}_2\). \(\square \)

Thus, given a non-rigid shape \(\mathtt{S}\), we can always find rotations \(\mathtt{R}\) such that there are multiple solutions with \(\mathtt{R} \mathtt{S}_1 = \mathtt{R} \mathtt{S}_2 = \mathtt{W}\) and \(\text{rank}(\mathtt{S}_1^\star ) = \text{rank}(\mathtt{S}_2^\star ) = K\). Notice that this kind of degeneracy depends on a strong correlation between the camera motion and the shape deformation.

5.4.4 Relationship with Existing Theoretical Results

In the preceding subsections, we have shown two degenerate cases in which multiple solutions to non-rigid shape recovery exist. Due to the equivalence between the block matrix method and the linear combination model, our multiple-solution analysis naturally applies to the linear combination model as well. Both degenerate cases hinge on shape-motion coherence: given a non-rigid shape \(\mathtt{S}\) we can construct camera motions \(\mathtt{R}\), and given camera motions \(\mathtt{R}\) we can construct a non-rigid shape \(\mathtt{S}\), such that multiple solutions exist.

Hartley and Vidal (2008) analyzed the uniqueness of non-rigid shape recovery under the perspective camera model. Their conclusion states that “even though there are ambiguities in the reconstruction of shape bases and shape coefficients, the reconstructed shape is actually unique”. At first glance, this conflicts with our analysis of degenerate cases. However, Hartley and Vidal (2008) make assumptions on the shape and motion, namely: “The meaning of the assumption of genericity will be made clear in the proof. Broadly speaking, it means that the motion of the camera is sufficiently general and independent of the shape deformation, and that the shape space is indeed K-dimensional, spanned by the K shape bases.” These assumptions clearly exclude the degenerate cases of shape-motion coherence.

In future work, to fully characterize the uniqueness of motion and shape recovery, we will extend the concept of reconstructibility (Park et al. 2010; Valmadre and Lucey 2012) to the NRSfM problem. Reconstructibility was proposed for analyzing the trajectory reconstruction problem, where the camera motions are available; for NRSfM this is an even harder question due to the unknown camera motions. The hardness can be appreciated from another viewpoint: even for rigid structure-from-motion, there are degenerate cases where multiple realizations are possible (Hartley and Kahl 2007). However, we emphasize again that in practical applications camera motions and shape deformations are rarely coherent, so we can recover camera motions and non-rigid shape correctly and uniquely, as verified in our experiments.

6 Experiments

6.1 Setup

We compare our methods against the state-of-the-art methods, which include: (1) Xiao et al.’s shape basis method (XCK) (Xiao et al. 2004); (2) Torresani et al.’s EM-PPCA (Torresani et al. 2008); (3) the metric projection method (Paladini et al. 2009); (4) the point trajectory approach (PTA, or trajectory basis) (Akhter et al. 2008); (5) column space fitting (CSF) (Gotardo and Martinez 2011a); (6) column space fitting with complementary rank-3 spaces (CSF2) (Gotardo and Martinez 2011c); and (7) the kernel shape trajectory approach (KSTA) (Gotardo and Martinez 2011b).

To facilitate the comparison, we use the same error metrics as reported in Akhter et al. (2008) and Gotardo and Martinez (2011a): \(e_{R}\) measures the mean rotation estimation error,

$$\begin{aligned} e_{R}= \frac{1}{F}\sum _{f=1}^F\Vert \mathtt{R}_f -\tilde{\mathtt{R}}_f\Vert _{F}, \end{aligned}$$

where \(\mathtt{R}_f\) is the ground truth rotation at frame \(f\) and \(\tilde{\mathtt{R}}_f\) is the recovered rotation; \(e_{3D}\) measures the normalized mean 3D error of the reconstructed 3D points,

$$\begin{aligned} e_{3D} = \frac{1}{\sigma FP} \sum _{f=1}^F \sum _{p=1}^P e_{fp}, \quad \sigma = \frac{1}{3F} \sum _{f=1}^F (\sigma _{fx} + \sigma _{fy} + \sigma _{fz}), \end{aligned}$$

where \(\sigma _{fx}, \sigma _{fy}\) and \(\sigma _{fz}\) are the standard deviations of the \(X, Y\) and \(Z\) coordinates of the ground-truth shape at frame \(f\).
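For concreteness, both metrics can be computed as in the sketch below (the array shapes are our own assumptions; we omit any alignment for the sign/reflection ambiguity that is applied before evaluation).

```python
import numpy as np

# e_R and e_3D. Assumed shapes: R_gt, R_est are length-F lists of 2x3 rotations;
# S_gt, S_est are F x 3 x P arrays of per-frame 3D shapes.
def rotation_error(R_gt, R_est):
    return np.mean([np.linalg.norm(a - b, "fro") for a, b in zip(R_gt, R_est)])

def mean_3d_error(S_gt, S_est):
    e_fp = np.linalg.norm(S_gt - S_est, axis=1)   # F x P per-point errors e_fp
    sigma = np.mean(np.std(S_gt, axis=2))         # (1/3F) sum_f (s_fx+s_fy+s_fz)
    return e_fp.mean() / sigma
```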

Extensive experiments were conducted to test the performance of the proposed methods, on both random synthetic data and real motion capture data. The random synthetic data, which satisfy the low-rank non-rigid model perfectly, are used only for algorithm validation; on these our methods obtained nearly perfect results (with virtually zero error), as expected, so those numbers are omitted. Only test results on real sequences are reported below. The real sequences we tested include the standard sequences Drink (\(1102/41\)), Pick-up (\(357/41\)), Yoga (\(307/41\)), Stretch (\(370/41\)) and Dance (\(264/75\)) used in Akhter et al. (2008), and Face (\(316/40\)), Shark (\(240/91\)) and Walking (\(260/55\)) from Torresani et al. (2008), where \((F/P)\) denotes the number of frames (\(F\)) and number of points (\(P\)).

6.2 Cumulative Histograms of Error

Our first experiment aims to give a statistical comparison between the performance of our method and several existing methods. For this purpose we used a real motion capture sequence, the Stretch sequence. From the ground-truth 3D point clouds of the sequence, as well as the camera motion matrices, we re-synthesized \(F\) frames of image measurements with added Gaussian random noise, where the noise ratio is defined as \({\Vert \text{Noise}\Vert _{F}}/{\Vert \mathtt{W}\Vert _{F}}\). Using the obtained data, we tested our methods as well as several other existing methods, repeating the random test 100 times. We then plotted the cumulative histograms of the rotation estimation errors and the 3D reconstruction errors, as shown in Fig. 5. This figure clearly reveals that our block matrix method outperforms most of the other methods.

Fig. 5

Cumulative histograms of errors on the “Stretch” sequence; the left-most curve indicates the best performance. Left: rotation error; right: 3D reconstruction error. Our block matrix method outperforms most of the other existing methods (Color figure online)
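For reference, the noise model used above can be realized as in the following sketch (our own illustration):

```python
import numpy as np

# Add Gaussian noise rescaled so that ||Noise||_F / ||W||_F equals `ratio`.
def add_noise(W, ratio, rng=np.random.default_rng(0)):
    noise = rng.standard_normal(W.shape)
    return W + noise * (ratio * np.linalg.norm(W) / np.linalg.norm(noise))
```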

6.3 Noise Performance

To analyze the behavior of our new methods under noise, we repeated the first experiment above at different noise levels. Example results on the “Stretch” sequence are given in Fig. 6, which plots the rotation estimation error and the normalized mean 3D error as functions of the noise ratio.

Fig. 6

Noise performance. (a) Rotation estimation error and (b) normalized mean 3D error

As can be seen, our block matrix method achieves the best accuracy in both rotation estimation and non-rigid shape recovery, comparing favorably with almost all the other state-of-the-art competitors.

6.4 Compare All Methods on All Real Sequences

In this subsection, we provide experimental results for all nine benchmarked methods on all the real sequences at hand. Tables 1 and 2 summarize our main results, reporting both the camera rotation error (wherever ground truth is available) and the shape reconstruction error (normalized mean 3D error). Figure 7 shows the comparison.

Fig. 7

Motion capture data experimental results. Left: rotation estimation error; right: 3D reconstruction error

Table 1 Quantitative comparison in rotation estimation of our method with the state-of-the-art methods on motion capture data
Table 2 Quantitative comparison in non-rigid shape recovery of our proposed methods with the state-of-the-art methods on motion capture data (normalized mean 3D error)

Clearly, our block matrix method achieves the best performance in shape recovery on the sequences Pick-up, Yoga, Dance and Face, and comparable performance on the other benchmark sequences Drink, Stretch and Walking. The Shark sequence is an exception, but that is probably due to the fact that this sequence is in fact degenerate (Torresani et al. 2008).

6.5 Which Component Plays the More Important Role?

In this paper, we propose a ‘prior-free’ method to recover non-rigid shape, which consists of camera motion recovery by trace norm minimization and non-rigid shape recovery by the block matrix method (BMM). It is interesting to analyze how much of the performance improvement is due to our trace norm minimization based estimation of camera motion, and how much is due to our block matrix method for non-rigid shape recovery. In this subsection, we adopt a cross-validation approach to analyze this. We ran three groups of experiments: the first used the ground truth rotations, the second used the rotations recovered by the point trajectory approach (PTA) (Akhter et al. 2008), and the third used the rotations estimated by our trace norm minimization method.
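The shape-recovery step shared by all three groups can be sketched with a generic convex-modeling package (here cvxpy); the following is a schematic, noise-free version of trace-norm shape recovery with the rotations held fixed, not our optimized implementation:

```python
import numpy as np
import cvxpy as cp

def recover_shape(W, R_list):
    """Trace-norm shape recovery with fixed rotations (schematic).

    W      : (2F, P) mean-centered measurement matrix.
    R_list : list of F orthographic (2, 3) camera matrices.
    Returns the (F, 3P) rearranged shape matrix [X | Y | Z].
    """
    F, P = len(R_list), W.shape[1]
    S = cp.Variable((F, 3 * P))
    constraints = []
    for f, R in enumerate(R_list):
        # Re-stack frame f's X, Y, Z rows into a 3 x P shape matrix.
        S_f = cp.vstack([S[f, :P], S[f, P:2 * P], S[f, 2 * P:]])
        constraints.append(W[2 * f:2 * f + 2, :] == R @ S_f)
    cp.Problem(cp.Minimize(cp.normNuc(S)), constraints).solve()
    return S.value
```

In the noisy setting, the hard equality constraint would typically be replaced by a Frobenius-norm penalty on the reprojection residual.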

Experimental results are presented in Tables 3, 4, and 5, respectively. In all the experiments, we used \(K\) as in Table 2. Note that CSF2 estimates camera rotation in a way very similar to PTA, and the rotations recovered by CSF2 do not depend on the choice of \(K\). In the following tables, \(\hbox {PTA}_0, \hbox {CSF2}_0\) and \(\hbox {BMM}_0\) denote the non-rigid shape recovery results as in Table 2.

Table 3 Performance comparison in shape recovery given ground truth rotations
Table 4 Performance comparison in shape recovery given rotations estimated by PTA
Table 5 Performance comparison in shape recovery given rotations estimated by our trace norm minimization method

Table 3 shows the non-rigid shape recovery performance with ground truth rotations. Compared with Table 2, it is clear that given ground truth rotations, PTA, CSF2 and BMM all achieve better performance in non-rigid shape recovery. Therefore, there is still room for improvement in camera motion estimation toward better reconstruction performance.

Table 4 shows the non-rigid shape recovery performance with rotations estimated by PTA. We notice that: (1) with identical rotations estimated by PTA, BMM achieves better performance than PTA, except on Shark; (2) BMM with rotations estimated by PTA is still worse than the original BMM with rotations estimated by trace norm minimization.

Table 5 shows the non-rigid shape recovery performance with rotations estimated by our trace norm minimization (TNM) method. Note that: (1) with rotations by TNM, PTA achieves better shape recovery on the sequences Pick-up, Yoga, Stretch, Dance and Walking, while achieving comparable performance on Drink and Face; (2) with rotations by TNM, CSF2 achieves better performance on the sequences Pick-up, Yoga and Dance (though still inferior to BMM), while achieving comparable performance on Drink and Face.

Therefore, we conclude that in our method, both the rotation estimation by trace norm minimization and the shape recovery by BMM play important roles in achieving good 3D reconstruction.

6.6 Test on Temporally Reshuffled Data

A highlight of this work is that our method does not assume any prior knowledge about the problem. For instance, we do not assume that the trajectories are smooth across frames, whereas many other methods assume this either explicitly or implicitly. We therefore predict that our method is immune to random frame-order reshuffling (permutation).

To verify this point, we redid the experiments on frame-reshuffled data and obtained the results shown in Fig. 8. It is seen that our method is independent of frame ordering. On the original temporally smooth sequences, both the trajectory basis method and the column space fitting methods achieve performance comparable to, though slightly inferior to, our block matrix method; when the sequences are reshuffled, these methods perform very badly, while our method remains unaffected.
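The reshuffling protocol itself is simple; a sketch (our own illustration) that permutes the per-frame row pairs of the measurement matrix is given below. Since a row permutation cannot change the rank of \(\mathtt{W}\), any purely rank-based method should be unaffected.

```python
import numpy as np

def reshuffle_frames(W, rng=None):
    """Randomly permute the frame order of a (2F, P) measurement matrix.

    Each frame occupies two consecutive rows (its x and y image
    coordinates), so both rows of a frame are moved together.
    """
    rng = rng or np.random.default_rng()
    F = W.shape[0] // 2
    perm = rng.permutation(F)
    rows = np.ravel(np.column_stack((2 * perm, 2 * perm + 1)))
    return W[rows, :]

# Sanity check: rank is invariant under row permutation.
# assert np.linalg.matrix_rank(W) == np.linalg.matrix_rank(reshuffle_frames(W))
```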

Fig. 8 Performance on frame-order reshuffled sequences. (a) Rotation error and (b) normalized mean 3D error (Color figure online)

To analyze the behavior of our new methods under noise on a reshuffled sequence, we repeated the experiment at different noise levels. Experimental results on the reshuffled “Stretch” sequence are given in Fig. 9, which plots the estimation errors as functions of the noise ratio.

Fig. 9 Noise performance on the frame-order reshuffled “Stretch” sequence. (a) Rotation error and (b) normalized mean 3D error (Color figure online)

Figure 10 gives a closer inspection. Figure 10a, b compare the trajectories recovered using the original sequence and using a frame-reshuffled sequence, while Fig. 10c, d show the normalized mean 3D error for both sequences. From this figure, the trajectory basis method fails to produce acceptable results on the frame-permuted sequence, whereas our block matrix method achieves identical results in both cases. This is not surprising, as permuting the rows of a matrix does not change its rank, and our method does not assume any frame order or temporal smoothness.

Fig. 10 Comparison of our block matrix method versus the trajectory basis method, on an original input video sequence as well as on a frame-reshuffled version. (a) Recovered trajectory of one point in the X coordinate by both methods on the original sequence. (b) Recovered trajectory of one point in the X coordinate by both methods on the frame-reshuffled sequence. (c) Normalized mean 3D error for both methods on the original sequence. (d) Normalized mean 3D error for both methods on the frame-reshuffled sequence

6.7 Sample Reconstruction Results

For visual evaluation, we compare the results of our block matrix method and the trajectory basis method on a more complex sequence, Dance; see Fig. 11.

Fig. 11 Comparison of the 3D reconstruction results on the Dance sequence. The blue dots denote the ground truth 3D points, and the red circles show the reconstructed points. Top row: results by the trajectory basis method (Akhter et al. 2008), where the 3D errors are 0.3011, 0.2827, 0.2814 for the three frames. Bottom row: our results by the block matrix method, where the 3D errors are 0.2228, 0.0355, 0.1389 for the three frames (Color figure online)

We also tested the Talking Face video, using 500 frames and 68 feature tracks. Figure 12 shows three frames of the original images and the resulting 3D points, where the reprojection error is \(0.4839\) pixels.
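For reference, the reported reprojection error can be computed as the mean Euclidean residual between tracked and reprojected image points under the orthographic model, as sketched below (our own illustration; we assume this averaging convention):

```python
import numpy as np

def mean_reprojection_error(W, R_list, S_list):
    """Mean 2D reprojection error in pixels (sketch; assumes the
    average Euclidean residual over all frames and points).

    W      : (2F, P) tracked image points.
    R_list : F orthographic (2, 3) camera matrices.
    S_list : F recovered (3, P) shapes.
    """
    res = [W[2 * f:2 * f + 2, :] - R @ S
           for f, (R, S) in enumerate(zip(R_list, S_list))]
    return np.mean([np.linalg.norm(r, axis=0).mean() for r in res])
```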

Fig. 12 The Talking Face sequence used in our experiment (top row). The bottom row shows the 3D reconstruction result of our method

7 Closing Remarks

This paper advocates a prior-free approach to the factorization-based non-rigid structure-from-motion problem. Our method is purely convex, very easy to implement, and is guaranteed to converge to an optimal solution (at least approximately, up to a certain relaxation). It shows that, contrary to common belief, the NRSfM factorization problem can be solved unambiguously, efficiently and accurately, without extra priors. This said, from a practical point of view, we are not against the use of available priors, as long as a prior is sensible and reflects the physical nature of the problem at hand. It is expected that using good priors will further improve the quality of our solution and make our method more applicable in complex real scenarios.

In the present paper, we have concentrated on the complete-measurement case under the orthographic camera model. Thanks to recent progress in structure from motion and compressive sensing, the proposed method can be adapted to handle the missing-data case (e.g., Buchanan and Fitzgibbon 2005; Dai et al. 2013; Tao and Yuan 2011), the outlier case (e.g., Candès et al. 2011; Eriksson and van den Hengel 2010; Angst et al. 2011; Li 2009), the multibody motion case (Li 2007), the multiple-static-cameras case (Angst and Pollefeys 2012; Zaheer et al. 2011), as well as the perspective camera case (e.g., Dai et al. 2013; Xiao and Kanade 2005; Lladó et al. 2010). Other extensions include automatic estimation of the model complexity \(K\) and a discussion of reconstructibility (Park et al. 2010) for non-rigid structure from motion.