Abstract
This paper proposes a simple “prior-free” method for solving the non-rigid structure-from-motion (NRSfM) factorization problem. Apart from the fundamental low-order linear combination model assumption, our method assumes no additional prior knowledge about either the non-rigid structure or the camera motions. Yet it works effectively and reliably, produces optimal results, and does not suffer from the inherent basis ambiguity issue that has plagued most conventional NRSfM factorization methods. Our method is very simple to implement: it involves solving a very small SDP (semi-definite programming) problem of fixed size and a nuclear-norm minimization problem. We also present a theoretical analysis of the uniqueness and the relaxation gap of our solutions. Extensive experiments on both synthetic and real motion capture data (assumed to follow the low-order linear combination model) demonstrate that our method outperforms most existing non-rigid factorization methods. This work offers not only new theoretical insight but also a practical, everyday solution to NRSfM.
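To make the low-order linear combination model and the basis ambiguity concrete, here is a minimal synthetic sketch (not the authors' code). It assumes orthographic cameras and centered (translation-free) 2D tracks; the sizes F, P, K and all data are hypothetical random values. Each frame's shape is a linear combination of K basis shapes, so the 2F×P measurement matrix factors through a 3K-dimensional space, and any invertible 3K×3K matrix G yields an equally exact factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 20, 30, 2  # frames, points, shape bases (hypothetical sizes)

# Stacked shape bases [B_1; ...; B_K], each B_k is 3 x P.
B_stack = rng.standard_normal((3 * K, P))
# Per-frame shape coefficients c_ik.
C = rng.standard_normal((F, K))


def random_orthographic_camera(rng):
    """First two rows of a random 3x3 orthogonal matrix (orthonormal rows)."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q[:2]


# Motion matrix Pi: frame i contributes the 2 x 3K block [c_i1 R_i, ..., c_iK R_i].
cameras = [random_orthographic_camera(rng) for _ in range(F)]
Pi = np.vstack(
    [np.hstack([C[i, k] * cameras[i] for k in range(K)]) for i in range(F)]
)

# Centered 2F x P measurement matrix under the model.
W = Pi @ B_stack
rank_W = np.linalg.matrix_rank(W)  # generically 3K for random data

# Basis ambiguity: any invertible 3K x 3K G gives another exact factorization.
G = rng.standard_normal((3 * K, 3 * K))
Pi_alt = Pi @ G
B_alt = np.linalg.solve(G, B_stack)  # G^{-1} [B_1; ...; B_K]
W_alt = Pi_alt @ B_alt
```

Because `W_alt` reproduces `W`, the rank-3K factorization by itself cannot distinguish `Pi` from `Pi @ G`; some additional constraint is needed to resolve G.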
Notes
We explicitly express the solution as a linear combination of the \(2K^2-K\) basis vectors of the null space of \(\mathtt{A}\), i.e., \(\sum_{l=1}^{2K^2-K} \alpha_l \phi_l\), where the \(\alpha_l\) are coefficients and the \(\phi_l\) are the basis vectors of the null space of \(\mathtt{A}\). We then use a sum-to-one constraint, \(\sum_{l=1}^{2K^2-K} \alpha_l = 1\), to fix the scale freedom. See Theorem 2 for a detailed explanation.
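As a minimal sketch of this parameterization (not the authors' implementation), the snippet below computes a null-space basis numerically via SVD and fixes the scale with the sum-to-one constraint. The matrix `A` is a random, hypothetical stand-in built to have a null space of dimension \(2K^2-K\); in the method itself the coefficients would be determined by further constraints (see Theorem 2) rather than drawn at random. The rescaling assumes the coefficients do not sum to zero.

```python
import numpy as np


def null_space_basis(A, rtol=1e-10):
    """Columns form an orthonormal basis (the phi_l) of the null space of A."""
    _, s, Vt = np.linalg.svd(A)  # full_matrices=True, so Vt is n x n
    rank = int(np.sum(s > rtol * s.max()))
    return Vt[rank:].T


def combine_sum_to_one(Phi, alpha):
    """Return sum_l alpha_l * phi_l after rescaling alpha so that sum(alpha) == 1."""
    alpha = np.asarray(alpha, dtype=float)
    total = alpha.sum()
    if np.isclose(total, 0.0):
        raise ValueError("coefficients sum to zero; scale cannot be fixed this way")
    alpha = alpha / total
    return Phi @ alpha, alpha


# Hypothetical A (8 x 10, rank 4) whose null space has dimension 2K^2 - K = 6 for K = 2.
rng = np.random.default_rng(1)
K = 2
d = 2 * K**2 - K
n = 10
A = rng.standard_normal((8, n - d)) @ rng.standard_normal((n - d, n))
Phi = null_space_basis(A)
x, alpha = combine_sum_to_one(Phi, rng.uniform(0.5, 1.5, size=Phi.shape[1]))
```

Any `alpha` that sums to one gives a vector in the null space, so the remaining unknowns are only the \(2K^2-K\) coefficients, one of which is fixed by the constraint.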
An identical rearrangement was used in Akhter et al. (2008), but with a different motivation and purpose.
\(\mathtt{P}_X(i,3i-2)=1\), \(\mathtt{P}_Y(i,3i-1)=1\), \(\mathtt{P}_Z(i,3i)=1\), and all other entries are zero.
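A small sketch of these selection matrices follows. The note does not state their sizes, so we assume here that P is the number of points and that each matrix is P×3P, acting on a 3P-vector whose coordinates are interleaved per point as [x1, y1, z1, x2, ...]; the function name `coordinate_selectors` is hypothetical. The code converts the 1-based indices of the note to NumPy's 0-based indexing.

```python
import numpy as np


def coordinate_selectors(P):
    """P x 3P selectors with P_X(i,3i-2) = P_Y(i,3i-1) = P_Z(i,3i) = 1 (1-based)."""
    PX = np.zeros((P, 3 * P))
    PY = np.zeros((P, 3 * P))
    PZ = np.zeros((P, 3 * P))
    i = np.arange(P)        # 0-based row i is 1-based row i + 1
    PX[i, 3 * i] = 1.0      # 1-based column 3(i+1) - 2
    PY[i, 3 * i + 1] = 1.0  # 1-based column 3(i+1) - 1
    PZ[i, 3 * i + 2] = 1.0  # 1-based column 3(i+1)
    return PX, PY, PZ


PX, PY, PZ = coordinate_selectors(3)
s = np.arange(1.0, 10.0)  # interleaved [x1, y1, z1, x2, y2, z2, x3, y3, z3]

# Stacking the three selectors gives a 3P x 3P permutation matrix that
# regroups interleaved coordinates into X, Y and Z blocks.
perm = np.vstack([PX, PY, PZ])
blocks = perm @ s  # -> [1, 4, 7, 2, 5, 8, 3, 6, 9]
```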
Matrix Shrinkage Operator: Assume \(\mathtt{X} \in \mathbb{R}^{m\times n}\) and the SVD of \(\mathtt{X}\) is given by \(\mathtt{X} = \mathtt{U}\,\mathrm{Diag}(\sigma)\,\mathtt{V}^{\top}\), with \(\mathtt{U} \in \mathbb{R}^{m\times r}\), \(\sigma \in \mathbb{R}_{+}^r\), \(\mathtt{V} \in \mathbb{R}^{n\times r}\). For any \(v > 0\), the matrix shrinkage operator \(S_v(\cdot)\) is defined as \(S_v(\mathtt{X}) := \mathtt{U}\,\mathrm{Diag}(s_v(\sigma))\,\mathtt{V}^{\top}\), where \(s_v(\sigma)\) is defined as:
$$\begin{aligned} s_v(\sigma) := \overline{\sigma}, \quad \text{with}~~ \overline{\sigma}_i = \begin{cases} \sigma_i - v, & \text{if } \sigma_i - v > 0,\\ 0, & \text{otherwise}. \end{cases} \end{aligned}$$
Note that KSTA utilizes a non-linear mapping, which places it outside the scope of the low-order linear combination model. We provide its result mainly for benchmarking purposes.
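The matrix shrinkage operator \(S_v\) from the note above can be sketched in a few lines of NumPy. This is an illustrative implementation under our own naming (`shrink` is hypothetical), not the authors' code. It uses the thin SVD, which may include zero singular values beyond the rank; those are mapped to zero anyway, so the result matches the definition.

```python
import numpy as np


def shrink(X, v):
    """Matrix shrinkage S_v(X): soft-threshold the singular values of X by v > 0."""
    if v <= 0:
        raise ValueError("v must be positive")
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    sigma_bar = np.maximum(sigma - v, 0.0)  # s_v(sigma): sigma_i - v if positive, else 0
    return (U * sigma_bar) @ Vt             # U @ Diag(sigma_bar) @ V^T


X = np.diag([3.0, 1.0, 0.5])
Y = shrink(X, 1.0)  # approximately diag(2, 0, 0)
```

As a standard result, \(S_v(\mathtt{X})\) is the minimizer of \(\tfrac{1}{2}\|\mathtt{Y}-\mathtt{X}\|_F^2 + v\|\mathtt{Y}\|_*\), which is why this step appears in iterative nuclear-norm minimization schemes.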
References
Aanæs, H., & Kahl, F. (2002). Estimation of deformable structure and motion. In ECCV workshop on vision and modelling of dynamic scenes (pp. 1–4).
Akhter, I., Sheikh, Y., & Khan, S. (2009). In defense of orthonormality constraints for nonrigid structure from motion. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1534–1541).
Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2008). Nonrigid structure from motion in trajectory space. In Advances in neural information processing systems (pp. 41–48).
Akhter, I., Simon, T., Khan, S., Matthews, I., & Sheikh, Y. (2012). Bilinear spatiotemporal basis models. ACM Transactions on Graphics, 31(2), 17:1–17:12. doi:10.1145/2159516.2159523.
Angst, R., & Pollefeys, M. (2012). A unified view on deformable shape factorizations. In Proceedings of the European conference on computer vision (pp. 682–695). doi:10.1007/978-3-642-33783-3_49.
Angst, R., Zach, C., & Pollefeys, M. (2011). The generalized trace-norm and its application to structure-from-motion problems. In Proceedings of the IEEE international conference on computer vision (pp. 2502–2509).
Bartoli, A., Gay-Bellile, V., Castellani, U., Peyras, J., Olsen, S., & Sayd, P. (2008). Coarse-to-fine low-rank structure-from-motion. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Brand, M. (2005). A direct method for 3D factorization of nonrigid motion observed in 2D. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 122–128).
Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 690–696).
Brunet, F., Hartley, R., Bartoli, A., Navab, N., & Malgouyres, R. (2011). Monocular template-based reconstruction of smooth and inextensible surfaces. In Proceedings of the Asian conference on computer vision (pp. 52–66). doi:10.1007/978-3-642-19318-7_5.
Buchanan, A. M., & Fitzgibbon, A. W. (2005). Damped Newton algorithms for matrix factorization with missing data. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 316–322). doi:10.1109/CVPR.2005.118.
Cai, J., & Osher, S. (2010). Fast singular value thresholding without singular value decomposition. Technical report, University of California, Los Angeles.
Candès, E., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 11:1–11:37.
Candès, E. J., & Plan, Y. (2010). Matrix completion with noise. Proceedings of the IEEE, 98(6), 925–936. doi:10.1109/JPROC.2009.2035722.
Dai, Y., Li, H., & He, M. (2010). Element-wise factorization for n-view projective reconstruction. In Proceedings of the European conference on computer vision (pp. 396–409).
Dai, Y., Li, H., & He, M. (2012). A simple prior-free method for non-rigid structure-from-motion factorization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2018–2025). doi:10.1109/CVPR.2012.6247905.
Dai, Y., Li, H., & He, M. (2013). Projective multiview structure and motion from element-wise factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9), 2238–2251. doi:10.1109/TPAMI.2013.20.
Del Bue, A. (2008). A factorization approach to structure from motion with shape priors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Eldar, Y. C., Needell, D., & Plan, Y. (2012). Uniqueness conditions for low-rank matrix recovery. Applied and Computational Harmonic Analysis, 33(2), 309–314. doi:10.1016/j.acha.2012.04.002.
Eriksson, A., & van den Hengel, A. (2010). Efficient computation of robust low-rank matrix approximations in the presence of missing data using the \(L_1\) norm. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 771–778).
Fayad, J., Agapito, L., & Del Bue, A. (2010). Piecewise quadratic reconstruction of non-rigid surfaces from monocular sequences. In Proceedings of the European conference on computer vision (pp. 297–310).
Goldfarb, D., & Ma, S. (2011). Convergence of fixed-point continuation algorithms for matrix rank minimization. Foundations of Computational Mathematics, 11(2), 183–210. doi:10.1007/s10208-011-9084-6.
Gotardo, P., & Martinez, A. (2011a). Computing smooth time-trajectories for camera and deformable shape in structure from motion with occlusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(10), 2051–2065. doi:10.1109/TPAMI.2011.50.
Gotardo, P., & Martinez, A. (2011b). Kernel non-rigid structure from motion. In Proceedings of the IEEE international conference on computer vision (pp. 802–809).
Gotardo, P., & Martinez, A. (2011c). Non-rigid structure from motion with complementary rank-3 spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3065–3072).
Hartley, R., & Kahl, F. (2007). Critical configurations for projective reconstruction from multiple views. International Journal of Computer Vision, 71(1), 5–47. doi:10.1007/s11263-005-4796-1.
Hartley, R., & Vidal, R. (2008). Perspective nonrigid shape and motion recovery. In Proceedings of the European conference on computer vision (pp. 276–289).
Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge: Cambridge University Press. ISBN: 0521540518.
Li, H. (2007). Two-view motion segmentation from linear programming relaxation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8). doi:10.1109/CVPR.2007.382975.
Li, H. (2009). Consensus set maximization with guaranteed global optimality for robust geometry estimation. In Proceedings of the IEEE international conference on computer vision (pp. 1074–1080). doi:10.1109/ICCV.2009.5459398.
Li, H. (2010). Multi-view structure computation without explicitly estimating motion. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2777–2784). doi:10.1109/CVPR.2010.5540005.
Lin, Z., Chen, M., & Ma, Y. (2010). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. ArXiv e-prints. http://arxiv.org/abs/1009.5055.
Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., & Ma, Y. (2013). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171–184. doi:10.1109/TPAMI.2012.88.
Lladó, X., Del Bue, A., & Agapito, L. (2010). Non-rigid metric reconstruction from perspective cameras. Image and Vision Computing, 28(9), 1339–1353.
Ma, S., Goldfarb, D., & Chen, L. (2011). Fixed point and Bregman iterative methods for matrix rank minimization. Mathematical Programming, Series A, 128(1–2), 321–353.
Olsen, S. I., & Bartoli, A. (2008). Implicit non-rigid structure-from-motion with priors. Journal of Mathematical Imaging and Vision, 31(2–3), 233–244. doi:10.1007/s10851-007-0060-3.
Oymak, S., & Hassibi, B. (2010). New null space results and recovery thresholds for matrix rank minimization. ArXiv e-prints. http://arxiv.org/abs/1011.6326.
Oymak, S., Mohan, K., Fazel, M., & Hassibi, B. (2011). A simplified approach to recovery conditions for low rank matrices. ArXiv e-prints. http://arxiv.org/abs/1103.1178.
Paladini, M., Del Bue, A., Stosic, M., Dodig, M., Xavier, J., & Agapito, L. (2009). Factorization for non-rigid and articulated structure using metric projections. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2898–2905). doi:10.1109/CVPRW.2009.5206602.
Park, H. S., Shiratori, T., Matthews, I., & Sheikh, Y. (2010). 3D reconstruction of a moving point from a series of 2D projections. In Proceedings of the European conference on computer vision (pp. 158–171).
Perriollat, M., Hartley, R., & Bartoli, A. (2011). Monocular template-based reconstruction of inextensible surfaces. International Journal of Computer Vision, 95(2), 124–137. doi:10.1007/s11263-010-0352-8.
Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., et al. (2004). Visual modeling with a hand-held camera. International Journal of Computer Vision, 59(3), 207–232. doi:10.1023/B:VISI.0000025798.50602.3a.
Rabaud, V., & Belongie, S. (2009). Linear embeddings in non-rigid structure from motion. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2427–2434). Los Alamitos, CA: IEEE Computer Society. doi:10.1109/CVPRW.2009.5206628.
Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501. doi:10.1137/070697835.
Salzmann, M., Moreno-Noguer, F., Lepetit, V., & Fua, P. (2008). Closed-form solution to non-rigid 3D surface registration. In Proceedings of the European conference on computer vision (Vol. 5305, pp. 581–594). doi:10.1007/978-3-540-88693-8_43.
Snavely, N., Seitz, S., & Szeliski, R. (2008). Modeling the world from internet photo collections. International Journal of Computer Vision, 80(2), 189–210. doi:10.1007/s11263-007-0107-3.
Sturm, J. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11, 625–653.
Tao, M., & Yuan, X. (2011). Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM Journal on Optimization, 21(1), 57–81.
Taylor, J., Jepson, A., & Kutulakos, K. (2010). Non-rigid structure from locally-rigid motion. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2761–2768). doi:10.1109/CVPR.2010.5540002.
Toh, K., Todd, M., & Tutuncu, R. (1999). SDPT3—A Matlab software package for semidefinite programming. Optimization Methods and Software, 11, 545–581.
Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2), 137–154. doi:10.1007/BF00129684.
Torresani, L., Hertzmann, A., & Bregler, C. (2008). Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5), 878–892.
Triggs, B., McLauchlan, P. F., Hartley, R. I., & Fitzgibbon, A. W. (2000). Bundle adjustment—A modern synthesis. In Proceedings of the international workshop on vision algorithms (pp. 298–372).
Valmadre, J., & Lucey, S. (2012). General trajectory prior for non-rigid reconstruction. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1394–1401). doi:10.1109/CVPR.2012.6247826.
Xiao, J., Chai, J., & Kanade, T. (2004). A closed-form solution to non-rigid shape and motion recovery. In Proceedings of the European conference on computer vision (Vol. 3024, pp. 573–587). doi:10.1007/978-3-540-24673-2_46.
Xiao, J., & Kanade, T. (2005). Uncalibrated perspective reconstruction of deformable structures. In Proceedings of the IEEE international conference on computer vision (pp. 1075–1082). doi:10.1109/ICCV.2005.241.
Yang, J., & Yuan, X. (2013). Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Mathematics of Computation, 82(281), 301–329.
Zaheer, A., Akhter, I., Baig, M., Marzban, S., & Khan, S. (2011). Multiview structure from motion in trajectory space. In Proceedings of the IEEE international conference on computer vision (pp. 2447–2453). doi:10.1109/ICCV.2011.6126529.
Acknowledgments
We wish to thank the anonymous reviewers of CVPR 2012 and IJCV for their constructive comments, which greatly improved the paper. This work was funded in part by the National Natural Science Foundation of China (60736007, 61171154), and by Australian ARC Discovery and ARC Linkage Grants. The authors wish to thank Professor Richard Hartley for his guidance, support, and invaluable discussions on this work. The first author was a China Scholarship Council (CSC)-funded PhD student at ANU, under the supervision of Professor Richard Hartley and Dr. Hongdong Li. The second author also had fruitful discussions with Professor Fredrik Kahl and Dr. Yaser Sheikh specifically on this NRSfM topic. Dr. Jing Xiao kindly provided test images to the authors.
Cite this article
Dai, Y., Li, H. & He, M. A Simple Prior-Free Method for Non-rigid Structure-from-Motion Factorization. Int J Comput Vis 107, 101–122 (2014). https://doi.org/10.1007/s11263-013-0684-2
DOI: https://doi.org/10.1007/s11263-013-0684-2