Abstract
Despite being the most popular depth camera in computer vision applications, the Microsoft Kinect sensor suffers from low depth accuracy. In this work we propose a novel non-parametric depth modification model that improves the depth accuracy of the Kinect sensor by iteratively registering depth images and color images. In particular, at each iteration we first establish a coarse correspondence based on feature descriptors of Canny edges, and then estimate the fine correspondence using an \(L_2E\) algorithm. We use a non-parametric Gaussian mixture model in place of the single Gaussian model and build a regularization term to constrain the correlations between functions. Then, based on the correspondence results, the depth data are corrected and optimized. Extensive experiments verify the effectiveness of the proposed approach, and the results demonstrate that our method greatly enhances the depth accuracy of the Kinect sensor compared with baseline methods.
Notes
We implemented RANSAC based on the publicly available code at http://www.robots.ox.ac.uk/~vgg/hzbook/code/.
We implemented CPD based on the publicly available code at http://www.bme.ogi.edu/~myron/matlab/cpd/.
The code for the SC method is available at: https://vision.cornell.edu/se3/publications/.
We implemented \(L_2E\)(GSM) based on the publicly available code at http://www.escience.cn/people/jiayima/cxdm.html (Ma et al. 2013b).
The datasets are available at: http://rgbd-dataset.cs.washington.edu/.
References
Aydin, V. A., & Foroosh, H. (2017). A linear well-posed solution to recover high-frequency information for super resolution image reconstruction. Multidimensional Systems and Signal Processing, 2, 1–22.
Belongie, S., Malik, J., & Puzicha, J. (2002). Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 509–522.
Bhandari, A. K., Kumar, A., Singh, G. K., & Soni, V. (2016). Dark satellite image enhancement using knee transfer function and gamma correction based on DWT–SVD. Multidimensional Systems and Signal Processing, 27(2), 453–476.
Carmeli, C., Vito, E. D., & Toigo, A. (2006). Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 10(4), 377–408.
Chen, G., & Coulombe, S. (2014). A new image registration method robust to noise. Multidimensional Systems and Signal Processing, 25(3), 601–609.
Dong, H., Figueroa, N., & Saddik, A. E. (2014). Towards consistent reconstructions of indoor spaces based on 6D RGB-D odometry and KinectFusion. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1796–1803).
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Gao, Y., & Yuille, A. L. (2016). Exploiting symmetry and/or manhattan properties for 3D object structure estimation from single and multiple images. arXiv:1607.07129.
Gao, Y., Ma, J., & Yuille, A. L. (2017). Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples. IEEE Transactions on Image Processing, 26(5), 2545–2560.
Han, J., & Bhanu, B. (2007). Fusion of color and infrared video for moving human detection. Pattern Recognition, 40(6), 1771–1784.
Han, J., Farin, D., & With, P. H. N. D. (2011). A mixed-reality system for broadcasting sports video to mobile devices. IEEE Multimedia, 18(2), 72–84.
Han, J., Pauwels, E., & Zeeuw, P. D. (2012). Visible and Infrared Image Registration Employing Line-Based Geometric Analysis. Berlin: Springer.
Han, J., Pauwels, E. J., & De Zeeuw, P. (2013a). Visible and infrared image registration in man-made environments employing hybrid visual features. Pattern Recognition Letters, 34(1), 42–51.
Han, J., Shao, L., Xu, D., & Shotton, J. (2013b). Enhanced computer vision with microsoft kinect sensor: A review. IEEE Transactions on Cybernetics, 43(5), 1318–1334.
Herrera, C. D., Kannala, C. J., & Heikkilä, J. (2011). Accurate and practical calibration of a depth and color camera pair. In Proceedings of the international conference computer analysis of images and patterns (pp. 437–445)
Herrera, C. D., Kannala, C. J., & Heikkilä, J. (2012). Joint depth and color camera calibration with distortion correction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10), 2058–2064.
Huang, X., & Zhe, C. (2002). A wavelet-based multisensor image registration algorithm. In International conference on signal processing (vol.1, pp. 773–776)
Jiang, J., Chen, C., Ma, J., Wang, Z., Wang, Z., & Hu, R. (2017a). Srlsp: A face image super-resolution algorithm using smooth regression with local structure prior. IEEE Transactions on Multimedia, 19(1), 27–40.
Jiang, J., Ma, J., Chen, C., Jiang, X., & Wang, Z. (2017b). Noise robust face image super-resolution through smooth sparse representation. IEEE Transactions on Cybernetics, 47(11), 3991–4002.
Jiang, J., Ma, X., Chen, C., Lu, T., Wang, Z., & Ma, J. (2017c). Single image super-resolution via locally regularized anchored neighborhood regression and nonlocal means. IEEE Transactions on Multimedia, 19(1), 15–26.
Liu, S., Shi, M., Zhu, Z., & Zhao, J. (2017). Image fusion based on complex-shearlet domain with guided filtering. Multidimensional Systems and Signal Processing, 28(1), 207–224.
Lu, T., Guan, Y., Zhang, Y., Qu, S., & Xiong, Z. (2017a). Robust and efficient face recognition via low-rank supported extreme learning machine. Multimedia Tools and Applications, 12, 1–22.
Lu, T., Xiong, Z., Zhang, Y., Wang, B., & Lu, T. (2017b). Robust face super-resolution via locality-constrained low-rank representation. IEEE Access, 5(99), 13103–13117.
Ma, J., Zhao, J., Tian, J., Bai, X., & Tu, Z. (2013a). Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognition, 46(12), 3519–3532.
Ma, J., Zhao, J., Tian, J., Tu, Z., & Yuille, A. L. (2013b). Robust estimation of nonrigid transformation for point set registration. In Proceedings of the IEEE conference computer vision and pattern recognition (pp. 2147–2154)
Ma, J., Zhao, J., Tian, J., Yuille, A. L., & Tu, Z. (2014). Robust point matching via vector field consensus. IEEE Transactions on Image Processing, 23(4), 1706–1721.
Ma, J., Qiu, W., Zhao, J., Ma, Y., Yuille, A. L., & Tu, Z. (2015a). Robust \(L_2E\) estimation of transformation for non-rigid registration. IEEE Transactions on Signal Processing, 63(5), 1115–1129.
Ma, J., Zhao, J., Ma, Y., & Tian, J. (2015b). Non-rigid visible and infrared face registration via regularized gaussian fields criterion. Pattern Recognition, 48(3), 772–784.
Ma, J., Zhou, H., Zhao, J., Gao, Y., Jiang, J., & Tian, J. (2015c). Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Transactions on Geoscience and Remote Sensing, 53(12), 6469–6481.
Ma, J., Jiang, J., Liu, C., & Li, Y. (2017a). Feature guided gaussian mixture model with semi-supervised em and local geometric constraint for retinal image registration. Information Sciences, 417, 128–142.
Ma, J., Zhao, J., Guo, H., Jiang, J., Zhou, H., & Gao, Y. (2017b). Locality preserving matching. In Proceedings of the international joint conference on artificial intelligence (pp. 4492–4498).
Ma, J., Jiang, J., Zhou, H., Zhao, J., & Guo, X. (2018). Guided locality preserving feature matching for remote sensing image registration. In IEEE transactions on geoscience and remote sensing.
Ma, J., Ma, Y., & Li, C. (2019). Infrared and visible image fusion methods and applications: A survey. Information Fusion, 45, 153–178.
Micchelli, C. A., & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17(1), 177–204.
Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275.
Peng, L., Zhang, Y., Zhou, H., & Lu, T. (2018). A robust method for estimating image geometry with local structure constraint. IEEE Access, 6, 20734–20747.
Poggio, T., Torre, V., & Koch, C. (1985). Computational vision and regularization theory. Nature, 317(6035), 638–643.
Qian, Q., & Gunturk, B. K. (2016). Extending depth of field and dynamic range from differently focused and exposed images. Multidimensional Systems and Signal Processing, 27(2), 493–509.
Raposo, C., Barreto, J. P., & Nunes, U. (2013). Fast and accurate calibration of a Kinect sensor. In Proceedings of the international conference on 3D Vision (pp. 342–349).
Scott, D. W. (2001). Parametric statistical modeling by minimum integrated square error. Technometrics, 43(3), 274–285.
Sevcenco, I. S., Hampton, P. J., & Agathoklis, P. (2015). A wavelet based method for image reconstruction from gradient data with applications. Multidimensional Systems and Signal Processing, 26(3), 717–737.
Smisek, J., Jancosek, M., & Pajdla, T. (2013). 3d with kinect. In J. Smisek, M. Jancosek, & T. Pajdla (Eds.), Consumer depth cameras for computer vision (pp. 3–25). Berlin: Springer.
Sun, S., Liu, R., Yang, C., Zhou, H., Zhao, J., & Ma, J. (2016). Comparative study on the speckle filters for the very high-resolution polarimetric synthetic aperture radar imagery. Journal of Applied Remote Sensing, 10(4), 045014.
Tong, J., Zhou, J., Liu, L., Pan, Z., & Yan, H. (2012). Scanning 3D full human bodies using Kinect. IEEE Transactions on Visualization and Computer Graphics, 18(4), 643–650.
Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. Biometrics, 54, 674.
Wang, G., Wang, Z., Zhao, W., & Zhou, Q. (2014). Robust Point Matching Using Mixture of Asymmetric Gaussians for Nonrigid Transformation. Berlin: Springer.
Wang, Q., Li, J., Sullivan, G. J., & Sun, M. T. (2011). Reduced-complexity search for video coding geometry partitions using texture and depth data. In Proceedings of the IEEE visual communications and image processing (pp. 1–4).
Wang, Q., Sun, M. T., Sullivan, G. J., & Li, J. (2012). Complexity-reduced geometry partition search and high efficiency prediction for video coding. In Proceedings of the IEEE international symposium on circuits and systems (pp. 133–136)
Wei, Y., You, X., & Li, H. (2016). Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognition, 58, 216–226.
Wei, Y., Zhou, Y., & Li, H. (2017). Spectral-spatial response for hyperspectral image classification. Remote Sensing, 9(3), 203.
Wu, S., He, X., Lu, H., & Yuille, A. L. (2010). A unified model of short-range and long-range motion perception. In Advances in neural information processing systems (pp. 2478–2486).
Yang, C., Zhou, H., Sun, S., Liu, R., Zhao, J., & Ma, J. (2014). Good match exploration for infrared face recognition. Infrared Physics and Technology, 67, 111–115.
Yang, L., Zhang, L., & Dong, H. (2015). Evaluating and improving the depth accuracy of kinect for windows v2. Sensors, 15(8), 4275–4285.
Yu, Z., Zhou, H., & Li, C. (2017). Fast non-rigid image feature matching for agricultural uav via probabilistic inference with regularization techniques. Computers and Electronics in Agriculture, 143, 79–89.
Yuille, A., & Ullman, S. (1987). Rigidity and smoothness of motion. Cambridge: Massachusetts Institute of Technology.
Zhang, C., & Zhang, Z. (2011). Calibration between depth and color sensors for commodity depth cameras. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Zhao, J., Ma, J., Tian, J., Ma, J., & Zhang, D. (2011). A robust method for vector field learning with application to mismatch removing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2977–2984).
Zhou, H., Zhang, D., Chen, C., & Tian, J. (2011). Discarding wide baseline mismatches with global and local transformation consistency. Electronics Letters, 47(1), 25–26.
Zhou, H., Ma, J., Yang, C., Sun, S., Liu, R., & Zhao, J. (2016). Non-rigid feature matching for remote sensing images via probabilistic inference with global and local regularizations. IEEE Geoscience and Remote Sensing Letters, 13(3), 374–378.
Zhou, Y., & Wei, Y. (2016). Learning hierarchical spectral-spatial features for hyperspectral image classification. IEEE Transactions on Cybernetics, 46(7), 1667–1678.
Additional information
This work was supported by the National Natural Science Foundation of China (Nos. 41501505, 61773295, 61503288, 61501413, 61502354).
Appendix
1.1 Proof of Eq. (3)
We have \(F(\epsilon ) = \sum _{k = 1}^{4}w_k \phi (\epsilon |0,\sigma _{k}^2)\), where \(\phi (\epsilon |0,\sigma _k^2) = \frac{1}{\sqrt{2\pi }\sigma _k}e^{-\frac{\epsilon ^2}{2\sigma _k^2}}\), and \(w_k\) denotes the probability of a match point belonging to a match function. Let \(\phi _k = \phi (\epsilon |0,\sigma _k^2)\) for \(k = 1,\ldots ,4\). We define \(M = \int (F(\epsilon ))^2d\epsilon \), and then
According to Wand and Jones (1994), we can get the following formulation:
where A is a constant, and we can omit it. Then we can get the following formulation:
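As a hedged numerical sanity check of the expansion behind this proof, the sketch below assumes the standard Gaussian product identity \(\int \phi (\epsilon |0,\sigma _k^2)\phi (\epsilon |0,\sigma _l^2)d\epsilon = \phi (0|0,\sigma _k^2+\sigma _l^2)\), under which \(M = \sum _{k}\sum _{l} w_k w_l \phi (0|0,\sigma _k^2+\sigma _l^2)\). The weights, standard deviations, and function names are hypothetical values chosen only for illustration; they are not taken from the paper. The code compares this closed form with a direct Riemann-sum approximation of \(\int (F(\epsilon ))^2d\epsilon \).

```python
import math

def gauss(eps, var):
    """Zero-mean Gaussian density phi(eps | 0, var)."""
    return math.exp(-eps * eps / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_sq_integral_closed(weights, sigmas):
    """Closed form of M = int F(eps)^2 d(eps) via the Gaussian product identity."""
    return sum(
        wk * wl * gauss(0.0, sk ** 2 + sl ** 2)
        for wk, sk in zip(weights, sigmas)
        for wl, sl in zip(weights, sigmas)
    )

def mixture_sq_integral_numeric(weights, sigmas, half_width=60.0, n=120001):
    """Riemann-sum approximation of the same integral on [-half_width, half_width]."""
    dx = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        eps = -half_width + i * dx
        f = sum(w * gauss(eps, s ** 2) for w, s in zip(weights, sigmas))
        total += f * f
    return total * dx

# Hypothetical four-component mixture (assumed values, not from the paper).
weights = [0.4, 0.3, 0.2, 0.1]
sigmas = [0.5, 1.0, 2.0, 4.0]

closed = mixture_sq_integral_closed(weights, sigmas)
numeric = mixture_sq_integral_numeric(weights, sigmas)
print(f"closed form: {closed:.10f}")
print(f"numeric:     {numeric:.10f}")
```

The two printed values should agree closely, which is what allows the squared-mixture integral in an \(L_2E\)-style criterion to be evaluated without numerical integration.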
1.2 Vector-valued reproducing Kernel Hilbert space
We review the basic theory of vector-valued reproducing kernel Hilbert spaces here; for further details and references, see Carmeli et al. (2006) and Micchelli and Pontil (2005).
Let X be a set, e.g., \(X \subseteq {\mathbb {R}}^P\), Y a real Hilbert space with inner product (norm) \(\langle \cdot ,\cdot \rangle \) (\(\Vert \cdot \Vert \)), e.g., \(Y \subseteq {\mathbb {R}}^D\), and H a Hilbert space with inner product (norm) \(\langle \cdot ,\cdot \rangle _H\) (\(\Vert \cdot \Vert _H\)), where \(P=D=2\) or 3 for the point matching problem. Note that a norm can be induced by an inner product, i.e., \(\forall f \in H, \Vert f\Vert _H = \sqrt{\langle f,f \rangle _H}\). A Hilbert space is a real or complex inner product space that is also a complete metric space with respect to the distance function induced by the inner product. A vector-valued RKHS can then be defined as follows.
Definition 1
A Hilbert space H is an RKHS if the evaluation maps \(ev_x:H \rightarrow Y\) (i.e., \(ev_x(f)= f(x)\)) are bounded; that is, for every \(x \in X\) there exists a positive constant \(C_x\) such that
\[ \Vert ev_x(f)\Vert = \Vert f(x)\Vert \le C_x\Vert f\Vert _H, \quad \forall f \in H. \]
A reproducing kernel \(\varGamma :X \times X \rightarrow B(Y)\) is then defined as \(\varGamma (x,x'):=ev_xev_{x'}^*\), where B(Y) is the Banach space of bounded linear operators on Y (to which \(\varGamma (x,x')\) belongs for all \(x,x' \in X\)), e.g., \(B(Y) \subseteq {\mathbb {R}}^{D\times D}\), and \(ev_x^*\) is the adjoint of \(ev_x\). We have the following two properties of the RKHS and its kernel.
Remark 1
The kernel \(\varGamma \) reproduces the value of a function \(f\in H\) at a point \(x \in X\). Indeed, for any \(x \in X\) and \(y \in Y\), we have \(ev_x^*y=\varGamma (\cdot ,x)y\), so that \(\langle f(x),y \rangle = \langle f,\varGamma (\cdot ,x)y\rangle _H\).
Remark 2
An RKHS defines a corresponding reproducing kernel. Conversely, a reproducing kernel defines a unique RKHS.
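More specifically, for any \(M \in {\mathbb {N}}\), \(\lbrace x_i \rbrace _{i=1}^M \subseteq X\), and a reproducing kernel \(\varGamma \), a unique RKHS can be defined by considering the completion of the space
\[ H_M = \left\{ \sum _{i=1}^M \varGamma (\cdot ,x_i)c_i : c_i \in Y \right\} \]
with respect to the norm induced by the inner product
\[ \langle f,g \rangle _H = \sum _{i,j=1}^M \langle \varGamma (x_j,x_i)c_i,d_j \rangle , \quad \forall f,g \in H_M, \]
where \(f = \sum _{i=1}^M \varGamma (\cdot ,x_i)c_i\) and \(g = \sum _{j=1}^M \varGamma (\cdot ,x_j)d_j\).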
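To make the reproducing property of Remark 1 concrete, the following minimal sketch assumes a diagonal Gaussian kernel \(\varGamma (x,x') = e^{-\beta \Vert x-x'\Vert ^2}I\) and the standard inner product \(\langle f,g \rangle _H = \sum _{i,j} \langle \varGamma (x_j,x_i)c_i,d_j \rangle \) for functions of the form given above. The kernel choice, the value of \(\beta \), the sample points, and all function names are assumptions made for illustration and are not taken from the paper. The code checks numerically that \(\langle f(x),y \rangle = \langle f,\varGamma (\cdot ,x)y\rangle _H\) and evaluates \(\Vert f\Vert _H^2\).

```python
import math

BETA = 0.5  # hypothetical kernel width parameter (assumption)

def kernel(x, xp):
    """Scalar factor k(x, x') of the assumed kernel Gamma(x, x') = k(x, x') * I."""
    return math.exp(-BETA * sum((a - b) ** 2 for a, b in zip(x, xp)))

def dot(u, v):
    """Euclidean inner product on Y = R^D."""
    return sum(a * b for a, b in zip(u, v))

def evaluate(points, coeffs, x):
    """f(x) = sum_i Gamma(x, x_i) c_i for the diagonal kernel."""
    out = [0.0] * len(coeffs[0])
    for xi, ci in zip(points, coeffs):
        k = kernel(x, xi)
        for d, c in enumerate(ci):
            out[d] += k * c
    return out

def inner_h(points_f, coeffs_f, points_g, coeffs_g):
    """<f, g>_H = sum_i sum_j <Gamma(x_j, x_i) c_i, d_j>."""
    return sum(
        kernel(xj, xi) * dot(ci, dj)
        for xi, ci in zip(points_f, coeffs_f)
        for xj, dj in zip(points_g, coeffs_g)
    )

# Hypothetical 2D control points x_i and coefficients c_i (assumed values).
points = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0)]
coeffs = [(1.0, -1.0), (0.5, 2.0), (-2.0, 0.3)]
x, y = (0.3, 0.7), (1.5, -0.4)

lhs = dot(evaluate(points, coeffs, x), y)          # <f(x), y>
rhs = inner_h(points, coeffs, [x], [y])            # <f, Gamma(., x) y>_H
norm_sq = inner_h(points, coeffs, points, coeffs)  # ||f||_H^2
print(lhs, rhs, norm_sq)
```

Here `inner_h` accepts two different point sets so that the single-term function \(\varGamma (\cdot ,x)y\) can be paired with f; when both arguments share the same points it reduces to the inner product on the span described above.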
Cite this article
Peng, L., Zhang, Y., Zhou, H. et al. A non-parametric depth modification model for registration between color and depth images. Multidim Syst Sign Process 30, 1129–1148 (2019). https://doi.org/10.1007/s11045-018-0599-8
DOI: https://doi.org/10.1007/s11045-018-0599-8