A non-parametric depth modification model for registration between color and depth images

Multidimensional Systems and Signal Processing

Abstract

Although the Microsoft Kinect is the most popular depth camera in computer vision applications, it suffers from low depth accuracy. In this work, we propose a novel non-parametric depth modification model that improves the depth accuracy of the Kinect sensor by iteratively registering depth images and color images. At each iteration, we first establish a coarse correspondence based on feature descriptors of Canny edges and then estimate the fine correspondence using an \(L_2E\) algorithm. We replace the single Gaussian model with a non-parametric Gaussian mixture model and build a regularization term to constrain the correlations between functions. The depth data are then corrected and optimized based on the correspondence results. Extensive experiments verify the effectiveness of the proposed approach, and the results demonstrate that our method greatly improves the depth accuracy of the Kinect sensor compared with baseline methods.

Notes

  1. We implemented RANSAC based on the publicly available code at http://www.robots.ox.ac.uk/~vgg/hzbook/code/.

  2. We implemented CPD based on the publicly available code at http://www.bme.ogi.edu/~myron/matlab/cpd/.

  3. The code of the SC method is available at: https://vision.cornell.edu/se3/publications/.

  4. We implemented \(L_2E\)(GSM) based on the publicly available code at http://www.escience.cn/people/jiayima/cxdm.html (Ma et al. 2013b).

  5. The datasets are available at: http://rgbd-dataset.cs.washington.edu/.

  6. The code of the SC method is available at: https://vision.cornell.edu/se3/publications/.

References

  • Aydin, V. A., & Foroosh, H. (2017). A linear well-posed solution to recover high-frequency information for super resolution image reconstruction. Multidimensional Systems and Signal Processing, 2, 1–22.

  • Belongie, S., Malik, J., & Puzicha, J. (2002). Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4), 509–522.

  • Bhandari, A. K., Kumar, A., Singh, G. K., & Soni, V. (2016). Dark satellite image enhancement using knee transfer function and gamma correction based on DWT–SVD. Multidimensional Systems and Signal Processing, 27(2), 453–476.

  • Carmeli, C., Vito, E. D., & Toigo, A. (2006). Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4(4), 377–408.

  • Chen, G., & Coulombe, S. (2014). A new image registration method robust to noise. Multidimensional Systems and Signal Processing, 25(3), 601–609.

  • Dong, H., Figueroa, N., & Saddik, A. E. (2014). Towards consistent reconstructions of indoor spaces based on 6d rgb-d odometry and kinectfusion. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1796–1803)

  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.

  • Gao, Y., & Yuille, A. L. (2016). Exploiting symmetry and/or manhattan properties for 3D object structure estimation from single and multiple images. arXiv:1607.07129.

  • Gao, Y., Ma, J., & Yuille, A. L. (2017). Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples. IEEE Transactions on Image Processing, 26(5), 2545–2560.

  • Han, J., & Bhanu, B. (2007). Fusion of color and infrared video for moving human detection. Pattern Recognition, 40(6), 1771–1784.

  • Han, J., Farin, D., & With, P. H. N. D. (2011). A mixed-reality system for broadcasting sports video to mobile devices. IEEE Multimedia, 18(2), 72–84.

  • Han, J., Pauwels, E., & Zeeuw, P. D. (2012). Visible and Infrared Image Registration Employing Line-Based Geometric Analysis. Berlin: Springer.

  • Han, J., Pauwels, E. J., & De Zeeuw, P. (2013a). Visible and infrared image registration in man-made environments employing hybrid visual features. Pattern Recognition Letters, 34(1), 42–51.

  • Han, J., Shao, L., Xu, D., & Shotton, J. (2013b). Enhanced computer vision with microsoft kinect sensor: A review. IEEE Transactions on Cybernetics, 43(5), 1318–1334.

  • Herrera C., D., Kannala, J., & Heikkilä, J. (2011). Accurate and practical calibration of a depth and color camera pair. In Proceedings of the international conference on computer analysis of images and patterns (pp. 437–445).

  • Herrera C., D., Kannala, J., & Heikkilä, J. (2012). Joint depth and color camera calibration with distortion correction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10), 2058–2064.

  • Huang, X., & Zhe, C. (2002). A wavelet-based multisensor image registration algorithm. In International conference on signal processing (vol.1, pp. 773–776)

  • Jiang, J., Chen, C., Ma, J., Wang, Z., Wang, Z., & Hu, R. (2017a). Srlsp: A face image super-resolution algorithm using smooth regression with local structure prior. IEEE Transactions on Multimedia, 19(1), 27–40.

  • Jiang, J., Ma, J., Chen, C., Jiang, X., & Wang, Z. (2017b). Noise robust face image super-resolution through smooth sparse representation. IEEE Transactions on Cybernetics, 47(11), 3991–4002.

  • Jiang, J., Ma, X., Chen, C., Lu, T., Wang, Z., & Ma, J. (2017c). Single image super-resolution via locally regularized anchored neighborhood regression and nonlocal means. IEEE Transactions on Multimedia, 19(1), 15–26.

  • Liu, S., Shi, M., Zhu, Z., & Zhao, J. (2017). Image fusion based on complex-shearlet domain with guided filtering. Multidimensional Systems and Signal Processing, 28(1), 207–224.

  • Lu, T., Guan, Y., Zhang, Y., Qu, S., & Xiong, Z. (2017a). Robust and efficient face recognition via low-rank supported extreme learning machine. Multimedia Tools and Applications, 12, 1–22.

  • Lu, T., Xiong, Z., Zhang, Y., Wang, B., & Lu, T. (2017b). Robust face super-resolution via locality-constrained low-rank representation. IEEE Access, 5(99), 13103–13117.

  • Ma, J., Zhao, J., Tian, J., Bai, X., & Tu, Z. (2013a). Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognition, 46(12), 3519–3532.

  • Ma, J., Zhao, J., Tian, J., Tu, Z., & Yuille, A. L. (2013b). Robust estimation of nonrigid transformation for point set registration. In Proceedings of the IEEE conference computer vision and pattern recognition (pp. 2147–2154)

  • Ma, J., Zhao, J., Tian, J., Yuille, A. L., & Tu, Z. (2014). Robust point matching via vector field consensus. IEEE Transactions on Image Processing, 23(4), 1706–1721.

  • Ma, J., Qiu, W., Zhao, J., Ma, Y., Yuille, A. L., & Tu, Z. (2015a). Robust \(L_2E\) estimation of transformation for non-rigid registration. IEEE Transactions on Signal Processing, 63(5), 1115–1129.

  • Ma, J., Zhao, J., Ma, Y., & Tian, J. (2015b). Non-rigid visible and infrared face registration via regularized gaussian fields criterion. Pattern Recognition, 48(3), 772–784.

  • Ma, J., Zhou, H., Zhao, J., Gao, Y., Jiang, J., & Tian, J. (2015c). Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Transactions on Geoscience and Remote Sensing, 53(12), 6469–6481.

  • Ma, J., Jiang, J., Liu, C., & Li, Y. (2017a). Feature guided gaussian mixture model with semi-supervised em and local geometric constraint for retinal image registration. Information Sciences, 417, 128–142.

  • Ma, J., Zhao, J., Guo, H., Jiang, J., Zhou, H., & Gao, Y. (2017b). Locality preserving matching. In Proceedings of the international joint conference on artificial intelligence (pp. 4492–4498).

  • Ma, J., Jiang, J., Zhou, H., Zhao, J., & Guo, X. (2018). Guided locality preserving feature matching for remote sensing image registration. In IEEE transactions on geoscience and remote sensing.

  • Ma, J., Ma, Y., & Li, C. (2019). Infrared and visible image fusion methods and applications: A survey. Information Fusion, 45, 153–178.

  • Micchelli, C. A., & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17(1), 177–204.

  • Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275.

  • Peng, L., Zhang, Y., Zhou, H., & Lu, T. (2018). A robust method for estimating image geometry with local structure constraint. IEEE Access, 6, 20734–20747.

  • Poggio, T., Torre, V., & Koch, C. (1985). Computational vision and regularization theory. Nature, 317(6035), 638–643.

  • Qian, Q., & Gunturk, B. K. (2016). Extending depth of field and dynamic range from differently focused and exposed images. Multidimensional Systems and Signal Processing, 27(2), 493–509.

  • Raposo, C., Barreto, J. P., & Nunes, U. (2013). Fast and accurate calibration of a kinect sensor. In Proceedings of the international conference on 3D Vision (pp. 342–349).

  • Scott, D. W. (2001). Parametric statistical modeling by minimum integrated square error. Technometrics, 43(3), 274–285.

  • Sevcenco, I. S., Hampton, P. J., & Agathoklis, P. (2015). A wavelet based method for image reconstruction from gradient data with applications. Multidimensional Systems and Signal Processing, 26(3), 717–737.

  • Smisek, J., Jancosek, M., & Pajdla, T. (2013). 3d with kinect. In J. Smisek, M. Jancosek, & T. Pajdla (Eds.), Consumer depth cameras for computer vision (pp. 3–25). Berlin: Springer.

  • Sun, S., Liu, R., Yang, C., Zhou, H., Zhao, J., & Ma, J. (2016). Comparative study on the speckle filters for the very high-resolution polarimetric synthetic aperture radar imagery. Journal of Applied Remote Sensing, 10(4), 045014.

  • Tong, J., Zhou, J., Liu, L., Pan, Z., & Yan, H. (2012). Scanning 3d full humanbodies using kinect. IEEE Transactions on Visualization and Computer Graphics, 18(4), 643–650.

  • Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. Biometrics, 54, 674.

  • Wang, G., Wang, Z., Zhao, W., & Zhou, Q. (2014). Robust Point Matching Using Mixture of Asymmetric Gaussians for Nonrigid Transformation. Berlin: Springer.

  • Wang, Q., Li, J., Sullivan, G. J., & Sun, M. T. (2011). Reduced-complexity search for video coding geometry partitions using texture and depth data. In Proceedings of the IEEE visual communications and image processing (pp. 1–4).

  • Wang, Q., Sun, M. T., Sullivan, G. J., & Li, J. (2012). Complexity-reduced geometry partition search and high efficiency prediction for video coding. In Proceedings of the IEEE international symposium on circuits and systems (pp. 133–136)

  • Wei, Y., You, X., & Li, H. (2016). Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognition, 58, 216–226.

  • Wei, Y., Zhou, Y., & Li, H. (2017). Spectral-spatial response for hyperspectral image classification. Remote Sensing, 9(3), 203.

  • Wu, S., He, X., Lu, H., & Yuille, A. L. (2010). A unified model of short-range and long-range motion perception. In Advances in neural information processing systems (pp. 2478–2486).

  • Yang, C., Zhou, H., Sun, S., Liu, R., Zhao, J., & Ma, J. (2014). Good match exploration for infrared face recognition. Infrared Physics and Technology, 67, 111–115.

  • Yang, L., Zhang, L., & Dong, H. (2015). Evaluating and improving the depth accuracy of kinect for windows v2. Sensors, 15(8), 4275–4285.

  • Yu, Z., Zhou, H., & Li, C. (2017). Fast non-rigid image feature matching for agricultural uav via probabilistic inference with regularization techniques. Computers and Electronics in Agriculture, 143, 79–89.

  • Yuille, A., & Ullman, S. (1987). Rigidity and smoothness of motion. Cambridge: Massachusetts Institute of Technology.

  • Zhang, C., & Zhang, Z. (2011). Calibration between depth and color sensors for commodity depth cameras. In Proceedings of the IEEE conference on computer vision and pattern recognition.

  • Zhao, J., Ma, J., Tian, J., Ma, J., & Zhang, D. (2011). A robust method for vector field learning with application to mismatch removing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2977–2984).

  • Zhou, H., Zhang, D., Chen, C., & Tian, J. (2011). Discarding wide baseline mismatches with global and local transformation consistency. Electronics Letters, 47(1), 25–26.

  • Zhou, H., Ma, J., Yang, C., Sun, S., Liu, R., & Zhao, J. (2016). Non-rigid feature matching for remote sensing images via probabilistic inference with global and local regularizations. IEEE Geoscience and Remote Sensing Letters, 13(3), 374–378.

  • Zhou, Y., & Wei, Y. (2016). Learning hierarchical spectral-spatial features for hyperspectral image classification. IEEE Transactions on Cybernetics, 46(7), 1667–1678.

Author information

Corresponding author

Correspondence to Yanduo Zhang.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 41501505, 61773295, 61503288, 61501413, 61502354).

Appendix

1.1 Proof of Eq. (3)

We have \(F(\epsilon ) = \sum _{k = 1}^{4}w_k \phi (\epsilon |0,\sigma _{k}^2)\), where \(\phi (\epsilon |0,\sigma _k^2) = \frac{1}{\sqrt{2\pi }\sigma _k}\exp \left( -\frac{\epsilon ^2}{2\sigma _k^2}\right) \) and \(w_k\) denotes the probability that a match point belongs to the k-th match function. Let \(\phi _k = \phi (\epsilon |0,\sigma _k^2)\) for \(k = 1,\ldots ,4\), and define \(M = \int (F(\epsilon ))^2d\epsilon \). Then

$$\begin{aligned} M&=\int \left( \sum _{k = 1}^{4}w_k \phi (\epsilon |0,\sigma _k^2)\right) ^2d\epsilon \nonumber \\&= \sum _{k = 1}^{4}w_k^2\int \phi _k^2d\epsilon + 2\sum _{1\le i<j\le 4}w_iw_j\int \phi _i\phi _jd\epsilon \nonumber \\&=\sum _{k=1}^4\frac{w_k^2}{2^d(\pi \sigma _k)^{\frac{d}{2}}} + 2\sum _{1\le i<j\le 4}w_iw_j\int \phi _i\phi _jd\epsilon . \end{aligned}$$
(12)

According to Wand and Jones (1994), \(\int \phi _i\phi _jd\epsilon = \phi (0|0,\sigma _i^2+\sigma _j^2)\), so that

$$\begin{aligned} M&=\sum _{k=1}^4\frac{w_k^2}{2^d(\pi \sigma _k)^{\frac{d}{2}}} +2\sum _{1\le i<j\le 4}w_iw_j \phi (0|0,\sigma _i^2+\sigma _j^2)\nonumber \\&=\sum _{k=1}^4\frac{w_k^2}{2^d(\pi \sigma _k)^{\frac{d}{2}}} + A, \end{aligned}$$
(13)

where A collects the cross terms; it is treated as a constant and omitted. Up to this constant, we obtain the following formulation:

$$\begin{aligned} \int (F(\epsilon ))^2d\epsilon&=\sum _{k=1}^4\frac{w_k^2}{2^d(\pi \sigma _k)^{\frac{d}{2}}}. \end{aligned}$$
(14)
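
The Gaussian product identity used for the cross terms can be checked numerically. The following is a minimal sketch for the scalar case (d = 1) only; the function names, the integration grid, and the example weights and variances are illustrative assumptions, not values from the paper. It compares a numerical integral of \(\phi _i\phi _j\) with the closed form \(\phi (0|0,\sigma _i^2+\sigma _j^2)\), and evaluates the full integral of \(F^2\) (diagonal and cross terms together) in closed form.

```python
import numpy as np

def gauss(x, var):
    """Zero-mean 1-D Gaussian density with variance `var`, evaluated at x."""
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def product_integral(var_i, var_j, half_width=60.0, n=400001):
    """Numerically integrate phi_i * phi_j on a wide symmetric grid (assumed grid)."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    return float(np.sum(gauss(x, var_i) * gauss(x, var_j)) * dx)

def closed_form(var_i, var_j):
    """phi(0 | 0, var_i + var_j): closed form of the integral of phi_i * phi_j."""
    return float(gauss(0.0, var_i + var_j))

def mixture_square_integral(weights, variances):
    """Closed-form integral of F^2 for F = sum_k w_k phi(. | 0, var_k), with d = 1.

    Sums w_i * w_j * phi(0 | 0, var_i + var_j) over all ordered pairs (i, j),
    so the diagonal (i == j) and cross terms are both included.
    """
    w = np.asarray(weights, dtype=float)
    v = np.asarray(variances, dtype=float)
    pair_var = v[:, None] + v[None, :]  # var_i + var_j for every pair
    return float(np.sum(np.outer(w, w) * gauss(0.0, pair_var)))

if __name__ == "__main__":
    # Hypothetical weights and variances for a four-component mixture.
    w = [0.4, 0.3, 0.2, 0.1]
    v = [0.25, 1.0, 4.0, 9.0]
    print(product_integral(v[0], v[2]), closed_form(v[0], v[2]))
    print(mixture_square_integral(w, v))
```

In this scalar setting the diagonal terms reduce to \(w_k^2\,\phi (0|0,2\sigma _k^2)\), so the same closed form covers both parts of the sum.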

1.2 Vector-valued reproducing kernel Hilbert space

We review the basic theory of vector-valued reproducing kernel Hilbert spaces (RKHS) here; for further details and references, see Carmeli et al. (2006) and Micchelli and Pontil (2005).

Let X be a set, for example \(X \subseteq {\mathbb {R}}^P\), Y a real Hilbert space with inner product (norm) \(\langle \cdot ,\cdot \rangle \) (\(\Vert \cdot \Vert \)), for example \(Y \subseteq {\mathbb {R}}^D\), and H a Hilbert space with inner product (norm) \(\langle \cdot ,\cdot \rangle _H\) (\(\Vert \cdot \Vert _H\)), where \(P=D=2\) or 3 for the point matching problem. Note that a norm can be induced by an inner product, i.e., \(\Vert f\Vert _H = \sqrt{\langle f,f \rangle _H}\) for all \(f \in H\). Recall that a Hilbert space is a real or complex inner product space that is also a complete metric space with respect to the distance function induced by the inner product. A vector-valued RKHS can then be defined as follows.

Definition 1

A Hilbert space H of functions \(f:X \rightarrow Y\) is an RKHS if the evaluation maps \(ev_x:H \rightarrow Y\) (i.e., \(ev_x(f)= f(x)\)) are bounded; that is, if for every \(x \in X\) there exists a positive constant \(C_x\) such that

$$\begin{aligned} \Vert ev_x(f)\Vert = \Vert f(x)\Vert \le C_x\Vert f\Vert _H, \quad \forall f \in H. \end{aligned}$$
(15)

A reproducing kernel \(\varGamma :X \times X \rightarrow B(Y)\) is then defined as \(\varGamma (x,x'):=ev_xev_{x'}^*\), where B(Y) is the Banach space of bounded linear operators on Y (to which \(\varGamma (x,x')\) belongs for all \(x,x' \in X\)), for example \(B(Y) \subseteq {\mathbb {R}}^{D\times D}\), and \(ev_x^*\) is the adjoint of \(ev_x\). The RKHS and its kernel have the following two properties.

Remark 1

The kernel \(\varGamma \) reproduces the value of a function \(f\in H\) at a point \(x \in X\). Indeed, for any \(x \in X\) and \(y \in Y\), we have \(ev_x^*y=\varGamma (\cdot ,x)y\), so that \(\langle f(x),y \rangle = \langle f,\varGamma (\cdot ,x)y\rangle _H\).

Remark 2

An RKHS defines a corresponding reproducing kernel. Conversely, a reproducing kernel defines a unique RKHS.

More specifically, for any \(M \in {\mathbb {N}}, \lbrace x_i \rbrace _{i=1}^M \subseteq X\), and a reproducing kernel \(\varGamma \), a unique RKHS can be defined by considering the completion of the space:

$$\begin{aligned} H_M=\left\{ \sum _{i=1}^M\varGamma (\cdot ,x_i)c_i:c_i \in Y\right\} , \end{aligned}$$
(16)

with respect to the norm induced by the inner product

$$\begin{aligned} \langle f,g \rangle _H=\sum _{i,j=1}^M \langle \varGamma (x_j,x_i)c_i,d_j\rangle , \quad \forall f,g \in H_M, \end{aligned}$$
(17)

where \(f = \sum _{i=1}^M \varGamma (\cdot ,x_i)c_i\) and \(g = \sum _{j=1}^M \varGamma (\cdot ,x_j)d_j\).
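
The finite expansion in Eq. (16) and the inner product in Eq. (17) can be illustrated with a small numerical sketch. It assumes a diagonal Gaussian kernel \(\varGamma (x,x') = e^{-\beta \Vert x-x'\Vert ^2} I\); this kernel, the parameter name `beta`, and all function names below are assumptions made for illustration and are not specified in this appendix. The sketch evaluates \(f(x)=\sum _i \varGamma (x,x_i)c_i\) and computes \(\langle f,g\rangle _H\) as a trace.

```python
import numpy as np

def gaussian_kernel(x, x_prime, beta=0.5):
    """Scalar Gaussian weight; the assumed operator kernel is this value times I."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-beta * np.dot(diff, diff)))

def gram_matrix(points, beta=0.5):
    """M x M matrix of scalar kernel values between all pairs of points x_i."""
    pts = np.asarray(points, dtype=float)
    sq_dist = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    return np.exp(-beta * sq_dist)

def evaluate(x, points, coeffs, beta=0.5):
    """f(x) = sum_i Gamma(x, x_i) c_i for the diagonal kernel assumed above."""
    k = np.array([gaussian_kernel(x, p, beta) for p in points])
    return k @ np.asarray(coeffs, dtype=float)

def inner_product(points, c, d, beta=0.5):
    """<f, g>_H = sum_{i,j} <Gamma(x_j, x_i) c_i, d_j> = trace(D^T K C)."""
    K = gram_matrix(points, beta)
    C = np.asarray(c, dtype=float)
    D = np.asarray(d, dtype=float)
    return float(np.trace(D.T @ K @ C))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(5, 2))   # hypothetical points x_i, with P = 2
    C = rng.normal(size=(5, 2))     # hypothetical coefficients c_i, with D = 2
    y = rng.normal(size=2)
    m = 3
    # g = Gamma(., x_m) y has coefficient y at index m and zero elsewhere.
    D = np.zeros_like(C)
    D[m] = y
    print(inner_product(pts, C, D), evaluate(pts[m], pts, C) @ y)
```

Choosing \(g=\varGamma (\cdot ,x_m)y\) makes the two printed values agree, which is the reproducing property of Remark 1 at the point \(x_m\). For this particular kernel \(\varGamma (x,x)=I\), so the bound in Eq. (15) holds with \(C_x = 1\).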

Cite this article

Peng, L., Zhang, Y., Zhou, H. et al. A non-parametric depth modification model for registration between color and depth images. Multidim Syst Sign Process 30, 1129–1148 (2019). https://doi.org/10.1007/s11045-018-0599-8

  • DOI: https://doi.org/10.1007/s11045-018-0599-8
