
A Robust Monocular 3D Object Tracking Method Combining Statistical and Photometric Constraints


Abstract

Both region-based methods and direct methods have become popular in recent years for tracking the 6-dof pose of an object from monocular video sequences. Region-based methods estimate the pose of the object by maximizing the discrimination between statistical foreground and background appearance models, while direct methods aim to minimize the photometric error through direct image alignment. In practice, region-based methods consider only the pixels within a narrow band around the object contour, owing to their level-set-based probabilistic formulation, leaving the foreground pixels beyond the evaluation band unused. Direct methods, on the other hand, utilize only the raw pixel information of the object and ignore the statistical properties of the foreground and background regions. In this paper, we show that it is beneficial to combine these two kinds of methods. We construct a new probabilistic formulation for 3D object tracking that combines statistical constraints from region-based methods with photometric constraints from direct methods, thereby exploiting both the statistical properties and the raw pixel values of the image in a complementary manner. Moreover, to achieve better performance when tracking heterogeneous objects in complex scenes, we propose to increase the distinctiveness of the foreground and background statistical models by partitioning the global foreground and background regions into a small number of sub-regions around the object contour. We demonstrate the effectiveness of the proposed strategies on a newly constructed real-world dataset containing different types of objects with ground-truth poses. Further experiments on several challenging public datasets show that our method obtains tracking results that are competitive with, or superior to, previous works. Compared with a recent state-of-the-art region-based method, the proposed hybrid method proves more stable under silhouette pose ambiguities, at the cost of slightly lower tracking accuracy.
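As a rough illustration of the hybrid formulation summarized above, the sketch below combines a region-based statistical term with a direct photometric term into a single energy. This is a minimal toy under assumed inputs, not the authors' implementation; the names (`he_phi`, `p_f`, `p_b`, `i_cur`, `i_ref`) and the simple weighted sum with `lam` are hypothetical.

```python
# A minimal sketch of a hybrid statistical + photometric energy.
# All names and the weighting scheme are illustrative assumptions.
import numpy as np

def hybrid_energy(he_phi, p_f, p_b, i_cur, i_ref, lam=1.0):
    """he_phi : smoothed Heaviside of the level-set value, per pixel
    p_f, p_b : foreground/background appearance probabilities, per pixel
    i_cur    : observed intensities; i_ref : intensities rendered at the
               current pose estimate (both per pixel)
    lam      : hypothetical weight balancing the two constraint types"""
    # Statistical (region-based) term: negative log-likelihood of the
    # pixel-wise foreground/background mixture.
    e_stat = -np.log(he_phi * p_f + (1.0 - he_phi) * p_b)
    # Photometric (direct) term: squared intensity residual.
    e_photo = (i_cur - i_ref) ** 2
    return e_stat.sum() + lam * e_photo.sum()
```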



Acknowledgements

This work was partly supported by the National Natural Science Foundation of China under Grant U1533132.

Author information


Correspondence to Leisheng Zhong.

Additional information

Communicated by M. Hebert.


Appendix A: The Hessian Approximation

Here we give a mathematical explanation of the Hessian approximation [Eq. (19)] used for the statistical term in our energy function:

$$E_{fb} = - \sum\limits_{i = 1}^{n} \sum\limits_{\mathbf{x} \in \varOmega_i} \log \left( H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i} \right)$$
(24)

We first consider the statistical energy for a single pixel \(\mathbf{x}\):

$$E_{fb}(\mathbf{x}) = - \log \left( H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i} \right)$$
(25)

The Jacobian for pixel \(\mathbf{x}\):

$$\mathbf{J}_{fb}(\mathbf{x}) = \frac{\partial E_{fb}(\mathbf{x})}{\partial \mathbf{p}} = - \frac{P_{f_i} - P_{b_i}}{H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i}} \, \frac{\partial H_e\left( \varPhi(\mathbf{x}) \right)}{\partial \mathbf{p}}$$
(26)

Here \(\mathbf{J}_{fb}(\mathbf{x}) \in \mathbf{R}^6\), corresponding to the 6-dof pose parameter \(\mathbf{p}\). Its m-th element can be written as:

$$\left[ \mathbf{J}_{fb}(\mathbf{x}) \right]_m = \frac{\partial E_{fb}(\mathbf{x})}{\partial p_m} = - \frac{P_{f_i} - P_{b_i}}{H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i}} \, \frac{\partial H_e\left( \varPhi(\mathbf{x}) \right)}{\partial p_m}$$
(27)
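To make the chain rule of Eqs. (26) and (27) concrete, the following self-contained sketch checks the analytic per-pixel Jacobian against finite differences. The sigmoid `He` and the constants `P_F`, `P_B` are hypothetical stand-ins for the smoothed Heaviside of the projected level set and the appearance probabilities; only the algebraic structure carries over.

```python
# Numerical check of the per-pixel Jacobian, Eqs. (25)-(27).
import numpy as np

P_F, P_B = 0.7, 0.2  # hypothetical appearance probabilities at one pixel

def He(p):
    # Hypothetical smooth stand-in for He(Phi(x)) as a function of the
    # 6-dof pose vector p (in the paper this comes from the level set).
    return 1.0 / (1.0 + np.exp(-p @ np.arange(1.0, 7.0)))

def E_fb(p):
    # Per-pixel statistical energy, Eq. (25).
    return -np.log(He(p) * P_F + (1.0 - He(p)) * P_B)

def J_fb(p, eps=1e-6):
    # Analytic prefactor of Eq. (26); dHe/dp by central differences.
    d_he = np.array([(He(p + eps * e) - He(p - eps * e)) / (2 * eps)
                     for e in np.eye(6)])
    return -(P_F - P_B) / (He(p) * P_F + (1.0 - He(p)) * P_B) * d_he

p0 = 0.1 * np.ones(6)
num = np.array([(E_fb(p0 + 1e-6 * e) - E_fb(p0 - 1e-6 * e)) / 2e-6
                for e in np.eye(6)])
assert np.allclose(J_fb(p0), num, atol=1e-5)  # Eq. (27), element-wise
```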

The Hessian for pixel \(\mathbf{x}\) is \(\mathbf{H}_{fb}(\mathbf{x}) \in \mathbf{R}^{6 \times 6}\), and its (m, n)-th element can be written as:

$$\begin{aligned} \left[ \mathbf{H}_{fb}(\mathbf{x}) \right]_{m,n} &= \frac{\partial^2 E_{fb}(\mathbf{x})}{\partial p_m \partial p_n} \\ &= \frac{\partial}{\partial p_n} \left[ - \frac{P_{f_i} - P_{b_i}}{H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i}} \frac{\partial H_e\left( \varPhi(\mathbf{x}) \right)}{\partial p_m} \right] \\ &= \frac{\left( P_{f_i} - P_{b_i} \right)^2}{\left[ H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i} \right]^2} \frac{\partial H_e\left( \varPhi(\mathbf{x}) \right)}{\partial p_m} \frac{\partial H_e\left( \varPhi(\mathbf{x}) \right)}{\partial p_n} \\ &\quad - \frac{P_{f_i} - P_{b_i}}{H_e\left( \varPhi(\mathbf{x}) \right) P_{f_i} + \left( 1 - H_e\left( \varPhi(\mathbf{x}) \right) \right) P_{b_i}} \frac{\partial^2 H_e\left( \varPhi(\mathbf{x}) \right)}{\partial p_m \partial p_n} \end{aligned}$$
(28)

We denote the first term by \(h_1\) (which contains only first-order derivatives) and the second term by \(h_2\) (which contains the second-order derivative):

$$\left[ \mathbf{H}_{fb}(\mathbf{x}) \right]_{m,n} = h_1 - h_2$$
(29)

Comparing with Eq. (27), we have:

$$h_1 = \left[ \mathbf{J}_{fb}(\mathbf{x}) \right]_m \left[ \mathbf{J}_{fb}(\mathbf{x}) \right]_n = \left[ \mathbf{J}_{fb}(\mathbf{x})^T \mathbf{J}_{fb}(\mathbf{x}) \right]_{m,n}$$
(30)

As in the standard Gauss–Newton method (which solves non-linear least-squares problems), we obtain an approximation of the Hessian matrix by ignoring the second-order derivative term \(h_2\):

$$\left[ \mathbf{H}_{fb}(\mathbf{x}) \right]_{m,n} = h_1 = \left[ \mathbf{J}_{fb}(\mathbf{x})^T \mathbf{J}_{fb}(\mathbf{x}) \right]_{m,n}$$
(31)
$$\mathbf{H}_{fb}(\mathbf{x}) = \mathbf{J}_{fb}(\mathbf{x})^T \mathbf{J}_{fb}(\mathbf{x})$$
(32)
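The following self-contained numerical sketch (again with a hypothetical sigmoid standing in for \(H_e(\varPhi(\mathbf{x}))\)) illustrates the approximation: the gap between the true single-pixel Hessian and \(\mathbf{J}_{fb}(\mathbf{x})^T \mathbf{J}_{fb}(\mathbf{x})\) is precisely the discarded \(h_2\) term, up to finite-difference error.

```python
# Gauss-Newton approximation of the single-pixel Hessian, Eq. (32).
import numpy as np

P_F, P_B = 0.7, 0.2  # hypothetical appearance probabilities at one pixel

def He(p):
    # Hypothetical smooth stand-in for He(Phi(x)) (see previous sketch).
    return 1.0 / (1.0 + np.exp(-p @ np.arange(1.0, 7.0)))

def E_fb(p):
    return -np.log(He(p) * P_F + (1.0 - He(p)) * P_B)

def grad(p, eps=1e-5):
    # Gradient of the per-pixel energy by central differences.
    return np.array([(E_fb(p + eps * e) - E_fb(p - eps * e)) / (2 * eps)
                     for e in np.eye(6)])

p0 = 0.05 * np.ones(6)
J = grad(p0)                                 # J_fb(x) at p0
H_true = np.array([(grad(p0 + 1e-4 * e) - grad(p0 - 1e-4 * e)) / 2e-4
                   for e in np.eye(6)])      # full Hessian, numerically
H_gn = np.outer(J, J)                        # h1 only, Eq. (32)
print(np.abs(H_true - H_gn).max())           # the gap is the dropped h2
```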

Finally, we sum over all pixels to obtain the Hessian approximation used in Sect. 3.4:

$$\mathbf{H}_{fb} = \sum\limits_{i = 1}^{n} \sum\limits_{\mathbf{x} \in \varOmega_i} \mathbf{J}_{fb}(\mathbf{x})^T \mathbf{J}_{fb}(\mathbf{x})$$
(33)
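In implementation terms, this accumulation is a sum of 6 × 6 outer products of the per-pixel Jacobians, which equals the batched product \(\mathbf{J}^T \mathbf{J}\). A short sketch with randomly generated stand-in Jacobians (purely illustrative data):

```python
# Accumulating the Hessian approximation of Eq. (33) over all pixels.
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(500, 6))      # one hypothetical J_fb(x) row per pixel

H_fb = np.zeros((6, 6))
for J_x in J:                      # sum of per-pixel outer products
    H_fb += np.outer(J_x, J_x)

assert np.allclose(H_fb, J.T @ J)  # equivalent batched form
```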

About this article

Cite this article

Zhong, L., Zhang, L. A Robust Monocular 3D Object Tracking Method Combining Statistical and Photometric Constraints. Int J Comput Vis 127, 973–992 (2019). https://doi.org/10.1007/s11263-018-1119-x
