Abstract
Numerous stereo matching algorithms have been proposed to estimate disparity from a single pair of stereo images. However, applying even the best of them to each frame of a video independently, i.e., without considering the temporal consistency between consecutive frames, may produce undesirable artifacts. Here, we propose an adaptive, systematic method based on spatiotemporally consistent constraints that generates spatiotemporally consistent disparity maps for stereo video sequences. First, a reliable temporal neighborhood is used to enforce the “self-similarity” assumption and to prevent errors caused by false optical flow matches from propagating between consecutive frames. Furthermore, we formulate the adaptive temporally predicted disparity map as prior knowledge for the current frame. It is used as a soft constraint to enhance the temporal consistency of disparities, increase robustness to luminance variation, and restrict the range of potential disparities for each pixel. Additionally, to further promote smooth variation of disparities, the adaptive temporal segment confidence is incorporated as a soft constraint to reduce ambiguities caused by under- and over-segmentation, and to retain the disparity discontinuities that align with 3D object boundaries while separating them from geometrically smooth regions with strong color gradients. Experimental evaluations demonstrate that our method significantly improves spatiotemporal consistency, both quantitatively and qualitatively, compared with other state-of-the-art methods on the synthetic DCB and real-world KITTI datasets.
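To make the idea of a temporal soft constraint concrete, the following is a minimal sketch for a single pixel. It is not the energy formulation of the paper: the truncated absolute-difference penalty, the winner-take-all selection, and the names add_temporal_prior, pick_disparity, weight, and truncation are all assumptions chosen for illustration. The sketch only shows how a disparity predicted from the previous frame, scaled by a confidence in [0, 1], can bias the choice among ambiguous candidates without forbidding any of them.

```python
from typing import List, Sequence


def add_temporal_prior(
    matching_costs: Sequence[float],
    predicted_disparity: float,
    confidence: float,
    weight: float = 1.0,
    truncation: float = 3.0,
) -> List[float]:
    """Return per-disparity costs with a soft temporal penalty added.

    matching_costs[d] is the data cost of assigning disparity d to one pixel.
    The penalty form (truncated absolute difference) is an assumption made
    for this sketch, not the term defined in the paper.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    penalised = []
    for d, cost in enumerate(matching_costs):
        # Truncation keeps the constraint soft: far-away disparities are
        # penalised by a bounded amount, never ruled out entirely.
        deviation = min(abs(d - predicted_disparity), truncation)
        penalised.append(cost + weight * confidence * deviation)
    return penalised


def pick_disparity(costs: Sequence[float]) -> int:
    """Winner-take-all: index of the lowest cost (first one on ties)."""
    return min(range(len(costs)), key=lambda d: costs[d])


# Hypothetical costs with two almost equally good candidates (d = 1 and d = 4),
# as might occur on a repetitive texture.
costs = [5.0, 1.0, 4.0, 4.5, 0.9, 5.0]

print(pick_disparity(costs))  # data term alone
print(pick_disparity(add_temporal_prior(costs, predicted_disparity=1.0, confidence=0.8)))
print(pick_disparity(add_temporal_prior(costs, predicted_disparity=1.0, confidence=0.0)))
```

With a confident prediction the selection moves to the temporally consistent candidate, while a confidence of zero leaves the data term unchanged. This mirrors the adaptive behaviour described in the abstract, where an unreliable temporal prediction should not override the current frame.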

Funding
This study was funded by the National Natural Science Foundation of China (Grant No. 61802109), the Natural Science Foundation of Hebei Province (Grant No. F2017205066), and the Science Foundation of Hebei Normal University (Grant Nos. L2017B06 and L2018K02).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Cite this article
Tian, L., Liu, J., Ling, H. et al. Disparity estimation in stereo video sequence with adaptive spatiotemporally consistent constraints. Vis Comput 35, 1427–1446 (2019). https://doi.org/10.1007/s00371-018-01622-1
DOI: https://doi.org/10.1007/s00371-018-01622-1