InStereo2K: a large real dataset for stereo matching in indoor scenes

  • Research Paper
  • Published:
Science China Information Sciences

Abstract

Deep neural networks have shown great success in stereo matching in recent years. On the KITTI datasets, most top-performing methods are based on neural networks. However, on the Middlebury datasets, these methods usually do not perform well. The KITTI datasets are collected in outdoor scenes, while the Middlebury datasets are collected in indoor scenes. It is commonly believed that the community still lacks a large labelled dataset for stereo matching in indoor scenes. In this paper, we introduce a new stereo dataset called InStereo2K. It contains 2050 pairs of stereo images with highly accurate ground-truth disparity maps, including 2000 pairs for training and 50 pairs for testing. Experimental results show that our dataset can significantly improve the performance of several recent networks (including StereoNet and PSMNet) on the Middlebury 2014 dataset. The large scale, high accuracy and rich diversity of the proposed InStereo2K dataset provide new opportunities to researchers in the area of stereo matching and beyond. It also takes end-to-end stereo matching methods a step towards practical applications.
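The paper itself contains no code. As an illustration of how performance on benchmarks such as Middlebury 2014 is typically measured, the following is a minimal sketch of two standard disparity-evaluation metrics: the bad-pixel rate (e.g. "bad 2.0") and the mean end-point error (EPE). It assumes predicted and ground-truth disparity maps are NumPy arrays, with invalid ground-truth pixels marked as 0 or non-finite; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def bad_pixel_rate(pred, gt, threshold=2.0, valid=None):
    """Fraction of valid pixels whose absolute disparity error exceeds
    `threshold` pixels (the "bad 2.0" metric when threshold=2.0)."""
    if valid is None:
        # Treat zero or non-finite ground-truth values as missing.
        valid = np.isfinite(gt) & (gt > 0)
    err = np.abs(pred[valid] - gt[valid])
    return float((err > threshold).mean())

def end_point_error(pred, gt, valid=None):
    """Mean absolute disparity error (EPE) over valid pixels."""
    if valid is None:
        valid = np.isfinite(gt) & (gt > 0)
    return float(np.abs(pred[valid] - gt[valid]).mean())

# Tiny synthetic example: four pixels, one of them off by 3 px.
gt = np.array([[10.0, 20.0], [30.0, 40.0]])
pred = np.array([[10.5, 20.0], [33.0, 40.0]])
print(bad_pixel_rate(pred, gt))   # 0.25 (one of four pixels exceeds 2 px)
print(end_point_error(pred, gt))  # 0.875
```

On a test split such as InStereo2K's 50 held-out pairs, these per-image values would be averaged over all pairs to produce a single benchmark score.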

References

  1. Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 2012

  2. Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

  3. Li D, Liu N, Guo Y, et al. Pose estimation for random bin-picking using partition viewpoint feature histograms. Pattern Recogn Lett, 2019, 128: 148–154

  4. Khan S H, Guo Y, Hayat M, et al. Unsupervised primitive discovery for improved 3D generative modeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 9739–9748

  5. Wang W, Gao W, Hu Z Y. Effectively modeling piecewise planar urban scenes based on structure priors and CNN. Sci China Inf Sci, 2019, 62: 029102

  6. Yan T, Gan Y, Xia Z, et al. Segment-based disparity refinement with occlusion handling for stereo matching. IEEE Trans Image Process, 2019, 28: 3885–3897

  7. Liang Z, Guo Y, Feng Y, et al. Stereo matching using multi-level cost volume and multi-scale feature constancy. IEEE Trans Pattern Anal Mach Intell, 2019. doi: https://doi.org/10.1109/TPAMI.2019.2928550

  8. Khamis S, Fanello S, Rhemann C, et al. StereoNet: guided hierarchical refinement for real-time edge-aware depth prediction. In: Proceedings of the European Conference on Computer Vision (ECCV), 2018. 573–590

  9. Chang J R, Chen Y S. Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 5410–5418

  10. Scharstein D, Hirschmüller H, Kitajima Y, et al. High-resolution stereo datasets with subpixel-accurate ground truth. In: Proceedings of German Conference on Pattern Recognition. Berlin: Springer, 2014. 31–42

  11. Scharstein D, Szeliski R. High-accuracy stereo depth maps using structured light. In: Proceedings of 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003

  12. Schöps T, Schönberger J L, Galliani S, et al. A multi-view stereo benchmark with high-resolution images and multicamera videos. In: Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 2017

  13. Mayer N, Ilg E, Hausser P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 4040–4048

  14. Scharstein D, Szeliski R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int J Comput Vision, 2002, 47: 7–42

  15. Zbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches. J Mach Learn Res, 2016, 17: 2

  16. Mei X, Sun X, Zhou M, et al. On building an accurate stereo matching system on graphics hardware. In: Proceedings of IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011. 467–474

  17. Luo W, Schwing A G, Urtasun R. Efficient deep learning for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 5695–5703

  18. Shaked A, Wolf L. Improved stereo matching with constant highway networks and reflective confidence learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 4641–4650

  19. Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE International Conference on Computer Vision, 2017. 66–75

  20. Liang Z, Feng Y, Guo Y, et al. Learning for disparity estimation through feature constancy. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 2811–2820

  21. Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. In: Proceedings of European Conference on Computer Vision. Berlin: Springer, 2016. 483–499

  22. Zhang F, Prisacariu V, Yang R, et al. GA-Net: guided aggregation net for end-to-end stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

  23. Lohry W, Chen V, Zhang S. Absolute three-dimensional shape measurement using coded fringe patterns without phase unwrapping or projector calibration. Opt Express, 2014, 22: 1287–1301

  24. Zhang S, Yau S T. Generic nonsinusoidal phase error correction for three-dimensional shape measurement using a digital video projector. Appl Opt, 2007, 46: 36–43

  25. Scharstein D, Pal C. Learning conditional random fields for stereo. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007. 1–8

  26. Butler D J, Wulff J, Stanley G B, et al. A naturalistic open source movie for optical flow evaluation. In: Proceedings of European Conference on Computer Vision. Berlin: Springer, 2012. 611–625

  27. Ros G, Sellart L, Materzynska J, et al. The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 3234–3243

  28. He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 1904–1916

  29. Mayer N, Ilg E, Fischer P, et al. What makes good synthetic training data for learning disparity and optical flow estimation? Int J Comput Vis, 2018, 126: 942–960

  30. Kingma D P, Ba J. Adam: a method for stochastic optimization. 2014. ArXiv:1412.6980

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant Nos. 61402489, 61972435, 61602499), Natural Science Foundation of Guangdong Province (Grant No. 2019A1515011271), Fundamental Research Funds for the Central Universities (Grant No. 18lgzd06), and Shenzhen Technology and Innovation Committee (Grant No. 201908073000399).

Author information

Corresponding authors

Correspondence to Yuhua Xu or Yulan Guo.

About this article

Cite this article

Bao, W., Wang, W., Xu, Y. et al. InStereo2K: a large real dataset for stereo matching in indoor scenes. Sci. China Inf. Sci. 63, 212101 (2020). https://doi.org/10.1007/s11432-019-2803-x
