Faster Self-adaptive Deep Stereo

Wang, Haiyang; Wang, Xinchao; Song, Jie; Lei, Jie; Song, Mingli

doi:10.1007/978-3-030-69525-5_11

Conference paper
First Online: 27 February 2021

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12622)

Abstract

Fueled by the power of deep learning, stereo vision has made unprecedented advances in recent years. Existing deep stereo models, however, can hardly be deployed in real-world scenarios where data arrives on the fly without any ground-truth information and the data distribution changes continuously over time. Recently, Tonioni et al. proposed the first real-time self-adaptive deep stereo system (MADNet) to address this problem; however, it still runs at a relatively low speed and its accuracy is not fully satisfactory. In this paper, we significantly improve on their work in both speed and accuracy by incorporating two key components. First, instead of adopting only the image reconstruction loss as the proxy supervision, we propose a second, more powerful supervision, termed Knowledge Reverse Distillation (KRD), to guide the learning of deep stereo models. Second, we introduce a straightforward yet surprisingly effective Adapt-or-Hold (AoH) mechanism that automatically determines whether or not to fine-tune the stereo model in the online environment. Both components are lightweight and can be integrated into MADNet with only a few lines of code. Experiments demonstrate that the two proposed components improve the system by a large margin in both speed and accuracy. Our final system is twice as fast as MADNet while attaining considerably better performance on the popular KITTI benchmark datasets.

This work is supported by the National Natural Science Foundation of China (61976186), the Major Scientific Research Project of Zhejiang Lab (No. 2019KD0AC01), and the Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies.

References

Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Pang, J., Sun, W., Ren, J.S., Yang, C., Yan, Q.: Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 887–895 (2017)
Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410–5418 (2018)
Guo, X., Yang, K., Yang, W., Wang, X., Li, H.: Group-wise correlation stereo network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3273–3282 (2019)
Zhang, F., Prisacariu, V., Yang, R., Torr, P.H.: GA-net: guided aggregation net for end-to-end stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 185–194 (2019)
Tonioni, A., Poggi, M., Mattoccia, S., Di Stefano, L.: Unsupervised adaptation for deep stereo. In: IEEE International Conference on Computer Vision (2017)
Pang, J., et al.: Zoom and learn: generalizing deep stereo matching to novel domains. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., Stefano, L.D.: Real-time self-adaptive deep stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 195–204 (2019)
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32, 1231–1237 (2013)
Geiger, A., Roser, M., Urtasun, R.: Efficient large-scale stereo matching. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010. LNCS, vol. 6492, pp. 25–38. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19315-6_3
Wang, X., Li, Z., Tao, D.: Subspaces indexing model on Grassmann manifold for image search. IEEE Trans. Image Process. 20, 2627–2635 (2011)
Qiu, J., Wang, X., Maybank, S.J., Tao, D.: World from blur. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8493–8504 (2019)
Wang, X., Türetken, E., Fleuret, F., Fua, P.: Tracking interacting objects using intertwined flows. IEEE Trans. Pattern Anal. Mach. Intell. 38, 2312–2326 (2016)
Lan, L., Wang, X., Hua, G., Huang, T.S., Tao, D.: Semi-online multi-people tracking by re-identification. Int. J. Comput. Vis. 128, 1937–1955 (2020)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47, 7–42 (2002)
Klaus, A., Sormann, M., Karner, K.: Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In: 18th International Conference on Pattern Recognition (ICPR2006), vol. 3, pp. 15–18. IEEE (2006)
Kolmogorov, V., Zabih, R.: Computing visual correspondence with occlusions via graph cuts. Technical report, Cornell University (2001)
Yang, Y., Qiu, J., Song, M., Tao, D., Wang, X.: Distilling knowledge from graph convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Zbontar, J., et al.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17, 2 (2016)
Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: IEEE International Conference on Computer Vision, pp. 66–75 (2017)
Wang, X., Türetken, E., Fleuret, F., Fua, P.: Tracking interacting objects optimally using integer programming. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 17–32. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_2
Yu, X., Liu, T., Wang, X., Tao, D.: On compressing deep models by low rank and sparse decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Yang, E., Deng, C., Li, C., Liu, W., Li, J., Tao, D.: Shared predictive cross-modal deep quantization. IEEE Trans. Neural Netw. Learn. Syst. 29, 5292–5303 (2018)
Yin, X., Wang, X., Yu, J., Zhang, M., Fua, P., Tao, D.: FishEyeRecNet: a multi-context collaborative deep network for fisheye image rectification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 475–490. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_29
Deng, C., Yang, E., Liu, T., Li, J., Liu, W., Tao, D.: Unsupervised semantic-preserving adversarial hashing for image search. IEEE Trans. Image Process. 28, 4032–4044 (2019)
Wang, J., Huang, S., Wang, X., Tao, D.: Not all parts are created equal: 3D pose estimation by modeling bi-directional dependencies of body parts. In: IEEE International Conference on Computer Vision (ICCV) (2019)
Garg, R., B.G., V.K., Carneiro, G., Reid, I.: Unsupervised CNN for single view depth estimation: geometry to the rescue. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 740–756. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_45
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 270–279 (2017)
Ye, J., Ji, Y., Wang, X., Ou, K., Tao, D., Song, M.: Student becoming the master: knowledge amalgamation for joint scene parsing, depth estimation, and more. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Poggi, M., Mattoccia, S.: Learning from scratch a confidence measure. In: BMVC (2016)
Zhong, Y., Li, H., Dai, Y.: Open-world stereo video matching with deep RNN. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 104–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_7
Zhao, H., Gallo, O., Frosio, I., Kautz, J.: Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2016)
Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529 (2015)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)
Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
Wang, Z., et al.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004)
Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3061–3070 (2015)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361. IEEE (2012)
Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1851–1858 (2017)
Li, A., Yuan, Z.: Occlusion aware stereo matching via cooperative unsupervised learning. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366, pp. 197–213. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20876-9_13
Aleotti, F., Tosi, F., Zhang, L., Poggi, M., Mattoccia, S.: Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation. arXiv preprint arXiv:2008.07130 (2020)

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, China
Haiyang Wang, Jie Song, Jie Lei & Mingli Song
Stevens Institute of Technology, Hoboken, NJ, USA
Xinchao Wang

Corresponding author

Correspondence to Mingli Song.

Editor information

Editors and Affiliations

Waseda University, Tokyo, Japan
Hiroshi Ishikawa
Institute of Automation of Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Czech Technical University in Prague, Prague, Czech Republic
Tomas Pajdla
University of Pennsylvania, Philadelphia, PA, USA
Jianbo Shi

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 58477 KB)

About this paper

Cite this paper

Wang, H., Wang, X., Song, J., Lei, J., Song, M. (2021). Faster Self-adaptive Deep Stereo. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12622. Springer, Cham. https://doi.org/10.1007/978-3-030-69525-5_11

DOI: https://doi.org/10.1007/978-3-030-69525-5_11
Published: 27 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69524-8
Online ISBN: 978-3-030-69525-5
eBook Packages: Computer Science, Computer Science (R0)
