Subspace-PnP: A Geometric Constraint Loss for Mutual Assistance of Depth and Optical Flow Estimation

Published in: International Journal of Computer Vision

Abstract

Unsupervised optical flow and stereo depth estimation are two fundamental tasks in computer vision. Current studies (Tosi et al., in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4654–4665, 2020; Ranjan et al., in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12240–12249, 2019; Wang et al., in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8071–8081, 2019; Yin and Shi, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1983–1992, 2018) demonstrate that jointly learning networks for optical flow and stereo depth estimation via geometric constraints can mutually benefit the two tasks and in turn yield large accuracy improvements. However, most of these methods derive their geometric constraints from an estimated camera pose, so they do not apply to scenes containing objects whose motion differs from the camera's; moreover, errors in the estimated pose propagate into inaccurate constraints for both tasks. In this paper, we propose a novel and universal geometric loss function, named Subspace-PnP, which builds on Perspective-n-Point (PnP) and union-of-subspaces theory (Ji et al., in: IEEE Winter conference on applications of computer vision, pp 461–468, 2014) to jointly estimate optical flow and stereo depth. The construction of Subspace-PnP does not rely on the camera pose, yet implicitly encodes both the camera pose and the motions of all moving objects. Our experiments show that the Subspace-PnP loss mutually guides the estimation of optical flow and depth, yielding better robustness and greater accuracy even in dynamic scenes. In addition, we propose a motion-occlusion simulation method that handles occlusions caused by moving objects in optical flow estimation, which yields further performance improvement. Our method achieves state-of-the-art performance for joint optical flow and stereo depth estimation on the KITTI 2012 and KITTI 2015 benchmarks.
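
The abstract does not spell out the loss; the published formulation appears in the full text. Purely as an illustrative sketch, the dense self-expressiveness objective of Ji et al. (2014) that Subspace-PnP builds on could couple predicted depth and flow roughly as follows. The function name, the feature construction (stacking back-projected 3D points with flow-displaced pixel coordinates), and the weight lam are assumptions made here for illustration, not the authors' implementation.

import torch

def union_of_subspaces_loss(points3d, flow_targets, lam=1e-2):
    """Hypothetical sketch, not the paper's Subspace-PnP loss.
    points3d: (3, n) points back-projected from predicted depth.
    flow_targets: (2, n) pixel coordinates displaced by predicted flow.
    Returns the mean self-expression residual ||D - D C||_F^2 / n."""
    # Each column of D mixes structure (from depth) with 2D motion
    # (from flow). Columns sharing one rigid motion (the camera's or
    # a moving object's) lie near a low-dimensional subspace, so no
    # explicit pose estimate is needed.
    D = torch.cat([points3d, flow_targets], dim=0)   # (5, n)
    n = D.shape[1]
    G = D.t() @ D                                    # (n, n) Gram matrix
    eye = torch.eye(n, dtype=D.dtype, device=D.device)
    # Closed-form minimizer of ||D - D C||_F^2 + lam * ||C||_F^2,
    # the dense subspace-clustering objective of Ji et al. (2014):
    #     C = (G + lam * I)^{-1} G
    C = torch.linalg.solve(G + lam * eye, G)
    residual = D - D @ C
    return (residual ** 2).sum() / n

Solving the n-by-n system is cubic in n, so a term of this kind would in practice be evaluated on a few hundred sampled pixels per image. Because gradients flow through the residual into both the depth and flow predictions, each task regularizes the other, which is the mutual-assistance effect the abstract describes.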

References

  • Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd ed.). Cambridge University Press.

  • Cao, Y., Zhao, T., Xian, K., Shen, C., Cao, Z., & Xu, S. (2018). Monocular depth estimation with augmented ordinal depth relationships. IEEE Transactions on Image Processing.

  • Chang, J. R., & Chen, Y. S. (2018). Pyramid stereo matching network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5410–5418).

  • Chen, R., Han, S., Xu, J., & Su, H. (2019). Point-based multi-view stereo network. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1538–1547).

  • Chen, J., Yang, X., Jia, Q., & Liao, C. (2020). Denao: Monocular depth estimation network with auxiliary optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • DeSouza, G. N., & Kak, A. C. (2002). Vision for mobile robot navigation: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 237–267.

  • Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., & Brox, T. (2015). Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE international conference on computer vision (pp. 2758–2766).

  • Elhamifar, E., & Vidal, R. (2013). Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2765–2781.

  • Geiger, A., Lenz, P., & Urtasun, R. (2012, June). Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3354–3361). IEEE.

  • Gissot, S. F., Hochedez, J. F., Chainais, P., & Antoine, J. P. (2008). 3D reconstruction from SECCHI-EUVI images using an optical-flow algorithm: method description and observation of an erupting filament. Solar Physics, 252(2), 397–408.

  • Godard, C., Mac Aodha, O., & Brostow, G. J. (2017). Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 270–279).

  • Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., & Tan, P. (2020). Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2495–2504).

  • Guan, S., Li, H., & Zheng, W. S. (2019, July). Unsupervised learning for optical flow estimation using pyramid convolution LSTM. In 2019 IEEE international conference on multimedia and expo (ICME) (pp. 181–186). IEEE.

  • Guo, X., Yang, K., Yang, W., Wang, X., & Li, H. (2019). Group-wise correlation stereo network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3273–3282).

  • Herakleous, K., & Poullis, C. (2013, September). Improving augmented reality applications with optical flow. In 2013 IEEE international conference on image processing (pp. 3403–3406). IEEE.

  • Hu, P., Wang, G., & Tan, Y. P. (2018). Recurrent spatial pyramid CNN for optical flow estimation. IEEE Transactions on Multimedia, 20(10), 2814–2823.

  • Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., & Brox, T. (2017). Flownet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2462–2470).

  • Ince, S., & Konrad, J. (2008). Occlusion-aware optical flow estimation. IEEE Transactions on Image Processing, 17(8), 1443–1451.

  • Yu, J. J., Harley, A. W., & Derpanis, K. G. (2016, October). Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In European conference on computer vision (pp. 3–10). Springer.

  • Ji, P., Salzmann, M., & Li, H. (2014, March). Efficient dense subspace clustering. In IEEE Winter conference on applications of computer vision (pp. 461–468). IEEE.

  • Jonschkowski, R., Stone, A., Barron, J. T., Gordon, A., Konolige, K., & Angelova, A. (2020). What matters in unsupervised optical flow. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16 (pp. 557–572). Springer.

  • Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., & Bry, A. (2017). End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE international conference on computer vision (pp. 66–75).

  • Laga, H., Jospin, L. V., Boussaid, F., & Bennamoun, M. (2020). A survey on deep learning techniques for stereo-based depth estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Lai, H. Y., Tsai, Y. H., & Chiu, W. C. (2019). Bridging stereo matching and optical flow via spatiotemporal correspondence. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1890–1899).

  • Liu, L., Zhai, G., Ye, W., & Liu, Y. (2019). Unsupervised learning of scene flow estimation fusing with local rigidity. In International joint conference on artificial intelligence (IJCAI).

  • Liu, P., King, I., Lyu, M. R., & Xu, J. (2019, July). Ddflow: Learning optical flow with unlabeled data distillation. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 8770–8777).

  • Liu, P., King, I., Lyu, M. R., & Xu, J. (2020). Flow2stereo: Effective self-supervised learning of optical flow and stereo matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6648–6657).

  • Liu, P., Lyu, M., King, I., & Xu, J. (2019). Selflow: Self-supervised learning of optical flow. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4571–4580).

  • Liu, L., Zhang, J., He, R., Liu, Y., Wang, Y., Tai, Y., & Huang, F. (2020). Learning by analogy: Reliable supervision from transformations for unsupervised optical flow estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6489–6498).

  • Luo, H., Gao, Y., Wu, Y., Liao, C., Yang, X., & Cheng, K. T. (2018). Real-time dense monocular SLAM with online adapted depth prediction network. IEEE Transactions on Multimedia, 21(2), 470–483.

  • Luo, C., Yang, Z., Wang, P., Wang, Y., Xu, W., Nevatia, R., & Yuille, A. (2019). Every pixel counts++: Joint learning of geometry and motion with 3d holistic understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10), 2624–2641.

  • Ma, J., Jiang, X., Fan, A., Jiang, J., & Yan, J. (2021). Image matching from handcrafted to deep features: A survey. International Journal of Computer Vision, 129(1), 23–79.

  • Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2016). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4040–4048).

  • Mayer, N., Ilg, E., Fischer, P., Hazirbas, C., Cremers, D., Dosovitskiy, A., & Brox, T. (2018). What makes good synthetic training data for learning disparity and optical flow estimation? International Journal of Computer Vision, 126(9), 942–960.

  • Meister, S., Hur, J., & Roth, S. (2018, April). Unflow: Unsupervised learning of optical flow with a bidirectional census loss. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).

  • Menze, M., & Geiger, A. (2015). Object scene flow for autonomous vehicles. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3061–3070).

  • Mishiba, K. (2020). Fast depth estimation for light field cameras. IEEE Transactions on Image Processing, 29, 4232–4242.

  • Mostafavi, M., Wang, L., & Yoon, K. J. (2021). Learning to reconstruct hdr images from events, with applications to depth and flow prediction. International Journal of Computer Vision, 129(4), 900–920.

  • Ranjan, A., & Black, M. J. (2017). Optical flow estimation using a spatial pyramid network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4161–4170).

  • Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., & Black, M. J. (2019). Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12240–12249).

  • Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., & Black, M. J. (2020). Learning multi-human optical flow. International Journal of Computer Vision, 128(4), 873–890.

  • Ren, Z., Yan, J., Ni, B., Liu, B., Yang, X., & Zha, H. (2017, February). Unsupervised deep learning for optical flow estimation. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1).

  • Song, X., Zhao, X., Fang, L., Hu, H., & Yu, Y. (2020). Edgestereo: An effective multi-task learning network for stereo matching and edge detection. International Journal of Computer Vision, 128(4), 910–930.

  • Sun, D., Yang, X., Liu, M. Y., & Kautz, J. (2018). Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8934–8943).

  • Tang, M., Wen, J., Zhang, Y., Gu, J., Junker, P., Guo, B., & Han, Y. (2018). A universal optical flow based real-time low-latency omnidirectional stereo video system. IEEE Transactions on Multimedia, 21(4), 957–972.

  • Tonioni, A., Tosi, F., Poggi, M., Mattoccia, S., & Stefano, L. D. (2019). Real-time self-adaptive deep stereo. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 195–204).

  • Tosi, F., Aleotti, F., Ramirez, P. Z., Poggi, M., Salti, S., Stefano, L. D., & Mattoccia, S. (2020). Distilled semantics for comprehensive scene understanding from videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4654–4665).

  • Elhamifar, E., & Vidal, R. (2009). Sparse subspace clustering. In 2009 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2790–2797). IEEE.

  • Wang, C., Buenaposada, J. M., Zhu, R., & Lucey, S. (2018). Learning depth from monocular videos using direct methods. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2022–2030).

  • Wang, Y., Wang, P., Yang, Z., Luo, C., Yang, Y., & Xu, W. (2019). Unos: Unified unsupervised optical-flow and stereo-depth estimation by watching videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 8071–8081).

  • Wang, Y., Yang, Y., Yang, Z., Zhao, L., Wang, P., & Xu, W. (2018). Occlusion aware unsupervised learning of optical flow. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4884–4893).

  • Yang, Z., Wang, P., Wang, Y., Xu, W., & Nevatia, R. (2018). Every pixel counts: Unsupervised geometry learning with holistic 3d motion understanding. In Proceedings of the European conference on computer vision (ECCV) workshops.

  • Yang, X., Yuan, Z., Zhu, D., Chi, C., Li, K., & Liao, C. (2020). Robust and efficient RGB-D SLAM in dynamic environments. IEEE Transactions on Multimedia.

  • Yang, G., Zhao, H., Shi, J., Deng, Z., & Jia, J. (2018). Segstereo: Exploiting semantic information for disparity estimation. In Proceedings of the European conference on computer vision (ECCV) (pp. 636–651).

  • Yang, X., Gao, Y., Luo, H., Liao, C., & Cheng, K. T. (2019). Bayesian denet: Monocular depth prediction and frame-wise fusion with synchronized uncertainty. IEEE Transactions on Multimedia, 21(11), 2701–2713.

  • Yao, Y., Luo, Z., Li, S., Fang, T., & Quan, L. (2018). Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European conference on computer vision (ECCV) (pp. 767–783).

  • Yin, Z., & Shi, J. (2018). Geonet: Unsupervised learning of dense depth, optical flow and camera pose. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1983–1992).

  • Yin, Z., Darrell, T., & Yu, F. (2019). Hierarchical discrete distribution decomposition for match density estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6044–6053).

  • Zhai, M., Xiang, X., Lv, N., Kong, X., & El Saddik, A. (2020). An object context integrated network for joint learning of depth and optical flow. IEEE Transactions on Image Processing, 29, 7807–7818.

  • Zhang, C., Chen, Z., Wang, M., Li, M., & Jiang, S. (2017). Robust non-local TV-L1 optical flow estimation with occlusion detection. IEEE Transactions on Image Processing, 26(8), 4055–4067.

  • Zhong, Y., Dai, Y., & Li, H. (2017). Self-supervised learning for stereo matching with self-improving ability. arXiv preprint arXiv:1709.00930.

  • Zhong, Y., Ji, P., Wang, J., Dai, Y., & Li, H. (2019). Unsupervised deep epipolar flow for stationary or dynamic scenes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 12095–12104).

  • Zhou, T., Brown, M., Snavely, N., & Lowe, D. G. (2017). Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1851–1858).

  • Zhou, H., Ummenhofer, B., & Brox, T. (2020). DeepTAM: Deep tracking and mapping with convolutional neural networks. International Journal of Computer Vision, 128(3), 756–769.

  • Zou, Y., Luo, Z., & Huang, J. B. (2018). Df-net: Unsupervised joint learning of depth and flow using cross-task consistency. In Proceedings of the European conference on computer vision (ECCV) (pp. 36–53).

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62122029, 62061160490 and U20B2064.

Author information

Corresponding author

Correspondence to Xin Yang.

Additional information

Communicated by Jiaya Jia.

About this article

Cite this article

Chi, C., Hao, T., Wang, Q. et al. Subspace-PnP: A Geometric Constraint Loss for Mutual Assistance of Depth and Optical Flow Estimation. Int J Comput Vis 130, 3054–3069 (2022). https://doi.org/10.1007/s11263-022-01652-2
