Abstract
Obtaining ground-truth optical flow labels for real videos is challenging because manual annotation of pixel-wise flow is prohibitively expensive and laborious. Existing approaches instead adapt models trained on synthetic datasets to real videos, which inevitably suffers from domain discrepancy and limits performance in real-world applications. To address these problems, we propose RealFlow, an Expectation-Maximization (EM) based framework that creates large-scale optical flow datasets directly from unlabeled realistic videos. Specifically, we first estimate the optical flow between a pair of video frames and then synthesize a new image from this pair based on the predicted flow. The new image pairs and their corresponding flows then serve as a new training set. In addition, we design a Realistic Image Pair Rendering (RIPR) module that adopts softmax splatting and bi-directional hole filling to alleviate artifacts in the synthesized images. In the E-step, RIPR renders new images to create a large quantity of training data. In the M-step, we use the generated training data to train an optical flow network, which is then used to estimate optical flow in the next E-step. Over the iterative learning steps, the capability of the flow network gradually improves, as do the accuracy of the estimated flow and the quality of the synthesized dataset. Experimental results show that RealFlow outperforms previous dataset generation methods by a considerable margin. Moreover, based on the generated dataset, our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods. Our code and dataset are available at https://github.com/megvii-research/RealFlow.
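The alternation described above can be illustrated with a minimal NumPy sketch. It is not the RIPR module or the released implementation: it swaps softmax splatting for plain nearest-pixel average splatting, leaves holes unfilled rather than applying bi-directional hole filling, and the names `forward_splat`, `em_style_loop`, `estimate_flow`, and `train` are hypothetical placeholders introduced only for illustration.

```python
import numpy as np

def forward_splat(image, flow):
    """Push each pixel of `image` (H, W, C) along `flow` (H, W, 2 as dx, dy)
    to its nearest target pixel and average colliding contributions.
    Simplified stand-in for softmax splatting (assumption, not the paper's code).
    Returns the synthesized image and a boolean mask of unfilled holes."""
    h, w, c = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)

    acc = np.zeros((h, w, c), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    # np.add.at accumulates correctly when several sources hit one target.
    np.add.at(acc, (ty[valid], tx[valid]), image[valid].astype(np.float64))
    np.add.at(weight, (ty[valid], tx[valid]), 1.0)

    holes = weight == 0  # targets that received no source pixel
    out = np.zeros_like(acc)
    out[~holes] = acc[~holes] / weight[~holes][:, None]
    return out, holes

def em_style_loop(frame_pairs, estimate_flow, train, rounds=2):
    """Alternate dataset rendering (E-step) and network training (M-step).
    `estimate_flow(f1, f2)` and `train(dataset)` are caller-supplied
    placeholders; `train` must return the updated flow estimator."""
    for _ in range(rounds):
        dataset = []
        for f1, f2 in frame_pairs:
            flow = estimate_flow(f1, f2)         # E-step: predict flow
            new_f2, _ = forward_splat(f1, flow)  # render the new second frame
            dataset.append((f1, new_f2, flow))   # flow is exact for this pair
        estimate_flow = train(dataset)           # M-step: retrain on new data
    return estimate_flow
```

The point of the sketch is that the rendered frame is constructed from the predicted flow, so that flow serves as the label for the new pair `(f1, new_f2)` by construction, even when it only approximates the motion between the original frames.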
Acknowledgement
This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 62173203, No. 61872067, and No. 61720106004.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Han, Y. et al. (2022). RealFlow: EM-Based Realistic Optical Flow Dataset Generation from Videos. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13679. Springer, Cham. https://doi.org/10.1007/978-3-031-19800-7_17
DOI: https://doi.org/10.1007/978-3-031-19800-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19799-4
Online ISBN: 978-3-031-19800-7
eBook Packages: Computer Science (R0)