Learning to Fuse Residual and Conditional Information for Video Compression and Reconstruction

  • Conference paper
  • First Online:
Image and Graphics (ICIG 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14358)

Included in the following conference series: International Conference on Image and Graphics (ICIG)

Abstract

With the rapid development of the Internet, video compression and reconstruction have attracted increasing attention as the volume and transmission frequency of video data have grown dramatically. Traditional methods rely on hand-crafted modules for inter-frame and intra-frame coding, but they often fail to fully exploit the redundant information in video frames. To address this problem, this paper proposes a deep learning video compression method that combines conditional context information with residual information to compress both intra-frame and inter-frame redundancy. Specifically, the proposed algorithm uses conditional coding to provide rich context information for the residual branch, while residual coding in turn supports conditional coding in handling redundant information. By fusing the video frames reconstructed by the two branches, the method achieves information complementarity. Experimental results on two benchmark datasets show that our method effectively removes redundancy between video frames and reconstructs them with low distortion, outperforming state-of-the-art (SOTA) methods.
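
The abstract describes two reconstruction branches, a residual one and a conditional one, whose outputs are fused to complement each other. The snippet below is a minimal PyTorch sketch of that fusion idea only; the module names, channel counts, and the learned per-pixel fusion mask are illustrative assumptions and do not reproduce the architecture proposed in the paper.

```python
# Minimal sketch: fuse a residual-based and a condition-based reconstruction.
# All module names, channel counts, and the per-pixel fusion mask are
# illustrative assumptions, not the authors' architecture.
import torch
import torch.nn as nn


class ResidualBranch(nn.Module):
    """Reconstructs a frame as motion-compensated prediction + decoded residual."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, prediction, residual_latent):
        return prediction + self.decode(residual_latent)


class ConditionalBranch(nn.Module):
    """Reconstructs a frame conditioned on context taken from the prediction."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, context, conditional_latent):
        return self.decode(torch.cat([context, conditional_latent], dim=1))


class FusedReconstruction(nn.Module):
    """Blends the two reconstructions with a learned per-pixel weight map."""

    def __init__(self):
        super().__init__()
        self.res_branch = ResidualBranch()
        self.cond_branch = ConditionalBranch()
        self.fusion_weight = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, prediction, residual_latent, conditional_latent):
        x_res = self.res_branch(prediction, residual_latent)
        x_cond = self.cond_branch(prediction, conditional_latent)
        w = self.fusion_weight(torch.cat([x_res, x_cond], dim=1))
        return w * x_res + (1.0 - w) * x_cond  # per-pixel complementary fusion


if __name__ == "__main__":
    # Dummy tensors stand in for the prediction and the decoded latents.
    pred = torch.rand(2, 3, 64, 64)
    model = FusedReconstruction()
    out = model(pred, torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
    print(out.shape)  # torch.Size([2, 3, 64, 64])
```

In a real codec the two latents would come from learned encoders with entropy models, and the fusion would be trained jointly with a rate-distortion loss; the sketch only illustrates how the two reconstructions can complement each other.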


Acknowledgment

This work is supported by the Overseas Innovation Team Project of the “20 Regulations for New Universities” funding program of Jinan (Grant no. 2021GXRC073), the Excellent Youth Scholars Program of Shandong Province (Grant no. 2022HWYQ-048), and the TaiShan Scholars Program (Grant no. tsqn202211289).

Author information

Corresponding author

Correspondence to Lei Meng.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, R., Qi, Z., Meng, X., Meng, L. (2023). Learning to Fuse Residual and Conditional Information for Video Compression and Reconstruction. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14358. Springer, Cham. https://doi.org/10.1007/978-3-031-46314-3_29

  • DOI: https://doi.org/10.1007/978-3-031-46314-3_29

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46313-6

  • Online ISBN: 978-3-031-46314-3

  • eBook Packages: Computer Science, Computer Science (R0)
