Abstract
With the rapid growth of the Internet, video compression and reconstruction have attracted increasing attention as the volume and transmission frequency of video data have risen dramatically. Traditional codecs rely on hand-crafted modules for intra-frame and inter-frame coding, but they often fail to fully exploit the redundancy in video frames. To address this problem, this paper proposes a deep-learning video compression method that combines conditional context information with residual information to compress both intra-frame and inter-frame redundancy. Specifically, the proposed algorithm uses conditional coding to provide rich context information for the residual branch, while residual coding in turn helps the conditional branch handle redundant information. By fusing the video frames reconstructed by the two branches, the method achieves information complementarity. Experimental results on two benchmark datasets show that our method effectively removes redundancy between video frames and reconstructs them with low distortion, surpassing state-of-the-art (SOTA) performance.
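The complementary two-branch idea described above can be sketched in a few lines. This is a toy illustration only, not the paper's architecture: `residual_recon`, `conditional_recon`, `fuse_frames`, the linear conditional mapping, and the soft fusion mask are all hypothetical stand-ins for what would be learned networks in the actual method.

```python
import numpy as np

def residual_recon(pred, residual):
    """Residual branch: the decoder adds the transmitted residual
    to the motion-compensated prediction."""
    return pred + residual

def conditional_recon(context, latent, weight):
    """Conditional branch (linear toy stand-in for a learned decoder):
    maps a latent code conditioned on temporal context features."""
    return weight * latent + (1.0 - weight) * context

def fuse_frames(x_res, x_cond, mask):
    """Pixel-wise soft fusion of the two reconstructions; in a learned
    codec the mask itself would be predicted by a network."""
    return mask * x_res + (1.0 - mask) * x_cond

# Toy 4x4 "frames" with constant values for clarity.
pred = np.full((4, 4), 0.5)       # motion-compensated prediction
residual = np.full((4, 4), 0.1)   # transmitted residual
context = np.full((4, 4), 0.4)    # temporal context feature
latent = np.full((4, 4), 0.8)     # decoded conditional latent

x_res = residual_recon(pred, residual)
x_cond = conditional_recon(context, latent, weight=0.5)
fused = fuse_frames(x_res, x_cond, mask=np.full((4, 4), 0.5))
```

With a mask of all ones the fusion falls back to the pure residual reconstruction, and with all zeros to the pure conditional one; a learned mask lets each branch dominate wherever it reconstructs better.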
Acknowledgment
This work is supported by the Oversea Innovation Team Project of the "20 Regulations for New Universities" funding program of Jinan (Grant no. 2021GXRC073), the Excellent Youth Scholars Program of Shandong Province (Grant no. 2022HWYQ-048), and the TaiShan Scholars Program (Grant no. tsqn202211289).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, R., Qi, Z., Meng, X., Meng, L. (2023). Learning to Fuse Residual and Conditional Information for Video Compression and Reconstruction. In: Lu, H., et al. (eds.) Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol. 14358. Springer, Cham. https://doi.org/10.1007/978-3-031-46314-3_29
Print ISBN: 978-3-031-46313-6
Online ISBN: 978-3-031-46314-3