3D Tensor Auto-encoder with Application to Video Compression

Published: 11 May 2021

Abstract

Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes tensor objects directly through the network. The weights of each layer are represented by three small matrices, so the number of parameters in 3DTAE is just O(n^{1/3}). This compactness suits the needs of video compression. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
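The parameter saving described in the abstract can be illustrated with a minimal sketch. It assumes that a layer "represented by three small matrices" acts as one matrix multiplied along each of the three tensor modes (a Tucker-style mode product). The abstract does not give the exact layer definition, so the function names, the shapes, and the omission of bias and activation here are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def tensor_layer(x, u1, u2, u3):
    """Multiply a 3D tensor by one small matrix along each of its modes.

    Assumed form of a tensor layer (no bias, no activation):
    y[a, b, c] = sum_{i,j,k} u1[a, i] * u2[b, j] * u3[c, k] * x[i, j, k]
    """
    return np.einsum("ai,bj,ck,ijk->abc", u1, u2, u3, x)


def param_counts(in_shape, out_shape):
    """Count weights for the factorized layer and for a dense layer
    that maps the flattened input to the flattened output."""
    tensor_params = sum(m * n for m, n in zip(out_shape, in_shape))
    dense_params = int(np.prod(out_shape)) * int(np.prod(in_shape))
    return tensor_params, dense_params


rng = np.random.default_rng(0)

# Hypothetical sizes: a 32 x 32 x 16 video block reduced to an 8 x 8 x 4 core.
in_shape, out_shape = (32, 32, 16), (8, 8, 4)
x = rng.standard_normal(in_shape)
u1, u2, u3 = (rng.standard_normal((m, n)) for m, n in zip(out_shape, in_shape))

core = tensor_layer(x, u1, u2, u3)
tensor_params, dense_params = param_counts(in_shape, out_shape)

print("core tensor shape:", core.shape)
print("factorized layer weights:", tensor_params)
print("dense layer weights:", dense_params)
```

With these assumed sizes, the three factor matrices hold a few hundred weights, whereas a dense layer on the flattened video block would need several million. The factorized layer is a constrained special case of the dense one: its effective weight matrix is the Kronecker product of the three factors.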

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
May 2021
410 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3461621
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2021
Accepted: 01 October 2020
Revised: 01 September 2020
Received: 01 November 2019
Published in TOMM Volume 17, Issue 2

Author Tags

  1. ADAM
  2. Video compression
  3. auto-encoder
  4. tensor

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China (NSFC)
  • Higher Vocational Education Teaching Fusion Production Integration Platform Construction Projects of Jiangsu Province
  • Research Project of Jiangsu Vocational College of Information Technology
  • New Generation AI Major Project of Ministry of Science and Technology of China
  • High Level of Jiangsu Province Key Construction Project Fund
  • “Qing Lan Project” Teaching Team in Colleges and Universities of Jiangsu Province
