Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations

International Journal of Computer Vision

Abstract

Image and video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information, namely the information related to semantics, whereas human vision needs more information, namely what is required to reconstruct the signal. We therefore propose semantics-to-signal scalable compression, in which a partial bitstream is decodable for machine vision and the entire bitstream is decodable for human vision. Our method is inspired by the scalable image coding standard JPEG2000 and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained so that the partial representations are useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the subbands, jointly optimizing compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment on two datasets, CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification tasks as an example. Experimental results demonstrate that the proposed method achieves semantics-to-signal scalable compression and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach to image/video coding for humans and machines.
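The lifting structure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' trained transform: it assumes a 1-D integer signal of even length, and the names `lifting_forward`, `lifting_inverse`, `predict_53`, `update_53`, and `predict_nonlinear` are hypothetical. The first operator pair follows the reversible 5/3 integer lifting filter used in JPEG2000 (with boundary samples simply repeated); the nonlinear operator is an arbitrary stand-in for a learned network. The point is only that the inverse undoes the forward step exactly, whatever the predict and update operators compute.

```python
from typing import Callable, List, Sequence, Tuple

# An operator maps one integer sequence to another of the same length.
Operator = Callable[[Sequence[int]], List[int]]


def lifting_forward(signal: Sequence[int], predict: Operator,
                    update: Operator) -> Tuple[List[int], List[int]]:
    """One lifting step: split into even/odd samples, predict, then update."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    even, odd = list(signal[0::2]), list(signal[1::2])
    # Predict step: the detail band is what the even samples fail to predict.
    detail = [o - p for o, p in zip(odd, predict(even))]
    # Update step: the coarse band is the even samples corrected by the detail.
    coarse = [e + u for e, u in zip(even, update(detail))]
    return coarse, detail


def lifting_inverse(coarse: Sequence[int], detail: Sequence[int],
                    predict: Operator, update: Operator) -> List[int]:
    """Undo the update, then the predict, then interleave the samples."""
    even = [c - u for c, u in zip(coarse, update(detail))]
    odd = [d + p for d, p in zip(detail, predict(even))]
    signal = [0] * (len(even) + len(odd))
    signal[0::2] = even
    signal[1::2] = odd
    return signal


def predict_53(even: Sequence[int]) -> List[int]:
    """5/3-style predictor: floor of the mean of the two neighbouring evens."""
    n = len(even)
    return [(even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]


def update_53(detail: Sequence[int]) -> List[int]:
    """5/3-style update: rounded quarter of the two neighbouring details."""
    return [(detail[max(i - 1, 0)] + detail[i] + 2) // 4
            for i in range(len(detail))]


def predict_nonlinear(even: Sequence[int]) -> List[int]:
    """Arbitrary nonlinear operator, a placeholder for a learned network."""
    return [max(v, 0) // 3 for v in even]


samples = [10, 12, 14, 13, 9, 7, 8, 20]
coarse, detail = lifting_forward(samples, predict_53, update_53)
restored = lifting_inverse(coarse, detail, predict_53, update_53)
print("round trip exact:", restored == samples)
```

Applying `lifting_forward` again to the coarse band would give the next level of a subband pyramid. Integer arithmetic is used here so that the round trip is exact; with floating-point operators the same structure is invertible only up to rounding error.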

Notes

  1. https://bellard.org/bpg/.

Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant 2018YFA0701603, by the Natural Science Foundation of China under Grant 61772483, and by the Fundamental Research Funds for the Central Universities under Contract WK3490000005. We acknowledge the support of the GPU cluster built by MCC Lab of the School of Information Science and Technology of USTC.

Author information

Corresponding authors

Correspondence to Dong Liu or Houqiang Li.

Additional information

Communicated by Dong Xu.

About this article

Cite this article

Liu, K., Liu, D., Li, L. et al. Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations. Int J Comput Vis 129, 2605–2621 (2021). https://doi.org/10.1007/s11263-021-01491-7

  • DOI: https://doi.org/10.1007/s11263-021-01491-7
