Abstract
Compression with Implicit Neural Representations (COIN) is a neural image compression method based on multilayer perceptrons (MLPs). COIN encodes an image with an MLP that maps pixel coordinates to RGB values; the weights of this MLP are quantized to obtain a code that is stored in place of the image. However, this single fixed network structure performs only moderately when dealing with images of varying complexity. In this paper, we propose a novel implicit dynamic neural network that processes images in a dynamic and adaptive manner. Specifically, we use the Sobel operator to classify images by complexity and use this classification as a criterion for adaptively selecting the network width and depth. To better fit image features, we further quantize the dynamic network parameters and storage matrices, so that only the relevant subset of network parameters and their storage matrices needs to be kept when storing an image. To train this dynamic network, we use a meta-learning approach for the multi-image compression task. Experimental results show that our method outperforms COIN and JPEG in terms of image reconstruction quality on the CIFAR-10 dataset.
References
Anokhin, I., Demochkin, K., Khakhulin, T., Sterkin, G., Lempitsky, V., Korzhenkov, D.: Image generators with conditionally-independent pixel synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14278–14287 (2021)
Bisoi, R., Dash, P.K.: A hybrid evolutionary dynamic neural network for stock market trend analysis and prediction using unscented Kalman filter. Appl. Soft Comput. 19, 41–56 (2014)
Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: International Conference on Machine Learning, pp. 527–536. PMLR (2017)
Chen, Y., Liu, S., Wang, X.: Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8628–8638 (2021)
Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019)
Dupont, E., Goliński, A., Alizadeh, M., Teh, Y.W., Doucet, A.: COIN: compression with implicit neural representations. arXiv preprint arXiv:2103.03123 (2021)
Dupont, E., Loya, H., Alizadeh, M., Goliński, A., Teh, Y.W., Doucet, A.: COIN++: neural compression across modalities. arXiv preprint arXiv:2201.12904 (2022)
Dupont, E., Teh, Y.W., Doucet, A.: Generative models as distributions of functions. arXiv preprint arXiv:2102.04776 (2021)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126–1135. PMLR (2017)
Franzen, R.: Kodak lossless true color image suite (1999). http://r0k.us/graphics/kodak/
Han, Y., Huang, G., Song, S., Yang, L., Wang, H., Wang, Y.: Dynamic neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 7436–7456 (2021)
Huang, G., Chen, D., Li, T., Wu, F., Van Der Maaten, L., Weinberger, K.Q.: Multi-scale dense networks for resource efficient image classification. arXiv preprint arXiv:1703.09844 (2017)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1891–1900 (2019)
Liu, Z., Karam, L.J., Watson, A.B.: JPEG 2000 encoding with perceptual distortion control. IEEE Trans. Image Process. 15(7), 1763–1778 (2006)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470 (2019)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Murata, A., Gallese, V., Luppino, G., Kaseda, M., Sakata, H.: Selectivity for the shape, size, and orientation of objects for grasping in neurons of monkey parietal area AIP. J. Neurophysiol. 83(5), 2580–2601 (2000)
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019)
Pathak, B., Barooah, D.: Texture analysis based on the gray-level co-occurrence matrix considering possible orientations. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2(9), 4206–4212 (2013)
de Queiroz, R.L.: Processing JPEG-compressed images and documents. IEEE Trans. Image Process. 7(12), 1661–1672 (1998)
Shaham, T.R., Gharbi, M., Zhang, R., Shechtman, E., Michaeli, T.: Spatially-adaptive pixelwise networks for fast image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14882–14891 (2021)
Skorokhodov, I., Ignatyev, S., Elhoseiny, M.: Adversarial generation of continuous images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10753–10764 (2021)
Strümpler, Y., Postels, J., Yang, R., Gool, L.V., Tombari, F.: Implicit neural representations for image compression. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol. 13686, pp. 74–91. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19809-0_5
Tan, Z., Chen, J., Kang, Q., Zhou, M., Abusorrah, A., Sedraoui, K.: Dynamic embedding projection-gated convolutional neural networks for text classification. IEEE Trans. Neural Networks Learn. Syst. 33(3), 973–982 (2021)
Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural. Inf. Process. Syst. 33, 7537–7547 (2020)
Veit, A., Belongie, S.: Convolutional networks with adaptive inference graphs. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–18 (2018)
Wang, X., Yu, F., Dou, Z.Y., Darrell, T., Gonzalez, J.E.: SkipNet: learning dynamic routing in convolutional networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 409–424 (2018)
Xu, X., Wang, Z., Shi, H.: UltraSR: spatial encoding is a missing key for implicit image function-based arbitrary-scale super-resolution. arXiv preprint arXiv:2103.12716 (2021)
Yang, B., Bender, G., Le, Q.V., Ngiam, J.: CondConv: conditionally parameterized convolutions for efficient inference. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Yang, L., Han, Y., Chen, X., Song, S., Dai, J., Huang, G.: Resolution adaptive networks for efficient inference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2369–2378 (2020)
Yu, H., Winkler, S.: Image complexity and spatial information. In: 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), pp. 12–17. IEEE (2013)
Yüce, G., Ortiz-Jiménez, G., Besbinar, B., Frossard, P.: A structured dictionary perspective on implicit neural representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19228–19238 (2022)
Zemliachenko, A., Kozhemiakin, R., Vozel, B., Lukin, V.: Prediction of compression ratio in lossy compression of noisy images. In: 2016 13th International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science (TCSET), pp. 693–697. IEEE (2016)
Appendices
A Overall Process Pseudocode
The pseudocode of the method in this paper is given below.
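The pseudocode figure is not reproduced in this version. As a rough sketch of the overall process (Sobel-based complexity scoring followed by adaptive selection of network depth and width), the following Python fragment illustrates the idea. The complexity thresholds are illustrative assumptions, not the paper's settings; the depth values 7/8/9 and r values 1/2/3 echo the numbers reported for easy/medium/difficult images in Appendices C.2 and C.3.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def _filter2d_valid(img, kernel):
    """Valid-mode 3x3 cross-correlation (no padding), NumPy only."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def sobel_complexity(image):
    """Mean Sobel gradient magnitude as a scalar complexity score."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    gx = _filter2d_valid(gray, SOBEL_X)
    gy = _filter2d_valid(gray, SOBEL_Y)
    return float(np.hypot(gx, gy).mean())

def select_architecture(score, thresholds=(20.0, 60.0)):
    """Map a complexity score to (depth, width multiplier r).
    Thresholds are placeholder values; the depth/r pairs follow the
    easy/medium/difficult settings discussed in the appendices."""
    if score < thresholds[0]:
        return 7, 1   # easy image
    if score < thresholds[1]:
        return 8, 2   # medium image
    return 9, 3       # difficult image
```

A flat image scores 0 and is routed to the shallowest, narrowest network, while an image with strong edges is routed to the deepest, widest one.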
B Class Meta-Learning \(M_{\psi }\)
We store each image in the dataset explicitly through its \(M_{\psi }\) and evaluate every coordinate point when reconstructing the image. Denote the output y as
Only the \(M_{\psi }\) of each image needs to be overfitted to that image by the base network, while the difference between the reconstructions of the base dictionary model and the data is minimized over the whole dataset.
In COIN++ [7], a variant of MAML is applied to generate well-initialized network parameters. While COIN++ emphasizes the generalization of the model, obtaining updated network parameters through a few gradient-descent steps, our experiments perform multi-task learning to overfit \(M_\psi \). We meta-learn a \(\theta \) such that the storage matrix \(M_{\psi }\) can be overfitted at each new data point. Therefore, in the inner loop, \(M_\psi \) is learned in the following way:
In the outer loop, the network parameters \(\theta \) are updated using the errors generated for each data point:
The outer-loop base model learns as many dictionary frequencies as possible, while the inner loop performs gradient updates on each image’s storage matrix, guiding the generated mask to correctly select the appropriate frequencies from the base dictionary and thereby achieve multi-image compression.
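The inner/outer loop above can be sketched as a small first-order meta-learning loop. This is a toy illustration, not the paper's implementation: the shared dictionary is reduced to a plain matrix `theta`, the storage matrix \(M_\psi\) to a per-image code vector `m`, and the network to a linear map, so the inner loop overfits `m` per image while the outer loop updates `theta` from the per-image errors.

```python
import numpy as np

def inner_loop(theta, y, steps=100):
    """Inner loop: overfit a per-image code m (a stand-in for M_psi)
    by gradient descent on ||theta @ m - y||^2, with theta frozen."""
    # Safe 1/L step size for this quadratic (L = 2 * sigma_max(theta)^2).
    lr = 1.0 / (2.0 * np.linalg.norm(theta, 2) ** 2 + 1e-8)
    m = np.zeros(theta.shape[1])
    for _ in range(steps):
        m -= lr * 2.0 * theta.T @ (theta @ m - y)
    return m

def outer_step(theta, images, outer_lr=0.01):
    """Outer loop: first-order meta-update of the shared dictionary theta
    from the reconstruction errors of the overfitted per-image codes."""
    grad = np.zeros_like(theta)
    for y in images:
        m = inner_loop(theta, y)
        grad += 2.0 * np.outer(theta @ m - y, m)
    return theta - outer_lr * grad / len(images)
```

Iterating `outer_step` over the dataset drives the shared `theta` toward a dictionary whose per-image codes reconstruct every image well, mirroring the division of labor described above.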
C Experimental Details
C.1 Verify the Need for Dynamic Networks
C.2 Analysis of Experimental Results of Ablation Dynamic Depth
We selected 20 images from each complexity category, maintaining their original width (r) within the category. We then calculated the average image reconstruction effects while increasing the number of network layers.
According to Fig. 5(a), as image complexity gradually increased, a greater number of network layers was required to achieve better image reconstruction, at the cost of relatively more resources. For instance, easy images yielded the best result at 7 layers, medium images at 8 layers, and difficult images required a deeper network, with the best result at 9 layers. This demonstrates that the appropriate network depth varies with image complexity: deeper networks are necessary for more complex images to achieve superior reconstruction.
Our analysis thus confirms the importance of dynamically adapting the network depth in our approach, which allows optimal reconstruction results across images of different complexities.
C.3 Analysis of Experimental Results of Ablation Dynamic Width
For each category, we selected 20 images and fixed their optimal depths (verified in the previous experiment). We then calculated their average image reconstruction values while increasing the network width (by adjusting the r value).
Specifically, we measured the following average reconstruction values: r=1 (easy: 37.8, medium: 35.2, difficult: 37.8), r=2 (easy: 38.03, medium: 35.88, difficult: 35.23), r=3 (easy: 37.8, medium: 35.3, difficult: 35.9). From Fig. 5(b) and these results, it can be seen intuitively that the best reconstruction is achieved with r=2 for medium-complexity images and r=3 for difficult images, which coincides with their width settings. For easy images, as the r value increases (i.e., the network becomes wider), reconstruction quality first improves slightly and then falls back to the initial level. This is because an easy image requires fewer frequency components [33], while a wider network adds computational overhead with little optimization benefit. The optimal r value for easy images is therefore 1, the same as their initial width, and the optimal r values for medium and difficult images are 2 and 3, respectively; their reconstruction results at other r values follow the same trend.
These results further validate the necessity of dynamically adapting the network width in our method, which can adapt to images of different complexity to achieve the best reconstruction results.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, B., Zhang, Y., Hu, Y., Dai, S., Huang, Z., Chao, F. (2024). Dynamic Neural Networks for Adaptive Implicit Image Compression. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8551-7
Online ISBN: 978-981-99-8552-4