
Dynamic Neural Networks for Adaptive Implicit Image Compression

Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14435)


Abstract

Compression with Implicit Neural Representations (COIN) is a neural-network image compression method based on a multilayer perceptron (MLP). COIN encodes an image with an MLP that maps pixel positions to the corresponding RGB values; the weights of the MLP are then quantized to obtain a code that is stored in place of the image. However, this single implicit network structure performs only moderately when dealing with images of varying complexity. In this paper, we propose a novel implicit dynamic neural network that processes images in a dynamic, adaptive manner. Specifically, we use the Sobel operator to grade the complexity of the images and use that grade as a criterion for adaptively selecting the network width and depth. To better fit the image features, we further quantize the dynamic network parameters and storage matrices, so that only the relevant subset of network parameters, together with their storage matrices, needs to be kept when storing an image. To train this dynamic network, we use a meta-learning approach for the multi-image compression task. Experimental results show that our method outperforms COIN and JPEG in terms of image reconstruction quality on the CIFAR-10 dataset.
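The abstract only names the Sobel-based complexity criterion; as a rough illustration of the idea, a minimal sketch is given below. The mean-gradient score and the two thresholds are hypothetical placeholders, not values taken from the paper.

```python
import numpy as np
from scipy import ndimage

def sobel_complexity(image: np.ndarray) -> str:
    """Classify image complexity via the mean Sobel gradient magnitude.

    `image` is a 2-D grayscale array scaled to [0, 1]; the thresholds
    below are illustrative placeholders, not the paper's values.
    """
    gx = ndimage.sobel(image, axis=0)  # vertical gradients
    gy = ndimage.sobel(image, axis=1)  # horizontal gradients
    score = np.hypot(gx, gy).mean()    # average edge strength
    if score < 0.1:
        return "easy"
    if score < 0.25:
        return "medium"
    return "difficult"
```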


References

  1. Anokhin, I., Demochkin, K., Khakhulin, T., Sterkin, G., Lempitsky, V., Korzhenkov, D.: Image generators with conditionally-independent pixel synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14278–14287 (2021)


  2. Bisoi, R., Dash, P.K.: A hybrid evolutionary dynamic neural network for stock market trend analysis and prediction using unscented Kalman filter. Appl. Soft Comput. 19, 41–56 (2014)


  3. Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: International Conference on Machine Learning, pp. 527–536. PMLR (2017)


  4. Chen, Y., Liu, S., Wang, X.: Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8628–8638 (2021)


  5. Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019)


  6. Dupont, E., Goliński, A., Alizadeh, M., Teh, Y.W., Doucet, A.: COIN: compression with implicit neural representations. arXiv preprint arXiv:2103.03123 (2021)

  7. Dupont, E., Loya, H., Alizadeh, M., Goliński, A., Teh, Y.W., Doucet, A.: COIN++: neural compression across modalities. arXiv preprint arXiv:2201.12904 (2022)

  8. Dupont, E., Teh, Y.W., Doucet, A.: Generative models as distributions of functions. arXiv preprint arXiv:2102.04776 (2021)

  9. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126–1135. PMLR (2017)


  10. Franzen, R.: Kodak lossless true color image suite (1999). http://r0k.us/graphics/kodak/

  11. Han, Y., Huang, G., Song, S., Yang, L., Wang, H., Wang, Y.: Dynamic neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 7436–7456 (2021)


  12. Huang, G., Chen, D., Li, T., Wu, F., Van Der Maaten, L., Weinberger, K.Q.: Multi-scale dense networks for resource efficient image classification. arXiv preprint arXiv:1703.09844 (2017)

  13. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)


  14. Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1891–1900 (2019)


  15. Liu, Z., Karam, L.J., Watson, A.B.: JPEG 2000 encoding with perceptual distortion control. IEEE Trans. Image Process. 15(7), 1763–1778 (2006)


  16. Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470 (2019)


  17. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)


  18. Murata, A., Gallese, V., Luppino, G., Kaseda, M., Sakata, H.: Selectivity for the shape, size, and orientation of objects for grasping in neurons of monkey parietal area AIP. J. Neurophysiol. 83(5), 2580–2601 (2000)


  19. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019)


  20. Pathak, B., Barooah, D.: Texture analysis based on the gray-level co-occurrence matrix considering possible orientations. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2(9), 4206–4212 (2013)


  21. de Queiroz, R.L.: Processing JPEG-compressed images and documents. IEEE Trans. Image Process. 7(12), 1661–1672 (1998)


  22. Shaham, T.R., Gharbi, M., Zhang, R., Shechtman, E., Michaeli, T.: Spatially-adaptive pixelwise networks for fast image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14882–14891 (2021)


  23. Skorokhodov, I., Ignatyev, S., Elhoseiny, M.: Adversarial generation of continuous images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10753–10764 (2021)


  24. Strümpler, Y., Postels, J., Yang, R., Gool, L.V., Tombari, F.: Implicit neural representations for image compression. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol. 13686, pp. 74–91. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19809-0_5

  25. Tan, Z., Chen, J., Kang, Q., Zhou, M., Abusorrah, A., Sedraoui, K.: Dynamic embedding projection-gated convolutional neural networks for text classification. IEEE Trans. Neural Networks Learn. Syst. 33(3), 973–982 (2021)


  26. Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural. Inf. Process. Syst. 33, 7537–7547 (2020)


  27. Veit, A., Belongie, S.: Convolutional networks with adaptive inference graphs. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–18 (2018)


  28. Wang, X., Yu, F., Dou, Z.Y., Darrell, T., Gonzalez, J.E.: SkipNet: learning dynamic routing in convolutional networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 409–424 (2018)


  29. Xu, X., Wang, Z., Shi, H.: UltraSR: spatial encoding is a missing key for implicit image function-based arbitrary-scale super-resolution. arXiv preprint arXiv:2103.12716 (2021)

  30. Yang, B., Bender, G., Le, Q.V., Ngiam, J.: CondConv: conditionally parameterized convolutions for efficient inference. In: Advances in Neural Information Processing Systems, vol. 32 (2019)


  31. Yang, L., Han, Y., Chen, X., Song, S., Dai, J., Huang, G.: Resolution adaptive networks for efficient inference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2369–2378 (2020)


  32. Yu, H., Winkler, S.: Image complexity and spatial information. In: 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), pp. 12–17. IEEE (2013)


  33. Yüce, G., Ortiz-Jiménez, G., Besbinar, B., Frossard, P.: A structured dictionary perspective on implicit neural representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19228–19238 (2022)


  34. Zemliachenko, A., Kozhemiakin, R., Vozel, B., Lukin, V.: Prediction of compression ratio in lossy compression of noisy images. In: 2016 13th International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science (TCSET), pp. 693–697. IEEE (2016)



Author information

Correspondence to Fei Chao.

Appendices

A Overall Process Pseudocode

The pseudocode of the method in this paper is given in Algorithm 1.

Algorithm 1. Employ dynamic neural networks to compress images of different complexities with implicit neural representations.
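Since the body of Algorithm 1 appears only as a figure, the sketch below outlines the pipeline its caption describes, under stated assumptions: the (depth, r) pairs echo the per-complexity optima reported in Appendix C, while the ReLU MLP (COIN-style networks use sine activations), the base width, the learning rate, and the fp16 quantization are illustrative stand-ins rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Per-complexity (depth, r) choices; the pairs echo the optima reported
# in Appendix C, but everything else in this sketch is illustrative.
ARCH = {"easy": (7, 1), "medium": (8, 2), "difficult": (9, 3)}

def build_mlp(depth: int, r: int, base_width: int = 16) -> nn.Sequential:
    """MLP from 2-D pixel coordinates to RGB; width scales with r.
    (COIN uses sine activations; ReLU is used here for brevity.)"""
    width = base_width * r
    layers = [nn.Linear(2, width), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 3))
    return nn.Sequential(*layers)

def compress(image: torch.Tensor, complexity: str, steps: int = 1000) -> dict:
    """Overfit an adaptively sized MLP to one (h, w, 3) image, then
    quantize its weights (here simply to fp16) as the stored code."""
    h, w, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)

    depth, r = ARCH[complexity]
    net = build_mlp(depth, r)
    opt = torch.optim.Adam(net.parameters(), lr=2e-4)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(coords) - targets) ** 2).mean()  # per-pixel MSE
        loss.backward()
        opt.step()

    # The quantized weights replace the image as its compressed code.
    return {k: v.half() for k, v in net.state_dict().items()}
```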

B Class Meta-Learning \(M_{\psi }\)

Each image in the dataset is stored explicitly through its \(M_{\psi }\), and every coordinate point is evaluated when reconstructing the image. The output y is denoted as

$$\begin{aligned} y = f_{\theta }\left( x,M_{\psi }, t\right) \end{aligned}$$
(8)

Only the \(M_{\psi }\) of each image needs to be overfitted to that image by the base network, while the difference between the reconstruction produced by the base dictionary model over the whole dataset and the data itself is minimized:

$$\begin{aligned} L(x,M_\psi ,d)=\sum \limits _{i=1}^{n}\left\| f_\theta (x_i,M_{\psi }, t)- y_i\right\| _{2} \end{aligned}$$
(9)
$$\begin{aligned} \min _{\theta ,M_{\psi }}L(x,M_\psi ,d) \end{aligned}$$
(10)

In COIN++ [7], a special form of MAML is applied to generate well-initialized network parameters. Whereas COIN++ emphasizes model generalization, obtaining updated network parameters after a few gradient-descent steps, our experiments perform multi-task learning to overfit \(M_\psi \): meta-learning must find a \(\theta \) that lets the storage matrix \(M_{\psi }\) overfit each new data point. Therefore, in the inner loop, \(M_\psi \) is learned as follows:

$$\begin{aligned} M_{\psi }^{(j)} \leftarrow M_{\psi }^{(j)} - \alpha \nabla _{M_{\psi }}L\left( \theta , M_{\psi }^{(j)}, d^{(j)}\right) \end{aligned}$$
(11)

In the outer loop, the network parameters \(\theta \) are updated using the errors generated for each data point:

$$\begin{aligned} \theta \leftarrow \theta -\beta \nabla _{\theta }\sum _{j=1}^{N}L\left( \theta , M_\psi ^{(j)}, d^{(j)}\right) \end{aligned}$$
(12)

In the outer loop, the base model learns as many dictionary frequencies as possible, while the inner loop performs gradient updates on each image's storage matrix, guiding the generated mask to select the appropriate frequencies from the base dictionary and thus achieve multi-image compression.
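The extract does not spell out the interface of the modulated forward pass in Eq. (8), so the sketch below illustrates the inner update of Eq. (11) and the outer update of Eq. (12) as a first-order simplification (gradients do not flow back through the inner adaptation). The signature net(coords, M, t), the step counts, and the squared-error loss follow Eqs. (8) and (9) but are otherwise assumptions.

```python
import torch

def inner_update(net, M_psi, coords, targets, t, alpha, steps=3):
    """Eq. (11): adapt the per-image storage matrix M_psi^(j) by
    gradient descent while the shared parameters theta stay fixed."""
    M = M_psi.clone().requires_grad_(True)
    for _ in range(steps):
        pred = net(coords, M, t)               # y = f_theta(x, M_psi, t), Eq. (8)
        loss = ((pred - targets) ** 2).sum()
        (grad,) = torch.autograd.grad(loss, M)
        M = (M - alpha * grad).detach().requires_grad_(True)
    return M

def outer_step(net, opt, batch, t, alpha):
    """Eq. (12): update theta from the summed losses of the adapted
    storage matrices M_psi^(j) over the batch of data points d^(j)."""
    opt.zero_grad()
    total = 0.0
    for coords, targets, M_psi in batch:       # one tuple per data point d^(j)
        M_j = inner_update(net, M_psi, coords, targets, t, alpha)
        total = total + ((net(coords, M_j, t) - targets) ** 2).sum()
    total.backward()                           # gradients w.r.t. theta only
    opt.step()
```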

C Experimental Details

C.1 Verify the Need for Dynamic Networks

Fig. 4. Verifying the necessity of dynamic neural networks. Twenty images of each complexity class are taken, and the network depth or width is varied to obtain the reconstruction quality, measured by PSNR, of each class under a specific network.

C.2 Analysis of Experimental Results of Ablating Dynamic Depth

We selected 20 images from each complexity category, keeping their original width (r) within the category, and calculated the average image reconstruction quality while increasing the number of network layers.

As Fig. 5(a) shows, more network layers were required to achieve better reconstruction as image complexity increased, at the cost of relatively more resources. For instance, easy images reached their best quality at 7 layers, medium images at 8 layers, and difficult images required a deeper network, peaking at 9 layers. The appropriate network depth therefore varies with image complexity: deeper networks are necessary for more complex images to achieve superior reconstruction. This affirms the importance of dynamically adapting the network depth in our approach, which allows optimal reconstruction results across images of different complexities.

Table 2. Image reconstruction quality (PSNR) at fixed width and varying depth for images of different complexity


C.3 Analysis of Experimental Results of Ablating Dynamic Width

Fig. 5. Results of the ablation experiments on dynamic neural networks. For the dynamic-depth ablation, the network width (i.e., the r value) is fixed and the depth is gradually increased, comparing the optimal network depth for images of different complexity. For the dynamic-width ablation, the network depth is fixed and the width is gradually increased (i.e., the r value is raised), comparing the optimal network width for images of different complexity.

For each category, we selected 20 images and fixed their optimal depths (verified in the previous experiment). Their average reconstruction quality was then calculated as the network width increased (by raising the r value).

Specifically, we measured the following reconstruction quality (PSNR) for each r value: r = 1 (easy: 37.8, medium: 35.2, difficult: 37.8); r = 2 (easy: 38.03, medium: 35.88, difficult: 35.23); r = 3 (easy: 37.8, medium: 35.3, difficult: 35.9). As Fig. 5(b) shows, the best results are achieved with an r value of 2 for medium-complexity images and an r value of 3 for difficult images, coinciding with their width settings. In contrast, for easy images, as the r value increases (i.e., the network width grows), the reconstruction quality first improves slightly and then falls back to its initial level. This is because an easy image requires fewer frequency components [33], while widening the network adds computational overhead with little optimization benefit. The optimal r value for easy images is therefore 1, the same as their initial width, while the optimal r values for medium and difficult images are 2 and 3, respectively; their reconstruction quality at other r values follows the same trend.

These results further validate the necessity of dynamically adapting the network width in our method, which can adapt to images of different complexity to achieve the best reconstruction results.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Huang, B., Zhang, Y., Hu, Y., Dai, S., Huang, Z., Chao, F. (2024). Dynamic Neural Networks for Adaptive Implicit Image Compression. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_34


  • DOI: https://doi.org/10.1007/978-981-99-8552-4_34


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8551-7

  • Online ISBN: 978-981-99-8552-4

