Abstract
Compression with Implicit Neural Representations (COIN) is a neural image compression method based on multilayer perceptrons (MLPs). COIN encodes an image with an MLP that maps pixel coordinates to RGB values; the weights of this MLP are quantized to obtain a code that is stored in place of the image. However, this single fixed network structure performs only moderately when dealing with images of varying complexity. In this paper, we propose a novel implicit dynamic neural network that processes images in a dynamic and adaptive manner. Specifically, we use the Sobel operator to classify images by complexity and use this classification as a criterion for adaptively selecting the network width and depth. To better fit image features, we further quantize the dynamic network parameters and storage matrices, so that only the relevant subset of network parameters and their storage matrices needs to be kept when storing an image. To train this dynamic network, we use a meta-learning approach for the multi-image compression task. Experimental results show that our method outperforms COIN and JPEG in terms of image reconstruction quality on the CIFAR-10 dataset.
References
Anokhin, I., Demochkin, K., Khakhulin, T., Sterkin, G., Lempitsky, V., Korzhenkov, D.: Image generators with conditionally-independent pixel synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14278–14287 (2021)
Bisoi, R., Dash, P.K.: A hybrid evolutionary dynamic neural network for stock market trend analysis and prediction using unscented Kalman filter. Appl. Soft Comput. 19, 41–56 (2014)
Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: International Conference on Machine Learning, pp. 527–536. PMLR (2017)
Chen, Y., Liu, S., Wang, X.: Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8628–8638 (2021)
Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019)
Dupont, E., Goliński, A., Alizadeh, M., Teh, Y.W., Doucet, A.: COIN: compression with implicit neural representations. arXiv preprint arXiv:2103.03123 (2021)
Dupont, E., Loya, H., Alizadeh, M., Goliński, A., Teh, Y.W., Doucet, A.: COIN++: neural compression across modalities. arXiv preprint arXiv:2201.12904 (2022)
Dupont, E., Teh, Y.W., Doucet, A.: Generative models as distributions of functions. arXiv preprint arXiv:2102.04776 (2021)
Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126–1135. PMLR (2017)
Franzen, R.: Kodak lossless true color image suite (1999). http://r0k.us/graphics/kodak/
Han, Y., Huang, G., Song, S., Yang, L., Wang, H., Wang, Y.: Dynamic neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 7436–7456 (2021)
Huang, G., Chen, D., Li, T., Wu, F., Van Der Maaten, L., Weinberger, K.Q.: Multi-scale dense networks for resource efficient image classification. arXiv preprint arXiv:1703.09844 (2017)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1891–1900 (2019)
Liu, Z., Karam, L.J., Watson, A.B.: JPEG 2000 encoding with perceptual distortion control. IEEE Trans. Image Process. 15(7), 1763–1778 (2006)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470 (2019)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Murata, A., Gallese, V., Luppino, G., Kaseda, M., Sakata, H.: Selectivity for the shape, size, and orientation of objects for grasping in neurons of monkey parietal area AIP. J. Neurophysiol. 83(5), 2580–2601 (2000)
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019)
Pathak, B., Barooah, D.: Texture analysis based on the gray-level co-occurrence matrix considering possible orientations. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2(9), 4206–4212 (2013)
de Queiroz, R.L.: Processing JPEG-compressed images and documents. IEEE Trans. Image Process. 7(12), 1661–1672 (1998)
Shaham, T.R., Gharbi, M., Zhang, R., Shechtman, E., Michaeli, T.: Spatially-adaptive pixelwise networks for fast image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14882–14891 (2021)
Skorokhodov, I., Ignatyev, S., Elhoseiny, M.: Adversarial generation of continuous images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10753–10764 (2021)
Strümpler, Y., Postels, J., Yang, R., Gool, L.V., Tombari, F.: Implicit neural representations for image compression. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol. 13686, pp. 74–91. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19809-0_5
Tan, Z., Chen, J., Kang, Q., Zhou, M., Abusorrah, A., Sedraoui, K.: Dynamic embedding projection-gated convolutional neural networks for text classification. IEEE Trans. Neural Networks Learn. Syst. 33(3), 973–982 (2021)
Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural. Inf. Process. Syst. 33, 7537–7547 (2020)
Veit, A., Belongie, S.: Convolutional networks with adaptive inference graphs. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–18 (2018)
Wang, X., Yu, F., Dou, Z.Y., Darrell, T., Gonzalez, J.E.: SkipNet: learning dynamic routing in convolutional networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 409–424 (2018)
Xu, X., Wang, Z., Shi, H.: UltraSR: spatial encoding is a missing key for implicit image function-based arbitrary-scale super-resolution. arXiv preprint arXiv:2103.12716 (2021)
Yang, B., Bender, G., Le, Q.V., Ngiam, J.: CondConv: conditionally parameterized convolutions for efficient inference. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Yang, L., Han, Y., Chen, X., Song, S., Dai, J., Huang, G.: Resolution adaptive networks for efficient inference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2369–2378 (2020)
Yu, H., Winkler, S.: Image complexity and spatial information. In: 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), pp. 12–17. IEEE (2013)
Yüce, G., Ortiz-Jiménez, G., Besbinar, B., Frossard, P.: A structured dictionary perspective on implicit neural representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19228–19238 (2022)
Zemliachenko, A., Kozhemiakin, R., Vozel, B., Lukin, V.: Prediction of compression ratio in lossy compression of noisy images. In: 2016 13th International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science (TCSET), pp. 693–697. IEEE (2016)
Appendices
A Overall Process Pseudocode
The pseudocode of the method in this paper is given below.
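The pseudocode figure is not reproduced in this version. As a rough sketch of the overall process (Sobel-based complexity scoring followed by adaptive selection of network depth and width), the following Python fragment illustrates the idea. The complexity thresholds are illustrative assumptions, not the paper's settings; the depth values 7/8/9 and r values 1/2/3 echo the numbers reported for easy/medium/difficult images in Appendices C.2 and C.3.

```python
import numpy as np

# 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def _filter2d_valid(img, kernel):
    """Valid-mode 3x3 cross-correlation (no padding), NumPy only."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def sobel_complexity(image):
    """Mean Sobel gradient magnitude as a scalar complexity score."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    gx = _filter2d_valid(gray, SOBEL_X)
    gy = _filter2d_valid(gray, SOBEL_Y)
    return float(np.hypot(gx, gy).mean())

def select_architecture(score, thresholds=(20.0, 60.0)):
    """Map a complexity score to (depth, width multiplier r).
    Thresholds are placeholder values; the depth/r pairs follow the
    easy/medium/difficult settings discussed in the appendices."""
    if score < thresholds[0]:
        return 7, 1   # easy image
    if score < thresholds[1]:
        return 8, 2   # medium image
    return 9, 3       # difficult image
```

A flat image scores 0 and is routed to the shallowest, narrowest network, while an image with strong edges is routed to the deepest, widest one.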
B Class Meta-Learning \(M_{\psi }\)
We store each image in the dataset explicitly through its \(M_{\psi }\) and evaluate every coordinate point when reconstructing the image. Denote the output y as
Only the \(M_{\psi }\) of each image needs to be overfitted to that image by the base network, while the difference between the reconstructions of the base dictionary model and the data is minimized over the whole dataset.
In COIN++ [7], a variant of MAML is applied to generate well-initialized network parameters. While COIN++ emphasizes the generalization of the model, obtaining updated network parameters through a few gradient-descent steps, our experiments perform multi-task learning to overfit \(M_\psi \). We meta-learn a \(\theta \) such that the storage matrix \(M_{\psi }\) can be overfitted at each new data point. Therefore, in the inner loop, \(M_\psi \) is learned in the following way:
In the outer loop, the network parameters \(\theta \) are updated using the errors generated for each data point:
The outer-loop base model learns as many dictionary frequencies as possible, while the inner loop performs gradient updates on each image’s storage matrix, guiding the generated mask to correctly select the appropriate frequencies from the base dictionary and thereby achieve multi-image compression.
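The inner/outer loop above can be sketched as a small first-order meta-learning loop. This is a toy illustration, not the paper's implementation: the shared dictionary is reduced to a plain matrix `theta`, the storage matrix \(M_\psi\) to a per-image code vector `m`, and the network to a linear map, so the inner loop overfits `m` per image while the outer loop updates `theta` from the per-image errors.

```python
import numpy as np

def inner_loop(theta, y, steps=100):
    """Inner loop: overfit a per-image code m (a stand-in for M_psi)
    by gradient descent on ||theta @ m - y||^2, with theta frozen."""
    # Safe 1/L step size for this quadratic (L = 2 * sigma_max(theta)^2).
    lr = 1.0 / (2.0 * np.linalg.norm(theta, 2) ** 2 + 1e-8)
    m = np.zeros(theta.shape[1])
    for _ in range(steps):
        m -= lr * 2.0 * theta.T @ (theta @ m - y)
    return m

def outer_step(theta, images, outer_lr=0.01):
    """Outer loop: first-order meta-update of the shared dictionary theta
    from the reconstruction errors of the overfitted per-image codes."""
    grad = np.zeros_like(theta)
    for y in images:
        m = inner_loop(theta, y)
        grad += 2.0 * np.outer(theta @ m - y, m)
    return theta - outer_lr * grad / len(images)
```

Iterating `outer_step` over the dataset drives the shared `theta` toward a dictionary whose per-image codes reconstruct every image well, mirroring the division of labor described above.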
C Experimental Details
C.1 Verify the Need for Dynamic Networks
C.2 Analysis of Experimental Results of Ablation Dynamic Depth
We selected 20 images from each complexity category, maintaining their original width (r) within the category. We then calculated the average image reconstruction effects while increasing the number of network layers.
According to Fig. 5(a), as image complexity gradually increased, a greater number of network layers was required to achieve better image reconstruction, at the cost of relatively more resources. For instance, easy images yielded the best result at 7 layers, medium images at 8 layers, and difficult images required a deeper network, with the best result at 9 layers. This demonstrates that the appropriate network depth varies with image complexity: deeper networks are necessary for more complex images to achieve superior reconstruction.
Our analysis thus confirms the importance of dynamically adapting the network depth in our approach, which allows optimal reconstruction results across images of different complexities.
C.3 Analysis of Experimental Results of Ablation Dynamic Width
For each category, we selected 20 images and fixed their optimal depths (verified in the previous experiment). We then calculated their average image reconstruction values while increasing the network width (by adjusting the r value).
Specifically, we measured the following average reconstruction values: r=1 (easy: 37.8, medium: 35.2, difficult: 37.8), r=2 (easy: 38.03, medium: 35.88, difficult: 35.23), r=3 (easy: 37.8, medium: 35.3, difficult: 35.9). From Fig. 5(b) and these results, it can be seen intuitively that the best reconstruction is achieved with r=2 for medium-complexity images and r=3 for difficult images, which coincides with their width settings. For easy images, as the r value increases (i.e., the network becomes wider), reconstruction quality first improves slightly and then falls back to the initial level. This is because an easy image requires fewer frequency components [33], while a wider network adds computational overhead with little optimization benefit. The optimal r value for easy images is therefore 1, the same as their initial width, and the optimal r values for medium and difficult images are 2 and 3, respectively; their reconstruction results at other r values follow the same trend.
These results further validate the necessity of dynamically adapting the network width in our method, which can adapt to images of different complexity to achieve the best reconstruction results.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, B., Zhang, Y., Hu, Y., Dai, S., Huang, Z., Chao, F. (2024). Dynamic Neural Networks for Adaptive Implicit Image Compression. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14435. Springer, Singapore. https://doi.org/10.1007/978-981-99-8552-4_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8551-7
Online ISBN: 978-981-99-8552-4