Abstract
Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone to human error. To overcome these limitations, this work takes a novel step towards building a weight generator that synthesizes neural weights for initialization. We use the image-to-image translation task with generative adversarial networks (GANs) as an example because model weights spanning a wide range of concepts are easy to collect. Specifically, we first build a dataset of diverse image-editing concepts and their corresponding trained weights, which is later used to train the weight generator. To handle the differing characteristics across layers and the substantial number of weights to be predicted, we divide the weights into equal-sized blocks and assign each block an index. A diffusion model is then trained on this dataset, conditioned on both the text description of the concept and the block index. By initializing the image translation model with the denoised weights predicted by our diffusion model, training takes only 43.3 seconds. Compared to training from scratch (i.e., Pix2pix), we achieve a \(15\times\) acceleration in training time for a new concept while obtaining even better image generation quality. We will release our dataset, code, and the pre-trained weight generator.
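To make the block-wise formulation in the abstract concrete, below is a minimal sketch (not the authors' code) of how generator weights could be flattened into equal-sized, indexed blocks and how a denoiser could be conditioned on both a text embedding of the editing concept and the block index. All names here (BLOCK_SIZE, weights_to_blocks, BlockDenoiser) are hypothetical, and the toy MLP denoiser stands in for the paper's actual diffusion architecture.

# A minimal sketch of the block-wise weight representation described in the
# abstract: weights are flattened, split into equal-sized blocks, and each block
# is paired with its index so a conditional model can denoise one block at a time.
import torch
import torch.nn as nn

BLOCK_SIZE = 4096  # assumed block length; the paper's exact value may differ


def weights_to_blocks(model: nn.Module):
    """Flatten all parameters, zero-pad to a multiple of BLOCK_SIZE,
    and return (blocks, block_indices)."""
    flat = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
    pad = (-flat.numel()) % BLOCK_SIZE
    flat = torch.cat([flat, flat.new_zeros(pad)])
    blocks = flat.view(-1, BLOCK_SIZE)       # [num_blocks, BLOCK_SIZE]
    indices = torch.arange(blocks.shape[0])  # one index per block
    return blocks, indices


class BlockDenoiser(nn.Module):
    """Toy denoiser conditioned on a text embedding and the block index."""

    def __init__(self, num_blocks: int, text_dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.index_emb = nn.Embedding(num_blocks, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.net = nn.Sequential(
            nn.Linear(BLOCK_SIZE + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, BLOCK_SIZE),
        )

    def forward(self, noisy_block, block_idx, text_emb, t):
        # The diffusion timestep t is folded into the conditioning for brevity.
        cond = self.index_emb(block_idx) + self.text_proj(text_emb) + t.unsqueeze(-1)
        return self.net(torch.cat([noisy_block, cond], dim=-1))


# Usage sketch: predict a denoised weight block for one editing concept.
gen = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 3, 3))
blocks, idx = weights_to_blocks(gen)
denoiser = BlockDenoiser(num_blocks=blocks.shape[0])
text_emb = torch.randn(1, 512)  # stand-in for a CLIP text embedding of the concept
noisy = blocks[:1] + torch.randn_like(blocks[:1])
pred = denoiser(noisy, idx[:1], text_emb, t=torch.tensor([10.0]))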
Y. Gong: Work done during an internship at Snap Inc.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gong, Y. et al. (2025). Efficient Training with Denoised Neural Weights. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15141. Springer, Cham. https://doi.org/10.1007/978-3-031-73010-8_2
DOI: https://doi.org/10.1007/978-3-031-73010-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73009-2
Online ISBN: 978-3-031-73010-8
eBook Packages: Computer Science (R0)