
Comparative Profiling: Insights Into Latent Diffusion Model Training

Published: 22 April 2024
DOI: 10.1145/3642970.3655847

ABSTRACT

Generative AI models are at the forefront of advancing creative and analytical tasks, pushing the boundaries of what machines can generate and comprehend. Among these, latent diffusion models represent a significant advance in generating high-fidelity audio and images. This study introduces a systematic approach to studying GPU utilisation during the training of these models, leveraging Weights & Biases and the PyTorch Profiler for detailed monitoring and profiling. Our methodology is designed to uncover inefficiencies in GPU resource allocation and to pinpoint bottlenecks in the training pipeline. The insights gained aim to guide strategies for improving training efficiency, potentially reducing computational costs and accelerating the development cycle of generative AI models. This contribution not only highlights the critical role of resource optimisation in scaling AI technologies but also opens new avenues for research in efficient model training.
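
To make the instrumentation concrete, the sketch below shows one way the two tools named in the abstract might be wired together: a training loop wrapped in torch.profiler while per-step metrics stream to Weights & Biases. This is a minimal illustration, not the paper's actual pipeline; the MLP stands in for a latent diffusion denoiser, and the project name and profiling schedule are hypothetical choices.

```python
# Minimal sketch (not the authors' setup): profile a training loop with
# torch.profiler while logging step metrics to Weights & Biases.
import torch
import torch.nn as nn
import wandb
from torch.profiler import (ProfilerActivity, profile, schedule,
                            tensorboard_trace_handler)

device = "cuda" if torch.cuda.is_available() else "cpu"
wandb.init(project="ldm-profiling-sketch")  # hypothetical project; assumes `wandb login`

# Toy stand-in for a latent diffusion denoiser; the real model is far larger.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(
    activities=activities,
    # Skip 1 step, warm up for 1, record 3, and repeat the cycle twice.
    schedule=schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=tensorboard_trace_handler("./profiler_logs"),
    profile_memory=True,
) as prof:
    for step in range(10):
        x = torch.randn(64, 512, device=device)
        target = torch.randn(64, 512, device=device)
        loss = loss_fn(model(x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        wandb.log({"train/loss": loss.item()}, step=step)
        prof.step()  # advance the profiler's wait/warmup/active schedule

# Aggregate view: the operators consuming the most device time are the
# first places to look for pipeline bottlenecks.
print(prof.key_averages().table(
    sort_by=("cuda_time_total" if device == "cuda" else "cpu_time_total"),
    row_limit=10))
wandb.finish()
```

The exported traces in ./profiler_logs can then be inspected alongside the W&B run to correlate per-step loss and system metrics with kernel-level GPU activity.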


Published in

EuroMLSys '24: Proceedings of the 4th Workshop on Machine Learning and Systems
April 2024, 218 pages
ISBN: 9798400705410
DOI: 10.1145/3642970
Copyright © 2024 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
