ABSTRACT
Although deep learning methods continue to improve, they still depend on large datasets, which are not available for every problem. Diffusion models, now popular for their creative applications, have been shown to generate more realistic synthetic images than Generative Adversarial Networks (GANs), which are a common choice for image synthesis when datasets are small or imbalanced. In this work, we experiment with a pre-trained text-to-image diffusion model to generate datasets for two classes of problems, both of which could benefit from deep learning-based solutions but are hampered by a lack of data. We used the diffusion model to generate synthetic images, used them as training and validation data for the problems we tried to solve, and then tested the resulting models on manually collected real-world data to evaluate the approach comparatively. Our experiments show that the diffusion model can generate realistic images and is up to 50 times faster at data generation than the manual human process. We also found that Convolutional Neural Networks trained on these synthetic data can achieve accuracy scores of up to 80% and 89%.
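The workflow described above (prompting a pre-trained text-to-image diffusion model to build a labeled synthetic image dataset) can be sketched as follows. This is a minimal illustration using Stable Diffusion via the `diffusers` library; the model checkpoint, class names, prompt templates, and directory layout are assumptions for illustration, not details taken from the paper.

```python
import os


def build_prompts(class_name, n):
    """Build n slightly varied text prompts for one class, cycling
    through a few templates to add diversity to the generated images."""
    templates = [
        "a photo of a {}",
        "a close-up photo of a {}",
        "a realistic photograph of a {}, natural lighting",
    ]
    return [templates[i % len(templates)].format(class_name) for i in range(n)]


def generate_dataset(class_names, images_per_class, out_dir="synthetic_data"):
    """Generate a folder-per-class synthetic dataset with a pre-trained
    text-to-image diffusion model (requires a GPU and model download)."""
    import torch
    from diffusers import StableDiffusionPipeline  # pip install diffusers

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    for name in class_names:
        class_dir = os.path.join(out_dir, name.replace(" ", "_"))
        os.makedirs(class_dir, exist_ok=True)
        for i, prompt in enumerate(build_prompts(name, images_per_class)):
            image = pipe(prompt).images[0]  # one generated PIL image
            image.save(os.path.join(class_dir, f"{i:04d}.png"))


# Usage (hypothetical classes):
#   generate_dataset(["ripe tomato", "unripe tomato"], images_per_class=100)
```

The resulting folder-per-class layout can then be loaded directly (e.g. with `torchvision.datasets.ImageFolder`) to train a CNN classifier, which is evaluated afterward on real-world test images.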
Index Terms
- Usability of Pre-trained Diffusion Models in Generating Novel Datasets and Its Performance Evaluation