Abstract
The rapid adoption of generative Artificial Intelligence (AI) tools that can produce realistic images or text, such as DALL-E, MidJourney, or ChatGPT, has put the societal impact of these technologies at the center of public debate. These tools are made possible by the massive amounts of data (text and images) publicly available on the Internet. At the same time, generative AI tools have themselves become content creators, already contributing to the data available to train future models. Future versions of generative AI tools will therefore be trained on a mix of human-created and AI-generated content, creating a potential feedback loop between generative AI and public data repositories. This interaction raises many questions: how will future versions of generative AI tools behave when trained on a mixture of real and AI-generated data? Will they evolve and improve with the new datasets or, on the contrary, will they degrade? Will this evolution introduce biases or reduce diversity in subsequent generations of generative AI tools? What are the societal implications of a possible degradation of these models? Can we mitigate the effects of this feedback loop? In this work, we explore the effect of this interaction and report initial results using simple diffusion models trained on several image datasets. Our results show that the quality and diversity of the generated images can degrade over time, suggesting that incorporating AI-created data can have undesired effects on future versions of generative models.
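The feedback loop described above can be illustrated with a deliberately simple toy simulation (not the paper's diffusion-model setup): a one-dimensional "generative model" that fits a Gaussian to its training data and, when sampling, undersamples the tails of that distribution — an assumption of this sketch, consistent with the reduced diversity often reported for generative models. All function names here are illustrative. Each generation is retrained on its predecessor's samples, optionally mixed with fresh "human" data:

```python
import random
import statistics

def fit(data):
    # "Train" the toy generative model: estimate mean and std (MLE).
    return statistics.fmean(data), statistics.pstdev(data)

def sample(model, n, rng, trunc=2.0):
    # Toy stand-in for a generative model that undersamples the tails
    # of its training distribution (an assumption of this sketch).
    mu, sigma = model
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if abs(x - mu) <= trunc * sigma:
            out.append(x)
    return out

def feedback_loop(generations=20, n=500, mix=1.0, seed=0):
    """Retrain each generation on a mix of model samples and fresh data.

    mix is the fraction of each training set that is model-generated;
    the remainder is fresh "human" data drawn from the true N(0, 1).
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    model = fit([rng.gauss(0.0, 1.0) for _ in range(n)])
    sigmas = [model[1]]
    for _ in range(generations):
        k = int(mix * n)
        data = sample(model, k, rng) + [rng.gauss(0.0, 1.0) for _ in range(n - k)]
        model = fit(data)
        sigmas.append(model[1])
    return sigmas

pure = feedback_loop(mix=1.0)   # trained only on its own outputs
mixed = feedback_loop(mix=0.5)  # half fresh human data each round
```

With `mix=1.0` the fitted standard deviation — the stand-in for diversity — shrinks geometrically across generations, while mixing in fresh human data (`mix=0.5`) keeps it close to the true value, mirroring the mitigation question posed in the abstract.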
Acknowledgements
This work was supported by the FUN4DATE (PID2022-136684O7B-C21/22) and ENTRUDIT (TED2021-130118B-I00) projects funded by the Spanish Agencia Estatal de Investigación (AEI).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Martínez, G., Watson, L., Reviriego, P., Hernández, J.A., Juarez, M., Sarkar, R. (2024). Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet. In: Cuzzolin, F., Sultana, M. (eds) Epistemic Uncertainty in Artificial Intelligence. Epi UAI 2023. Lecture Notes in Computer Science, vol 14523. Springer, Cham. https://doi.org/10.1007/978-3-031-57963-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57962-2
Online ISBN: 978-3-031-57963-9
eBook Packages: Computer Science (R0)