The Effects of Class Balance on the Training Energy Consumption of Logistic Regression Models

  • Conference paper
Research Challenges in Information Science (RCIS 2024)

Abstract

Artificial Intelligence, and Machine Learning (ML) in particular, is increasingly present in all manner of software applications, and it already plays a major role in a variety of systems pertaining to Information Science, such as public transport, disease diagnosis support and other medical problems. This increased use has raised concerns about its environmental impact, since ML models need to be trained in datacentres that can impose a high ecological toll. With the aim of uncovering new ways of reducing the energy consumption of ML models, in this study we explore the energetic impact of class balance in binary classification tasks by comparing a set of logistic regression models (LRMs) trained on a synthetic balanced dataset against another set trained on a synthetic unbalanced dataset. We focus on the total energy and time required to complete the task, and find that the ranking of the models by energy efficiency remained consistent regardless of class balance, but that those trained on the unbalanced dataset required between 1.42 and 1.5 times more energy to complete the tasks, despite requiring only around 1 s more of runtime. We finish by analysing the results and proposing the use of synthetic datasets to estimate the energy cost of different hyperparameter options for LRMs.
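The comparison described in the abstract can be pictured with a minimal sketch. It is not the authors' experimental setup: the Gaussian-blob generator, the 10% positive share used for the unbalanced case, the plain gradient-descent trainer and every function name are assumptions made only for illustration. It also records wall-clock time only. As the abstract notes, runtime differed by about 1 s while energy differed by a factor of 1.42 to 1.5, so time is a poor proxy and a real study needs a dedicated energy meter.

```python
import math
import random
import time


def make_dataset(n_samples, positive_fraction, n_features=5, seed=0):
    """Build a synthetic binary dataset with a chosen share of positives.

    Hypothetical generator: each class is a unit-variance Gaussian blob,
    centred at +1 (positives) or -1 (negatives) on every feature.
    """
    rng = random.Random(seed)
    n_pos = round(n_samples * positive_fraction)
    rows = []
    for i in range(n_samples):
        label = 1 if i < n_pos else 0
        centre = 1.0 if label == 1 else -1.0
        features = [rng.gauss(centre, 1.0) for _ in range(n_features)]
        rows.append((features, label))
    rng.shuffle(rows)
    return rows


def predict_proba(x, w, b):
    """Logistic function of the linear score, clamped to avoid overflow."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, epochs=50, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(rows)
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in rows:
            err = predict_proba(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(rows, w, b):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((predict_proba(x, w, b) >= 0.5) == (y == 1) for x, y in rows)
    return hits / len(rows)


def timed_training(rows):
    """Train once and return the model with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    model = train_logistic_regression(rows)
    return model, time.perf_counter() - start


if __name__ == "__main__":
    # The 0.5 and 0.1 positive shares are illustrative, not from the paper.
    for name, fraction in [("balanced", 0.5), ("unbalanced", 0.1)]:
        data = make_dataset(2000, fraction, seed=42)
        (w, b), seconds = timed_training(data)
        print(f"{name}: {seconds:.3f} s, accuracy {accuracy(data, w, b):.3f}")
```

Keeping the sample count, feature count and training budget identical across the two datasets means that class balance is the only factor that varies between runs, which is the kind of controlled comparison the abstract describes.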

Acknowledgments

This work was supported by the following projects: OASSIS (PID2021-122554OB-C31/AEI/10.13039/501100011033/FEDER, UE); EMMA (Project SBPLY/21/180501/000115, funded by CECD (JCCM) and FEDER funds); SEEAT (PDC2022-133249-C31, funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR); PLAGEMIS (TED2021-129245B-C22, funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR); UNION (2022-GRIN-34110).

Author information

Corresponding author

Correspondence to María Gutiérrez.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gutiérrez, M., Calero, C., García, F., Moraga, M.Á. (2024). The Effects of Class Balance on the Training Energy Consumption of Logistic Regression Models. In: Araújo, J., de la Vara, J.L., Santos, M.Y., Assar, S. (eds) Research Challenges in Information Science. RCIS 2024. Lecture Notes in Business Information Processing, vol 513. Springer, Cham. https://doi.org/10.1007/978-3-031-59465-6_20

  • DOI: https://doi.org/10.1007/978-3-031-59465-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-59464-9

  • Online ISBN: 978-3-031-59465-6

  • eBook Packages: Computer Science, Computer Science (R0)
