DOI: 10.1145/3661167.3661203
research-article

An Empirical Study on the Energy Usage and Performance of Pandas and Polars Data Analysis Python Libraries

Published: 18 June 2024

Abstract

Context. Python’s growing popularity in data analysis and the contemporary emphasis on energy-efficient software tools necessitate an investigation into the energy implications of data operations, particularly in resource-intensive domains like data science. Goal. We aim to assess the energy usage of Pandas, a widely used Python data manipulation library, and Polars, a Rust-based library known for its performance. The study aims to provide insights for data scientists by identifying scenarios where one library outperforms the other in terms of energy usage, while exploring possible correlations between energy and performance metrics. Method. We performed four separate experiment blocks comprising 8 Data Analysis Tasks (DATs) from an official TPCH benchmark by Polars and 6 synthetic DATs. Both DAT groups are run with small and large dataframes for both libraries. Results. Polars is more energy-efficient than Pandas when manipulating large dataframes. For small dataframes, the TPCH benchmark DATs do not show significant differences, while for the synthetic DATs, Polars performs significantly better. We identified strong positive correlations between energy usage and execution time, as well as memory usage, for Pandas, while Polars did not show significant memory usage correlations for the majority of runs. There is a significant negative correlation between energy usage and CPU usage for Pandas. Conclusions. We recommend using Polars for energy-efficient and fast data analysis, emphasizing the importance of CPU core utilization in library selection.
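To make the correlation analysis mentioned in the Results more concrete, the following is a minimal sketch of how a rank correlation between per-run energy usage and another metric could be computed. The abstract does not state which correlation coefficient the authors used; Spearman's rho is an assumption here, and the function names (`ranks`, `pearson`, `spearman`) and all measurement values are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-run measurements, NOT values from the study.
energy_joules = [12.1, 15.4, 9.8, 20.3, 17.6]
exec_time_s = [1.9, 2.4, 1.5, 3.2, 2.6]
cpu_percent = [88.0, 71.0, 95.0, 52.0, 60.0]

rho_time = spearman(energy_joules, exec_time_s)
rho_cpu = spearman(energy_joules, cpu_percent)
print(f"energy vs. execution time: rho = {rho_time:.2f}")
print(f"energy vs. CPU usage:      rho = {rho_cpu:.2f}")
```

A positive rho indicates that runs consuming more energy also tend to rank higher on the other metric, while a negative rho indicates the opposite; a real analysis would also report a significance test, which this sketch omits.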

Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN:9798400717017
DOI:10.1145/3661167
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
