How Much Logs Does My Source Code File Need? Learning to Predict the Density of Logs

Authors:

Mohamed Amine Batoun,

Mohammed Sayagh,

Ali Ouni

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering

Pages 140 - 149

https://doi.org/10.1145/3661167.3661234

Published: 18 June 2024

Abstract

Software logging is the practice of recording events that occur within a software system; the resulting logs are useful for several analysis activities. However, striking the right balance between logging and system overhead is challenging. Prior work has proposed various machine learning-based solutions to suggest where to insert logging statements. But before answering the question “where to log?”, practitioners first need to determine whether a file needs logging in the first place. To do so, we conduct in this paper an empirical study to characterize the log density (i.e., the ratio of log lines to the total lines of code) in seven open-source software projects. Then, we propose a deep learning-based approach to predict the log density based on syntactic and semantic features of the source code. We find that the percentage of files with at least one log line ranges from 5% to 33% across the studied projects. Additionally, the median log density in files with at least one log line ranges from 0.95% to 1.85% across the seven projects and can go up to 18%. Our findings resonate with the hypothesis that not all source code files require logging. Our log density models achieve an average accuracy of 84%. Our cross-project log density prediction results show promising performance, with an average accuracy of 72%, which represents over 86% (ratio of cross/within) of the corresponding within-project predictions using syntactic features. Our results show that we can accurately predict whether a file needs logging and that such predictions may generalize across projects.
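
To make the log density definition above concrete, the following is a minimal sketch of how the ratio could be computed for a single file. It is not the authors' tooling: the `LOG_CALL` pattern, the function name `log_density`, and the choice to count only non-blank lines as lines of code are all assumptions made for illustration, since the abstract does not say how log lines are detected or how total lines are counted.

```python
import re

# Hypothetical heuristic (an assumption, not the paper's rule): a line is a
# "log line" if it calls a logger-like object with a common level method.
LOG_CALL = re.compile(
    r"\b(?:log|logger|LOG|LOGGER)\s*\.\s*"
    r"(?:trace|debug|info|warn|warning|error|fatal)\s*\("
)


def log_density(source: str) -> float:
    """Return (log lines) / (total non-blank lines) for one source file.

    Counting only non-blank lines as "lines of code" is an assumption;
    returns 0.0 for a file with no non-blank lines.
    """
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    log_lines = sum(1 for line in lines if LOG_CALL.search(line))
    return log_lines / len(lines)
```

Under these assumptions, a file whose `log_density` is greater than zero corresponds to a "file with at least one log line" in the sense used above, and multiplying the result by 100 gives the percentage form in which the abstract reports medians.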
