How Much Logs Does My Source Code File Need? Learning to Predict the Density of Logs

Published: 18 June 2024


Software logging is the practice of recording events that occur within a software system; the resulting logs support several analysis activities. However, striking the right balance between logging and system overhead is challenging. Prior work has proposed various machine learning-based solutions to suggest where to insert logging statements. But before answering the question “where to log?”, practitioners first need to determine whether a file needs logging in the first place. To do so, we conduct in this paper an empirical study to characterize the log density (i.e., the ratio of log lines to total lines of code) in seven open-source software projects. Then, we propose a deep learning-based approach to predict the log density from syntactic and semantic features of the source code. We find that the percentage of files with at least one log line ranges from 5% to 33% across the studied projects. Additionally, the median log density in files with at least one log line ranges from 0.95% to 1.85% across the seven projects and can go up to 18%. Our findings resonate with the hypothesis that not all source code files require logging. Our log density models achieve an average accuracy of 84%. Our cross-project log density prediction results show promising performance, with an average accuracy of 72%, which represents over 86% (ratio of cross/within) of the corresponding within-project predictions using syntactic features. Our results show that we can accurately predict whether a file needs logging and that such predictions may generalize across projects.
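To make the log density metric concrete, the following is a minimal sketch of how the ratio of log lines to total lines of code might be computed for a single file. It is not the paper's implementation: the `LOG_CALL` pattern, the `log_density` function name, and the choice to count only non-blank lines are all assumptions made for illustration.

```python
import re

# Assumption: a "log line" is any line that calls a common logger method.
# The paper's actual detection rules are not given in the abstract.
LOG_CALL = re.compile(
    r"\b(?:log|logger|logging)\."
    r"(?:trace|debug|info|warn|warning|error|fatal|critical)\s*\(",
    re.IGNORECASE,
)


def log_density(source: str) -> float:
    """Return the ratio of log lines to total lines of code in `source`.

    Assumption: blank lines are excluded from the total; comment lines
    are still counted because detecting them is language-specific.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    log_lines = sum(1 for ln in lines if LOG_CALL.search(ln))
    return log_lines / len(lines)


example = (
    "def f(x):\n"
    "    logger.info('start')\n"
    "    y = x + 1\n"
    "    return y\n"
)
density = log_density(example)  # 1 log line out of 4 non-blank lines
```

Under these assumptions, a file with no matching logger call has a density of 0, which corresponds to the "no logging needed" case the abstract distinguishes from files with at least one log line.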


EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
Author Tags

  1. Deep Learning
  2. Log Density
  3. Software Logging


