
Guiding log revisions by learning from software evolution history

Empirical Software Engineering

Abstract

Despite the importance of log statements in postmortem debugging, developers find it difficult to establish good logging practices, mainly for two reasons. First, there are no rigorous specifications or systematic processes to guide logging practices. Second, logging code evolves with bug fixes and feature updates. Because they do not consider the impact of software evolution, previous works on log enhancement can partially alleviate the first problem but can hardly solve the second. To fill this gap, this paper proposes to guide log revisions by learning from evolution history. Motivated by code clones, we assume that logging code with similar context is pervasive and deserves similar modifications, and we conduct an empirical study on 12 open-source projects to validate this assumption. Based on this, we design and implement LogTracker, an automatic tool that learns log revision rules by mining the correlation between logging context and modifications and recommends candidate log revisions by applying these rules. With an enhanced model of logging context, LogTracker can guide more intricate log revisions that cannot be covered by existing tools. Our experiments show that LogTracker detects 369 candidate instances when applied to the latest versions of the software. So far, we have reported 79 of them, and 52 have been accepted.
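The idea of learning revision rules from history and applying them to current code can be illustrated with a minimal sketch. This is not LogTracker's implementation: the `Revision` record, the tuple-based context model, the `min_support` parameter, and the function names are all assumptions made for illustration, and exact matching stands in for the similarity threshold of one mentioned in the notes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    context: tuple  # hypothetical context model: (condition, logged variables)
    old_log: str    # log statement before the revision
    new_log: str    # log statement after the revision


def learn_rules(history, min_support=2):
    """Turn revisions that recur under the same context into rules."""
    support = Counter((r.context, r.old_log, r.new_log) for r in history)
    rules = {}
    for (context, old_log, new_log), count in support.items():
        if count >= min_support:
            # If two groups share a context and old statement,
            # the later one wins in this sketch.
            rules[(context, old_log)] = new_log
    return rules


def recommend(rules, current_logs):
    """Suggest a revision wherever context and statement match a rule exactly."""
    suggestions = []
    for location, context, log in current_logs:
        new_log = rules.get((context, log))
        if new_log is not None:
            suggestions.append((location, log, new_log))
    return suggestions


# Hypothetical history: the same revision was made twice under one context.
ctx = ("fd < 0", ("path",))
history = [
    Revision(ctx, 'log_info("open failed")', 'log_error("open failed: %s", path)'),
    Revision(ctx, 'log_info("open failed")', 'log_error("open failed: %s", path)'),
    Revision(("n == 0", ()), 'log_debug("empty")', ""),
]
rules = learn_rules(history)
candidates = recommend(rules, [
    ("reader.c:42", ctx, 'log_info("open failed")'),
    ("writer.c:7", ("n == 0", ()), 'log_debug("empty")'),
])
print(candidates)
```

Only the revision that recurs in the history becomes a rule, so only the statement at the hypothetical location `reader.c:42` is reported as a candidate. A real context model would need to compare conditions and variables semantically rather than by string equality.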




Notes

  1. Revisions are considered afterthoughts if they are made later than the modification of the surrounding code.

  2. Here “version” means the internal version number (not the release version). This may be incremented many times in one day.

  3. Log statements share a semantically similar context if they print similar log variables under similar conditions; for simplicity, revisions to such statements are called “context-similar log revisions”.

  4. The source code of our prototype is hosted on GitHub (2019).

  5. In the following sections, we will call these “rules” for simplicity.

  6. Log statements are recognized with a regular expression, which is explained in the next paragraph.

  7. A hunk is the basic unit of a patch. It begins with range information, which is immediately followed by line additions, line deletions, and any number of contextual lines. Hunks used in this experiment contain six lines of contextual code before and after the edited code.

  8. Taking an edit script to be a sequence of edit actions (Falleri et al. 2014), a syntactical edit script in this paper refers to a sequence of edit actions made to syntactical structures.

  9. The logging context model used to describe the semantic context of log revisions is explained in detail in Section 3.2.

  10. For the sake of accuracy, the clustering algorithm used in this paper sets the similarity threshold to one.

  11. This paper models log modifications based on syntactical edit scripts; see Section 3.3 for more details.

  12. Given a revision, if its category is “log deletion”, the new log statement is marked as an empty string. Similarly, if its category is “log insertion”, the old log statement is an empty string.

  13. To reduce false alarms, we recommend revisions only if the similarity of the candidate pair is 100%.

  14. For rules that insert new log statements, we split code snippets by function.

  15. The confidence interval is 3.93 at a confidence level of 95%. This is calculated with the Sample Size Calculator (Systems CR 2019).

  16. As mentioned in Section 3.5.1, LogTracker automatically filters infeasible log revisions; for the sake of accuracy, we also manually verify the correctness of the automatic filtering.

  17. We found that candidates posted on GitHub are more likely to receive a reply. In fact, 29 candidates detected in OpenDDS, Ice, and GIMP all received timely replies because their issues are managed on GitHub.

  18. As shown in Tables 4 and 5, the other four projects have so few log revisions and rules that we do not show their data in this experiment.

  19. This process is done by searching for historical log revisions that share the same keywords in their contextual lines.

  20. In this case, each generated similar-revision group consists of only one training instance. Given the limited input, these groups are taken as effective rules.

  21. As mentioned in Section 4.1, developers may miss log revisions. In addition, the process of manually building the oracle test suite may also miss some context-similar log revisions. As such, the recall of this experiment is not reliable, and we do not report it here.
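Notes 6, 7, and 12 can be made concrete with a small sketch that parses one unified-diff hunk, picks out log statements, and labels the revision. It is an illustration under stated assumptions, not the paper's tooling: the `LOG_CALL` pattern is a hypothetical stand-in for the regular expression the paper describes elsewhere, the "log modification" label and all function names are invented here, and the sample hunk is made up and carries fewer contextual lines than the six used in the experiment.

```python
import re

# Range information of a unified-diff hunk: "@@ -old_start,old_len +new_start,new_len @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")
# Hypothetical pattern: treats calls such as log_error(...) or LOG.warn(...) as log statements.
LOG_CALL = re.compile(r"\blog\w*\s*[.(]", re.IGNORECASE)


def parse_hunk(hunk_text):
    """Split one hunk into its range information and its three kinds of lines."""
    lines = hunk_text.splitlines()
    match = HUNK_HEADER.match(lines[0])
    if match is None:
        raise ValueError("not a unified-diff hunk")
    body = lines[1:]
    return {
        "old_start": int(match.group(1)),
        "new_start": int(match.group(3)),
        "removed": [l[1:] for l in body if l.startswith("-")],
        "added": [l[1:] for l in body if l.startswith("+")],
        "context": [l[1:] for l in body if l.startswith(" ")],
    }


def classify(old_log, new_log):
    """An empty old statement means insertion; an empty new one means deletion."""
    if not old_log and new_log:
        return "log insertion"
    if old_log and not new_log:
        return "log deletion"
    if old_log != new_log:
        return "log modification"  # label assumed for this sketch
    return "unchanged"


def log_revision(hunk_text):
    """Return (old log statement, new log statement, category) for one hunk."""
    hunk = parse_hunk(hunk_text)
    old_log = next((l.strip() for l in hunk["removed"] if LOG_CALL.search(l)), "")
    new_log = next((l.strip() for l in hunk["added"] if LOG_CALL.search(l)), "")
    return old_log, new_log, classify(old_log, new_log)


# Made-up hunk for illustration only.
SAMPLE = "\n".join([
    "@@ -10,3 +10,3 @@",
    " if (fd < 0) {",
    '-    log_info("open failed");',
    '+    log_error("open failed: %s", path);',
    "     return -1;",
])
print(log_revision(SAMPLE))
```

The sketch expects a single hunk as input; file header lines such as `---` and `+++` would be misread as deletions and additions, so a full patch would need to be split into hunks first.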


Author information

Corresponding author

Correspondence to Shanshan Li.

Additional information

Communicated by: Chanchal Roy, Janet Siegmund, and David Lo

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work in this paper was supported by the National Natural Science Foundation of China (Project Nos. 61690203, U1711261, 61872373, and 61872375) and the National Key R&D Program of China (Project Nos. 2017YFB1001802 and 2017YFB0202201). An earlier version (Li et al. 2018) was presented at the IEEE/ACM International Conference on Program Comprehension 2018.


About this article


Cite this article

Li, S., Niu, X., Jia, Z. et al. Guiding log revisions by learning from software evolution history. Empir Software Eng 25, 2302–2340 (2020). https://doi.org/10.1007/s10664-019-09757-y


  • DOI: https://doi.org/10.1007/s10664-019-09757-y
