Guiding log revisions by learning from software evolution history

Li, Shanshan; Niu, Xu; Jia, Zhouyang; Liao, Xiangke; Wang, Ji; Li, Tao

doi:10.1007/s10664-019-09757-y

Published: 09 September 2019

Volume 25, pages 2302–2340, (2020)

Empirical Software Engineering

Shanshan Li¹,
Xu Niu ORCID: orcid.org/0000-0001-7622-7904¹,
Zhouyang Jia¹,
Xiangke Liao¹,
Ji Wang¹ &
Tao Li²

Abstract

Despite the importance of log statements in postmortem debugging, developers find it difficult to establish good logging practices. There are two main reasons. First, there are no rigorous specifications or systematic processes to guide logging practices. Second, logging code evolves with bug fixes and feature updates. Because they do not consider the impact of software evolution, previous works on log enhancement can partially relieve the first problem but can hardly solve the second. To fill this gap, this paper proposes to guide log revisions by learning from evolution history. Motivated by code clones, we assume that logging code with similar context is pervasive and deserves similar modifications, and we conduct an empirical study on 12 open-source projects to validate this assumption. Building on this, we design and implement LogTracker, an automatic tool that learns log revision rules by mining the correlation between logging context and modifications, and recommends candidate log revisions by applying these rules. With an enhanced model of logging context, LogTracker can guide more intricate log revisions that existing tools cannot cover. Our experiments show that LogTracker detects 369 candidate instances when applied to the latest versions of the software. So far, we have reported 79 of them, and 52 have been accepted.
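
The learn-then-recommend idea in the abstract can be illustrated with a toy sketch. This is not LogTracker's implementation: the Revision record, the tuple used as "context", and the functions learn_rules and recommend are hypothetical names introduced here, and exact equality stands in for the paper's richer context model and similarity measure.

```python
from collections import defaultdict
from typing import NamedTuple


class Revision(NamedTuple):
    """One historical log revision (hypothetical, simplified model)."""
    context: tuple  # e.g. (check condition, logged variables)
    old_log: str    # empty string for a log insertion
    new_log: str    # empty string for a log deletion


def learn_rules(history):
    """Group revisions with identical context and identical modification."""
    groups = defaultdict(list)
    for rev in history:
        groups[(rev.context, rev.old_log, rev.new_log)].append(rev)
    # Keep one representative per group as a "rule".
    return [revs[0] for revs in groups.values()]


def recommend(rules, snippets):
    """Suggest a revision where context and current log match a rule exactly."""
    suggestions = []
    for location, context, current_log in snippets:
        for rule in rules:
            if rule.context == context and rule.old_log == current_log:
                suggestions.append((location, rule.new_log))
    return suggestions


if __name__ == "__main__":
    ctx = ("ret < 0", ("errno",))
    history = [
        Revision(ctx, 'log("fail")', 'log("fail: %d", errno)'),
        Revision(ctx, 'log("fail")', 'log("fail: %d", errno)'),
    ]
    rules = learn_rules(history)
    # Only the first snippet shares the learned context, so only it is flagged.
    print(recommend(rules, [
        ("b.c:10", ctx, 'log("fail")'),
        ("c.c:5", ("ptr == NULL", ()), 'log("fail")'),
    ]))
```

Exact matching keeps the sketch short and mirrors the conservative setting described in the notes, where candidates are recommended only at 100% similarity.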

Notes

Revisions are considered after-thoughts if they are made later than the modification of the surrounding code.
Here “version” means the internal version number (not the release version). This may be incremented many times in one day.
Log statements share semantically similar context if they print similar log variables under similar conditions; revisions to such statements are called “context-similar log revisions” for simplicity.
The source code of our prototype is hosted on GitHub (Github 2019).
In the following sections, we will call these “rules” for simplicity.
Log statements are recognized with a regex, which is explained in the next paragraph.
A hunk is the basic unit in a patch. It begins with range information and is immediately followed by the line additions, line deletions, and any number of contextual lines. Hunks used in this experiment contain six lines of contextual code before and after the edited code.
With edit scripts defined as sequences of edit actions (Falleri et al. 2014), syntactical edit scripts in this paper refer to sequences of edit actions made to syntactical structures.
The logging context model used to describe the semantic context of log revisions is explained in detail in Section 3.2.
For the sake of accuracy, the clustering algorithm used in this paper takes a similarity threshold of one.
This paper models log modifications based on syntactical edit scripts; see Section 3.3 for more details.
Given a revision, if its category is “log deletion”, the new log statement is marked as an empty string. Similarly, if its category is “log insertion”, the old log statement is an empty string.
To reduce false alarms, we only recommend revisions if the similarity of the candidate pair is 100%.
For rules that insert new log statements, we split code snippets on a per-function basis.
The confidence interval is 3.93 at a confidence level of 95%. This is calculated with the Sample Size Calculator (Systems CR 2019).
As mentioned in Section 3.5.1, LogTracker automatically filters infeasible log revisions; for the sake of accuracy, we also manually verify the correctness of the automatic filtering.
We found that candidates posted on GitHub are more likely to receive replies. In fact, all 29 candidates detected in OpenDDS, Ice and GIMP received timely replies, since their issues are managed with GitHub.
As shown in Tables 4 and 5, the log revisions and rules of the other four projects are so few that we do not show their data in this experiment.
This process is done by searching for historical log revisions that share the same keywords in contextual lines.
In this case, each generated similar-revision group consists of only one training instance. Considering the limited input, they are taken as effective rules.
As mentioned in Section 4.1, developers may miss log revisions. Besides, the process of manually building the oracle test suite may also miss some context-similar log revisions. As such, the recall of this experiment is not reliable and we do not report it here.
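
The notes on hunks and regex-based log recognition can be made concrete with a minimal sketch. It assumes a unified-diff patch (with Git, six lines of context can be requested via `git diff -U6`). The pattern LOG_CALL is a hypothetical stand-in, since the paper's actual regex is not reproduced on this page, and split_hunks and log_lines are illustrative helper names.

```python
import re

# Hypothetical pattern for C-style logging calls; not the paper's regex.
LOG_CALL = re.compile(r"\b(?:log_\w+|syslog|perror)\s*\(")
# Range information that opens every hunk of a unified diff.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def split_hunks(patch_text):
    """Split a unified diff into hunks of added, deleted and context lines."""
    hunks = []
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next range line.
            header = HUNK_HEADER.match(line)
            if header:
                old_left = int(header.group(2) or 1)
                new_left = int(header.group(4) or 1)
                current = {"added": [], "deleted": [], "context": []}
                hunks.append(current)
            continue
        if line.startswith("+"):
            current["added"].append(line[1:])
            new_left -= 1
        elif line.startswith("-"):
            current["deleted"].append(line[1:])
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            current["context"].append(line[1:])
            old_left -= 1
            new_left -= 1
    return hunks


def log_lines(lines):
    """Return the lines that look like log statements."""
    return [text for text in lines if LOG_CALL.search(text)]


if __name__ == "__main__":
    patch = "\n".join([
        "--- a/net.c",
        "+++ b/net.c",
        "@@ -1,3 +1,3 @@",
        " if (fd < 0) {",
        '-    log_error("open failed");',
        '+    log_error("open failed: %s", strerror(errno));',
        " }",
    ])
    for hunk in split_hunks(patch):
        print(log_lines(hunk["deleted"]), log_lines(hunk["added"]))
```

Counting lines against the range information, rather than testing for a "---" or "+++" prefix, avoids mistaking a deleted line that itself begins with "--" for a file header.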

References

Ice (2018) Ice - comprehensive rpc framework. https://zeroc.com/products/ice
Arnold M, Ryder BG (2001) A framework for reducing the cost of instrumented code. ACM SIGPLAN Not 36(5):168–179. https://doi.org/10.1145/381694.378832. http://portal.acm.org/citation.cfm?doid=381694.378832
Chen BJ, Jiang ZM (2017) Characterizing logging practices in Java-based open source software projects - a replication study in apache software foundation. Empir Softw Eng 22(1):330–374. https://doi.org/10.1007/s10664-016-9429-5
Chen B, Jiang Z M (2017) Characterizing and detecting anti-patterns in the logging code. Proceedings - 2017 IEEE/ACM 39th international conference on software engineering, ICSE 2017, pp 71–81. https://doi.org/10.1109/ICSE.2017.15
Collard M L, Decker M J, Maletic JI (2013) SrcML: An infrastructure for the exploration, analysis, and manipulation of source code: A tool demonstration. In: IEEE international conference on software maintenance, ICSM, IEEE, pp 516-519, DOI https://doi.org/10.1109/ICSM.2013.85
Collectd (2017) Start page - collectd - The system statistics collection daemon. http://collectd.org/
Conservancy SF (2018) Git. https://git-scm.com/
Davison W (2018) rsync. https://rsync.samba.org/
Defays D (1977) An efficient algorithm for a complete link method. Comput J 20 (4):364–366. https://doi.org/10.1093/comjnl/20.4.364. http://oup.prod.sis.lan/comjnl/article-pdf/20/4/364/1108735/200364.pdf
Ding R, Zhou H, Lou J G, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: a cost-aware logging mechanism for performance diagnosis
Falleri JR, Morandat F, Blanc X, Martinez M, Monperrus M (2014) Fine-grained and accurate source code differencing. Proceedings of the 29th ACM/IEEE international conference on automated software engineering - ASE ’14 pp 313–324. http://dl.acm.org/citation.cfm?doid=2642937.2642982
Foundation FS (2016) Diffutils - gnu project - free software foundation. https://www.gnu.org/software/diffutils/
Foundation FS (2017a) Tar - gnu project - free software foundation. https://www.gnu.org/software/tar/
Foundation FS (2017b) Wget - gnu project - free software foundation. https://www.gnu.org/software/wget/
Foundation PS (2018) Built-in functions-python 2.7.14 documentation. https://docs.python.org/2/library/functions.html
Foundation TAS (2017c) httpd - apache hypertext transfer protocol server - apache http server version 2.4. http://httpd.apache.org/docs/2.4/programs/httpd.html
Foundation W (2019) Wireshark - go deep. https://www.wireshark.org/
Fu Q, Zhu J, Hu W, Lou JG, Ding R, Lin Q, Zhang D, Xie T (2014) Where do developers log? an empirical study on logging practices in industry. Proceedings of the 36th international conference on software engineering - ICSE ’14 pp 24–33. http://dl.acm.org/citation.cfm?doid=2591062.2591175
Gabel M, Jiang L, Su Z (2008) Scalable detection of semantic clones. Proceedings of the 30th international conference on Software engineering - ICSE ’08 p 321. http://portal.acm.org/citation.cfm?doid=1368088.1368132
Github (2018a) Github - gumtreediff/gumtree: A neat code differencing tool. https://github.com/GumTreeDiff/gumtree
GitHub (2018b) skyhover/deckard: Code clone detection; clone-related bug detection; semantic clone analysis. https://github.com/skyhover/Deckard
Github (2019) niuxu18/logtracker: Automatic tool which tries to guide log revisions by mining software evolution
Hassani M, Shang W, Shihab E, Tsantalis N (2018) Studying and detecting log-related issues. Empir Softw Eng 11:1–33
Jiang L, Misherghi G, Su Z, Glondu S (2007) DECKARD: Scalable and accurate tree-based detection of code clones. In: Proceedings of the 29th International Conference on Software Engineering - ICSE ’07, pp 96–105, DOI https://doi.org/10.1109/ICSE.2007.30
Juergens E, Deissenboeck F, Hummel B (2009) CloneDetective - A workbench for clone detection research. In: Proceedings of the 31st International Conference on Software Engineering - ICSE ’09, pp 603–606, DOI https://doi.org/10.1109/ICSE.2009.5070566
Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 28(7):654–670. https://doi.org/10.1109/TSE.2002.1019480
Kawrykow D, Robillard M P (2011) Non-essential changes in version histories. In: Proceedings of the 33rd international conference on Software engineering - ICSE ’11, pp 351–360. https://doi.org/10.1145/1985793.1985842
kevin8t8 (2018) The mutt e-mail client. http://www.mutt.org/
Kim M, Sazawal V, Notkin D (2005) An empirical study of code clone genealogies. ACM SIGSOFT Software Engineering Notes 30(5):187. https://doi.org/10.1145/1095430.1081737. http://portal.acm.org/citation.cfm?doid=1095430.1081737
Li H, Shang W, Zou Y, E Hassan A (2017) Towards just-in-time suggestions for log changes. Empir Softw Eng 22(4):1831–1865. https://doi.org/10.1007/s10664-016-9467-z
Li S, Niu X, Jia Z, Wang J, He H, Wang T (2018) Logtracker: Learning log revision behaviors proactively from software evolution history. In: Proceedings of IEEE/ACM international conference on program comprehension 2018 - ICPC, 2018
Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: A tool for finding copy-paste and related bugs in operating system code. In: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - OSDI ’04, pp 20, DOI https://doi.org/10.1109/TSE.2006.28
Media S (2018) Sloccount download — sourceforge.net. https://sourceforge.net/projects/sloccount/
Meng N, Kim M, McKinley KS (2011) Systematic editing. Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation - PLDI ’11 p 329. http://portal.acm.org/citation.cfm?doid=1993498.1993537
Meng N, Kim M, McKinley K S (2013) LASE : Locating and Applying Systematic Edits by Learning from Examples
Mondal M, Roy C K, Schneider K A (2018) Micro-clones in evolving software. Proceedings of 25th IEEE international conference on software analysis, evolution and reengineering - SANER’18, pp 50–60
OCI (2018) Opendds. http://opendds.org/
Pecchia A, Cinque M, Carrozza G, Cotroneo D (2015) Industry practices and event logging: Assessment of a critical software development process. In: Proceedings of the 37th IEEE international conference on software engineering - ICSE ’15, pp 169–178, DOI https://doi.org/10.1109/ICSE.2015.145
Polozov O, Gulwani S (2015) FlashMeta: A framework for inductive program synthesis. ACM SIGPLAN Not 50(10):107–126. https://doi.org/10.1145/2858965.2814310. http://dl.acm.org/citation.cfm?doid=2858965.2814310
Rolim R, Soares G, D’Antoni L, Polozov O, Gulwani S, Gheyi R, Suzuki R, Hartmann B (2017) Learning syntactic program transformations from examples. In: Proceedings of the 39th international conference on software engineering - ICSE ’17, pp 404–415, DOI https://doi.org/10.1109/ICSE.2017.44
Sigelman B H, Barroso L A, Burrows M, Stephenson P, Plakal M, Beaver D, Jaspan S, Shanbhag C (2010) Dapper, A large-scale distributed systems tracing infrastructure. Tech rep., California, USA. https://ai.google/research/pubs/pub36356
Systems CR (2019) Sample size calculator. https://www.surveysystem.com/sscalc.htm
Team TG (2019) Gimp - gnu image manipulation program. https://www.gimp.org/
Venema W (2013) The postfix home page. http://www.postfix.org/
Yuan D, Park S, Huang P, Liu Y, Lee M (2012a) Be conservative: enhancing failure diagnosis with proactive logging. In: Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation - OSDI ’12, 41(6):293–306
Yuan D, Park S, Zhou Y (2012b) Characterizing logging practices in open-source software. In: Proceedings of the 34th international conference on software engineering - ICSE ’12, pp 102–112
Yuan D, Zheng J, Park S, Zhou Y, Savage S (2012c) Improving software diagnosability via log enhancement. ACM Trans Comput Syst 30(1):1–28. https://doi.org/10.1145/2110356.2110360. http://dl.acm.org/citation.cfm?doid=2110356.2110360
Zhao X, Rodrigues K, Stumm M (2017) Log20: Fully automated optimal placement of log printing statements under specified overhead threshold. In: Proceedings of the 26th symposium on operating systems principles - SOSP ’17, pp 565–581, DOI https://doi.org/10.1145/3132747.3132778
Zhu J, He P, Fu Q, Zhang H, Lyu M R, Zhang D (2015) Learning to log: Helping developers make informed logging decisions. In: Proceedings of the 37th international conference on software engineering - ICSE ’15, pp 415–425, DOI https://doi.org/10.1109/ICSE.2015.60

Author information

Authors and Affiliations

The School of Computer, National University of Defense Technology, Changsha, China
Shanshan Li, Xu Niu, Zhouyang Jia, Xiangke Liao & Ji Wang
The School of Computer Science and Technology, Nankai University, Tianjin, China
Tao Li

Corresponding author

Correspondence to Shanshan Li.

Additional information

Communicated by: Chanchal Roy, Janet Siegmund, and David Lo

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work in this paper was supported by the National Natural Science Foundation of China (Project Nos. 61690203, U1711261, 61872373 and 61872375) and the National Key R&D Program of China (Project Nos. 2017YFB1001802 and 2017YFB0202201). An earlier version (Li et al. 2018) was presented at the IEEE/ACM International Conference on Program Comprehension 2018.

About this article

Cite this article

Li, S., Niu, X., Jia, Z. et al. Guiding log revisions by learning from software evolution history. Empir Software Eng 25, 2302–2340 (2020). https://doi.org/10.1007/s10664-019-09757-y

Issue Date: May 2020
DOI: https://doi.org/10.1007/s10664-019-09757-y
