
Extracting and studying the Logging-Code-Issue-Introducing changes in Java-based large-scale open source software systems

Empirical Software Engineering

Abstract

Execution logs, which are generated by logging code, are widely used in modern software projects for tasks like monitoring, debugging, and remote issue resolution. Ineffective logging can cause confusion, leave developers without the information they need during problem diagnosis, or even crash the system. However, logging code is challenging to develop and maintain, as it is intermixed with the feature code. Furthermore, unlike feature code, the correctness of logging code is very difficult to verify. Currently, developers usually rely on their intuition when performing logging activities, and there are no well-established logging guidelines in research or practice. In this paper, we aim to derive such guidelines by mining historical logging code changes. In particular, we have extracted and studied the Logging-Code-Issue-Introducing (LCII) changes in six popular large-scale Java-based open source software systems. Preliminary studies on this dataset show that: (1) fixes to LCII changes can be found both in logging code changes made together with feature code changes (co-changed) and in logging code changes made independently; (2) the complexity of fixes to LCII changes is similar to that of regular logging code updates; (3) developers take longer to fix logging code issues than regular bugs; and (4) state-of-the-art logging code issue detection tools can detect only a small fraction (3%) of the LCII changes. These findings highlight the urgent need for research in this area and the importance of such a dataset.
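The abstract does not spell out how LCII changes are extracted. Their name mirrors the "bug-introducing changes" of the SZZ algorithm, so one plausible reading, assumed here and not confirmed by this abstract, is an SZZ-style backward trace: start from a commit that fixes a logging code issue, find the logging lines it deletes or modifies, and blame those lines to the earlier commits that last touched them. The Java sketch below illustrates that idea only. The regular expression used to recognize logging calls, the `LciiSketch` class, the `BlamedLine` record, and the commit identifiers are all hypothetical, and a real pipeline would obtain blame data from the version control system rather than from an in-memory list.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class LciiSketch {

    // Hypothetical heuristic: a line is logging code if it calls a level method
    // (trace/debug/info/warn/error/fatal) on a receiver named log, logger, LOG or LOGGER.
    private static final Pattern LOGGING_CALL = Pattern.compile(
            "\\b(log|logger|LOG|LOGGER)\\s*\\.\\s*(trace|debug|info|warn|error|fatal)\\s*\\(");

    public static boolean isLoggingCode(String line) {
        return LOGGING_CALL.matcher(line).find();
    }

    // One line of the file as it was just before the fix commit, together with
    // the commit that last touched it (the kind of information `git blame` reports).
    public record BlamedLine(int lineNo, String text, String introducingCommit) {}

    // Returns the commits that last touched the logging lines which the fix
    // deleted or modified: the candidate LCII changes, sorted for stable output.
    public static Set<String> candidateLciiCommits(List<BlamedLine> preFixFile,
                                                   Set<Integer> changedLineNos) {
        Set<String> candidates = new TreeSet<>();
        for (BlamedLine line : preFixFile) {
            if (changedLineNos.contains(line.lineNo()) && isLoggingCode(line.text())) {
                candidates.add(line.introducingCommit());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical pre-fix snapshot of a file, annotated with made-up commit ids.
        List<BlamedLine> preFix = List.of(
                new BlamedLine(10, "int n = buffer.size();", "a1b2c3"),
                new BlamedLine(11, "LOG.info(\"Sucessful flush of \" + n + \" entries\");", "d4e5f6"),
                new BlamedLine(12, "LOG.debug(\"state=\" + state.dump());", "0789ab"));
        // Suppose the fix corrected the misspelled message on line 11 and also edited line 10.
        Set<String> lcii = candidateLciiCommits(preFix, Set.of(10, 11));
        System.out.println(lcii); // prints [d4e5f6]
    }
}
```

Running `main` prints `[d4e5f6]`: line 10 was changed by the fix but is feature code, so only the commit that introduced the misspelled logging statement on line 11 is reported. In practice, the per-line blame information could come from running `git blame` on the parent of the fix commit, and SZZ-style candidates usually need further filtering (for example, discarding commits that only reformatted a line) before they are treated as issue-introducing.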

Author information

Correspondence to Boyuan Chen.

Additional information

Communicated by: Lin Tan

About this article

Cite this article

Chen, B., Jiang, Z.M. Extracting and studying the Logging-Code-Issue-Introducing changes in Java-based large-scale open source software systems. Empir Software Eng 24, 2285–2322 (2019). https://doi.org/10.1007/s10664-019-09690-0

  • DOI: https://doi.org/10.1007/s10664-019-09690-0