Which log level should developers choose for a new logging statement?

Empirical Software Engineering

Abstract

Logging statements are used to record valuable runtime information about applications. Each logging statement is assigned a log level such that users can disable some verbose log messages while allowing the printing of other important ones. However, prior research finds that developers often have difficulties when determining the appropriate level for their logging statements. In this paper, we propose an approach to help developers determine the appropriate log level when they add a new logging statement. We analyze the development history of four open source projects (Hadoop, Directory Server, Hama, and Qpid), and leverage ordinal regression models to automatically suggest the most appropriate level for each newly-added logging statement. First, we find that our ordinal regression model can accurately suggest the levels of logging statements with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.
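The Brier scores quoted above can be made concrete with a small sketch. It assumes the multi-category form of the Brier score (for each logging statement, sum the squared differences between the predicted probability and the 0/1 outcome over all levels, then average over statements); the abstract does not spell out the exact formula, so this is an assumption. The level names, the toy data, and the helper names `brier_score`, `uniform_guess`, and `proportional_guess` are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import List, Sequence


def brier_score(probs: Sequence[Sequence[float]], actual: Sequence[int]) -> float:
    """Multi-category Brier score (assumed definition).

    probs[i][k] is the predicted probability that statement i has level k;
    actual[i] is the index of the level that statement i really has.
    """
    total = 0.0
    for row, true_idx in zip(probs, actual):
        total += sum(
            (p - (1.0 if k == true_idx else 0.0)) ** 2 for k, p in enumerate(row)
        )
    return total / len(actual)


def uniform_guess(num_levels: int, n: int) -> List[List[float]]:
    """Random-guess baseline: every level is equally likely."""
    return [[1.0 / num_levels] * num_levels for _ in range(n)]


def proportional_guess(actual: Sequence[int], num_levels: int) -> List[List[float]]:
    """Naive baseline: predict each level with its observed proportion."""
    counts = Counter(actual)
    row = [counts.get(k, 0) / len(actual) for k in range(num_levels)]
    return [list(row) for _ in actual]


# Hypothetical level ordering and made-up observations, for illustration only.
LEVELS = ["trace", "debug", "info", "warn", "error"]
actual = [2, 1, 4, 2, 3]

random_score = brier_score(uniform_guess(len(LEVELS), len(actual)), actual)
naive_score = brier_score(proportional_guess(actual, len(LEVELS)), actual)
print(f"uniform guess: {random_score:.2f}, proportional guess: {naive_score:.2f}")
```

Under this definition, a uniform guess over K levels always scores (K - 1) / K, which is 0.80 for five levels and about 0.83 for six. That agrees with the random-guess range reported in the abstract, although the abstract itself does not state how many levels each project uses.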

Author information

Corresponding author

Correspondence to Heng Li.

Additional information

Communicated by: Mark Grechanik

About this article

Cite this article

Li, H., Shang, W. & Hassan, A.E. Which log level should developers choose for a new logging statement?. Empir Software Eng 22, 1684–1716 (2017). https://doi.org/10.1007/s10664-016-9456-2

  • DOI: https://doi.org/10.1007/s10664-016-9456-2
