Towards automatic detection and prioritization of pre-logging overhead: a case study of hadoop ecosystem

Automated Software Engineering

Abstract

Pre-logging overhead refers to the overhead produced by pre-logging statements (PLSs), which construct the parameters of logging calls. In practice, these logging-related statements are usually guarded by conditional statements, known as logging guards, so that they execute only when necessary (e.g., during debugging). However, developers occasionally forget to add logging guards to costly PLSs, resulting in missing logging guard (MLG) issues, which can significantly degrade performance, particularly in high-throughput software. In this paper, (1) we conduct the first empirical study of 137 commits addressing MLG issues in five popular open-source software systems of the Hadoop ecosystem. Based on the results, we report five findings about the current practice of logging guards. (2) We devise an accurate algorithm to detect PLSs (over 95% precision and recall) and identify 16 problematic partially guarded logging calls (10 of which have been confirmed and fixed by developers). (3) We investigate two metric-based ranking approaches using six software metrics to prioritize PLSs based on their impact on performance. We find that the execution frequency of PLSs is the most effective single metric for ranking, and that combining multiple software metrics can further improve ranking performance (by 7.8% on average).
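To make the terminology concrete, the following is a minimal sketch of a pre-logging statement, a logging call, and a logging guard. It is illustrative only and is not taken from the paper or from any Hadoop project: the class `LoggingGuardDemo`, the helper `summarize`, and the counter `summaryCalls` are hypothetical names, and `java.util.logging` is used here only to keep the example dependency-free (the studied systems may use other logging libraries).

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;

public class LoggingGuardDemo {
    private static final Logger LOG = Logger.getLogger(LoggingGuardDemo.class.getName());

    // Counts how often the costly pre-logging statement actually runs.
    static int summaryCalls = 0;

    // Hypothetical costly helper whose only purpose is to build a log parameter.
    static String summarize(List<Integer> blockIds) {
        summaryCalls++;
        return blockIds.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Missing logging guard: the PLS runs even when FINE (debug-level) output is disabled.
    static void reportUnguarded(List<Integer> blockIds) {
        String summary = summarize(blockIds);     // pre-logging statement (PLS)
        LOG.fine("Received blocks: " + summary);  // logging call
    }

    // Guarded: the PLS and the logging call run only when FINE is enabled.
    static void reportGuarded(List<Integer> blockIds) {
        if (LOG.isLoggable(Level.FINE)) {         // logging guard
            String summary = summarize(blockIds);
            LOG.fine("Received blocks: " + summary);
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level output disabled, as is typical in production
        List<Integer> ids = List.of(1, 2, 3);

        reportUnguarded(ids);
        System.out.println("PLS executions after unguarded call: " + summaryCalls);

        reportGuarded(ids);
        System.out.println("PLS executions after guarded call: " + summaryCalls);
    }
}
```

With the logger level set to `INFO`, the unguarded method still pays for `summarize` although nothing is logged, whereas the guarded method skips it. A call in which only part of the parameter construction sits inside the guard would correspond to what the abstract calls a partially guarded logging call.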

Notes

  1. To allow for easier replication and to encourage future research on this subject, we have released a replication package at https://github.com/coder-chenzhi/plover.

  2. https://logging.apache.org/log4j/2.x/index.html

  3. https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/tree/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/Expression.java.

  4. It is worth noting that the total execution time usually increases by about one fourth when profiling is enabled, which indicates that accurately measuring the performance of PLSs in production is not realistic.

  5. https://sourceforge.net/p/lemur/wiki/RankLib/.

  6. https://www.atlassian.com/software/clover.

  7. https://cobertura.github.io/cobertura.

  8. https://en.wikipedia.org/wiki/Lazy_evaluation.

Acknowledgements

This work was partially supported by the National Science Foundation of China (No. U20A20173 and No. 62125206), Natural Science Foundation of Zhejiang Province (No. LR18F020003), and Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies.

Author information

Corresponding author

Correspondence to Shuiguang Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhi, C., Deng, S., Han, J. et al. Towards automatic detection and prioritization of pre-logging overhead: a case study of hadoop ecosystem. Autom Softw Eng 29, 11 (2022). https://doi.org/10.1007/s10515-021-00317-7

  • DOI: https://doi.org/10.1007/s10515-021-00317-7
