Towards automatic detection and prioritization of pre-logging overhead: a case study of hadoop ecosystem

Zhi, Chen; Deng, Shuiguang; Han, Junxiao; Yin, Jianwei

doi:10.1007/s10515-021-00317-7

Published: 31 December 2021

Automated Software Engineering, volume 29, article number 11 (2022)

Chen Zhi¹, Shuiguang Deng¹ (ORCID: orcid.org/0000-0003-3361-284X), Junxiao Han² & Jianwei Yin³

Abstract

Pre-logging overhead is the overhead produced by pre-logging statements (PLSs), which construct the parameters of logging calls. In practice, these logging-related statements are usually wrapped in conditional statements, known as logging guards, so that they execute only when necessary (e.g., during debugging). However, developers occasionally forget to add logging guards to costly PLSs. The resulting missing logging guard (MLG) issues can significantly degrade performance, particularly in high-throughput software, and can lead to critical performance problems. In this paper, (1) we conduct the first empirical study of 137 commits addressing MLG issues in five popular open-source projects in the Hadoop ecosystem, and from the results we report five findings on the current practice of logging guards. (2) We devise an accurate algorithm to detect PLSs (over 95% precision and recall) and find 16 problematic partially guarded logging calls, 10 of which have been confirmed and fixed by developers. (3) We investigate two metric-based ranking approaches that use six software metrics to prioritize PLSs by their impact on performance. We find that the execution frequency of PLSs is the best-performing single metric, and that combining multiple software metrics can further improve ranking performance (by 7.8% on average).
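To make the terms in the abstract concrete, the following is a minimal sketch of an unguarded and a guarded pre-logging statement. It uses `java.util.logging` from the JDK so that it runs without dependencies; that choice is an assumption made for illustration, not the setup studied in the paper. The class `LoggingGuardSketch`, the helper `describe`, and the counter `plsExecutions` are hypothetical names.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingGuardSketch {
    static final Logger LOG = Logger.getLogger("plsSketch");

    // Counts how often the costly pre-logging statement actually runs.
    static int plsExecutions = 0;

    // Hypothetical costly helper standing in for a pre-logging statement (PLS).
    static String describe(List<Integer> blocks) {
        plsExecutions++;
        StringBuilder sb = new StringBuilder();
        for (int b : blocks) {
            sb.append("block-").append(b).append(' ');
        }
        return sb.toString().trim();
    }

    // Missing logging guard: the PLS runs even when FINE output is disabled.
    static void unguarded(List<Integer> blocks) {
        String summary = describe(blocks);   // PLS
        LOG.fine("Received " + summary);     // logging call
    }

    // Guarded: the PLS and the logging call run only when FINE is enabled.
    static void guarded(List<Integer> blocks) {
        if (LOG.isLoggable(Level.FINE)) {
            String summary = describe(blocks);
            LOG.fine("Received " + summary);
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level output is disabled
        List<Integer> blocks = List.of(1, 2, 3);
        unguarded(blocks);
        System.out.println("after unguarded: " + plsExecutions); // prints 1
        guarded(blocks);
        System.out.println("after guarded: " + plsExecutions);   // still 1
    }
}
```

With the level set to `INFO`, neither method emits a log record, yet only the unguarded variant pays for `describe`. In a hot code path that wasted work is the pre-logging overhead the abstract refers to.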

Notes

To allow for easier replication and to encourage future research on this subject, we have released a replication package at https://github.com/coder-chenzhi/plover.
https://logging.apache.org/log4j/2.x/index.html
https://git.eclipse.org/c/jdt/eclipse.jdt.core.git/tree/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/Expression.java.
It is worth noting that the total execution time usually increases by about one fourth when profiling is enabled, which indicates that accurate performance measurement of PLSs is not realistic in production.
https://sourceforge.net/p/lemur/wiki/RankLib/.
https://www.atlassian.com/software/clover.
https://cobertura.github.io/cobertura.
https://en.wikipedia.org/wiki/Lazy_evaluation.

Acknowledgements

This work was partially supported by the National Science Foundation of China (No. U20A20173 and No. 62125206), Natural Science Foundation of Zhejiang Province (No. LR18F020003), and Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies.

Author information

Authors and Affiliations

Zhejiang University & Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies, Hangzhou, China
Chen Zhi & Shuiguang Deng
Zhejiang University City College, Hangzhou, China
Junxiao Han
Zhejiang University, Hangzhou, China
Jianwei Yin

Corresponding author

Correspondence to Shuiguang Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhi, C., Deng, S., Han, J. et al. Towards automatic detection and prioritization of pre-logging overhead: a case study of hadoop ecosystem. Autom Softw Eng 29, 11 (2022). https://doi.org/10.1007/s10515-021-00317-7

Received: 12 July 2021
Accepted: 08 December 2021
Published: 31 December 2021
DOI: https://doi.org/10.1007/s10515-021-00317-7
