Automatically Generating Descriptive Texts in Logging Statements: How Far Are We?

Liu, Xiaotong; Jia, Tong; Li, Ying; Yu, Hao; Yue, Yang; Hou, Chuanjia

doi:10.1007/978-3-030-64437-6_13

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12470)

Included in the following conference series:

Asian Symposium on Programming Languages and Systems

Abstract

In most cases, logs are the only accurate information available for administrators to understand system behavior and diagnose failure root causes. However, due to the lack of well-defined logging guidance, it is challenging for developers to decide what to log, especially for logging statements that contain descriptive texts and variables. In this paper, we explore the automatic generation of descriptive texts in logging statements and evaluate the effectiveness of various automatic generation methods. We propose that generating descriptive texts in logging statements can be framed as a retrieval-based Q&A task. According to the roles of query and answer, we design two retrieval strategies: Code&Code and Code&Log. To measure the similarity between the query and the answer, we use two types of retrieval algorithms: information-retrieval-based and neural-network-based. We conduct a systematic analysis of the effectiveness of various retrieval algorithms under the different retrieval strategies, assess their accuracy using automatic metrics and human evaluation, and present five instructive findings. We believe that these findings have implications for both researchers and practitioners. Moreover, we construct and release a log text dataset containing over 138K valid log texts from 85 Java projects in the Apache ecosystem for future logging statement analysis and generation.
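
To make the retrieval framing concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Jaccard similarity over camel-case-split tokens as the information-retrieval-based scorer, and a two-entry toy corpus invented purely for illustration. The function names (`tokenize`, `jaccard`, `retrieve_log_text`) and the strategy labels `"code_code"` and `"code_log"` are hypothetical. Under Code&Code the query code is compared with candidate code snippets and the log text attached to the best match is returned; under Code&Log the query code is compared with the candidate log texts directly.

```python
import re


def tokenize(text):
    """Split on camelCase boundaries and non-letters; return lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {token.lower() for token in re.findall(r"[A-Za-z]+", spaced)}


def jaccard(a, b):
    """Jaccard index of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_log_text(query_code, corpus, strategy="code_code"):
    """Return the log text of the best-scoring (code, log_text) pair.

    strategy="code_code": score the query against each candidate's code.
    strategy="code_log":  score the query against each candidate's log text.
    """
    query_tokens = tokenize(query_code)

    def score(pair):
        code, log_text = pair
        target = code if strategy == "code_code" else log_text
        return jaccard(query_tokens, tokenize(target))

    return max(corpus, key=score)[1]


# Toy corpus (hypothetical data, not taken from the released dataset).
CORPUS = [
    ("void closeConnection(Socket socket) { socket.close(); }",
     "Closing connection to server"),
    ("File readConfigFile(String path) { return new File(path); }",
     "Reading config file"),
]

QUERY = "void closeSocketConnection(Socket s) { s.close(); }"

if __name__ == "__main__":
    print(retrieve_log_text(QUERY, CORPUS, strategy="code_code"))
    print(retrieve_log_text(QUERY, CORPUS, strategy="code_log"))
```

A neural-network-based variant would replace `jaccard` with a learned similarity between embeddings of the query and the candidate; the retrieval loop itself would stay the same.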

Notes

1. https://github.com/liuxiaotong0302/LogSearch

Author information

Authors and Affiliations

Peking University, Beijing, China
Xiaotong Liu, Tong Jia, Ying Li, Hao Yu & Chuanjia Hou
University of California Irvine, Irvine, CA, 92697, USA
Yang Yue

Corresponding author

Correspondence to Ying Li.

Editor information

Editors and Affiliations

University of Hong Kong, Hong Kong, Hong Kong
Bruno C. d. S. Oliveira

About this paper

Cite this paper

Liu, X., Jia, T., Li, Y., Yu, H., Yue, Y., Hou, C. (2020). Automatically Generating Descriptive Texts in Logging Statements: How Far Are We? In: Oliveira, B.C.d.S. (eds) Programming Languages and Systems. APLAS 2020. Lecture Notes in Computer Science, vol 12470. Springer, Cham. https://doi.org/10.1007/978-3-030-64437-6_13

DOI: https://doi.org/10.1007/978-3-030-64437-6_13
Published: 24 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64436-9
Online ISBN: 978-3-030-64437-6
eBook Packages: Computer Science, Computer Science (R0)
