Abstract
In most cases, logs are the only accurate information available to administrators for understanding system behavior and diagnosing the root causes of failures. However, because well-defined logging guidance is lacking, developers find it challenging to decide what to log, especially in logging statements that contain descriptive texts and variables. In this paper, we explore the automatic generation of descriptive texts in logging statements and evaluate the effectiveness of various automatic generation methods. We propose that generating descriptive texts in logging statements can be formulated as a retrieval-based question-answering (Q&A) task. Based on the roles assigned to the query and the answer, we design two retrieval strategies: Code&Code and Code&Log. To measure the similarity between the query and the answer, we use two types of retrieval algorithms: information retrieval (IR)-based and neural network-based algorithms. We systematically analyze the effectiveness of these retrieval algorithms under the different retrieval strategies, assess their accuracy with both automatic metrics and human evaluation, and present five instructive findings. We believe these findings have implications for both researchers and practitioners working in this area. Moreover, we construct and release a log text dataset containing over 138K valid log texts from 85 Java projects in the Apache ecosystem to support future research on logging statement analysis and generation.
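To make the retrieval-based framing concrete, the sketch below shows one way a Code&Code-style lookup could work, under our own reading of that strategy: the query is a code snippet, the corpus holds (code, log text) pairs, and the log text paired with the most similar code is returned as the answer. The similarity used here is a simple IR-style measure (Jaccard similarity over camel-case-split identifier tokens); the class name `CodeToLogRetriever`, the `Pair` record, and the tiny corpus are hypothetical illustrations, not the authors' implementation or dataset.

```java
import java.util.*;

public class CodeToLogRetriever {

    // Hypothetical corpus entry: a code snippet paired with the
    // descriptive text of the logging statement found next to it.
    public record Pair(String code, String logText) {}

    // Split code into lower-cased tokens: first on non-alphanumeric
    // characters, then on camel-case boundaries (getConnection -> get, connection).
    public static Set<String> tokenize(String code) {
        Set<String> tokens = new HashSet<>();
        for (String word : code.split("[^A-Za-z0-9]+")) {
            for (String part : word.split("(?<=[a-z0-9])(?=[A-Z])")) {
                if (!part.isEmpty()) {
                    tokens.add(part.toLowerCase());
                }
            }
        }
        return tokens;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty.
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Code&Code-style retrieval (our assumption): compare the query code with
    // each corpus code snippet and return the log text of the best match.
    public static String retrieve(String queryCode, List<Pair> corpus) {
        Set<String> query = tokenize(queryCode);
        Pair best = null;
        double bestScore = -1.0;
        for (Pair candidate : corpus) {
            double score = jaccard(query, tokenize(candidate.code()));
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best == null ? null : best.logText();
    }

    public static void main(String[] args) {
        List<Pair> corpus = List.of(
                new Pair("conn = dataSource.getConnection();",
                        "Failed to get connection from data source"),
                new Pair("file.delete();", "Unable to delete file"));
        // Shares the tokens "get" and "connection" with the first entry only.
        System.out.println(retrieve("Connection c = pool.getConnection();", corpus));
        // prints "Failed to get connection from data source"
    }
}
```

Swapping `jaccard` for another IR measure (for example, a normalized Levenshtein distance over token sequences) or for similarity between learned neural embeddings changes only the scoring step; the retrieve-then-reuse structure stays the same.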
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X., Jia, T., Li, Y., Yu, H., Yue, Y., Hou, C. (2020). Automatically Generating Descriptive Texts in Logging Statements: How Far Are We? In: Oliveira, B.C.d.S. (ed.) Programming Languages and Systems. APLAS 2020. Lecture Notes in Computer Science, vol 12470. Springer, Cham. https://doi.org/10.1007/978-3-030-64437-6_13
DOI: https://doi.org/10.1007/978-3-030-64437-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64436-9
Online ISBN: 978-3-030-64437-6
eBook Packages: Computer Science, Computer Science (R0)