
Call Me Maybe: Using NLP to Automatically Generate Unit Test Cases Respecting Temporal Constraints

Published: 05 January 2023

Abstract

A class may need to obey temporal constraints in order to function correctly. For example, the correct usage protocol for an iterator is to always check whether there is a next element before asking for it; iterating over a collection when no items are left leads to a NoSuchElementException. Automatic test case generation tools such as Randoop and EvoSuite have no notion of these temporal constraints. Generating test cases by randomly invoking methods on a new instance of the class under test may raise runtime exceptions that do not necessarily expose software faults, but are rather consequences of violating temporal properties.
This paper presents CallMeMaybe, a novel technique that uses natural language processing to analyze Javadoc comments and identify temporal constraints. This information can guide a test case generator toward executing sequences of method calls that respect the temporal constraints. Our evaluation on 73 subjects from seven popular Java systems shows that CallMeMaybe achieves a precision of 83% and a recall of 70% when translating temporal constraints into Java expressions. For the two biggest subjects, the integration with Randoop flags 11,818 false alarms and enriches 12,024 correctly failing test cases due to violations of temporal constraints with clear explanations that can help software developers.
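
The iterator example from the abstract can be sketched as follows. This is a minimal illustration of the usage protocol, not code from the paper: hasNext() must temporally precede each call to next(), and calling next() on an exhausted iterator raises NoSuchElementException, an exception that reflects a protocol violation by the caller rather than a fault in the iterator itself.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IteratorProtocol {
    public static void main(String[] args) {
        List<String> items = List.of("a", "b");

        // Correct usage: hasNext() guards every call to next(),
        // respecting the iterator's temporal constraint.
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }

        // Violating the constraint: next() on an exhausted iterator
        // throws NoSuchElementException. A test generator unaware of
        // the protocol would report this as a failing test, even
        // though it exposes no fault in the class under test.
        try {
            it.next();
        } catch (NoSuchElementException e) {
            System.out.println("protocol violation, not a fault");
        }
    }
}
```

A protocol-aware generator would either insert the hasNext() guard before next() or flag the resulting exception as expected behavior rather than a failure.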




Published In

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
October 2022
2006 pages
ISBN: 9781450394758
DOI: 10.1145/3551349

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Specification inference
  2. automatic test case generation
  3. natural language processing
  4. software testing
  5. test oracle generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Swiss National Science Foundation

Conference

ASE '22

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%


Cited By

  • (2024) Practitioners' Expectations on Automated Test Generation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1618–1630. DOI: 10.1145/3650212.3680386. Online publication date: 11-Sep-2024.
  • (2024) Towards Generating Contracts for Scientific Data Analysis Workflows. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2048–2055. DOI: 10.1109/SCW63240.2024.00256. Online publication date: 17-Nov-2024.
  • (2024) Robustness-Enhanced Assertion Generation Method Based on Code Mutation and Attack Defense. Collaborative Computing: Networking, Applications and Worksharing, 281–300. DOI: 10.1007/978-3-031-54528-3_16. Online publication date: 23-Feb-2024.
  • (2023) SmartCoCo: Checking Comment-Code Inconsistency in Smart Contracts via Constraint Propagation and Binding. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 294–306. DOI: 10.1109/ASE56229.2023.00142. Online publication date: 11-Sep-2023.
