How effective are existing Java API specifications for finding bugs during runtime verification?

Published in Automated Software Engineering.

Abstract

Runtime verification can be used to find bugs early, during software development, by monitoring test executions against formal specifications (specs). The quality of runtime verification depends on the quality of the specs. While previous research has produced many specs for the Java API, manually or through automatic mining, there has been no large-scale study of their bug-finding effectiveness. Our conference paper presented the first in-depth study of the bug-finding effectiveness of previously proposed specs. We used JavaMOP to monitor 182 manually written and 17 automatically mined specs against more than 18K manually written and 2.1M automatically generated test methods in 200 open-source projects. The average runtime overhead was under 4.3×. We inspected 652 violations of manually written specs and (randomly sampled) 200 violations of automatically mined specs. We reported 95 bugs, out of which developers already fixed or accepted 76. However, most violations, 82.81% of 652 and 97.89% of 200, were false alarms. Based on our empirical results, we conclude that (1) runtime verification technology has matured enough to incur tolerable runtime overhead during testing, and (2) the existing API specifications can find many bugs that developers are willing to fix; however, (3) the false alarm rates are worrisome and suggest that substantial effort needs to be spent on engineering better specs and properly evaluating their effectiveness. We repeated our experiments on a different set of 18 projects and inspected all resulting 742 violations. The results are similar, and our conclusions are the same.
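To make the monitored properties concrete: one manually written spec in the FSL database, Collections_SynchronizedCollection, encodes the java.util.Collections Javadoc requirement that a synchronized collection be iterated only while holding its lock. The sketch below is a hypothetical illustration (class name and comments are ours, not from the study) of the usage pattern such a spec distinguishes, and of why violations in single-threaded tests are often false alarms:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncCollectionDemo {
    public static void main(String[] args) {
        // Wrap a plain list in a synchronized view.
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        list.add(1);
        list.add(2);

        // Violating pattern: iterating without holding the list's lock.
        // A monitor for the Collections_SynchronizedCollection spec flags
        // this even when a single-threaded test can never actually race,
        // which is one source of the false alarms discussed above:
        //
        //   for (Integer i : list) { ... }   // flagged by the monitor

        // Compliant pattern: hold the lock around the entire iteration,
        // as the java.util.Collections Javadoc requires.
        int sum = 0;
        synchronized (list) {
            for (Integer i : list) {
                sum += i;
            }
        }
        System.out.println(sum); // prints 3
    }
}
```

In the study's terms, the flagged loop is a spec violation regardless of whether the test exposes a real concurrency bug; deciding which violations are true bugs is exactly the manual inspection step applied to the 652 and 200 sampled violations.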

Figures 1–9 appear in the full article.

Notes

  1. These specs are publicly available (Pradel 2015).

References

  • Allan, C., Avgustinov, P., Christensen, A.S., Hendren, L., Kuzins, S., Lhoták, O., de Moor, O., Sereni, D., Sittampalam, G., Tibble, J.: Adding trace matching with free variables to AspectJ. In: OOPSLA, pp. 345–364 (2005)

  • Arnold, M., Vechev, M., Yahav, E.: QVM: An efficient runtime for detecting defects in deployed systems. In: OOPSLA, pp. 143–162 (2008)

  • Beckman, N.E., Nori, A.V.: Probabilistic, modular and scalable inference of typestate specifications. In: PLDI, pp. 211–221 (2011)

  • Blackburn, S.M., Garner, R., Hoffmann, C., Khang, A.M., McKinley, K.S., Bentzur, R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S.Z., Hirzel, M., Hosking, A., Jump, M., Lee, H., Moss, J.E.B., Phansalkar, A., Stefanović, D., VanDrunen, T., von Dincklage, D., Wiedermann, B.: The DaCapo benchmarks: Java benchmarking development and analysis. In: OOPSLA, pp. 169–190 (2006)

  • Bodden, E.: MOPBox: a library approach to runtime verification. In: RV Tool Demo, pp. 365–369 (2011)

  • Bodden, E., Hendren, L., Lam, P., Lhoták, O., Naeem, N.A.: Collaborative runtime verification with tracematches. In: RV, pp. 22–37 (2007a)

  • Bodden, E., Hendren, L.J., Lhoták, O.: A staged static program analysis to improve the performance of runtime monitoring. In: ECOOP, pp. 525–549 (2007b)

  • Bodden, E., Lam, P., Hendren, L.: Finding programming errors earlier by evaluating runtime monitors ahead-of-time. In: FSE, pp. 36–47 (2008)

  • Chen, D., Zhang, Y., Wang, R., Li, X., Peng, L., Wei, W.: Mining universal specification based on probabilistic model. In: SEKE, pp. 471–476 (2015)

  • Chen, F., Roşu, G.: Towards monitoring-oriented programming: a paradigm combining specification and implementation. In: RV, pp. 108–127 (2003)

  • Cochran, W.G.: Sampling Techniques. Wiley, New York (1977)

  • Dallmeier, V., Knopp, N., Mallon, C., Hack, S., Zeller, A.: Generating test cases for specification mining. In: ISSTA, pp. 85–96 (2010)

  • Dwyer, M.B., Purandare, R., Person, S.: Runtime verification in context: can optimizing error detection improve fault diagnosis? In: RV, pp. 36–50 (2010)

  • Emopers: Closing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream. https://github.com/JodaOrg/joda-time/pull/339 (2015). Accessed 15 Nov 2019

  • Emopers: Checking the validity of input ListIterators. https://github.com/imglib/imglib2/pull/259 (2019). Accessed 15 Nov 2019

  • Forejt, V., Kwiatkowska, M., Parker, D., Qu, H., Ujma, M.: Incremental runtime verification of probabilistic systems. In: RV, pp. 314–319 (2012)

  • Formal Systems Laboratory: JavaMOP. http://fsl.cs.illinois.edu/index.php/JavaMOP (2014). Accessed 15 Nov 2019

  • Formal Systems Laboratory: Collections_SynchronizedCollection. http://fsl.cs.illinois.edu/annotated-java/__properties/html/java/util/Collections_SynchronizedCollection.html (2015a). Accessed 15 Nov 2019

  • Formal Systems Laboratory: JavaMOPAgent Documentation. https://github.com/runtimeverification/javamop/blob/master/docs/JavaMOPAgentUsage.md (2015b). Accessed 15 Nov 2019

  • Formal Systems Laboratory: FSL Specification Database. https://runtimeverification.com/monitor/propertydb (2016). Accessed 15 Nov 2019

  • Gabel, M., Su, Z.: Online inference and enforcement of temporal properties. In: ICSE, pp. 15–24 (2010)

  • Gabel, M., Su, Z.: Testing mined specifications. In: FSE, pp. 1–11 (2012)

  • Hussein, S., Meredith, P., Roşu, G.: Security-policy monitoring and enforcement with JavaMOP. In: PLAS, pp. 1–11 (2012)

  • Jin, D., Meredith, P.O., Griffith, D., Roşu, G.: Garbage collection for monitoring parametric properties. In: PLDI, pp. 415–424 (2011)

  • Jin, D., Meredith, P.O., Lee, C., Roşu, G.: JavaMOP: Efficient parametric runtime monitoring framework. In: ICSE Demo, pp. 1427–1430 (2012a)

  • Jin, D., Meredith, P.O., Roşu, G.: Scalable parametric runtime monitoring. Technical report, Computer Science Department, UIUC (2012b)

  • Joda, S.: Joda-Time. http://www.joda.org/joda-time/ (2016). Accessed 15 Nov 2019

  • Karaorman, M., Freeman, J.: jMonitor: Java runtime event specification and monitoring library. In: RV, pp. 181–200 (2004)

  • Krka, I., Brun, Y., Medvidovic, N.: Automatic mining of specifications from invocation traces and method invariants. In: FSE, pp. 178–189 (2014)

  • Le Goues, C., Weimer, W.: Specification mining with few false positives. In: TACAS, pp. 292–306 (2009)

  • Lee, C., Chen, F., Roşu, G.: Mining parametric specifications. In: ICSE, pp. 591–600 (2011)

  • Lee, C., Jin, D., Meredith, P.O., Roşu, G.: Towards categorizing and formalizing the JDK API. Technical report, Computer Science Department, UIUC (2012)

  • Legunsen, O., Marinov, D., Roşu, G.: Evolution-aware monitoring-oriented programming. In: ICSE NIER, pp. 615–618 (2015)

  • Legunsen, O., Hariri, F., Shi, A., Lu, Y., Zhang, L., Marinov, D.: An extensive study of static regression test selection in modern software evolution. In: FSE, pp. 583–594 (2016a)

  • Legunsen, O., Hassan, W.U., Xu, X., Roşu, G., Marinov, D.: How good are the specs? A study of the bug-finding effectiveness of existing Java API specifications. In: ASE, pp. 602–613 (2016b)

  • Legunsen, O., Hassan, W.U., Xu, X., Roşu, G., Marinov, D.: Supplementary material for this paper. http://fsl.cs.illinois.edu/spec-eval (2016c). Accessed 15 Nov 2019

  • Legunsen, O., Shi, A., Marinov, D.: STARTS: STAtic Regression Test Selection. In: ASE, pp. 949–954 (2017)

  • Legunsen, O., Zhang, Y., Hadzi-Tanovic, M., Roşu, G., Marinov, D.: Techniques for evolution-aware runtime verification. In: ICST, pp. 300–311 (2019)

  • Lemieux, C.: Mining temporal properties of data invariants. In: ICSE SRC, pp. 751–753 (2015)

  • Lemieux, C., Park, D., Beschastnikh, I.: General LTL specification mining. In: ASE, pp. 81–92 (2015)

  • Ley, M.: CompleteSearch DBLP. http://www.dblp.org/search/index.php (2015). Accessed 15 Nov 2019

  • Luo, Q., Zhang, Y., Lee, C., Jin, D., Meredith, P.O., Şerbănuţă, T.F., Roşu, G.: RV-Monitor: efficient parametric runtime verification with simultaneous properties. In: RV, pp. 285–300 (2014)

  • Mao, D., Chen, L., Zhang, L.: An extensive study on cross-project predictive mutation testing. In: ICST, pp. 160–171 (2019)

  • Meredith, P., Roşu, G.: Efficient parametric runtime verification with deterministic string rewriting. In: ASE, pp. 70–80 (2013)

  • Meredith, P., Jin, D., Chen, F., Roşu, G.: Efficient monitoring of parametric context-free patterns. In: ASE, pp. 148–157 (2008)

  • Navabpour, S., Wu, C.W.W., Bonakdarpour, B., Fischmeister, S.: Efficient techniques for near-optimal instrumentation in time-triggered runtime verification. In: RV, pp. 208–222 (2011)

  • Nguyen, A.C., Khoo, S.C.: Extracting significant specifications from mining through mutation testing. In: ICFEM, pp. 472–488 (2011)

  • Nguyen, H.A., Dyer, R., Nguyen, T.N., Rajan, H.: Mining preconditions of APIs in large-scale code corpus. In: FSE, pp. 166–177 (2014)

  • Oracle: java.lang.instrument. http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html (2015a). Accessed 15 Nov 2019

  • Oracle: java.lang.Math. https://docs.oracle.com/javase/7/docs/api/java/lang/Math.html (2015b). Accessed 15 Nov 2019

  • Oracle: java.net.URL. https://docs.oracle.com/javase/7/docs/api/java/net/URL.html (2015c). Accessed 15 Nov 2019

  • Oracle: java.util.Collections. https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html (2015d). Accessed 15 Nov 2019

  • Pacheco, C., Ernst, M.D.: Randoop: feedback-directed random testing for Java. In: OOPSLA Companion, pp. 815–816 (2007)

  • Pacheco, C., Ernst, M.D.: Randoop. https://randoop.github.io/randoop/ (2016). Accessed 15 Nov 2019

  • Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test generation. In: ICSE, pp. 75–84 (2007)

  • Pacheco, C., Lahiri, S.K., Ball, T.: Finding errors in .NET with feedback-directed random testing. In: ISSTA, pp. 87–96 (2008)

  • Pradel, M.: Dynamically inferring, refining, and checking API usage protocols. In: OOPSLA Companion, pp. 773–774 (2009)

  • Pradel, M.: Statically checking API protocol conformance with mined multi-object specifications (supplementary material). http://mp.binaervarianz.de/icse2012-statically/ (2015). Accessed 15 Nov 2019

  • Pradel, M., Gross, T.R.: Automatic generation of object usage specifications from large method traces. In: ASE, pp. 371–382 (2009)

  • Pradel, M., Gross, T.R.: Leveraging test generation and specification mining for automated bug detection without false positives. In: ICSE, pp. 288–298 (2012)

  • Pradel, M., Bichsel, P., Gross, T.R.: A framework for the evaluation of specification miners based on finite state machines. In: ICSM, pp. 1–10 (2010)

  • Pradel, M., Jaspan, C., Aldrich, J., Gross, T.R.: Statically checking API protocol conformance with mined multi-object specifications. In: ICSE, pp. 925–935 (2012)

  • Purandare, R., Dwyer, M.B., Elbaum, S.: Optimizing monitoring of finite state properties through monitor compaction. In: ISSTA, pp. 280–290 (2013)

  • Reger, G., Barringer, H., Rydeheard, D.: A pattern-based approach to parametric specification mining. In: ASE, pp. 658–663 (2013)

  • Robillard, M.P., Bodden, E., Kawrykow, D., Mezini, M., Ratchford, T.: Automated API property inference techniques. TSE 39(5), 613–637 (2013)

  • Shamshiri, S., Just, R., Rojas, J., Fraser, G., McMinn, P., Arcuri, A.: Do automatically generated unit tests find real faults? An empirical study of effectiveness and challenges. In: ASE, pp. 201–211 (2015)

  • Sun, J., Xiao, H., Liu, Y., Lin, S.W., Qin, S.: TLV: abstraction through testing, learning, and validation. In: ESEC/FSE, pp. 698–709 (2015)

  • Tan, S.H., Marinov, D., Tan, L., Leavens, G.T.: @tComment: testing Javadoc comments to detect comment-code inconsistencies. In: ICST, pp. 260–269 (2012)

  • The JaCoCo Team: JaCoCo Java Code Coverage Library. https://www.jacoco.org/jacoco (2018). Accessed 15 Nov 2019

  • Thummalapenta, S., Xie, T.: Alattin: mining alternative patterns for detecting neglected conditions. In: ASE, pp. 283–294 (2009)

  • Wasylkowski, A., Zeller, A.: Mining temporal specifications from object usage. In: ASE, pp. 295–306 (2009)

  • Weimer, W., Necula, G.: Mining temporal specifications for error detection. In: TACAS, pp. 461–476 (2005)

  • Wu, C.W.W., Kumar, D., Bonakdarpour, B., Fischmeister, S.: Reducing monitoring overhead by integrating event- and time-triggered techniques. In: RV, pp. 304–321 (2013)

  • Wu, Q., Liang, G., Wang, Q., Xie, T., Mei, H.: Iterative mining of resource-releasing specifications. In: ASE, pp. 233–242 (2011)

  • Zhang, J., Wang, Z., Zhang, L., Hao, D., Zang, L., Cheng, S., Zhang, L.: Predictive mutation testing. In: ISSTA, pp. 342–353 (2016)

  • Zhang, J., Zhang, L., Harman, M., Hao, D., Jia, Y., Zhang, L.: Predictive mutation testing. TSE, pp. 898–918 (2018)

  • Zhong, H., Zhang, L., Xie, T., Mei, H.: Inferring resource specifications from natural language API documentation. In: ASE, pp. 307–318 (2009)

Acknowledgements

Karl Hajal, Milica Hadzi-Tanovic and Igor Lima helped with inspecting violations in our validation study and submitting pull requests. We thank Alex Gyori, Farah Hariri, Cosmin Radoi, and August Shi for feedback on early drafts of this paper, Rahul Gopinath for discussions and help with Randoop, and He Xiao and Yi Zhang for help with JavaMOP. We also thank all authors of papers who replied to our emails concerning their mined specs. This research was partially supported by the NSF Grants CCF-1421503, CCF-1421575, CCF-1438982, CCF-1439957, CNS-1646305, CNS-1740916, and CCF-1763788. Wajih Ul Hassan was partially supported by the Sohaib and Sara Abassi Fellowship. We gratefully acknowledge support for research on testing from Microsoft and Qualcomm.

Author information

Corresponding author

Correspondence to Owolabi Legunsen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Legunsen, O., Al Awar, N., Xu, X. et al. How effective are existing Java API specifications for finding bugs during runtime verification? Autom Softw Eng 26, 795–837 (2019). https://doi.org/10.1007/s10515-019-00267-1

