DOI: 10.1145/3238147.3238224
research-article

Effectiveness and challenges in generating concurrent tests for thread-safe classes

Published: 03 September 2018

Abstract

Developing correct and efficient concurrent programs is difficult and error-prone because of the complexity of thread synchronization. Developers often alleviate this problem by relying on thread-safe classes, which encapsulate most synchronization-related challenges. Testing such classes is therefore crucial to ensuring the reliability of the concurrency aspects of programs. Some recent techniques and corresponding tools tackle the problem of testing thread-safe classes by automatically generating concurrent tests. In this paper, we present a comprehensive study of the state-of-the-art techniques and an independent empirical evaluation of the publicly available tools. We conducted the study by executing all tools on the JaConTeBe benchmark, which contains 47 well-documented concurrency faults. Our results show that 8 out of 47 faults (17%) were detected by at least one tool. By studying the issues of the tools and the generated tests, we derive insights to guide future research on improving the effectiveness of automated concurrent test generation.
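
For readers unfamiliar with the term, the following is a minimal sketch of what a concurrent test for a thread-safe class might look like. It is an illustration only and is not taken from the paper or from any of the evaluated tools. The class `Counter`, the method `runConcurrentTest`, and the prefix/suffix layout (a sequential prefix that builds the shared object, followed by two call sequences executed in parallel) are assumptions chosen for the example.

```java
import java.util.concurrent.CountDownLatch;

public class ConcurrentTestSketch {

    // Hypothetical class under test: claims to be thread-safe.
    static final class Counter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    // Runs one concurrent test and returns the final observed state.
    static int runConcurrentTest(int callsPerThread) {
        // Sequential prefix: create the shared object and bring it to a known state.
        Counter counter = new Counter();
        counter.increment();

        // The latch releases both threads together to encourage interleaving.
        CountDownLatch start = new CountDownLatch(1);
        Runnable suffix = () -> {
            try {
                start.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Concurrent suffix: calls made on the shared object by each thread.
            for (int i = 0; i < callsPerThread; i++) {
                counter.increment();
            }
        };

        Thread t1 = new Thread(suffix);
        Thread t2 = new Thread(suffix);
        t1.start();
        t2.start();
        start.countDown();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("test interrupted", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        int calls = 10_000;
        int observed = runConcurrentTest(calls);
        // Oracle: every serial order of the calls yields the same final value.
        int expected = 1 + 2 * calls;
        System.out.println(observed == expected
                ? "consistent with a serial execution"
                : "possible thread-safety violation");
    }
}
```

If `synchronized` were removed from `increment`, some interleavings could lose updates and the check could fail, although a single run is not guaranteed to expose the fault.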

Published In

ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
September 2018
955 pages
ISBN:9781450359375
DOI:10.1145/3238147
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Concurrency faults
  2. Test generation
  3. Thread-safety

Qualifiers

  • Research-article

Conference

ASE '18
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Cited By

  • (2023) SegFuzz: Segmentizing Thread Interleaving to Discover Kernel Concurrency Bugs through Fuzzing. 2023 IEEE Symposium on Security and Privacy (SP), 2104-2121. DOI: 10.1109/SP46215.2023.10179398. Online publication date: May-2023
  • (2023) TSVD4J: Thread-Safety Violation Detection for Java. Proceedings of the 45th International Conference on Software Engineering: Companion Proceedings, 78-82. DOI: 10.1109/ICSE-Companion58688.2023.00029. Online publication date: 14-May-2023
  • (2023) Effective Concurrency Testing for Go via Directional Primitive-Constrained Interleaving Exploration. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 1364-1376. DOI: 10.1109/ASE56229.2023.00086. Online publication date: 11-Nov-2023
  • (2022) Auxiliary Code Automatic Generation Algorithm of Intelligent Art Platform Design Framework based on Visual 3D Information Modeling. 2022 7th International Conference on Communication and Electronics Systems (ICCES), 310-314. DOI: 10.1109/ICCES54183.2022.9835878. Online publication date: 22-Jun-2022
  • (2021) Synthesizing Multi-threaded Tests from Sequential Traces to Detect Communication Deadlocks. 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST), 1-12. DOI: 10.1109/ICST49551.2021.00013. Online publication date: Apr-2021
  • (2021) Statically driven generation of concurrent tests for thread-safe classes. Software Testing, Verification and Reliability 31:4. DOI: 10.1002/stvr.1774. Online publication date: 4-May-2021
  • (2020) ER catcher. Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 324-335. DOI: 10.1145/3324884.3416639. Online publication date: 21-Dec-2020
  • (2020) ChemTest. Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 548-560. DOI: 10.1145/3324884.3416638. Online publication date: 21-Dec-2020
  • (2020) Thread Scheduling Sequence Generation Based on All Synchronization Pair Coverage Criteria. International Journal of Software Engineering and Knowledge Engineering 30:01, 97-118. DOI: 10.1142/S0218194020500059. Online publication date: 27-Feb-2020
  • (2020) Verifying and Testing Concurrent Programs using Constraint Solver based Approaches. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), 834-838. DOI: 10.1109/ICSME46990.2020.00105. Online publication date: Sep-2020
