DOI: 10.1145/3395363.3397381 (ISSTA conference proceedings, research article)

Debugging the performance of Maven’s test isolation: experience report

Published: 18 July 2020

Abstract

Testing is the most common approach used in industry for checking software correctness. Developers frequently practice reliable testing, executing individual tests in isolation from each other, to avoid test failures caused by test-order dependencies and shared state pollution (e.g., when tests mutate static fields). A common way of doing this is to run each test in a separate process. Unfortunately, this is known to introduce substantial overhead. This experience report describes our efforts to better understand the sources of this overhead and to build a system that determines the minimal overhead possible. We found that different build systems use different mechanisms for communicating between these processes and that, because of this design decision, running tests with some build systems can be faster than with others. Through this inquiry we discovered a significant performance bug in Apache Maven’s test-running code, which slowed test execution by an average of 350 milliseconds per test compared to a competing build system, Ant. On real projects, fixing this bug can significantly reduce testing time. We submitted a patch for this bug, which has been integrated into the Apache Maven build system, and we describe our ongoing efforts to improve Maven’s test execution tooling.
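The shared-state pollution that motivates isolation can be seen in a few lines of Java. The sketch below is hypothetical and not taken from the paper: the class `OrderDependenceDemo`, the static field `cacheSize`, and the two test methods are invented for illustration. Running both tests in one JVM lets the first test's write to the static field break the second. The `runIsolated` helper only approximates a fresh process by resetting that one known field before each test; when configured for isolation, build tools such as Maven and Ant instead run each test in a separate JVM, which yields clean static state automatically but pays process startup and communication costs on every test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: two tests share a static field, so the outcome
// of one depends on whether the other already ran in the same JVM.
class OrderDependenceDemo {

    // Shared mutable static state: it survives between tests in the same JVM.
    static int cacheSize = 0;

    // Test A mutates the static field as a side effect.
    static boolean testAddsEntry() {
        cacheSize++;
        return cacheSize >= 1;
    }

    // Test B silently assumes it starts from freshly initialized state.
    static boolean testStartsEmpty() {
        return cacheSize == 0;
    }

    // Dispatches a test by name and returns whether it passed.
    static boolean run(String name) {
        switch (name) {
            case "testAddsEntry":
                return testAddsEntry();
            case "testStartsEmpty":
                return testStartsEmpty();
            default:
                throw new IllegalArgumentException("unknown test: " + name);
        }
    }

    // Runs the tests in order inside one JVM, so static state carries over.
    static List<Boolean> runSharedJvm(List<String> order) {
        List<Boolean> results = new ArrayList<>();
        for (String name : order) {
            results.add(run(name));
        }
        return results;
    }

    // Approximates per-test isolation by restoring the only static field this
    // demo uses; a forked JVM would get fresh state without this manual reset.
    static List<Boolean> runIsolated(List<String> order) {
        List<Boolean> results = new ArrayList<>();
        for (String name : order) {
            cacheSize = 0; // the value a freshly started process would see
            results.add(run(name));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> order = List.of("testAddsEntry", "testStartsEmpty");
        cacheSize = 0;
        System.out.println("shared JVM: " + runSharedJvm(order)); // prints "shared JVM: [true, false]"
        System.out.println("isolated: " + runIsolated(order)); // prints "isolated: [true, true]"
    }
}
```

Reversing the order in the shared JVM makes both tests pass, which is exactly the kind of order-dependent failure that per-test processes rule out. The price is per-test overhead: at the 350 ms average per-test gap reported above, a hypothetical suite of 1,000 isolated tests would take roughly 350 s (almost 6 minutes) longer under the slower configuration.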

Published In

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2020
591 pages
ISBN: 9781450380089
DOI: 10.1145/3395363
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. Build system
2. Maven
3. test isolation

Conference

ISSTA '20

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
