DOI: 10.1145/3395363.3397381
research-article

Debugging the performance of Maven’s test isolation: experience report

Published: 18 July 2020

ABSTRACT

Testing is the most common approach used in industry for checking software correctness. Developers frequently practice reliable testing (executing individual tests in isolation from each other) to avoid test failures caused by test-order dependencies and shared state pollution (e.g., when tests mutate static fields). A common way of doing this is to run each test as a separate process. Unfortunately, this is known to introduce substantial overhead. This experience report describes our efforts to better understand the sources of this overhead and to build a system that confirms the minimum achievable overhead. We found that different build systems use different mechanisms for communicating between these processes, and that because of this design decision, running tests with some build systems can be faster than with others. Through this inquiry we discovered a significant performance bug in Apache Maven's test-running code, which slowed test execution by an average of 350 milliseconds per test compared to a competing build system, Ant. On real projects, removing this overhead can significantly reduce testing time. We submitted a patch for this bug, which has been integrated into the Apache Maven build system, and we describe our ongoing efforts to improve Maven's test execution tooling.
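To give a sense of scale for the 350-millisecond figure, the following minimal Java sketch multiplies the reported average per-test slowdown by a suite size. It assumes the overhead accumulates linearly with the number of isolated tests. The class name, method name, and the test count of 2,000 are hypothetical values chosen for illustration and do not come from the report.

```java
import java.time.Duration;

public class IsolationOverheadEstimate {
    // Average extra time per isolated test reported in the abstract (Maven compared to Ant).
    static final Duration REPORTED_OVERHEAD_PER_TEST = Duration.ofMillis(350);

    // Total extra time for a suite in which every test runs in its own process,
    // assuming the per-test overhead simply adds up (an illustrative assumption).
    static Duration totalOverhead(long testCount, Duration overheadPerTest) {
        if (testCount < 0) {
            throw new IllegalArgumentException("testCount must be non-negative");
        }
        return overheadPerTest.multipliedBy(testCount);
    }

    public static void main(String[] args) {
        long hypotheticalTestCount = 2_000; // assumption for illustration only
        Duration total = totalOverhead(hypotheticalTestCount, REPORTED_OVERHEAD_PER_TEST);
        System.out.println(total.getSeconds() + " s"); // prints "700 s"
    }
}
```

Under these assumptions, a suite of 2,000 isolated tests would spend about 700 seconds (roughly 11.7 minutes) on this overhead alone, before any test logic runs.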

Published in

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2020
591 pages
ISBN: 9781450380089
DOI: 10.1145/3395363

Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 58 of 213 submissions, 27%
