Debugging the performance of Maven's test isolation: experience report

ABSTRACT
Testing is the most common approach used in industry for checking software correctness. Developers frequently practice reliable testing, executing individual tests in isolation from each other, to avoid test failures caused by test-order dependencies and shared-state pollution (e.g., when tests mutate static fields). A common way to achieve this isolation is to run each test in a separate process. Unfortunately, this is known to introduce substantial overhead. This experience report describes our efforts to better understand the sources of this overhead and to build a system that establishes the minimal overhead possible. We found that different build systems use different mechanisms for communicating between these processes and that, because of this design decision, running tests with some build systems can be faster than with others. Through this inquiry, we discovered a significant performance bug in Apache Maven's test running code, which slowed down test execution by an average of 350 milliseconds per test compared to a competing build system, Ant. For real projects, fixing this bug can significantly reduce testing time. We submitted a patch for this bug, which has been integrated into the Apache Maven build system, and we describe our ongoing efforts to improve Maven's test execution tooling.
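The trade-off between shared-state pollution and per-process overhead can be illustrated with a small sketch. It is written in Python for brevity and does not reproduce how Maven or Ant actually fork JVMs or communicate with them; `TEST_SOURCE`, `test_first`, `test_second`, `run_in_process`, `run_isolated`, and `timed` are hypothetical names introduced only for illustration. Two tests mutate a module-level counter, standing in for a Java static field. When both run in one process, the second test fails because of the first test's mutation; when each test runs in a fresh process, both pass, at the cost of process start-up time.

```python
import subprocess
import sys
import time

# Hypothetical test suite: both tests mutate shared module-level state,
# analogous to Java tests that write to a static field.
TEST_SOURCE = """
COUNTER = 0

def test_first():
    global COUNTER
    COUNTER += 1
    if COUNTER != 1:
        raise AssertionError(f"polluted state: COUNTER={COUNTER}")

def test_second():
    global COUNTER
    COUNTER += 1
    if COUNTER != 1:
        raise AssertionError(f"polluted state: COUNTER={COUNTER}")
"""


def run_in_process(test_names):
    """Run every test in one shared namespace (no isolation)."""
    namespace = {}
    exec(TEST_SOURCE, namespace)  # state persists across tests
    results = {}
    for name in test_names:
        try:
            namespace[name]()
            results[name] = True
        except AssertionError:
            results[name] = False
    return results


def run_isolated(test_names):
    """Run each test in a fresh interpreter process (one process per test)."""
    results = {}
    for name in test_names:
        script = TEST_SOURCE + f"\n{name}()\n"
        proc = subprocess.run(
            [sys.executable, "-c", script], capture_output=True, text=True
        )
        # An uncaught AssertionError makes the child exit with a nonzero status.
        results[name] = proc.returncode == 0
    return results


def timed(runner, test_names):
    """Return (results, elapsed wall-clock seconds) for a runner."""
    start = time.perf_counter()
    results = runner(test_names)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    names = ["test_first", "test_second"]
    shared, t_shared = timed(run_in_process, names)
    isolated, t_isolated = timed(run_isolated, names)
    print("shared process:  ", shared, f"{t_shared * 1000:.1f} ms")
    print("one process/test:", isolated, f"{t_isolated * 1000:.1f} ms")
    per_test = (t_isolated - t_shared) / len(names) * 1000
    print(f"approx. isolation overhead per test: {per_test:.1f} ms")
```

The measured overhead depends on the interpreter's start-up cost and the machine, so the numbers are not comparable to the Maven and Ant figures above; the sketch only shows where a per-process cost comes from and why isolation removes order-dependent failures.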