Debugging the performance of Maven’s test isolation: experience report

Authors:

Aleksandar Milicevic,

Milos Gligoric

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 249 - 259

https://doi.org/10.1145/3395363.3397381

Published: 18 July 2020

Abstract

Testing is the most common approach used in industry for checking software correctness. Developers frequently practice reliable testing, that is, executing individual tests in isolation from each other, to avoid test failures caused by test-order dependencies and shared state pollution (e.g., when tests mutate static fields). A common way of doing this is by running each test as a separate process. Unfortunately, this is known to introduce substantial overhead. This experience report describes our efforts to better understand the sources of this overhead and to create a system to confirm the minimal overhead possible. We found that different build systems use different mechanisms for communicating between these multiple processes, and that because of this design decision, running tests with some build systems could be faster than with others. Through this inquiry we discovered a significant performance bug in Apache Maven’s test running code, which slowed down test execution by 350 milliseconds per test on average when compared to a competing build system, Ant. When testing real projects, fixing this bug can result in a significant reduction in testing time. We submitted a patch for this bug, which has been integrated into the Apache Maven build system, and we describe our ongoing efforts to improve Maven’s test execution tooling.
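
To put the 350-millisecond figure in perspective, the following back-of-the-envelope sketch estimates the extra wall-clock time when every test pays that cost. Only the 350 ms value comes from the abstract; the function name and the suite sizes are hypothetical, and the calculation assumes tests run sequentially with one process per test.

```python
# Back-of-the-envelope estimate; only the 350 ms figure comes from the abstract.
PER_TEST_OVERHEAD_MS = 350  # average extra time per test reported for Maven vs. Ant


def extra_seconds(num_tests: int, overhead_ms: float = PER_TEST_OVERHEAD_MS) -> float:
    """Extra wall-clock seconds when each of num_tests tests pays overhead_ms.

    Assumes tests run sequentially, each in its own process.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return num_tests * overhead_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical suite sizes, chosen only for illustration.
    for n in (100, 1_000, 10_000):
        secs = extra_seconds(n)
        print(f"{n:>6} tests: about {secs:.0f} s ({secs / 60:.1f} min) of extra time")
```

Under these assumptions, a hypothetical suite of 1,000 isolated tests would spend 350 seconds, nearly six minutes, on this overhead alone in every run.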

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Software configuration management and version control systems


Published In

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2020

591 pages

ISBN:9781450380089

DOI:10.1145/3395363

General Chair: Sarfraz Khurshid, University of Texas at Austin, USA

Program Chair: Corina S. Păsăreanu, Carnegie Mellon University Silicon Valley / NASA Ames Research Center, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ISSTA '20

Sponsor:

SIGSOFT

ISSTA '20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 18 - 22, 2020

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
