DOI: 10.1145/3475716.3484488
Keynote

How Empirical Research Supports Tool Development: A Retrospective Analysis and New Horizons

Published: 11 October 2021

ABSTRACT

Empirical research provides two-fold support to the development of approaches and tools aimed at supporting software engineers. On the one hand, empirical studies help us understand a phenomenon or context of interest. On the other hand, studies compare approaches and evaluate how software engineers could benefit from them. Over the past decades, there has been a tangible evolution in how empirical evaluation is conducted in software engineering, for multiple reasons. First, the research community has matured considerably, thanks in part to guidelines developed by several researchers. Second, the wide availability of data and artifacts, mainly from open-source projects, has made it possible to conduct larger evaluations and, in some cases, to reach study participants. This keynote will first give an overview of how empirical research has been used over the past decades to evaluate tools, and how this has been changing over the years. Then, we will focus on the importance of combining quantitative and qualitative evaluations, and on how "depth" sometimes turns out to be more useful than mere "breadth". We will also emphasize that research is not a straightforward path, and that negative results are often an essential component of future advances. Last, but not least, we will discuss how the role of empirical evaluation is changing with the pervasiveness of artificial intelligence methods in software engineering research.


Published in:

ESEM '21: Proceedings of the 15th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), October 2021, 368 pages.
ISBN: 9781450386654
DOI: 10.1145/3475716

      Copyright © 2021 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: keynote, research, refereed limited

Acceptance rates: ESEM '21 paper acceptance rate: 24 of 124 submissions (19%). Overall acceptance rate: 130 of 594 submissions (22%).