DOI: 10.1145/3180155.3180161
research-article

Statistical errors in software engineering experiments: a preliminary literature review

Published: 27 May 2018

ABSTRACT

Background: Statistical concepts and techniques are often applied incorrectly, even in mature disciplines such as medicine or psychology. Surprisingly, very few works study statistical problems in software engineering (SE).

Aim: Assess the existence of statistical errors in SE experiments.

Method: Compile the most common statistical errors in experimental disciplines. Survey experiments published in ICSE to assess whether these errors occur in high-quality SE publications.

Results: The same errors identified in other disciplines were found in ICSE experiments: 30% of the reviewed papers included several error types, such as a) missing statistical hypotheses, b) missing sample size calculation, c) failure to assess statistical test assumptions, and d) uncorrected multiple testing. This rather large error rate is greater for research papers where experiments are confined to the validation section. The origin of the errors can be traced back to a) researchers not having sufficient statistical training, and b) a profusion of exploratory research.

Conclusions: This paper provides preliminary evidence that SE research suffers from the same statistical problems as other experimental disciplines. However, the SE community appears to be unaware of any shortcomings in its experiments, whereas other disciplines work hard to avoid these threats. Further research is necessary to find the underlying causes and set up corrective measures, but some potentially effective actions are a priori easy to implement: a) improve the statistical training of SE researchers, and b) enforce quality assessment and reporting guidelines in SE publications.
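The abstract lists missing sample size calculation and uncorrected multiple testing among the recurring errors, but it does not prescribe specific remedies. As an illustration only, the Python sketch below shows one common way to address each. The first function gives an a priori sample size estimate for a two-sided, two-group comparison of means using the normal approximation n = 2 * ((z_(1-alpha/2) + z_(1-beta)) / d)^2, where d is the standardized effect size; this approximation is an assumption, and exact t-based methods give slightly larger values. The second applies Holm's step-down correction, which is one of several options alongside plain Bonferroni or Benjamini-Hochberg. The function names are hypothetical and are not taken from the paper.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided, two-sample comparison of means.

    Hypothetical helper using the normal approximation
    n = 2 * ((z_(1-alpha/2) + z_(1-beta)) / d) ** 2, where d is Cohen's d.
    Exact t-based power calculations return slightly larger values.
    """
    z = NormalDist()  # standard normal distribution (mean 0, sd 1)
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)  # quantile matching the desired power (1 - beta)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Hypothetical helper: which hypotheses Holm's step-down procedure rejects.

    The k-th smallest p-value (k = 0, 1, ...) is compared with alpha / (m - k);
    testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return rejected


if __name__ == "__main__":
    print(sample_size_per_group(0.5))
    print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

For example, a medium standardized effect (d = 0.5) at alpha = 0.05 and 80% power needs roughly 63 participants per group under this approximation. Of the four p-values 0.01, 0.04, 0.03 and 0.005, all of which fall below 0.05 when tested separately, only 0.005 and 0.01 remain significant after Holm's correction.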

      Published in

      ACM Conferences
      ICSE '18: Proceedings of the 40th International Conference on Software Engineering
      May 2018
      1307 pages
      ISBN: 9781450356381
      DOI: 10.1145/3180155
      Conference Chair: Michel Chaudron
      General Chair: Ivica Crnkovic
      Program Chairs: Marsha Chechik, Mark Harman

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 276 of 1,856 submissions, 15%
