Diversity in software engineering research

research-article
Published: 18 August 2013
DOI: 10.1145/2491411.2491415

Abstract

One of the goals of software engineering research is to achieve generality: Are the phenomena found in a few projects reflective of others? Will a technique perform as well on projects other than the ones it was evaluated on? While it is common sense to select a sample that is representative of a population, diversity is often overlooked, yet it is just as important. In this paper, we combine ideas from representativeness and diversity and introduce a measure called sample coverage, defined as the percentage of projects in a population that are similar to the given sample. We introduce algorithms to compute the sample coverage for a given set of projects and to select the projects that increase the coverage the most. We demonstrate our technique on research presented over the span of two years at ICSE and FSE with respect to a population of 20,000 active open source projects monitored by Ohloh.net. Knowing the coverage of a sample enhances our ability to reason about the findings of a study. Furthermore, we propose reporting guidelines for research: in addition to coverage scores, papers should discuss the target population of the research (universe) and the dimensions that can potentially influence its outcomes (space).
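To make the sample coverage measure concrete, here is a minimal Python sketch of the two computations the abstract names: scoring a sample against a universe, and greedily choosing the projects that add the most coverage. The abstract does not say how similarity is decided, so the `similar` function is an assumption made for illustration: numeric dimensions (assumed non-negative) count as similar when they lie within half an order of magnitude of each other, and all other dimensions must match exactly. The function names, the dictionary representation of a project, and the example data are hypothetical, not taken from the paper.

```python
import math


def similar(a, b, numeric_tol=0.5):
    """Assumed similarity test: projects are similar if every dimension is.

    Numeric values must lie within numeric_tol on a log10 scale (the +1
    keeps zero values valid); any other value must be equal.
    """
    for key in a:
        x, y = a[key], b[key]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            if abs(math.log10(x + 1) - math.log10(y + 1)) > numeric_tol:
                return False
        elif x != y:
            return False
    return True


def covered(sample, universe):
    """Indices of universe projects similar to at least one sample project."""
    return {i for i, p in enumerate(universe)
            if any(similar(p, s) for s in sample)}


def sample_coverage(sample, universe):
    """Percentage of the universe that is similar to the sample."""
    if not universe:
        return 0.0
    return 100.0 * len(covered(sample, universe)) / len(universe)


def next_projects(k, sample, universe):
    """Greedily pick up to k universe indices that each add the most coverage."""
    if not universe:
        return []
    done = covered(sample, universe)
    # For each candidate, the set of universe projects it would cover.
    reach = [{j for j, q in enumerate(universe) if similar(p, q)}
             for p in universe]
    picks = []
    for _ in range(k):
        best = max(range(len(universe)), key=lambda i: len(reach[i] - done))
        if not reach[best] - done:
            break  # no remaining candidate adds coverage
        picks.append(best)
        done |= reach[best]
    return picks


# Hypothetical universe with two dimensions (the "space"): language and size.
universe = [
    {"language": "Java", "loc": 1000},
    {"language": "Java", "loc": 2000},
    {"language": "Java", "loc": 1000000},
    {"language": "C", "loc": 1500},
]
sample = [universe[0]]

print(sample_coverage(sample, universe))   # 50.0
print(next_projects(2, sample, universe))  # [2, 3]
```

In this toy universe the single sampled project covers itself and the other small Java project, so half of the universe is covered. The greedy step then suggests the large Java project and the C project, each of which covers a part of the space the sample misses. The greedy strategy is one plausible reading of "select the projects that increase the coverage the most"; it is not guaranteed to find the smallest set of projects that reaches a given coverage.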

Published In

ESEC/FSE 2013: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
August 2013
738 pages
ISBN: 9781450322379
DOI: 10.1145/2491411

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Coverage
  2. Diversity
  3. Representativeness
  4. Sampling

Qualifiers

  • Research-article

Conference

ESEC/FSE'13

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
