ABSTRACT
Empirical studies that use software repository artifacts have become popular in the last decade due to the ready availability of open source project archives. In this paper, we survey empirical studies published in the last three years of ICSE and FSE proceedings, and categorize them by whether they use open source or proprietary projects and by the diversity of their subject programs. Our survey shows that almost half (49%) of recent empirical studies used solely open source projects. These studies either draw general conclusions from their results or explicitly disclaim any conclusions that extend beyond the specific subject software.
We conclude that researchers in empirical software engineering must consider the external validity concerns that arise from relying on only a few well-known open source software projects, and that data source selection deserves explicit discussion in software engineering research. Furthermore, we propose a community research infrastructure for software repository benchmarks and for sharing empirical analysis results, in order to address external validity concerns and to raise the bar for empirical software engineering research that analyzes software artifacts.
Index Terms
- Validity concerns in software engineering research