Supporting and accelerating reproducible empirical research in software evolution and maintenance using TraceLab Component Library

Empirical Software Engineering

Abstract

Research studies in software maintenance are notoriously hard to reproduce due to the lack of datasets, tools, implementation details (e.g., parameter values, environmental settings), and other factors. Progress in the field is hindered by the difficulty of comparing new techniques against existing ones, as researchers have to devote a large portion of their resources to the tedious and error-prone process of reproducing previously introduced approaches. In this paper, we address the problem of experiment reproducibility in software maintenance and provide a long-term solution towards ensuring that future experiments will be reproducible and extensible. We conducted a preliminary mapping study of a number of representative maintenance techniques and approaches and implemented them as a set of experiments and a library of components, called the Component Library, that we make publicly available with TraceLab. The goal of these experiments and components is to create a body of actionable knowledge that would (i) facilitate future research and (ii) allow the research community to contribute to it as well. In addition, to illustrate the process of using and adapting these techniques, we present an example of creating new techniques based on existing ones in order to produce improved results.

Acknowledgments

This work is supported in part by the United States NSF CNS-0959924, NSF CCF-1218129, and NSF CCF-1016868 grants. Any opinions, findings and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors. We would also like to acknowledge the team of researchers from DePaul University led by Jane Cleland-Huang: Ed Keenan, Adam Czauderna, Greg Leach, and Piotr Pruski. This work would not have been possible without their continuous support on the TraceLab project.

Author information

Corresponding author

Correspondence to Denys Poshyvanyk.

Additional information

Communicated by: Yann-Gaël Guéhéneuc and Tom Mens

About this article

Cite this article

Dit, B., Moritz, E., Linares-Vásquez, M. et al. Supporting and accelerating reproducible empirical research in software evolution and maintenance using TraceLab Component Library. Empir Software Eng 20, 1198–1236 (2015). https://doi.org/10.1007/s10664-014-9339-3

  • DOI: https://doi.org/10.1007/s10664-014-9339-3
