Abstract
In this paper, we hypothesize that the distorted traceability tracks of a software system can be systematically re-established through refactoring, a set of behavior-preserving transformations for keeping the system quality under control during evolution. To test our hypothesis, we conduct an experimental analysis using three requirements-to-code datasets from various application domains. Our objective is to assess the impact of various refactoring methods on the performance of automated tracing tools based on information retrieval. Results show that renaming inconsistently named code identifiers, using Rename Identifier refactoring, often leads to improvements in traceability. In contrast, removing code clones, using eXtract Method (XM) refactoring, is found to be detrimental. In addition, results show that moving misplaced code fragments, using Move Method refactoring, has no significant impact on trace link retrieval. We further evaluate Rename Identifier refactoring by comparing its performance with other strategies often used to overcome the vocabulary mismatch problem in software artifacts. In addition, we propose and evaluate various techniques to mitigate the negative impact of XM refactoring. An effective traceability sign analysis is also conducted to quantify the effect of these refactoring methods on the vocabulary structure of software systems.
Acknowledgments
We would like to thank the partner company for the generous support of our research. This work is supported in part by the US NSF (National Science Foundation) Grant No. CCF1238336.
Cite this article
Mahmoud, A., Niu, N. Supporting requirements to code traceability through refactoring. Requirements Eng 19, 309–329 (2014). https://doi.org/10.1007/s00766-013-0197-0
DOI: https://doi.org/10.1007/s00766-013-0197-0