Supporting requirements to code traceability through refactoring

  • RE 2013
Requirements Engineering

Abstract

In this paper, we hypothesize that the distorted traceability tracks of a software system can be systematically re-established through refactoring, a set of behavior-preserving transformations for keeping system quality under control during evolution. To test this hypothesis, we conduct an experimental analysis using three requirements-to-code datasets from different application domains. Our objective is to assess the impact of various refactoring methods on the performance of automated tracing tools based on information retrieval. Results show that renaming inconsistently named code identifiers, using Rename Identifier refactoring, often leads to improvements in traceability. In contrast, removing code clones, using eXtract Method (XM) refactoring, is found to be detrimental, and moving misplaced code fragments, using Move Method refactoring, has no significant impact on trace link retrieval. We further evaluate Rename Identifier refactoring by comparing its performance with other strategies commonly used to overcome the vocabulary mismatch problem in software artifacts. We also propose and evaluate several techniques to mitigate the negative impact of XM refactoring. An effective traceability sign analysis is also conducted to quantify the effect of these refactoring methods on the vocabulary structure of software systems.
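Why Rename Identifier refactoring can help an IR-based tracing tool can be illustrated with a toy vector space model. The sketch below is not the authors' tracing tool. It assumes a TF-IDF-weighted vector space model with cosine similarity (a common IR baseline for automated tracing), a hypothetical hand-picked stop-word list, and simple camelCase/snake_case identifier splitting. The requirement text and method signatures are invented for illustration and are not drawn from the paper's datasets. An abbreviated name such as `showPatRec` shares no terms with the requirement, so its similarity is zero; after renaming it to `displayPatientRecord`, the shared vocabulary produces a positive score.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list: English function words plus a few Java keywords.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "shall", "be", "is",
             "for", "void", "int", "public", "return"}

# Hypothetical example inputs (not taken from the paper's datasets).
REQUIREMENT = "The system shall display the patient medical record to the doctor."
CODE_BEFORE = ["void showPatRec(Doc d)", "void sendEmailNotification(User user)"]
CODE_AFTER = ["void displayPatientRecord(Doctor doctor)",
              "void sendEmailNotification(User user)"]


def split_identifiers(text):
    """Split camelCase and snake_case identifiers into lowercase word tokens."""
    tokens = []
    # Underscores, digits, and punctuation act as separators.
    for word in re.findall(r"[A-Za-z]+", text):
        # Insert a space at lower-to-upper boundaries: "showPatRec" -> "show Pat Rec"
        parts = re.sub(r"([a-z])([A-Z])", r"\1 \2", word).split()
        tokens.extend(p.lower() for p in parts)
    return tokens


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dict term -> weight) using a smoothed IDF."""
    tokenized = [[t for t in split_identifiers(d) if t not in STOPWORDS] for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def trace_scores(requirement, code_units):
    """Score each code unit against one requirement; higher suggests a likelier trace link."""
    vectors = tfidf_vectors([requirement] + code_units)
    return [cosine(vectors[0], v) for v in vectors[1:]]


if __name__ == "__main__":
    print("before rename:", [round(s, 3) for s in trace_scores(REQUIREMENT, CODE_BEFORE)])
    print("after rename: ", [round(s, 3) for s in trace_scores(REQUIREMENT, CODE_AFTER)])
```

Running the script prints `[0.0, 0.0]` before the rename; afterward the renamed method receives a positive score while the unrelated notification method stays at 0.0. A real tracing tool would index full method bodies and comments and rank many candidate links, but the vocabulary-overlap effect is the same.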

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Notes

  1. http://www.jdeodorant.org/.

  2. http://wiki.eclipse.org/Duplicated_code_detection_tool_(SDD).

Acknowledgments

We would like to thank the partner company for the generous support of our research. This work is supported in part by the US NSF (National Science Foundation) Grant No. CCF1238336.

Author information

Corresponding author

Correspondence to Anas Mahmoud.

About this article

Cite this article

Mahmoud, A., Niu, N. Supporting requirements to code traceability through refactoring. Requirements Eng 19, 309–329 (2014). https://doi.org/10.1007/s00766-013-0197-0

  • DOI: https://doi.org/10.1007/s00766-013-0197-0
