
An empirical study on the importance of source code entities for requirements traceability

Empirical Software Engineering

Abstract

Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques can create traceability links automatically, but they typically have low accuracy (precision, recall, or both), so creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT link recovery techniques. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers' eye movements while they verify RT links. We analyse the obtained data to identify and rank developers' preferred types of Source Code Entities (SCEs), e.g., domain-level vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers' preferred types of SCEs, and not their location, that attracts developers' attention and helps them verify RT links. Third, we propose an improved term weighting scheme, Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), which uses knowledge of the developers' preferred types of SCEs to give these SCEs more importance in the term weighting scheme. We integrate this weighting scheme with an IR technique, Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the standard Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
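The abstract describes the recovery pipeline only at a high level: weight the terms of each class, build an LSI space, and rank classes by their similarity to a requirement. The following Python/NumPy sketch illustrates that pipeline under loudly stated assumptions. Scaling each term occurrence by a weight that depends on its SCE type is a hypothetical stand-in for DPTF/IDF, not the paper's formula. The `SCE_WEIGHTS` values are made up, and the helpers `weighted_tf`, `build_index`, `query_vector`, and `lsi_rank` are invented for illustration.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical per-SCE-type weights, for illustration only; they are NOT the
# weights or the formula that DPTF/IDF actually uses.
SCE_WEIGHTS = {"class": 2.0, "method": 1.5, "variable": 1.0, "comment": 1.0}


def weighted_tf(entities, weights=SCE_WEIGHTS):
    """Term frequency in which each occurrence counts by its SCE type's weight.

    `entities` is a list of (term, sce_type) pairs taken from one class.
    Unknown SCE types (or an empty `weights`) count 1.0, i.e. raw frequency.
    """
    tf = Counter()
    for term, sce_type in entities:
        tf[term] += weights.get(sce_type, 1.0)
    return tf


def build_index(docs, weights=SCE_WEIGHTS):
    """Return (vocab, idf, term-by-document matrix of weighted tf * idf)."""
    tfs = [weighted_tf(d, weights) for d in docs]
    vocab = sorted({t for tf in tfs for t in tf})
    n_docs = len(docs)
    # idf = log(N / df), where df counts the documents containing the term.
    idf = {t: math.log(n_docs / sum(1 for tf in tfs if t in tf)) for t in vocab}
    matrix = np.zeros((len(vocab), n_docs))
    for j, tf in enumerate(tfs):
        for i, t in enumerate(vocab):
            matrix[i, j] = tf[t] * idf[t]
    return vocab, idf, matrix


def query_vector(terms, vocab, idf):
    """Plain tf * idf vector for a requirement; terms outside vocab are ignored."""
    tf = Counter(terms)
    return np.array([tf[t] * idf[t] for t in vocab])


def lsi_rank(matrix, query, k):
    """Rank documents by cosine similarity to `query` in a rank-k LSI space."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k = u[:, :k]
    docs_k = (np.diag(s[:k]) @ vt[:k, :]).T  # row j equals u_k.T @ matrix[:, j]
    query_k = u_k.T @ query                  # same projection for the query
    q_norm = np.linalg.norm(query_k)
    sims = []
    for d in docs_k:
        denom = np.linalg.norm(d) * q_norm
        # A document that vanishes in the reduced space gets similarity 0.
        sims.append(float(d @ query_k / denom) if denom > 1e-12 else 0.0)
    ranking = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
    return ranking, sims
```

Passing `weights={}` gives every occurrence a weight of 1.0, which reduces the index to ordinary TF/IDF, the kind of baseline the abstract compares against. The number of LSI dimensions `k` is a tuning parameter. Any toy classes or requirement terms used with this sketch are illustrative only.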


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7


Notes

  1. In this paper, we call "source code entities" any domain-level term, implementation-level term, class name, method name, variable name, or comment found in a piece of code. Domain concepts are concepts pertaining to the use of the system by its users. Implementation concepts relate to data structures, GUI elements, databases, and algorithms. For example, in the Pooka e-mail client, addAddress in the AddressBook.java class and addFocusListener in AddressEntryTextArea.java are domain-level and implementation-level concepts, respectively.

  2. http://www.ptidej.net/download/experiments/emse13b/

  3. http://www.eyeresponse.com/

  4. http://www.ptidej.net/research/taupe/

  5. We consider any object X to be a source code class, i.e., $c_i$.

  6. http://agile.csc.ncsu.edu/iTrust

  7. http://lucene.apache.org/

  8. http://www.suberic.net/pooka/

  9. http://mallet.cs.umass.edu
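Note 1 distinguishes domain-level from implementation-level terms but does not say, in this preview, how that distinction is made. As an illustration only, the following hypothetical heuristic splits an identifier into its camelCase or snake_case parts. It then labels the identifier implementation-level when one of those parts appears in a hand-made implementation vocabulary. `IMPLEMENTATION_TERMS`, `split_identifier`, and `classify` are invented names, and the vocabulary is an assumption, not the one used in the paper.

```python
import re

# Hypothetical vocabulary of implementation-level words (GUI elements, data
# structures, databases, algorithms); an assumption, not the paper's list.
IMPLEMENTATION_TERMS = {
    "focus", "listener", "panel", "button", "list", "map", "array",
    "hash", "buffer", "table", "query", "sort", "iterator",
}

# Acronym runs (e.g. "XML" in "XMLParser"), capitalised or lowercase words, digits.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier):
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase terms."""
    return [part.lower() for part in _TOKEN.findall(identifier)]


def classify(identifier, impl_terms=IMPLEMENTATION_TERMS):
    """Label an identifier 'implementation' if any of its terms is in impl_terms."""
    terms = split_identifier(identifier)
    return "implementation" if any(t in impl_terms for t in terms) else "domain"
```

With this vocabulary, `classify("addAddress")` yields `"domain"` and `classify("addFocusListener")` yields `"implementation"`, matching the Pooka example in Note 1. A fixed word list is crude, and real identifiers would need a much richer vocabulary or manual labelling.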


Acknowledgments

The authors would like to thank all the participants of the case studies, as this work would not have been possible without their collaboration. This work has been partially supported by the NSERC Research Chairs on Software Cost-effective Change, Evolution and on Software Patterns and Patterns of Software, and by the Fonds de recherche du Québec – Nature et technologies (FRQNT).

Author information

Corresponding author

Correspondence to Nasir Ali.

Additional information

Communicated by: Massimiliano Di Penta and Jonathan Maletic

About this article

Cite this article

Ali, N., Sharafi, Z., Guéhéneuc, YG. et al. An empirical study on the importance of source code entities for requirements traceability. Empir Software Eng 20, 442–478 (2015). https://doi.org/10.1007/s10664-014-9315-y

  • DOI: https://doi.org/10.1007/s10664-014-9315-y
