A feature location approach supported by time-aware weighting of terms associated with developer expertise profiles

  • Regular Paper
Knowledge and Information Systems

Abstract

Feature location is a frequent software maintenance activity that aims to identify the initial source code location pertinent to a software feature. Most feature location approaches are based, at least in part, on text analysis methods that originate from the natural language context. However, natural language text and the text data in software repositories have different properties, so these methods need to be adapted before they are applied to software repositories. One difference is that repository data carries associated metadata, such as developer information and time stamps. This difference has not been fully considered in previous feature location studies. This study proposes a feature location approach that analyzes developer expertise profiles, which contain the source code entities modified by the associated software developers, to identify the location most similar to a desired feature. The approach uses a time-aware term-weighting technique to determine the similarity. An experimental evaluation on four open-source projects shows improvements in accuracy, performance, and effectiveness of up to 55, 39, and 29%, respectively, compared to the high-performing information retrieval methods used in feature location. Moreover, the proposed time-aware technique increases the accuracy, performance, and effectiveness of the typical term-weighting technique, tf-idf, by as much as 15, 11, and 13%, respectively. Finally, the proposed approach outperforms our previous approach, noun-based feature location, by as much as 17%. These experimental results demonstrate that time-aware analysis of developers’ expertise significantly improves the feature location process.
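
To make the idea of time-aware term weighting more concrete, the following is a minimal sketch and not the formula used in the paper. It assumes that each source code location is associated with terms and the dates on which they were used, that recency is modeled with a hypothetical decay of 1 / (1 + age in days), and that this recency-weighted term frequency is multiplied by a smoothed idf. The function names, the decay function, and the sample file names are all illustrative assumptions.

```python
import math
from datetime import date


def time_aware_weights(occurrences, today):
    """Map {location: [(term, date_used), ...]} to {location: {term: weight}}.

    Assumption: each use of a term contributes 1 / (1 + age_in_days)
    instead of the constant 1 that plain term frequency would add.
    """
    n_locations = len(occurrences)

    # Document frequency: number of locations in which each term appears.
    doc_freq = {}
    for uses in occurrences.values():
        for term in {t for t, _ in uses}:
            doc_freq[term] = doc_freq.get(term, 0) + 1

    weights = {}
    for location, uses in occurrences.items():
        term_weights = {}
        for term, used_on in uses:
            age_days = (today - used_on).days  # time difference in days
            recency = 1.0 / (1.0 + age_days)   # hypothetical decay function
            term_weights[term] = term_weights.get(term, 0.0) + recency
        for term in term_weights:
            # Smoothed idf so that terms present everywhere keep a small weight.
            term_weights[term] *= math.log(1.0 + n_locations / doc_freq[term])
        weights[location] = term_weights
    return weights


def rank_locations(query_terms, weights):
    """Rank locations by the summed weight of the query terms, best first."""
    scores = {
        loc: sum(tw.get(t, 0.0) for t in query_terms)
        for loc, tw in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: terms taken from changes to two files.
occurrences = {
    "Parser.java": [("parse", date(2015, 1, 1)), ("token", date(2015, 1, 1))],
    "Printer.java": [("print", date(2015, 1, 9)), ("parse", date(2014, 1, 1))],
}
weights = time_aware_weights(occurrences, today=date(2015, 1, 10))
ranking = rank_locations(["parse"], weights)
```

In this sketch both files contain the term "parse", but the more recent use in the hypothetical Parser.java gives it the higher weight, which is the effect a time-aware weighting is meant to capture compared with plain tf-idf.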

Notes

  1. A changeset is an atomic set of changes of the source code files committed to the source code repository by a project developer during a maintenance activity [30].

  2. The source code repository of software projects.

  3. Note that the time difference is calculated in days.

  4. http://www.eclipse.org/jdt/.

  5. http://www.eclipse.org/aspectj/.

  6. https://netbeans.org/.

  7. https://developer.mozilla.org/en/docs/Rhino.

  8. The source code locations that were modified to fix these change requests need to be determined. Change requests whose corresponding locations could not be correctly determined were removed from this test set.

  9. https://docs.google.com/file/d/0B0sa-hXpOgiJeWdYelBNRUZZeDQ/edit?usp=sharing.

  10. http://softeng.polito.it/software/effsize/.

  11. http://jeldoclet.sourceforge.net/.

  12. http://www.aktors.org/technologies/annie/.

  13. http://gate.ac.uk/.

  14. http://coest.org/coest-projects/projects/tracelab.

  15. https://drive.google.com/file/d/0B0sahXpOgiJZjNudkJGbnI2WnM/edit?usp=sharing.

  16. http://alias-i.com/lingpipe/

  17. System properties: processor Intel(R) Core(TM) i5-3470 CPU, 3.20 GHz; installed memory (RAM) 12 GB.

References

  1. Abebe SL, Tonella P (2010) Natural language parsing of program element names for concept extraction. In: IEEE 18th international conference on program comprehension (ICPC). IEEE, pp 156–159

  2. Anvik J (2006) Automating bug report assignment. In: Proceedings of the 28th international conference on software engineering (ICSE). ACM, pp 937–940

  3. Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on software engineering, ICSE ’06, New York, NY, USA. ACM, pp 361–370. ISBN: 1-59593-375-1. doi:10.1145/1134285.1134336

  4. Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, vol 1. ACM, pp 375–384

  5. Baeza-Yates RA, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley Longman Publishing Co., Inc, Boston. ISBN: 020139829X

  6. Bai J, Nie J-Y, Paradis F (2004) Using language models for text classification. In: Asia information retrieval symposium (AIRS), Beijing, China

  7. Biggerstaff TJ, Mitbander BG, Webster D (1993) The concept assignment problem in program understanding. In: Proceedings of the 15th international conference on software engineering (ICSE). IEEE Computer Society Press, pp 482–498

  8. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  9. Butler S, Wermelinger M, Yu Y, Sharp H (2011) Improving the tokenisation of identifier names. In: ECOOP 2011-object-oriented programming, pp 130–154

  10. Capobianco G, Lucia AD, Oliveto R, Panichella A, Panichella S (2013) Improving IR-based traceability recovery via noun-based indexing of software artifacts. J Softw Evol Process 25(7):743–762

  11. Cleary B, Exton C, Buckley J, English M (2009) An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir Softw Eng 14(1):93–130

  12. Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) GATE: an architecture for development of robust HLT applications. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 168–175

  13. Dit B, Moritz E, Poshyvanyk D (2012) A tracelab-based solution for creating, conducting, and sharing feature location experiments. In: 2012 IEEE 20th international conference on program comprehension (ICPC). IEEE, pp 203–208

  14. Dit B, Revelle M, Gethers M, Poshyvanyk D (2013a) Feature location in source code: a taxonomy and survey. J Softw Evol Process 25(1):53–95

  15. Dit B, Revelle M, Poshyvanyk D (2013b) Integrating information retrieval, execution and link analysis algorithms to improve feature location in software. Empir Softw Eng 18(2):277–309

  16. Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: ICSM 2009. IEEE international conference on software maintenance (ICSM). IEEE, pp 351–360

  17. Gómez VU, Kellens A, Brichau J, D’Hondt T (2009) Time warp, an approach for reasoning over system histories. In: Proceedings of the joint international and annual ERCIM workshops on principles of software evolution (IWPSE) and software evolution (Evol) workshops. ACM, pp 79–88

  18. Hill E, Pollock L, Vijay-Shanker K (2009) Automatically capturing source code context of NL-queries for software maintenance and reuse. In: Proceedings of the 31st international conference on software engineering (ICSE). IEEE Computer Society, pp 232–242

  19. Hossen K, Kagdi HH, Poshyvanyk D (2014) Amalgamating source code authors, maintainers, and change proneness to triage change requests. In: ICPC, pp 130–141

  20. Kagdi H, Maletic JI, Sharif B (2007) Mining software repositories for traceability links. In: ICPC’07. 15th IEEE international conference on program comprehension (ICPC). IEEE, pp 145–154

  21. Kagdi H, Gethers M, Poshyvanyk D, Hammad M (2012) Assigning change requests to software developers. J Softw Evol Process 24(1):3–33

  22. Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature location via information retrieval based filtering of a single scenario execution trace. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 234–243

  23. Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent Dirichlet allocation. Inf Softw Technol 52(9):972–990

  24. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge

  25. Petrenko M, Rajlich V, Vanciu R (2008) Partial domain comprehension in software evolution and maintenance. In: ICPC 2008. The 16th IEEE international conference on program comprehension (ICPC). IEEE, pp 13–22

  26. Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2006) Combining probabilistic ranking and latent semantic indexing for feature identification. In: ICPC 2006. 14th IEEE international conference on program comprehension (ICPC). IEEE, pp 137–148

  27. Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Softw Eng 33(6):420–432

  28. Poshyvanyk D, Gethers M, Marcus A (2012) Concept location using formal concept analysis and information retrieval. ACM Trans Softw Eng Methodol (TOSEM) 21(4):23

  29. Rao S, Kak A (2011) Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the 8th working conference on mining software repositories (MSR), pp 43–52

  30. Ratanotayanon S, Choi HJ, Sim SE (2010) Using transitive changesets to support feature location. In: Proceedings of the IEEE/ACM international conference on automated software engineering. ACM, pp 341–344

  31. Ratiu D, Deissenboeck F (2007) From reality to programs and (not quite) back again. In: ICPC’07. 15th IEEE international conference on program comprehension. IEEE, pp 91–102

  32. Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620. doi:10.1145/361219.361220 (ISSN 0001–0782)

  33. Schuler D, Zimmermann T (2008) Mining usage expertise from version archives. In: Proceedings of the 2008 international working conference on mining software repositories. ACM, pp 121–124

  34. Servant F, Jones JA (2012) Whosefault: automatic developer-to-fault assignment through fault localization. In: 2012 34th International conference on software engineering (ICSE), pp 36–46

  35. Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the 6th international conference on aspect-oriented software development. ACM, pp 212–224

  36. Shokripour R, Anvik J, Kasirun ZM, Zamani S (2013) Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: Proceedings of the tenth international workshop on mining software repositories. IEEE Press, pp 2–11

  37. Shokripour R, Anvik J, Kasirun ZM, Zamani S (2014) Improving automatic bug assignment using time-metadata in term-weighting. Institution of Engineering and Technology, IET

  38. Wang S, Lo D, Xing Z, Jiang L (2011) Concern localization using information retrieval: an empirical study on Linux kernel. In: 18th Working conference on reverse engineering (WCRE2011). IEEE, pp 92–96

  39. Wilde N, Scully MC (1995) Software reconnaissance: mapping program features to code. J Softw Maint Res Pract 7(1):49–62

  40. Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Publishing Company, Incorporated. ISBN: 3642290434, 9783642290435

  41. Zamani S, Lee SP, Shokripour R, Anvik J (2014) A noun-based approach to feature location using time-aware term-weighting. Inf Softw Technol 56(8):991–1011

  42. Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst (TOIS) 22(2):179–214

  43. Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: 34th International conference on software engineering (ICSE). IEEE, pp 14–24

Acknowledgments

This work is carried out within the framework of the research project supported by High Impact Research Grant with reference UM.C/625/1/HIR/MOHE/FCSIT/13, funded by the Ministry of Education, Malaysia.

Author information

Corresponding author

Correspondence to Sima Zamani.

About this article

Cite this article

Zamani, S., Lee, S.P., Shokripour, R. et al. A feature location approach supported by time-aware weighting of terms associated with developer expertise profiles. Knowl Inf Syst 49, 629–659 (2016). https://doi.org/10.1007/s10115-015-0909-5

  • DOI: https://doi.org/10.1007/s10115-015-0909-5
