Abstract
Feature location is a frequent software maintenance activity that aims to identify the initial source code location pertinent to a software feature. Most feature location approaches are based, at least in part, on text analysis methods that originate from the natural language context. However, natural language text and the text data in software repositories have different properties, so these methods need to be adapted before they are applied to software repositories. One difference is the existence of metadata, such as developer information and time stamps, associated with the data in the repositories. This difference has not been fully considered in previous feature location research. This study proposes a feature location approach that analyzes developer expertise profiles, which contain the source code entities modified by the associated software developers, to identify the location most similar to a desired feature. The approach uses a time-aware term-weighting technique to determine the similarity. An experimental evaluation on four open-source projects shows improvements in accuracy, performance, and effectiveness of up to 55, 39, and 29 %, respectively, compared with the high-performing information retrieval methods used in feature location. Moreover, the proposed time-aware technique increases the accuracy, performance, and effectiveness of the typical term-weighting technique, tf-idf, by as much as 15, 11, and 13 %, respectively. Finally, the proposed approach outperforms our previous approach, noun-based feature location, by as much as 17 %. These experimental results demonstrate that time-aware analysis of developers’ expertise significantly improves the feature location process.
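To make the idea of time-aware term weighting concrete, the following is a minimal sketch, not the weighting formula used in this study. It assumes each source code location has a history of (term, change date) pairs, discounts every term occurrence by its age in days with a hypothetical decay of 1 / (1 + days), and multiplies the result by a smoothed idf. The names `decayed_tf`, `rank_locations`, and the structure of `history` are illustrative assumptions.

```python
import math
from collections import defaultdict
from datetime import date


def decayed_tf(events, today):
    """Sum term occurrences, discounting each by its age in days.

    The decay 1 / (1 + days) is an assumed, illustrative choice.
    """
    tf = defaultdict(float)
    for term, changed_on in events:
        days = max((today - changed_on).days, 0)
        tf[term] += 1.0 / (1.0 + days)
    return tf


def rank_locations(query_terms, history, today):
    """Rank locations by the summed time-aware tf-idf of the query terms.

    history maps a location name to a list of (term, date) pairs
    (a hypothetical stand-in for data mined from changesets).
    """
    n = len(history)
    tfs = {loc: decayed_tf(events, today) for loc, events in history.items()}

    # Document frequency: number of locations in which a term appears.
    df = defaultdict(int)
    for tf in tfs.values():
        for term in tf:
            df[term] += 1

    scores = {}
    for loc, tf in tfs.items():
        # Smoothed idf, log(1 + n / df), so shared terms keep a nonzero weight.
        scores[loc] = sum(
            tf[t] * math.log(1.0 + n / df[t]) for t in query_terms if t in tf
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, a location whose matching terms were touched recently ranks above one whose matching terms are years old, even when both contain the query term the same number of times.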
Notes
A changeset is an atomic set of changes of the source code files committed to the source code repository by a project developer during a maintenance activity [30].
The source code repository of software projects.
Note that the time difference is calculated in days.
The source code locations that were modified to fix these change requests need to be determined. Change requests whose corresponding locations could not be correctly determined were removed from this test set.
System properties: processor Intel(R) Core(TM) i5-3470 CPU, 3.20 GHz; installed memory (RAM) 12 GB.
References
Abebe SL, Tonella P (2010) Natural language parsing of program element names for concept extraction. In: IEEE 18th international conference on program comprehension (ICPC). IEEE, pp 156–159
Anvik J (2006) Automating bug report assignment. In: Proceedings of the 28th international conference on software engineering (ICSE). ACM, pp 937–940
Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on software engineering, ICSE ’06, New York, NY, USA. ACM, pp 361–370. ISBN: 1-59593-375-1. doi:10.1145/1134285.1134336
Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, vol 1. ACM, pp 375–384
Baeza-Yates RA, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley Longman Publishing Co., Inc, Boston. ISBN: 020139829X
Bai J, Nie J-Y, Paradis F (2004) Using language models for text classification. In: Asia information retrieval symposium (AIRS), Beijing, China
Biggerstaff TJ, Mitbander BG, Webster D (1993) The concept assignment problem in program understanding. In: Proceedings of the 15th international conference on software engineering (ICSE). IEEE Computer Society Press, pp 482–498
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Butler S, Wermelinger M, Yu Y, Sharp H (2011) Improving the tokenisation of identifier names. In: ECOOP 2011-object-oriented programming, pp 130–154
Capobianco G, Lucia AD, Oliveto R, Panichella A, Panichella S (2013) Improving IR-based traceability recovery via noun-based indexing of software artifacts. J Softw Evol Process 25(7):743–762
Cleary B, Exton C, Buckley J, English M (2009) An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir Softw Eng 14(1):93–130
Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) GATE: an architecture for development of robust HLT applications. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 168–175
Dit B, Moritz E, Poshyvanyk D (2012) A tracelab-based solution for creating, conducting, and sharing feature location experiments. In: 2012 IEEE 20th international conference on program comprehension (ICPC). IEEE, pp 203–208
Dit B, Revelle M, Gethers M, Poshyvanyk D (2013a) Feature location in source code: a taxonomy and survey. J Softw Evol Process 25(1):53–95
Dit B, Revelle M, Poshyvanyk D (2013b) Integrating information retrieval, execution and link analysis algorithms to improve feature location in software. Empir Softw Eng 18(2):277–309
Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: ICSM 2009. IEEE international conference on software maintenance (ICSM). IEEE, pp 351–360
Gómez VU, Kellens A, Brichau J, D’Hondt T (2009) Time warp, an approach for reasoning over system histories. In: Proceedings of the joint international and annual ERCIM workshops on principles of software evolution (IWPSE) and software evolution (Evol) workshops. ACM, pp 79–88
Hill E, Pollock L, Vijay-Shanker K (2009) Automatically capturing source code context of nl-queries for software maintenance and reuse. In: Proceedings of the 31st international conference on software engineering (ICSE). IEEE Computer Society, pp 232–242
Hossen K, Kagdi HH, Poshyvanyk D (2014) Amalgamating source code authors, maintainers, and change proneness to triage change requests. In: ICPC, pp 130–141
Kagdi H, Maletic JI, Sharif B (2007) Mining software repositories for traceability links. In: ICPC’07. 15th IEEE international conference on program comprehension (ICPC). IEEE, pp 145–154
Kagdi H, Gethers M, Poshyvanyk D, Hammad M (2012) Assigning change requests to software developers. J Softw Evol Process 24(1):3–33
Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature location via information retrieval based filtering of a single scenario execution trace. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 234–243
Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent dirichlet allocation. Inf Softw Technol 52(9):972–990
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge
Petrenko M, Rajlich V, Vanciu R (2008) Partial domain comprehension in software evolution and maintenance. In: ICPC 2008. The 16th IEEE international conference on program comprehension (ICPC). IEEE, pp 13–22
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2006) Combining probabilistic ranking and latent semantic indexing for feature identification. In: ICPC 2006. 14th IEEE international conference on program comprehension (ICPC). IEEE, pp 137–148
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Softw Eng 33(6):420–432
Poshyvanyk D, Gethers M, Marcus A (2012) Concept location using formal concept analysis and information retrieval. ACM Trans Softw Eng Methodol (TOSEM) 21(4):23
Rao S, Kak A (2011) Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the 8th working conference on mining software repositories (MSR), pp 43–52
Ratanotayanon S, Choi HJ, Sim SE (2010) Using transitive changesets to support feature location. In: Proceedings of the IEEE/ACM international conference on automated software engineering. ACM, pp 341–344
Ratiu D, Deissenboeck F (2007) From reality to programs and (not quite) back again. In: ICPC’07. 15th IEEE international conference on program comprehension. IEEE, pp 91–102
Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620. doi:10.1145/361219.361220 (ISSN 0001–0782)
Schuler D, Zimmermann T (2008) Mining usage expertise from version archives. In: Proceedings of the 2008 international working conference on mining software repositories. ACM, pp 121–124
Servant F, Jones JA (2012) Whosefault: automatic developer-to-fault assignment through fault localization. In: 2012 34th International conference on software engineering (ICSE), pp 36–46
Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the 6th international conference on aspect-oriented software development. ACM, pp 212–224
Shokripour R, Anvik J, Kasirun ZM, Zamani S (2013) Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: Proceedings of the tenth international workshop on mining software repositories. IEEE Press, pp 2–11
Shokripour R, Anvik J, Kasirun ZM, Zamani S (2014) Improving automatic bug assignment using time-metadata in term-weighting. Institution of Engineering and Technology, IET
Wang S, Lo D, Xing Z, Jiang L (2011) Concern localization using information retrieval: an empirical study on linux kernel. In: 18th Working conference on reverse engineering (WCRE2011). IEEE, pp 92–96
Wilde N, Scully MC (1995) Software reconnaissance: mapping program features to code. J Softw Maint Res Pract 7(1):49–62
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Publishing Company, Incorporated. ISBN: 3642290434, 9783642290435
Zamani S, Lee SP, Shokripour R, Anvik J (2014) A noun-based approach to feature location using time-aware term-weighting. Inf Softw Technol 56(8):991–1011
Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst (TOIS) 22(2):179–214
Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: 34th International conference on software engineering (ICSE). IEEE, pp 14–24
Acknowledgments
This work was carried out within the framework of a research project supported by the High Impact Research Grant with reference UM.C/625/1/HIR/MOHE/FCSIT/13, funded by the Ministry of Education, Malaysia.
Cite this article
Zamani, S., Lee, S.P., Shokripour, R. et al. A feature location approach supported by time-aware weighting of terms associated with developer expertise profiles. Knowl Inf Syst 49, 629–659 (2016). https://doi.org/10.1007/s10115-015-0909-5
DOI: https://doi.org/10.1007/s10115-015-0909-5