A feature location approach supported by time-aware weighting of terms associated with developer expertise profiles

Zamani, Sima; Lee, Sai Peck; Shokripour, Ramin; Anvik, John

doi:10.1007/s10115-015-0909-5

Regular Paper
Published: 13 January 2016

Volume 49, pages 629–659 (2016)
Knowledge and Information Systems

Sima Zamani ORCID: orcid.org/0000-0002-7671-9228¹,
Sai Peck Lee¹,
Ramin Shokripour² &
John Anvik³


Abstract

Feature location is a frequent software maintenance activity that aims to identify the initial source code location pertinent to a software feature. Most feature location approaches are based, at least in part, on text analysis methods that originate from the natural language context. However, natural language text and the text data in software repositories have different properties, so these methods need to be adapted before they are applied to software repositories. One difference is that the data in repositories is associated with a set of metadata, such as developer information and timestamps. This difference has not been fully considered in previous feature location research. This study proposes a feature location approach that analyzes developer expertise profiles, which contain source code entities modified by the associated software developers, to identify the most similar location pertinent to a desired feature. This approach uses a time-aware term-weighting technique to determine the similarity. An experimental evaluation on four open-source projects shows an improvement in accuracy, performance, and effectiveness of up to 55, 39, and 29 %, respectively, compared to the high-performing information retrieval methods used in feature location. Moreover, the proposed time-aware technique increases the accuracy, performance, and effectiveness of the typical term-weighting technique, tf-idf, by as much as 15, 11, and 13 %, respectively. Finally, the proposed approach outperforms our previous approach, noun-based feature location, by as much as 17 %. These experimental results demonstrate that time-aware analysis of developers’ expertise significantly improves the feature location process.
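
The general idea of time-aware term weighting can be illustrated with a minimal Python sketch. It is not the formula used in the article, which this preview does not reproduce. The sketch assumes that each candidate location is described by a list of dated term usages, that recency is discounted with a hypothetical 1 / (1 + days) decay, and that the discount is combined with a plain idf factor. The function names, file names, and dates are all illustrative.

```python
import math
from collections import defaultdict
from datetime import date


def inverse_document_frequency(profiles):
    """Plain idf over the candidate locations: log(N / df)."""
    n = len(profiles)
    df = defaultdict(int)
    for usages in profiles.values():
        for term in {t for t, _ in usages}:
            df[term] += 1
    return {term: math.log(n / count) for term, count in df.items()}


def time_aware_weights(usages, today, idf):
    """Add a recency-discounted idf contribution for every dated term usage."""
    weights = defaultdict(float)
    for term, used_on in usages:
        days = max((today - used_on).days, 0)  # time difference in days
        recency = 1.0 / (1.0 + days)           # assumed decay, not the article's formula
        weights[term] += recency * idf.get(term, 0.0)
    return dict(weights)


def rank_locations(profiles, query_terms, today):
    """Rank locations by the summed time-aware weight of the query terms."""
    idf = inverse_document_frequency(profiles)
    scores = {}
    for location, usages in profiles.items():
        weights = time_aware_weights(usages, today, idf)
        scores[location] = sum(weights.get(t, 0.0) for t in query_terms)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: terms taken from changes to three files, with change dates.
profiles = {
    "Parser.java": [("parse", date(2015, 1, 10)), ("token", date(2015, 1, 9))],
    "Lexer.java": [("parse", date(2012, 3, 1)), ("token", date(2012, 3, 1))],
    "Logger.java": [("log", date(2015, 1, 1))],
}
ranking = rank_locations(profiles, ["parse", "token"], today=date(2015, 1, 11))
print(ranking)
```

In this toy data, both `Parser.java` and `Lexer.java` contain the query terms, but the recent usages in `Parser.java` receive a much larger recency factor, so it ranks first. With an untimed tf-idf weighting the two files would be tied.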

Notes

A changeset is an atomic set of changes of the source code files committed to the source code repository by a project developer during a maintenance activity [30].
The source code repository of software projects.
Note that the time difference is calculated in days.
http://www.eclipse.org/jdt/.
http://www.eclipse.org/aspectj/.
https://netbeans.org/.
https://developer.mozilla.org/en/docs/Rhino.
The source code locations that were modified to fix these change requests need to be determined. Change requests whose corresponding locations could not be correctly determined were removed from this test set.
https://docs.google.com/file/d/0B0sa-hXpOgiJeWdYelBNRUZZeDQ/edit?usp=sharing.
http://softeng.polito.it/software/effsize/.
http://jeldoclet.sourceforge.net/.
http://www.aktors.org/technologies/annie/.
http://gate.ac.uk/.
http://coest.org/coest-projects/projects/tracelab.
https://drive.google.com/file/d/0B0sahXpOgiJZjNudkJGbnI2WnM/edit?usp=sharing.
http://alias-i.com/lingpipe/
System properties: processor Intel(R) Core(TM) i5-3470 CPU, 3.20 GHz; installed memory (RAM) 12 GB.

References

Abebe SL, Tonella P (2010) Natural language parsing of program element names for concept extraction. In: IEEE 18th international conference on program comprehension (ICPC). IEEE, pp 156–159
Anvik J (2006) Automating bug report assignment. In: Proceedings of the 28th international conference on software engineering (ICSE). ACM, pp 937–940
Anvik J, Hiew L, Murphy GC (2006) Who should fix this bug? In: Proceedings of the 28th international conference on software engineering, ICSE ’06, New York, NY, USA. ACM, pp 361–370. ISBN: 1-59593-375-1. doi:10.1145/1134285.1134336
Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, vol 1. ACM, pp 375–384
Baeza-Yates RA, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley Longman Publishing Co., Inc, Boston. ISBN: 020139829X
Bai J, Nie J-Y, Paradis F (2004) Using language models for text classification. In: Asia information retrieval symposium (AIRS), Beijing, China
Biggerstaff TJ, Mitbander BG, Webster D (1993) The concept assignment problem in program understanding. In: Proceedings of the 15th international conference on software engineering (ICSE). IEEE Computer Society Press, pp 482–498
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Butler S, Wermelinger M, Yu Y, Sharp H (2011) Improving the tokenisation of identifier names. In: ECOOP 2011-object-oriented programming, pp 130–154
Capobianco G, Lucia AD, Oliveto R, Panichella A, Panichella S (2013) Improving IR-based traceability recovery via noun-based indexing of software artifacts. J Softw Evol Process 25(7):743–762
Cleary B, Exton C, Buckley J, English M (2009) An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir Softw Eng 14(1):93–130
Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) Gate: an architecture for development of robust hlt applications. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 168–175
Dit B, Moritz E, Poshyvanyk D (2012) A tracelab-based solution for creating, conducting, and sharing feature location experiments. In: 2012 IEEE 20th international conference on program comprehension (ICPC). IEEE, pp 203–208
Dit B, Revelle M, Gethers M, Poshyvanyk D (2013a) Feature location in source code: a taxonomy and survey. J Softw Evol Process 25(1):53–95
Dit B, Revelle M, Poshyvanyk D (2013b) Integrating information retrieval, execution and link analysis algorithms to improve feature location in software. Empir Softw Eng 18(2):277–309
Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: ICSM 2009. IEEE international conference on software maintenance (ICSM). IEEE, pp 351–360
Gómez VU, Kellens A, Brichau J, D’Hondt T (2009) Time warp, an approach for reasoning over system histories. In: Proceedings of the joint international and annual ERCIM workshops on principles of software evolution (IWPSE) and software evolution (Evol) workshops. ACM, pp 79–88
Hill E, Pollock L, Vijay-Shanker K (2009) Automatically capturing source code context of nl-queries for software maintenance and reuse. In: Proceedings of the 31st international conference on software engineering (ICSE). IEEE Computer Society, pp 232–242
Hossen K, Kagdi HH, Poshyvanyk D (2014) Amalgamating source code authors, maintainers, and change proneness to triage change requests. In: ICPC, pp 130–141
Kagdi H, Maletic JI, Sharif B (2007) Mining software repositories for traceability links. In: ICPC’07. 15th IEEE international conference on program comprehension (ICPC). IEEE, pp 145–154
Kagdi H, Gethers M, Poshyvanyk D, Hammad M (2012) Assigning change requests to software developers. J Softw Evol Process 24(1):3–33
Liu D, Marcus A, Poshyvanyk D, Rajlich V (2007) Feature location via information retrieval based filtering of a single scenario execution trace. In: Proceedings of the twenty-second IEEE/ACM international conference on automated software engineering. ACM, pp 234–243
Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent dirichlet allocation. Inf Softw Technol 52(9):972–990
Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval, vol 1. Cambridge University Press, Cambridge
Petrenko M, Rajlich V, Vanciu R (2008) Partial domain comprehension in software evolution and maintenance. In: ICPC 2008. The 16th IEEE international conference on program comprehension (ICPC). IEEE, pp 13–22
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2006) Combining probabilistic ranking and latent semantic indexing for feature identification. In: ICPC 2006. 14th IEEE international conference on program comprehension (ICPC). IEEE, pp 137–148
Poshyvanyk D, Guéhéneuc Y-G, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans Softw Eng 33(6):420–432
Poshyvanyk D, Gethers M, Marcus A (2012) Concept location using formal concept analysis and information retrieval. ACM Trans Softw Eng Methodol (TOSEM) 21(4):23
Rao S, Kak A (2011) Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the 8th working conference on mining software repositories (MSR), pp 43–52
Ratanotayanon S, Choi HJ, Sim SE (2010) Using transitive changesets to support feature location. In: Proceedings of the IEEE/ACM international conference on automated software engineering. ACM, pp 341–344
Ratiu D, Deissenboeck F (2007) From reality to programs and (not quite) back again. In: ICPC’07. 15th IEEE international conference on program comprehension. IEEE, pp 91–102
Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620. doi:10.1145/361219.361220 (ISSN 0001–0782)
Schuler D, Zimmermann T (2008) Mining usage expertise from version archives. In: Proceedings of the 2008 international working conference on mining software repositories. ACM, pp 121–124
Servant F, Jones JA (2012) Whosefault: automatic developer-to-fault assignment through fault localization. In: 2012 34th International conference on software engineering (ICSE), pp 36–46
Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the 6th international conference on aspect-oriented software development. ACM, pp 212–224
Shokripour R, Anvik J, Kasirun ZM, Zamani S (2013) Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In: Proceedings of the tenth international workshop on mining software repositories. IEEE Press, pp 2–11
Shokripour R, Anvik J, Kasirun ZM, Zamani S (2014) Improving automatic bug assignment using time-metadata in term-weighting. Institution of Engineering and Technology, IET
Wang S, Lo D, Xing Z, Jiang L (2011) Concern localization using information retrieval: an empirical study on linux kernel. In: 18th Working conference on reverse engineering (WCRE2011). IEEE, pp 92–96
Wilde N, Scully MC (1995) Software reconnaissance: mapping program features to code. J Softw Maint Res Pract 7(1):49–62
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Publishing Company, Incorporated. ISBN: 3642290434, 9783642290435
Zamani S, Lee SP, Shokripour R, Anvik J (2014) A noun-based approach to feature location using time-aware term-weighting. Inf Softw Technol 56(8):991–1011
Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst (TOIS) 22(2):179–214
Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: 34th International conference on software engineering (ICSE). IEEE, pp 14–24

Acknowledgments

This work is carried out within the framework of the research project supported by High Impact Research Grant with reference UM.C/625/1/HIR/MOHE/FCSIT/13, funded by the Ministry of Education, Malaysia.

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Sima Zamani & Sai Peck Lee
Information Technology Department, Iran Telecommunication Research Center, Tehran, Iran
Ramin Shokripour
Mathematics and Computer Science, University of Lethbridge, Lethbridge, AB, Canada
John Anvik

Corresponding author

Correspondence to Sima Zamani.

About this article

Cite this article

Zamani, S., Lee, S.P., Shokripour, R. et al. A feature location approach supported by time-aware weighting of terms associated with developer expertise profiles. Knowl Inf Syst 49, 629–659 (2016). https://doi.org/10.1007/s10115-015-0909-5

Received: 27 August 2014
Revised: 24 September 2015
Accepted: 25 October 2015
Published: 13 January 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s10115-015-0909-5
