Link analysis algorithms for static concept location: an empirical assessment

Empirical Software Engineering

Abstract

During software evolution, one of the most important comprehension activities is concept location in source code, as it identifies the places in the code where changes are to be made in response to a modification request. Change requests (such as bug fixes or new feature requests) are usually formulated in natural language, while the source code also includes large amounts of text. Consequently, many of the existing concept location techniques are based on text search or text retrieval. Such approaches reformulate concept location as a document retrieval problem. We refine and improve such solutions by leveraging dependencies between source code elements. Dependency information is used by a link analysis algorithm to rank the document space and to improve concept location based on text retrieval. We implemented our solution to concept location using the PageRank algorithm, which is used in web document retrieval applications. The results of an empirical evaluation indicate that the new approach leads to better retrieval performance than baseline approaches that use text retrieval and clustering. In addition, we present the results of a controlled experiment and of a differentiated replication to assess whether the new technique supports users in identifying the places in the code where changes are to be made. The results of these experiments revealed that the users exploiting our technique were significantly better supported in identifying the code to be changed in response to a bug fixing request than the users who did not use this technique.

Notes

  1. www.cs.wayne.edu/~severe/pagerank.

  2. Applying the Bonferroni correction, the corrected α value is \(\frac{0.05}{9} \approx 0.0055\), where 9 is the number of systems studied in our empirical evaluation. We used this corrected α value when the data from all the software systems were analyzed together.

  3. Although there is no formal standard for the power of a statistical test, the value 0.80 is considered a reasonable threshold for adequacy (Ellis 2010).

  4. Differentiated replications introduce variations in essential aspects of the experimental conditions (Basili et al. 1999). One prominent variation concerns the execution of replications with different kinds of participants and a different design. In Shull et al. (2008), this kind of replication is also called an independent or conceptual replication.

  5. www.cs.wayne.edu/~severe/pagerank

  6. In Italy, exam grades are expressed as integers between 18 and 30. The lowest grade is 18, while the highest is 30.

  7. They are line graphs in which the means of the dependent variables for each level of one factor are plotted over all the levels of the second factor. If the lines are nearly parallel, then no interaction is present; otherwise, an interaction is present. Intersecting lines are clear evidence of an interaction between factors.

  8. We chose boxplots to show the results, rather than clustered bar charts, for example, because of the different designs used in the experiments and the different numbers of participants in these two experiments. For instance, in USB1 all the participants answered questions Q1 to Q5, while only those who used PR answered questions Q6 to Q9. In USB2 all the participants answered all the questions. The adoption of clustered bar charts could then introduce some distortions when summarizing the post-experiment data for the discussion.

  9. Given two values (a, b), it is computed as \(\frac{a - b}{b} \times 100\).

References

  • Abadi A, Nisenson M, Simionovici Y (2008) A traceability technique for specifications. In: International conference on program comprehension. IEEE CS Press, Washington, DC, pp 103–112

  • Abrahão S, Gravino C, Pelozo EI, Scanniello G, Tortora G (2013) Assessing the effectiveness of sequence diagrams in the comprehension of functional requirements: results from a family of five experiments. IEEE Trans Soft Eng 39 (3):327–342

  • Ali N, Sabane A, Guéhéneuc Y-G, Antoniol G (2012) Improving bug location using binary class relationships. In: Proceedings of international working conference on source code analysis and manipulation (SCAM). IEEE Computer Society, Washington, DC, pp 174–183

  • Aranda J, Ernst N, Horkoff J, Easterbrook S (2007) A framework for empirical evaluation of model comprehensibility. In: Proceedings of modeling in software engineering. ICSE Workshop, pp 7–13. IEEE

  • Arisholm E, Briand LC, Hove SE, Labiche Y (2006) The impact of UML documentation on software maintenance: an experimental evaluation. IEEE Trans Soft Eng 32:365–381

  • Bajracharya SK, Ngo TC, Linstead E, Dou Y, Rigor P, Baldi P, Lopes CV (2006) Sourcerer: a search engine for open source code supporting structure-based search. In: Tarr PL, Cook WR (eds) Companion to the 21st annual ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications (OOPSLA), Portland, pp 681–682. ACM

  • Basili V, Caldiera G, Rombach DH (1994) The goal question metric paradigm. In: Encyclopedia of software engineering. Wiley

  • Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. In: IEEE Transactions on Software Engineering, IEEE

  • Beard M, Kraft N, Etzkorn L, Lukins S (2011) Measuring the accuracy of information retrieval based bug localization techniques. In: Proceedings of working conference on reverse engineering (WCRE). IEEE Computer Society, Washington, DC, pp 124–128

  • Briand LC, Labiche Y, Di Penta M, Yan-Bondoc H (2005) An experimental investigation of formality in UML-based development. IEEE Trans Soft Eng 31 (10):833–849

  • O'Brien MP, Buckley J (2005) Modelling the information-seeking behaviour of programmers - an empirical approach. In: Proceedings of workshop on program comprehension (IWPC). IEEE Computer Society, pp 125–134

  • Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the seventh international conference on World Wide Web 7, (WWW7). Elsevier, Amsterdam, pp 107–117

  • Buckner J, Buchta J, Petrenko M, Rajlich V (2005) JRipples: a tool for program comprehension during incremental change. In: Proceedings of international workshop on program comprehension, (IWPC). IEEE Computer Society, pp 149–152

  • Carver J, Jaccheri L, Morasca S, Shull F (2003) Issues in using students in empirical studies in software engineering education. In: Proceedings of international symposium on software metrics. IEEE Computer Society, Washington, DC, pp 239–250

  • Chan W-K, Cheng H, Lo D (2012) Searching connected API subgraph via text phrases. In: Proceedings of symposium on the foundations of software engineering. SIGSOFT FSE. ACM, p 10

  • Chen K, Rajlich V (2000) Case study of feature location using dependence graph. In: Proc. of 8th international workshop on program comprehension, pp 241–247

  • Ciolkowski M, Muthig D, Rech J (2004) Using academic courses for empirical validation of software development processes. In: Proceedings of EUROMICRO Conference. IEEE Computer Society, Washington, DC, pp 354–361

  • Cliff N (1993) Dominance statistics: ordinal analyses to answer ordinal questions. Psychol Bull 114 (3):494–509

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Hillsdale

  • Colosimo M, De Lucia A, Scanniello G, Tortora G (2009) Evaluating legacy system migration technologies through empirical studies. Inf Soft Technol 51 (12):433–447

  • Conover WJ (1998) Practical Nonparametric Statistics, 3rd edn. Wiley

  • Deerwester SC, Dumais ST, Landauer TK, Furnas GW, Harshman RA (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41 (6):391–407

  • Devore JL, Farnum N (1999) Applied statistics for engineers and scientists. Duxbury

  • De Lucia A, Oliveto R, Tortora G (2009) Assessing IR-based traceability recovery tools through controlled experiments. Empirical Softw Eng 14 (1):57–92

  • Dit B, Revelle M, Poshyvanyk D (2013a) Integrating information retrieval, execution and link analysis algorithms to improve feature location in software. Empirical Softw Eng 18(2):277–309. doi:10.1007/s10664-011-9194-4

  • Dit B, Revelle M, Gethers M, Poshyvanyk D (2013b) Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process 25(1):53–95. doi:10.1002/smr.567

  • Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56:52–64

  • Eaddy M, Aho AV, Antoniol G, Guéhéneuc Y-G (2008) Cerberus: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. In: Proceedings of international conference on program comprehension, ICPC ’08. IEEE Computer Society, Washington, DC, pp 53–62

  • Ellis P (2010) The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press

  • Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in IR-based concept location. In: Proceedings of international conference on software maintenance. IEEE Computer Society, Washington, DC, pp 351–360

  • Gold N, Harman M, Li Z, Mahdavi K (2006) Allowing overlapping boundaries in source code using a search based approach to concept binding. In: Proceedings of international conference on software maintenance, (ICSM). IEEE Computer Society, Washington, DC, pp 310–319

  • Grant S, Cordy JR, Skillicorn D (2008) Automated concept location using independent component analysis. In: Proceedings of working conference on reverse engineering, WCRE. IEEE Computer Society, Washington, DC, pp 138–142

  • Gravino C, Risi M, Scanniello G, Tortora G (2012) Do professional developers benefit from design pattern documentation? A replication in the context of source code comprehension. In: Proceedings of conference on model driven engineering languages and systems, lecture notes in computer science, Springer, pp 185–201

  • Grechanik M, Fu C, Xie Q, McMillan C, Poshyvanyk D, Cumby C (2010) A search engine for finding highly relevant applications. In: Proceedings of international conference on software engineering, ICSE, vol 1, ACM, New York

  • Haiduc S, Bavota G, Marcus A, Oliveto R, De Lucia A, Menzies T (2013) Automatic query reformulations for text retrieval in software engineering. In: Proceedings of international conference on software engineering, ICSE. IEEE Press, Piscataway, pp 842–851

  • Hannay J, Jørgensen M (2008) The role of deliberate artificial design elements in software engineering experiments. IEEE Trans Softw Eng 34 (2):242–259

  • Harman M, Gold N, Hierons RM, Binkley D (2002) Code extraction algorithms which unify slicing and concept assignment. In: Proceedings of working conference on reverse engineering, WCRE. IEEE Computer Society, Richmond, pp 11–21

  • Hill E, Pollock L, Vijay-Shanker K (2007) Exploring the neighborhood with Dora to expedite software maintenance. In: Proceedings of international conference on automated software engineering, ASE, ACM, New York

  • Inoue K, Yokomori R, Yamamoto T, Matsushita M, Kusumoto S (2005) Ranking significance of software components based on use relations. IEEE Trans Softw Eng 31 (3):213–225

  • Juristo N, Moreno A (2001) Basics of software engineering experimentation. Kluwer Academic Publishers, Englewood Cliffs

  • Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Soft Technol 49 (11–12):1073–1086

  • Kitchenham B, Al-Khilidar H, Babar M, Berry M, Cox K, Keung J, Kurniawati F, Staples M, Zhang H, Zhu L (2008) Evaluating guidelines for reporting empirical software engineering studies. Empir Soft Eng 13:97–121

  • Ko AJ, Myers BA, Coblenz MJ, Aung HH (2006) An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Trans Soft Eng 32 (12):971–987

  • Li Z (2009) Identifying high-level dependence structures using slice-based dependence analysis. In: 25th IEEE international conference on software maintenance (ICSM). Edmonton, pp 457–460. IEEE

  • Lukins SK, Kraft NA, Etzkorn LH (2008) Source code retrieval for bug localization using latent Dirichlet allocation. In: Proceedings of working conference on reverse engineering, WCRE. IEEE Computer Society, Washington, DC, pp 155–164

  • Lukins SK, Kraft NA, Etzkorn LH (2010) Bug localization using latent Dirichlet allocation. Inf Softw Technol 52 (9):972–990

  • Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  • Marcus A, Haiduc S (2013) Text retrieval approaches for concept location in source code. In: Software engineering, volume 7171 of lecture notes in computer science. Springer, pp 126–158

  • Marcus A, Maletic J (2003) Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of international conference on software engineering, ICSE. IEEE Computer Society, Portland, pp 124–135

  • Marcus A, Sergeyev A, Rajlich V, Maletic JI (2004) An information retrieval approach to concept location in source code. In: Proceedings of working conference on reverse engineering, WCRE’ 04. IEEE Computer Society, Washington, DC, pp 214–223

  • McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C (2011) Portfolio: finding relevant functions and their usage. In: Proceedings of International Conference on Software Engineering, ICSE, ACM, New York

  • McMillan C, Grechanik M, Poshyvanyk D, Fu C, Xie Q (2012) Exemplar: a source code search engine for finding highly relevant applications. IEEE Trans Soft Eng 38 (5):1069–1087

  • Moreno L, Bandara W, Haiduc S, Marcus A (2013) On the relationship between the vocabulary of bug reports and source code. In: International conference on software maintenance, ICSM, IEEE Computer Society

  • Ngomo ACN (2009) Low-bias extraction of domain-specific concepts. Ph.D Thesis

  • Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter, London

  • Panichella A, McMillan C, Moritz E, Palmieri D, Oliveto R, Poshyvanyk D, De Lucia A (2013) When and how using structural information to improve IR-based traceability recovery. In: European conference on software maintenance and reengineering, CSMR. IEEE Computer Society, Washington, DC, pp 199–208

  • Petrenko M, Rajlich V (2013) Concept location using program dependencies and information retrieval (DepIR). Inf Softw Technol 55 (4):651–659

  • Poshyvanyk D, Gethers M, Marcus A (2013) Concept location using formal concept analysis and information retrieval. ACM Trans Softw Eng Methodol 21 (4):23:1–23:34

  • Poshyvanyk D, Marcus A (2007) Combining formal concept analysis with information retrieval for concept location in source code. In: Proceedings of the 15th IEEE international conference on program comprehension, ICPC. IEEE Computer Society, Washington, DC, pp 37–48

  • Puppin D, Silvestri F (2006) The social network of java classes. In: Proceedings of symposium on applied computing, (SAC), ACM, New York

  • Rajlich V, Wilde N (2002) The role of concepts in program comprehension. In: Proceedings of international workshop on program comprehension, IWPC. IEEE Computer Society, Washington, DC, pp 271–278

  • Revelle M, Dit B, Poshyvanyk D (2010) Using data fusion and web mining to support feature location in software. In: Proceedings of international conference on program comprehension, ICPC. IEEE Computer Society, Washington, DC, pp 14–23

  • Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M (2010) How developers’ experience and ability influence Web application comprehension tasks supported by UML stereotypes: a series of four experiments. IEEE Trans Soft Eng 36 (1):96–118

  • Robillard MP (2008) Topology analysis of software dependencies. ACM Trans Softw Eng Methodol 17 (4):18:1–18:36

  • Romano J, Kromrey JD, Coraggio J, Skowronek J (2006) Appropriate statistics for ordinal level data: should we really be using t-test and Cohen’s d for evaluating group differences on the NSSE and other surveys? In: Annual meeting of the Florida association of institutional research

  • Salton G, McGill MJ (1983) Introduction to modern information retrieval. McGraw Hill, New York

  • Scanniello G, D’Amico A, D’Amico C, D’Amico T (2010) Using the Kleinberg algorithm and vector space model for software system clustering. In: International conference on program comprehension, ICPC. IEEE Computer Society, Washington, DC, pp 180–189

  • Scanniello G, Gravino C, Genero M, Cruz-Lemus JA, Tortora G (2014) On the impact of UML analysis models on source code comprehensibility and modifiability. ACM Trans Softw Eng Methodol 23 (2):13:1–13:26

  • Scanniello G, Gravino C, Tortora G (2010) Investigating the role of UML in the software modeling and maintenance - a preliminary industrial survey. In: Proceedings of the international conference on enterprise information systems. pp 141–148

  • Scanniello G, Marcus A (2011) Clustering support for static concept location in source code. In: Proceedings of international conference on program comprehension, ICPC. IEEE Computer Society, Washington, DC, pp 1–10

  • Seaman CB (2002) The information gathering strategies of software maintainers. In: Proceedings of the international conference on software maintenance, ICSM. IEEE Computer Society, Washington, DC, pp 141–149

  • Shapiro S, Wilk M (1965) An analysis of variance test for normality. Biometrika 52 (3–4):591–611

  • Shull FJ, Carver JC, Vegas S, Juristo N (2008) The role of replications in empirical software engineering. Empir Soft Eng 13 (2):211–218

  • Sjoberg DIK, Hannay JE, Hansen O, Kampenes VB, Karahasanovic A, Liborg N, Rekdal AC (2005) A survey of controlled experiments in software engineering. IEEE Trans Soft Eng 31 (9):733–753

  • Wang J, Peng X, Xing Z, Zhao W (2011) An exploratory study of feature location process: distinct phases, recurring patterns, and elementary actions. In: Proceedings of international conference on software maintenance, ICSM. IEEE Computer Society, pp 213–222

  • Wang S, Lo D, Jiang L (2011) Code search via topic-enriched dependence graph matching. In: Working conference on reverse engineering, WCRE. IEEE Computer Society, pp 119–123

  • Wang S, Lo D, Xing Z, Jiang L (2011) Concern localization using information retrieval: an empirical study on Linux kernel. In: Proceedings of working conference on reverse engineering, WCRE. IEEE Computer Society, pp 92–96

  • Wohlin C, Runeson P, Höst M, Ohlsson M, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer

  • Zhao W, Zhang L, Liu Y, Sun J, Yang F (2004) SNIAFL: towards a static non-interactive approach to feature location. In: Proceedings of international conference on software engineering, ICSE. IEEE Computer Society, Washington, DC, pp 293–303

  • Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: International conference on software engineering, ICSE. IEEE, pp 14–24

Acknowledgments

We would like to thank Michele Brescia, who developed some of the software modules of the prototype used in the experimentation presented here, and Pasquale Ricciardi for helping us in the execution of the replication. We also thank the participants in the controlled experiments. Andrian Marcus was supported in part by grants from the US National Science Foundation: CCF-1017263 and CCF-0845706.

Author information

Corresponding author

Correspondence to Giuseppe Scanniello.

Additional information

Communicated by: Ahmed E. Hassan

Appendix

In this appendix, we summarize CLC (Scanniello and Marcus 2011), namely one of the baseline approaches we have selected for the investigation presented in this paper. The main steps are:

1. Corpus Creation:

Each method yields one document in the corpus. A document includes all the comments and identifiers of its method, together with the method's lead comments (if any).

2. Corpus Normalization:

The normalization is performed as for PR (see Section 3).

3. Corpus Indexing:

A text retrieval engine is used to index the corpus. The engine creates a numerical index for each document in the corpus; this index is later used to compute similarity measures between documents. We used the vector space model (VSM) as the text retrieval engine.

4. Computing Lexical Similarities Between Source Code Documents:

This step is required to perform the clustering. We compute the cosine similarity between all pairs of documents in the corpus.
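As an illustration of steps 3 and 4, the following minimal sketch indexes a toy corpus and computes pairwise cosine similarities. The tf-idf weighting, the method names, and the terms are assumptions made for the example; the text above states only that VSM and the cosine similarity are used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document (a list of terms) to a sparse tf-idf vector (assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical normalized method documents (terms from identifiers and comments).
corpus = {
    "openFile": ["open", "file", "path", "read"],
    "readFile": ["read", "file", "buffer"],
    "drawMenu": ["draw", "menu", "item"],
}
names = list(corpus)
vectors = dict(zip(names, tfidf_vectors([corpus[n] for n in names])))

# Similarity for every ordered pair of distinct documents.
similarity = {(a, b): cosine(vectors[a], vectors[b])
              for a in names for b in names if a != b}
```

Computing all pairs is quadratic in the number of methods, which is the cost that the comparison with the new approach at the end of this appendix refers to.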

5. Extracting Dependencies in Software:

We represent the software system as a directed graph \(G = (V, E)\). V is the set of methods in the system, while E is the set of edges (i.e., ordered pairs of elements of V). Each edge represents a directed relationship between two methods. We take a conservative approach in this work and consider only direct references between methods. That is, \((m_i, m_j) \in E\) if there is a reference to the method \(m_j\) in the body of the method \(m_i\). These dependencies were identified using JRipples.

6. Clustering:

The graph G is turned into a directed weighted graph \(G' = (V, E, \omega)\). In particular, the lexical similarity (i.e., cosine similarity) between two methods \(m_i\) and \(m_j\) is used as the weight \(\omega(m_i, m_j)\) of the edge (if present) between the nodes corresponding to these methods. Given how \(G'\) is built, it summarizes both the structural and the lexical information of a subject system.

The BorderFlow clustering algorithm (Ngomo 2009) is applied to \(G'\). The algorithm is a general-purpose graph clustering algorithm. It can be used for soft clustering (i.e., a node of an input graph can be in one or more clusters) and hard clustering (i.e., each node of an input graph can be in exactly one cluster). The hard clustering variant is used in CLC.
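The construction of the weighted graph can be sketched as follows. The edge list, the similarity values, and the helper name `build_weighted_graph` are hypothetical; in the study, the dependencies come from JRipples and the weights from the cosine similarities of step 4.

```python
# Hypothetical direct references between methods: (caller, callee) pairs.
edges = [("main", "openFile"), ("openFile", "readFile"), ("main", "drawMenu")]

# Hypothetical precomputed cosine similarities between method documents.
cosine_sim = {
    ("main", "openFile"): 0.10,
    ("openFile", "readFile"): 0.62,
    ("main", "drawMenu"): 0.05,
}

def build_weighted_graph(edges, sim):
    """Return the weighted graph as {source: {target: weight}}.

    Only pairs connected by a dependency receive a weight; the weight is the
    lexical similarity of the two methods.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, {})
        graph.setdefault(dst, {})
        # Cosine similarity is symmetric, so look the pair up in either order.
        graph[src][dst] = sim.get((src, dst), sim.get((dst, src), 0.0))
    return graph

weighted = build_weighted_graph(edges, cosine_sim)
```

The nested-dictionary layout keeps the edge direction explicit: `weighted["openFile"]["readFile"]` exists, while `weighted["readFile"]` has no outgoing edges.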

The idea behind BorderFlow is to maximize the flow from the border of each cluster to its inner nodes (i.e., the nodes within the cluster) while minimizing the flow from the cluster to the nodes outside of the cluster. A cluster X is therefore a subset of V that maximizes the border flow ratio:

$$F(X) = \frac{\Omega(b(X), X)}{\Omega(b(X), n(X))} $$

where b(X) is the set of border nodes of X and n(X) is the set of direct neighbors of X. Ω is a function that, given two subsets of V, returns the total weight of the edges from the first subset to the second (i.e., the flow between them). This function is computed as follows:

$${\Omega} (X, Y) = \sum\limits_{x \in X, y \in Y} \omega(x, y)$$
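The two formulas can be evaluated on a small graph as in the sketch below. The definitions used for b(X) and n(X) are assumptions: border nodes are taken to be the members of X with at least one edge leaving X, and direct neighbors are the nodes outside X reached by an edge from X. Returning infinity when no flow leaves the cluster is a convention chosen only for this example.

```python
def omega(graph, xs, ys):
    """Total weight of the edges going from nodes in xs to nodes in ys."""
    return sum(w for x in xs for y, w in graph.get(x, {}).items() if y in ys)

def neighbors(graph, cluster):
    """n(X): nodes outside the cluster reached by an edge from inside (assumed definition)."""
    return {y for x in cluster for y in graph.get(x, {}) if y not in cluster}

def border(graph, cluster):
    """b(X): cluster members with at least one edge leaving the cluster (assumed definition)."""
    return {x for x in cluster if any(y not in cluster for y in graph.get(x, {}))}

def border_flow_ratio(graph, cluster):
    """F(X) = Omega(b(X), X) / Omega(b(X), n(X))."""
    b = border(graph, cluster)
    outflow = omega(graph, b, neighbors(graph, cluster))
    inflow = omega(graph, b, cluster)
    # Convention for this sketch: a cluster with no outgoing flow has infinite ratio.
    return float("inf") if outflow == 0 else inflow / outflow

# Hypothetical weighted graph stored as {source: {target: weight}}.
graph = {
    "a": {"b": 0.8, "c": 0.2},
    "b": {"a": 0.6, "d": 0.1},
    "c": {"d": 0.5},
    "d": {},
}
ratio = border_flow_ratio(graph, {"a", "b"})
```

For the cluster {a, b}, both nodes are border nodes, the inward flow is 0.8 + 0.6 and the outward flow is 0.2 + 0.1, so the ratio is about 4.67.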

The algorithm iteratively selects nodes from n(X) and then inserts them in X until F(X) is maximized. The selection of the nodes is performed according to the following two steps:

  1. Computing the set C(X), which contains all the nodes \(u \in V \setminus X\) such that \(F(X \cup \{u\}) > F(X)\).

  2. Selecting the candidates \(u \in C(X)\) to get the set \(C_f(X)\). This set contains all the nodes u that maximize \(\Omega(u, n(X))\).

If \(F(X \cup C_{f}(X)) \geqslant F(X)\), then the nodes of \(C_f(X)\) are added to the set X. The iterative selection of nodes concludes when \(|n(X)| = 0\) for each set of nodes X identified by the BorderFlow algorithm. Each set of nodes forms a cluster.
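The selection loop can be sketched for a single cluster grown from a seed node. This is a simplified illustration, not the BorderFlow implementation used in CLC: candidates are drawn from n(X), the loop stops as soon as no candidate improves F(X), and the helper definitions of b(X) and n(X) are the same assumptions as in the formulas above. Repeating the growth for every seed and resolving overlaps (hard clustering) is omitted.

```python
def omega(graph, xs, ys):
    """Total weight of the edges going from nodes in xs to nodes in ys."""
    return sum(w for x in xs for y, w in graph.get(x, {}).items() if y in ys)

def neighbors(graph, cluster):
    """n(X): nodes outside the cluster reached by an edge from inside it."""
    return {y for x in cluster for y in graph.get(x, {}) if y not in cluster}

def border(graph, cluster):
    """b(X): cluster members with at least one edge leaving the cluster."""
    return {x for x in cluster if any(y not in cluster for y in graph.get(x, {}))}

def flow_ratio(graph, cluster):
    """F(X); infinite when nothing flows out (a convention of this sketch)."""
    b = border(graph, cluster)
    out = omega(graph, b, neighbors(graph, cluster))
    return float("inf") if out == 0 else omega(graph, b, cluster) / out

def grow_cluster(graph, seed):
    """Grow one cluster from a seed node using the two-step candidate selection."""
    cluster = {seed}
    while True:
        near = neighbors(graph, cluster)
        current = flow_ratio(graph, cluster)
        # Step 1: candidates whose insertion strictly increases F(X).
        candidates = {u for u in near if flow_ratio(graph, cluster | {u}) > current}
        if not candidates:
            return cluster
        # Step 2: keep the candidates with the largest flow towards n(X).
        best = max(omega(graph, {u}, near) for u in candidates)
        final = {u for u in candidates if omega(graph, {u}, near) == best}
        # Add the selected nodes only if the ratio does not decrease.
        if flow_ratio(graph, cluster | final) < current:
            return cluster
        cluster |= final

# Hypothetical weighted graph: two tightly coupled pairs joined by a weak link.
graph = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.1},
    "c": {"b": 0.1, "d": 0.8},
    "d": {"c": 0.8},
}
left = grow_cluster(graph, "a")
right = grow_cluster(graph, "d")
```

On this graph the weak link between b and c is never absorbed, because adding the node across it lowers the ratio, so the two seeds yield the clusters {a, b} and {c, d}.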

7. Formulating a Query:

Developers formulate textual queries based on the information they have about the change request. Most text retrieval engines do not rely on a predefined vocabulary or grammar; hence, the queries do not need to be correct sentences. The query is normalized in the same way as the corpus.

8. Ranking the Documents:

In text retrieval-based approaches, documents are retrieved based on their lexical similarity to the query. In CLC, the position of the methods in the ranked list is modified according to the clustering results. Specifically, for all the clusters \(c_k \in C\), where C is the set of identified clusters, we compute the similarity of each method \(m_i \in c_k\) with the query q as follows:

$$S(q, m_{i}, c_{k}) = \max\limits_{{m_{j} \in c_{k}}}\{sim(q, m_{j}) | i \neq j \}$$

where \(sim(q, m_j)\) is the lexical similarity (i.e., cosine similarity) between the query q and the method \(m_j\). The methods are then sorted to get a new ranked list. From a practical perspective, CLC no longer retrieves individual methods but instead clusters of related methods (related both structurally and lexically). The retrieval order of the clusters is still based on the lexical similarity to the user’s query.
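A literal reading of the ranking formula can be sketched as follows. The similarity values, method names, and clusters are hypothetical. Two choices are assumptions of the sketch rather than part of CLC as described: a method in a singleton cluster keeps its own similarity (the formula leaves that case open), and ties are broken alphabetically.

```python
def clc_scores(query_sim, clusters):
    """Score each method by the best query similarity among the other methods of its cluster."""
    scores = {}
    for cluster in clusters:
        for method in cluster:
            others = [query_sim[m] for m in cluster if m != method]
            # Assumed fallback for singleton clusters: use the method's own similarity.
            scores[method] = max(others) if others else query_sim[method]
    return scores

# Hypothetical similarities sim(q, m_j) between the query and each method.
query_sim = {"openFile": 0.70, "readFile": 0.40, "closeFile": 0.10, "drawMenu": 0.30}

# Hypothetical clusters, e.g. as produced by a hard clustering of the methods.
clusters = [{"openFile", "readFile", "closeFile"}, {"drawMenu"}]

scores = clc_scores(query_sim, clusters)

# Sort by decreasing score; ties are broken alphabetically (an assumption).
ranked = sorted(scores, key=lambda m: (-scores[m], m))
```

In this example `closeFile` and `readFile` inherit the score 0.70 from `openFile`, which shows how methods with a weak lexical match are promoted when they share a cluster with a strong match.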

Differences and Similarities between CLC and the New Approach. Steps 1, 2, and 3 are similar to steps 1, 2, and 3 of the new approach (see Section 3). The steps Extracting Dependencies in Software and Formulating a Query are also similar in both approaches. The most relevant differences concern the use of the BorderFlow clustering algorithm, the computation of the lexical similarity among pairs of methods (which is not required in the new approach), and how the ranked list is obtained. Due to these differences, the new approach scales better on larger systems.

Cite this article

Scanniello, G., Marcus, A. & Pascale, D. Link analysis algorithms for static concept location: an empirical assessment. Empir Software Eng 20, 1666–1720 (2015). https://doi.org/10.1007/s10664-014-9327-7

  • DOI: https://doi.org/10.1007/s10664-014-9327-7
