Experimenting with information retrieval methods in the recovery of feature-code SPL traces

Vale, Tassio; de Almeida, Eduardo Santana

doi:10.1007/s10664-018-9652-3

Published: 10 November 2018

Volume 24, pages 1328–1368, (2019)

Empirical Software Engineering

Abstract

Context

Information retrieval research offers methods for recovering traces from existing software artifacts. In software product line (SPL) engineering, further research on these methods is needed to exploit the source code of existing products and to support SPL adoption by providing traceability information.

Objective

This work evaluates the retrieval quality of information retrieval methods for extracting feature-code trace links from a set of variant-rich software development projects.

Method

We propose a research method comprising two experiments with five information retrieval methods (Classic vector, Latent semantic indexing, Neural network, Extended boolean and BM25), applied to forty-one projects. The SPLTrac suite automates the research operation, taking the results of the information retrieval methods as input to a five-step quantitative data analysis procedure based on parametric and non-parametric statistical techniques. Retrieval quality is measured by four dependent variables: precision, recall, F-measure and execution time.
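As a rough illustration of how one of these methods and three of the quality measures fit together, the sketch below ranks source files against a feature query with a common Okapi BM25 formulation and then computes precision, recall and F-measure against a set of known trace links. It is not taken from SPLTrac: the file names, tokens, the relevant set, the `k1` and `b` defaults and the "score above zero" retrieval rule are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def precision_recall_f1(retrieved, relevant):
    """Compare retrieved trace links with the known relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical toy corpus: file name -> tokens extracted from the source code.
files = {
    "Logger.java": ["logger", "log", "write", "file"],
    "LogConfig.java": ["log", "level", "config"],
    "Parser.java": ["parse", "token", "stream"],
    "FileUtil.java": ["file", "read", "write", "path"],
}
feature_query = ["log", "file"]                 # hypothetical terms for a "Logging" feature
relevant = {"Logger.java", "LogConfig.java"}    # hypothetical ground-truth trace links

scores = dict(zip(files, bm25_scores(feature_query, list(files.values()))))
# Assumed retrieval rule: any file with a positive score is a candidate link.
retrieved = {name for name, score in scores.items() if score > 0.0}
precision, recall, f1 = precision_recall_f1(retrieved, relevant)

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
print(f"precision={precision:.2f} recall={recall:.2f} F-measure={f1:.2f}")
```

Retrieving every file with a positive score is a deliberately permissive cut-off: it tends to favour recall at the cost of precision, and a stricter threshold or a top-k cut would shift that balance.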

Results

Execution time is homogeneous across the methods, but the other metrics differ. Extended boolean achieves the best precision and F-measure, while BM25 achieves the highest recall.

Conclusions

This evidence indicates that no method dominates in retrieval quality; each needs further improvement to perform well on all quality measures.

Notes

Antenna variability implementation technology: http://antenna.sourceforge.net/
SPL2Go repository: http://spl2go.cs.ovgu.de/
Freecode index: http://freecode.com/
Relevant traces extractor module: https://github.com/tassiovale/spltrac/tree/1.3.0/features_extraction
Python: https://www.python.org/
SPLTrac repository: https://github.com/tassiovale/spltrac
R project for statistical computing: https://www.r-project.org/
SPL metrics results: https://goo.gl/CgUp9G
SPL hypothesis testing results: https://goo.gl/a3kY1C
SPL correlation results: https://goo.gl/U4SMcU
Regression analysis results: https://goo.gl/W3qoYS
Preprocessor metrics results: https://goo.gl/rtkbWB
Preprocessor hypothesis testing results: https://goo.gl/LNACyd
Preprocessor correlation results: https://goo.gl/3aJxX2
Comparison results: https://goo.gl/LFknPx
Effect size estimation results: https://goo.gl/n3Jr7N
NLTK: https://www.nltk.org/

Author information

Authors and Affiliations

Center of Exact Sciences and Technology, Federal University of Recôncavo da Bahia, Rua Rui Barbosa, 710, Centro Cruz das Almas, Salvador, BA, Brazil
Tassio Vale
Computer Science Department, Federal University of Bahia, Av. Adhemar de Barros, S/N, Ondina, Salvador, BA, Brazil
Eduardo Santana de Almeida

Corresponding author

Correspondence to Tassio Vale.

Additional information

Communicated by: Paul Grünbacher

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Statistical Results

A.1 SPL2Go Dataset Results

Table 8 SPL context - precision results (%)
