Skip to main content
Log in

Collaborative feature location in models through automatic query expansion

  • Published:
Automated Software Engineering Aims and scope Submit manuscript

Abstract

Collaboration with other people is a major theme in the information-seeking process. However, most existing works that address the location of features during the maintenance or evolution of software do not support collaboration, or they are focused on code as the main software artifact. Hence, collaborative feature location in models has not enjoyed much attention to date. In this work, we address this concern by proposing an approach, CoFLiM, that enables the collaboration of several domain experts in order to locate the model fragment of a target feature. CoFLiM uses the feature descriptions of the domain experts and their self-rated confidence level to automatically reformulate the relevant feature descriptions in a single query. This query guides the evolutionary algorithm of our approach that finds the model fragment of the feature being located. We evaluate CoFLiM in a real-world case study from our industrial partner. We analyze the impact of CoFLiM in terms of recall, precision, and the F-measure. Moreover, we compare the reformulation of CoFLiM with four baselines. We also perform a statistical analysis to show that the impact of the results is significant. Our results show that collaboration pays off in the location of features in models. The results also show that the self-rated confidence level can be used to locate features in models. Finally, the results show that there are no significant improvements when more than three domain experts are involved, which is relevant in those industrial contexts where the availability of domain experts is scarce.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Notes

  1. www.caf.net/en.

References

  • Ambreen, T., Ikram, N., Usman, M., Niazi, M.: Empirical research in requirements engineering: trends and opportunities. Requir. Eng. 2, 1–33 (2016). https://doi.org/10.1007/s00766-016-0258-2

    Google Scholar 

  • Apache opennlp: Toolkit for the processing of natural language text (2016). https://opennlp.apache.org/

  • Arcuri, A., Briand, L.: A hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verif. Reliab. 24(3), 219–250 (2014). https://doi.org/10.1002/stvr.1486

    Article  Google Scholar 

  • Arcuri, A., Fraser, G.: Parameter tuning or default values? An empirical investigation in search-based software engineering. Empir. Softw. Eng. 18(3), 594–623 (2013)

    Article  Google Scholar 

  • Arens, Y., Knoblock, C.A., Shen, W.-M.: Query reformulation for dynamic information integration. J. Intell. Inf. Syst. 6(2), 99–130 (1996). https://doi.org/10.1007/BF00122124

    Article  Google Scholar 

  • Asuncion, H.U., Asuncion, A.U., Taylor, R.N.: Software traceability with topic modeling. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, vol. 1, pp. 95–104. ACM (2010)

  • Bendersky, M., Croft, W.B.: Discovering key concepts in verbose queries. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, pp. 491–498. ACM, New York (2008). ISBN: 978-1-60558-164-4. https://doi.org/10.1145/1390334.1390419

  • Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)

    MATH  Google Scholar 

  • Boyd-Graber, J., Hu, Y., Mimno, D.: Applications of topic models. Found. Trends®in Inf. Retr. 11(2–3), 143–296 (2017). https://doi.org/10.1561/1500000030

    Article  Google Scholar 

  • Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surv. 44(1), 1:1–1:50 (2012). https://doi.org/10.1145/2071389.2071390

    Article  MATH  Google Scholar 

  • Cavalcanti, Y.C., do Carmo Machado, I., Neto, P.A., da Mota S., de Almeida, E.S., de Lemos Meira, S.R.: Combining rule-based and information retrieval techniques to assign software change requests. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pp. 325–330. ACM, New York (2014). ISBN: 978-1-4503-3013-8. https://doi.org/10.1145/2642937.2642964

  • Clements, P.C., Northrop, L.: Software Product Lines: Practices and Patterns. SEI Series in Software Engineering. Addison-Wesley, Boston (2001)

    Google Scholar 

  • Cliff, N.: Dominance statistics: ordinal analyses to answer ordinal questions. Psychol. Bull. 114(3), 494 (1993)

    Article  Google Scholar 

  • Cliff, N.: Ordinal Methods for Behavioral Data Analysis. Psychology Press, London (1996)

    Google Scholar 

  • de Oliveira Barros, M., Dias Neto, A.C.: Threats to validity in search-based software engineering empirical studies. Technical Report 0006/2011 (2011)

  • Dietrich, T., Cleland-Huang, J., Shin, Y.: Learning effective query transformations for enhanced requirements trace retrieval. In: 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE), pp. 586–591 (2013). https://doi.org/10.1109/ASE.2013.6693117

  • Dit, B., Revelle, M., Gethers, M., Poshyvanyk, D.: Feature location in source code: a taxonomy and survey. J. Softw. Evol. Process 25(1), 53–95 (2013)

    Article  Google Scholar 

  • Dumitru, H., Gibiec, M., Hariri, N., Cleland-Huang, J., Mobasher, B., Castro-Herrera, C., Mirakhorli, M.: On-demand feature recommendations derived from mining public product descriptions. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE ’11, pp. 181–190 (2011). ISBN: 978-1-4503-0445-0. https://doi.org/10.1145/1985793.1985819

  • Dyer, D.: The watchmaker framework for evolutionary computation (evolutionary/genetic algorithms for java) (2016). http://watchmaker.uncommons.org/. Accessed 2 Dec 2016

  • Efficient java matrix library (2016). http://ejml.org/. Accessed 2 Dec 2016

  • English (porter2) stemming algorithm (2017). http://snowball.tartarus.org/algorithms/english/stemmer.htm. Accessed 2 Dec 2016

  • Font, J., Arcega, L., Haugen, Ø., Cetina, C.: Building software product lines from conceptualized model patterns. In: Proceedings of the 19th International Conference on Software Product Line (SPLC), pp. 46–55 (2015a). https://doi.org/10.1145/2791060.2791085

  • Font, J., Ballarín, M., Haugen, Ø., Cetina, C.: Automating the variability formalization of a model family by means of common variability language. In: Proceedings of the 19th International Conference on Software Product Line (SPLC), pp. 411–418 (2015b). https://doi.org/10.1145/2791060.2793678

  • Font, J., Arcega, L., Haugen, Ø., Cetina, C.: Feature location in model-based software product lines through a genetic algorithm. In: Proceedings of the 15th International Conference on Software Reuse: Bridging with Social-Awareness (2016a)

  • Font, J., Arcega, L., Haugen, Ø., Cetina, C.: Feature location in models through a genetic algorithm driven by information retrieval techniques. In: Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems (2016b)

  • Font, J., Arcega, L., Haugen, Ø., Cetina, C.: Achieving feature location in families of models through the use of search-based software engineering. IEEE Trans. Evol. Comput. 99, 1 (2017). https://doi.org/10.1109/TEVC.2017.2751100

    Google Scholar 

  • Gay, G., Haiduc, S., Marcus, A., Menzies, T.: On the use of relevance feedback in IR-based concept location. In: ICSM, IEEE Computer Society, pp. 351–360. (2009). ISBN: 978-1-4244-4897-5. https://doi.org/10.1109/TEVC.2017.2751100. Accessed 2 Dec 2016

  • Grissom, R.J., Kim, J.J.: Effect Sizes for Research: A Broad Practical Approach. Earlbaum, Mahwah (2005)

    Google Scholar 

  • Haiduc, S., Bavota, G., Marcus, A., Oliveto, R., De Lucia, A., Menzies, T.: Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp. 842–851. IEEE Press, Piscataway (2013). ISBN: 978-1-4673-3076-3

  • Harman, M.: Why the virtual nature of software makes it ideal for search based optimization. In: Proceedings of the 13th International Conference on Fundamental Approaches to Software Engineering, FASE’10, pp. 1–12. Springer, Berlin (2010). ISBN: 3-642-12028-8, 978-3-642-12028-2

  • Harman, M., Jia, Y., Krinke, J., Langdon, W.B., Petke, J., Zhang, Y.: Search based software engineering for software product line engineering: a survey and directions for future work. In: Proceedings of the 18th International Software Product Line Conference—volume 1, SPLC ’14, pp. 5–18. ACM, New York (2014). ISBN: 978-1-4503-2740-4. https://doi.org/10.1145/2648511.2648513

  • Haugen, Ø., Møller-Pedersen, B., Oldevik, J., Olsen, Gø.K., Svendsen, A.: Adding standardized variability to domain specific languages. In: Software Product Line Conference, 2008. SPLC ’08. 12th International, pp. 139–148 (2008). https://doi.org/10.1109/SPLC.2008.25

  • Hill, E., Pollock, L., Vijay-Shanker, K.: Automatically capturing source code context of nl-queries for software maintenance and reuse. In: Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pp. 232–242. IEEE Computer Society, Washington (2009). ISBN: 978-1-4244-3453-4. https://doi.org/10.1109/ICSE.2009.5070524

  • Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval (1999)

  • Holthusen, S., Wille, D., Legat, C., Beddig, S., Schaefer, I., Vogel-Heuser, B.: Family model mining for function block diagrams in automation software. In: Proceedings of the 18th International Software Product Line Conference, vol. 2, pp. 36–43 (2014). ISBN: 978-1-4503-2739-8. https://doi.org/10.1145/2647908.2655965

  • Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 216–223 (2003)

  • Kimmig, M., Monperrus, M., Mezini, M.: Querying source code with natural language. In: Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE ’11, pp. 376–379 (2011). ISBN: 978-1-4577-1638-6. https://doi.org/10.1109/ASE.2011.6100076

  • Kotelyanskii, A., Kapfhammer, G.M.: Parameter tuning for search-based test-data generation revisited: support for previous results. In: 2014 14th International Conference on Quality Software, pp. 79–84 (2014). https://doi.org/10.1109/QSIC.2014.43

  • Kumaran, G., Allan, J.: Effective and efficient user interaction for long queries. In: SIGIR ’08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 11–18. ACM, New York (2008). ISBN: 978-1-60558-164-4. https://doi.org/10.1145/1390334.1390339. http://portal.acm.org/citation.cfm?id=1390339

  • Kumaran, G., Carvalho, V.R.: Reducing long queries using query quality predictors. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, pp. 564–571. ACM, New York (2009). ISBN: 978-1-60558-483-6. https://doi.org/10.1145/1571941.1572038

  • Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Process. 25, 259–284 (1998)

    Article  Google Scholar 

  • Lapeña, R., Pérez, F., Cetina, C.: On the influence of models-to-natural-language transformation in traceability link recovery among requirements and conceptual models. In: ER FORUM 2017 (2017)

  • Lopez-Herrejon, R.E., Ferrer, J., Chicano, F., Linsbauer, L., Egyed, A., Alba, E.: A hitchhiker’s guide to search-based software engineering for software product lines. CoRR, abs/1406.2823 (2014). http://arxiv.org/abs/1406.2823

  • Lopez-Herrejon, R.E., Linsbauer, L., Galindo, J.A., Parejo, J.A., Benavides, D., Segura, S., Egyed, A.: An assessment of search-based techniques for reverse engineering feature models. J. Syst. Softw. 103(C), 353–369 (2015). ISSN 0164-1212

    Article  Google Scholar 

  • Lu, M., Sun, X., Wang, S., Lo, D., Duan, Y.: Query expansion via wordnet for effective code search. In: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 545–549 (2015). https://doi.org/10.1109/SANER.2015.7081874

  • Lv, F., Zhang, H., Lou, J.-G., Wang, S., Zhang, D., Zhao, J.: Codehow: effective code search based on API understanding and extended Boolean model. In: Automated Software Engineering (ASE2015) (2015)

  • Marcus, A., Sergeyev, A., Rajlich, V., Maletic, J.I.: An information retrieval approach to concept location in source code. In: Proceedings of the 11th Working Conference on Reverse Engineering, WCRE ’04, pp. 214–223. Washington (2004). ISBN: 0-7695-2243-2. http://dl.acm.org/citation.cfm?id=1038267.1039053

  • Martinez, J., Ziadi, T., Bissyandé, T.F., Klein, J., Le Traon, Y.: Bottom-up adoption of software product lines: a generic and extensible approach. In: Proceedings of the 19th International Conference on Software Product Line (SPLC), pp. 101–110 (2015a). https://doi.org/10.1145/2791060.2791086

  • Martinez, J., Ziadi, T., Bissyandé, T.F., Klein, J., Le Traon, Y.: Automating the extraction of model-based software product lines from model variants (t). In: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 396–406 (2015b). https://doi.org/10.1109/ASE.2015.44

  • Morris, M.R.: Interfaces for collaborative exploratory web search: motivations and directions for multi-user designs. In: CHI 2007 Workshop on Exploratory Search and HCI (2007)

  • Pérez, F., Marcén, A.C., Lapeña, R., Cetina, C.: Introducing collaboration for locating features in models: approach and industrial evaluation. In: Proceedings of the 25th International Conference on Cooperative Information Systems, CoopIS, pp. 114–131 (2017). https://doi.org/10.1007/978-3-319-69462-7_9

  • Pérez, F., Font, J., Arcega, L., Cetina, C.: Automatic query reformulations for feature location in a model-based family of software products. Data Knowl. Eng. (2018). ISSN: 0169-023X. https://doi.org/10.1016/j.datak.2018.06.001

  • Rivas, A.R., Iglesias, E.L., Borrajo, L.: Study of query expansion techniques and their application in the biomedical information retrieval. Sci. World J. 2014 (2014). https://doi.org/10.1155/2014/132158

  • Romano, J., Kromrey, J.D., Coraggio, J., Skowronek, J.: Appropriate statistics for ordinal level data: Should we really be using t-test and cohensd for evaluating group differences on the NSSE and other surveys. In: Annual Meeting of the Florida Association of Institutional Research, pp. 1–33 (2006)

  • Rubin, J., Chechik, M.: A survey of feature location techniques. In: Reinhartz-Berger, I., Sturm, A., Clark, T., Cohen, S., Bettin, J. (eds.) Domain Engineering, pp. 29–58. Springer, Berlin (2013)

    Chapter  Google Scholar 

  • Salton, G.: The SMART Retrieval System—Experiments in Automatic Document Processing. Prentice-Hall, Inc., Upper Saddle River (1971)

    Google Scholar 

  • Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York (1986)

    MATH  Google Scholar 

  • Sayyad, A.S., Ingram, J., Menzies, T., Ammar, H.: Scalable product line configuration: a straw to break the camel’s back. In: 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE), pp. 465–474 (2013). https://doi.org/10.1109/ASE.2013.6693104

  • Shah, C.: Collaborative information seeking: a literature review. In: Woodsworth, A. (ed.) Advances in Librarianship, vol. 32, pp. 3–33. Emerald Group Publishing Limited (2010). ISBN: 978-1-84950-978-7. https://doi.org/10.1108/S0065-2830(2010)0000032004

  • Sisman, B., Kak, A.C.: Assisting code search with automatic query reformulation for bug localization. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, San Francisco, CA, USA, May 18-19, 2013, pp. 309–318 (2013). https://doi.org/10.1109/MSR.2013.6624044

  • Tian, Y., Lo, D., Lawall, J.: Automated construction of a software-specific word similarity database. In: IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), pp. 44–53 (2014). https://doi.org/10.1109/CSMR-WCRE.2014.6747213

  • Vargha, A., Delaney, H.D.: A critique and improvement of the CL common language effect size statistics of Mcgraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000). https://doi.org/10.3102/10769986025002101

    Google Scholar 

  • Wang, S., Lo, D., Jiang, L.: Active code search: incorporating user feedback to improve code search relevance. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pp. 677–682 (2014). ISBN: 978-1-4503-3013-8. https://doi.org/10.1145/2642937.2642947

  • Wille, D., Holthusen, S., Schulze, S., Schaefer, I.: Interface variability in family model mining. In: Proceedings of the 17th International Software Product Line Conference: Co-located Workshops, pp. 44–51 (2013). ISBN: 978-1-4503-2325-3. https://doi.org/10.1145/2499777.2500708

  • Yang, J., Tan, L.: Inferring semantically related words from software context. In: Mining Software Repositories (MSR), pp. 161–170 (2012). https://doi.org/10.1109/MSR.2012.6224276

  • Zeng, Q.T., Redd, D., Rindflesch, T., Nebeker, J.: Synonym, topic model and predicate-based query expansion for retrieving clinical documents. In: AMIA Annual Symposium Proceedings, vol. 2012, p. 1050. American Medical Informatics Association (2012)

  • Zhang, X., Haugen, Ø., Møller-Pedersen, B.: Augmenting product lines. In: Software Engineering Conference (APSEC), 2012 19th Asia-Pacific, vol. 1, pp. 766–771 (2012). https://doi.org/10.1109/APSEC.2012.76

  • Zhang, X., Haugen, Ø., Moller-Pedersen, B.: Model comparison to synthesize a model-driven software product line. In: Proceedings of the 2011 15th International Software Product Line Conference (SPLC), pp. 90–99 (2011). ISBN: 978-0-7695-4487-8. https://doi.org/10.1109/SPLC.2011.24

  • Zou, Y., Ye, T., Lu, Y., Mylopoulos, J., Zhang, L.: Learning to rank for question-oriented software text retrieval. In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015), pp. 1–11 (2015). ISBN: 978-1-5090-0025-8. URL http://dblp.uni-trier.de/db/conf/kbse/ase2015.html#ZouYLM015

Download references

Acknowledgements

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the Project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R). We also thank ITEA3 15010 REVaMP2 Project.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Francisca Pérez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Pérez, F., Font, J., Arcega, L. et al. Collaborative feature location in models through automatic query expansion. Autom Softw Eng 26, 161–202 (2019). https://doi.org/10.1007/s10515-019-00251-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10515-019-00251-9

Keywords

Navigation