Ontological Evolutionary Encoding to Bridge Machine Learning and Conceptual Models: Approach and Industrial Evaluation

Marcén, Ana C.; Pérez, Francisca; Cetina, Carlos

doi:10.1007/978-3-319-69904-2_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10650)

Included in the following conference series: International Conference on Conceptual Modeling

Abstract

In this work, we propose an evolutionary ontological encoding approach that enables Machine Learning techniques to be used to perform Software Engineering tasks on models. The approach relies on a domain ontology to encode a model and on an Evolutionary Algorithm to optimize the encoding. The encoded model returned by the approach can then be used by Machine Learning techniques to perform Software Engineering tasks such as concept location, traceability link retrieval, reuse, and impact analysis. We have evaluated the approach in an industrial case study that recovers the traceability links between requirements and models through a Machine Learning technique (RankBoost). Our results in terms of recall, precision, and the combination of both (F-measure) show that our approach outperforms the baseline (Latent Semantic Indexing). We also performed a statistical analysis to assess the magnitude of the improvement.
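To make the evaluation measures named above concrete, the following is a minimal sketch of how recall, precision, and F-measure are commonly computed for a traceability link retrieval result. It is not the authors' implementation: the function name `precision_recall_f` and the model element names are hypothetical and chosen only for illustration, and the F-measure is assumed to be the balanced harmonic mean (F1).

```python
from typing import Set, Tuple


def precision_recall_f(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one retrieval result.

    retrieved: model elements proposed by the technique for a requirement.
    relevant:  model elements in the ground-truth traceability link.
    """
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical element names, for illustration only.
retrieved = {"door", "pantograph", "brake", "hvac"}
relevant = {"door", "brake", "converter"}
p, r, f = precision_recall_f(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Here two of the four retrieved elements are correct (precision 0.5) and two of the three relevant elements are found (recall of about 0.67), which gives an F-measure of 4/7.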

Notes

1. www.caf.net/en.

Acknowledgments

This work has been developed with the financial support of the Spanish Ministry of Economy and Competitiveness under the project TIN2016-80811-P and co-financed with ERDF. We also thank both the ITEA3 15010 REVaMP² Project and the MINECO TIN2015-64397-R VARIAMOS Project.

Author information

Authors and Affiliations

Centro de Investigación en Métodos de Producción de Software, Universitat Politècnica de València, Camino de Vera, s/n, 46022, Valencia, Spain
Ana C. Marcén
SVIT Research Group, Universidad San Jorge, Autovía A-23 Zaragoza-Huesca Km. 299, 50830, Zaragoza, Spain
Ana C. Marcén, Francisca Pérez & Carlos Cetina

Corresponding author

Correspondence to Ana C. Marcén.

Editor information

Editors and Affiliations

University of Klagenfurt, Klagenfurt, Austria
Heinrich C. Mayr
Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Giancarlo Guizzardi
Victoria University of Wellington, Wellington, New Zealand
Hui Ma
Valencia University of Technology, Valencia, Spain
Oscar Pastor

Cite this paper

Marcén, A.C., Pérez, F., Cetina, C. (2017). Ontological Evolutionary Encoding to Bridge Machine Learning and Conceptual Models: Approach and Industrial Evaluation. In: Mayr, H., Guizzardi, G., Ma, H., Pastor, O. (eds) Conceptual Modeling. ER 2017. Lecture Notes in Computer Science, vol 10650. Springer, Cham. https://doi.org/10.1007/978-3-319-69904-2_37

DOI: https://doi.org/10.1007/978-3-319-69904-2_37
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69903-5
Online ISBN: 978-3-319-69904-2
eBook Packages: Computer Science, Computer Science (R0)
