An approach for bug localization in models using two levels: model and metamodel

Arcega, Lorena; Font, Jaime; Haugen, Øystein; Cetina, Carlos

doi:10.1007/s10270-019-00727-y

Regular Paper
Published: 14 March 2019

Volume 18, pages 3551–3576, (2019)
Software and Systems Modeling

Lorena Arcega^1,2,
Jaime Font^1,2,
Øystein Haugen^3 &
Carlos Cetina^1

Abstract

Bug localization is a common task in software engineering, especially when maintaining and evolving software products. This paper presents an approach for bug localization in models (BLiM2) that, in contrast to existing source code approaches, takes advantage of the domain information found in the model and the metamodel. BLiM2 applies two ideas from source code bug localization: textual similarity to the bug description and the Defect Localization Principle. We evaluated our approach in BSH, a real-world industrial case study in the induction hob domain, and measured the results in terms of recall, precision, the F-measure (which combines both), and the Matthews correlation coefficient. Our study shows that the BLiM2 variant that combines information from the model and the metamodel for the textual similarity, and that differentiates between the timespan of the model and that of the metamodel, provides the best results in this work. We also performed a statistical analysis to provide evidence of the significance of the results. The values obtained show significant differences between the performance of the best BLiM2 approach and that of the approach used by our industrial partner. Finally, the effect size statistics reveal that, in the worst case, the best BLiM2 approach obtains better results 78% of the time.
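The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that a localization result is a set of model elements reported as buggy and that an oracle set of truly buggy elements is available; the element names, the function names, and the size of the model are hypothetical and do not come from the paper.

```python
import math


def confusion_counts(predicted, actual, universe):
    """Count true/false positives and negatives over a set of model elements."""
    tp = len(predicted & actual)             # reported and really buggy
    fp = len(predicted - actual)             # reported but not buggy
    fn = len(actual - predicted)             # buggy but missed
    tn = len(universe - predicted - actual)  # correctly left out
    return tp, fp, tn, fn


def evaluation_metrics(tp, fp, tn, fn):
    """Return precision, recall, F-measure and MCC for the given counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Matthews correlation coefficient: also uses true negatives, which
    # makes it informative when buggy elements are rare in the model.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc}


# Hypothetical example: a model with ten elements, three of them buggy.
universe = {f"e{i}" for i in range(1, 11)}
actual = {"e2", "e3", "e4"}
predicted = {"e1", "e2", "e3"}

counts = confusion_counts(predicted, actual, universe)
metrics = evaluation_metrics(*counts)
print(counts, metrics)
```

In this hypothetical example two of the three reported elements are correct, so precision, recall and the F-measure are all 2/3, while the MCC is 11/21 because it also accounts for the six elements correctly left out.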

Notes

Although we performed the complete statistical significance analysis, here we show only the combinations that involve the algorithm that obtained the best results. Table 5 (at the end of the paper) shows all sixty-six combinations.
Although we performed the complete effect size analysis, here we show only the combinations that involve the algorithm that obtained the best results. Table 6 (at the end of the paper) shows all sixty-six combinations.
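As an illustration of a pairwise effect size analysis, the sketch below assumes the Vargha-Delaney A12 statistic, a common choice for comparing search-based algorithms; this page does not name the statistic that was used. A12 estimates the probability that a run of one approach yields a better value than a run of another, with ties counted as half. The approach names and the scores are hypothetical. Pairing twelve approaches gives sixty-six combinations, which matches the count in the notes, although the number of approaches is not stated here.

```python
from itertools import combinations


def a12(xs, ys):
    """Vargha-Delaney A12: P(x > y) + 0.5 * P(x == y) for x in xs, y in ys."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


def pairwise_a12(results):
    """Compute A12 for every unordered pair of approaches in `results`."""
    return {(a, b): a12(results[a], results[b])
            for a, b in combinations(sorted(results), 2)}


# Hypothetical scores (for example, F-measure per run) of two approaches.
best = [0.8, 0.7, 0.9]
baseline = [0.6, 0.7, 0.5]
effect = a12(best, baseline)

# Twelve hypothetical approaches produce 12 * 11 / 2 = 66 pairs.
results = {f"approach_{i:02d}": [0.1 * i, 0.1 * i + 0.05] for i in range(12)}
table = pairwise_a12(results)
print(effect, len(table))
```

An A12 value of 0.5 means that neither approach tends to outperform the other; under this reading, a value of 0.78 would mean that the first approach wins in 78% of the pairwise comparisons.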

Acknowledgements

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R). We also thank ITEA3 15010 REVaMP2 Project.

Author information

Authors and Affiliations

Escuela de Arquitectura y Tecnologia, Universidad San Jorge, Zaragoza, Spain
Lorena Arcega, Jaime Font & Carlos Cetina
Department of Informatics, University of Oslo, Oslo, Norway
Lorena Arcega & Jaime Font
Faculty of Computer Science, Østfold University College, Halden, Norway
Øystein Haugen

Corresponding author

Correspondence to Lorena Arcega.

Additional information

Communicated by Prof. Lionel Briand.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Arcega, L., Font, J., Haugen, Ø. et al. An approach for bug localization in models using two levels: model and metamodel. Softw Syst Model 18, 3551–3576 (2019). https://doi.org/10.1007/s10270-019-00727-y

Received: 16 March 2018
Revised: 24 January 2019
Accepted: 04 March 2019
Published: 14 March 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10270-019-00727-y
