An approach for bug localization in models using two levels: model and metamodel

  • Regular Paper
Software and Systems Modeling

Abstract

Bug localization is a common task in software engineering, especially when maintaining and evolving software products. This paper introduces a bug localization approach that, in contrast to existing source code approaches, takes advantage of domain information found in the model and the metamodel. Our approach for bug localization in models (BLiM2) applies ideas from source code bug localization (textual similarity to the bug description and the Defect Localization Principle) and enriches them with domain information from the model and the metamodel. We evaluated our approach in BSH, a real-world industrial case study in the induction hob domain, and measured the results in terms of recall, precision, the F-measure (which combines both), and the Matthews correlation coefficient. Our study shows that the BLiM2 variant that combines information from the model and the metamodel for the textual similarity, and that differentiates between the timespans of the model and the metamodel, provides the best results in this work. We also performed a statistical analysis to provide evidence of the significance of the results. The values obtained show that there are significant differences between the performance of the best BLiM2 approach and that of the approach used by our industrial partner. Finally, the effect size statistics reveal that the best BLiM2 approach obtains better results 78% of the time in the worst case.
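
As an illustration of the four measures named in the abstract, the following minimal Python sketch computes recall, precision, the F-measure, and the Matthews correlation coefficient from a confusion matrix. The function name `confusion_metrics` and the counts in the usage lines are hypothetical and are not taken from the case study; the sketch also assumes the balanced F-measure (F1), which the abstract does not specify.

```python
from math import sqrt


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F-measure, and MCC from confusion-matrix counts.

    tp/fp/fn/tn are counts of model elements that were correctly or incorrectly
    reported as relevant to a bug (hypothetical interpretation for this sketch).
    """
    # Precision: fraction of reported elements that are truly relevant.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truly relevant elements that were reported.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (assumed here to be F1): harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Matthews correlation coefficient: ranges from -1 to 1 and, unlike the
    # other three, also takes the true negatives into account.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the function.
example = confusion_metrics(tp=8, fp=2, fn=2, tn=88)
print(example)
```

Because the MCC uses all four cells of the confusion matrix, it stays informative when the relevant elements are a small minority of the model, which is the usual situation in bug localization.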

Notes

  1. Although we performed the complete statistical significance analysis, we show here only the combinations that involve the algorithm with the best results. Table 5 (at the end of the paper) lists all sixty-six combinations.

  2. Although we performed the complete effect size analysis, we show here only the combinations that involve the algorithm with the best results. Table 6 (at the end of the paper) lists all sixty-six combinations.
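
The text here does not name the effect size statistic. As a hedged illustration, the sketch below assumes the Vargha-Delaney Â12 measure, a common choice when comparing algorithms in software engineering experiments; it estimates the probability that a result from one approach is better than a result from another. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def vargha_delaney_a12(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Estimate P(x > y) + 0.5 * P(x == y) for x drawn from xs and y from ys.

    A value of 0.5 means no difference; 0.78 would mean that xs yields the
    higher value in about 78% of pairwise comparisons.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Count pairwise wins and ties across every (x, y) combination.
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


# Hypothetical F-measure samples for two approaches.
approach_a = [0.71, 0.68, 0.74, 0.70]
approach_b = [0.62, 0.69, 0.60, 0.65]
print(vargha_delaney_a12(approach_a, approach_b))
```

With sixty-six combinations, such a function would be applied once per pair of approaches, which is consistent with the full table being deferred to the end of the paper.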

Acknowledgements

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R). We also thank ITEA3 15010 REVaMP2 Project.

Author information

Corresponding author

Correspondence to Lorena Arcega.

Additional information

Communicated by Prof. Lionel Briand.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Arcega, L., Font, J., Haugen, Ø. et al. An approach for bug localization in models using two levels: model and metamodel. Softw Syst Model 18, 3551–3576 (2019). https://doi.org/10.1007/s10270-019-00727-y

  • DOI: https://doi.org/10.1007/s10270-019-00727-y