Topic modeling and intuitionistic fuzzy set-based approach for efficient software bug triaging

  • Regular Paper
Knowledge and Information Systems

Abstract

Modern software development involves multiple developers working remotely in a distributed manner around the world. Software bugs are continuously generated for multiple reasons across various modules. A single software bug can affect multiple modules, and multiple developers can be associated with it. Furthermore, many software bug reports are unlabeled, vague, and noisy. The triager faces significant challenges in identifying the multiple causes of software bugs and in finding expert developers to fix them. In this paper, the fuzzy set is extended to Intuitionistic Fuzzy Sets (IFS), and a novel bug triaging approach based on Intuitionistic Fuzzy Similarity (IFSim) measures is presented to overcome these problems. The topic model is used to discover multiple relationships between developers and software bugs. IFS is used to separate developers based on their degree of membership and non-membership in a particular software category, with a degree of hesitation for some developers. For a new bug, 15 different IFSim measures are investigated to compute its similarity with the existing software bugs. Finally, a fuzzy \(\alpha \)-cut is applied to find expert developers to repair it. The best results are obtained with 15 topics and 12 taxonomic terms per topic. Among all the IFSim measures, the similarity measures proposed by Ye outperform the others. Experiments are carried out on available benchmark data sets, and the results are compared to traditional machine learning algorithms and the fuzzy logic-based Bugzie model.

Notes

  1. https://bugs.eclipse.org/bugs/.

  2. https://bugs.eclipse.org/bugs/.

  3. https://bugs.eclipse.org/bugs/.

  4. https://bugzilla.mozilla.org/describecomponents.cgi.

  5. https://netbeans.org/bugzilla/.

  6. https://www.r-project.org.

  7. https://bugs.eclipse.org/bugs/.

References

  1. Alazzam I, Aleroud A, Al Latifah Z, Karabatis G (2020) Automatic bug triage in software systems using graph neighborhood relations for feature augmentation. IEEE Trans Comput Soc Syst 7(5):1288–1303

  2. Alkhazi B, DiStasi A, Aljedaani W, Alrubaye H, Ye X, Mkaouer MW (2020) Learning to rank developers for bug report assignment. Appl Soft Comput 95:106667

  3. Almhana R, Kessentini M (2021) Considering dependencies between bug reports to improve bugs triage. Autom Softw Eng 28(1):1–26

  4. Almhana R, Kessentini M, Mkaouer W (2021) Method-level bug localization using hybrid multi-objective search. Inf Softw Technol 131:106474

  5. Aung TWW, Wan Y, Huo H, Sui Y (2022) Multi-triage: a multi-task learning framework for bug triage. J Syst Softw 184:111133

  6. Bouchet A, Montes S, Ballarin V, Diaz I (2020) Intuitionistic fuzzy set and fuzzy mathematical morphology applied to color leukocytes segmentation. SIViP 14(3):557–564

  7. Chen SM (1995) Measures of similarity between vague sets. Fuzzy Sets Syst 74(2):217–223

  8. Chen SM, Cheng SH, Lan TC (2016) A novel similarity measure between intuitionistic fuzzy sets based on the centroid points of transformed fuzzy numbers with applications to pattern recognition. Inf Sci 343:15–40

  9. Chen TH, Thomas SW, Hassan AE (2016) A survey on the use of topic models when mining software repositories. Empir Softw Eng 21(5):1843–1919

  10. Cheng Y, Li Y, Yang J (2021) Multi-attribute decision-making method based on a novel distance measure of linguistic intuitionistic fuzzy sets. J Intell Fuzzy Syst 40(1):1147–1160

  11. Corley CS, Damevski K, Kraft NA (2018) Changeset-based topic modeling of software repositories. IEEE Trans Softw Eng 46(10):1068–1080

  12. Falessi D, Huang J, Narayana L, Thai JF, Turhan B (2020) On the need of preserving order of data when validating within-project defect classifiers. Empir Softw Eng 25(6):4805–4830

  13. Fan L, Zhangyan X (2001) Measures of similarity between vague sets. J Softw 12(6):922–927

  14. Garg H, Kumar K (2018) Distance measures for connection number sets based on set pair analysis and its applications to decision-making process. Appl Intell 48(10):3346–3359

  15. Ge X, Zheng S, Wang J, Li H (2020) High-dimensional hybrid data reduction for effective bug triage. Math Probl Eng 2020:1–20

  16. Goguen J (1973) LA Zadeh. Fuzzy sets. Information and Control, vol. 8 (1965), pp. 338–353. LA Zadeh. Similarity relations and fuzzy orderings. Information Sciences, vol. 3 (1971), pp. 177–200. J Symb Logic 38(4):656–657

  17. Guo S, Chen R, Wei M, Li H, Liu Y (2018) Ensemble data reduction techniques and multi-RSMOTE via fuzzy integral for bug report classification. IEEE Access 6:45934–45950

  18. Guo S, Zhang X, Yang X, Chen R, Guo C, Li H, Li T (2020) Developer activity motivated bug triaging: via convolutional neural network. Neural Process Lett 51(3):2589–2606

  19. Gupta C, Freire MM (2021) A decentralized blockchain oriented framework for automated bug assignment. Inf Softw Technol 134:106540

  20. Hamdy A, Ezzat G (2020) Deep mining of open source software bug repositories. Int J Comput Appl 44(7):614–622

  21. Herbold S, Trautsch A, Trautsch F (2020) On the feasibility of automated prediction of bug and non-bug issues. Empir Softw Eng 25(6):5333–5369

  22. Hong DH, Kim C (1999) A note on similarity measures between vague sets and between elements. Inf Sci 115(1–4):83–96

  23. Hung WL, Yang MS (2008) On similarity measures between intuitionistic fuzzy sets. Int J Intell Syst 23(3):364–383

  24. Jahanshahi H, Chhabra K, Cevik M, Baar A (2021) DABT: a dependency-aware bug triaging method. In: Evaluation and assessment in software engineering. ACM, pp 221–230

  25. Jiang Q, Jin X, Lee SJ, Yao S (2019) A new similarity/distance measure between intuitionistic fuzzy sets based on the transformed isosceles triangles and its applications to pattern recognition. Expert Syst Appl 116:439–453

  26. Kashiwa Y, Ohira M (2020) A release-aware bug triaging method considering developers’ bug-fixing loads. IEICE Trans Inf Syst 103(2):348–362

  27. Kaushal M, Lohani QD (2021) Generalized intuitionistic fuzzy c-means clustering algorithm using an adaptive intuitionistic fuzzification technique. Granul Comput 7:183–195

  28. Krassimir TA, Parvathi R (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20(1):87–96

  29. Lee DG, Seo YS (2020) Improving bug report triage performance using artificial intelligence based document generation model. HCIS 10(1):1–22

  30. Li Y, Olson DL, Qin Z (2007) Similarity measures between intuitionistic fuzzy (vague) sets: a comparative analysis. Pattern Recognit Lett 28(2):278–285

  31. Liu HW (2005) New similarity measures between intuitionistic fuzzy sets and between elements. Math Comput Model 42(1–2):61–70

  32. Liu Q, Huang H, Xuan J, Zhang G, Gao Y, Lu J (2020) A fuzzy word similarity measure for selecting top-k similar words in query expansion. IEEE Trans Fuzzy Syst 29(8):2132–2144

  33. Maheshan M, Harish B (2021) A modified intuitionistic fuzzy clustering approach for sclera segmentation. SN Comput Sci 2(4):1–8

  34. Ngan RT, Cuong BC, Ali M et al (2018) H-max distance measure of intuitionistic fuzzy sets in decision making. Appl Soft Comput 69:393–425

  35. Panda RR, Nagwani NK (2019) Software bug categorization technique based on fuzzy similarity. In: 2019 IEEE 9th international conference on advanced computing (IACC). IEEE, pp 1–6

  36. Panda RR, Nagwani NK (2021) Multi-label software bug categorisation based on fuzzy similarity. Int J Comput Sci Eng 24(3):244–258

  37. Pandolfo G, D’Ambrosio A, Cannavacciuolo L, Siciliano R (2020) Fuzzy logic aggregation of crisp data partitions as learning analytics in triage decisions. Expert Syst Appl 158:113512

  38. Panichella S, Zaugg N (2020) An empirical investigation of relevant changes and automation needs in modern code review. Empir Softw Eng 25(6):4833–4872

  39. Raji-Lawal HY, Akinwale AT, Folorunsho O, Mustapha AO (2020) Decision support system for dementia patients using intuitionistic fuzzy similarity measure. Soft Comput Lett 2:100005

  40. Rodríguez-Pérez G, Robles G, Serebrenik A, Zaidman A, Germán DM, Gonzalez-Barahona JM (2020) How bugs are born: a model to identify how bugs are introduced in software components. Empir Softw Eng 5(2):1294–1340

  41. Soltani M, Hermans F, Bäck T (2020) The significance of bug report elements. Empir Softw Eng 25(6):5255–5294

  42. Song Y, Wang X, Lei L, Xue A (2014) A new similarity measure between intuitionistic fuzzy sets and its application to pattern recognition. In: Abstract and applied analysis, vol 2014. Hindawi

  43. Su Y, Xing Z, Peng X, Xia X, Wang C, Xu X, Zhu L (2021) Reducing bug triaging confusion by learning from mistakes with a bug tossing knowledge graph. In: 2021 36th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 191–202

  44. Sugeno M, Terano T (1977) A model of learning based on fuzzy information. Kybernetes 6(3):157–166

  45. Tamrawi A, Nguyen TT, Al-Kofahi J, Nguyen TN (2011) Fuzzy set-based automatic bug triaging (NIER track). In: Proceedings of the 33rd international conference on software engineering, pp 884–887

  46. Tamrawi A, Nguyen TT, Al-Kofahi JM, Nguyen TN (2011) Fuzzy set and cache-based approach for bug triaging. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pp 365–375

  47. Thao NX (2020) Similarity measures of picture fuzzy sets based on entropy and their application in MCDM. Pattern Anal Appl 23(3):1203–1213

  48. Tran HM, Le ST, Van Nguyen S, Ho PT (2020) An analysis of software bug reports using machine learning techniques. SN Comput Sci 1(1):4

  49. Wang Y, Yao Y, Tong H, Huo X, Li M, Xu F, Lu J (2020) Enhancing supervised bug localization with metadata and stack-trace. Knowl Inf Syst 62(6):2461–2484

  50. Wu X, Zheng W, Pu M, Chen J, Mu D (2020) Invalid bug reports complicate the software aging situation. Softw Qual J 28(1):195–220

  51. Xi SQ, Yao Y, Xiao XS, Xu F, Lv J (2019) Bug triaging based on tossing sequence modeling. J Comput Sci Technol 34(5):942–956

  52. Yager RR (1979) On the measure of fuzziness and negation part I: membership in the unit interval. Int J Gen Syst 5:221–229

  53. Yang K, Cai Y, Leung HF, Lau RY, Li Q (2019) ITWF: a framework to apply term weighting schemes in topic model. Neurocomputing 350:248–260

  54. Ye J (2011) Cosine similarity measures for intuitionistic fuzzy sets and their applications. Math Comput Model 53(1–2):91–97

  55. Zaidi SFA, Lee CG (2021) Learning graph representation of bug reports to triage bugs using graph convolution network. In: 2021 international conference on information networking (ICOIN). IEEE, pp 504–507

Author information

Corresponding authors

Correspondence to Rama Ranjan Panda or Naresh Kumar Nagwani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Illustrative example

A sample of the Eclipse data set (see Note 7) is used to illustrate the proposed approach. Initially, a fixed number of \(S_\mathrm{b}\) in a given range is considered for the experiment. Here, all developers with bug counts between 45 and 70 are selected, as shown in Table 8.

Table 8 A sample bug distribution between developers for Eclipse data set

In this article, the actual names of the developers in the bug data set are withheld for privacy. The sample data is divided into training and testing data: the training data consists of 251 software bugs, and the testing data consists of 59 software bugs. The training data is first pre-processed, and the pre-processed software bugs are transformed into the \(D_\mathrm{t}\) matrix, which records the occurrence of terms associated with each developer. The \(D_\mathrm{t}\) matrix has 5 observations with 709 terms.
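
One way to picture the \(D_\mathrm{t}\) matrix is as a table of per-developer term counts built from the pre-processed training reports. The sketch below is only an illustration under stated assumptions: the developer labels, the toy reports, and the helper name `build_term_matrix` are hypothetical, and pre-processing is reduced to lowercasing and whitespace splitting rather than the pipeline used in the article.

```python
from collections import Counter, defaultdict

def build_term_matrix(reports):
    """Aggregate term counts per developer.

    reports: iterable of (developer, text) pairs (hypothetical toy input).
    Returns (sorted vocabulary, {developer: Counter of term counts}).
    """
    counts = defaultdict(Counter)
    for developer, text in reports:
        # Simplified pre-processing (an assumption): lowercase, split on whitespace.
        counts[developer].update(text.lower().split())
    vocabulary = sorted({term for c in counts.values() for term in c})
    return vocabulary, dict(counts)

# Toy reports, not taken from the Eclipse sample.
reports = [
    ("D_A", "editor crash on save"),
    ("D_A", "editor freezes on paste"),
    ("D_B", "build fails on save"),
]
vocabulary, matrix = build_term_matrix(reports)

# Dense view: one row per developer, one column per vocabulary term.
rows = {dev: [c[t] for t in vocabulary] for dev, c in matrix.items()}
```

In this toy setting the matrix has one row per developer and one column per distinct term, which mirrors the "observations by terms" shape described above.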

A sample of the developer-associated term matrix is shown in Table 9. In the next step, the number of topics and the number of taxonomic terms per topic are chosen by the user. LDA is used to create 10 topics, and for each topic 5 taxonomic terms are retained for further processing. The taxonomic terms and their associated topics are listed in Table 10. The taxonomic terms of each topic are then mapped onto the \(D_\mathrm{t}\) matrix to compute, for each developer, the total count of that topic's taxonomic terms. For Topic 1, the computed values are shown in Table 11; the total column gives the total count of Topic 1 taxonomic terms for each developer. The totals for the other topics are calculated in the same way and are shown in Table 12, which gives, for each developer, the total count of taxonomic terms under each topic.
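
The mapping step described above amounts to summing, for each developer, the counts of the terms that belong to each topic. A minimal sketch follows, assuming the topics and their taxonomic terms have already been produced by a topic model; the function name `topic_term_totals` and all values are hypothetical and are not taken from Tables 9 to 12.

```python
def topic_term_totals(term_counts, topics):
    """For each developer, sum the counts of each topic's taxonomic terms.

    term_counts: {developer: {term: count}}, a row-wise view of D_t
    topics:      {topic name: list of taxonomic terms}
    """
    return {
        developer: {
            topic: sum(counts.get(term, 0) for term in terms)
            for topic, terms in topics.items()
        }
        for developer, counts in term_counts.items()
    }

# Hypothetical toy values for illustration only.
term_counts = {
    "D_A": {"editor": 4, "save": 2, "build": 1},
    "D_B": {"build": 5, "compile": 3, "save": 1},
}
topics = {
    "Topic1": ["editor", "save"],
    "Topic2": ["build", "compile"],
}
totals = topic_term_totals(term_counts, topics)
```

Here `totals["D_A"]["Topic1"]` plays the role of the total column in Table 11, and the full `totals` dictionary plays the role of Table 12.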

Once the relationship between developers and taxonomic terms has been determined for the training data, the same procedure is carried out for the testing data, without considering the developer name attached to each software bug. From the testing data, a sample of 2 test bugs is shown in Table 13. In the next step, the \(\mu _F(x)\), \(\nu _F(x)\) and \(\pi _F(x)\) values are computed for the training and testing data. The computed IFS membership, non-membership, and hesitancy grades for the training data set are shown in Tables 14, 15, and 16, respectively. The IFS for developer \(D_A\) with Topic 1 in the training data set is represented as \(F=\{(x,0.080,0.885,0.035)\}\). Similarly, the computed IFS membership, non-membership, and hesitancy grades for the testing data set are shown in Tables 17, 18, and 19, respectively. The IFS for Test 1 in the testing data with Topic 2 is represented as \(F=\{(x,0.333,0.572,0.095)\}\).

Once the training and testing data are represented using IFS, the existing IFSim measures listed in Table 3 are used to find the similarity between the training and testing data. In this illustrative example, \(S_L(P,Q)\) is used to compute the similarity of the testing data (Test 1 and Test 2) with the training data, and the results are shown in Table 20. To identify experts for fixing Test 1 and Test 2, a fuzzy \(\alpha \)-cut is applied to the computed similarity values. Suppose the \(\alpha \) value is 0.7; then for fixing Test 1 the expert developers are \(D_A\), \(D_C\) and \(D_E\), and for Test 2 all developers are eligible.
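
The last two steps, similarity computation and the \(\alpha \)-cut, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it uses the cosine similarity for IFS in the form given by Ye (2011) in place of \(S_L(P,Q)\), whose formula is not reproduced in this appendix; it assumes the \(\alpha \)-cut keeps values greater than or equal to \(\alpha \); and the function names and similarity scores in the usage lines are hypothetical.

```python
import math

def hesitancy(mu, nu):
    """Hesitancy grade of an IFS element: pi = 1 - mu - nu."""
    return 1.0 - mu - nu

def cosine_similarity_ifs(p, q):
    """Cosine similarity between two IFSs, in the form given by Ye (2011).

    p, q: equal-length lists of (mu, nu) pairs, one pair per topic.
    Assumes no pair is (0, 0), since that would give a zero norm.
    """
    total = 0.0
    for (mu_p, nu_p), (mu_q, nu_q) in zip(p, q):
        norm = math.hypot(mu_p, nu_p) * math.hypot(mu_q, nu_q)
        total += (mu_p * mu_q + nu_p * nu_q) / norm
    return total / len(p)

def alpha_cut(similarities, alpha):
    """Keep the developers whose similarity is at least alpha (assumed >=)."""
    return [dev for dev, s in similarities.items() if s >= alpha]

# Hesitancy for the two triples quoted in the text.
pi_train = hesitancy(0.080, 0.885)  # D_A with Topic 1, about 0.035
pi_test = hesitancy(0.333, 0.572)   # Test 1 with Topic 2, about 0.095

# Hypothetical similarity scores, not the values of Table 20.
scores = {"D_A": 0.91, "D_B": 0.55, "D_C": 0.70}
experts = alpha_cut(scores, alpha=0.7)
```

With these hypothetical scores and \(\alpha = 0.7\), the cut retains `D_A` and `D_C` and drops `D_B`, which is the same selection logic applied to Table 20 in the example above.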

Table 9 Developer-associated term matrix
Table 10 Topic and its taxonomic terms
Table 11 Taxonomic terms and its count for Topic 1
Table 12 Developer and taxonomic terms count for each topic
Table 13 A sample of test data set with its taxonomic terms count for each topic
Table 14 IFS membership grade of each developer associated with each topic for training data set
Table 15 IFS non-membership grade of each developer associated with each topic for training data set
Table 16 IFS hesitant grade of each developer associated with each topic for training data set
Table 17 IFS membership grade of each developer associated with each topic for testing data set
Table 18 IFS non-membership grade of each developer associated with each topic for testing data set
Table 19 IFS hesitant grade of each developer associated with each topic for testing data set
Table 20 Similarity values of test 1 and test 2 with training data

Experimental results

See Tables 21, 22, 23, and 24.

Table 21 Effect of number of topics and taxonomic terms on different data sets
Table 22 Performance measure of IFS using different similarity techniques on Eclipse data set
Table 23 Performance measure of IFS using different similarity techniques on Mozilla data set
Table 24 Performance measure of IFS using different similarity techniques on NetBeans data set

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Panda, R.R., Nagwani, N.K. Topic modeling and intuitionistic fuzzy set-based approach for efficient software bug triaging. Knowl Inf Syst 64, 3081–3111 (2022). https://doi.org/10.1007/s10115-022-01735-z

  • DOI: https://doi.org/10.1007/s10115-022-01735-z
