Multi-instance learning for software quality estimation in object-oriented systems: a case study

  • Software Engineering
Journal of Zhejiang University SCIENCE C

Abstract

We investigate the problem of object-oriented (OO) software quality estimation from a multi-instance (MI) perspective. Specifically, each set of classes related by inheritance, called a ‘class hierarchy’, is regarded as a bag, and each class in the set is regarded as an instance. The learning task is to estimate the label of unseen bags, i.e., the fault-proneness of untested class hierarchies. A fault-prone class hierarchy contains at least one fault-prone (negative) class, while a non-fault-prone (positive) one contains no negative class. The fault-proneness of an untested class hierarchy is predicted from the modification records (MRs) of previous project releases and from OO software metrics. Several selected MI learning algorithms were evaluated on five datasets collected from an industrial software project. Among the MI learning algorithms investigated, the kernel method using a dedicated MI-kernel predicted the fault-proneness of class hierarchies more accurately than the others. In addition, compared with a supervised support vector machine (SVM) algorithm, the MI-kernel method remained competitive at much lower cost.
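The bag and instance framing above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the metric vectors are made-up toy values, the function names are hypothetical, and the abstract does not say which MI-kernel was used, so the sketch assumes one common form, a set kernel that sums an instance-level RBF kernel over all cross-bag pairs.

```python
from math import exp

def bag_is_fault_prone(class_is_fault_prone):
    """Bag-label rule from the abstract: a class hierarchy (bag) is
    fault-prone if at least one of its classes (instances) is."""
    return any(class_is_fault_prone)

def rbf(x, y, gamma=0.5):
    """Instance-level RBF kernel on two metric vectors (gamma is an assumed value)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * squared_distance)

def set_kernel(bag_a, bag_b, gamma=0.5):
    """Assumed MI kernel: sum of instance kernels over all cross-bag pairs."""
    return sum(rbf(x, y, gamma) for x in bag_a for y in bag_b)

# Toy hierarchies; each class is a hypothetical two-metric vector.
hierarchy_a = [[3.0, 1.0], [5.0, 2.0]]
hierarchy_b = [[4.0, 1.0]]

similarity = set_kernel(hierarchy_a, hierarchy_b)
label = bag_is_fault_prone([False, True])
```

A matrix of such bag-to-bag similarities could then be passed to any kernel classifier that accepts a precomputed kernel, so that only bag-level labels are needed for training.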

Author information

Corresponding author

Correspondence to Peng Huang.

About this article

Cite this article

Huang, P., Zhu, J. Multi-instance learning for software quality estimation in object-oriented systems: a case study. J. Zhejiang Univ. - Sci. C 11, 130–138 (2010). https://doi.org/10.1631/jzus.C0910084

  • DOI: https://doi.org/10.1631/jzus.C0910084
