Multi-instance learning for software quality estimation in object-oriented systems: a case study

Huang, Peng; Zhu, Jie

doi:10.1631/jzus.C0910084

Software Engineering
Published: 20 January 2010

Volume 11, pages 130–138, (2010)
Journal of Zhejiang University SCIENCE C

Abstract

We investigate object-oriented (OO) software quality estimation from a multi-instance (MI) perspective. Each set of classes related by inheritance, called a ‘class hierarchy’, is regarded as a bag, and each class in the set is regarded as an instance. The learning task is to estimate the labels of unseen bags, i.e., the fault-proneness of untested class hierarchies. A fault-prone class hierarchy contains at least one fault-prone (negative) class, while a non-fault-prone (positive) one contains no negative class. The fault-proneness of an untested class hierarchy is predicted from the modification records (MRs) of previous project releases and from OO software metrics. Several MI learning algorithms were evaluated on five datasets collected from an industrial software project. Among them, the kernel method using a dedicated MI-kernel predicted the fault-proneness of class hierarchies more accurately than the others. Compared with a supervised support vector machine (SVM) algorithm, the MI-kernel method remained competitive at much lower cost.
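
The bag-labeling rule and the idea of a kernel defined on bags can be illustrated with a minimal Python sketch. The abstract does not specify the MI-kernel, so the code assumes one common form: a set kernel that sums an instance-level RBF kernel over all pairs of instances from two bags, then normalizes it. The function names, the `gamma` value, and the two-element metric vectors are hypothetical and are not taken from the paper. Booleans are used for fault-proneness to avoid depending on the abstract's positive/negative naming.

```python
import math
from typing import List, Sequence

Instance = Sequence[float]  # metric vector of one class (hypothetical features)
Bag = List[Instance]        # one class hierarchy = a bag of class instances


def bag_is_fault_prone(class_is_fault_prone: Sequence[bool]) -> bool:
    """A class hierarchy is fault-prone if at least one of its classes is."""
    return any(class_is_fault_prone)


def rbf(x: Instance, y: Instance, gamma: float = 0.5) -> float:
    """Instance-level RBF kernel (assumed; gamma is an arbitrary choice)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def set_kernel(bag_a: Bag, bag_b: Bag, gamma: float = 0.5) -> float:
    """Sum the instance kernel over every pair of instances from the two bags."""
    if not bag_a or not bag_b:
        raise ValueError("bags must contain at least one instance")
    return sum(rbf(x, y, gamma) for x in bag_a for y in bag_b)


def normalized_set_kernel(bag_a: Bag, bag_b: Bag, gamma: float = 0.5) -> float:
    """Normalize so that bag size does not dominate the similarity value."""
    denominator = math.sqrt(
        set_kernel(bag_a, bag_a, gamma) * set_kernel(bag_b, bag_b, gamma)
    )
    return set_kernel(bag_a, bag_b, gamma) / denominator


# Two hypothetical class hierarchies described by two metrics per class.
hierarchy_a = [[1.0, 2.0], [3.0, 4.0]]
hierarchy_b = [[1.0, 2.0]]
similarity = normalized_set_kernel(hierarchy_a, hierarchy_b)
```

A matrix of such bag-to-bag similarities could then be passed to any kernel classifier that accepts a precomputed kernel, which is one way a bag-level method avoids labeling every individual class.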

Author information

Authors and Affiliations

Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Peng Huang & Jie Zhu

Corresponding author

Correspondence to Peng Huang.

About this article

Cite this article

Huang, P., Zhu, J. Multi-instance learning for software quality estimation in object-oriented systems: a case study. J. Zhejiang Univ. - Sci. C 11, 130–138 (2010). https://doi.org/10.1631/jzus.C0910084

Received: 11 February 2009
Accepted: 18 June 2009
Published: 20 January 2010
Issue Date: February 2010
DOI: https://doi.org/10.1631/jzus.C0910084
