Impact of Using Information Gain in Software Defect Prediction Models

Rana, Zeeshan Ali; Awais, Mian M.; Shamail, Shafay

doi:10.1007/978-3-319-09333-8_69

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8588)

Abstract

The presence or absence of defective modules in software is an indicator of the software's quality. Every company aspires to deliver good-quality software with a minimum number of defective modules. To achieve this goal, defect prediction models are used in different phases of the software lifecycle. These models have to deal with a large number of software metrics as input parameters. These metrics are often correlated with one another, which affects a model's performance, and in some cases using all the metrics degrades the models' performance. To reduce the size of the input space and resolve possible correlation issues in the input data, models reported in the literature use Principal Component Analysis (PCA) and Information Gain (IG) based dimension reduction. PCA reduces the number of dimensions but keeps every input variable represented, because each principal component is a linear combination of all the original metrics. PCA is therefore not suitable when retaining all the metrics degrades a model's performance. To handle such situations, this paper advocates an IG-based technique that reduces the size of the input space by dropping irrelevant metrics; only the relevant metrics are then used to develop a prediction model. The paper compares the PCA-based and IG-based techniques for developing classification tree and fuzzy inference system based models. To study the impact of using IG, the percentage improvements in Recall, Accuracy, and Misclassification Rate have been calculated for these models. The results show that using IG improves the models' performance more often than PCA does.
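The IG-based reduction described above scores each metric by how much it reduces uncertainty about the defect label, IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v), and keeps only the metrics that score above some cutoff. The sketch below shows that idea using only the Python standard library. It is not the authors' implementation: the abstract does not state how metrics were discretized or which cutoff was used, so the sketch assumes metrics are already discretized and uses a hypothetical threshold of 0.0. The metric names `loc_level` and `comment_level` and the four-module dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def select_by_ig(metrics, labels, threshold=0.0):
    """Keep metrics whose IG with the defect label exceeds `threshold`.

    `metrics` maps a metric name to its (already discretized) value per module.
    Returns the kept names ranked by decreasing IG, plus all IG scores.
    The threshold is an assumption; the paper's cutoff is not given.
    """
    scores = {name: information_gain(vals, labels) for name, vals in metrics.items()}
    kept = [name for name, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True), scores


# Hypothetical toy data: one label per module, two discretized metrics.
labels = ["defective", "defective", "clean", "clean"]
metrics = {
    "loc_level": ["high", "high", "low", "low"],        # separates the classes
    "comment_level": ["few", "many", "few", "many"],    # carries no class information
}
kept, scores = select_by_ig(metrics, labels)
print(kept)
```

Here `loc_level` has an IG of 1 bit and is kept, while `comment_level` has an IG of 0 and is dropped. Unlike PCA, the dropped metric plays no further part in the model. With real data, floating-point noise can make near-zero scores slightly positive, so a small positive threshold or a top-k cutoff may be more practical.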
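The evaluation measures named above can all be derived from a confusion matrix in which "defective" is the positive class. The sketch below computes Recall, Accuracy, and Misclassification Rate and expresses the change between two models as a percentage improvement. The abstract does not give the exact improvement formula, so the sketch assumes a relative change against a baseline model, with the sign flipped for Misclassification Rate because lower is better. The class and function names are hypothetical, and the counts are made up for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, treating 'defective' as the positive class."""
    tp: int  # defective modules predicted defective
    fn: int  # defective modules predicted non-defective
    fp: int  # non-defective modules predicted defective
    tn: int  # non-defective modules predicted non-defective

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def recall(c: Confusion) -> float:
    """Fraction of truly defective modules that the model flags."""
    return c.tp / (c.tp + c.fn)


def accuracy(c: Confusion) -> float:
    """Fraction of all modules classified correctly."""
    return (c.tp + c.tn) / c.total


def misclassification_rate(c: Confusion) -> float:
    """Fraction of all modules classified incorrectly (equals 1 - accuracy)."""
    return (c.fp + c.fn) / c.total


def pct_improvement(baseline: float, new: float, higher_is_better: bool = True) -> float:
    """Relative change against the baseline, in percent; positive means better.

    Assumption: for lower-is-better measures the sign of the change is flipped.
    """
    change = new - baseline if higher_is_better else baseline - new
    return 100.0 * change / baseline


# Made-up counts for illustration only; not results from the paper.
baseline = Confusion(tp=10, fn=10, fp=5, tn=75)
reduced = Confusion(tp=15, fn=5, fp=8, tn=72)

recall_gain = pct_improvement(recall(baseline), recall(reduced))
accuracy_gain = pct_improvement(accuracy(baseline), accuracy(reduced))
mr_gain = pct_improvement(misclassification_rate(baseline),
                          misclassification_rate(reduced), higher_is_better=False)
print(f"Recall: {recall_gain:.1f}%  Accuracy: {accuracy_gain:.2f}%  MR: {mr_gain:.2f}%")
```

The baseline could be a model built on all metrics or on PCA components, depending on which comparison is being made. Because Misclassification Rate is simply 1 minus Accuracy, the two move together, but their percentage improvements differ because they are measured against different baseline values.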

Author information

Authors and Affiliations

Department of Computer Science, SBA School of Science and Engineering, Lahore University of Management Sciences (LUMS), Lahore, Pakistan
Zeeshan Ali Rana, Mian M. Awais & Shafay Shamail
Faculty of Information Technology, University of Central Punjab, Lahore, Pakistan
Zeeshan Ali Rana

Editor information

Editors and Affiliations

School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
Electrical and Electronics Department, Politecnico of Bari, Via Orabona, 4, 70125, Bari, Italy
Vitoantonio Bevilacqua
School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, 2522, North Wollongong, NSW, Australia
Prashan Premaratne

About this paper

Cite this paper

Rana, Z.A., Awais, M.M., Shamail, S. (2014). Impact of Using Information Gain in Software Defect Prediction Models. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theory. ICIC 2014. Lecture Notes in Computer Science, vol 8588. Springer, Cham. https://doi.org/10.1007/978-3-319-09333-8_69

DOI: https://doi.org/10.1007/978-3-319-09333-8_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09332-1
Online ISBN: 978-3-319-09333-8
eBook Packages: Computer Science, Computer Science (R0)