Impact of Using Information Gain in Software Defect Prediction Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8588)

Abstract

The presence or absence of defective modules in software is an indicator of its quality. Every company aspires to deliver good-quality software with a minimum number of defective modules. To achieve this goal, defect prediction models are used in different phases of the software lifecycle. These models have to deal with a large number of software metrics, which serve as their input parameters. The metrics are often correlated with one another, which affects a model's performance, and in some cases using all the metrics degrades performance. To reduce the size of the input space and resolve possible correlation issues in the input data, models reported in the literature use dimension reduction based on Principal Component Analysis (PCA) and Information Gain (IG). PCA reduces the dimensions but keeps the representation of all the input variables intact, so it is not suitable where representing all the metrics degrades a model's performance. To handle such situations, this paper advocates the use of an IG-based technique that reduces the size of the input space by dropping the irrelevant metrics; only the relevant metrics are then used to develop a prediction model. The paper compares the PCA-based and IG-based techniques for developing models based on classification trees and fuzzy inferencing systems. To study the impact of using IG, the percentage improvement in Recall, Accuracy and Misclassification Rate has been calculated for these models. The results show that the use of IG improves the models' performance more often than PCA does.
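
The IG-based selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state how IG is computed or which cut-off is used, so everything below is an assumption: metrics are treated as already discretized, IG is taken as the usual reduction in Shannon entropy of the defect label, and the names `entropy`, `information_gain`, `select_metrics`, the toy metrics and the threshold value are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """IG = H(label) - sum over values v of P(feature = v) * H(label | feature = v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_metrics(table, labels, threshold):
    """Return the metrics whose information gain exceeds the threshold (assumed rule)."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return {name: gain for name, gain in gains.items() if gain > threshold}


# Toy, made-up data: two discretized metrics for four modules.
metrics = {
    "complexity": ["high", "high", "low", "low"],   # separates the classes perfectly
    "comment_ratio": ["a", "b", "a", "b"],          # carries no information about defects
}
defective = [1, 1, 0, 0]

selected = select_metrics(metrics, defective, threshold=0.1)
print(selected)
```

In this toy case only `complexity` survives, since `comment_ratio` splits the modules into groups with the same class mix as the whole set. Unlike PCA, which would fold both metrics into new components, the irrelevant metric is dropped outright.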

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rana, Z.A., Awais, M.M., Shamail, S. (2014). Impact of Using Information Gain in Software Defect Prediction Models. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theory. ICIC 2014. Lecture Notes in Computer Science, vol 8588. Springer, Cham. https://doi.org/10.1007/978-3-319-09333-8_69

  • DOI: https://doi.org/10.1007/978-3-319-09333-8_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09332-1

  • Online ISBN: 978-3-319-09333-8

  • eBook Packages: Computer Science, Computer Science (R0)
