ABSTRACT
Fault-prone module detection in source code is important for the assurance of software quality. Most previous fault-prone detection approaches are based on software metrics. Such approaches, however, have difficulties in collecting the metrics and in constructing mathematical models based on them. To mitigate these difficulties, we propose a novel approach for detecting fault-prone modules using a spam filtering technique, named Fault-Prone Filtering. Because of the growing need for spam e-mail detection, spam filtering has progressed into a convenient and effective technique for text mining. In our approach, fault-prone modules are detected by treating source code modules as text files and feeding them directly to the spam filter. This paper describes the training on errors procedure for applying fault-prone filtering in practice. Since no pre-training is required, this procedure can be applied immediately in actual development. To show the usefulness of our approach, we conducted an experiment using a large source code repository of a Java-based open source project. The experimental result shows that our approach can classify about 85% of software modules correctly. The result also indicates that fault-prone modules can be detected at relatively low cost at an early stage.
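The abstract describes the training on errors procedure only at a high level. The following is a minimal sketch of the idea under stated assumptions, not the authors' implementation: it substitutes a small multinomial naive Bayes classifier for a real spam filter, uses a hypothetical regex tokenizer, and assumes modules arrive in chronological order with each module's actual fault status revealed after its prediction. The labels `"FP"` (fault-prone) and `"NFP"` (not fault-prone) and all function names are illustrative. The filter starts empty (no pre-training) and is updated only when it misclassifies a module.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical tokenizer (an assumption): identifiers/keywords, integer
# literals, and single non-space punctuation symbols such as "=", "/", ";".
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(source: str) -> list[str]:
    """Treat a source module as plain text and split it into tokens."""
    return TOKEN_RE.findall(source)


@dataclass
class NaiveBayesFilter:
    """Two-class multinomial naive Bayes text filter with Laplace smoothing.

    Illustrative stand-in for a spam filter; the paper's filter may differ.
    """
    counts: dict = field(default_factory=lambda: {"FP": Counter(), "NFP": Counter()})
    docs: dict = field(default_factory=lambda: {"FP": 0, "NFP": 0})

    def train(self, source: str, label: str) -> None:
        """Add one module's tokens to the counts of its actual class."""
        self.counts[label].update(tokenize(source))
        self.docs[label] += 1

    def score(self, source: str, label: str) -> float:
        """Smoothed log prior plus log likelihood of the module's tokens."""
        vocab = set(self.counts["FP"]) | set(self.counts["NFP"])
        total = sum(self.counts[label].values())
        logp = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in tokenize(source):
            logp += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
        return logp

    def classify(self, source: str) -> str:
        """Predict "FP" only if it scores strictly higher; ties go to "NFP"."""
        return "FP" if self.score(source, "FP") > self.score(source, "NFP") else "NFP"


def training_on_errors(modules):
    """Classify modules in order and retrain only on misclassified ones.

    `modules` is an iterable of (source, actual_label) pairs in chronological
    order. Returns the predictions made before each actual label was used.
    """
    nb = NaiveBayesFilter()  # empty filter: no pre-training
    predictions = []
    for source, actual in modules:
        predicted = nb.classify(source)
        predictions.append(predicted)
        if predicted != actual:  # train only on errors
            nb.train(source, actual)
    return predictions


def accuracy(predictions, actual_labels):
    """Fraction of modules whose prediction matched the actual label."""
    hits = sum(p == a for p, a in zip(predictions, actual_labels))
    return hits / len(actual_labels)
```

For example, `training_on_errors([(src, "FP"), (src, "FP")])` first predicts `"NFP"` (the empty filter breaks the tie that way), trains on that error, and then predicts `"FP"` for the repeated module. Training only on errors keeps the filter small and lets it adapt as development proceeds. The 85% figure reported in the abstract comes from the paper's experiment, not from this sketch.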
Index Terms
- Training on errors experiment to detect fault-prone software modules by spam filter