Article · DOI: 10.1145/1287624.1287683

Training on errors experiment to detect fault-prone software modules by spam filter

Published: 07 September 2007

ABSTRACT

Detecting fault-prone modules in source code is important for assuring software quality. Most previous fault-prone detection approaches are based on software metrics. Such approaches, however, make it difficult both to collect the metrics and to construct mathematical models from them. To mitigate these difficulties, we propose a novel approach for detecting fault-prone modules using a spam filtering technique, named Fault-Prone Filtering. Driven by the growing need to detect spam e-mail, spam filtering has matured into a convenient and effective text-mining technique. In our approach, source code modules are treated as text files and fed directly to the spam filter, which then detects the fault-prone ones. This paper describes the training on errors procedure for applying fault-prone filtering in practice. Since no pre-training is required, this procedure can be applied immediately in actual development. To show the usefulness of our approach, we conducted an experiment using a large source code repository of a Java-based open source project. The experimental results show that our approach can correctly classify about 85% of software modules. The results also indicate that fault-prone modules can be detected at relatively low cost at an early stage.
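The training on errors procedure described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the abstract: a simple multinomial naive Bayes classifier stands in for the actual spam filter, the tokenizer (`tokenize`), the class `NaiveBayesFilter`, the function `training_on_errors`, and the labels `"fp"`/`"nfp"` are all hypothetical, and the ground-truth label for each module is assumed to be available after classification (for example, from later bug-fix history). The sketch follows the usual spam-filtering reading of "training on errors": start with an empty filter, classify each module in order, and train only when the prediction is wrong.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical tokenizer: identifiers/keywords, integer literals, and single
# punctuation characters. The paper's actual tokenization is not given here.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(source: str) -> List[str]:
    """Treat a source module as plain text and split it into tokens."""
    return TOKEN_RE.findall(source)


class NaiveBayesFilter:
    """Hypothetical stand-in for a spam filter: multinomial naive Bayes
    over code tokens with add-one smoothing. Labels are "fp"
    (fault-prone) and "nfp" (not fault-prone)."""

    def __init__(self) -> None:
        self.counts = {"fp": Counter(), "nfp": Counter()}
        self.docs = {"fp": 0, "nfp": 0}

    def train(self, tokens: List[str], label: str) -> None:
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def log_score(self, tokens: List[str], label: str) -> float:
        vocab = set(self.counts["fp"]) | set(self.counts["nfp"])
        total_tokens = sum(self.counts[label].values())
        total_docs = self.docs["fp"] + self.docs["nfp"]
        # Smoothed class prior, so an untrained filter is still well defined.
        score = math.log((self.docs[label] + 1) / (total_docs + 2))
        # Smoothed token likelihoods; the extra +1 reserves mass for unseen tokens.
        denom = total_tokens + len(vocab) + 1
        for tok in tokens:
            score += math.log((self.counts[label][tok] + 1) / denom)
        return score

    def classify(self, tokens: List[str]) -> str:
        # Ties (e.g. an untrained filter) default to "nfp".
        if self.log_score(tokens, "fp") > self.log_score(tokens, "nfp"):
            return "fp"
        return "nfp"


def training_on_errors(
    modules: Iterable[Tuple[str, str]]
) -> Tuple[float, NaiveBayesFilter]:
    """Classify modules in order and train only on misclassified ones.

    `modules` yields (source_text, actual_label) pairs in chronological
    order. No pre-training: the filter starts empty.
    """
    flt = NaiveBayesFilter()
    correct = total = 0
    for source, actual in modules:
        tokens = tokenize(source)
        if flt.classify(tokens) == actual:
            correct += 1
        else:
            flt.train(tokens, actual)  # learn only from mistakes
        total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, flt


if __name__ == "__main__":
    # Made-up toy modules for illustration only (not data from the paper).
    toy = [
        ("int x = 0; return x;", "nfp"),
        ("if (p == null) { p.run(); }", "fp"),
        ("if (q == null) { q.run(); }", "fp"),
        ("int y = 1; return y;", "nfp"),
    ]
    acc, _ = training_on_errors(toy)
    print(f"accuracy on toy data: {acc:.2f}")
```

Training only on misclassified modules keeps the filter small and lets it start from nothing, which matches the abstract's point that no pre-training is needed; the trade-off is that early predictions are essentially guesses until the filter has seen a few errors.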


        Published in

        ESEC-FSE '07: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
        September 2007
        638 pages
        ISBN: 9781595938114
        DOI: 10.1145/1287624

        Copyright © 2007 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 112 of 543 submissions, 21%
