Article

Training on errors experiment to detect fault-prone software modules by spam filter

Authors:
Osamu Mizuno, Osaka University, Suita, Japan
Tohru Kikuno, Osaka University, Suita, Japan

ESEC-FSE '07: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, September 2007, pages 405–414, https://doi.org/10.1145/1287624.1287683

Published: 7 September 2007

ABSTRACT

Fault-prone module detection in source code is important for software quality assurance. Most previous fault-prone detection approaches are based on software metrics. Such approaches, however, have difficulty collecting the metrics and constructing mathematical models based on them. To mitigate these difficulties, we propose a novel approach for detecting fault-prone modules using a spam filtering technique, named Fault-Prone Filtering. Driven by the growing need for spam e-mail detection, spam filtering has developed into a convenient and effective text-mining technique. In our approach, source code modules are treated as text files and fed directly to a spam filter, which classifies them as fault-prone or not. This paper describes the training on errors procedure for applying fault-prone filtering in practice. Since no pre-training is required, this procedure can be applied immediately in actual development. To show the usefulness of our approach, we conducted an experiment using a large source code repository of a Java-based open source project. The results show that our approach classifies about 85% of software modules correctly. They also indicate that fault-prone modules can be detected at relatively low cost at an early stage.
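The abstract describes the training on errors procedure only at a high level: each module is treated as plain text, a spam filter classifies it, and the filter starts with no pre-training. The following is a minimal sketch under stated assumptions, not the paper's implementation. It substitutes a simple multinomial naive Bayes classifier for the off-the-shelf spam filter used in the work, uses a hypothetical regex tokenizer, and assumes each module's actual label ("fp" for fault-prone, "nfp" otherwise) becomes known after the module has been classified. In training on errors, the filter is updated only when its prediction turns out to be wrong. All names (`tokenize`, `NaiveBayesFilter`, `training_on_errors`) and the toy modules are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integer literals, and
# single non-whitespace symbol characters. Whitespace is ignored.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source: str) -> list[str]:
    """Treat a source module as plain text and split it into tokens."""
    return TOKEN_RE.findall(source)


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with add-one smoothing.

    A stand-in for a real spam filter (an assumption of this sketch).
    """

    LABELS = ("fp", "nfp")

    def __init__(self) -> None:
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocab: set[str] = set()

    def train(self, tokens: list[str], label: str) -> None:
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def score(self, tokens: list[str], label: str) -> float:
        """Log P(label) + sum of log P(token | label), up to a constant."""
        total_docs = sum(self.doc_counts.values())
        # Smoothed prior, so an untrained filter still produces a score.
        prior = (self.doc_counts[label] + 1) / (total_docs + len(self.LABELS))
        counts = self.token_counts[label]
        # The extra +1 reserves probability mass for unseen tokens.
        denom = sum(counts.values()) + len(self.vocab) + 1
        log_p = math.log(prior)
        for tok in tokens:
            log_p += math.log((counts[tok] + 1) / denom)
        return log_p

    def classify(self, tokens: list[str]) -> str:
        # max() keeps the first maximal label, so exact ties go to "fp".
        return max(self.LABELS, key=lambda label: self.score(tokens, label))


def training_on_errors(modules: list[tuple[str, str]]) -> list[str]:
    """Classify modules in order; train only on misclassified ones."""
    filt = NaiveBayesFilter()  # no pre-training
    predictions = []
    for source, actual in modules:
        tokens = tokenize(source)
        predicted = filt.classify(tokens)
        predictions.append(predicted)
        if predicted != actual:  # learn only from errors
            filt.train(tokens, actual)
    return predictions


def accuracy(predictions: list[str], actuals: list[str]) -> float:
    """Fraction of modules classified correctly."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Toy, made-up modules in the order they would be processed.
modules = [
    ("if (x == null) { x.close(); }", "fp"),
    ("int add(int a, int b) { return a + b; }", "nfp"),
    ("if (y == null) { y.close(); }", "fp"),
    ("int sub(int a, int b) { return a - b; }", "nfp"),
]
preds = training_on_errors(modules)
print(preds, accuracy(preds, [label for _, label in modules]))
```

Because every prediction is made before the module's own label is used, the loop mimics the no-pre-training setting; the default of "fp" on ties in an untrained filter is an arbitrary choice of this sketch. `accuracy` computes the fraction of correctly classified modules, the same kind of measure as the abstract's "about 85% of software modules correctly".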

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability
        Software product lines
    2. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published in
ESEC-FSE '07: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering
September 2007
638 pages
ISBN:9781595938114
DOI:10.1145/1287624
General Chair: Ivica Crnkovic, Mälardalen University, Sweden
Program Chair: Antonia Bertolino, ISTI-CNR, Italy
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fault-prone modules
spam filter
text mining
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%

Article Metrics
- 33 Total Citations
- 530 Total Downloads
- Downloads (Last 12 months): 4
- Downloads (Last 6 weeks): 0