AutoODC: Automated generation of orthogonal defect classifications

Huang, LiGuo; Ng, Vincent; Persing, Isaac; Chen, Mingrui; Li, Zeheng; Geng, Ruili; Tian, Jeff

doi:10.1007/s10515-014-0155-1

Published: 03 June 2014

Volume 22, pages 3–46, (2015)

Automated Software Engineering

Abstract

Orthogonal defect classification (ODC), the most influential framework for software defect classification and analysis, provides valuable in-process feedback to system development and maintenance. Conducting ODC classification on existing organizational defect reports is human-intensive and requires experts’ knowledge of both ODC and system domains. This paper presents AutoODC, an approach for automating ODC classification by casting it as a supervised text classification problem. Rather than merely applying the standard machine learning framework to this task, we seek to acquire a better ODC classification system by integrating experts’ ODC experience and domain knowledge into the learning process through a novel relevance annotation framework. We have trained AutoODC using two state-of-the-art machine learning algorithms for text classification, Naive Bayes (NB) and support vector machine (SVM), and evaluated it on both an industrial defect report from the social network domain and a larger defect list extracted from a publicly accessible defect tracker of the open source system FileZilla. AutoODC is a promising approach: it requires minimal human effort beyond the annotations typically required by standard machine learning approaches, and it achieves overall accuracies of 82.9% (NB) and 80.7% (SVM) on the industrial defect report, and accuracies of 77.5% (NB) and 75.2% (SVM) on the larger, more diversified open source defect list.
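
To make the framing in the abstract concrete, the following is a minimal sketch of defect classification cast as supervised text classification with a multinomial Naive Bayes model. It is an illustration only, not the AutoODC system: it omits the relevance annotation framework, stemming, and the Weka and SVM implementations the paper relies on. The toy defect reports, the two category labels ("Checking", "Timing"), and all function and class names are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # number of reports per class
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior of the class plus log likelihood of each known word.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # skip words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written defect reports; labels are placeholders.
TRAIN_REPORTS = [
    ("upload fails because file size is never validated", "Checking"),
    ("missing null check crashes the login page", "Checking"),
    ("no validation of empty user name on signup", "Checking"),
    ("two threads update the session at the same time and race", "Timing"),
    ("deadlock when transfer queue is locked by two threads", "Timing"),
    ("race between connection timeout and retry thread", "Timing"),
]

classifier = NaiveBayesTextClassifier().fit(
    [text for text, _ in TRAIN_REPORTS],
    [label for _, label in TRAIN_REPORTS],
)
prediction = classifier.predict("crash caused by missing check of file name")
print(prediction)  # Checking
```

In this sketch each report is reduced to a bag of words, and the predicted category is the one with the highest smoothed posterior score. A real evaluation would need expert-annotated reports and a held-out test set, as in the experiments the abstract summarizes.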

Notes

Due to proprietary rules, we anonymize the industrial company by referring to it as “Company P” throughout this paper.
The definitions and taxonomy of ODC v5.2 attributes are accessible at http://researcher.watson.ibm.com/researcher/files/us-pasanth/ODC-5-2.pdf.
Elgg is an open source social networking engine. The defect (issue) tracker of Elgg can be accessed at https://github.com/Elgg/Elgg/issues.
Other stemmers, such as the Porter stemmer (Porter 1980), can be used, but we found that the WordNet stemmer yields slightly better accuracy.
FileZilla is a free FTP solution composed of three subsystems: FileZilla Client, FileZilla Server, and Other. The defect tracker for the three subsystems of FileZilla is accessible at http://trac.filezilla-project.org/wiki/Queries.
To train a multi-class SVM classifier, we use \(SVM^{multiclass}\) (Tsochantaridis et al. 2004). To train a multi-class NB classifier, we use the implementation in Weka.

References

Ahsan, S.N., Ferzund, J., Wotawa, F.: Automatic classification of software change request using multi-label machine learning methods. In: Proceedings of the 33rd IEEE Software Engineering Workshop, pp. 79–86 (2009)
Aizawa, A.: Linguistic techniques to improve the performance of automatic text categorization. In: Proceedings of NLPRS-01, 6th Natural Language Processing Pacific Rim Symposium, pp. 307–314 (2001)
Asuncion, H.U., Asuncion, A.U., Taylor, R.N.: Software traceability with topic modeling. In: Proceedings of the 32nd International Conference on Software Engineering, pp. 95–104 (2010)
Bellucci, S., Portaluri, B.: Automatic calculation of orthogonal defect classification (odc) fields (2012). https://www.google.com/patents/US8214798. US Patent 8,214,798
Bridge, N., Miller, C.: Orthogonal defect classification: using defect data to improve software development. Softw. Qual. 3(1), 1–8 (1998)
Caropreso, M., Matwin, S., Sebastiani, F.: A learner independent evaluation of the usefulness of statistical phrases for automated text categorization. In: Chin, A.G. (ed.) Text Databases and Document Management, Theory and Practice, pp. 78–102. Idea Group Publishing, Hershey (2001)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. In: SIGKDD Exploration Newsletter, pp. 1–6 (2004)
Chillarege, R.: Orthogonal defect classification. In: Lyu, M. (ed.) Handbook of Software Reliability Engineering, pp. 359–400. McGraw-Hill, New York (1995)
Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.Y.: Orthogonal defect classification-a concept for in-process measurements. IEEE Trans. Softw. Eng. 18(11), 943–956 (1992)
Chillarege, R., Biyani, S.: Identifying risk using odc based growth models. In: Proceedings of the 5th International Symposium on Software Reliability Engineering, pp. 282–288 (1994)
Cubranic, D., Murphy, G.C.: Automatic bug triage using text categorization. In: Proceedings of the 6th International Conference on Software Engineering and Knowledge Engineering, pp. 92–97 (2004)
Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: an industrial case study. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories, pp. 11–20 (2010)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. ACM SIGKDD Explor. Newslett. 11(1), 10–18 (2009)
Huang, J., Czauderna, A., Gibiec, M., Emenecker, J.: A machine learning approach for tracing regulatory codes to product specific requirements. In: Proceedings of the 32nd International Conference on Software Engineering, pp. 155–164 (2010)
Hussain, I., Ormandjieva, O., Kosseim, L.: Automatic quality assessment of srs text by means of a decision-tree-based text classifier. In: Proceedings of the 7th International Conference on Quality Software, pp. 209–218 (2007)
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the 10th European Conference on Machine Learning, pp. 137–142. Springer, Berlin (1998)
Kiekel, P., Cooke, N., Foltz, P., Gorman, J., Martin, M.: Some promising results of communication-based automatic measures of team cognition. In: Proceedings of Human Factors and Ergonomics Society: 46th Annual Meeting, pp. 298–302 (2002)
Ko, A., Myers, B.: A linguistic analysis of how people describe software problems. In: IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 127–134 (2006)
Lamkanfi, A., Demeyer, S., Giger, E., Goethals, B.: Predicting the severity of a reported bug. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories, pp. 1–10 (2010)
Lin, Z., Ng, H.T., Kan, M.Y.: A pdtb-styled end-to-end discourse parser. Nat. Lang. Eng. 20, 151–184 (2014)
Lutz, R., Mikulski, C.: Empirical analysis of safety-critical anomalies during operations. IEEE Trans. Softw. Eng. 30(3), 172–180 (2004)
Lutz, R., Mikulski, C.: Ongoing requirements discovery in high integrity systems. IEEE Softw. 21(2), 19–25 (2004)
Ma, L., Tian, J.: Analyzing errors and referral pairs to characterize common problems and improve web reliability. In: Proceedings of the 3rd International Conference on Web Engineering, pp. 314–323 (2003)
Ma, L., Tian, J.: Web error classification and analysis for reliability improvement. J. Syst. Softw. 80(6), 795–804 (2007)
Mays, R., Jones, C., Holloway, G., Stundisky, D.: Experiences with defects prevention process. IBM Syst. J. 29(1), 4–32 (1990)
Menzies, T., Lutz, R., Mikulski, C.: Better analysis of defect data at NASA. In: Proceedings of the 5th International Conference on Software Engineering and Knowledge Engineering, pp. 607–611 (2003)
Menzies, T., Marcus, A.: Automated severity assessment of software defect reports. In: Proceedings of the International Conference on Software Maintenance, pp. 346–355 (2008)
Ormandjieva, O., Kosseim, L., Hussain, I.: Toward a text classification system for the quality assessment of software requirements written in natural language. In: Proceedings of the 4th International Workshop on Software Quality Assurance, pp. 39–45 (2007)
Pandita, R., Xiao, X., Yang, W., Enck, W., Xie, T.: Whyper: towards automating risk assessment of mobile application. In: Proceedings of 22nd USENIX Security Symposium, pp. 527–542 (2013)
Polpinij, J., Ghose, A.: An automatic elaborate requirement specification by using hierarchical text classification. In: Proceedings of the 2008 International Conference on Computer Science and Software Engineering, pp. 706–709 (2008)
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Rennie, J.D., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumption of naive bayes text classifiers. In: Proceedings of International Conference on Machine Learning, pp. 616–623 (2003)
Romano, D., Pinzger, M.: A comparison of event models for naive bayes text classification. In: Proceedings of AAAI Workshop on Learning for Text Categorization, pp. 41–48 (1998)
Sebastiani, F.: Text categorization. In: Zanasi, A. (ed.) Text Mining and Its Applications, pp. 109–129. MIT Press, Cambridge (2005)
Swigger, K., Brazile, R., Dafoulas, G., Serce, F.C., Alpaslan, F.N., Lopez, V.: Using content and text classification methods to characterize team performance. In: Proceedings of the 5th International Conference on Global Software Engineering, pp. 192–200 (2010)
Tamrawi, A., Nguyen, T.T., Al-Kofahi, J., Nguyen, T.N.: Fuzzy set-based automatic bug triaging. In: Proceedings of the 33rd International Conference on Software Engineering, pp. 884–887 (2011)
Thung, F., Lo, D., Jiang, L.: Automatic defect categorization. In: Proceedings of 19th Working Conference on Reverse Engineering, pp. 205–214 (2012)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2001)
Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the 21st International Conference on Machine Learning, pp. 104–112 (2004)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Yang, C., Hou, C., Kao, W., Chen, I.: An empirical study on improving severity prediction of defect reports using feature selection. In: Proceedings of the 19th Asia-Pacific Software Engineering Conference, pp. 240–249 (2012)
Zheng, J., Williams, L., Nagappan, N., Hudpohl, J.: On the value of static analysis tools for fault detection. IEEE Trans. Softw. Eng. 32(4), 240–253 (2006)

Author information

Authors and Affiliations

Southern Methodist University, Dallas, TX, USA
LiGuo Huang, Mingrui Chen, Zeheng Li, Ruili Geng & Jeff Tian
University of Texas at Dallas, Richardson, TX, USA
Vincent Ng & Isaac Persing

Corresponding author

Correspondence to Zeheng Li.

About this article

Cite this article

Huang, L., Ng, V., Persing, I. et al. AutoODC: Automated generation of orthogonal defect classifications. Autom Softw Eng 22, 3–46 (2015). https://doi.org/10.1007/s10515-014-0155-1

Received: 08 October 2013
Accepted: 03 May 2014
Published: 03 June 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10515-014-0155-1
