Abstract
Classification is an important data mining problem. A desirable property of a classifier is noise tolerance. Emerging Patterns (EPs) are itemsets whose supports change significantly from one data class to another. In this paper, we first introduce Chi Emerging Patterns (Chi EPs), which are more resistant to noise than other kinds of EPs. We then use Chi EPs in a probabilistic approach for classification. The classifier, Bayesian Classification by Chi Emerging Patterns (BCCEP), can handle noise very well due to the inherent noise tolerance of the Bayesian approach and high quality patterns used in the probability approximation. The empirical study shows that our method is superior to other well-known classification methods such as NB, C4.5, SVM and JEP-C in terms of overall predictive accuracy, on “noisy” as well as “clean” benchmark datasets from the UCI Machine Learning Repository. Out of the 116 cases, BCCEP wins on 70 cases, NB wins on 30, C4.5 wins on 33, SVM wins on 32 and JEP-C wins on 21.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2000)
Dong, G., Li, J.: Efficient mining of emerging patterns: Discovering trends and differences. In: Proc. 5th ACM SIGKDD, San Diego, CA, USA, pp. 43–52 (1999)
Dong, G., Zhang, X., Wong, L., Li, J.: Classification by aggregating emerging patterns. In: Proc. 2nd Int’l Conf. on Discovery Science (DS 1999), Tokyo, Japan, pp. 30–42 (1999)
Li, J., Dong, G., Ramamohanarao, K.: Making use of the most expressive jumping emerging patterns for classification. Knowledge and Information Systems 3, 131–145 (2001)
Sun, Q., Zhang, X., Ramamohanarao, K.: The noise tolerance of ep-based classifiers. In: Proc. 16th Australian Conf. on Artificial Intelligence, Perth, Australia, pp. 796–806 (2003)
Bethea, R.M., Duran, B.S., Boullion, T.L.: Statistical methods for engineers and scientists. M. Dekker, New York (1995)
Fan, H., Ramamohanarao, K.: Efficiently mining interesting emerging patterns. In: Dong, G., Tang, C., Wang, W. (eds.) WAIM 2003. LNCS, vol. 2762, pp. 189–201. Springer, Heidelberg (2003)
Domingos, P., Pazzani, M.J.: Beyond independence: Conditions for the optimality of the simple bayesian classifier. In: Proc. 13th ICML, pp. 105–112 (1996)
Fan, H., Ramamohanarao, K.: A bayesian approach to use emerging patterns for classification. In: Proc. 14th Australasian Database Conference (ADC 2003), Adelaide, Australia, pp. 39–48 (2003)
Fan, H., Ramamohanarao, K.: Noise tolerant classification by chi emerging patterns. Technical report, Department of Computer Science and Software Engineering, University of Melbourne (2004)
Blake, C., Merz, C.: UCI repository of machine learning databases (1998)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (1999)
Kohavi, R., Sommerfield, D., Dougherty, J.: Data mining using MLC++: A machine learning library in C++. International Journal on Artificial Intelligence Tools 6, 537–566 (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, H., Ramamohanarao, K. (2004). Noise Tolerant Classification by Chi Emerging Patterns. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science(), vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_26
Download citation
DOI: https://doi.org/10.1007/978-3-540-24775-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive