ABSTRACT
Instance-based classifiers that compute similarity between instances suffer from noise in the training set and from over-fitting. In this paper we propose a new type of distance-based classifier that, instead of computing distances between instances, computes the distance between each test instance and the classes, both represented as patterns in the space of frequent itemsets. We rank the itemsets by metrics of itemset significance, then retain only the top portion of the ranking that allows the classifier to reach its maximum accuracy. We experimented on a large collection of datasets from the UCI archive with different proximity measures and different itemset-ranking metrics.
We show that our method has several benefits: it reduces the number of distance computations, improves on the classification accuracy of state-of-the-art classifiers such as decision trees, SVM, k-NN, Naive Bayes, rule-based classifiers and association-rule-based ones, and outperforms the competitors especially on noisy data.
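The abstract does not specify the paper's exact mining, ranking, or proximity procedure, so the following is only an illustrative sketch of the general idea: mine frequent itemsets from the training data, rank them by a significance metric (support is used here as a stand-in), build one itemset-frequency profile per class, and assign each test instance to the class whose profile is closest. All function names, the toy miner, and the choice of cosine similarity as the proximity measure are assumptions for illustration, not the authors' method.

```python
from itertools import combinations
from collections import Counter
import math

def mine_frequent_itemsets(transactions, min_support, max_len=2):
    # Naive enumeration-based miner for illustration only; a real
    # implementation would use Apriori or FP-growth.
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {iset: c / n for iset, c in counts.items() if c / n >= min_support}

def class_profile(transactions, vocabulary):
    # Relative frequency of each ranked itemset within one class.
    n = len(transactions)
    return [sum(1 for t in transactions if set(i) <= set(t)) / n
            for i in vocabulary]

def classify(instance, profiles, vocabulary):
    # Encode the test instance over the ranked itemsets, then pick the
    # class whose profile is closest under cosine similarity.
    v = [1.0 if set(i) <= set(instance) else 0.0 for i in vocabulary]
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (math.sqrt(sum(x * x for x in a))
               * math.sqrt(sum(y * y for y in b)))
        return num / den if den else 0.0
    return max(profiles, key=lambda c: cos(v, profiles[c]))

# Toy training data: two classes with disjoint characteristic items.
train = {
    "A": [["x", "y"], ["x", "y", "z"], ["x"]],
    "B": [["p", "q"], ["p"], ["p", "q", "z"]],
}
all_tx = [t for txs in train.values() for t in txs]
freq = mine_frequent_itemsets(all_tx, min_support=0.3)
# Rank itemsets by support and keep only the top portion of the ranking.
top_k = [i for i, _ in sorted(freq.items(), key=lambda kv: -kv[1])][:5]
profiles = {c: class_profile(txs, top_k) for c, txs in train.items()}
print(classify(["x", "y"], profiles, top_k))  # → A
```

Note how classification requires only one distance computation per class rather than one per training instance, which is the source of the reduction in distance computations claimed above.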
Index Terms
- A novel distance-based classifier built on pattern ranking