ABSTRACT
Instance-based classifiers that compute similarity between instances suffer from noise in the training set and from over-fitting. In this paper we propose a new type of distance-based classifier that, instead of computing distances between instances, computes the distance between each test instance and the classes, both represented as patterns in the space of frequent itemsets. We rank the itemsets by metrics of itemset significance, then retain only the top portion of the ranking that allows the classifier to reach its maximum accuracy. We experimented on a large collection of datasets from the UCI archive with different proximity measures and different itemset-ranking metrics.
We show that our method has several benefits: it reduces the number of distance computations, improves on the classification accuracy of state-of-the-art classifiers such as decision trees, SVM, k-NN, Naive Bayes, rule-based classifiers and association-rule-based ones, and outperforms the competitors especially on noisy data.
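The abstract does not specify the paper's exact mining, ranking, or proximity procedure, so the following is only an illustrative sketch of the general idea: mine frequent itemsets from the training data, rank them by a significance metric (support is used here as a stand-in), build one itemset-frequency profile per class, and assign each test instance to the class whose profile is closest. All function names, the toy miner, and the choice of cosine similarity as the proximity measure are assumptions for illustration, not the authors' method.

```python
from itertools import combinations
from collections import Counter
import math

def mine_frequent_itemsets(transactions, min_support, max_len=2):
    # Naive enumeration-based miner for illustration only; a real
    # implementation would use Apriori or FP-growth.
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {iset: c / n for iset, c in counts.items() if c / n >= min_support}

def class_profile(transactions, vocabulary):
    # Relative frequency of each ranked itemset within one class.
    n = len(transactions)
    return [sum(1 for t in transactions if set(i) <= set(t)) / n
            for i in vocabulary]

def classify(instance, profiles, vocabulary):
    # Encode the test instance over the ranked itemsets, then pick the
    # class whose profile is closest under cosine similarity.
    v = [1.0 if set(i) <= set(instance) else 0.0 for i in vocabulary]
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (math.sqrt(sum(x * x for x in a))
               * math.sqrt(sum(y * y for y in b)))
        return num / den if den else 0.0
    return max(profiles, key=lambda c: cos(v, profiles[c]))

# Toy training data: two classes with disjoint characteristic items.
train = {
    "A": [["x", "y"], ["x", "y", "z"], ["x"]],
    "B": [["p", "q"], ["p"], ["p", "q", "z"]],
}
all_tx = [t for txs in train.values() for t in txs]
freq = mine_frequent_itemsets(all_tx, min_support=0.3)
# Rank itemsets by support and keep only the top portion of the ranking.
top_k = [i for i, _ in sorted(freq.items(), key=lambda kv: -kv[1])][:5]
profiles = {c: class_profile(txs, top_k) for c, txs in train.items()}
print(classify(["x", "y"], profiles, top_k))  # → A
```

Note how classification requires only one distance computation per class rather than one per training instance, which is the source of the reduction in distance computations claimed above.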
Index Terms
- A novel distance-based classifier built on pattern ranking