Abstract
The paper presents a data compression approach to classification, based on a stochastic gradient algorithm that minimizes the average misclassification risk of a Labeled Vector Quantizer. The main strengths of the approach are the efficiency of the learning process and the efficiency and accuracy of the resulting classification process. The approach is compared with the closely related nearest neighbor rule and with two data reduction algorithms, SVM and IB2, in experiments on real data sets from the UCI repository.
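The abstract only sketches the method, so the following Python sketch illustrates, in rough terms, how a Labeled Vector Quantizer classifies and how stochastic gradient training on labeled samples might look. It uses a simple LVQ1-style attract/repel rule; the class name, learning rate, and update rule are illustrative assumptions, not the paper's algorithm, which performs stochastic gradient descent on the average misclassification risk.

```python
import numpy as np

class LabeledVectorQuantizer:
    """Nearest-prototype classifier trained by stochastic gradient updates.

    Minimal, hypothetical sketch: an LVQ1-style update stands in for the
    paper's risk-minimizing gradient step.
    """

    def __init__(self, prototypes, labels, lr=0.05):
        self.w = np.asarray(prototypes, dtype=float)  # codebook vectors
        self.y = np.asarray(labels)                   # one class label per vector
        self.lr = lr                                  # assumed learning rate

    def predict(self, x):
        # Classification cost is O(#prototypes), not O(#training samples):
        # this is the data-compression advantage over the plain nearest
        # neighbor rule noted in the abstract.
        i = np.argmin(np.linalg.norm(self.w - x, axis=1))
        return self.y[i]

    def partial_fit(self, x, label):
        # Stochastic update on one labeled sample: attract the winning
        # prototype if its label is correct, repel it otherwise.
        i = np.argmin(np.linalg.norm(self.w - x, axis=1))
        sign = 1.0 if self.y[i] == label else -1.0
        self.w[i] += sign * self.lr * (np.asarray(x, dtype=float) - self.w[i])

# Example usage (toy data):
lvq = LabeledVectorQuantizer(prototypes=[[0., 0.], [1., 1.]], labels=[0, 1])
lvq.partial_fit([0.9, 1.1], 1)          # attracts the class-1 prototype
print(lvq.predict(np.array([0.2, 0.1])))  # -> 0
```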
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Diamantini, C., Panti, M. (2001). An Efficient Data Compression Approach to the Classification Task. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_47
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4