Abstract
The paper presents a data compression approach to classification, based on a stochastic gradient algorithm that minimizes the average misclassification risk of a Labeled Vector Quantizer. The main strengths of the approach are the efficiency of the learning process and the efficiency and accuracy of the resulting classification process. The approach is compared with the closely related nearest neighbor rule and with two data reduction algorithms, SVM and IB2, in experiments on real data sets from the UCI repository.
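The abstract only sketches the method, so the following Python sketch illustrates, in rough terms, how a Labeled Vector Quantizer classifies and how stochastic gradient training on labeled samples might look. It uses a simple LVQ1-style attract/repel rule; the class name, learning rate, and update rule are illustrative assumptions, not the paper's algorithm, which performs stochastic gradient descent on the average misclassification risk.

```python
import numpy as np

class LabeledVectorQuantizer:
    """Nearest-prototype classifier trained by stochastic gradient updates.

    Minimal, hypothetical sketch: an LVQ1-style update stands in for the
    paper's risk-minimizing gradient step.
    """

    def __init__(self, prototypes, labels, lr=0.05):
        self.w = np.asarray(prototypes, dtype=float)  # codebook vectors
        self.y = np.asarray(labels)                   # one class label per vector
        self.lr = lr                                  # assumed learning rate

    def predict(self, x):
        # Classification cost is O(#prototypes), not O(#training samples):
        # this is the data-compression advantage over the plain nearest
        # neighbor rule noted in the abstract.
        i = np.argmin(np.linalg.norm(self.w - x, axis=1))
        return self.y[i]

    def partial_fit(self, x, label):
        # Stochastic update on one labeled sample: attract the winning
        # prototype if its label is correct, repel it otherwise.
        i = np.argmin(np.linalg.norm(self.w - x, axis=1))
        sign = 1.0 if self.y[i] == label else -1.0
        self.w[i] += sign * self.lr * (np.asarray(x, dtype=float) - self.w[i])

# Example usage (toy data):
lvq = LabeledVectorQuantizer(prototypes=[[0., 0.], [1., 1.]], labels=[0, 1])
lvq.partial_fit([0.9, 1.1], 1)          # attracts the class-1 prototype
print(lvq.predict(np.array([0.2, 0.1])))  # -> 0
```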
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Diamantini, C., Panti, M. (2001). An Efficient Data Compression Approach to the Classification Task. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_47
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4