An Efficient Data Compression Approach to the Classification Task

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2001)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

This paper presents a data compression approach to classification, based on a stochastic gradient algorithm that minimizes the average misclassification risk of a Labeled Vector Quantizer. The main properties of the approach are the efficiency of the learning process and the efficiency and accuracy of the classification process. The approach is compared with the closely related nearest neighbor rule, and with two data reduction algorithms, SVM and IB2, on a set of real-data experiments taken from the UCI repository.


References

  1. D.W. Aha, D. Kibler, and M.K. Albert. Instance-based learning algorithms. Machine Learning, 6(1):37–66, 1991.

  2. T.M. Cover and P.E. Hart. Nearest neighbor pattern classification. IEEE Trans. on Information Theory, 13(1):21–27, Jan. 1967.

  3. C. Diamantini and M. Panti. An efficient and scalable data compression approach to classification. ACM SIGKDD Explorations, 2(2):54–60, 2000.

  4. C. Diamantini and A. Spalvieri. Quantizing for minimum average misclassification risk. IEEE Trans. on Neural Networks, 9(1):174–182, Jan. 1998.

  5. P. Domingos and M. Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103–130, 1997.

  6. J.H. Friedman. On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1(1):55–77, 1997.

  7. A. Gersho and R.M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.

  8. P.E. Hart. The condensed nearest neighbor rule. IEEE Trans. on Information Theory, 14:515–516, May 1968.

  9. T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C.J. Burges, and A.J. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.

  10. T. Kohonen. The self-organizing map. Proc. of the IEEE, 78(9):1464–1480, Sept. 1990.

  11. G.F. McLean. Vector quantization for texture classification. IEEE Trans. on Systems, Man, and Cybernetics, 23:637–649, May 1993.

  12. F.J. Provost and V. Kolluri. A survey of methods for scaling up inductive learning algorithms. Tech. Rep. ISL-97-3, Intelligent Systems Lab., Dept. of Computer Science, Univ. of Pittsburgh, 1997.

  13. B. Schölkopf, C. Burges, and V. Vapnik. Extracting support data for a given task. In U. Fayyad and R. Uthurusamy, editors, Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining, Menlo Park, CA, 1995. AAAI Press.

  14. N.A. Syed, H. Liu, and K.K. Sung. A study of support vectors on model independent example selection. In S. Chaudhuri and D. Madigan, editors, Proc. 5th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, pages 272–276, New York, 1999. ACM Press.

  15. N.A. Syed, H. Liu, and K.K. Sung. Handling concept drift in incremental learning with support vector machines. In S. Chaudhuri and D. Madigan, editors, Proc. 5th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, pages 317–321, New York, 1999. ACM Press.

  16. V. Vapnik. Statistical Learning Theory. J. Wiley and Sons, New York, 1998.

  17. Q. Xie, C.A. Laszlo, and R.K. Ward. Vector quantization technique for nonparametric classifier design. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(12):1326–1330, Dec. 1993.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Diamantini, C., Panti, M. (2001). An Efficient Data Compression Approach to the Classification Task. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science (LNAI), vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_47

  • DOI: https://doi.org/10.1007/3-540-45357-1_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4
