Purity Filtering: An Instance Selection Method for Support Vector Machines

  • Conference paper
Artificial Intelligence XXXVI (SGAI 2019)

Abstract

Support Vector Machines (SVMs) can achieve accuracy comparable to that of Artificial Neural Networks, but they are slower to train. This paper presents a new algorithm, Purity Filtering, which filters the training data for binary-classification SVMs in order to select an approximation of the data subset most relevant to the training process.

The proposed algorithm is parametrized so as to allow regulation of both its space and time complexity, adapting to the needs and capabilities of each execution environment. A user-specified parameter, the purity, indirectly regulates the number of filtered data points; the algorithm has also been adapted to let the user specify that number directly. Applied to real datasets, the algorithm achieved reductions of up to 75% of the training data (training on only 25% of the samples) with no major loss in classification quality.
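The abstract does not give the details of Purity Filtering, so the following is only a generic illustration of instance selection driven by a purity-like score, not the authors' algorithm. It assumes a hypothetical score, the fraction of a point's k nearest neighbours that share its label, and assumes that points with mixed neighbourhoods (low score) lie near the class boundary and are therefore the ones worth keeping for SVM training. All function names and parameters (`neighbourhood_purity`, `select_by_threshold`, `select_by_count`, `k`, `max_purity`, `n_keep`) are illustrative. The two selection functions mirror the two modes mentioned above: choosing a threshold, or choosing the number of retained points directly.

```python
import math


def neighbourhood_purity(X, y, k):
    """Fraction of each point's k nearest neighbours that share its label.

    Hypothetical score for illustration only; brute force, O(n^2 log n).
    """
    purities = []
    for i, xi in enumerate(X):
        # Distances from point i to every other point; ties broken by index.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        same = sum(1 for j in neighbours if y[j] == y[i])
        purities.append(same / len(neighbours))
    return purities


def select_by_threshold(X, y, k, max_purity):
    """Indices of points whose score is at most max_purity (assumed rule)."""
    p = neighbourhood_purity(X, y, k)
    return [i for i, pi in enumerate(p) if pi <= max_purity]


def select_by_count(X, y, k, n_keep):
    """Indices of the n_keep points with the lowest score (assumed rule)."""
    p = neighbourhood_purity(X, y, k)
    order = sorted(range(len(X)), key=lambda i: (p[i], i))
    return sorted(order[:n_keep])


if __name__ == "__main__":
    # Toy 1-D binary problem: class 0 on the left, class 1 on the right.
    X = [(float(v),) for v in range(8)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(select_by_threshold(X, y, k=2, max_purity=0.5))
    print(select_by_count(X, y, k=2, n_keep=4))
```

The retained indices would then be used to subset the training set before fitting the SVM. Whether such a filter preserves accuracy depends on the data and on the score used, which is the question the paper itself evaluates.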

Notes

  1. All the experiments were run on an Intel Core i7 2.8 GHz machine with 16 GB of memory.

Author information

Corresponding author

Correspondence to Lluís A. Belanche-Muñoz.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Morán-Pomés, D., Belanche-Muñoz, L.A. (2019). Purity Filtering: An Instance Selection Method for Support Vector Machines. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_2

  • DOI: https://doi.org/10.1007/978-3-030-34885-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34884-7

  • Online ISBN: 978-3-030-34885-4

  • eBook Packages: Computer Science, Computer Science (R0)