
Fast Algorithm for Prototypes Selection—Trust-Margin Prototypes

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11508)

Abstract

The k nearest neighbour method (kNN) can be applied not only to an entire data set but also to a subset of instances obtained by instance selection. Instance selection should pick prototypes that represent the knowledge about a given problem well. We propose a new prototype selection algorithm based on selecting instances which lie on the borders between classes and which are, additionally, trustworthy. Moreover, the algorithm is optimized with a forest of dedicated locality sensitive hashing (LSH) trees that speeds up both prototype selection and classification. The final expected complexity of the algorithm is \(O(m\log m)\). Results additionally show that the new algorithm lays the ground for accurate classification.
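
The full text details the trust-margin criterion and the forest of dedicated LSH trees; as a rough illustration of the idea sketched in the abstract, the following Python snippet is a minimal, hypothetical sketch, not the authors' implementation. Random-hyperplane LSH stands in for the dedicated LSH trees, neighbours are approximated within a hash bucket, and an instance is kept as a prototype when it is border-adjacent (has at least one opposite-class neighbour) yet trustworthy (its neighbourhood shows a clear same-class majority). The function names and the `trust` threshold are assumptions, not taken from the paper.

```python
import numpy as np

def lsh_buckets(X, n_planes=12, seed=0):
    """Hash instances into buckets with random-hyperplane LSH
    (an assumed stand-in for the paper's dedicated LSH trees)."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_planes))
    codes = (X @ planes > 0).astype(np.uint64)            # (m, n_planes) bit matrix
    keys = codes @ (np.uint64(1) << np.arange(n_planes, dtype=np.uint64))
    buckets = {}
    for i, key in enumerate(keys):
        buckets.setdefault(int(key), []).append(i)
    return buckets

def select_prototypes(X, y, k=5, trust=0.6):
    """Keep instances near a class border (at least one opposite-class
    neighbour) that are also trustworthy (a same-class majority among the
    approximate k nearest neighbours). The 0.6 threshold is hypothetical."""
    keep = []
    for idxs in lsh_buckets(X).values():
        for i in idxs:
            others = [j for j in idxs if j != i]
            if not others:
                continue
            dists = np.linalg.norm(X[others] - X[i], axis=1)
            nn = np.array(others)[np.argsort(dists)[:k]]
            same = float(np.mean(y[nn] == y[i]))
            if trust <= same < 1.0:    # border-adjacent AND trustworthy
                keep.append(i)
    return np.array(sorted(keep))

# Toy usage: select prototypes on synthetic two-class data; a kNN
# classifier restricted to the selected prototypes then performs
# the final classification.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 4)),
                   rng.normal(1.5, 1.0, (200, 4))])
    y = np.repeat([0, 1], 200)
    proto = select_prototypes(X, y)
    print(f"kept {len(proto)} of {len(X)} instances as prototypes")
```

Because hashing touches each instance once and neighbour searches are confined to small buckets, such a scheme can stay near the \(O(m\log m)\) expected cost quoted above when the buckets remain balanced.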


Author information

Corresponding author: Norbert Jankowski.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Jankowski, N., Orliński, M. (2019). Fast Algorithm for Prototypes Selection—Trust-Margin Prototypes. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2019. Lecture Notes in Computer Science (LNAI), vol. 11508. Springer, Cham. https://doi.org/10.1007/978-3-030-20912-4_53

  • DOI: https://doi.org/10.1007/978-3-030-20912-4_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20911-7

  • Online ISBN: 978-3-030-20912-4

  • eBook Packages: Computer Science, Computer Science (R0)
