Abstract
The k nearest neighbour method (kNN) can be applied not only to an entire data set but also to a subset obtained by instance selection. Instance selection should choose prototypes that represent the knowledge about a given problem well. We propose a new prototype selection algorithm based on selecting instances that lie on the borders between classes and are additionally trustworthy. Moreover, the algorithm is optimized with a forest of dedicated locality-sensitive hashing (LSH) trees to speed up both prototype selection and classification. The algorithm's final expected complexity is \(O(m\log m)\). Additionally, the results show that the new algorithm lays the ground for accurate classification.
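The abstract mentions an LSH forest used to accelerate nearest-neighbour queries. The paper's dedicated LSH trees are not reproduced here; purely as an illustrative sketch, the following uses standard random-hyperplane LSH tables (all function names and parameters below are hypothetical, not from the paper) to collect candidate neighbours and then run an exact kNN vote only among those candidates:

```python
import numpy as np

def build_lsh_tables(X, n_tables=5, n_bits=8, rng=None):
    # Random-hyperplane LSH: each table hashes points into buckets
    # keyed by the sign pattern of projections onto random directions.
    rng = np.random.default_rng(rng)
    tables = []
    for _ in range(n_tables):
        planes = rng.standard_normal((n_bits, X.shape[1]))
        keys = X @ planes.T > 0            # (m, n_bits) sign pattern
        buckets = {}
        for i, key in enumerate(map(tuple, keys)):
            buckets.setdefault(key, []).append(i)
        tables.append((planes, buckets))
    return tables

def approx_knn(q, X, y, tables, k=3):
    # Gather candidates from the query's bucket in every table,
    # then vote among the k exactly-nearest candidates.
    cand = set()
    for planes, buckets in tables:
        key = tuple(q @ planes.T > 0)
        cand.update(buckets.get(key, []))
    if len(cand) < k:                      # fall back to exact search
        cand = range(len(X))
    cand = np.fromiter(cand, dtype=int)
    d = np.linalg.norm(X[cand] - q, axis=1)
    nearest = cand[np.argsort(d)[:k]]
    vals, counts = np.unique(y[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```

Because each query only inspects a few buckets instead of all m training points, expected query cost drops well below the brute-force O(m); this is the general idea behind using an LSH forest to speed up kNN-style algorithms.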
© 2019 Springer Nature Switzerland AG
Cite this paper
Jankowski, N., Orliński, M. (2019). Fast Algorithm for Prototypes Selection—Trust-Margin Prototypes. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2019. Lecture Notes in Computer Science(), vol 11508. Springer, Cham. https://doi.org/10.1007/978-3-030-20912-4_53
Print ISBN: 978-3-030-20911-7
Online ISBN: 978-3-030-20912-4