Abstract
The k nearest neighbour method (kNN) can be applied not only to an entire data set but also to a subset obtained by instance selection. Instance selection should choose prototypes that represent the knowledge about a given problem well. We propose a new prototype selection algorithm based on selecting instances that lie on the borders between classes and are additionally trustworthy. Moreover, the algorithm is optimized with a forest of dedicated locality-sensitive hashing (LSH) trees to speed up both prototype selection and classification. The algorithm's final expected complexity is \(O(m\log m)\). Additionally, the results show that the new algorithm lays the ground for accurate classification.
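The abstract mentions an LSH forest used to accelerate nearest-neighbour queries. The paper's dedicated LSH trees are not reproduced here; purely as an illustrative sketch, the following uses standard random-hyperplane LSH tables (all function names and parameters below are hypothetical, not from the paper) to collect candidate neighbours and then run an exact kNN vote only among those candidates:

```python
import numpy as np

def build_lsh_tables(X, n_tables=5, n_bits=8, rng=None):
    # Random-hyperplane LSH: each table hashes points into buckets
    # keyed by the sign pattern of projections onto random directions.
    rng = np.random.default_rng(rng)
    tables = []
    for _ in range(n_tables):
        planes = rng.standard_normal((n_bits, X.shape[1]))
        keys = X @ planes.T > 0            # (m, n_bits) sign pattern
        buckets = {}
        for i, key in enumerate(map(tuple, keys)):
            buckets.setdefault(key, []).append(i)
        tables.append((planes, buckets))
    return tables

def approx_knn(q, X, y, tables, k=3):
    # Gather candidates from the query's bucket in every table,
    # then vote among the k exactly-nearest candidates.
    cand = set()
    for planes, buckets in tables:
        key = tuple(q @ planes.T > 0)
        cand.update(buckets.get(key, []))
    if len(cand) < k:                      # fall back to exact search
        cand = range(len(X))
    cand = np.fromiter(cand, dtype=int)
    d = np.linalg.norm(X[cand] - q, axis=1)
    nearest = cand[np.argsort(d)[:k]]
    vals, counts = np.unique(y[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```

Because each query only inspects a few buckets instead of all m training points, expected query cost drops well below the brute-force O(m); this is the general idea behind using an LSH forest to speed up kNN-style algorithms.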
© 2019 Springer Nature Switzerland AG
Cite this paper
Jankowski, N., Orliński, M. (2019). Fast Algorithm for Prototypes Selection—Trust-Margin Prototypes. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2019. Lecture Notes in Computer Science(), vol 11508. Springer, Cham. https://doi.org/10.1007/978-3-030-20912-4_53
Print ISBN: 978-3-030-20911-7
Online ISBN: 978-3-030-20912-4