Abstract
Machine learning algorithms spend considerable time processing data because they do not scale well to huge data sets. Instance selection algorithms aim to tackle this problem in particular; however, even they can suffer from it. We propose a new unsupervised instance selection algorithm based on conjectural hyper-rectangles. In this study, the proposed algorithm is compared with one conventional and four state-of-the-art instance selection algorithms on fifty-five data sets from different domains. The experimental results demonstrate the superiority of the proposed algorithm in terms of classification accuracy, reduction rate, and running time. The time and space complexities of the proposed algorithm are log-linear and linear, respectively. Furthermore, the proposed algorithm achieves a favorable accuracy-reduction trade-off without sacrificing reduction rates. The source code of the proposed algorithm and the data sets are available at https://github.com/fatihaydin1/NIS for computational reproducibility.
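The abstract describes selection via hyper-rectangles but carries no algorithmic detail, so the following is only a loose illustration of the general family of methods, not the paper's algorithm: the feature space is partitioned into an axis-aligned grid of hyper-rectangles and one representative instance is kept per occupied cell. The function name `hyperrectangle_select` and the `bins` parameter are invented for this sketch; note that the dominant cost is the row-wise sort inside `np.unique`, which is log-linear in the number of instances, consistent with the complexity class the abstract claims.

```python
import numpy as np

def hyperrectangle_select(X, bins=5):
    """Keep one representative instance per occupied hyper-rectangle.

    Generic sketch of hyper-rectangle-based instance selection: split
    each feature's range into `bins` intervals, forming an axis-aligned
    grid, and retain the first instance seen in each occupied cell.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # avoid division by zero
    cells = np.floor((X - lo) / span * bins).astype(int)
    cells = np.clip(cells, 0, bins - 1)              # maxima land in the last bin
    # one index per unique grid cell (first occurrence wins)
    _, keep = np.unique(cells, axis=0, return_index=True)
    return np.sort(keep)

X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [9.9, 9.9]])
idx = hyperrectangle_select(X, bins=5)
# instances 0 and 1 fall in the same cell, so three of four survive
print(len(idx))  # 3
```

A k-NN classifier trained on `X[idx]` would then stand in for one trained on the full set; the accuracy-reduction trade-off mentioned in the abstract corresponds to tuning the grid resolution.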




Data availability
This study uses existing data, which is openly available at locations cited in the footnotes.
Ethics declarations
Conflict of interest
The authors have declared that no competing interests exist.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Aydin, F. Unsupervised instance selection via conjectural hyperrectangles. Neural Comput & Applic 35, 5335–5349 (2023). https://doi.org/10.1007/s00521-022-07974-z