
Unsupervised instance selection via conjectural hyperrectangles

  • Original Article
Neural Computing and Applications

Abstract

Machine learning algorithms spend a lot of time processing data because they are not fast enough to handle huge data sets. Instance selection algorithms aim specifically to tackle this problem; however, even instance selection algorithms can suffer from it. We propose a new unsupervised instance selection algorithm based on conjectural hyperrectangles. In this study, the proposed algorithm is compared with one conventional and four state-of-the-art instance selection algorithms on fifty-five data sets from different domains. The experimental results demonstrate the superiority of the proposed algorithm in terms of classification accuracy, reduction rate, and running time. The time and space complexities of the proposed algorithm are log-linear and linear, respectively. Furthermore, the proposed algorithm can obtain better accuracy through an accuracy-reduction trade-off without drastically lowering reduction rates. The source code of the proposed algorithm and the data sets are available at https://github.com/fatihaydin1/NIS for computational reproducibility.
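The abstract does not describe how the conjectural hyperrectangles are constructed, so the following is only a hypothetical Python sketch of the general idea of unsupervised, hyperrectangle-based instance selection, not the author's method. It swaps in a plain equal-width grid as the hyperrectangle partition; the names `hyperrectangle_select`, `reduction_rate`, and `cells_per_dim` are illustrative assumptions and do not come from the NIS repository. Class labels are never used. Each instance is mapped to a grid cell, the cell keys are sorted (the log-linear step) so that members of the same hyperrectangle become adjacent, and one representative (the member closest to the cell centroid) is kept per occupied cell. The reduction rate is computed as 1 - |S|/|T|, the usual definition in instance selection studies.

```python
import numpy as np


def hyperrectangle_select(X, cells_per_dim=4):
    """Keep one instance per occupied axis-aligned hyperrectangle (grid cell).

    Hypothetical illustration: an equal-width grid stands in for the
    article's conjectural hyperrectangles. Labels are not used, so the
    selection is unsupervised. Assumes X has at least one row.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Avoid division by zero on constant features.
    span = np.where(hi > lo, hi - lo, 1.0)
    # Integer grid coordinates of the cell that contains each instance.
    cells = np.floor((X - lo) / span * cells_per_dim).astype(int)
    # The per-feature maximum would fall just outside the grid; clamp it into the last cell.
    cells = np.minimum(cells, cells_per_dim - 1)

    # Lexicographic sort of the cell keys: O(n log n) for fixed dimensionality.
    # Instances sharing a cell become adjacent runs in `order`.
    order = np.lexsort(cells.T[::-1])
    sorted_cells = cells[order]
    new_run = np.ones(len(X), dtype=bool)
    new_run[1:] = np.any(sorted_cells[1:] != sorted_cells[:-1], axis=1)
    starts = np.flatnonzero(new_run)
    ends = np.append(starts[1:], len(X))

    selected = []
    for start, end in zip(starts, ends):
        members = order[start:end]
        centroid = X[members].mean(axis=0)
        # Representative: the member nearest to the cell centroid.
        dists = np.linalg.norm(X[members] - centroid, axis=1)
        selected.append(members[np.argmin(dists)])
    return np.sort(np.array(selected))


def reduction_rate(n_original, n_selected):
    """Fraction of the original instances that were discarded."""
    return 1.0 - n_selected / n_original


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    idx = hyperrectangle_select(X, cells_per_dim=5)
    print(len(idx), reduction_rate(len(X), len(idx)))
```

In this sketch, `cells_per_dim` directly controls the accuracy-reduction trade-off mentioned in the abstract: a coarser grid keeps fewer instances (higher reduction rate) at the cost of a cruder summary of the data.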


Fig. 1
Fig. 2
Fig. 3
Fig. 4


Data availability

This study uses existing data, which is openly available at locations cited in the footnotes.

Notes

  1. http://archive.ics.uci.edu/ml.

  2. https://www.mathworks.com/help/stats/sample-data-sets.html.

  3. https://www.openml.org/s/88/data.


Author information

Correspondence to Fatih Aydin.

Ethics declarations

Conflict of interests

The author has declared no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aydin, F. Unsupervised instance selection via conjectural hyperrectangles. Neural Comput & Applic 35, 5335–5349 (2023). https://doi.org/10.1007/s00521-022-07974-z

  • DOI: https://doi.org/10.1007/s00521-022-07974-z
