
Data reduction based on NN-kNN measure for NN classification and regression

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Data reduction is designed not only to reduce the amount of data but also to suppress noise. In this study, we focus on sample reduction algorithms for classification and regression data. We propose a sample quality evaluation measure, denoted NN-kNN, inspired by human social behavior. The measure is local, so it can evaluate sample quality accurately even when the data distribution is uneven and irregular. It is also easy to interpret and applies to both supervised and unsupervised data. Building on the NN-kNN measure, we then develop sample reduction algorithms for classification data and for regression data, respectively. Experiments verify both the proposed quality evaluation measure and the data reduction algorithms: the results show that NN-kNN evaluates data quality effectively, and that the high-quality samples selected by the reduction algorithms yield strong classification and prediction performance. The robustness of the sample reduction algorithms is also validated.
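The abstract does not spell out how NN-kNN is computed, but its name suggests a reverse-nearest-neighbor style score: a sample is considered high quality when many same-class samples count it among their k nearest neighbors. The following Python sketch illustrates that reading only; the functions `nn_knn_quality` and `reduce_samples`, the label-agreement scoring rule, and the `keep_ratio` cutoff are all assumptions for illustration, not the paper's definition.

```python
import numpy as np

def nn_knn_quality(X, y, k=5):
    """Hypothetical reverse-kNN quality score (an assumed reading of
    NN-kNN, not the paper's exact definition): a sample scores higher
    the more other samples both pick it as one of their k nearest
    neighbors and share its label."""
    n = len(X)
    # Pairwise Euclidean distances, with self-distances masked out.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    # Indices of each sample's k nearest neighbors.
    knn = np.argsort(dist, axis=1)[:, :k]
    score = np.zeros(n)
    for i in range(n):
        for j in knn[i]:
            # j earns credit when a same-label sample i counts j
            # among its k nearest neighbors.
            score[j] += y[i] == y[j]
    return score

def reduce_samples(X, y, k=5, keep_ratio=0.8):
    """Keep the highest-scoring samples as the reduced training set."""
    quality = nn_knn_quality(X, y, k)
    keep = np.argsort(quality)[::-1][: int(keep_ratio * len(X))]
    return X[keep], y[keep]

# Usage on toy data: the reduced set can then be fed to any kNN classifier.
X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)
X_red, y_red = reduce_samples(X, y, k=5, keep_ratio=0.8)
```

For regression data, the exact-label agreement test would presumably be replaced by a closeness criterion on target values, in the spirit of the abstract's separate treatment of the two settings.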





Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (61976027, U1808205) and the Natural Science Foundation of Hebei Province of China (A2018501040).

Author information

Corresponding author: Shuang An.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

An, S., Hu, Q., Wang, C. et al. Data reduction based on NN-kNN measure for NN classification and regression. Int. J. Mach. Learn. & Cyber. 13, 765–781 (2022). https://doi.org/10.1007/s13042-021-01327-3


