A fuzzy K-nearest neighbor classifier to deal with imperfect data

Cadenas, Jose M.; Garrido, M. Carmen; Martínez, Raquel; Muñoz, Enrique; Bonissone, Piero P.

doi:10.1007/s00500-017-2567-x

Methodologies and Application
Published: 01 April 2017

Volume 22, pages 3313–3330, (2018)
Soft Computing

Jose M. Cadenas¹,
M. Carmen Garrido¹,
Raquel Martínez²,
Enrique Muñoz³ &
Piero P. Bonissone⁴

Abstract

The k-nearest neighbors method (kNN) is a nonparametric, instance-based method used for regression and classification. To classify a new instance, the kNN method finds its k nearest neighbors and generates a class value from them. Usually, this method requires that the information available in the datasets be precise and accurate, except for the existence of missing values. However, data imperfection is inevitable in real-world scenarios. In this paper, we present the kNN$_{imp}$ classifier, a k-nearest neighbors method that performs classification from datasets with imperfect values. The importance of each neighbor in the output decision depends on its relative distance and its degree of imperfection. Furthermore, through external parameters, the classifier lets us define the maximum allowed imperfection and decide whether the final output is derived solely from the class with the greatest weight (the best class) or from the best class and a weighted combination of the classes closest to the best one. To test the proposed method, we performed several experiments with both synthetic and real-world datasets with imperfect data. The results, validated through statistical tests, show that the kNN$_{imp}$ classifier is robust when working with imperfect data and maintains good performance when compared with other methods in the literature, applied to datasets with or without imperfection.
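
The abstract does not state the weighting formulas, so the following is only a minimal Python sketch of the general idea it describes, not the kNN$_{imp}$ algorithm itself. Everything specific in it is an assumption made for illustration: imperfect values are represented as intervals on a [0, 1] scale, the imperfection degree is the mean interval width, the distance is Euclidean between interval midpoints, the relative-distance weight follows the classical distance-weighted kNN rule, and the names `knn_imp_sketch`, `max_imperfection` and `closeness` are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Assumed representation: each attribute value is an interval (lo, hi) on a
# [0, 1]-normalised scale; a crisp value v is written as (v, v).

def imperfection(example):
    """Assumed imperfection degree: mean interval width, in [0, 1]."""
    return sum(hi - lo for lo, hi in example) / len(example)

def distance(a, b):
    """Assumed distance: Euclidean distance between interval midpoints."""
    return sqrt(sum(((al + ah) / 2 - (bl + bh) / 2) ** 2
                    for (al, ah), (bl, bh) in zip(a, b)))

def knn_imp_sketch(train, query, k=3, max_imperfection=0.5, closeness=1.0):
    """Return the set of classes predicted for `query`.

    train: list of (example, label) pairs.
    max_imperfection: examples above this degree are ignored (assumed rule).
    closeness: classes with weight >= closeness * best weight are returned;
               1.0 returns only the best class (or tied best classes).
    """
    usable = [(distance(x, query), imperfection(x), label)
              for x, label in train if imperfection(x) <= max_imperfection]
    neighbours = sorted(usable, key=lambda t: t[0])[:k]
    if not neighbours:
        return set()
    d_min, d_max = neighbours[0][0], neighbours[-1][0]
    votes = defaultdict(float)
    for d, imp, label in neighbours:
        # Relative-distance weight: 1 for the nearest neighbour, 0 for the
        # farthest one, and 1 for all of them if they are equidistant.
        rel = 1.0 if d_max == d_min else (d_max - d) / (d_max - d_min)
        # The more imperfect a neighbour is, the less its vote counts.
        votes[label] += rel * (1.0 - imp)
    best = max(votes.values())
    return {label for label, w in votes.items() if w >= closeness * best}

train = [
    ([(0.10, 0.10), (0.20, 0.20)], "a"),   # crisp example
    ([(0.15, 0.15), (0.20, 0.30)], "a"),   # one interval-valued attribute
    ([(0.90, 0.90), (0.80, 0.80)], "b"),
    ([(0.00, 1.00), (0.00, 1.00)], "b"),   # fully imprecise, filtered out
]
query = [(0.12, 0.12), (0.20, 0.20)]
prediction = knn_imp_sketch(train, query, k=3)
```

Lowering `closeness` below 1.0 makes the sketch return every class whose accumulated weight is close enough to the best one, which mirrors the choice the abstract describes between a single best class and a combination of nearby classes.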

Notes

For example, the fuzzy entropy (Ent($\cdot$)) and the power of fuzzy sets (Pw($\cdot$)) defined by DeLuca and Termini (1972) are the following:
$$\begin{aligned} \mathrm{Ent}(A) &= -\sum_{a\in A} \bigl(\mu_A(a)\log \mu_A(a) + (1-\mu_A(a))\log (1-\mu_A(a))\bigr); \\ \mathrm{Pw}(A) &= \sum_{a\in A}\mu_A(a) \end{aligned}$$
where $A$ is a fuzzy set with membership function $\mu_A$, and in the case of continuous fuzzy sets the sum is understood as an integral.
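
As an illustration, both measures can be computed for a discrete fuzzy set given as a list of membership degrees. This is a minimal sketch under stated assumptions: the function names `fuzzy_entropy` and `fuzzy_power` are hypothetical, the natural logarithm is used, the convention $0 \log 0 = 0$ is applied, and the entropy carries the leading minus sign of the De Luca and Termini definition so that it is non-negative.

```python
from math import log

def _check(memberships):
    """Reject membership degrees outside [0, 1]."""
    for m in memberships:
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"membership degree out of [0, 1]: {m}")

def fuzzy_entropy(memberships):
    """Entropy of a discrete fuzzy set, using the convention 0*log(0) = 0."""
    _check(memberships)

    def term(m):
        # Crisp degrees (0 or 1) carry no fuzziness and contribute nothing.
        if m in (0.0, 1.0):
            return 0.0
        return m * log(m) + (1.0 - m) * log(1.0 - m)

    return -sum(term(m) for m in memberships)

def fuzzy_power(memberships):
    """Power of a discrete fuzzy set: the sum of its membership degrees."""
    _check(memberships)
    return sum(memberships)

crisp = [0.0, 1.0, 1.0]          # an ordinary set: no fuzziness
fuzzy = [0.5, 0.5, 0.5]          # maximally fuzzy degrees
entropy_crisp = fuzzy_entropy(crisp)
entropy_fuzzy = fuzzy_entropy(fuzzy)
power_fuzzy = fuzzy_power(fuzzy)
```

A crisp set has zero entropy, while each membership degree of 0.5 contributes the maximum amount, $\log 2$, so the entropy acts as a measure of how imprecise a fuzzy value is.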

References

Aha DW (1992) Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. Int J Man-Mach Stud 36(2):267–287
Aha DW, Kibler D, Albert KM (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66
Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) Keel data-mining software tool: data set repository, integration of algorithm and experimental analysis framework. J Mult-Valued Logic Soft Comput 17(2–3):255–287
Barua A, Mudunuri LS, Kosheleva O (2014) Why trapezoidal and triangular membership functions work so well: towards a theoretical explanation. J Uncertain Syst 8(3):164–168
Berlanga F, Rivas AR, del Jesús M, Herrera F (2010) Gp-coach genetic programming-based learning of compact and accurate fuzzy rule-based classification systems for high-dimensional problems. Inf Sci 180(8):1183–1200
Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New York
Bonissone PP, Cadenas JM, Garrido MC, Díaz-Valladares RA (2010) A fuzzy random forest. Int J Approx Reason 51(7):729–747
Cadenas JM, Garrido MC, Martínez R (2013) Nip—an imperfection processor to data mining datasets. Int J Comput Intell Syst 6(1):3–17
Cadenas JM, Garrido MC, Martínez R, Bonissone PP (2012) Extending information processing in a fuzzy random forest. Soft Comput 16(6):845–861
Cano A, Zafra A, Ventura S (2013) Weighted data gravitation classification for standard and imbalanced data. IEEE Trans Cybern 43(6):1672–1687
Clare A, King R (2001) Knowledge discovery in multi-label phenotype data. In: Proceedings of the 5th European conference on principles of data mining and knowledge discovery, Freiburg, pp 42–53
Cover T, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Crockett K, Bandar Z, McLean D (2001) Growing a fuzzy decision forest. In: Proceedings of the 10th IEEE international conference on fuzzy systems, Melbourne, pp 614–617
DeLuca A, Termini S (1972) A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory. Inf Control 20(4):301–312
Derrac J, García S, Herrera F (2014) Fuzzy nearest neighbor algorithms: taxonomy, experimental analysis and prospects. Inf Sci 260:98–119
Diamond P, Kloeden P (1994) Metric spaces of fuzzy sets: theory and applications. World Scientific Publishing, London
Dombi J, Porkolab L (1991) Measures of fuzziness. Ann Univ Sci Bp Sect Comput 12:69–78
Dubois D, Prade H (1980) Fuzzy sets and systems: theory and applications. Academic Press, New York
Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
Fernández A, del Jesús M, Herrera F (2009) Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets. Int J Approx Reason 50(3):561–577
Fix E, Hodges J (1989) Discriminatory analysis, nonparametric discrimination: consistency properties. Int Stat Rev 57(3):238–247
García S, Fernández A, Luengo J, Herrera F (2009) A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Comput 13(10):959–977
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064
Garrido MC, Cadenas JM, Bonissone PP (2010) A classification and regression technique to handle heterogeneous and imperfect information. Soft Comput 14(11):1165–1185
Huang Z (2002) A fuzzy k-modes algorithm for clustering categorical data. IEEE Trans Fuzzy Syst 7(4):446–452
Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Gr Stat 5(3):299–314. http://www.r-project.org/
Inoue T, Abe S (2001) Fuzzy support vector machines for pattern classification. In: Proceedings of international joint conference on neural networks, Washington, pp 1449–1454
Ishibuchi H, Yamamoto T (2005) Rule weight specification in fuzzy rule-based classification systems. IEEE Trans Fuzzy Syst 13(4):428–436
Jahromi MZ, Parvinnia E, John R (2009) A method of learning weighted similarity function to improve the performance of nearest neighbor. Inf Sci 179(17):2964–2973
Janikow CZ (1998) Fuzzy decision trees: issues and methods. IEEE Trans Syst Man Cybern Part B 28(1):1–14
Janikow CZ (2003) Fuzzy decision forest. In: Proceedings of the 22nd international conference of the North American fuzzy information processing society, Chicago, pp 480–483
Johanyák ZC, Kovács S (2005) Distance based similarity measures of fuzzy sets. In: Proceedings of the 3rd Slovakian-Hungarian joint symposium on applied machine intelligence, Herlany, pp 265–276
Kaufmann A (1975) Introduction to the theory of fuzzy subsets: fundamental theoretical elements. Academic Press, New York
Lee K, Lee K, Lee J (1999) A fuzzy decision tree induction method for fuzzy data. In: Proceedings of IEEE international fuzzy systems conference, Seoul, pp 16–21
Li D, Gu H, Zhang L (2010) A fuzzy c-means clustering algorithm based on nearest-neighbor intervals for incomplete data. Exp Syst Appl 37(10):6942–6947
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml, University of California, School of Information and Computer Sciences, Irvine
Lin C, Wang S (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multi-label learning. Pattern Recognit 45(9):3084–3104
Marsala C (2009) Data mining with ensembles of fuzzy decision trees. In: Proceedings of IEEE symposium on computational intelligence and data mining, Nashville, pp 348–354
Michie D, Spiegelhalter D, Taylor C (1994) Machine learning, neural and statistical classification. Ellis Horwood, Upper Saddle River
Mitra S, Pal SK (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Netw 6(1):51–63
Moore RE (1979) Methods and applications of interval analysis. (SIAM) Studies in Applied Mathematics 2, Soc for Industrial and Applied Math, Philadelphia
Nauck D, Kruse R (1997) A neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets Syst 89(3):277–288
Olaru C, Wehenkel L (2003) A complete fuzzy decision tree technique. Fuzzy Sets Syst 138(2):221–254
Otero A, Otero J, Sánchez L, Villar JR (2006) Longest path estimation from inherently fuzzy data acquired with GPS using genetic algorithms. In: Proceedings of the international symposium on evolving fuzzy systems, Lancaster, pp 300–305
Palacios AM, Sánchez L, Couso I (2009) Extending a simple genetic cooperative-competitive learning fuzzy classifier to low quality datasets. Evolut Intell 2(1):73–84
Palacios AM, Sánchez L, Couso I (2010) Diagnosis of dyslexia with low quality data with genetic fuzzy systems. Int J Approx Reason 51(8):993–1009
Palacios AM, Sánchez L, Couso I (2011) Future performance modeling in athletism with low quality data-based genetic fuzzy systems. J Mult-Valued Logic Soft Comput 17:207–228
Palacios AM, Sánchez L, Couso I (2012) Boosting of fuzzy rules with low quality data. J Mult-Valued Logic Soft Comput 19:591–619
Palacios AM, Sánchez L, Couso I (2013) An extension of the furia classification algorithm to low quality data. Hybrid artificial intelligent systems (LNCS 8073). Springer, Berlin, pp 679–688
Palacios AM, Palacios JL, Sánchez L, Alcalá-Fdez J (2015) Genetic learning of the membership functions for mining fuzzy association rules from low quality data. Inf Sci 295:358–378
Paredes R, Vidal E (2006) Learning prototypes and distances: a prototype reduction technique based on nearest neighbor error minimization. Pattern Recognit 39(2):180–188
Paredes R, Vidal E (2006) Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Trans Pattern Anal Mach Intell 28(7):1100–1110
Ralescu AL, Ralescu DA (1984) Probability and fuzziness. Inf Sci 34(2):85–92
Rumelhart DE, McClelland JL (1986) Parallel distributed processing. MIT Press, Cambridge
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B (Methodological) 36(2):111–147
Torra V (2005) Fuzzy c-means for fuzzy hierarchical clustering. In: Proceedings of the 14th IEEE international conference on fuzzy systems, Reno, pp 646–651
Villar J, Otero A, Otero J, Sánchez L (2009) Taximeter verification using imprecise data from GPS. Eng Appl Artif Intell 22(2):250–260
Wang J, Neskovic P, Cooper LN (2007) Improving nearest neighbor rule with a simple adaptive distance measure. Pattern Recognit Lett 28(2):207–213
Wilson DR, Martinez TR (2000) An integrated instance-based learning algorithm. Comput Intell 16(1):1–28
Witten IH, Frank E, Hall MA (2011) Data mining, 3rd edn. Morgan Kaufmann Publishers, San Francisco
Younes Z, Abdallah F, Denoeux T (2010) Fuzzy multi-label learning under veristic variables. In: Proceedings of the IEEE international conference on fuzzy systems, Yantai, pp 1–8
Zadeh L (1965) Fuzzy sets. Inf Control 8:338–353

Acknowledgements

Supported by the project TIN2014-52099-R (EDISON) granted by the Ministry of Economy and Competitiveness of Spain (including ERDF support).

Author information

Authors and Affiliations

Department of Information and Communications Engineering, University of Murcia, Murcia, Spain
Jose M. Cadenas & M. Carmen Garrido
Department of Computer Engineering, Catholic University of Murcia, Murcia, Spain
Raquel Martínez
Department of Computer Science, Università Degli Studi di Milano, Crema, Italy
Enrique Muñoz
Piero P Bonissone Analytics, LLC, San Diego, CA, USA
Piero P. Bonissone

Corresponding author

Correspondence to Jose M. Cadenas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Cadenas, J.M., Garrido, M.C., Martínez, R. et al. A fuzzy K-nearest neighbor classifier to deal with imperfect data. Soft Comput 22, 3313–3330 (2018). https://doi.org/10.1007/s00500-017-2567-x

Published: 01 April 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s00500-017-2567-x
