Abstract
Real-world data collections are often heterogeneous: their objects are described by attributes of mixed data types (numerical, categorical and fuzzy). Since most available similarity measures apply to only one type of data, an appropriate similarity measure must be constructed for comparing such complex data. In this paper, a framework of new, unified similarity measures is proposed for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approach to heterogeneous data comparison and clustering.
Acknowledgments
The authors are grateful to the anonymous reviewers for their comments, which helped improve this manuscript. They would like to acknowledge Alexandru Ardelean’s contribution to the collection of the student-accommodations data set. They also gratefully acknowledge Shaista Rashid and Anna Palczewska for their continuous help and good suggestions.
Additional information
Communicated by V. Loia.
Cite this article
Bashon, Y., Neagu, D. & Ridley, M.J. A framework for comparing heterogeneous objects: on the similarity measurements for fuzzy, numerical and categorical attributes. Soft Comput 17, 1595–1615 (2013). https://doi.org/10.1007/s00500-012-0974-6
DOI: https://doi.org/10.1007/s00500-012-0974-6