A framework for comparing heterogeneous objects: on the similarity measurements for fuzzy, numerical and categorical attributes

  • Methodologies and Application
Soft Computing

Abstract

Real-world data collections are often heterogeneous, with objects described by a mix of attribute types: numerical, categorical and fuzzy. Since most available similarity measures can be applied to only one type of data, an appropriate similarity measure is needed for comparing such complex data. In this paper, a framework of new, unified similarity measures is proposed for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes. Examples are used to illustrate, compare and discuss the application and efficiency of the proposed approach to heterogeneous data comparison and clustering.

Acknowledgments

The authors are grateful to the anonymous reviewers for their comments, which helped improve this manuscript. They would like to acknowledge Alexandru Ardelean’s contribution to the collection of the student-accommodations data set. They also gratefully acknowledge Shaista Rashid and Anna Palczewska for their continuous help and good suggestions.

Author information

Corresponding author

Correspondence to Yasmina Bashon.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Bashon, Y., Neagu, D. & Ridley, M.J. A framework for comparing heterogeneous objects: on the similarity measurements for fuzzy, numerical and categorical attributes. Soft Comput 17, 1595–1615 (2013). https://doi.org/10.1007/s00500-012-0974-6

  • DOI: https://doi.org/10.1007/s00500-012-0974-6
