A New Approach for Calculating Similarity of Categorical Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 206)

Abstract

Similarity measures are important in data mining techniques such as clustering, nearest-neighbor classification, and outlier detection [1][4]. Many similarity measures have been proposed. For numeric data, many of them are based on the Minkowski distance. Although similarity measures for categorical data have been studied for a long time, they still have many open issues. The main issue is understanding the relationship between categorical attribute values: for categorical data, similarity is not as clearly defined as it is for numeric data. In this paper, we propose a new approach to understanding the relationship between categorical values. The approach uses an artificial neural network to extract significant features for computing the distance between two categorical data objects.
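
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper: it only places the Minkowski distance for numeric vectors next to the simple overlap (matching) measure, a common baseline for categorical data. The function names and the example attributes (colour, shape, size) are hypothetical and chosen for illustration.

```python
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def overlap_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical objects agree."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must be non-empty and have the same number of attributes")
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Numeric objects: the size of each coordinate difference is meaningful.
print(minkowski_distance([0.0, 0.0], [3.0, 4.0], p=2))  # Euclidean (p = 2)
print(minkowski_distance([0.0, 0.0], [3.0, 4.0], p=1))  # Manhattan (p = 1)

# Categorical objects (hypothetical attributes: colour, shape, size).
a = ("red", "round", "small")
b = ("red", "square", "small")
c = ("orange", "round", "small")

# Both pairs differ in exactly one attribute, so the overlap measure
# cannot tell them apart, whatever the relationship between the values.
print(overlap_similarity(a, b), overlap_similarity(a, c))
```

The overlap measure treats every mismatch as equally large, so it scores the pair differing in "red" versus "orange" the same as the pair differing in "round" versus "square". Capturing how strongly two categorical values are related is the gap the abstract describes.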

References

  1. Boriah, S., Chandola, V., Kumar, V.: Similarity Measures for Categorical Data: A Comparative Evaluation. In: ACM Computing Surveys (CSUR), pp. 243–254 (2008)

  2. Gershenson, C.: Artificial Neural Networks for Beginners (2003)

  3. Hornik, K.: Multilayer Feedforward Networks are Universal Approximators. Neural Networks 2, 359–366 (1989)

  4. Kelil, A., Wang, S.: SCS: A New Similarity Measure for Categorical Sequences. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 343–352 (2008)

  5. Li, X., Hwang, M.Y., Kim, H., Park, K.S., Bae, K.H., Ryu, K.H.: Extracting Method of Significant Features from Categorical Data. In: International Symposium on Remote Sensing (2010)

  6. Ahmad, A., Dey, L.: A Method to Compute Distance between two Categorical Values of Same Attribute in Unsupervised Learning for Categorical Data Set. Pattern Recognition Letters 28, 110–118 (2007)

  7. Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy: The Principles and Practice of Numerical Classification (1973)

  8. Metzler, D., Dumais, S., Meek, C.: Similarity Measures for Short Segments of Text. LNCS, pp. 16–27 (2007)

  9. Yi, W.: Artificial Neural Networks (2005)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, C.H., Li, X., Lee, Y.K., Pok, G., Ryu, K.H. (2011). A New Approach for Calculating Similarity of Categorical Data. In: Lee, G., Howard, D., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2011. Communications in Computer and Information Science, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24106-2_74

  • DOI: https://doi.org/10.1007/978-3-642-24106-2_74

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24105-5

  • Online ISBN: 978-3-642-24106-2

  • eBook Packages: Computer Science, Computer Science (R0)