Abstract
Similarity is a measure of the strength of the relationship between two objects, that is, of how closely they resemble each other. Because objects come in different types, different similarity calculation methods apply to each type. Similarity calculation is widely used in data classification and is the basis of object classification. In this paper, data objects are divided into three types: numerical, non-numeric, and mixed, and similarity calculation methods for each type are discussed. Finally, we illustrate the application of similarity in data classification and data clustering.
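The abstract names three object types but gives no formulas. As an illustration only, the Python sketch below implements common textbook measures for each type. It is an assumption that these match the measures the paper discusses. The function names (`minkowski_distance`, `distance_to_similarity`, `simple_matching_similarity`, `mixed_similarity`, `nearest_neighbor_label`) are hypothetical, the Gower-style averaging used for mixed records is a swapped-in standard technique, and the mapping `1 / (1 + d)` from a distance to a similarity is only one common choice.

```python
# Illustrative sketch: assumed textbook measures with hypothetical names,
# not the paper's own implementation.
import math
from typing import Optional, Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Numerical data: p=1 gives Manhattan, p=2 Euclidean, p=math.inf Chebyshev."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # Chebyshev: largest coordinate difference
    return sum(d ** p for d in diffs) ** (1.0 / p)


def distance_to_similarity(d: float) -> float:
    """Assumed mapping: distance 0 -> similarity 1, large distance -> near 0."""
    return 1.0 / (1.0 + d)


def simple_matching_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Non-numeric (nominal) data: fraction of attributes on which a and b agree."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(u == v for u, v in zip(a, b)) / len(a)


def mixed_similarity(a: Sequence, b: Sequence, ranges: Sequence[Optional[float]]) -> float:
    """Mixed data, Gower-style (assumed): average of per-attribute similarities.

    ranges[i] is the value range of numeric attribute i over the data set,
    or None if attribute i is nominal.
    """
    scores = []
    for u, v, r in zip(a, b, ranges):
        if r is None:        # nominal attribute: exact match scores 1, else 0
            scores.append(1.0 if u == v else 0.0)
        elif r == 0:         # numeric attribute that is constant in the data set
            scores.append(1.0)
        else:                # numeric attribute: range-normalised difference
            scores.append(1.0 - abs(u - v) / r)
    return sum(scores) / len(scores)


def nearest_neighbor_label(query, training, p: float = 2.0):
    """1-NN classification: label of the training point closest to query."""
    _, label = min(training, key=lambda item: minkowski_distance(query, item[0], p))
    return label


if __name__ == "__main__":
    x, y = (0.0, 0.0), (3.0, 4.0)
    print(minkowski_distance(x, y, 1))         # 7.0  (Manhattan)
    print(minkowski_distance(x, y, 2))         # 5.0  (Euclidean)
    print(minkowski_distance(x, y, math.inf))  # 4.0  (Chebyshev)
    print(round(simple_matching_similarity(["red", "small", "round"],
                                           ["red", "large", "round"]), 3))  # 0.667
    print(nearest_neighbor_label((2.0, 1.5), [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]))  # A
```

Setting `p` selects the Manhattan (p = 1), Euclidean (p = 2), or Chebyshev (p = ∞) distance, so one function covers the numerical case. The 1-NN helper shows, in its simplest form, how a distance measure feeds directly into classification; clustering algorithms such as k-means use the same distance in their assignment step.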
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Teng, S., Li, J., Li, R., Zhang, W. (2014). The Calculation of Similarity and Its Application in Data Mining. In: Zu, Q., Vargas-Vera, M., Hu, B. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2013. Lecture Notes in Computer Science, vol 8351. Springer, Cham. https://doi.org/10.1007/978-3-319-09265-2_57
DOI: https://doi.org/10.1007/978-3-319-09265-2_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09264-5
Online ISBN: 978-3-319-09265-2
eBook Packages: Computer Science, Computer Science (R0)