Abstract
In this paper, we propose a framework that maps categorical data into a numerical data space via a reference set, so that existing numerical clustering algorithms can be applied directly to the generated image data set and the data can be visualized. Using statistical theory, we analyze the framework and give the conditions under which the data mapping is efficient and yet preserves a flexible property of the original data, namely that data points within the same cluster are more similar to each other. The algorithm is simple and is effective under certain conditions. Experimental evaluation on numerous categorical data sets shows that it not only outperforms related data mapping approaches but also beats some categorical clustering algorithms in terms of effectiveness and efficiency.
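The abstract does not spell out the mapping, so the following is only a minimal sketch of the general idea of a reference-set mapping, not the R-Map algorithm itself. It assumes that each categorical record is represented by its dissimilarity to every point in a small reference set, and it assumes a simple mismatch count (Hamming distance) as the dissimilarity. The names `mismatch_count` and `map_to_reference_space`, the toy records, and the choice of reference points are all hypothetical.

```python
def mismatch_count(a, b):
    """Number of attributes on which two categorical records differ (assumed dissimilarity)."""
    return sum(x != y for x, y in zip(a, b))


def map_to_reference_space(data, reference_set):
    """Map each record to a numeric vector: one coordinate per reference point."""
    return [[mismatch_count(row, ref) for ref in reference_set] for row in data]


# Toy categorical data set (hypothetical): colour, size, shape.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

# Assumption: two records are picked by hand as the reference set.
reference_set = [data[0], data[2]]

# The "image" data set is purely numerical, so a numerical clustering
# algorithm or a 2-D scatter plot could be applied to it.
image = map_to_reference_space(data, reference_set)
for row, vec in zip(data, image):
    print(row, "->", vec)
```

In this toy case the two "red, small" records land close together in the image space, as do the two "blue, large" records, which illustrates the kind of within-cluster similarity the abstract says the mapping should preserve.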
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shen, ZY., Sun, J., Shen, YD., Li, M. (2008). R-Map: Mapping Categorical Data for Clustering and Visualization Based on Reference Sets. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_104
DOI: https://doi.org/10.1007/978-3-540-68125-0_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)