Abstract
This paper presents a new method for data clustering that effectively handles data sets containing both continuous and discrete attributes. The method treats the values of the cluster variable as missing data. These missing values are first initialized randomly and are then revised iteratively by combining Gibbs sampling with a dependency structure, which is either built from prior knowledge or, alternatively, taken to be star-shaped. A penalty coefficient is introduced to extend the MDL scoring function, and the optimal number of clusters is determined using the extended MDL scoring function together with statistical methods.
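The procedure outlined in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the star-shaped structure, in which every attribute depends only on the cluster variable, with smoothed frequency tables for discrete attributes and one Gaussian per cluster for each continuous attribute. Parameters are re-estimated at each sweep rather than sampled, which is a simplification. The function names, the variance floor, and the form of the penalized score (the penalty coefficient `lam` scaling the usual (d/2) log N model-cost term) are all assumptions made for illustration; the paper's exact extended MDL function is not given in the abstract.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_params(disc, cont, labels, k, n_vals):
    """Smoothed estimates of P(C), P(discrete attribute | C), and per-cluster Gaussians."""
    n = len(labels)
    members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
    prior = [(len(m) + 1) / (n + k) for m in members]
    tables, gauss = [], []
    for m in members:
        t = []
        for j, v in enumerate(n_vals):
            counts = [1.0] * v  # Laplace smoothing avoids zero probabilities
            for i in m:
                counts[disc[i][j]] += 1
            total = sum(counts)
            t.append([c / total for c in counts])
        tables.append(t)
        g = []
        for j in range(len(cont[0])):
            xs = [cont[i][j] for i in m]
            if len(xs) < 2:  # nearly empty cluster: fall back to all rows
                xs = [row[j] for row in cont]
            mean = sum(xs) / len(xs)
            var = max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-3)
            g.append((mean, var))
        gauss.append(g)
    return prior, tables, gauss


def log_joint(i, c, disc, cont, params):
    """log P(C = c) + sum over attributes of log P(x_ij | C = c) (star-shaped structure)."""
    prior, tables, gauss = params
    lp = math.log(prior[c])
    for j, v in enumerate(disc[i]):
        lp += math.log(tables[c][j][v])
    for j, x in enumerate(cont[i]):
        lp += gaussian_logpdf(x, *gauss[c][j])
    return lp


def gibbs_cluster(disc, cont, k, n_vals, sweeps=30, seed=0):
    """Treat cluster labels as missing data: random start, then resample each label."""
    rng = random.Random(seed)
    n = len(disc)
    labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(sweeps):
        params = fit_params(disc, cont, labels, k, n_vals)
        for i in range(n):
            logs = [log_joint(i, c, disc, cont, params) for c in range(k)]
            top = max(logs)
            weights = [math.exp(l - top) for l in logs]
            labels[i] = rng.choices(range(k), weights=weights)[0]
    return labels, fit_params(disc, cont, labels, k, n_vals)


def penalized_mdl(disc, cont, labels, params, k, n_vals, lam=1.0):
    """Assumed score: lam * (d / 2) * log N - log-likelihood; lower is better."""
    n = len(disc)
    loglik = sum(log_joint(i, labels[i], disc, cont, params) for i in range(n))
    n_params = (k - 1) + k * (sum(v - 1 for v in n_vals) + 2 * len(cont[0]))
    return lam * 0.5 * n_params * math.log(n) - loglik
```

Here `disc` holds integer-coded discrete attributes, `cont` holds continuous attributes, and `n_vals` lists the number of values of each discrete attribute. Under these assumptions, one way to choose the number of clusters is to run `gibbs_cluster` for several values of `k` and keep the one with the lowest `penalized_mdl` score.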
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, SC., Li, XL., Tang, HY. (2006). Hybrid Data Clustering Based on Dependency Structure and Gibbs Sampling. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_138
DOI: https://doi.org/10.1007/11941439_138
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)