
Hybrid Data Clustering Based on Dependency Structure and Gibbs Sampling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Abstract

A new method for data clustering is presented in this paper. It can effectively cluster data sets that contain both continuous and discrete attributes. In this method, the values of the cluster variable are treated as missing data. The missing data are first initialized randomly and then revised iteratively by combining Gibbs sampling with a dependency structure that is either built from prior knowledge or, alternatively, taken to be a star-shaped structure. A penalty coefficient is introduced to extend the MDL scoring function, and the optimal number of clusters is determined using the extended MDL scoring function together with statistical methods.
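The abstract only outlines the procedure. As a rough illustration, the Python sketch below treats the cluster label as missing data, initializes it at random, and repeatedly resamples it under a star-shaped structure (the cluster variable as the sole parent of every attribute). Continuous attributes are modelled as Gaussians and discrete ones as smoothed categorical tables. These modelling choices, the plug-in resampling scheme (parameters re-estimated once per sweep, a simplification of full Gibbs sampling), and the form of the penalized MDL score (a hypothetical `penalty` multiplying the usual (log N)/2 cost per free parameter) are all assumptions made for illustration, not the authors' exact method. Every function name is hypothetical.

```python
import math
import random
from collections import Counter

# Illustrative sketch only: every function name and modelling choice here is a
# hypothetical stand-in, not the authors' published implementation.


def discrete_domains(rows, disc_idx):
    """Observed value set of each discrete attribute, e.g. {1: ["a", "b"]}."""
    return {j: sorted({r[j] for r in rows}) for j in disc_idx}


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_params(rows, labels, k, cont_idx, domains, alpha=1.0, min_var=1e-3):
    """Estimate star-structure parameters (cluster node -> every attribute) from
    the current labels. The cluster prior and discrete CPTs use additive
    smoothing `alpha`; Gaussian variances are floored at `min_var`."""
    n = len(rows)
    params = []
    for c in range(k):
        members = [r for r, z in zip(rows, labels) if z == c]
        m = len(members)
        prior = (m + alpha) / (n + k * alpha)
        gauss = {}
        for j in cont_idx:
            vals = [r[j] for r in members]
            # Placeholder mean/variance if the cluster is currently empty.
            mean = sum(vals) / m if m else 0.0
            var = sum((v - mean) ** 2 for v in vals) / m if m else 1.0
            gauss[j] = (mean, max(var, min_var))
        cats = {}
        for j, dom in domains.items():
            counts = Counter(r[j] for r in members)
            cats[j] = {v: (counts[v] + alpha) / (m + alpha * len(dom)) for v in dom}
        params.append((prior, gauss, cats))
    return params


def log_joint(row, c, params, cont_idx, domains):
    """log P(cluster = c, x) under the star-shaped (naive-Bayes-like) structure."""
    prior, gauss, cats = params[c]
    lp = math.log(prior)
    for j in cont_idx:
        lp += gaussian_logpdf(row[j], *gauss[j])
    for j in domains:
        lp += math.log(cats[j][row[j]])
    return lp


def gibbs_cluster(rows, k, cont_idx, disc_idx, sweeps=50, seed=0):
    """Treat the cluster labels as missing data: initialize them at random, then
    in each sweep re-estimate the parameters and resample every label from
    P(z_i | x_i, parameters). This plug-in scheme simplifies full Gibbs sampling."""
    rng = random.Random(seed)
    domains = discrete_domains(rows, disc_idx)
    labels = [rng.randrange(k) for _ in rows]
    for _ in range(sweeps):
        params = fit_params(rows, labels, k, cont_idx, domains)
        for i, row in enumerate(rows):
            logp = [log_joint(row, c, params, cont_idx, domains) for c in range(k)]
            top = max(logp)  # subtract the max before exp() for numerical stability
            weights = [math.exp(lp - top) for lp in logp]
            labels[i] = rng.choices(range(k), weights=weights)[0]
    return labels, fit_params(rows, labels, k, cont_idx, domains)


def mdl_score(rows, labels, params, k, cont_idx, domains, penalty=1.0):
    """Hypothetical penalized MDL: -log-likelihood + penalty * (log N / 2) * d,
    where d counts the free parameters of the star structure."""
    n = len(rows)
    ll = sum(log_joint(r, z, params, cont_idx, domains) for r, z in zip(rows, labels))
    d = (k - 1) + k * (2 * len(cont_idx) + sum(len(dom) - 1 for dom in domains.values()))
    return -ll + penalty * 0.5 * math.log(n) * d


if __name__ == "__main__":
    # Toy hybrid data: (continuous value, discrete symbol), two obvious groups.
    rows = [(0.1 * i, "a") for i in range(20)] + [(10 + 0.1 * i, "b") for i in range(20)]
    domains = discrete_domains(rows, [1])
    scores = {}
    for k in range(1, 5):
        labels, params = gibbs_cluster(rows, k, cont_idx=[0], disc_idx=[1])
        scores[k] = mdl_score(rows, labels, params, k, [0], domains)
    print("MDL by k:", scores, "-> chosen k =", min(scores, key=scores.get))
```

Scoring each candidate k on a single sampled partition is noisy, so averaging over several seeds would be a reasonable refinement. The statistical methods that the paper combines with its extended MDL score are not reproduced in this sketch.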

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, SC., Li, XL., Tang, HY. (2006). Hybrid Data Clustering Based on Dependency Structure and Gibbs Sampling. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_138

  • DOI: https://doi.org/10.1007/11941439_138

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49787-5

  • Online ISBN: 978-3-540-49788-2

  • eBook Packages: Computer Science, Computer Science (R0)
