Determining the Number of Probability-Based Clustering: A Hybrid Approach

Dai, Tao; Li, Chunping; Sun, Jiaguang

doi:10.1007/978-3-540-30483-8_51

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3309)

Included in the following conference series:

Advanced Workshop on Content Computing

Abstract

After analyzing previous methods for determining the number of clusters in probability-based clustering, this paper introduces an improved Monte Carlo Cross-Validation algorithm (iMCCV) that attempts to solve the posterior probabilities spread problem, which the original Monte Carlo Cross-Validation algorithm cannot resolve. Furthermore, we present a hybrid approach that determines the number of clusters by combining the iMCCV algorithm with the parallel coordinates visualization technique. The efficiency of the approach is discussed with experimental results.
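
For orientation, the baseline idea behind Monte Carlo cross-validation for choosing the number of mixture components can be sketched as follows. This is a minimal illustration, not the iMCCV algorithm, whose details are not given in this abstract. It assumes a one-dimensional Gaussian mixture fitted by plain EM; the function names (`fit_gmm_1d`, `mccv_scores`), the quantile-based initialization, the 50/50 split, and the number of splits are all choices made for this sketch rather than anything taken from the paper.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start (an assumption of this sketch): initial means at
    # evenly spaced quantiles of the training data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
            weights[j] = nj / n
    return weights, means, variances


def mean_log_likelihood(data, model):
    """Average per-point log-likelihood of data under a fitted mixture."""
    weights, means, variances = model
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(density, 1e-300))
    return total / len(data)


def mccv_scores(data, k_values, n_splits=20, test_fraction=0.5, seed=0):
    """Average held-out log-likelihood per candidate k over random splits."""
    rng = random.Random(seed)
    n_test = int(len(data) * test_fraction)
    totals = {k: 0.0 for k in k_values}
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        for k in k_values:
            totals[k] += mean_log_likelihood(test, fit_gmm_1d(train, k))
    return {k: total / n_splits for k, total in totals.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: two well-separated clusters.
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(10.0, 1.0) for _ in range(100)]
    scores = mccv_scores(sample, [1, 2, 3], n_splits=5)
    print("candidate k with highest held-out score:", max(scores, key=scores.get))
```

The candidate with the highest average held-out log-likelihood is the one this baseline would prefer. When several candidates score almost equally, the choice becomes ambiguous, which is the kind of situation where a complementary check such as a parallel coordinates plot could help.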

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing, China
Tao Dai, Chunping Li & Jiaguang Sun

Editor information

Editors and Affiliations

School of Software, Tsinghua University,
Chi-Hung Chi
School of Software, Tsinghua University, Beijing, PR China
Kwok-Yan Lam

About this paper

Cite this paper

Dai, T., Li, C., Sun, J. (2004). Determining the Number of Probability-Based Clustering: A Hybrid Approach. In: Chi, CH., Lam, KY. (eds) Content Computing. AWCC 2004. Lecture Notes in Computer Science, vol 3309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30483-8_51

DOI: https://doi.org/10.1007/978-3-540-30483-8_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23898-0
Online ISBN: 978-3-540-30483-8
eBook Packages: Springer Book Archive
