Estimating the Number of Clusters Using Multivariate Location Test Statistics

Choi, Kyungmee; Kim, Deok-Hwan; Choi, Taeryon

doi:10.1007/11881599_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4223)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery

Abstract

In cluster analysis, we determine the unknown number of clusters with a criterion based on a classical location test statistic, Hotelling’s T². At each clustering level, the theoretical threshold of the criterion is studied in view of the statistic's distribution and the multiple comparison problem. To examine its performance, we run extensive experiments on synthetic data generated from multivariate normal distributions and on a set of real image data.
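
As a rough illustration of the kind of statistic the abstract refers to, the sketch below computes the textbook two-sample Hotelling's T² for two candidate clusters, together with its F-scaled form. It is not the authors' procedure, which this abstract does not spell out. The function names `hotelling_t2` and `bonferroni_level` are hypothetical, the pooled-covariance form assumes the two clusters share a covariance matrix, and using a Bonferroni correction for the multiple comparison problem is an assumption made only for the example.

```python
import numpy as np


def hotelling_t2(x, y):
    """Textbook two-sample Hotelling's T^2 (hypothetical helper).

    x, y: arrays of shape (n1, p) and (n2, p), one row per observation.
    Returns (t2, f_stat, df1, df2), where f_stat follows an F(df1, df2)
    distribution under equal means, normality and a common covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, p = x.shape
    n2 = y.shape[0]

    # Difference of the two sample mean vectors.
    diff = x.mean(axis=0) - y.mean(axis=0)

    # Pooled sample covariance (assumes both groups share one covariance).
    s_pooled = (
        (n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)
    ) / (n1 + n2 - 2)
    s_pooled = np.atleast_2d(s_pooled)

    # T^2 = n1*n2/(n1+n2) * diff' S^{-1} diff, using solve instead of inverse.
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(s_pooled, diff))

    # Scale T^2 so it can be compared with an F(p, n1+n2-p-1) quantile.
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return float(t2), float(f_stat), df1, df2


def bonferroni_level(alpha, n_tests):
    """Per-test significance level when n_tests comparisons are made
    (Bonferroni is an assumption here, not taken from the paper)."""
    return alpha / n_tests


if __name__ == "__main__":
    a = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
    b = a + 10  # same shape, shifted far away
    print(hotelling_t2(a, b))
    print(bonferroni_level(0.05, 5))
```

In a merge-based clustering run, one might compare `f_stat` for the closest pair of clusters against an F quantile at the level returned by `bonferroni_level` and stop merging once the two means differ significantly; that stopping rule is only one plausible reading of the abstract.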

Author information

Authors and Affiliations

College of Science and Technology, Hongik University at Jochiwon, Korea
Kyungmee Choi
Department of Electronics Engineering, Inha University at Incheon, Korea
Deok-Hwan Kim
Department of Mathematics and Statistics, University of Maryland at Baltimore, U.S.A.
Taeryon Choi

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Life Science Research Center, School of Electronic Engineering, Xidian University, 710071, Xi’an, Shaanxi, China
Licheng Jiao
School of Electrical and Electronic Engineering, Xidian University, 710071, Xi’an, China
Guanming Shi
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, Queensland, Australia
Xue Li
College of Mathematics and Information Science, Hebei Normal University, 050016, Shijiazhuang, Hebei, P.R. China
Jing Liu

About this paper

Cite this paper

Choi, K., Kim, DH., Choi, T. (2006). Estimating the Number of Clusters Using Multivariate Location Test Statistics. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_43

DOI: https://doi.org/10.1007/11881599_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)
