Abstract
This paper studies the problem of estimating the number of clusters in logistic regression clustering. A classification likelihood approach is employed to tackle this problem. A model-selection criterion for selecting the number of logistic curves is proposed, and its asymptotic behavior is also studied. The small-sample performance of the proposed criterion is examined by Monte Carlo simulation. In addition, a real data example is presented.
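The following is a minimal sketch of the classification likelihood idea. It assumes binomial responses (y_i successes out of m_i trials) and a single covariate. Each observation is assigned to exactly one of K logistic curves, and the labels and curve coefficients are chosen jointly to maximize the log-likelihood. The alternating assign/refit loop is a classification-EM-style heuristic and not necessarily the fitting algorithm used in the paper. The penalty in the hypothetical `select_k` (log n per parameter, BIC-style) is only a placeholder, because the criterion proposed in the paper is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def sigmoid(eta):
    # Numerically stable logistic function 1 / (1 + exp(-eta)).
    return np.exp(-np.logaddexp(0.0, -eta))

def fit_logistic(X, y, m, n_iter=50, ridge=1e-3):
    """Binomial logistic regression fitted by Newton-Raphson.
    X: (n, p) design with intercept column; y: successes; m: trials.
    The small ridge term is an assumed numerical safeguard for tiny clusters."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (y - m * mu) - ridge * beta
        hess = X.T @ (X * (m * mu * (1.0 - mu))[:, None]) + ridge * np.eye(p)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def obs_loglik(X, y, m, beta):
    # Per-observation binomial log-likelihood, dropping log C(m, y),
    # which depends neither on the parameters nor on K.
    eta = X @ beta
    return y * eta - m * np.logaddexp(0.0, eta)

def classification_fit(X, y, m, K, n_starts=10, max_iter=100, seed=0):
    """Maximize the classification likelihood over hard labels and K curves
    by alternating assignment and refitting, keeping the best random start."""
    rng = np.random.default_rng(seed)
    best_ll, best = -np.inf, None
    for _ in range(n_starts):
        labels = rng.integers(K, size=len(y))
        for _ in range(max_iter):
            betas = [fit_logistic(X[labels == k], y[labels == k], m[labels == k])
                     for k in range(K)]
            ll = np.column_stack([obs_loglik(X, y, m, b) for b in betas])
            new_labels = ll.argmax(axis=1)  # assign each point to its best curve
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        total = ll.max(axis=1).sum()
        if total > best_ll:
            best_ll, best = total, (new_labels, betas)
    return best_ll, best

def select_k(X, y, m, k_max=3, penalty=None):
    """Placeholder criterion: -2 * classification log-likelihood + penalty * K * p.
    The log(n) default is a BIC-style stand-in, not the paper's criterion."""
    n, p = X.shape
    penalty = np.log(n) if penalty is None else penalty
    crit = {}
    for K in range(1, k_max + 1):
        ll, _ = classification_fit(X, y, m, K)
        crit[K] = -2.0 * ll + penalty * K * p
    return min(crit, key=crit.get), crit

# Usage on simulated data from two parallel logistic curves (illustrative only).
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, size=n)
X = np.column_stack([np.ones(n), x])
true_labels = rng.integers(2, size=n)
eta = np.where(true_labels == 0, -3 + 2 * x, 3 + 2 * x)
m = np.full(n, 50)
y = rng.binomial(m, sigmoid(eta))
k_hat, crit = select_k(X, y, m)
print(k_hat, crit)
```

The sketch uses binomial counts because hard assignment degenerates with single 0/1 responses. For K ≥ 2, sending all successes to one curve and all failures to another drives the classification log-likelihood toward its upper bound of zero. More generally, the maximized classification likelihood cannot decrease as K grows. Whether extra curves are rejected therefore depends entirely on the strength of the penalty, which is the question a criterion for the number of logistic curves has to settle.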
Additional information
The authors would like to thank the editor, Prof. Willem J. Heiser, and the anonymous referees for their valuable comments and suggestions, which have improved this paper.
About this article
Cite this article
Qian, G., Wu, Y. & Shao, Q. A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering. J Classif 26, 183–199 (2009). https://doi.org/10.1007/s00357-009-9035-y