
A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering

Journal of Classification

Abstract

This paper studies the problem of estimating the number of clusters in logistic regression clustering, using the classification likelihood approach. A model-selection criterion for choosing the number of logistic curves is proposed, and its asymptotic properties are examined. The small-sample performance of the proposed criterion is studied by Monte Carlo simulation, and a real data example is presented.
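To make the setting concrete, the following is a minimal sketch of clusterwise logistic regression scored by a penalized classification likelihood. It is an illustration under stated assumptions, not the procedure of the paper: the preview does not give the form of the proposed criterion, so the penalty below is a BIC-style placeholder, and the function names (`binom_loglik`, `fit_logistic`, `cluster_logistic`, `select_k`), the random restarts, and the small ridge term are all hypothetical choices. Data are assumed to be binomial counts `y` out of `m` trials at each row of the design matrix `X`.

```python
import numpy as np


def binom_loglik(X, y, m, beta):
    """Per-observation binomial log-likelihood, dropping the constant binomial coefficient."""
    eta = X @ beta
    # log p = -log(1 + e^{-eta}) and log(1 - p) = -log(1 + e^{eta}), computed stably.
    return -y * np.logaddexp(0.0, -eta) - (m - y) * np.logaddexp(0.0, eta)


def fit_logistic(X, y, m, iters=50, ridge=1e-4, tol=1e-8):
    """Newton-Raphson fit with step halving; the tiny ridge term is a stabilising assumption."""
    def objective(b):
        return binom_loglik(X, y, m, b).sum() - 0.5 * ridge * (b @ b)

    beta = np.zeros(X.shape[1])
    obj = objective(beta)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30.0, 30.0)))
        grad = X.T @ (y - m * mu) - ridge * beta
        hess = X.T @ ((m * mu * (1.0 - mu))[:, None] * X) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while objective(beta + t * step) < obj and t > 1e-8:
            t /= 2.0  # halve the step until the objective no longer decreases
        new_obj = objective(beta + t * step)
        if new_obj < obj:
            break
        beta, gain, obj = beta + t * step, new_obj - obj, new_obj
        if gain < tol:
            break
    return beta


def cluster_logistic(X, y, m, k, n_starts=5, max_iter=100, seed=0):
    """Alternate between per-cluster fits and reassignment to the best-fitting curve."""
    rng = np.random.default_rng(seed)
    n, p_dim = X.shape
    best = (-np.inf, None, None)
    for _ in range(n_starts):
        labels = rng.integers(k, size=n)  # random initial partition
        betas = np.zeros((k, p_dim))
        for _ in range(max_iter):
            for j in range(k):
                idx = labels == j
                if idx.sum() >= p_dim:  # keep the previous curve for tiny clusters
                    betas[j] = fit_logistic(X[idx], y[idx], m[idx])
            ll = np.column_stack([binom_loglik(X, y, m, b) for b in betas])
            new_labels = ll.argmax(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        total = ll[np.arange(n), labels].sum()  # classification log-likelihood
        if total > best[0]:
            best = (total, labels.copy(), betas.copy())
    return best


def select_k(X, y, m, k_max=4, seed=0):
    """Pick k minimising -2 * loglik + k * p * log(n); the penalty is a placeholder."""
    n, p_dim = X.shape
    scores = {}
    for k in range(1, k_max + 1):
        loglik, _, _ = cluster_logistic(X, y, m, k, seed=seed)
        scores[k] = -2.0 * loglik + k * p_dim * np.log(n)
    return min(scores, key=scores.get), scores
```

Calling `select_k(X, y, m, k_max=3)` returns the minimising number of clusters together with the score for each candidate. Because the cluster labels are themselves optimized, a BIC-style penalty may favor too many clusters; that is one reason a purpose-built criterion, such as the one the paper proposes, is of interest.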



Author information

Corresponding author

Correspondence to Guoqi Qian.

Additional information

The authors would like to thank the editor, Prof. Willem J. Heiser, and the anonymous referees for their valuable comments and suggestions, which led to improvements in this paper.

About this article

Cite this article

Qian, G., Wu, Y. & Shao, Q. A Procedure for Estimating the Number of Clusters in Logistic Regression Clustering. J Classif 26, 183–199 (2009). https://doi.org/10.1007/s00357-009-9035-y

  • DOI: https://doi.org/10.1007/s00357-009-9035-y
