Abstract
This paper proposes a linear model that predicts nominal responses from shape data by using principal component scores computed in the tangent space of shapes. Both multinomial logistic regression for multivariate data and logistic regression for binary responses are considered. Principal components in the tangent space are employed to improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the input data. The proposed approach improves the classification of shape data into their nominal groups. We assess its effectiveness in a comprehensive simulation study and highlight its benefits on five real-world data sets.
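The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: planar landmarks stored as complex vectors, partial Procrustes tangent coordinates taken at the full Procrustes mean, a plain gradient-descent softmax fit without a reference category, and synthetic pentagon data. All function names (`tangent_coordinates`, `pc_scores`, `fit_multinomial_logit`, `predict`) are hypothetical.

```python
import numpy as np

def tangent_coordinates(configs):
    """Map planar configurations (complex rows) to real tangent-space vectors."""
    Z = configs - configs.mean(axis=1, keepdims=True)   # remove translation
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # remove scale
    # Full Procrustes mean: dominant eigenvector of sum_i z_i z_i^H
    S = Z.T @ Z.conj()
    mu = np.linalg.eigh(S)[1][:, -1]
    inner = Z @ mu.conj()                               # mu^H z_i for every row
    W = Z * np.exp(-1j * np.angle(inner))[:, None]      # rotate each shape onto the mean
    V = W - np.abs(inner)[:, None] * mu                 # drop the component along the mean
    return np.hstack([V.real, V.imag])

def pc_scores(X, n_components):
    """Scores on the leading principal components of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fit_multinomial_logit(X, y, n_classes, lr=0.5, n_iter=2000):
    """Softmax regression by gradient descent (assumed fitting method)."""
    X1 = np.hstack([np.ones((len(X), 1)), X])           # intercept column
    Y = np.eye(n_classes)[y]                            # one-hot responses
    B = np.zeros((X1.shape[1], n_classes))
    for _ in range(n_iter):
        logits = X1 @ B
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        B -= lr * X1.T @ (P - Y) / len(X)
    return B

def predict(B, X):
    X1 = np.hstack([np.ones((len(X), 1)), X])
    return (X1 @ B).argmax(axis=1)

# Synthetic data (assumption): three pentagon-like groups with random pose.
rng = np.random.default_rng(0)
k, n_per_group = 5, 30
base = np.exp(2j * np.pi * np.arange(k) / k)
templates = [base.copy(), base.copy(), base.copy()]
templates[1][0] *= 1.4                                  # group 1: first landmark pushed out
templates[2][1] *= 0.6                                  # group 2: second landmark pulled in
configs, labels = [], []
for g, t in enumerate(templates):
    for _ in range(n_per_group):
        noise = 0.03 * (rng.standard_normal(k) + 1j * rng.standard_normal(k))
        pose = rng.uniform(0.5, 2.0) * np.exp(1j * rng.uniform(0, 2 * np.pi))
        shift = rng.standard_normal() + 1j * rng.standard_normal()
        configs.append(pose * (t + noise) + shift)
        labels.append(g)
configs, labels = np.array(configs), np.array(labels)

tangent = tangent_coordinates(configs)
scores = pc_scores(tangent, n_components=2)
scores = scores / scores.std(axis=0)                    # standardize before fitting
B = fit_multinomial_logit(scores, labels, n_classes=3)
accuracy = (predict(B, scores) == labels).mean()        # in-sample accuracy only
```

Regressing on a few principal component scores rather than on all tangent coordinates is what addresses multicollinearity here: the scores are uncorrelated by construction, and the tangent coordinates themselves are linearly dependent because location, scale, and rotation have been removed.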
Data Availability
The real data sets analyzed during the current study are available in the R package shapes [https://cran.r-project.org/web/packages/shape/index.html]. The R code for the simulation study is available from the corresponding author upon request.
Acknowledgements
We would like to thank the editor and two referees for their constructive and thoughtful comments, which helped us tremendously in improving the manuscript.
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Moghimbeygi, M., Nodehi, A. Multinomial Principal Component Logistic Regression on Shape Data. J Classif 39, 578–599 (2022). https://doi.org/10.1007/s00357-022-09423-x
DOI: https://doi.org/10.1007/s00357-022-09423-x