Abstract
Compositional data can be transformed to directional data by the square root transformation and then modelled using the Kent distribution. The current approach for estimating the parameters of the Kent model for compositional data relies on a large concentration assumption, namely that the majority of the transformed data is not distributed too close to the boundaries of the positive orthant. When the data is distributed close to the boundaries with large variance, significant folding may result. To treat this case we propose new estimators of the parameters, derived from the actual folded Kent distribution and obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and the amount of folding are moderately large. We also propose using a saddlepoint density approximation for the Kent distribution normalising constant in order to estimate the shape parameters more accurately when the concentration is small or only moderately large.
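As a minimal illustration of the square root transformation and of folding, the following Python sketch maps a composition to the positive orthant of the unit sphere and shows how a unit vector outside that orthant is folded back into it. The function names `sqrt_transform`, `fold` and `to_composition` are hypothetical and chosen for illustration only; this is not the estimation procedure of the paper.

```python
import math


def sqrt_transform(composition):
    """Map a composition (non-negative parts summing to 1) to the unit sphere.

    Taking the square root of each part gives a vector whose squared
    coordinates sum to 1, i.e. a point in the positive orthant of the sphere.
    """
    total = sum(composition)
    if any(part < 0 for part in composition) or not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("expected non-negative parts that sum to 1")
    return [math.sqrt(part) for part in composition]


def fold(direction):
    """Fold a unit vector into the positive orthant by taking absolute values."""
    return [abs(coord) for coord in direction]


def to_composition(direction):
    """Square each coordinate of a unit vector to obtain a composition."""
    return [coord * coord for coord in direction]


# Hypothetical example composition with three parts.
u = [0.2, 0.3, 0.5]
y = sqrt_transform(u)
squared_norm = sum(coord * coord for coord in y)  # equals 1 up to rounding

# A unit vector with a negative coordinate yields the same composition as
# its folded version, which is why mass outside the orthant is "folded" in.
outside = [-0.6, 0.8, 0.0]
same_composition = to_composition(outside), to_composition(fold(outside))
```

Because squaring discards the sign of each coordinate, every unit vector and its folded image correspond to the same composition. This is the sense in which a distribution on the whole sphere induces a folded distribution on the positive orthant.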
References
Aitchison, J.: The statistical analysis of compositional data (with discussion). J. R. Stat. Soc., Ser. B, Stat. Methodol. 44, 139–177 (1982)
Aitchison, J.: The Statistical Analysis of Compositional Data. Chapman and Hall, London (1986)
Chen, M., Kianifard, F.: Estimation of treatment difference and standard deviation with blinded data in clinical trials. Biom. J. 45, 135–142 (2003)
Cuesta-Albertos, J.A., Cuevas, A., Fraiman, R.: On projection-based tests for directional and compositional data. Stat. Comput. 19, 367–380 (2009)
Jung, S., Foskey, M., Marron, J.S.: Principal arc analysis on direct product manifolds. Ann. Appl. Stat. 5, 578–603 (2011)
Kent, J.T.: The Fisher-Bingham distribution on the sphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 44, 71–80 (1982)
Kent, J.T., Mardia, K.V., McDonnell, P.: The complex Bingham quartic distribution and shape analysis. J. R. Stat. Soc., Ser. B, Stat. Methodol. 68, 747–765 (2006)
Kume, A., Walker, S.G.: Sampling from compositional and directional distributions. Stat. Comput. 16, 261–265 (2006)
Kume, A., Wood, A.T.A.: Saddlepoint approximations for the Bingham and Fisher-Bingham normalising constants. Biometrika 92, 465–476 (2005)
Rivest, L.: On the information matrix for symmetric distributions on the hypersphere. Ann. Stat. 12, 1085–1089 (1984)
Matz, A.W.: Maximum likelihood parameter estimation for the quartic exponential distribution. Technometrics 20, 475–484 (1978)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, New Jersey (2008)
Scealy, J.L.: Modelling techniques for compositional data using distributions defined on the hypersphere. PhD thesis, Australian National University (2010)
Scealy, J.L., Welsh, A.H.: Regression for compositional data by using distributions defined on the hypersphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 73, 351–375 (2011)
Stephens, M.A.: Use of the von Mises distribution to analyse continuous proportions. Biometrika 69, 197–203 (1982)
Sundberg, R.: On estimation and testing for the folded normal distribution. Commun. Stat., Theory Methods 3, 55–72 (1974)
Acknowledgements
This research was supported by an Australian Research Council discovery project grant. We thank Chris Field, David Bulger and Michail Tsagris for useful conversations. Thanks also to two reviewers for their comments, which have improved the presentation of the paper.
Appendix: Proof of Proposition 1
The value \(\hat{t}\) is obtained by solving the saddlepoint equation
which implies
From Kume and Wood (2005, p. 468), it follows that the lower and upper bounds for \(\hat{t}\) are
and therefore
Using this bound, the limit conditions in Theorem 3 in Scealy and Welsh (2011) and (22), it follows that
which implies
The solution within the bounds (23) to the above quadratic equation is
and when \(\beta=0\) the \(O(\kappa^{-1})\) terms are exactly zero for all \(\kappa\), giving
This is slightly different from \(\hat{t}\) given in Kume and Wood (2005, Sect. 3.1) because they set their matrix A to be a matrix of zeros, which is not equivalent to our approach. Note that both approaches still lead to the same normalising constant approximation for the von Mises-Fisher case.
Under the limit conditions in Theorem 3 in Scealy and Welsh (2011) and using the fact that \(\hat{t}=O(1)\), it follows that
and
It also follows that
so
and therefore
Using similar asymptotic arguments to Scealy and Welsh (2011, pp. 371–372), with the dominated convergence theorem, it follows that
(note that Kent et al. (2006, p. 754) give a detailed proof for the complex Bingham quartic distribution normalising constant which is similar) and therefore
with in the limit.
Cite this article
Scealy, J.L., Welsh, A.H. Fitting Kent models to compositional data with small concentration. Stat Comput 24, 165–179 (2014). https://doi.org/10.1007/s11222-012-9361-5
DOI: https://doi.org/10.1007/s11222-012-9361-5