
Fitting Kent models to compositional data with small concentration


Abstract

Compositional data can be transformed to directional data by the square root transformation and then modelled using the Kent distribution. The current approach to estimating the parameters in the Kent model for compositional data relies on a large concentration assumption, namely that the majority of the transformed data do not lie too close to the boundaries of the positive orthant. When the data are distributed close to the boundaries with large variance, significant folding may result. To treat this case we propose new estimators of the parameters, derived from the actual folded Kent distribution and obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and the amount of folding are moderately large. We also propose using a saddlepoint density approximation to the Kent distribution normalising constant in order to estimate the shape parameters more accurately when the concentration is small or only moderately large.
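To make the two ideas in the abstract concrete, the sketch below illustrates the square root transformation from the simplex to the unit sphere and, as a one-dimensional analogue only, an EM fit to folded data using the folded normal distribution (Sundberg 1974). The function names, starting values and toy numbers are illustrative and are not taken from the paper; the paper's own estimators apply the EM idea to the folded Kent distribution on the sphere.

```python
import numpy as np
from scipy.stats import norm


def sqrt_transform(x):
    """Square root transformation: a composition x (non-negative entries
    summing to 1) is mapped to u = sqrt(x), which lies on the positive
    orthant of the unit sphere because sum(u**2) = sum(x) = 1."""
    return np.sqrt(np.asarray(x, dtype=float))


def folded_normal_em(z, n_iter=200):
    """EM estimates of (mu, sigma) for the folded normal: we observe z = |y|
    with y ~ N(mu, sigma^2), and the lost sign of y is the latent variable.
    This is only a one-dimensional analogue of folding on the sphere."""
    z = np.asarray(z, dtype=float)
    mu, sigma = z.mean(), z.std()            # crude positive starting values
    for _ in range(n_iter):
        # E-step: posterior probability that the latent sign is +1
        num = norm.pdf(z, loc=mu, scale=sigma)
        w = num / (num + norm.pdf(-z, loc=mu, scale=sigma))
        # M-step: weighted updates over the two reflections +z and -z
        mu = np.mean((2.0 * w - 1.0) * z)    # mean of E[y | z]
        var = np.mean(w * (z - mu) ** 2 + (1.0 - w) * (-z - mu) ** 2)
        sigma = np.sqrt(var)
    return mu, sigma


if __name__ == "__main__":
    x = np.array([0.7, 0.2, 0.1])            # a toy composition
    u = sqrt_transform(x)
    print(u, u @ u)                          # u lies on the unit sphere

    rng = np.random.default_rng(0)
    y = rng.normal(loc=0.5, scale=1.0, size=5000)   # mean small relative to spread
    mu_hat, sigma_hat = folded_normal_em(np.abs(y))
    print(mu_hat, sigma_hat)                 # roughly (0.5, 1.0)
```

The same latent-variable device, with the unobserved reflections playing the role of the lost sign, is what the EM algorithm exploits in the folded Kent case.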


References

  • Aitchison, J.: The statistical analysis of compositional data (with discussion). J. R. Stat. Soc., Ser. B, Stat. Methodol. 44, 139–177 (1982)

  • Aitchison, J.: The Statistical Analysis of Compositional Data. Chapman and Hall, London (1986)

  • Chen, M., Kianifard, F.: Estimation of treatment difference and standard deviation with blinded data in clinical trials. Biom. J. 45, 135–142 (2003)

  • Cuesta-Albertos, J.A., Cuevas, A., Fraiman, R.: On projection-based tests for directional and compositional data. Stat. Comput. 19, 367–380 (2009)

  • Jung, S., Foskey, M., Marron, J.S.: Principal arc analysis on direct product manifolds. Ann. Appl. Stat. 5, 578–603 (2011)

  • Kent, J.T.: The Fisher-Bingham distribution on the sphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 44, 71–80 (1982)

  • Kent, J.T., Mardia, K.V., McDonnell, P.: The complex Bingham quartic distribution and shape analysis. J. R. Stat. Soc., Ser. B, Stat. Methodol. 68, 747–765 (2006)

  • Kume, A., Walker, S.G.: Sampling from compositional and directional distributions. Stat. Comput. 16, 261–265 (2006)

  • Kume, A., Wood, A.T.A.: Saddlepoint approximations for the Bingham and Fisher-Bingham normalising constants. Biometrika 92, 465–476 (2005)

  • Matz, A.W.: Maximum likelihood parameter estimation for the quartic exponential distribution. Technometrics 20, 475–484 (1978)

  • McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, New Jersey (2008)

  • Rivest, L.: On the information matrix for symmetric distributions on the hypersphere. Ann. Stat. 12, 1085–1089 (1984)

  • Scealy, J.L.: Modelling techniques for compositional data using distributions defined on the hypersphere. PhD thesis, Australian National University (2010)

  • Scealy, J.L., Welsh, A.H.: Regression for compositional data by using distributions defined on the hypersphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 73, 351–375 (2011)

  • Stephens, M.A.: Use of the von Mises distribution to analyse continuous proportions. Biometrika 69, 197–203 (1982)

  • Sundberg, R.: On estimation and testing for the folded normal distribution. Commun. Stat., Theory Methods 3, 55–72 (1974)


Acknowledgements

This research was supported by an Australian Research Council discovery project grant. We thank Chris Field, David Bulger and Michail Tsagris for useful conversations. Thanks also to the two reviewers for their comments, which have improved the presentation of the paper.

Author information

Corresponding author

Correspondence to J. L. Scealy.

Appendix: Proof of Proposition 1


The value \(\hat{t}\) is obtained by solving the saddlepoint equation, which implies

$$ \frac{(\kappa-2\hat{t})^2}{p(\kappa-2\hat{t})+\kappa^2}=\frac{1}{1+ \frac{p-1}{\kappa-2\hat{t}} -\sum_{m=2}^p \frac{1}{ ( \kappa -2\beta_m -2\hat{t} )}}. $$
(22)

From Kume and Wood (2005, p. 468), it follows that the lower and upper bounds for \(\hat{t}\) are

$$ \frac{\kappa-2\beta_2}{2} -\frac{p}{4} -\frac{1}{4} \bigl(p^2+4p \kappa^2 \bigr)^{1/2}, \qquad\frac{\kappa-2\beta_2}{2} - \frac{1}{2}, $$
(23)

and therefore

$$ \kappa-2\beta_2 -2\hat{t} > 1. $$
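Spelled out (a one-line step, reading the upper bound in (23) as a strict bound on \(\hat{t}\)):

$$ \hat{t} < \frac{\kappa-2\beta_2}{2}-\frac{1}{2} \quad\Longrightarrow\quad \kappa-2\beta_2-2\hat{t} > 1 . $$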

Using this bound, the limit conditions in Theorem 3 in Scealy and Welsh (2011) and (22), it follows that

$$ \frac{(\kappa-2\hat{t})^2}{p(\kappa-2\hat{t})+\kappa^2}=\frac{1}{1+ O(\kappa^{-1})}=1+O\bigl(\kappa^{-1}\bigr), $$

which implies

$$ (\kappa-2\hat{t})^2=\bigl(p(\kappa-2\hat{t})+\kappa^2\bigr) \bigl(1+O\bigl(\kappa^{-1}\bigr)\bigr). $$

The solution to the above quadratic equation that lies within the bounds (23) is

$$ \hat{t}=\frac{\kappa}{2}-\frac{p}{4}\bigl(1+O\bigl(\kappa^{-1}\bigr)\bigr)-\frac{1}{4} \bigl( p^2\bigl(1+O\bigl(\kappa^{-1}\bigr)\bigr)^2+4\kappa^2 \bigl(1+O\bigl(\kappa^{-1}\bigr)\bigr) \bigr)^{1/2}, $$
(24)

and when β=0 the \(O(\kappa^{-1})\) terms are exactly zero for all κ, giving

$$ \hat{t}=\frac{\kappa}{2}-\frac{p}{4}-\frac{1}{4} \bigl( p^2+4\kappa^2 \bigr)^{1/2}. $$

This is slightly different from \(\hat{t}\) given in Kume and Wood (2005, Sect. 3.1) because they set their matrix A to be a matrix of zeros, which is not equivalent to our approach. Note that both approaches still lead to the same normalising constant approximation for the von Mises-Fisher case.
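As a check (not part of the original argument), the β=0 value of \(\hat{t}\) above does satisfy the β=0 form of (22), namely \((\kappa-2\hat{t})^2=p(\kappa-2\hat{t})+\kappa^2\), exactly:

$$ \kappa-2\hat{t}=\frac{p}{2}+\frac{1}{2}\bigl(p^2+4\kappa^2\bigr)^{1/2}, \qquad (\kappa-2\hat{t})^2=\frac{p^2}{2}+\kappa^2+\frac{p}{2} \bigl(p^2+4\kappa^2\bigr)^{1/2}=p(\kappa-2\hat{t})+\kappa^2. $$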

Under the limit conditions in Theorem 3 in Scealy and Welsh (2011) and using the fact that \(\hat{t}=O(1)\), it follows that

$$ \hat{t}\kappa(\kappa-2\hat{t})^{-1}-\hat{t}=O\bigl( \kappa^{-1}\bigr), \qquad T=O\bigl(\kappa^{-1}\bigr), $$

and

$$ \Biggl( \biggl(1-\frac{2\hat{t}}{\kappa} \biggr) \prod_{m=2}^p \biggl( 1-\frac{2\hat{t}}{\kappa-2\beta_m} \biggr) \Biggr )^{-1/2}=1+ O\bigl( \kappa^{-1}\bigr). $$
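As a small numerical illustration (not from the paper) of the fact that \(\hat{t}=O(1)\): with all \(\beta_m=0\), the explicit saddlepoint given earlier tends to \(-p/4\) as \(\kappa\) grows.

```python
import numpy as np

# Illustration only: the beta = 0 saddlepoint
#   t_hat = kappa/2 - p/4 - (p**2 + 4*kappa**2)**0.5 / 4
# stays bounded (t_hat = O(1)) and approaches -p/4 as kappa increases.
p = 3
for kappa in (10.0, 100.0, 1000.0, 10000.0):
    t_hat = kappa / 2.0 - p / 4.0 - 0.25 * np.sqrt(p**2 + 4.0 * kappa**2)
    print(f"kappa = {kappa:8.0f}   t_hat = {t_hat:.6f}   (-p/4 = {-p / 4.0})")
```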

It also follows that

$$ \bigl(K_2(\hat{t}) \bigr)^{-1/2}=2^{-1} \kappa^{1/2} \bigl(1+ O\bigl(\kappa^{-1}\bigr) \bigr), $$

so

$$ \hat{f}^*(1)=(2\pi)^{-1/2}2^{-1}\kappa^{1/2} \bigl(1+O\bigl(\kappa^{-1}\bigr) \bigr) $$

and therefore

Using similar asymptotic arguments to Scealy and Welsh (2011, pp. 371–372), with the dominated convergence theorem, it follows that

(note that Kent et al. (2006, p. 754) give a detailed proof for the complex Bingham quartic distribution normalising constant which is similar) and therefore

with in the limit.

Cite this article

Scealy, J.L., Welsh, A.H. Fitting Kent models to compositional data with small concentration. Stat Comput 24, 165–179 (2014). https://doi.org/10.1007/s11222-012-9361-5
