Fitting Kent models to compositional data with small concentration

Scealy, J. L.; Welsh, A. H.

doi:10.1007/s11222-012-9361-5

Published: 17 October 2012

Volume 24, pages 165–179, (2014)

Statistics and Computing

Abstract

Compositional data can be transformed to directional data by the square root transformation and then modelled using the Kent distribution. The current approach for estimating the parameters in the Kent model for compositional data relies on a large concentration assumption, which assumes that the majority of the transformed data is not distributed too close to the boundaries of the positive orthant. When the data is distributed close to the boundaries with large variance, significant folding may result. To treat this case we propose new estimators of the parameters, derived from the actual folded Kent distribution and obtained via the EM algorithm. We show that these new estimators significantly reduce the bias in the current estimators when both the sample size and the amount of folding are moderately large. We also propose using a saddlepoint density approximation for the Kent distribution normalising constant in order to estimate the shape parameters more accurately when the concentration is small or only moderately large.
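The square root transformation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each composition is a row of non-negative proportions summing to one; the function names `sqrt_transform` and `fold` and the sample values are hypothetical. The `fold` helper further assumes that folding means reflecting a point on the sphere into the positive orthant by taking componentwise absolute values, which is our reading of the abstract rather than a definition given on this page.

```python
import numpy as np

def sqrt_transform(compositions):
    """Map compositions (non-negative rows summing to 1) to unit vectors."""
    u = np.asarray(compositions, dtype=float)
    if np.any(u < 0) or not np.allclose(u.sum(axis=-1), 1.0):
        raise ValueError("each composition must be non-negative and sum to 1")
    # sum(sqrt(u_i)^2) = sum(u_i) = 1, so the result lies on the unit sphere
    return np.sqrt(u)

def fold(points):
    """Assumed folding map: reflect points into the positive orthant."""
    return np.abs(np.asarray(points, dtype=float))

# Hypothetical compositions; the second row sits close to the boundary
compositions = np.array([[0.70, 0.20, 0.10],
                         [0.98, 0.01, 0.01]])
directions = sqrt_transform(compositions)
norms = np.linalg.norm(directions, axis=1)
folded = fold(np.array([-0.6, 0.8, 0.0]))
print(directions, norms, folded)
```

Because the transformed points always lie in the positive orthant, a spherical model fitted to them can place mass outside that orthant; the folded point above shows how such mass would be reflected back under the stated assumption.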

References

Aitchison, J.: The statistical analysis of compositional data (with discussion). J. R. Stat. Soc., Ser. B, Stat. Methodol. 44, 139–177 (1982)
Aitchison, J.: The Statistical Analysis of Compositional Data. Chapman and Hall, London (1986)
Chen, M., Kianifard, F.: Estimation of treatment difference and standard deviation with blinded data in clinical trials. Biom. J. 45, 135–142 (2003)
Cuesta-Albertos, J.A., Cuevas, A., Fraiman, R.: On projection-based tests for directional and compositional data. Stat. Comput. 19, 367–380 (2009)
Jung, S., Foskey, M., Marron, J.S.: Principal arc analysis on direct product manifolds. Ann. Appl. Stat. 5, 578–603 (2011)
Kent, J.T.: The Fisher-Bingham distribution on the sphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 44, 71–80 (1982)
Kent, J.T., Mardia, K.V., McDonnell, P.: The complex Bingham quartic distribution and shape analysis. J. R. Stat. Soc., Ser. B, Stat. Methodol. 68, 747–765 (2006)
Kume, A., Walker, S.G.: Sampling from compositional and directional distributions. Stat. Comput. 16, 261–265 (2006)
Kume, A., Wood, A.T.A.: Saddlepoint approximations for the Bingham and Fisher-Bingham normalising constants. Biometrika 92, 465–476 (2005)
Rivest, L.: On the information matrix for symmetric distributions on the hypersphere. Ann. Stat. 12, 1085–1089 (1984)
Matz, A.W.: Maximum likelihood parameter estimation for the quartic exponential distribution. Technometrics 20, 475–484 (1978)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, New Jersey (2008)
Scealy, J.L.: Modelling techniques for compositional data using distributions defined on the hypersphere. PhD thesis, Australian National University (2010)
Scealy, J.L., Welsh, A.H.: Regression for compositional data by using distributions defined on the hypersphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 73, 351–375 (2011)
Stephens, M.A.: Use of the von Mises distribution to analyse continuous proportions. Biometrika 69, 197–203 (1982)
Sundberg, R.: On estimation and testing for the folded normal distribution. Commun. Stat., Theory Methods 3, 55–72 (1974)

Acknowledgements

This research was supported by an Australian Research Council discovery project grant. We thank Chris Field, David Bulger and Michail Tsagris for useful conversations. Thanks also to two reviewers for their comments, which have improved the presentation of the paper.

Author information

Authors and Affiliations

Australian National University, Canberra, Australia
J. L. Scealy & A. H. Welsh

Corresponding author

Correspondence to J. L. Scealy.

Appendix: Proof of Proposition 1

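The value $\hat{t}$ is obtained by solving the saddlepoint equation

$$ \frac{1}{\kappa-2\hat{t}}+\frac{\kappa^2}{(\kappa-2\hat{t})^2}+\sum_{m=2}^p \frac{1}{\kappa-2\beta_m-2\hat{t}}=1, $$

which implies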

$$ \frac{(\kappa-2\hat{t})^2}{p(\kappa-2\hat{t})+\kappa^2}=\frac{1}{1+ \frac{p-1}{\kappa-2\hat{t}} -\sum_{m=2}^p \frac{1}{ ( \kappa -2\beta_m -2\hat{t} )}}. $$

(22)

From Kume and Wood (2005, p. 468), it follows that the lower and upper bounds for $\hat{t}$ are

$$ \frac{\kappa-2\beta_2}{2} -\frac{p}{4} -\frac{1}{4} \bigl(p^2+4p \kappa^2 \bigr)^{1/2}, \qquad\frac{\kappa-2\beta_2}{2} - \frac{1}{2}, $$

(23)

and therefore

$$ \kappa-2\beta_2 -2\hat{t} > 1. $$
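As a numerical illustration of (22) and the bounds (23), the sketch below locates $\hat{t}$ by bisection. It rests on two assumptions of ours: that (22) can be rearranged into the equivalent form $1/(\kappa-2t)+\kappa^2/(\kappa-2t)^2+\sum_{m=2}^p 1/(\kappa-2\beta_m-2t)=1$, and that $\beta_2$ is the largest of the $\beta_m$ and is non-negative, so that the left-hand side is increasing in $t$ on the interval (23). The function names and the parameter values ($\kappa=10$, $\beta_2=2$, $\beta_3=-2$, $p=3$) are hypothetical.

```python
import math

def k1(t, kappa, betas):
    """Left-hand side of the rearranged form of (22); the root solves k1 = 1."""
    u = kappa - 2.0 * t
    return 1.0 / u + kappa**2 / u**2 + sum(1.0 / (u - 2.0 * b) for b in betas)

def bounds_23(kappa, betas):
    """Lower and upper bounds (23); betas = (beta_2, ..., beta_p), beta_2 largest."""
    p = len(betas) + 1
    centre = (kappa - 2.0 * betas[0]) / 2.0
    lower = centre - p / 4.0 - 0.25 * math.sqrt(p**2 + 4.0 * p * kappa**2)
    upper = centre - 0.5
    return lower, upper

def solve_t_hat(kappa, betas, tol=1e-12):
    """Bisection on [lower, upper]; relies on k1 being increasing there."""
    lo, hi = bounds_23(kappa, betas)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k1(mid, kappa, betas) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameter values with p = 3
kappa, betas = 10.0, (2.0, -2.0)
p = len(betas) + 1
lower, upper = bounds_23(kappa, betas)
t_hat = solve_t_hat(kappa, betas)

# Both sides of (22) evaluated at the numerical root
u = kappa - 2.0 * t_hat
lhs_22 = u**2 / (p * u + kappa**2)
rhs_22 = 1.0 / (1.0 + (p - 1) / u - sum(1.0 / (u - 2.0 * b) for b in betas))
print(t_hat, lhs_22, rhs_22)
```

Bisection is used here only because it needs no derivative and no external library; any bracketing root finder would serve equally well on the interval (23).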

Using this bound, the limit conditions in Theorem 3 in Scealy and Welsh (2011) and (22), it follows that

$$ \frac{(\kappa-2\hat{t})^2}{p(\kappa-2\hat{t})+\kappa^2}=\frac{1}{1+ O(\kappa^{-1})}=1+O\bigl(\kappa^{-1}\bigr), $$

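which implies

$$ (\kappa-2\hat{t})^2-\bigl(p(\kappa-2\hat{t})+\kappa^2\bigr) \bigl(1+O\bigl(\kappa^{-1}\bigr) \bigr)=0. $$

The solution within the bounds (23) to the above quadratic equation is

$$ \hat{t}=\frac{\kappa}{2}-\frac{p}{4} \bigl(1+O\bigl(\kappa^{-1}\bigr) \bigr)-\frac{1}{4} \Bigl( p^2 \bigl(1+O\bigl(\kappa^{-1}\bigr) \bigr)^2+4\kappa^2 \bigl(1+O\bigl(\kappa^{-1}\bigr) \bigr) \Bigr)^{1/2}, $$

(24)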

and when $\beta=0$ the $O(\kappa^{-1})$ terms are exactly zero for all $\kappa$, giving

$$ \hat{t}=\frac{\kappa}{2}-\frac{p}{4}-\frac{1}{4} \bigl( p^2+4\kappa^2 \bigr)^{1/2}. $$

This is slightly different from $\hat{t}$ given in Kume and Wood (2005, Sect. 3.1) because they set their matrix A to be a matrix of zeros, which is not equivalent to our approach. Note that both approaches still lead to the same normalising constant approximation for the von Mises-Fisher case.
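The closed form for $\beta=0$ can be checked numerically. The sketch below substitutes it into the left-hand side of (22), whose right-hand side reduces to 1 when every $\beta_m$ is zero, and tabulates $\hat{t}$ for increasing $\kappa$. The function names and the chosen values of $p$ and $\kappa$ are hypothetical.

```python
import math

def t_hat_beta_zero(kappa, p):
    """Closed-form saddlepoint for beta = 0, as displayed above."""
    return kappa / 2.0 - p / 4.0 - 0.25 * math.sqrt(p**2 + 4.0 * kappa**2)

def lhs_22(t, kappa, p):
    """Left-hand side of (22); with beta = 0 the right-hand side equals 1."""
    u = kappa - 2.0 * t
    return u**2 / (p * u + kappa**2)

# Hypothetical dimension and concentration values
p = 3
kappas = (0.5, 5.0, 50.0, 500.0)
for kappa in kappas:
    t = t_hat_beta_zero(kappa, p)
    print(f"kappa={kappa:7.1f}  t_hat={t: .6f}  lhs(22)={lhs_22(t, kappa, p):.12f}")
```

In this special case the values of $\hat{t}$ remain bounded and drift towards $-p/4$ as $\kappa$ grows, which agrees with treating $\hat{t}$ as $O(1)$.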

Under the limit conditions in Theorem 3 in Scealy and Welsh (2011) and using the fact that $\hat{t}=O(1)$, it follows that

$$ \hat{t}\kappa(\kappa-2\hat{t})^{-1}-\hat{t}=O\bigl( \kappa^{-1}\bigr), \qquad T=O\bigl(\kappa^{-1}\bigr), $$

and

$$ \Biggl( \biggl(1-\frac{2\hat{t}}{\kappa} \biggr) \prod_{m=2}^p \biggl( 1-\frac{2\hat{t}}{\kappa-2\beta_m} \biggr) \Biggr )^{-1/2}=1+ O\bigl( \kappa^{-1}\bigr). $$

It also follows that

$$ \bigl(K_2(\hat{t}) \bigr)^{-1/2}=2^{-1} \kappa^{1/2} \bigl(1+ O\bigl(\kappa^{-1}\bigr) \bigr), $$
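The rate in the last display can be illustrated for the special case $\beta=0$. The expression used below for $K_2$ is an assumption of ours: it is the derivative in $t$ of the rearranged left-hand side of (22), $p/(\kappa-2t)+\kappa^2/(\kappa-2t)^2$, and it is not displayed in this text. The function names and parameter values are hypothetical.

```python
import math

def t_hat_beta_zero(kappa, p):
    """Closed-form saddlepoint for beta = 0 (from the appendix)."""
    return kappa / 2.0 - p / 4.0 - 0.25 * math.sqrt(p**2 + 4.0 * kappa**2)

def k2_beta_zero(t, kappa, p):
    """Assumed K_2 for beta = 0: derivative of p/u + kappa^2/u^2 with u = kappa - 2t."""
    u = kappa - 2.0 * t
    return 2.0 * p / u**2 + 4.0 * kappa**2 / u**3

# Ratio of K_2(t_hat)^(-1/2) to its leading term kappa^(1/2) / 2
p = 3
ratios = {}
for kappa in (10.0, 100.0, 1000.0):
    t = t_hat_beta_zero(kappa, p)
    ratios[kappa] = k2_beta_zero(t, kappa, p) ** -0.5 / (0.5 * math.sqrt(kappa))
    print(f"kappa={kappa:7.1f}  ratio={ratios[kappa]:.6f}")
```

Under these assumptions the ratio approaches 1 as $\kappa$ grows, and its distance from 1 shrinks roughly in proportion to $\kappa^{-1}$.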

so

$$ \hat{f}^*(1)=(2\pi)^{-1/2}2^{-1}\kappa^{1/2} \bigl(1+O\bigl(\kappa^{-1}\bigr) \bigr) $$

and therefore

Using similar asymptotic arguments to Scealy and Welsh (2011, pp. 371–372), with the dominated convergence theorem, it follows that

(note that Kent et al. (2006, p. 754) give a detailed proof for the complex Bingham quartic distribution normalising constant which is similar) and therefore

with in the limit.

About this article

Cite this article

Scealy, J.L., Welsh, A.H. Fitting Kent models to compositional data with small concentration. Stat Comput 24, 165–179 (2014). https://doi.org/10.1007/s11222-012-9361-5

Received: 17 April 2012
Accepted: 05 October 2012
Published: 17 October 2012
Issue Date: March 2014
DOI: https://doi.org/10.1007/s11222-012-9361-5
